curt2go
Joined: 21 Nov 2003 Posts: 200
|
Super fast addition? |
Posted: Mon Jul 16, 2018 10:19 am |
|
|
I know the title may be a bit vague, but what I need to do is process 512 words of data in a loop.
Processor 24EP256GP206 running at 140MHz
Right now I am not running a loop because writing out the individual lines is way faster.
variable[0] += variable2[0];
That is what I am doing 512 times. variable and variable2 are global because they are used elsewhere. Doing this 512 times takes 23uS. If it is done in a for loop it takes 175uS. The issue is that I have to process variable2 before it gets added: I need to apply a scale factor to it. These variables are signed int16s. I need to scale them like this:
float factor = 1.14;
float result;
result = variable2[0] * factor;
variable[0] += result;
The factor changes constantly and this process runs every 23mS. Doing the statements above 512 times took something like 7mS (I can't remember exactly), which is way, way too long. Would changing to a signed int32 and making the factor an int8 or something similar be much better? Or even using pointers?
Ideally I would like to be under 200uS.
I thought I would ask here because you guys are phenomenal at knowing which way is the best. Thank you in advance. |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Jul 16, 2018 11:18 am |
|
|
I found the post where PCM shows how to use the simulator in MPLAB. I'm going to run a bunch of simulations in the meantime. I also read in another thread where there were some options about using int32 but only using the top bytes. Not sure I understand that. Any light would help. I need this to be as fast as possible.
Sorry that this is a dumb question anyways. |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Jul 16, 2018 1:44 pm |
|
|
OK, this is where I am with the SIM. Normally I would just use floats because I don't care about speed, so I am having trouble with the integer math here.
Let me know what you think. The math does not seem to be coming out right. At least on the sim.
Code: | //
#include <24EP256GP206.h>
#device adc=12 *=16
#FUSES NOWDT //No Watch Dog Timer
#FUSES LPRC
#use delay(clock=140000000) //need to stay at 16MHz or the interrupt for the comms does not work
void main(){
signed int16 variable[512];
signed int16 variable2[512];
setup_oscillator(OSC_INTERNAL,140000000);
variable[0] = 6000;
variable2[0] = 1000;
while(1){
unsigned int16 i;
unsigned int16 p = 1.14 * 256;
unsigned int32 w;
for(i=0;i<512;i++){ //total of 277uS looping adds about 0.11uS each loop
w = variable2[i] * p;// 0.21uS
w >>=8; // 0.1uS
variable[i] += w;// 0.15uS
}
}
} |
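A possible reason the numbers look wrong in the listing above: both operands of `variable2[i] * p` are 16 bits wide, so the multiply is presumably carried out in 16 bits and only widened when the result is stored into `w`. Because `p` is unsigned, negative samples would also be treated as large positive values. Here is a minimal sketch in standard C. The names `mul16_truncated` and `mul16_widened` are hypothetical, and the 16-bit truncation is modelled explicitly because a desktop compiler promotes to 32 bits on its own.

```c
#include <stdint.h>

/* Models a 16-bit-wide multiply: only the low 16 bits of the product survive.
   (Illustrative helper, not a CCS function.) */
uint16_t mul16_truncated(int16_t sample, uint16_t factor_q8)
{
    int32_t wide = (int32_t)sample * (int32_t)factor_q8;
    return (uint16_t)((uint32_t)wide & 0xFFFFu);
}

/* Widens the operands before multiplying, so the full 32-bit product is kept. */
int32_t mul16_widened(int16_t sample, uint16_t factor_q8)
{
    return (int32_t)sample * (int32_t)factor_q8;
}
```

With the values from the listing (sample 1000, factor 1.14 * 256 = 291), the full product is 291000, which gives 1136 after the `>>8`. The truncated product is 28856, which gives 112. Casting one operand to a 32-bit type before the multiply is one way to keep the full product, at the cost of a wider multiply.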
|
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9253 Location: Greensville,Ontario
|
|
Posted: Mon Jul 16, 2018 2:29 pm |
|
|
comment.
you should post a few of the results that you get vs what you expect, as well as the interim values....
Jay |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Jul 16, 2018 3:08 pm |
|
|
I put the actual SIM numbers beside each line. I really was hoping for under 200uS but it looks like that is not even possible. The big one is converting from the int16 to int32 for the math. I will also have to do some error checking in each one as I can't go over 32767 or under -32767. I only have the positive error check in there right now.
I also need to figure out how to do the math with negative numbers because I can't do a bit shift like I am with the positive numbers. Unless I am missing something.
I have to run this routine 3 times in each 23mS between SD card reads. I also have a lot of other stuff going on, which is why the shorter the better. I am using the DMA to write to the codec so that is not getting in the way at all now.
So any suggestions would be awesome. I can try them on the SIM to see pretty quick.
Code: |
//
#include <24EP256GP206.h>
#device adc=12 *=16
#FUSES NOWDT //No Watch Dog Timer
#FUSES LPRC
#use delay(clock=140000000)
void main(){
signed int16 variable[512];
signed int16 variable2[512];
setup_oscillator(OSC_INTERNAL,140000000);
variable[0] = 6000;
variable2[0] = 32000;
while(1){
unsigned int16 i;
unsigned int16 p = 0.06 * 128;//1 = 100%
unsigned int32 w;
for(i=0;i<512;i++){ //total of 695uS
w = variable2[i];// 0.17uS
w *= p;// 0.67uS
w >>=8; // 0.1uS
w +=variable[i];// 0.2uS
if(w > 32767)//0.27uS
w=32767;//0.07uS
variable[i] = w;//0.07uS
}
}
} |
|
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Jul 16, 2018 3:59 pm |
|
|
Here is the latest with handling negative numbers as well. This one is 944uS.
Let me know if you can see any efficiencies.
Code: | //
#include <24EP256GP206.h>
#device adc=12 *=16
#FUSES NOWDT //No Watch Dog Timer
#FUSES LPRC
#use delay(clock=140000000)
void main(){
signed int16 variable[512];
signed int16 variable2[512];
setup_oscillator(OSC_INTERNAL,140000000);
variable[0] = 32000;
variable2[0] = -32000;
while(1){
unsigned int16 i;
int1 neg = 0;
unsigned int16 p = 0.06 * 128;//1 = 100%
signed int32 w;
for(i=0;i<512;i++){ //total of 944uS
w = abs(variable2[i]);// 0.17uS
w *= p;// 0.67uS
if(variable2[i] < 0)
neg = 1;
w >>=8; // 0.1uS
if(neg)
w = 0xFFFFFFFF - w;//turn it back negative again if it was.
w +=variable[i];//0.2uS
if(w > 32767)//0.27uS
w=32767;//0.07uS
if(w < -32767)//0.27uS
w = -32767;//0.07uS
variable[i] = w;//0.07uS
}
}
} |
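One detail in the listing above that may be worth checking: `0xFFFFFFFF - w` flips every bit of `w`, which in two's complement is `-w - 1` rather than `-w`. Also, `neg` is never cleared inside the loop, so once one negative sample has been seen, every later sample would be treated as negative. Here is a small sketch of the first point in standard C. The helper names are hypothetical, and the cast back to a signed type assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* What the listing computes: 0xFFFFFFFF - w, i.e. the bitwise complement of w.
   The conversion back to int32_t assumes two's-complement wraparound. */
int32_t flip_bits(int32_t w)
{
    return (int32_t)(0xFFFFFFFFu - (uint32_t)w);
}

/* Plain negation, for comparison. */
int32_t negate(int32_t w)
{
    return -w;
}
```

For `w = 7`, `flip_bits` gives -8 while `negate` gives -7, so the listing's result would be off by one on every negative sample. Writing `w = -w;` avoids this.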
|
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9253 Location: Greensville,Ontario
|
|
Posted: Mon Jul 16, 2018 4:59 pm |
|
|
just an idea...
instead of this...
*** if(variable2[i] < 0)
*** neg = 1;
w >>=8; // 0.1uS
*** if(neg)
*** w = 0xFFFFFFFF - w;//turn it back negative again if it was.
w +=variable[i];//0.2uS
if(w > 32767)//0.27uS
w=32767;//0.07uS
if(w < -32767)//0.27uS
w = -32767;//0.07uS
...try this:
if(variable2[i] < 0)
w = 0xFFFFFFFF - w;//turn it back negative again if it was.
w >>=8; // 0.1uS
w +=variable[i];//0.2uS
if(w > 32767)//0.27uS
w=32767;//0.07uS
if(w < -32767)//0.27uS
w = -32767;//0.07uS
variable[i] = w;//0.07uS
Only the first statement after an IF() gets executed, so my thinking is you can eliminate the setting of the neg variable and the later test.
If I'm correct it should speed up the overall process.
If I'm wrong, well, it's 90*F in the shade and drier than the desert here, sorry, my brain's fried !
Jay |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Mon Jul 16, 2018 6:27 pm |
|
|
curt2go wrote: | Let me know if you can see any efficiencies.
|
If I were you, I would look at the .LST file and look for any lines of C code
that produce an excessive amount of ASM code for what they do.
Then think of some clever method to re-write the code that produces a
much smaller .LST file for it. This is assuming it's all inline code.
By small, I don't mean to use loops. I mean, with it unrolled. |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Jul 16, 2018 6:32 pm |
|
|
Yeah. The version I use now is all unrolled. It takes up more ROM but I have the space for that. The biggest costs I see are the conversions to int32, but I'm not sure how to get around that, as I need the larger numbers if I'm not using floats. I will look further into the LST file and see where I can improve things.
And Temtronic, that is a good idea. I will try that one. Since probably only half the data is negative, it will cut down the time on the negative portion for sure. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19562
|
|
Posted: Mon Jul 16, 2018 10:49 pm |
|
|
Be aware:
w >>=8; //
Only gives /256, for a +ve number. Not -ve.
Look at:
<https://en.wikipedia.org/wiki/Arithmetic_shift>
Look at the section on 'Non-equivalence of arithmetic right shift and division'.
Don't do scaling like this.
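To illustrate the difference in standard C: an arithmetic right shift rounds toward minus infinity, while integer division rounds toward zero, so the two disagree for negative values that are not exact multiples of 256. This is a sketch only. It assumes the compiler uses an arithmetic shift for signed values, which C leaves implementation-defined, and the function names are made up for the example.

```c
#include <stdint.h>

/* Scale by 1/256 with a right shift. Assumes an arithmetic shift,
   which rounds toward minus infinity. */
int32_t scale_by_shift(int32_t x)
{
    return x >> 8;
}

/* Scale by 1/256 with a division. C99 and later round toward zero. */
int32_t scale_by_divide(int32_t x)
{
    return x / 256;
}
```

For example, -1 shifts to -1 but divides to 0, and -257 shifts to -2 but divides to -1. For positive values the two agree.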
If you want to use an integer factor, use int32 arithmetic. Multiply the factor by 65536, rather than 256. Then take the upper two bytes of the result as the int16 value. This can be done efficiently using a union.
Code: |
union {
signed int32 wrapper;
signed int16 parts[2];
} value;
signed int32 scale=1.14*65536;
value.wrapper *=scale;
result = value.parts[1];
|
This gives you the integer put into 'value.wrapper', multiplied by 1.14 as an int16 result in value.parts[1].
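The same idea can be sketched in standard C with fixed-width types. The type `q16_value` and the helpers `to_q16`, `scale_q16` and `max_safe_sample` are names invented for this sketch, and taking `parts[1]` as the upper half assumes a little-endian target. One thing that may be worth checking is headroom: with a factor of 1.14 the scale is 74711, and 32767 * 74711 does not fit in a signed 32-bit value, so large samples would overflow with a factor above 1.0.

```c
#include <stdint.h>

typedef union {
    int32_t wrapper;   /* full 32-bit product */
    int16_t parts[2];  /* parts[1] is the upper half on a little-endian target */
} q16_value;

/* Converts a factor such as 0.14 to Q16 (factor * 65536, truncated). */
int32_t to_q16(double factor)
{
    return (int32_t)(factor * 65536.0);
}

/* Multiplies a sample by a Q16 factor and returns the upper 16 bits. */
int16_t scale_q16(int16_t sample, int32_t scale)
{
    q16_value value;
    value.wrapper = (int32_t)sample * scale;
    return value.parts[1];
}

/* Largest sample magnitude whose product with a positive `scale`
   still fits in a signed 32-bit value. */
int32_t max_safe_sample(int32_t scale)
{
    return INT32_MAX / scale;
}
```

With a scale of 9175 (0.14), a sample of 1000 comes out as 139 and -1000 as -140, because taking the upper half discards the fraction downward. For the 74711 scale, `max_safe_sample` returns 28743, so a smaller fixed-point format or a check on the input range may be needed when the factor exceeds 1.0.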
Some time ago, I needed fast scaling for a servo application, so I wrote custom int24 basic arithmetic routines, arranged so they used the upper three bytes of an int32, then took the upper two bytes of this as the int16 result for the same effect. On a PIC18, without hardware division, this gave a significant saving over using int32, however on the PIC24, the int32 should give you quite good results without this complexity. |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Tue Jul 17, 2018 10:48 am |
|
|
That is a very cool and efficient solution!
It saves some time which is awesome.
But what might be the best way to check for min and max values doing the math this way? I need to do variable[x] += variable2[x]; but the result has to stay between -32767 and 32767.
One weird thing: in the simulator the math always comes out double.
For instance, if I use -1000 with 0.14*65536 as the scale, the answer should be -140, but it is always double, in this case -280. I have just assumed it's something in the SIM? Any thoughts?
This is the new math in the SIM it is cutting out 200uS so far.
Code: |
union {
signed int32 wrapper;
signed int16 parts[2];
} value;
signed int32 scale=0.14*65536;
while(1){
unsigned int16 i;
for(i=0;i<512;i++){ //total of 775uS
value.wrapper = variable2[i];//0.085uS
value.wrapper *=scale; //0.67uS
value.wrapper = value.parts[1];
value.wrapper += variable[i];
if(value.wrapper > 32767)//0.06uS
value.wrapper=32767;//0.07uS
if(value.wrapper < -32767)//0.06uS
value.wrapper = -32767;//0.07uS
variable[i] = value.wrapper;//0.07uS
}
} |
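On the min/max question, one common approach is to do the addition in a 32-bit temporary and clamp before storing back to 16 bits. This is a minimal sketch in standard C. `add_saturated` is a hypothetical helper, and the limits are the +/-32767 range mentioned in the thread.

```c
#include <stdint.h>

/* Adds two 16-bit samples in 32 bits and clamps the sum to +/-32767. */
int16_t add_saturated(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > 32767)
        sum = 32767;
    if (sum < -32767)
        sum = -32767;
    return (int16_t)sum;
}
```

Keeping the sum in a separate variable, rather than writing it back into the union it was read from, also sidesteps any question about overlapping storage.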
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19562
|
|
Posted: Tue Jul 17, 2018 11:14 am |
|
|
I'd be worried about this:
value.wrapper = value.parts[1];
Remember value.parts, is 'part' of wrapper. This is putting part of a number back into the same RAM area. No idea quite what the effect would actually be!... Suspect the compiler may be having a hiccup on this which is resulting in the doubling. |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Tue Jul 17, 2018 11:35 am |
|
|
It was doubling before I was using this math. It does the same thing here.
variable2 = 6000;
6000 *0.14 = 840 but it comes out with 1679
Code: |
value.wrapper = variable2[i];
value.wrapper *=scale;
p = value.parts[1];
p += variable[i];
//checkLimits();
if(p > 32767)//0.06uS
p=32767;//0.07uS
if(p < -32767)//0.06uS
p = -32767;//0.07uS
variable[i] = p;//0.07uS |
|
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Tue Jul 17, 2018 11:49 am |
|
|
If I use 32768 in the scale then I come out with the right number. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19562
|
|
Posted: Tue Jul 17, 2018 2:08 pm |
|
|
Just stuck a basic program together and ran it on a different PIC, and it works fine:
Code: |
void main()
{
int16 source;
union {
signed int32 wrapper;
signed int16 parts[2];
} value;
signed int32 scale=0.14*65536;
for (source=-500; source<600;source+=100)
{ //Basic loop to test scaling.
value.wrapper=source;
printf("%5d ", source);
value.wrapper *=scale;
printf("%05d\r",value.parts[1]);
}
while(TRUE)
;
}
|
Gives (on terminal):
Quote: |
-500 -0070
-400 -0056
-300 -0042
-200 -0028
-100 -0014
0 00000
100 00013
200 00027
300 00041
400 00055
500 00069
|
You either have a problem in your debugging environment, or something really screwy going on!... |
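A side note on the terminal output above: the positive results are one lower in magnitude than the negative ones (13 versus -14) because 0.14*65536 truncates to 9175, and taking the upper 16 bits always rounds toward minus infinity. If symmetric results matter, one option is to add half an LSB (0x8000) before taking the upper half. This is a sketch in standard C with made-up function names, and it assumes an arithmetic right shift for signed values.

```c
#include <stdint.h>

/* Upper 16 bits of sample * scale: rounds toward minus infinity. */
int16_t scale_q16_floor(int16_t sample, int32_t scale)
{
    return (int16_t)(((int32_t)sample * scale) >> 16);
}

/* Same, but adds half an LSB first so the result rounds to nearest. */
int16_t scale_q16_round(int16_t sample, int32_t scale)
{
    return (int16_t)(((int32_t)sample * scale + 0x8000) >> 16);
}
```

With a scale of 9175, `scale_q16_floor` reproduces the 13 / -14 pair for +/-100, while `scale_q16_round` gives 14 / -14. The extra add costs roughly one more instruction per sample.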
|
|
|