curt2go
Joined: 21 Nov 2003 Posts: 200
|
Optimizing For Loop |
Posted: Mon Apr 23, 2018 2:18 pm |
|
|
I have a loop doing some addition. I am already using pointers to speed things up, but I am wondering whether there is a faster way to do this. This is on a 24EP256GP206 running at 140MHz.
Code: |
unsigned int16 buffer_x[256];
unsigned int16 buffer_y[256];
int a;
unsigned int16 *bufferPointer_x;
unsigned int16 *bufferPointer_y;
bufferPointer_x = buffer_x; // the array name already yields a pointer to element 0
bufferPointer_y = buffer_y;
for(a=0;a<256;a++){
*bufferPointer_x = *bufferPointer_x + *bufferPointer_y;
bufferPointer_x++; // increment the pointer
bufferPointer_y++;
}
bufferPointer_x = buffer_x; // start them pointing back to the proper place again
bufferPointer_y = buffer_y; |
|
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Mon Apr 23, 2018 4:31 pm |
|
|
You're adding two arrays, element by element.
You can always trade code space for speed by unrolling the loop and
using fixed indexes. This gets rid of the loop overhead and the indirect
indexing and will be a lot faster:
Code: |
buffer_x[0] += buffer_y[0];
buffer_x[1] += buffer_y[1];
buffer_x[2] += buffer_y[2];
buffer_x[3] += buffer_y[3];
.
.
.
.
buffer_x[255] += buffer_y[255];
|
|
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Apr 23, 2018 4:34 pm |
|
|
Yes, I have done that before. I have the space, but I was thinking there might be a better way.
Edit: I'm not sure I have an option on the indexing. I fill and add to whichever buffer I'm not currently sending out over SPI.
The code above was only an example. I am actually using a pointer because I switch the location of the buffer I am currently adding into.
But it still might be faster to set a bit and still use the bunch of lines.
Thanks for the input. I will do a whole bunch of copying and pasting. |
|
|
curt2go
Joined: 21 Nov 2003 Posts: 200
|
|
Posted: Mon Apr 23, 2018 5:19 pm |
|
|
In this case it takes 1% more ROM and is 5 times as fast, judging from the .lst file. |
|
|
soonc
Joined: 03 Dec 2013 Posts: 215
|
|
Posted: Mon May 07, 2018 9:57 pm |
|
|
In your code example:
With int a; the loop counter will never reach 256, so the loop will run forever. Use int16 a; instead.
Try this:
Code: |
.................... void test()
.................... {
.................... int16 bx[256];
.................... int16 by[256];
.................... int16 i;
.................... for(i=0;i<256; i++)
*
0072A: MOVLB A
0072C: CLRF x71
0072E: CLRF x70
00730: MOVF x71,W
00732: SUBLW 00
00734: BNC 07B6
.................... {
.................... bx[i] += by[i];
00736: BCF 3FD8.0
00738: RLCF x70,W
0073A: MOVWF 02
0073C: RLCF x71,W
0073E: MOVWF 03
00740: MOVF 02,W
00742: ADDLW 48
00744: MOVWF 01
00746: MOVLW 0A
00748: ADDWFC 03,F
0074A: MOVFF 01,A72
0074E: MOVFF 03,A73
00752: MOVFFL 03,3FEA
00758: MOVFFL 01,3FE9
0075E: MOVFFL 3FEC,A75
00764: MOVF 3FED,F
00766: MOVFFL 3FEF,A74
0076C: BCF 3FD8.0
0076E: RLCF x70,W
00770: MOVWF 02
00772: RLCF x71,W
00774: MOVWF 03
00776: MOVF 02,W
00778: ADDLW 5C
0077A: MOVWF 3FE9
0077C: MOVLW 0A
0077E: ADDWFC 03,W
00780: MOVWF 3FEA
00782: MOVFFL 3FEC,03
00788: MOVF 3FED,F
0078A: MOVF 3FEF,W
0078C: ADDWF x74,W
0078E: MOVWF 01
00790: MOVF x75,W
00792: ADDWFC 03,F
00794: MOVFFL A73,3FEA
0079A: MOVFFL A72,3FE9
007A0: MOVFFL 03,3FEC
007A6: MOVF 3FED,F
007A8: MOVFFL 01,3FEF
007AE: INCF x70,F
007B0: BTFSC 3FD8.2
007B2: INCF x71,F
007B4: BRA 0730
.................... }
007B6: MOVLB 0
007B8: GOTO 63E6 (RETURN)
....................
....................
.................... } |
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19576
|
|
Posted: Mon May 07, 2018 11:23 pm |
|
|
Actually it will, soonc.
This is a PIC24 (same 16-bit family and compiler as the dsPICs). On these the default 'int' is a signed int16, so it will work fine.
This, though, is why you should always use explicit sizes. |
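A small sketch of why the counter width matters, written in standard C with <stdint.h> types rather than CCS types (the function names are made up for illustration). An 8-bit counter can never hold 256, so it wraps from 255 back to 0 and the test stays true; the sketch caps the number of passes so it shows the wraparound instead of hanging.

```c
#include <stdint.h>

/* Hypothetical helper: counts loop passes using an 8-bit counter.
   Gives up after `cap` passes so the demonstration always terminates. */
uint32_t count_passes_u8(uint16_t limit, uint32_t cap)
{
    uint32_t passes = 0;
    uint8_t a;
    for (a = 0; a < limit && passes < cap; a++) {  /* a wraps 255 -> 0 */
        passes++;
    }
    return passes;
}

/* Same loop with an explicit 16-bit counter: stops once a reaches limit. */
uint32_t count_passes_u16(uint16_t limit, uint32_t cap)
{
    uint32_t passes = 0;
    uint16_t a;
    for (a = 0; a < limit && passes < cap; a++) {
        passes++;
    }
    return passes;
}
```

With a limit of 256 and a cap of 1000, the 8-bit version only stops because of the cap, while the 16-bit version stops after 256 passes.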
|
|
pmuldoon
Joined: 26 Sep 2003 Posts: 218 Location: Northern Indiana
|
|
Posted: Tue May 08, 2018 5:26 am |
|
|
What about a compromise?
Unroll the loop to do 8 updates per iteration and increment the index by 8. Then write it as two loops, each with absolute addressing into one of the buffers, and decide which loop to run when the function is called.
That should be faster, and the code would still be easy to read and follow.
Just thinking... |
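A minimal sketch of that compromise in standard C, assuming two fixed destination buffers and one source buffer. The names buf_a, buf_b, buf_y, ADD8 and add_into are illustrative and not from the original code. Each loop names its buffer directly, so the base addresses are constants, although the index is still a variable.

```c
#include <stdint.h>

#define BUF_LEN 256  /* buffer length from the original post */

/* Fixed global buffers (illustrative names). */
uint16_t buf_a[BUF_LEN], buf_b[BUF_LEN], buf_y[BUF_LEN];

/* Eight additions per pass; BUF_LEN must be a multiple of 8. */
#define ADD8(dst, i)                        \
    do {                                    \
        dst[(i) + 0] += buf_y[(i) + 0];     \
        dst[(i) + 1] += buf_y[(i) + 1];     \
        dst[(i) + 2] += buf_y[(i) + 2];     \
        dst[(i) + 3] += buf_y[(i) + 3];     \
        dst[(i) + 4] += buf_y[(i) + 4];     \
        dst[(i) + 5] += buf_y[(i) + 5];     \
        dst[(i) + 6] += buf_y[(i) + 6];     \
        dst[(i) + 7] += buf_y[(i) + 7];     \
    } while (0)

/* Pick the destination once per call, then run the matching loop. */
void add_into(int use_b)
{
    uint16_t i;
    if (use_b) {
        for (i = 0; i < BUF_LEN; i += 8) ADD8(buf_b, i);
    } else {
        for (i = 0; i < BUF_LEN; i += 8) ADD8(buf_a, i);
    }
}
```

This cuts the loop overhead to one compare and branch per eight additions. How much that gains on a particular compiler is best checked in the .lst file.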
|
|
pmuldoon
Joined: 26 Sep 2003 Posts: 218 Location: Northern Indiana
|
|
Posted: Tue May 08, 2018 5:43 am |
|
|
Couldn't you take advantage of the #INLINE directive: write a function, call it in a loop, and let the compiler do the tedious work of unrolling it?
And here's a tricky one: is there a way to tell the compiler that the address you're referencing is really fixed (constant), even though you derive it by incrementing at the pre-compile stage?
I've never taken advantage of many of the things the compiler can do for me, but this problem has gotten me thinking... |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9257 Location: Greensville,Ontario
|
|
Posted: Tue May 08, 2018 6:07 am |
|
|
Faster? Yeesh, poor little PICs running at 140 meg! I remember Z80s doing TWO meg and thinking wow... sigh, guess I'm old...
It would be interesting to hear how long this loop actually takes, though.
Silly Q: can you overclock the PIC? It's a 'cheat' but might work.
Jay |
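A back-of-envelope way to turn a cycle count into time, as a sketch only. It assumes the 140 MHz figure is the oscillator frequency and that the instruction clock is half of it (70 MHz), as on PIC24EP parts. The cycles-per-element value is a placeholder to be counted from your own .lst listing, not a measurement from this thread, and the function name is made up.

```c
#include <stdint.h>

/* Estimated loop time in nanoseconds.
   cycles_per_element: placeholder, count it from your own listing.
   fcy_hz: instruction clock, assumed here to be Fosc / 2. */
uint64_t loop_time_ns(uint32_t elements, uint32_t cycles_per_element,
                      uint32_t fcy_hz)
{
    uint64_t cycles = (uint64_t)elements * cycles_per_element;
    return (cycles * 1000000000ULL) / fcy_hz;  /* integer division truncates */
}
```

For example, loop_time_ns(256, 4, 70000000) gives 14628, i.e. about 14.6 us, if one element really took 4 instruction cycles.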
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19576
|
|
Posted: Tue May 08, 2018 8:40 am |
|
|
The problem is that if a variable is involved, the compiler has to assume it really is 'variable'. It can only treat an address as constant when everything that goes into it is constant.
Probably the easiest way to code it is to take advantage of macros:
Code: |
#define BUFF_ADD(n) bufferPointer_x[n]+=bufferPointer_y[n]
BUFF_ADD(0);
BUFF_ADD(1);
BUFF_ADD(2);
BUFF_ADD(3);.....
|
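To avoid typing all 256 lines, the macro can be nested so the preprocessor generates them. This is a sketch in standard C; the BUFF_ADD4, BUFF_ADD16, BUFF_ADD64 and BUFF_ADD256 names and the add_buffers wrapper are illustrative additions, and whether a given CCS version accepts this depth of macro nesting is an assumption worth checking.

```c
#include <stdint.h>

uint16_t buffer_x[256];
uint16_t buffer_y[256];
uint16_t *bufferPointer_x = buffer_x;
uint16_t *bufferPointer_y = buffer_y;

/* One addition at a constant offset from the two pointers. */
#define BUFF_ADD(n)    (bufferPointer_x[n] += bufferPointer_y[n])

/* Each level expands to four copies of the level below it. */
#define BUFF_ADD4(n)   BUFF_ADD(n);   BUFF_ADD((n) + 1);    BUFF_ADD((n) + 2);    BUFF_ADD((n) + 3)
#define BUFF_ADD16(n)  BUFF_ADD4(n);  BUFF_ADD4((n) + 4);   BUFF_ADD4((n) + 8);   BUFF_ADD4((n) + 12)
#define BUFF_ADD64(n)  BUFF_ADD16(n); BUFF_ADD16((n) + 16); BUFF_ADD16((n) + 32); BUFF_ADD16((n) + 48)
#define BUFF_ADD256()  BUFF_ADD64(0); BUFF_ADD64(64);       BUFF_ADD64(128);      BUFF_ADD64(192)

void add_buffers(void)
{
    BUFF_ADD256();  /* expands to 256 separate addition statements */
}
```

Because these macros expand to several statements, they should only be used as a plain statement like this, not as the unbraced body of an if or a loop.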
I was actually trying to work out whether this could be done using DMA with the CLC block. However, though you could do things like AND or OR with this, I don't think you could perform addition.
In this case, though, the really efficient way is to use assembler:
Code: |
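int16 ctr;
ctr=256; // number of elements to add
#ASM
MOV buffer_x, W0 // intended: W0 = address of buffer_x (destination)
MOV buffer_y, W2 // intended: W2 = address of buffer_y (source)
// REPEAT 256 *
loop:
MOV [W0], W1 // W1 = current buffer_x element
ADD W1, [W2++], [W0++] // store W1 + *W2 through W0, post-incrementing both pointers
DEC ctr // count down the remaining elements
BRA NZ, loop // repeat until ctr reaches zero
#ENDASM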
|
This will 'run rings' round any other solution, I think. |
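For anyone who doesn't read the assembler, the loop corresponds roughly to the following standard C. This is a sketch for reading purposes, not a replacement for the assembler version; the function name and signature are made up. Like the assembler, it tests the counter after the addition, so count must be at least 1.

```c
#include <stdint.h>

/* C rendering of the assembler loop: two walking pointers and a
   down-counter. count must be non-zero. */
void add_buffers_countdown(uint16_t *x, const uint16_t *y, uint16_t count)
{
    do {
        *x = (uint16_t)(*x + *y);  /* MOV [W0],W1 then ADD W1,[W2],[W0] */
        x++;                       /* the [W0++] post-increment */
        y++;                       /* the [W2++] post-increment */
    } while (--count != 0);        /* DEC ctr / BRA NZ, loop */
}
```

Calling add_buffers_countdown(buffer_x, buffer_y, 256) would do the same work as the assembler block; the gain from the assembler comes from keeping both pointers in W registers and doing the load, add and store in two instructions.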
|
|
|