Author |
Message |
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
unlooped draw |
Posted: Wed Sep 13, 2006 7:28 am |
|
|
I need to improve the speed of a loop, and it's been suggested that I take it from a C loop to an unlooped goto in asm.
Code: |
for (x=0; x<width; x++) mysend(data);
|
This can be done quicker in asm with
Code: |
1 jump (10-width) lines forward
2 mysend(data)
3 mysend(data)
4 mysend(data)
5 mysend(data)
6 mysend(data)
etc
|
This is faster because you're not messing around with checking x against width after every call of the mysend function.
Yes, speed is this critical for my routine, and yes, there is enough of a difference to make it worthwhile.
I know nothing of assembly. Can somebody please help with how I could achieve this as inline ASM? |
|
|
asmallri
Joined: 12 Aug 2004 Posts: 1635 Location: Perth, Australia
|
|
Posted: Wed Sep 13, 2006 7:39 am |
|
|
If x is declared as an unsigned int or unsigned char then the for loop is as optimized as you are going to get; however, mysend could be optimized. Is it small enough that you can add it inline? If so, you will save on unneeded calls and returns. _________________ Regards, Andrew
http://www.brushelectronics.com/software
Home of Ethernet, SD card and Encrypted Serial Bootloaders for PICs!! |
|
|
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
|
Posted: Wed Sep 13, 2006 7:49 am |
|
|
Are you sure?
The guy who suggested this was pretty clear that the for loop would slow things down.
I doubt that the compiler puts 128 calls to mysend in a line and jumps into that list, as it's a waste of program space, but for me that's better than the time spent checking the for loop each time.
As for having mysend inline, see my other post on 9-bit SPI |
|
|
asmallri
Joined: 12 Aug 2004 Posts: 1635 Location: Perth, Australia
|
|
Posted: Wed Sep 13, 2006 7:57 am |
|
|
Yes, I am sure. The little bit of overhead (and it is little) that the loop introduces is far outweighed by the loss in efficiency of making function calls. Also, with the loop there would be a single inline instance of your called routine. _________________ Regards, Andrew |
|
|
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
|
Posted: Wed Sep 13, 2006 8:00 am |
|
|
Thanks for your help.
I'll put the existing mysend routine inline, but I still need to optimise that into asm, as I'm sure it could be done better than how I've got it in C. |
|
|
Ttelmah Guest
|
|
Posted: Wed Sep 13, 2006 8:06 am |
|
|
The jump-forward approach can be made to work, but you have to calculate the offset and adjust it for the size of the calls, and the total saving will be tiny (it may actually be non-existent, since this approach forces a call for each of the subroutines). The 'for' loop will be fractionally quicker with:
for(x=width;x;--x)
The advantage is that you only have to access one variable, not two in the loop. If you combine this with declaring 'mysend' as inline, there may be a slight saving.
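A minimal sketch of the two loop forms, written in portable C so it can be compiled on a PC. The `mysend` body here is a hypothetical stand-in that only counts calls, and `send_up`, `send_down` and `count_calls` are illustrative names, not part of the original program. Both forms make the same number of calls; any speed difference has to be checked in the compiler's listing for the actual chip.

```c
#include <stdint.h>

static unsigned calls;   /* counts calls so the two forms can be compared */

/* Hypothetical stand-in for the real mysend(); it only counts calls. */
static void mysend(uint8_t data) { (void)data; calls++; }

/* Counting up: every pass compares x against width (two variables). */
void send_up(uint8_t width, uint8_t data)
{
    uint8_t x;
    for (x = 0; x < width; x++) mysend(data);
}

/* Counting down: every pass only tests x against zero (one variable). */
void send_down(uint8_t width, uint8_t data)
{
    uint8_t x;
    for (x = width; x; --x) mysend(data);
}

/* Helper for experiments: returns how many calls fn made for a given width. */
unsigned count_calls(void (*fn)(uint8_t, uint8_t), uint8_t width)
{
    calls = 0;
    fn(width, 0x55);
    return calls;
}
```

Note that the count-down form only works when the loop body does not need x as an ascending index.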
For the 'jump' approach, the problem is that each call will need to be set up with the 'data', so the total program space needed for each call will be a significant size, making the jump calculation more complex. However, if no data is needed for the call, then something like:
Code: |
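int8 jump;
// Skip the calls that are not needed, so that 'width' of them remain.
// 10 stands for the number of calls listed below; <<1 assumes each call takes two address units.
jump=(10-width)<<1;
#asm
movf jump,W
addwf PC,F
#endasm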
mysend();
mysend();
mysend();
.....
|
With a suitable declaration of the PC storage register (depending on whether this is a 16, or 18 chip), and 'mysend' declared as separate, provided the routines all sit in one bank of memory, should be close.
Best Wishes |
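The same "jump into an unrolled list" idea can also be sketched in plain C with a switch that falls through, which leaves the offset arithmetic to the compiler. This is only an illustration under assumptions: `mysend` is a hypothetical stand-in that counts calls, the list is limited to 8 calls to keep it short, and `send_unrolled` and `count_unrolled` are made-up names. Whether the compiler turns the switch into a computed jump or a chain of compares depends on the compiler and chip, so the result would still need benchmarking.

```c
#include <stdint.h>

static unsigned sent;   /* number of mysend() calls made, for checking */

/* Hypothetical stand-in for the real routine. */
static void mysend(void) { sent++; }

/* Runs mysend() exactly n times for n in 0..8 without a loop counter.
   The switch jumps into the middle of the unrolled list and each case
   falls through to the ones below it. Values above 8 are clamped to 8. */
void send_unrolled(uint8_t n)
{
    if (n > 8) n = 8;
    switch (n) {
    case 8: mysend(); /* fall through */
    case 7: mysend(); /* fall through */
    case 6: mysend(); /* fall through */
    case 5: mysend(); /* fall through */
    case 4: mysend(); /* fall through */
    case 3: mysend(); /* fall through */
    case 2: mysend(); /* fall through */
    case 1: mysend(); /* fall through */
    default: break;
    }
}

/* Helper for experiments: returns how many calls were made for n. */
unsigned count_unrolled(uint8_t n)
{
    sent = 0;
    send_unrolled(n);
    return sent;
}
```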
|
|
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
|
Posted: Wed Sep 13, 2006 5:31 pm |
|
|
I've made the suggested improvements: changed the format of the for loop and put mysend inline. It's faster, but not dramatically so.
I think I will still try the inline asm. I will have to benchmark them and see which ends up being faster.
Thanks for all the help guys, on both my topics!
I've made lots of progress because of your answers. Much appreciated.
Check out
www.pyrofersprojects.com/3dcube.php
to see what it's all gone towards. |
|
|
Ttelmah Guest
|
|
Posted: Thu Sep 14, 2006 9:18 am |
|
|
As a further comment, anything you can do to improve 'mysend' will have as big an effect. The actual overhead of the loop is only a few instructions, so just one instruction wasted in mysend will have just as big an effect.
Best Wishes |
|
|
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
|
Posted: Thu Sep 14, 2006 11:59 am |
|
|
mysend has now been optimised. It's basically a 9-bit SPI routine, so there is only so much that can be done.
Would having the data byte as a global, so it doesn't need to get passed to mysend, be any quicker? |
|
|
Ttelmah Guest
|
|
Posted: Thu Sep 14, 2006 2:52 pm |
|
|
Yes.
There is probably as much overhead from passing a variable, as is involved in the entire loop!...
Best Wishes |
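A small sketch of the two calling styles being compared, in portable C. Everything here is illustrative: `mysend_param`, `mysend_global`, `tx_data`, `fill_param` and `fill_global` are hypothetical names, and the routine bodies only record what would be sent. The sketch shows the structural difference (the byte is stored once before the loop instead of being passed on every call); the actual saving depends on the compiler and has to be measured.

```c
#include <stdint.h>

static uint8_t last_sent;   /* records the byte that was "sent" */
static unsigned sends;      /* counts how many sends took place */

uint8_t tx_data;            /* global byte read directly by mysend_global() */

/* Parameter version: the byte is handed over on every call. */
void mysend_param(uint8_t data) { last_sent = data; sends++; }

/* Global version: nothing is passed; the routine reads tx_data itself. */
void mysend_global(void) { last_sent = tx_data; sends++; }

void fill_param(uint8_t width, uint8_t data)
{
    uint8_t x;
    for (x = width; x; --x) mysend_param(data);
}

void fill_global(uint8_t width, uint8_t data)
{
    uint8_t x;
    tx_data = data;          /* stored once, outside the loop */
    for (x = width; x; --x) mysend_global();
}
```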
|
|
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
|
Posted: Fri Sep 15, 2006 2:43 am |
|
|
Thanks for that!
I was always taught when programming in C to avoid globals like the plague. I don't know why; my tutor came up with some excuses but I never really believed them. I guess I just tried to avoid them because I'd been taught it was good programming practice. They never mentioned it slowed down performance!
I'll basically convert all my variables into globals now. I have enough RAM, and if there is a speed saving each time then I should be able to take the whole program up a notch. |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Fri Sep 15, 2006 5:41 am |
|
|
It is good programming practice to keep local variables local as it helps to save RAM and makes your program easier to maintain (the variable declaration is close to where it is used and you don't run into accidentally using the same variable twice).
That said, global variables can help to speed up your program in some very specific situations. An example is where the same data is used by multiple functions (it saves passing of function parameters).
So in my programs I use some global variables, but only when I can point out for each variable that it has significant advantages over using a local variable. Don't make all variables global because someone told you it is faster; you are the one who has to _know_ whether it makes a difference or not.
As a general speed optimization rule: The critical parts are often in less than 5% of the total program code. Identify this small part and then look for improvements.
As a possible optimization: You said the SPI routine is now 9-bits and I assume this is your own bit toggling routine? Why not use the inbuilt SPI hardware which is always faster than any routine you can create? I know the inbuilt hardware only accepts 8-bits, but there are several ways to cheat on this. (8-bits by hardware + 1 bit-bang bit, or concatenate multiple 9-bit words, or...) |
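The "concatenate multiple 9-bit words" idea can be sketched as a packing routine: eight 9-bit words are 72 bits, which is exactly nine bytes for the 8-bit hardware. The function name `pack9` is hypothetical, and the bit order (ninth bit first, most significant bit first) is an assumption. Whether the display controller accepts a continuous bit stream like this, and how it treats padding bits, would have to be checked against its datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs 'count' 9-bit words into 'out', most significant bit first,
   with bit 8 of each word leading. Returns the number of bytes written;
   a partly filled final byte is padded with zero bits.
   'out' must hold at least (count * 9 + 7) / 8 bytes. */
size_t pack9(const uint16_t *words, size_t count, uint8_t *out)
{
    uint32_t acc = 0;     /* bit accumulator; only the low 'nbits' bits are pending */
    unsigned nbits = 0;   /* number of pending bits in acc */
    size_t n = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        acc = (acc << 9) | (words[i] & 0x1FFu);
        nbits += 9;
        while (nbits >= 8) {          /* emit every complete byte */
            nbits -= 8;
            out[n++] = (uint8_t)(acc >> nbits);
        }
    }
    if (nbits > 0)                    /* flush the remainder, left aligned */
        out[n++] = (uint8_t)(acc << (8 - nbits));
    return n;
}
```

With a multiple of eight words there is no padding, so a whole line of pixels could in principle be sent as plain bytes.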
|
|
libor
Joined: 14 Dec 2004 Posts: 288 Location: Hungary
|
|
Posted: Sat Sep 16, 2006 9:38 am |
|
|
In a similar situation (in a bit-toggling routine to send out bits of variables one-by-one with a fixed intrabit timing with no allowable intrabyte overhead) I use the intrabit 'idle' timeslots (thus I have 7 occasions of these) to prepare data needed by the loop's next iteration to save time at the loop's header. I can split this task into up to seven timeslots.
Look at my pseudo-code:
Code: |
for (i=0; i<length; ) { //i.e. no increment at the iteration level to save time
bit_toggle_7th_bit
i++; //split task: increment the loop variable now, while
//I have idle time between bits
wait_for_timer_flag //I use a timer as the bps timebase, no interrupts,
//just test the flag
reset_timer_flag
bit_toggle_6th_bit
nextdata=buffer[i]; //split task done here when I have idle time
wait_for_timer_flag
reset_timer_flag
etc. //overall I have 7 intrabit idle timeslots to move
//as much code out of the loop header as possible
}
|
|
|
|
Pyrofer
Joined: 13 Sep 2006 Posts: 16
|
|
Posted: Sat Sep 16, 2006 4:37 pm |
|
|
Ok, here is the routine that sends the data to the lcd
Code: |
SSPEN=0;
output_low(LCDCLOCK);
output_high(LCDDATA);//send data
output_high(LCDCLOCK);
output_low(LCDCLOCK);
SSPEN=1;
spi_write(color);
|
I think that's as good as it's ever going to get. |
|
|
libor
Joined: 14 Dec 2004 Posts: 288 Location: Hungary
|
|
Posted: Sun Sep 17, 2006 2:33 am |
|
|
spi_write(color);
This instruction puts 'color' into SSPBUF and then waits, doing nothing, until all the bits have left the port, looping and testing until the SSPSTAT.BF flag gets set (this is to avoid SSP buffer overwrites in consecutive spi_write instructions). Your code continues only after SSPBUF has been completely sent by the hardware.
You can use this idle time to do more useful things by splitting up spi_write() using assembly, e.g. wait for the BF flag before the bit-toggling part of your code, and then you'll have plenty of time for useful code execution at the end of the routine while the PIC sends out the 8 bits in hardware.
Just by putting the wait before sending (bit-toggling the 9th bit), all the code in the loop can carry on executing up to the beginning of the next iteration, so no time will be wasted.
BTW Do you really need that much speed optimization ? |
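To get a feel for what moving the wait could save, here is a rough timing model in portable C. It is a simplification under stated assumptions, not a measurement: the struct fields and the names `timing_t`, `cycles_wait_after` and `cycles_wait_before` are hypothetical, the bit-toggling of the 9th bit is assumed never to overlap with a hardware transfer, and all remaining loop work is assumed to be free to overlap with it.

```c
#include <stdint.h>

/* Cycle counts are parameters of a simplified model, not measured values. */
typedef struct {
    uint32_t toggle;   /* cycles to bit-bang the 9th bit (SPI module disabled) */
    uint32_t shift;    /* cycles the hardware needs to shift out 8 bits */
    uint32_t other;    /* cycles of loop overhead and data preparation */
} timing_t;

/* Waiting right after each write: nothing overlaps with the transfer. */
uint32_t cycles_wait_after(timing_t t, uint32_t n)
{
    return n * (t.toggle + t.shift + t.other);
}

/* Waiting just before the next bit-toggle: the 'other' work runs while
   the hardware is shifting, so only the longer of the two is paid. */
uint32_t cycles_wait_before(timing_t t, uint32_t n)
{
    uint32_t overlap = (t.shift > t.other) ? t.shift : t.other;
    return n * (t.toggle + overlap);
}
```

For example, with arbitrary illustrative values of 4, 8 and 6 cycles and 128 iterations, the model gives 2304 cycles for waiting after each write and 1536 for waiting before the next one; real figures depend on the SPI clock setting and the generated code.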
|
|
|