ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
How to optimize int16 = int8 * int8? |
Posted: Sun Dec 19, 2004 5:51 pm |
|
|
I'm trying to multiply two 8-bit values and put the result into an int16. Looking at the generated code, I see the compiler doing an int16 * int16 multiplication, which is much less efficient than a straight int8 * int8.
The PIC18 MULLW instruction multiplies two 8-bit values and places the 16-bit result in the PRODH:PRODL register pair. I would like the CCS compiler to use this instruction to the full; right now it ignores PRODH entirely.
An int16 = int16 * int16 requires about 32 instructions.
An int16 = int8 * int8 requires about 6 instructions.
Does anyone know how I can make the CCS compiler generate the more efficient int16 = int8 * int8 code?
Often you can steer the compiler with cast operators, but I haven't found the magic combination yet.
In both v3.187 and v3.212 I tried the following combinations:
Code: | #include <18F458.h>
#fuses HS,NOWDT,NOPROTECT,PUT,BROWNOUT,NOLVP
#list
void main()
{
int8 Var_8bit;
int8 Another_8bit;
int16 Var_16bit;
Var_16bit = (int16)(Var_8bit * Another_8bit); // Gives 8-bit result
Var_16bit = Var_8bit * Another_8bit; // Gives 8-bit result
Var_16bit = Var_8bit * (int16)Another_8bit; // Gives 16-bit result, but long code
Var_16bit = (int16)Var_8bit * Another_8bit; // Gives 16-bit result, but long code
Var_16bit = (int16)Var_8bit * (int16)Another_8bit; // Gives 16-bit result, but long code
} |
|
|
|
Mark
Joined: 07 Sep 2003 Posts: 2838 Location: Atlanta, GA
|
|
Posted: Sun Dec 19, 2004 6:15 pm |
|
|
I have done stuff like:
Code: |
level = level_list[i].level;
/* this is where we actually compute the ccp value.
Note that this is a short cut for doing the math.
Also, don't use an array here because the compiler
will do a little multiplying itself and destroy the
value we just computed. */
W = level;
#asm
mullw PWM_STEP
#endasm
ptrList->pwm_value = PROD;
|
|
|
|
Guest
|
|
Posted: Sun Dec 19, 2004 10:41 pm |
|
|
For a PIC18 with the hardware multiplier, do this:
Code: |
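int16 mul_prod;
#byte mul_prod = 0x0ff3   // overlay on PRODL (0xFF3) and PRODH (0xFF4)
int16 int8_times_int8(int8 a, int8 b)
{
a * b;   // result discarded; relies on the compiler emitting the 8x8 hardware multiply
return(mul_prod);   // read PRODH:PRODL back as one 16-bit value
}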
|
Putting it into a macro may speed it up even further.
Best wishes |
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Mon Dec 20, 2004 1:02 am |
|
|
I like this macro:
Code: | #define mul(a8,b8,p16) \
{ \
(a8)*(b8); \
(p16)=make16(PRODH,PRODL); \
} |
I am looking for a collection of asm code (macros) so I can start using inline asm in my programs; that way I learn and can optimize what I don't like.
I see that the compiler uses a set of generic routines. It would be nice if it could do the job like Proton+ does. |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Mon Dec 20, 2004 9:50 am |
|
|
Thanks for all the suggestions!
All of them are workarounds, though. I'm wondering what the C specification says about type conversion in this case. Shouldn't the compiler do an implicit type conversion and give me the 16-bit result instead of the 8-bit result?
Can I post this as a bug/request with CCS? |
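On the question about the C specification: as far as I know, ISO C applies the integer promotions to both operands before the multiply, so two 8-bit values are first converted to int (at least 16 bits). CCS defaults int to 8 bits, which would be consistent with the 8-bit result described in the first post, though that reading is an assumption and not a statement from CCS. The sketch below is plain host C, not CCS code; the function names are made up for illustration, and it assumes an int wider than 16 bits, as on a typical desktop compiler.

```c
#include <stdint.h>

/* ISO C behaviour: both uint8_t operands are promoted to int before
   the multiply, so the full product survives the multiplication.
   Assumes int is wider than 16 bits (typical desktop compiler). */
uint16_t mul_promoted(uint8_t a, uint8_t b)
{
    return (uint16_t)(a * b);
}

/* Emulates an 8-bit multiply whose result is widened afterwards:
   the product is cut down to its low byte before the widening cast. */
uint16_t mul_truncated(uint8_t a, uint8_t b)
{
    return (uint16_t)(uint8_t)(a * b);
}
```

With a = 200 and b = 100, mul_promoted returns 20000, while mul_truncated returns 32 (the low byte of 0x4E20).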
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Mon Dec 20, 2004 1:14 pm |
|
|
Are you saying that:
Var_16bit = (int16)(Var_8bit * Another_8bit); // Gives 8-bit result
Results in an overflowed byte? |
|
|
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
|
|
Posted: Mon Dec 20, 2004 5:08 pm |
|
|
Quote: | Are you saying that:
Var_16bit = (int16)(Var_8bit * Another_8bit); // Gives 8-bit result
Results in an overflowed byte? |
Look for yourself: Code: | .................... Var_16bit = (int16)(Var_8bit * Another_8bit);
0036: MOVF 06,W
0038: MULWF 07
003A: MOVF FF3,W
003C: CLRF 09 // Clears upper 8 bits of Var_16bit
003E: MOVWF 08 // Copies PRODL to lower 8 bits of Var_16bit
|
Only PRODL (register 0xFF3) is copied; the upper 8 bits of the result in PRODH (register 0xFF4) are not used. Such a waste, because the data is already available.
I got my code working using one of the other cast alternatives, but I would like to see the compiler squeezing the most out of my ROM space. |
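To make the difference concrete, here is a small host-side model of what the listing above does compared with the workarounds suggested in this thread. It is only a sketch: the `_sim` names are hypothetical stand-ins, not CCS identifiers or real registers, and the model assumes nothing beyond the 8 x 8 multiply leaving its 16-bit result in PRODH:PRODL.

```c
#include <stdint.h>

/* Host-side stand-ins for the PIC18 PRODH:PRODL register pair. */
static uint8_t PRODL_sim;
static uint8_t PRODH_sim;

/* Models the 8 x 8 hardware multiply: low byte of the product goes
   to PRODL, high byte goes to PRODH. */
static void mul8x8_sim(uint8_t w, uint8_t f)
{
    uint16_t product = (uint16_t)((uint16_t)w * f);
    PRODL_sim = (uint8_t)(product & 0xFF);
    PRODH_sim = (uint8_t)(product >> 8);
}

/* Hypothetical equivalent of combining a high and a low byte. */
static uint16_t make16_sim(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* What the listing does: copy PRODL and clear the upper byte. */
uint16_t low_byte_only_sim(uint8_t a, uint8_t b)
{
    mul8x8_sim(a, b);
    return PRODL_sim;
}

/* What the workarounds do: read both bytes of the product. */
uint16_t full_product_sim(uint8_t a, uint8_t b)
{
    mul8x8_sim(a, b);
    return make16_sim(PRODH_sim, PRODL_sim);
}
```

For 255 * 255 the full product is 0xFE01 (65025), whereas the low-byte-only version yields 0x0001.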
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Mon Dec 20, 2004 8:59 pm |
|
|
I think we should organize a collection of asm routines and macros optimized for the PIC18 and wrap them in C functions.
I would like to see, for example, 32div16 with rounding, pure 32div16, 16div8... all as fast as possible.
CCS promotes a 16div8 to a 16div16; maybe it does not make any difference, but it looks to me like it does. |
|
|
|