How to optimize int16 = int8 * int8?

ckielstra · Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands

I'm trying to multiply two 8-bit values and put the result into an int16. Looking at the generated code, I see the compiler performs an int16 * int16 multiplication, which is much less efficient than a straight int8 * int8.
The PIC18 MULLW instruction multiplies two 8-bit values and places the 16-bit result in the PRODH:PRODL register pair. I would like the CCS compiler to make full use of this instruction; right now it ignores PRODH entirely.
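To make the PRODH:PRODL point concrete, here is a small software model in portable C. It is only a sketch: `prod_regs`, `mullw_model`, `product16` and `product8` are made-up names for illustration, and nothing here touches real PIC18 registers or reflects what CCS actually emits.

```c
#include <stdint.h>

/* Software model of the PRODH:PRODL pair (hypothetical, not hardware access). */
typedef struct {
    uint8_t prodh; /* high byte of the 16-bit product */
    uint8_t prodl; /* low byte of the 16-bit product */
} prod_regs;

/* Models an 8-bit x 8-bit multiply that keeps the full 16-bit product. */
prod_regs mullw_model(uint8_t w, uint8_t literal)
{
    uint16_t p = (uint16_t)((uint16_t)w * literal);
    prod_regs r = { (uint8_t)(p >> 8), (uint8_t)(p & 0xFF) };
    return r;
}

/* Using both bytes gives the full 16-bit result. */
uint16_t product16(prod_regs r)
{
    return (uint16_t)(((uint16_t)r.prodh << 8) | r.prodl);
}

/* Ignoring the high byte leaves only the truncated 8-bit result. */
uint8_t product8(prod_regs r)
{
    return r.prodl;
}
```

For 200 * 200 the model holds 0x9C in the high byte and 0x40 in the low byte, so the full result is 40000 while the low byte alone is 64.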

An int16 = int16 * int16 requires about 32 instructions.
An int16 = int8 * int8 requires about 6 instructions.
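A rough illustration of why the 16-bit case costs more: on a part with only an 8x8 hardware multiplier, a 16x16 multiply is typically assembled from several 8x8 partial products plus shifts and adds. The sketch below shows that decomposition in portable C. The function names are hypothetical, and this is the textbook technique, not a disassembly of the CCS output.

```c
#include <stdint.h>

/* One hardware-style step: 8-bit x 8-bit -> full 16-bit product. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* Low 16 bits of a 16x16 product, built from three 8x8 partial products.
   The ah * bh term only affects bits 16 and up, so it is not needed here. */
uint16_t mul16x16(uint16_t a, uint16_t b)
{
    uint8_t al = (uint8_t)(a & 0xFF);
    uint8_t ah = (uint8_t)(a >> 8);
    uint8_t bl = (uint8_t)(b & 0xFF);
    uint8_t bh = (uint8_t)(b >> 8);

    uint16_t low = mul8x8(al, bl);
    /* Only the low byte of the cross terms survives the shift by 8. */
    uint16_t cross = (uint16_t)(mul8x8(al, bh) + mul8x8(ah, bl));

    return (uint16_t)(low + (uint16_t)(cross << 8));
}
```

When both operands fit in 8 bits, `ah` and `bh` are zero and the two cross multiplies are wasted work, which is the overhead being asked about.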

Does anyone know how I can make the CCS compiler generate the more efficient int16 = int8 * int8 code?
Often you can steer the compiler with cast operators, but I haven't found the magic combination yet.

In both v3.187 and 3.212 I tried the following combinations:

Mark · Joined: 07 Sep 2003 Posts: 2838 Location: Atlanta, GA

I have done stuff like:

Guest · Posted: Sun Dec 19, 2004 10:41 pm

for PIC18 with hardware multiplier, do this:

future · Joined: 14 May 2004 Posts: 330

I like this macro:

ckielstra · Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands

Thanks for all the suggestions!

All of the suggestions are workarounds, though. I'm wondering what the C standard says about type conversion in this case. Shouldn't the compiler do an implicit type conversion and give me the 16-bit result instead of the 8-bit result?
Can I post this as a bug report or feature request with CCS?
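On the standard's side: ISO C applies the integer promotions to both operands of `*`, so two `uint8_t` values are converted to `int` (at least 16 bits wide) before the multiply. The sketch below shows this on a standard-conforming compiler. The function names are hypothetical, and `mul_8bit` is only a model of a compiler that evaluates the product in 8 bits, based on the behaviour described in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the type in which a * b is evaluated for two uint8_t operands.
   Under ISO C this is sizeof(int), because both operands are promoted. */
size_t mul_result_size(void)
{
    uint8_t a = 1, b = 1;
    return sizeof(a * b);
}

/* ISO C behaviour: the product is computed in int, so nothing is lost
   as long as it fits in int. (With a 16-bit int, products above 32767
   would overflow; cast an operand to an unsigned 16-bit type to avoid that.) */
int mul_iso(uint8_t a, uint8_t b)
{
    return a * b;
}

/* Model of an 8-bit evaluation: the product is cut down to its low byte. */
uint8_t mul_8bit(uint8_t a, uint8_t b)
{
    return (uint8_t)(a * b);
}
```

With 16 * 16, `mul_iso` returns 256 while `mul_8bit` returns 0, which is the difference between the promoted result and the truncated one.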

future · Joined: 14 May 2004 Posts: 330

Are you saying that:

Var_16bit = (int16)(Var_8bit * Another_8bit); // Gives 8-bit result

Results in an overflowed byte?
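The position of the cast is what matters here. The following portable C sketch models the two orderings on a compiler that multiplies 8-bit operands in 8 bits (as the comment above suggests). `cast_after` and `cast_before` are hypothetical names, and the explicit `(uint8_t)` truncation stands in for the 8-bit evaluation; a standard-conforming compiler would not truncate by itself.

```c
#include <stdint.h>

/* Models (int16)(a * b) with an 8-bit multiply:
   the product is truncated first, then widened. The high byte is already gone. */
uint16_t cast_after(uint8_t a, uint8_t b)
{
    uint8_t truncated = (uint8_t)(a * b);
    return (uint16_t)truncated;
}

/* Models (int16)a * b: one operand is widened first,
   so the multiply itself is carried out in 16 bits. */
uint16_t cast_before(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}
```

For 200 * 200, `cast_after` yields 64 (the low byte of 40000) and `cast_before` yields 40000.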

ckielstra · Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands

future · Joined: 14 May 2004 Posts: 330

I think we should put together a collection of asm routines and macros optimized for the PIC18 and wrap them in C functions.

I would like to see for example 32div16 with rounding, pure 32div16, 16div8... all as fast as possible.
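As a reference for what "32div16 with rounding" could compute, here is a plain C sketch that rounds to the nearest integer (ties round up). The name `div32by16_round` is hypothetical and this is not an optimized PIC18 routine; it only pins down the expected results that a faster asm version could be checked against.

```c
#include <stdint.h>

/* Divide a 32-bit value by a 16-bit value, rounding to nearest
   (exact halves round up). The caller must ensure d != 0. */
uint32_t div32by16_round(uint32_t n, uint16_t d)
{
    uint32_t q = n / d;
    uint32_t r = n % d;

    /* r >= d - r is the same test as 2*r >= d, written so that it cannot
       overflow. This also avoids the overflow that (n + d/2) / d would
       hit for n close to the 32-bit maximum. */
    if (r >= (uint32_t)d - r) {
        q++;
    }
    return q;
}
```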

CCS turns a 16div8 into a 16div16. Maybe that makes no difference in speed, but it looks to me like it does.
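For comparison, a dedicated 16div8 can keep the divisor and remainder narrow. The sketch below is a classic shift-and-subtract (restoring) division written in portable C. The name `div16by8` is hypothetical, and whether this beats the compiler's 16div16 on a real PIC18 would have to be measured.

```c
#include <stdint.h>

/* 16-bit dividend / 8-bit divisor by shift-and-subtract.
   Returns the 16-bit quotient and stores the remainder in *rem.
   The caller must ensure d != 0. */
uint16_t div16by8(uint16_t n, uint8_t d, uint8_t *rem)
{
    uint16_t q = 0;
    uint16_t r = 0; /* needs at most 9 bits during the loop */
    int i;

    for (i = 15; i >= 0; i--) {
        /* Bring down the next dividend bit, most significant first. */
        r = (uint16_t)((r << 1) | ((n >> i) & 1u));
        q = (uint16_t)(q << 1);
        if (r >= d) {
            r = (uint16_t)(r - d);
            q |= 1u;
        }
    }
    *rem = (uint8_t)r; /* always less than d here, so it fits in 8 bits */
    return q;
}
```

For example, dividing 1000 by 7 this way gives a quotient of 142 and a remainder of 6.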