CCS C Software and Maintenance Offers
FAQFAQ   FAQForum Help   FAQOfficial CCS Support   SearchSearch  RegisterRegister 

ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

CCS does not monitor this forum on a regular basis.

Please do not post bug reports on this forum. Send them to CCS Technical Support

times ten in fewest CPU cycles?

 
Post new topic   Reply to topic    CCS Forum Index -> General CCS C Discussion
View previous topic :: View next topic  
Author Message
asmboy



Joined: 20 Nov 2007
Posts: 2128
Location: albany ny

View user's profile Send private message AIM Address

times ten in fewest CPU cycles?
PostPosted: Tue Nov 12, 2013 3:58 pm     Reply with quote

I need to do the fastest possible multiplication by ten for unsigned 16 bit INTS with a value range of 0-4095 ( output of 12bit ADC) and make process decisions as rapidly as i can after getting a new reading.

over the range of input values ( 0-4095) , code for *10 can profile out to
a few hundred cpu cycles. so i created a dedicated function for this that does what i need deterministically, irregardless of input ,
in a constant 25 instructions.

every cycle counts in this application and
what i'm not sure of is if it can be done more efficiently yet?

Anybody?

Code:

unsigned int16 tenx(unsigned int16 in){
  unsigned int16 a;
  a=in<<3;  // *8
  a +=in;   // *9
  return ((a +in)); //*10
}
temtronic



Joined: 01 Jul 2010
Posts: 9282
Location: Greensville,Ontario

View user's profile Send private message

PostPosted: Tue Nov 12, 2013 4:18 pm     Reply with quote

While slowly reading your request, I was thinking shift and add twice..hmmm..scolled down to your function and it's the same !!
I don't think there is a faster way to do it

So great minds do think alike !!

hth
jay
dmendesf



Joined: 31 Dec 2005
Posts: 32

View user's profile Send private message

PostPosted: Tue Nov 12, 2013 7:21 pm     Reply with quote

The fastest way I can think is to trade space for speed:

Code:

const int16 tablemult[4096] = {0,10,20,30,40, ...... 40950};

a = tablemult[b]; // a = b*10;


dmendesf



Joined: 31 Dec 2005
Posts: 32

View user's profile Send private message

PostPosted: Tue Nov 12, 2013 7:48 pm     Reply with quote

You can also trade less space for less speed using this:

k = a*256+b // b from 0 to 255, a from 0 to 15

10 * k = 10*a*256 + 10*b

Code:


union data_holder {

int8 data_byte[2];

int16 data_long;
} data_in, data_out;                       


const int16 tablemult[256] = {0, 10, 20, 30 ... 2550};

data_out.long = tablemult[data_in.data_byte[0]]; // = 10*b
data_out.data_byte[2] += tablemult[data_in.data_byte[1]]; // = 10*b+10*a*256... data_out.data_byte[2] started with at most (2550 / 256) = 9.
                                                          //since data_in.data_byte[1] is at most 15, the result is at most 159, and fits a byte



result is at data_out.data_long
asmboy



Joined: 20 Nov 2007
Posts: 2128
Location: albany ny

View user's profile Send private message AIM Address

PostPosted: Tue Nov 12, 2013 8:30 pm     Reply with quote

nice ideas above.

1-having less than fastest speed is a nogo,

2- the pic16 chip in question (selected for its unique hardware functions)
only has 8k worth of 'word space' Very Happy Sad

and a table of 4096 'words' of int16 data
leaves appx zero code space for the rest of the program. Sad

the table lookup would surely be the bomb if i was using an 18f46k80,
which Sad , i am not.
dmendesf



Joined: 31 Dec 2005
Posts: 32

View user's profile Send private message

PostPosted: Tue Nov 12, 2013 8:43 pm     Reply with quote

I think that the second option I posted can be faster than your approach and it needs "only" 512 words of memory. You should benchmark it and also a small variation using make8 to acess 16 bit variables... I donĀ“t know if CCS compiler is smart enough to do all pointer arithmetic in compile time for the union... maybe the make8 is the key for it.
alan



Joined: 12 Nov 2012
Posts: 357
Location: South Africa

View user's profile Send private message

PostPosted: Tue Nov 12, 2013 11:34 pm     Reply with quote

Hi asmboy

if you put all in the return with another shift, it shaves some instructions off on my 4.141.
Code:
return (((in<<3) +(in<<1)));


Regards
Alan
Ttelmah



Joined: 11 Mar 2010
Posts: 19605

View user's profile Send private message

PostPosted: Wed Nov 13, 2013 12:04 am     Reply with quote

This is one of those cases where using a function is less efficient...

The compiler knows that it can perform *8, and *2, using rotations. The optimiser handles this, so:
Code:

#define tentimes(x) ((x*8)+(x*2))

void main()
{
   int16 val=something;
   int16 result;
   
   result=tentimes(val);
}

Avoids the need to transfer values to temporary variables, and is quicker than the function solutions. In fact I think it may even beat the table lookup. It takes a sometimes surprising number of cycles to access a table (either ROM or RAM) on the PIC.

As a comment, I remember one of the CCS examples does this.

Just took the time to compare the _total_ instruction count for 'tenx' (original version), 'tenx' (as coded by alan), 'tentimes', and a table lookup. This is the time including the calls, setup etc., on a PIC16.

tenx, uses 41 instructions including the setup.
tenx (alan), uses 38 including setup.
tablemult, uses 30.
tentimes as a #define, uses 24.

The count given by asmboy in his original remark, is ignoring the time to call the function, and the time to load the temporary variables.

I 'rest my case'..... Smile

Best Wishes
ckielstra



Joined: 18 Mar 2004
Posts: 3680
Location: The Netherlands

View user's profile Send private message

PostPosted: Wed Nov 13, 2013 5:07 am     Reply with quote

Instead of using a macro I tried to #inline the function because I like the parameter type checking from functions. Too bad it seems like #inline isn't working in my v4.147.

We now only had a look at just one line of your algorithm. Multiplying a sampled value by 10 'smells' like there are more optimizations possible given a look at the total algorithm.
Ttelmah



Joined: 11 Mar 2010
Posts: 19605

View user's profile Send private message

PostPosted: Wed Nov 13, 2013 5:51 am     Reply with quote

4.147?....

I thought 4.141 was the last.

Anyway on a version where it does work, it still adds the extra instructions to do an extra move to an extra variable.

If one is prepared to use two lines, then the most efficient I can think of with the compiler, is:
Code:

   result=val*2;
   result+=result*4;

Gets it down to 19 instructions.

Best Wishes
jeremiah



Joined: 20 Jul 2010
Posts: 1362

View user's profile Send private message

PostPosted: Wed Nov 13, 2013 8:08 am     Reply with quote

I checked on PCD 5.013 and inlining appears to work. It's still not as efficient as the macro. I found that making the input a reference parameter shaves off the unnecessary moves, but there is still a single "extra" instruction that I assume comes from their code generation algorithm (or maybe it does have a purpose...I didn't go too deeply into it)

Code:

.................... #define tentimes(x) ((x*8)+(x*2))
.................... 
.................... #inline unsigned int16 anotherTenTimes(unsigned int16 x){
....................    return ((x*8)+(x*2));
*
39EA:  MOV     8AC,W5
39EC:  SL      W5,#3,W5
39EE:  MOV     8AC,W0
39F0:  SL      W0,#1,W0
39F2:  ADD     W0,W5,W0
39F4:  MOV     W0,0
.................... }
.................... 
.................... #inline unsigned int16 yetAnotherTenTimes(unsigned int16 &x){
....................    return ((x*8)+(x*2));
*
39F8:  MOV     8AA,W5
39FA:  SL      W5,#3,W5
39FC:  MOV     8AA,W0
39FE:  SL      W0,#1,W0
3A00:  ADD     W0,W5,W0
3A02:  MOV     W0,0
.................... }
.................... 


Code:

....................    test = tentimes(test);
39DA:  MOV     8AA,W5
39DC:  SL      W5,#3,W5
39DE:  MOV     8AA,W0
39E0:  SL      W0,#1,W0
39E2:  ADD     W0,W5,W0
39E4:  MOV     W0,8AA
....................    test = anotherTenTimes(test);
39E6:  PUSH    8AA
39E8:  POP     8AC
*
39F6:  MOV     W0,8AA
....................    test = yetAnotherTenTimes(test);
*
3A04:  MOV     W0,8AA


Though, in PCD a mult instruction only takes one cycle, so this is just for academic purposes
bkamen



Joined: 07 Jan 2004
Posts: 1615
Location: Central Illinois, USA

View user's profile Send private message

PostPosted: Wed Nov 13, 2013 9:07 am     Reply with quote

jeremiah wrote:

Though, in PCD a mult instruction only takes one cycle, so this is just for academic purposes



PIC18's are surprisingly fast too having their 8x8 dedicated hardware multiplier.

(although it's not a single instruction)
_________________
Dazed and confused? I don't think so. Just "plain lost" will do. :D
asmboy



Joined: 20 Nov 2007
Posts: 2128
Location: albany ny

View user's profile Send private message AIM Address

PostPosted: Wed Nov 13, 2013 9:42 am     Reply with quote

many thanks to all who contributed.

i have to use the 16f1509 for its NCO feature , found in none of the 18f family
and also have to live with a reduced 10 bit ADC and needing to use external EEPROM as well. Sad
how microchip can release a chip with so many unique and desirable functions , yet omit EEPROM is a mystery and a frustration to this fellow.

the macro define solution is the icing on the cake!

special thanks to Mr. T for your singularly focused analysis.
temtronic



Joined: 01 Jul 2010
Posts: 9282
Location: Greensville,Ontario

View user's profile Send private message

PostPosted: Wed Nov 13, 2013 12:13 pm     Reply with quote

It could be that a customer wanted a special PIC with those specific features and low,low cost ,thus the '16F1509' was born.
It wouldn't be the first time a 'new product' was added to a manufacturer's line. Heck I did that decades ago with energy control products.
These days it's super easy to 'make-my-PIC', all you need is tell uChip what you need and some cash.

hth
jay
Display posts from previous:   
Post new topic   Reply to topic    CCS Forum Index -> General CCS C Discussion All times are GMT - 6 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group