Fast "SPI like" using assembly inside C

 
henriquesv



Joined: 20 Feb 2011
Posts: 8

Posted: Tue Aug 23, 2011 10:07 am

Hello everyone.

I am working with some multiplexing hardware and I am sending serial bits to a shift register.

The problem is that shifting this data out takes up too much of my time window:

Code:

    for(d=0; d<8; d++)
    {
      y = dsp_x & BIT_0;
      output_bit(SDA, y);
      output_bit(CLK,1);
      output_bit(CLK,0);
      dsp_x = dsp_x >> 1;
    }


Yes, it is as simple as that.
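
For reference, the loop above can be modeled on a PC to see what the shift register ends up holding. This is only a sketch in portable C, not CCS code. The names `sr_value`, `sr_clock` and `shift_out_lsb_first` are made up for illustration, and the model assumes a register that samples SDA on the rising edge of CLK and shifts towards the LSB; real parts may differ.

```c
#include <stdint.h>

/* Hypothetical model of an 8-bit shift register: on each rising clock edge
   the bit on SDA enters at the top and everything else moves down one place. */
static uint8_t sr_value;

static void sr_clock(int sda)
{
    sr_value = (uint8_t)((sr_value >> 1) | (sda ? 0x80 : 0x00));
}

/* Same structure as the loop in the post: present bit 0, pulse the clock,
   shift the data right, repeat eight times. */
static void shift_out_lsb_first(uint8_t dsp_x)
{
    for (int d = 0; d < 8; d++) {
        int y = dsp_x & 0x01;   /* y = dsp_x & BIT_0 */
        sr_clock(y);            /* output_bit(SDA, y), then CLK high, CLK low */
        dsp_x >>= 1;            /* dsp_x = dsp_x >> 1 */
    }
}
```

Under these assumptions the register holds the byte that was sent once the eight clocks are done. A register that shifts the other way would hold it bit-reversed.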

Here is what the compiler is generating (I think there is a better way of doing this):

Code:

....................     for(d=0; d<8; d++)
0020:  CLRF   27
0021:  MOVF   27,W
0022:  SUBLW  07
0023:  BTFSS  03.0
0024:  GOTO   03D
....................     {
....................       y = dsp_x & BIT_0;
0025:  MOVF   28,W
0026:  ANDLW  01
0027:  MOVWF  29
....................       output_bit(SDA, y);
0028:  MOVF   29,F
0029:  BTFSS  03.2
002A:  GOTO   02D
002B:  BCF    05.6
002C:  GOTO   02E
002D:  BSF    05.6
002E:  BSF    03.5
002F:  BCF    05.6
....................       output_bit(CLK,1);
0030:  BCF    03.5
0031:  BSF    05.7
0032:  BSF    03.5
0033:  BCF    05.7
....................       output_bit(CLK,0);
0034:  BCF    03.5
0035:  BCF    05.7
0036:  BSF    03.5
0037:  BCF    05.7
....................       dsp_x = dsp_x >> 1;
0038:  BCF    03.0
0039:  BCF    03.5
003A:  RRF    28,F
....................     }
003B:  INCF   27,F
003C:  GOTO   021


I plan to use a function such as this:

Code:

int spi(int data){
    int count;

    #asm
    PORTA equ 0x05
    MOVLW 0x08
    MOVWF count;

    loop:
    ;XOR.B data,W0
    ;RRC data,W0
    DECF count,1    ; Decrement f in file register f
    ;BRA NZ, loop
    ;MOV #0x01,W0
    ;ADD count,F
    ;MOV count, W0
    ;MOV W0, _RETURN_
    #endasm
 }


Could anyone help me out? I bet it is going to be useful to many others.

I have already worked with other CISC processors, which made my work easier.

Thanks!

Best Regards.
Henrique


Last edited by henriquesv on Wed Aug 24, 2011 10:15 am; edited 1 time in total
SherpaDoug



Joined: 07 Sep 2003
Posts: 1640
Location: Cape Cod Mass USA

Posted: Tue Aug 23, 2011 10:29 am

You could save 6 instructions per loop if you used fast I/O. I would prefer that to embedded assembly.
_________________
The search for better is endless. Instead simply find very good and get the job done.
PCM programmer



Joined: 06 Sep 2003
Posts: 21708

Posted: Tue Aug 23, 2011 1:15 pm

In addition to the fast i/o, you could do the usual tactic of unrolling the
loop. You trade using more ROM, for more speed. Compile this and
look at the code. Within the bit code, I put in a delay_cycles(1) statement
to try to make the time be equal for either bit case (high or low data bit).

I also noted that you're sending the data LSB first. I don't know if you're
trying to emulate SPI, but SPI is usually sent MSB first. But anyway, I
kept it with LSB first in the example below.
Code:

#include <16F877.H>
#fuses XT, NOWDT, NOPROTECT, BROWNOUT, PUT, NOLVP
#use delay(clock=4000000)
#use rs232(baud=9600, xmit=PIN_C6, rcv=PIN_C7, ERRORS)

#define SDA PIN_B4
#define CLK PIN_B3


void spi_write_sw(int8 data)
{
// Set TRIS to output and set SDA and CLK low.
output_low(SDA);   
output_low(CLK);

#use fast_io(B)    // Temporarily use fast i/o for speed

//---------------------
// Send bit 0
if(bit_test(data, 0))
  {
   output_high(SDA);     
  }
else
  {
   output_low(SDA);
   delay_cycles(1);
  }

output_high(CLK);
output_low(CLK);

//---------------------
// Send bit 1
if(bit_test(data, 1))
  {
   output_high(SDA);     
  }
else
  {
   delay_cycles(1);
   output_low(SDA);
  }

output_high(CLK);
output_low(CLK);
//-----------------

// And continue with sections for bits 2, 3, 4, 5, 6, 7

#use standard_io(B)  // Return to standard i/o
}


//==========================================
void main()
{
spi_write_sw(0x55);

while(1);
}
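
The sections for bits 2 to 7 follow the same pattern as bits 0 and 1, so a macro can stamp them out. The sketch below is plain C that runs on a PC, not CCS code. The pin functions are stand-ins that only log what they are given, and `SEND_BIT`, `set_sda`, `pulse_clk`, `sda_log` and `spi_write_sw_model` are hypothetical names. On the PIC, the macro body would hold the `bit_test()` / `output_high()` / `output_low()` lines from the example above.

```c
#include <stdint.h>

/* Host-side stand-ins for the pins: each clock pulse logs the SDA level. */
static int sda_level;
static int sda_log[8];
static int sda_count;

static void set_sda(int level)
{
    sda_level = level;
}

static void pulse_clk(void)
{
    if (sda_count < 8)
        sda_log[sda_count++] = sda_level;
}

/* One unrolled section: no loop counter and no shift, just a test of a
   fixed bit position followed by a clock pulse. */
#define SEND_BIT(data, n)                        \
    do {                                         \
        set_sda((((data) >> (n)) & 1) != 0);     \
        pulse_clk();                             \
    } while (0)

static void spi_write_sw_model(uint8_t data)
{
    sda_count = 0;
    SEND_BIT(data, 0);   /* LSB first, as in the original post */
    SEND_BIT(data, 1);
    SEND_BIT(data, 2);
    SEND_BIT(data, 3);
    SEND_BIT(data, 4);
    SEND_BIT(data, 5);
    SEND_BIT(data, 6);
    SEND_BIT(data, 7);
}
```

Because every bit position is a compile-time constant, the loop counter, the comparison and the shift all disappear, which is where the speed comes from.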
henriquesv



Joined: 20 Feb 2011
Posts: 8

Posted: Tue Aug 23, 2011 3:26 pm

I want to thank both of you. Great strategies indeed.
I'll make a few tests, compare the results and let you know.

Best regards.
henriquesv



Joined: 20 Feb 2011
Posts: 8

Posted: Wed Aug 24, 2011 6:14 am

Ok, here's the whole scenario:

At first I only showed you a small piece of the whole thing. I thought that solving the small problem would carry over to the bigger one.

The thing is: I don't really have to follow "SPI protocols". I am sending 16 bits to a shift register and I want it to happen as fast as possible.

I am using a PIC16F628, and here's where I got to after your help:

Code:

extern char   channel;
extern char   chunks[6];

union
{

  int16 full_data;
  struct
  {
    unsigned char control_x:8;
    unsigned char data_x:8;
  }parts;
} spi_data;

char  d = 0;
spi_data.full_data = 0x0080;

    #pragma use fast_io(A)

    spi_data.parts.control_x = spi_data.parts.control_x >> channel;   

    spi_data.parts.data_x = decode_table[chunks[channel]];

    // DSP stream
    for(d=0; d<16; d++)
    {
      output_bit(SDA, (spi_data.full_data & 0x0001));
      output_bit(CLK,1);
      output_bit(CLK,0);
      spi_data.full_data = spi_data.full_data >> 1;
    }


    channel++;
    if (channel>5)
    {
      channel=0;
    }   

    #pragma use standard_io(A)
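
To make the 16-bit word explicit, here is a small host-side sketch in portable C. It builds the same word with shifts instead of the union, so it does not depend on how a compiler lays out the struct. It assumes `control_x` is the low byte and `data_x` the high byte of `full_data`, which is what a little-endian layout would give. `make_frame`, `frame_bit` and the `pattern` parameter are hypothetical names; `pattern` stands in for the value read from `decode_table`.

```c
#include <stdint.h>

/* Build the 16-bit word without relying on union layout.
   Assumption: control_x is the low byte and data_x the high byte. */
static uint16_t make_frame(uint8_t channel, uint8_t pattern)
{
    uint8_t control = (uint8_t)(0x80u >> channel);   /* 0x80 shifted right by channel */
    return (uint16_t)(((uint16_t)pattern << 8) | control);
}

/* Level that appears on SDA at clock number n (0..15) when sent LSB first. */
static int frame_bit(uint16_t frame, int n)
{
    return (frame >> n) & 1;
}
```

With this layout the eight control bits go out first, followed by the eight data bits. For example, `make_frame(2, 0x06)` gives `0x0620`.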


This piece of code takes 49 cycles on average.

I tried your last trick of unrolling the loop, but it takes about 10 cycles for each bit.

Do you guys see any other way to optimize this process?

Thank you!

Best Regards.
SherpaDoug



Joined: 07 Sep 2003
Posts: 1640
Location: Cape Cod Mass USA

Posted: Wed Aug 24, 2011 9:21 am

Something is wrong if the unrolled code is actually slower. It may be very bulky but it should be very fast.
_________________
The search for better is endless. Instead simply find very good and get the job done.
Ttelmah



Joined: 11 Mar 2010
Posts: 19553

Posted: Wed Aug 24, 2011 9:58 am

I think the poster must be confusing code-space instructions with execution cycles.
The code might well use 49 instructions, but when run, the centre twenty or so are repeated 16 times.
You need to run the code in something like MPLAB SIM, with the stopwatch function, and see how many cycles it uses. I'd guess about 350.
The unrolled version will be larger, but probably a good 50 to 100 instructions faster...
Have you also tried using the CCS software SPI functions to do this?
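
To illustrate the difference, here is a hedged tally based on the 8-bit listing in the first post (no listing was posted for the 16-bit version, so it is not counted here). It assumes the usual mid-range PIC timing: one cycle per instruction, two for a GOTO or a taken skip. The numbers are hand counts from that listing, not simulator output, and `loop_cycles` and the `CYC_*` names are made up.

```c
#include <stdint.h>

/* Cycle tally for the listing at 0020..003C, counted by hand. */
enum {
    CYC_INIT      = 1,  /* CLRF d */
    CYC_LOOP_TEST = 4,  /* MOVF, SUBLW, BTFSS with the skip taken */
    CYC_MASK      = 3,  /* y = dsp_x & BIT_0 */
    CYC_SDA_ONE   = 7,  /* output_bit(SDA, y) including the TRIS write, y != 0 */
    CYC_SDA_ZERO  = 8,  /* same statement when y == 0: one extra cycle */
    CYC_CLK_PULSE = 8,  /* output_bit(CLK,1) plus output_bit(CLK,0) */
    CYC_SHIFT     = 3,  /* dsp_x = dsp_x >> 1 */
    CYC_LOOP_END  = 3,  /* INCF, GOTO */
    CYC_EXIT      = 5   /* final MOVF, SUBLW, BTFSS, GOTO out of the loop */
};

/* Executed cycles for one pass through the whole 8-bit loop. */
static unsigned loop_cycles(uint8_t data)
{
    unsigned total = CYC_INIT;
    for (int d = 0; d < 8; d++) {
        total += CYC_LOOP_TEST + CYC_MASK + CYC_CLK_PULSE
               + CYC_SHIFT + CYC_LOOP_END;
        total += (data & 1) ? CYC_SDA_ONE : CYC_SDA_ZERO;
        data >>= 1;
    }
    return total + CYC_EXIT;
}
```

That listing occupies 29 program words, yet by this count it executes between 230 and 238 cycles depending on the data, because the body runs eight times.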

Best Wishes
temtronic



Joined: 01 Jul 2010
Posts: 9247
Location: Greensville,Ontario

Posted: Wed Aug 24, 2011 10:07 am

Just a comment.

If you're NOT using SPI protocols you might consider renaming your variables, etc. to something other than 'SPI'. Having SPI mentioned puts some of us older guys into the mindset that you're really using SPI which is not the case here. It'll take us down the wrong paths trying to figure out 'modes', data lengths, timings...that are SPI based and not relevant to your shift register I/O.
henriquesv



Joined: 20 Feb 2011
Posts: 8

Posted: Wed Aug 24, 2011 10:20 am

Hi temtronic,
You're right. I even changed the topic's name.

Ttelmah,
What I actually did was check every instruction used in a code block and then take into account the number of cycles each one takes (checking the PIC datasheet).

But I am willing to run the code in something like MPLAB SIM. I have never used it, though. Any tips on where I should start looking?

I once tried the CCS software SPI functions, but they seemed to be slower...

Thank you guys.

Best Regards.
PCM programmer



Joined: 06 Sep 2003
Posts: 21708

Posted: Wed Aug 24, 2011 11:06 am

The overhead of your for() loop is, by itself, about equal to the per bit
time of the example. There is no way your code is faster. Your
output_bit() line is, by itself, substantially longer than the per bit time
for my code. Compare the .LST files for each program.
henriquesv



Joined: 20 Feb 2011
Posts: 8

Posted: Wed Aug 24, 2011 4:28 pm

Bottom line:
I want to thank you all for the tips and tricks.

Using the Stopwatch I got:
391 us - using the for loop.
263 us - using the CCS spi_xfer() API.
And as PCM programmer told us, unrolling the loop gave the best result:
127 us (at the expense of some extra ROM, of course)!
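
Put as ratios, using only the three stopwatch figures above: the clock frequency was not stated, so no cycle counts are derived here. The helper names are made up, and the per-bit figure assumes each measurement covers one 16-bit transfer.

```c
/* Speed-up of a faster variant relative to a slower one, from measured times. */
static double speedup(double slow_us, double fast_us)
{
    return slow_us / fast_us;
}

/* Average time per bit, assuming the measurement covers 16 bits. */
static double us_per_bit(double total_us)
{
    return total_us / 16.0;
}
```

`speedup(391, 127)` comes out at roughly 3.1 and `speedup(263, 127)` at roughly 2.1, while `us_per_bit(127)` is a little under 8 us.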

Problem solved.

My best wishes!
All times are GMT - 6 Hours

 