dluu13
Joined: 28 Sep 2018 Posts: 395 Location: Toronto, ON
printf vs char array assignment speed
Posted: Thu May 16, 2019 8:01 am
Prompted by another thread on this forum, I have been looking at BCD conversion because of its speed advantage over scaling by dividing by 10, and even over %lw, and I came across a peculiar problem. I started to describe it in that thread:
http://www.ccsinfo.com/forum/viewtopic.php?p=224145#224145
Here is the code again:
Code: | /*
 * File: CuriosityPrint.c
 * Author: dluu
 *
 * Created on Apr 5, 2019
 */
#include<24FJ128GA204.h>
#FUSES NOWDT, NODEBUG, NOWRT, NOPROTECT, NOJTAG, ICSP1
#FUSES NOLVR, NOBROWNOUT, NOIOL1WAY, NODSBOR, NODSWDT
#FUSES NOALTCMPI, FRC_PLL, PLL_FROM_FRC, PLL8X
#PIN_SELECT U3RX=PIN_B5
#PIN_SELECT U3TX=PIN_B6
#USE DELAY(clock=32MHZ)
#USE RS232(BAUD=115200, UART3, BITS=8, PARITY=N, STOP=1, STREAM=PC, ERRORS, RECEIVE_BUFFER=128)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// slightly modified from gaugeguy on CCS forum
uint32_t Int16toBCD5(uint16_t local_convert)
{
    // Converts a 16-bit value to five packed BCD digits (shift-and-add-3).
    // Tries to do it fairly efficiently, both in size and speed.
    uint16_t bit_cnt = 16;
    uint32_t BCD;
    BCD = 0;
    {
        do
        {
            // Add 3 to every BCD nibble that is >= 5 before the next shift.
            if ((BCD & 0x0000000F) >= 0x00000005) BCD += 0x00000003;
            if ((BCD & 0x000000F0) >= 0x00000050) BCD += 0x00000030;
            if ((BCD & 0x00000F00) >= 0x00000500) BCD += 0x00000300;
            if ((BCD & 0x0000F000) >= 0x00005000) BCD += 0x00003000;
            if ((BCD & 0x000F0000) >= 0x00050000) BCD += 0x00030000;
            // if ((BCD & 0x00F00000) >= 0x00500000) BCD += 0x00300000;
            // if ((BCD & 0x0F000000) >= 0x05000000) BCD += 0x03000000;
            // if ((BCD & 0xF0000000) >= 0x50000000) BCD += 0x30000000;
            // CCS shift_left(): shift the top bit of local_convert (2 bytes)
            // into the bottom of BCD (3 bytes are enough for 5 digits).
            shift_left(&BCD, 3, shift_left(&local_convert, 2, 0));
        }
        while (--bit_cnt != 0);
    }
    return BCD;
}

#define BCDNIBBLES 5

void printScaledBCD(uint16_t num, uint8_t decimalPlaces)
{
    uint32_t BCD5 = Int16toBCD5(num);
    if (decimalPlaces == BCDNIBBLES) fprintf(PC, "0");
    for (int i = 0; i < BCDNIBBLES; ++i)   // one pass per digit, i = 0..4
    {
        if (BCDNIBBLES - i == decimalPlaces) fprintf(PC, ".");
        // print nibble (BCDNIBBLES-1-i), most significant first
        fprintf(PC, "%x", (BCD5 >> ((BCDNIBBLES - 1 - i) << 2))&0x0F);
    }
}

void ScaledBCDtoStr(uint16_t num, uint8_t decimalPlaces, char * buf) // very slow...
{
    uint32_t BCD5 = Int16toBCD5(num);
    uint8_t decimal = 0;
    uint8_t j = 0;
    if (decimalPlaces > 0) decimal = 1;
    if (decimalPlaces == BCDNIBBLES) buf[j++] = '0';
    // note: when decimal == 1 this runs for i = 0..BCDNIBBLES, one pass more than
    // printScaledBCD(); on that last pass the shift count below is negative
    for (int i = 0; i < BCDNIBBLES+decimal; ++i)
    {
        if (BCDNIBBLES - i == decimalPlaces) buf[j++] = '.';
        // +0x30 turns the digit 0..9 into the ASCII character '0'..'9'
        buf[j] = ((BCD5 >> ((BCDNIBBLES - 1 - i) << 2))&0x0F)+0x30;
        ++j;
    }
    buf[BCDNIBBLES+decimal] = '\0';
}

int main(void)
{
    uint16_t test[] = {11, 222, 3333, 44444, 55555, 12222, 23333, 34444, 45555};
    delay_ms(100);
    fprintf(PC, "\r\n\r\n");
    fprintf(PC, "test BCD dec: ");
    output_high(PIN_B8);   // timing marker for the print test
    for (int i = 0; i < 9; ++i)
    {
        printScaledBCD(test[i], 3);
        fprintf(PC, ",");
    }
    output_low(PIN_B8); // 5.5 ms
    fprintf(PC, "\r\n");
    fprintf(PC, "test BCD str: ");
    char bcdstr[10];
    output_high(PIN_B9);   // timing marker for the string test
    for (int i = 0; i < 9; ++i)
    {
        ScaledBCDtoStr(test[i], 3, bcdstr);
        fprintf(PC, "%s,", bcdstr);
    }
    output_low(PIN_B9); // over 200 ms...
    fprintf(PC, "\r\n");
    while (1)
    {
    }
    return 0;
} |
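Int16toBCD5() above is the shift-and-add-3 ("double dabble") conversion. For readers without CCS, here is a minimal portable sketch of the same algorithm in standard C; it uses ordinary shifts instead of the CCS shift_left() built-in, and the name int16_to_bcd5 is illustrative, not from the thread.

```c
#include <stdint.h>

/* Portable sketch of shift-and-add-3 ("double dabble"):
 * returns five packed BCD nibbles, e.g. 12345 -> 0x12345. */
uint32_t int16_to_bcd5(uint16_t value)
{
    uint32_t bcd = 0;

    for (int bit = 0; bit < 16; ++bit) {
        /* Before each shift, add 3 to any nibble that is 5 or more so the
         * doubling that follows carries correctly into the next decade. */
        for (int nib = 0; nib < 5; ++nib) {
            uint32_t mask = (uint32_t)0xF << (4 * nib);
            if ((bcd & mask) >= ((uint32_t)5 << (4 * nib)))
                bcd += (uint32_t)3 << (4 * nib);
        }
        /* Shift the accumulator left one bit, feeding in the input's MSB. */
        bcd = (bcd << 1) | (uint32_t)((value >> 15) & 1u);
        value = (uint16_t)(value << 1);
    }
    return bcd;
}
```

The casts to uint32_t keep the nibble masks well defined on targets where int is only 16 bits wide, such as the PIC24.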
In this code, one function prints the BCD digits directly, inserting a decimal point where I want one, and another does the same thing but, instead of printing, writes the result into a char array.
The puzzling thing is that the one that puts the BCD into a char array is about 40 times slower!
I have found out that in the char array method, it is
Code: | buf[j] = ((BCD5 >> ((BCDNIBBLES - 1 - i) << 2))&0x0F)+0x30; |
that's making it slow, but I don't know why. I use this in my fprintf as well, except without the +0x30 at the end.
Looking at the .lst file, the two are nearly identical.
Code: | .................... fprintf(PC, "%x", (BCD5 >> ((BCDNIBBLES - 1 - i) << 2))&0x0F);
00AC0: MOV #4,W4
00AC2: MOV 8C4,W3
00AC4: SUB W4,W3,W5
00AC6: SL W5,#2,W0
00AC8: MOV W0,W4
00ACA: MOV 8C0,W5
00ACC: MOV 8C2,W6
00ACE: INC W4,W4
00AD0: DEC W4,W4
00AD2: BRA Z,ADA
00AD4: LSR W6,W6
00AD6: RRC W5,W5
00AD8: BRA AD0
00ADA: AND W5,#F,W5
00ADC: CLR W6
00ADE: MOV W5,W0
00AE0: MOV W6,W1
00AE2: MOV #0,W2
00AE4: MOV #0,W3
00AE6: MOV #2710,W4
00AE8: CALL A36
00AEC: INC 08C4
00AEE: GOTO AA4
.................... buf[j] = ((BCD5 >> ((BCDNIBBLES - 1 - i) << 2))&0x0F)+0x30;
00B56: MOV 8C2,W4
00B58: CLR.B 9
00B5A: MOV W4,W0
00B5C: ADD 8C0,W0
00B5E: MOV W0,W5
00B60: MOV #4,W4
00B62: MOV 8C8,W3
00B64: SUB W4,W3,W6
00B66: SL W6,#2,W0
00B68: MOV W0,W4
00B6A: MOV 8C4,W6
00B6C: MOV 8C6,W7
00B6E: INC W4,W4
00B70: DEC W4,W4
00B72: BRA Z,B7A
00B74: LSR W7,W7
00B76: RRC W6,W6
00B78: BRA B70
00B7A: AND W6,#F,W6
00B7C: CLR W7
00B7E: MOV #30,W4
00B80: ADD W6,W4,W0
00B82: ADDC W7,#0,W1
00B84: MOV.B W0L,[W5] |
Does anybody have an idea why it's so much slower?
If instead of
Code: | buf[j] = ((BCD5 >> ((BCDNIBBLES - 1 - i) << 2))&0x0F)+0x30; |
I put
then instead of over 200 ms, it only takes 5.4 ms.
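A likely explanation, offered as a hypothesis rather than a confirmed diagnosis: ScaledBCDtoStr() loops while i < BCDNIBBLES+decimal, so with decimalPlaces = 3 it makes a sixth pass with i = 5, where (BCDNIBBLES - 1 - i) << 2 is -4. A negative shift count is undefined behaviour in C, and the listing suggests the compiler computes the count in a 16-bit register and then shifts one bit per pass (DEC/BRA Z, LSR/RRC, BRA) until the count reaches zero, so -4 would become 0xFFFC = 65532 passes. printScaledBCD() loops only while i < BCDNIBBLES and never gets there. The sketch below is a hypothetical C model of that loop (shift_loop_passes is a made-up name); it only counts passes.

```c
#include <stdint.h>

/* Hypothetical model of the shift loop in the .lst listing:
 *   MOV #4,W4 / SUB / SL #2      -> count = (4 - i) << 2 in a 16-bit register
 *   DEC W4 / BRA Z / LSR / RRC   -> one bit of the 32-bit shift per pass
 * Returns how many one-bit passes the loop makes for loop index i. */
uint16_t shift_loop_passes(int i)
{
    /* 16-bit wraparound: for i == 5 the count is -4, i.e. 0xFFFC. */
    uint16_t count = (uint16_t)((uint16_t)(4 - i) << 2);
    uint16_t passes = 0;

    while (count != 0) {   /* loop exits when the count reaches zero */
        --count;
        ++passes;          /* each pass shifts the value right by one bit */
    }
    return passes;
}
```

As a rough estimate, assuming about 6 instruction cycles per pass and 16 MIPS from the 32 MHz clock, 65532 passes take roughly 25 ms, or about 220 ms for the nine test values, which is in line with the measured time. The junk character written on that extra pass is then overwritten by buf[BCDNIBBLES+decimal] = '\0', which would explain why the output still looks correct.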
gaugeguy
Joined: 05 Apr 2011 Posts: 303
Posted: Thu May 16, 2019 8:27 am
Doing a rotate (or shift) by a variable amount takes a lot of time for the calculations and looping. Doing a rotate by a fixed value is very fast.
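To illustrate the fixed-shift idea in portable C, here is a minimal sketch (bcd5_digits is a hypothetical name, not from the thread) that extracts the five digits of a packed BCD value using only constant shifts: it always reads bits 19..16, then shifts the whole value left by 4. Whether CCS compiles a constant 4-bit shift of an int32 into fast straight-line code is an assumption that has not been measured here.

```c
#include <stdint.h>

/* Sketch: write the five BCD digits of bcd5 (e.g. 0x12345) as ASCII into
 * out[0..4], most significant first. out is NOT null-terminated. */
void bcd5_digits(uint32_t bcd5, char out[5])
{
    for (int i = 0; i < 5; ++i) {
        /* The current most significant digit always sits in bits 19..16. */
        out[i] = (char)('0' + ((bcd5 >> 16) & 0x0F));
        /* Fixed 4-bit shift brings the next digit into place; bits pushed
         * past bit 31 are simply discarded. */
        bcd5 <<= 4;
    }
}
```

Every shift count here is a compile-time constant, so no per-bit counting loop is needed to pick out a digit.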
dluu13
Joined: 28 Sep 2018 Posts: 395 Location: Toronto, ON
Posted: Thu May 16, 2019 8:48 am
I'll try using a pointer increment, as you suggested, a little later.
As for shifting by a variable amount, I do that in my print function too, and it's nowhere near as slow as doing it in my string function...
dluu13
Joined: 28 Sep 2018 Posts: 395 Location: Toronto, ON
Posted: Thu May 16, 2019 10:35 am
It's certainly not pretty, but it works. Using this function instead of the old one will decrease the time to print all 9 numbers in my array from 222 ms to 5.4 ms.
Code: | void ScaledBCDtoStr(uint16_t num, uint8_t decimalPlaces, char * buf) // faster version: constant shift counts
{
    uint32_t BCD5 = Int16toBCD5(num);
    uint8_t decimal = 0;
    char* ptr = buf;
    if (decimalPlaces > 0) decimal = 1;
    if (decimalPlaces == BCDNIBBLES) *ptr++ = '0';
    for (int i = 0; i < BCDNIBBLES+decimal; ++i)
    {
        if (BCDNIBBLES - i == decimalPlaces) *ptr++ = '.';
        switch(i)   // one constant shift per digit position
        {
            case 0:
                *ptr++ = ((BCD5 >> 16)&0x0F)+0x30;
                break;
            case 1:
                *ptr++ = ((BCD5 >> 12)&0x0F)+0x30;
                break;
            case 2:
                *ptr++ = ((BCD5 >> 8)&0x0F)+0x30;
                break;
            case 3:
                *ptr++ = ((BCD5 >> 4)&0x0F)+0x30;
                break;
            case 4:
                *ptr++ = ((BCD5 >> 0)&0x0F)+0x30;
                break;
            // i == BCDNIBBLES (reached only when decimal == 1) matches no case,
            // so nothing is written on that pass
        }
    }
    *ptr = '\0';   // terminate right after the last character written
} |
Thanks for your comments, gaugeguy here and Ttelmah in the other thread.
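For comparison, here is a portable sketch that keeps the original variable shift but bounds the loop at exactly BCDNIBBLES passes, assuming (as the .lst listing suggests) that the slowdown comes from the extra pass with i = BCDNIBBLES, whose shift count (BCDNIBBLES - 1 - i) << 2 is negative. It also writes the terminator at the final index instead of at BCDNIBBLES+decimal; in the original, that fixed position appears to cut off the last digit when decimalPlaces == BCDNIBBLES, because the leading '0' makes the string one character longer. The name scaled_bcd_to_str is hypothetical, and it takes the packed BCD value directly so that it stands alone.

```c
#include <stdint.h>

#define BCDNIBBLES 5

/* Sketch: format five packed BCD digits (e.g. 0x44444) with a decimal point.
 * decimalPlaces is 0..BCDNIBBLES; buf needs room for "0." + 5 digits + '\0'
 * (8 chars). Returns the string length. */
int scaled_bcd_to_str(uint32_t bcd5, uint8_t decimalPlaces, char *buf)
{
    int j = 0;

    if (decimalPlaces == BCDNIBBLES)
        buf[j++] = '0';                        /* leading zero: "0.xxxxx" */
    for (int i = 0; i < BCDNIBBLES; ++i) {     /* exactly one pass per digit */
        if (BCDNIBBLES - i == decimalPlaces)
            buf[j++] = '.';
        /* shift counts are 16, 12, 8, 4, 0: never negative */
        buf[j++] = (char)('0' + ((bcd5 >> ((BCDNIBBLES - 1 - i) * 4)) & 0x0F));
    }
    buf[j] = '\0';                             /* terminate where we stopped */
    return j;
}
```

For example, scaled_bcd_to_str(0x44444, 3, buf) produces "44.444", and scaled_bcd_to_str(0x12345, 5, buf) produces "0.12345".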
temtronic
Joined: 01 Jul 2010 Posts: 9245 Location: Greensville,Ontario
Posted: Thu May 16, 2019 3:06 pm
Wow, 40x faster!!! Who cares about 'pretty'? Just add a few comments so that 3 days (or 3 hours, for me) from now you'll say, 'yes, that's what that means...'
dluu13
Joined: 28 Sep 2018 Posts: 395 Location: Toronto, ON
Posted: Thu May 16, 2019 3:08 pm
The mystery is that I use variable shifts in the function that prints instead of writing to a string, and that one runs just as fast as this one...
Ttelmah
Joined: 11 Mar 2010 Posts: 19548
Posted: Fri May 17, 2019 12:56 am
Look to find what is different:
1) Is the variable a local 'fixed' variable, or is it itself addressed by a pointer or as an array? Indexed addressing costs a lot.
2) Is the variable a byte, int16 or int32? The bigger the variable, the more the overhead.
Variable shifts always cost a lot, but the overhead more than doubles when you go to int16, and leaps up when indexed addressing is used. I'm guessing you will find that something about the nature of the operation is actually different.
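The cost of operand size can be seen in a hypothetical C model of the shift loop in the listing: an int32 lives in two 16-bit registers, and each pass of a variable shift moves both words by one bit (LSR on the high word, RRC on the low word). The name shr32_by_words is illustrative; the sketch only models the cost structure and is not the compiler's actual runtime code.

```c
#include <stdint.h>

/* Sketch: shift a 32-bit value right by a variable count the way a 16-bit
 * core without a 32-bit barrel shifter might, one bit per pass. */
uint32_t shr32_by_words(uint32_t value, uint16_t count)
{
    uint16_t hi = (uint16_t)(value >> 16);
    uint16_t lo = (uint16_t)value;

    while (count-- != 0) {
        unsigned carry = hi & 1u;                    /* LSR: low bit of hi -> carry */
        hi = (uint16_t)(hi >> 1);
        lo = (uint16_t)((lo >> 1) | (carry << 15));  /* RRC: carry -> bit 15 of lo */
    }
    return ((uint32_t)hi << 16) | lo;
}
```

The work grows linearly with the count, and each pass touches two words instead of one, so an int32 costs roughly twice an int16 per bit shifted. A shift by a constant, by contrast, can be emitted as straight-line code with no counting loop at all.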
dluu13
Joined: 28 Sep 2018 Posts: 395 Location: Toronto, ON
Posted: Fri May 17, 2019 7:17 am
I must be really missing something here as to why it's not working...
Anyway, since it's working now I might revisit it some other time. There's a lot of other work to do :D