memset efficiency improvement?

bamby · Joined: 31 Oct 2019 Posts: 2

hi all
using dsPIC33FJ64GS610
CCS 5.073

trying to clear an array:

jeremiah · Joined: 20 Jul 2010 Posts: 1354

You can make it more efficient for your particular case, or more efficient "in general", but making it more efficient in every case is not possible without having a specific memset() variant for each potential case and manually choosing the right one.

What a lot of other platforms do is look at the address and length of the destination, see how they line up with the alignment requirements of the processor, and then pick an algorithm based on that. For example, if result were an 8-bit array that started at an odd address, then word-by-word writes might cause a trap interrupt on some chips (a word operation on an odd address). To accommodate this, the algorithm checks whether the address is odd or even: if it is odd, it does a byte-by-byte fill, and if it is even, a word-by-word fill.

Not done yet, though. If the length of the array is odd, it needs logic to detect that, go word by word until the end, and then do a quick byte write for the last byte. Still not done, because loop unrolling is often faster, so the implementation might also add logic to check whether the array is big enough to use an unrolled loop for part of the fill.

The end result is that when you have an array that starts on an even address and is larger than a specific length, the fill is faster. Otherwise, all the logic checks make the other scenarios slower.
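To make the dispatch described above concrete, here is a minimal sketch in portable C. It is not CCS's memset() and not code from any particular library; the name aligned_memset is made up for illustration, and the word size is assumed to be 16 bits. It leaves out the loop-unrolling step, and it assumes the compiler accepts 16-bit writes through a cast pointer to the caller's buffer (portable C's aliasing rules only guarantee this when the buffer was declared as a word array):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of any compiler library. */
void *aligned_memset(void *dest, int value, size_t len)
{
    uint8_t *bytes = (uint8_t *)dest;
    uint8_t fill = (uint8_t)value;

    if (((uintptr_t)dest & 1u) != 0) {
        /* Odd start address: word writes could trap, so stay with bytes. */
        for (size_t i = 0; i < len; i++) {
            bytes[i] = fill;
        }
        return dest;
    }

    /* Even start address: fill the bulk of the buffer with 16-bit writes. */
    uint16_t *words = (uint16_t *)dest;
    uint16_t pattern = (uint16_t)(((uint16_t)fill << 8) | fill);
    size_t word_count = len / 2;
    for (size_t i = 0; i < word_count; i++) {
        words[i] = pattern;
    }

    /* Odd length: one byte is left over after the last full word. */
    if ((len & 1u) != 0) {
        bytes[len - 1] = fill;
    }
    return dest;
}
```

Even in this stripped-down form, every call pays for the address test and the length test before a single byte is written, which is the overhead that makes short or odd-aligned fills slower.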

The other trade-off is that you just replaced 4 lines of assembly with hundreds of lines of assembly, so there is also a code-space cost to consider.

Side note: for memcpy() it is even more convoluted because all the same questions apply to the source array as well as the destination array AND you have to account for things like when one starts on an odd address and the other starts on an even address among other things.

If you are interested, here is a link to some attempts to benchmark optimized memcpy() implementations. It is not about PICs specifically; it applies to all types of processors:
https://www.embedded.com/optimizing-memcpy-improves-speed/

temtronic · Posted: Thu Oct 31, 2019 11:29 am

While I don't use that PIC, you could hardcode in assembly to clear 256 words, as that PIC is 'word' based.

Instead of using memset(), try a for(....) loop that puts 0 into each element of the array. The compiler may code it 'better/faster' than your memset() method.
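A minimal sketch of that loop, with assumptions spelled out: the array name result, its 16-bit element type, and the length of 256 are guesses based on the thread, not the original poster's actual declaration.

```c
#include <stddef.h>
#include <stdint.h>

#define RESULT_LEN 256u            /* assumed length, not from the original post */

static uint16_t result[RESULT_LEN]; /* assumed name and element type */

/* Clear the array one element at a time instead of calling memset(). */
void clear_result(void)
{
    for (size_t i = 0; i < RESULT_LEN; i++) {
        result[i] = 0;
    }
}
```

Because the element type is word sized, the compiler is free to emit one word write per iteration. Whether it actually beats memset() is worth checking in the listing file rather than assuming.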

If I could I'd code as it's raining here for the next day or so....

Ttelmah · Joined: 11 Mar 2010 Posts: 19539

Yes, the problem is that they have designed the memset to be generic,
so it has to be able to cope with odd numbers of bytes. Hence byte based.
They perhaps need to code a word_memset function. Not hard to code
actually. If I'm feeling bored tomorrow, will try to put a version together.

Ttelmah · Joined: 11 Mar 2010 Posts: 19539

OK. Crude 'word_clear' function. Only sets to 0, and needs the buffer
to be word aligned:
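The code posted with this reply is not reproduced here. As a stand-in, the following is a rough sketch of what a function with those two restrictions might look like in plain C. It is not Ttelmah's version: the signature, the parameter names, and the choice to pass a count of words rather than bytes are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; not the code from the original post.
 * Only writes zeros, and the caller must pass a word-aligned buffer.
 * word_count is the number of 16-bit words, not the number of bytes. */
void word_clear(uint16_t *buffer, size_t word_count)
{
    while (word_count--) {
        *buffer++ = 0;
    }
}
```

Taking a uint16_t pointer pushes the alignment requirement onto the caller, so the function needs no address or length checks at all. For a 512-byte buffer, for example, the call would be word_clear(buffer, 256).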

bamby · Joined: 31 Oct 2019 Posts: 2

Thanks, Ttelmah
looks perfect :)