math speed help.

curt2go · Joined: 21 Nov 2003 Posts: 200

I have a routine that I need to make faster if possible. It's an older routine, and I already got some good tips to get it as fast as it is now, like using the wrapper union.
-- Processor is a 24EP256GP206 running @ 140MHz.
-- Newest compiler.
-- The scales are supposed to add up to 65535 in normal cases.

-- I am audio summing and normalizing the sound from 3 files.

-- Right now this routine takes 1.6ms in the real world. I can get it way down if I set the scales to an unsigned int16 but then the math does not work properly.

-- Any help would be awesome.

PCM programmer · Joined: 06 Sep 2003 Posts: 21708

Post the .LST file code for the for() loop only. Then we can see what might be done.

Ttelmah · Joined: 11 Mar 2010 Posts: 19544

Are the values ever actually 'signed'?

It takes significantly more time to 'extend' a 16bit signed value to 32bits
than it does to do the same for an unsigned value.

Your code involves taking 1536 signed int16 values, extending each to
32bits, then performing a signed int32 multiply. If you change the values
to all be unsigned (in the declarations and in the unions), you will see perhaps
a 25% improvement straight away.
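
To illustrate why the signed case costs more, here is a rough sketch in portable C (not CCS-specific, and not the code from this thread) of what each kind of extension has to do. The helper names are made up for illustration. Zero extension just clears the upper word, while sign extension first has to test the top bit and then fill the upper word to match.

```c
#include <stdint.h>

/* Hypothetical helper: widen an unsigned 16-bit value to 32 bits.
   The upper word is always zero, so no test is needed. */
static uint32_t zero_extend16(uint16_t v)
{
    return (uint32_t)v;
}

/* Hypothetical helper: widen a 16-bit two's complement bit pattern
   to 32 bits by hand. The sign bit must be tested, and the upper
   word filled with all ones or all zeros accordingly. */
static uint32_t sign_extend16(uint16_t v)
{
    uint16_t high = (v & 0x8000u) ? 0xFFFFu : 0x0000u;
    return ((uint32_t)high << 16) | v;
}
```

How many cycles the extra test really costs depends on the compiler and the chip, so treat this only as a picture of the extra work involved.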

The scales need to be 32bit, since it is the fact that these are 32bit, that
forces 32bit maths to be used. If you change these to int16, you only use
16bit maths, and since you are using the upper 16bits of the 32bit result,
this will never work. You could use unsigned int16 for these, but only
by casting them to int32 when used, which is another operation and will
slow things more.
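
As a minimal sketch of the point about the upper 16 bits, written in portable C rather than CCS C (function names are hypothetical, and this is not the original routine): the first helper widens to 32 bits before multiplying and keeps the upper word, while the second keeps only what a 16-bit-wide multiply would leave behind.

```c
#include <stdint.h>

/* Hypothetical helper: scale one signed sample by an unsigned 16-bit
   gain (0..65535, roughly 0.0..1.0) and keep the upper 16 bits of
   the 32-bit product. Assumes >> on a negative value is an
   arithmetic shift, which is implementation-defined in C but is
   what common compilers do. */
static int16_t scale_sample(int16_t sample, uint16_t scale)
{
    int32_t product = (int32_t)sample * (int32_t)scale; /* full 32-bit product */
    return (int16_t)(product >> 16);                    /* upper 16 bits only */
}

/* Emulates a multiply whose result is only 16 bits wide: the upper
   word, which is the part the scaling needs, is thrown away. */
static uint16_t low16_product(int16_t sample, uint16_t scale)
{
    return (uint16_t)((uint32_t)(uint16_t)sample * scale);
}
```

For example, with a sample of 16384 and a scale of 32768, the 32-bit version gives 8192 (half the sample), while the low 16 bits of the same product are all zero.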

curt2go · Joined: 21 Nov 2003 Posts: 200

I will post the .LST file section.

Yes, I do have negative numbers since it's audio. I'm not sure there is any way to treat them as unsigned. I've been racking my brain on this one.

I even tried dropping the 0-65535 range and going to 0-16 for the scales so I could just do a bitwise multiplication, but that seems to take a long time as well. Probably once again because of the signed numbers.

curt2go · Joined: 21 Nov 2003 Posts: 200

alan · Joined: 12 Nov 2012 Posts: 357 Location: South Africa

Change your scale declaration to int16 instead of int32.

Loop gets much smaller, hardware will return int32 when multiplying two int16.

Ttelmah · Joined: 11 Mar 2010 Posts: 19544

No it won't.
If you multiply two int16 values you get an int16 result.
This is not a chip with a hardware maths unit. Overflow has to be done
in software not hardware.

alan · Joined: 12 Nov 2012 Posts: 357 Location: South Africa

Sorry if I created confusion then; I must have read this wrong:

Ttelmah · Joined: 11 Mar 2010 Posts: 19544

Key is part of the data sheet:

alan · Joined: 12 Nov 2012 Posts: 357 Location: South Africa

OK missed that, thanks Ttelmah.
Could you maybe explain what the difference is between these instructions, other than that one uses the accumulator and the other uses two 16-bit registers? I might run into problems with this in the future, and then at least I wouldn't have to pull my hair out.

Ttelmah · Joined: 11 Mar 2010 Posts: 19544

Testing with a 24EP256MC206, which has the extended instruction, gives
841uSec runtime for the loop. So nearly twice as fast.
Now, though this chip doesn't support this, it does have the ability
to generate a 32bit result, using a register pair. So I decided to code this:

curt2go · Joined: 21 Nov 2003 Posts: 200

Wow. Thank you. I will test this tonight. You are a mad man! That is so fast!!!

curt2go · Joined: 21 Nov 2003 Posts: 200

Well, I snuck in a test at work...

Looks like it is as fast as it's supposed to be, but one thing I noticed is that with a full scale of 65535, this gives double the value. It looks like the max scale for this routine is 32768, which is totally fine with me. I can just quickly scale the scales before entering the routine, unless you have a very quick fix, but I can't see where the issue is. Once again, thank you for your awesome efforts here. It's amazing!

Ttelmah · Joined: 11 Mar 2010 Posts: 19544

Since the multiply is of a signed value, the result has to be signed.
This is then in the top 16bits of the result, and copied out, so yes, the
result can only support up to +32767 (and down to -32768).
The 'parts' in the union really should be declared as signed int16, otherwise
sign will not be maintained in the addition.
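
A rough sketch of the adjustment discussed here, in portable C and not taken from the actual routine. It assumes, based on the report above, that 32768 behaves as unity gain in the signed version, which corresponds to shifting the accumulated product right by 15. All names are hypothetical, and the three scales are assumed to add up to no more than 32767 so the sum cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper: map a 0..65535 scale onto 0..32767 so it
   fits in a signed 16-bit operand. */
static int16_t to_signed_scale(uint16_t scale)
{
    return (int16_t)(scale >> 1);
}

/* Hypothetical mixer for one sample from each of three sources.
   Everything is signed, so the sign survives both the multiplies
   and the additions. Assumes sa + sb + sc <= 32767 and that >> on
   a negative value is an arithmetic shift. */
static int16_t mix3(int16_t a, int16_t b, int16_t c,
                    int16_t sa, int16_t sb, int16_t sc)
{
    int32_t acc = (int32_t)a * sa
                + (int32_t)b * sb
                + (int32_t)c * sc;   /* 32-bit signed accumulation */
    return (int16_t)(acc >> 15);     /* 32768 treated as unity gain */
}
```

With scales of 16384, 8192 and 8192 (one half, one quarter, one quarter) and samples 1000, 2000 and -3000, this gives 500 + 500 - 750 = 250.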

curt2go · Joined: 21 Nov 2003 Posts: 200

That is just fine. I will adjust the scales accordingly. Thank you again for all your efforts.