HardwareHeaven.com


HardwareHeaven.com > Forums > Hardware and Related Topics > kX Project Audio Driver Support Forum > Effects and the DSP


Mar 7, 2004, 04:58 AM   #1
DriverHeaven Newbie
 
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Holy Av is on a distinguished road

Optimized DSP code

I was trying to optimize the DANE code a bit... For the EQ filters (peaking, high-shelf, low-shelf...) it is possible to go from 18 instructions down to 14. With the same declarations (but without the temp variables), the code is:
;left
macs 0x0, 0x0, inl, b0
macmv lx2, lx1, lx2, b2
macmv lx1, inl, lx1, b1
macmv ly2, ly1, ly2, a2
macs ly1, accum, ly1, a1
macints ly1, 0x0, ly1, sca
macs outl, ly1, 0x0, 0x0
;right
macs 0x0, 0x0, inr, b0
macmv rx2, rx1, rx2, b2
macmv rx1, inr, rx1, b1
macmv ry2, ry1, ry2, a2
macs ry1, accum, ry1, a1
macints ry1, 0x0, ry1, sca
macs outr, ry1, 0x0, 0x0
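A minimal Python float model of one channel of the 14-instruction version, useful for testing it against a textbook biquad. It assumes the usual reading of the instructions (macs: R = A + X*Y, with the result also left in the accumulator; macmv: R = A while the accumulator gains X*Y; macints: R = A + X*Y with an integer Y) and ignores fixed-point rounding and saturation. Read off the code, the output is y = sca*(b0*x + b1*x1 + b2*x2 + a1*y1 + a2*y2), so the coefficients presumably have to be stored pre-divided by sca with the feedback signs folded into a1/a2; that convention is inferred from the instructions, not taken from the Dane sources. The class and function names are hypothetical.

```python
# Float model of one channel of the 14-instruction EQ (left channel shown).
# Assumed semantics, ignoring fixed-point rounding and saturation:
#   macs    R, A, X, Y : R = A + X*Y, and the accumulator holds R afterwards
#   macmv   R, A, X, Y : R = A, accumulator += X*Y
#   macints R, A, X, Y : R = A + X*Y with Y an integer scale factor


class DaneBiquadChannel:
    def __init__(self, b0, b1, b2, a1, a2, sca):
        self.b0, self.b1, self.b2 = b0, b1, b2
        self.a1, self.a2, self.sca = a1, a2, sca
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0  # lx1, lx2, ly1, ly2

    def step(self, x):
        acc = x * self.b0                                # macs    0x0, 0x0, inl, b0
        self.x2, acc = self.x1, acc + self.x2 * self.b2  # macmv   lx2, lx1, lx2, b2
        self.x1, acc = x, acc + self.x1 * self.b1        # macmv   lx1, inl, lx1, b1
        self.y2, acc = self.y1, acc + self.y2 * self.a2  # macmv   ly2, ly1, ly2, a2
        self.y1 = acc + self.y1 * self.a1                # macs    ly1, accum, ly1, a1
        self.y1 = 0.0 + self.y1 * self.sca               # macints ly1, 0x0, ly1, sca
        return self.y1                                   # macs    outl, ly1, 0x0, 0x0


def reference_biquad(xs, B0, B1, B2, A1, A2):
    """Textbook direct form I: y = B0 x + B1 x1 + B2 x2 - A1 y1 - A2 y2."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in xs:
        y = B0 * x + B1 * x1 + B2 * x2 - A1 * y1 - A2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# Example filter (poles at 0.9 and 0.8); sca = 4 keeps every stored coefficient in [-1, 1).
B0, B1, B2, A1, A2, SCA = 1.2, -1.8, 0.7, -1.7, 0.72, 4
channel = DaneBiquadChannel(B0 / SCA, B1 / SCA, B2 / SCA, -A1 / SCA, -A2 / SCA, SCA)

xs = [1.0] + [0.0] * 31  # impulse
model_out = [channel.step(x) for x in xs]
ref_out = reference_biquad(xs, B0, B1, B2, A1, A2)
print(max(abs(m - r) for m, r in zip(model_out, ref_out)))
```

The right channel is the same model with the r* registers.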

Then I was thinking that two of the lines, the ones that set 'outl' and 'outr', could be removed by an intelligent compiler when this block is linked to another code block (simple copy propagation, which would be easy to implement). A more C-style representation of DSP code would be more readable, and a global code optimiser would be cool, but who has time to do it?
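The copy propagation idea can be sketched in Python on straight-line code. Everything here is hypothetical: instructions are modelled as (op, R, A, X, Y) tuples, a "copy" is taken to be `macs R, A, 0x0, 0x0`, and the consumer block (`vol_out`, `vol`) is made up for the example. Because DSP registers keep their values between samples, registers that are read before being written (like ly1) are treated as live at the end of the block, and a dead copy is only dropped when the next instruction does not read the accumulator it leaves behind.

```python
# Hypothetical sketch: copy propagation plus removal of dead copies on one
# straight-line DSP block. Instructions are (op, R, A, X, Y) tuples.
ZERO = "0x0"


def is_copy(ins):
    """macs R, A, 0x0, 0x0 only copies A into R (and into the accumulator)."""
    op, r, a, x, y = ins
    return op == "macs" and x == ZERO and y == ZERO and a != "accum"


def reads(ins):
    return set(ins[2:])


def reads_accum(ins):
    # macmv adds into the accumulator, so it implicitly reads it.
    return ins[0] == "macmv" or "accum" in ins[2:]


def copy_propagate(code):
    copies, out = {}, []  # copies maps destination -> source
    for op, r, a, x, y in code:
        a, x, y = (copies.get(v, v) for v in (a, x, y))
        # R is overwritten: forget every copy into or out of R.
        copies = {d: s for d, s in copies.items() if r not in (d, s)}
        ins = (op, r, a, x, y)
        if is_copy(ins) and r != a:
            copies[r] = a
        out.append(ins)
    return out


def upward_exposed(code):
    """Registers read before written; they carry values over from the previous sample."""
    written, exposed = set(), set()
    for ins in code:
        exposed |= reads(ins) - written
        written.add(ins[1])
    return exposed


def remove_dead_copies(code, live_out):
    live = set(live_out) | upward_exposed(code)
    keep = [True] * len(code)
    for i in range(len(code) - 1, -1, -1):
        ins = code[i]
        following = next((code[j] for j in range(i + 1, len(code)) if keep[j]), None)
        # Keep a copy whose accumulator result may still be used (conservative at block end).
        accum_needed = following is None or reads_accum(following)
        if is_copy(ins) and ins[1] not in live and not accum_needed:
            keep[i] = False
            continue
        live.discard(ins[1])
        live |= reads(ins)
    return [ins for ins, k in zip(code, keep) if k]


def optimize(code, live_out):
    return remove_dead_copies(copy_propagate(code), live_out)


left = [
    ("macs", "0x0", "0x0", "inl", "b0"),
    ("macmv", "lx2", "lx1", "lx2", "b2"),
    ("macmv", "lx1", "inl", "lx1", "b1"),
    ("macmv", "ly2", "ly1", "ly2", "a2"),
    ("macs", "ly1", "accum", "ly1", "a1"),
    ("macints", "ly1", "0x0", "ly1", "sca"),
    ("macs", "outl", "ly1", "0x0", "0x0"),
]
consumer = [("macs", "vol_out", "0x0", "outl", "vol")]  # hypothetical next block

optimized = optimize(left + consumer, live_out={"vol_out"})
for ins in optimized:
    print(ins)
```

If outl must stay visible outside the linked code, passing it in `live_out` keeps the copy.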


Mar 8, 2004, 12:24 AM   #2
kX Project Lead Programmer and Coordinator
 
Join Date: Dec 2002
Posts: 3,119
Rep Power: 75
Eugene Gavrilov has much to be proud of

>> but who has time to do it

100% agreed

/E
Mar 8, 2004, 01:18 PM   #3
kX Project DSP Engineer
 
Join Date: Dec 2002
Location: Denmark
Posts: 94
Rep Power: 0
Soeren_B is on a distinguished road

Re: Optimized DSP code

Quote:
Originally posted by Holy Av
I was trying to optimize a bit the DANE code... For the EQ filters (peaking, highshelf, lowshelf ...) it is possible to reduce from 18 instructions to 14. With the same declarations (but removing the temp variables) the code is:
Sweet! These optimized danes should be used instead of the current ones if they work as they are supposed to

Will you test them?

Cheers
Soeren (original author)
Mar 27, 2004, 01:56 AM Thread Starter   #4
DriverHeaven Newbie
 
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Holy Av is on a distinguished road

Of course I test all the code I write... except when writing on the blackboard

I started to write a global "copy propagation" and "dead code elimination" optimizer, and it was able to remove some instructions, so this could be a good thing. Of course it's not finished due to lack of time...

And just another little optimization: how to generate a sine wave in 3 instructions, equivalent to the 5 instructions in Wave Generator 3.0:
macs 0, 0, f, s1
macmv s0, s1, s0, 0x80000000
macs s1, accum, f, s1
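Why three instructions are enough: under the usual reading of macs/macmv, and with 0x80000000 being -1.0 in the DSP's 32-bit fractional format, the sequence computes s1 = 2*f*s1 - s0 while s0 takes the old s1. That is the standard two-term sine recurrence sin((n+1)w) = 2*cos(w)*sin(nw) - sin((n-1)w), so f presumably holds cos(2*pi*F/fs) and the seed values of s0/s1 set amplitude and phase; both are assumptions about how the generator's registers are initialised. A minimal Python float model (no rounding or saturation):

```python
import math


# Float model of the 3-instruction oscillator; 0x80000000 is read as -1.0.
def sine_3insn(f, s0, s1, n):
    out = []
    for _ in range(n):
        acc = f * s1                   # macs  0, 0, f, s1
        s0, acc = s1, acc + s0 * -1.0  # macmv s0, s1, s0, 0x80000000
        s1 = acc + f * s1              # macs  s1, accum, f, s1  ->  s1 = 2*f*s1 - s0
        out.append(s1)
    return out


sample_rate, freq_hz, amp = 48000.0, 440.0, 0.5
w = 2 * math.pi * freq_hz / sample_rate
# f = cos(w); seed y[-1] = amp*sin(-w), y[0] = 0, so that out[k] = amp*sin(w*(k+1)).
samples = sine_3insn(math.cos(w), -amp * math.sin(w), 0.0, 2000)
print(samples[:4])
```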

Also, I think that ramp generators should be used to interpolate parameters that are set externally, so that the sound does not 'click' every time we change something (like when we change the volume).
Mar 27, 2004, 04:18 PM   #5
h/h member-shmember
 
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
Max M. is just super!

>macmv s0, s1, s0, 0x80000000

btw, this will work on K2, but may not always work on the K1 processor
(the result (R) register of macmv should not be the same as the X or Y register; yes, this is another "rule" ;)
Mar 27, 2004, 07:25 PM Thread Starter   #6
DriverHeaven Newbie
 
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Holy Av is on a distinguished road

Well... if the instruction is correct on a k2 but "undefined" on a k1, then I think the instruction should still be used, and the compiler should expand it for the k1.
A direct possible expansion of:
macmv a, b, c, d

Is:
macmv 0, 0, c, d
macmv a, b, 0, 0

So in fact it would still take 4 instructions instead of 5 on a k1, and 3 on a k2.

Of course this expansion is correct except when 'b' is 'accum', and it would only be used when 'a' is the same as 'c' or 'd'. And it's very easy to add to the compiler (well, I have not seen the source, but it should be easy). Also, if there are other easy-to-verify "rules" like that, the compiler should generate warnings for them...
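A sketch of that expansion as a rewrite pass, in Python. The tuple format and function names are hypothetical, and the interpreter uses idealised read-before-write semantics (the behaviour Max describes for K2), so it can only show that the two-instruction form computes the same thing, not reproduce the K1 problem itself. As in the post, the rewrite is applied only when R aliases X or Y, and it is skipped when A is accum, because the first half of the expansion changes the accumulator before A is read.

```python
# Hypothetical load-time rewrite: split "macmv R, A, X, Y" into two macmv
# instructions when R aliases X or Y (and A is not accum), plus a tiny
# float interpreter to check that the rewrite keeps the results identical.
CONST = {"0x0": 0.0, "0x80000000": -1.0}


def expand_macmv(code):
    out = []
    for op, r, a, x, y in code:
        if op == "macmv" and r in (x, y) and a != "accum":
            out.append(("macmv", "0x0", "0x0", x, y))  # accum += X*Y, result discarded
            out.append(("macmv", r, a, "0x0", "0x0"))  # R = A, accum += 0
        else:
            out.append((op, r, a, x, y))
    return out


def run(code, regs):
    """Idealised semantics: all operands are read before R is written."""
    regs, acc = dict(regs), 0.0

    def get(name):
        return acc if name == "accum" else CONST.get(name, regs.get(name))

    for op, r, a, x, y in code:
        A, X, Y = get(a), get(x), get(y)
        if op == "macs":
            res = acc = A + X * Y
        elif op == "macmv":
            res, acc = A, acc + X * Y
        else:
            raise ValueError("unsupported opcode: " + op)
        if r not in CONST:  # writes to constant registers are discarded
            regs[r] = res
    return regs, acc


sine = [
    ("macs", "0x0", "0x0", "f", "s1"),
    ("macmv", "s0", "s1", "s0", "0x80000000"),  # R == X: the K1 hazard
    ("macs", "s1", "accum", "f", "s1"),
]
k1_code = expand_macmv(sine)
state = {"f": 0.9, "s0": -0.3, "s1": 0.1}
print(len(k1_code), run(sine, state) == run(k1_code, state))  # prints: 4 True
```

This matches the count in the post: 4 instructions on a k1 instead of 3.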
Mar 27, 2004, 08:09 PM   #7
h/h member-shmember
 
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
Max M. is just super!

>And it's very easy to add in the compiler

the only correction...
There are two independent tools in kX (although both are hidden from the user): "Compiler" and "Loader".

"Compiler" ("Dane" assembler in kX) generates a _portable_, _model independent_ and _position independent_ intermediate "object"...
"Loader" translates this "object" into the actual microcode to be loaded into the DSP...

so it's entirely up to the "Loader" to generate any "model dependent" fixes and optimizations (as well as those related to "module inter-connection" and "module microcode offset")...

(since "Compiler" is also used to generate "object" for kxl plug-ins, and there it knows (at "compile-time") nothing about target processor model, microcode offset etc...)
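To make the compile-time/load-time split concrete, here is a hypothetical Python sketch (not kX's real object format or loader): the "compiled" object names its registers symbolically, and only the loader decides where they live, so the same object can be placed at different offsets. The register addresses are made-up illustration values, not real emu10k1/10k2 addresses.

```python
# Hypothetical model of a position-independent object and a load-time resolver.
# All addresses below are illustrative, not real emu10k1/10k2 register numbers.
OBJECT = {
    "registers": ["inl", "outl", "gain"],              # module-local registers
    "code": [("macs", "outl", "0x0", "inl", "gain")],  # operands by name only
}


def load(obj, gpr_base, hw_regs):
    """Resolve symbolic operands to register addresses at load time.

    gpr_base: first free general-purpose register, chosen by the loader.
    hw_regs:  fixed registers of the target model (constants, accumulator).
    """
    addr = dict(hw_regs)
    for i, name in enumerate(obj["registers"]):
        addr[name] = gpr_base + i
    return [(op,) + tuple(addr[o] for o in operands) for op, *operands in obj["code"]]


HW = {"0x0": 0x40, "accum": 0x56}  # made-up addresses
first = load(OBJECT, 0x120, HW)
second = load(OBJECT, 0x180, HW)
print(first, second)
```

Model-dependent fixups (such as a K1-only macmv expansion) would naturally run in `load`, where the target is known.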

----
well, of course this is a "naming convention" matter only; it doesn't really matter where exactly the optimization takes place (but a "compile-time" vs. "run-time" optimization model is easier to maintain)

^ [knowledgebase, dsp programming, low level]

Last edited by Max M.; Mar 27, 2004 at 08:58 PM.
Mar 28, 2004, 12:49 PM Thread Starter   #8
DriverHeaven Newbie
 
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Holy Av is on a distinguished road

If you really want to be correct about the terms, you are not talking about "compile-time" vs. "run-time" but vs. "load-time". And if you think of the whole process as a compiler, then the front-end and intermediate code generation are done at compile-time, and the back-end generation of real instructions is done at load-time. So it's part of the compilation process anyway...

Doing optimization at run-time is something else; it would be self-modifying code, like we used to write 20 years ago, when resources were very limited.
Mar 28, 2004, 01:11 PM   #9
h/h member-shmember
 
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
Max M. is just super!

well, if we are talking about kX DSP, then my "run-time" term includes both "load-time" and "run-time"... Many optimizations and "fixups" should be made _after_ a module is loaded (for example during module interconnection and routing)... And this is "run-time" but not "self-modifying code"...
But again, it is more of a "linguistic" discussion; the "terms" do not affect the implementation...
Mar 28, 2004, 02:00 PM   #10
h/h member-shmember
 
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
Max M. is just super!

In fact, the only reason I mentioned the "compiler" vs. "loader" difference is that the "compiler" is part of a "user-level" library ("kxapi.dll"), and therefore there's no problem making it open-source (for example, one could put the compiler into a separate dll)...
In contrast, the "Loader" is part of the "kernel-level" driver code ("kx.sys"), and it's hard to make its code available for separate development (e.g. you need the full driver sources to be able to test and debug the "loader"); see other threads here for the "kX and open-source story"...

So you can start to design and implement an "advanced optimizing compiler" right now (and you do not depend on anything else...)
The "Dane" is a _very basic_ non-optimizing assembler/translator, and in fact, if I were to write a new version, I would have to rewrite it completely... (with no use of the present sources)...
I think you've got the idea...

Last edited by Max M.; Mar 28, 2004 at 10:56 PM.
Mar 28, 2004, 02:18 PM   #11
h/h member-shmember
 
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
Max M. is just super!

and, btw, a small tip... (just for reference)

although the "copy propagation" mentioned above is a must-have for an "optimizing compiler", you can do it manually and use the optimized code today...
Code:
macs 0x0, 0x0, inl, b0
macmv lx2, lx1, lx2, b2
macmv lx1, inl, lx1, b1
macmv ly2, outl, ly2, a2
macs outl, accum, outl, a1
macints outl, 0x0, outl, sca
This code will work without any problems (but note K1's "macmv" limitations) until it is connected to the "epiloglt_xx" module. Anyway, those "lite" versions of epilog are _strongly_ not recommended; they cause tons of weird side-effects...
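The claim that the 6-instruction version computes the same output can be checked with a small Python float interpreter. Assumptions: the usual macs/macmv/macints reading, no saturation or rounding, and outl keeping its value from one sample to the next (which, per the post, stops being true once a "lite" epilog is connected). The coefficient values are arbitrary stable ones.

```python
# Compare the 7-instruction left channel with the hand-propagated 6-instruction one.
def run_block(code, regs):
    acc = 0.0
    for op, r, a, x, y in code:
        val = {**regs, "0x0": 0.0, "accum": acc}
        A, X, Y = val[a], val[x], val[y]
        if op in ("macs", "macints"):  # macints: Y is an integer scale
            res = acc = A + X * Y
        elif op == "macmv":
            res, acc = A, acc + X * Y
        else:
            raise ValueError(op)
        if r != "0x0":
            regs[r] = res


ORIGINAL = [
    ("macs", "0x0", "0x0", "inl", "b0"),
    ("macmv", "lx2", "lx1", "lx2", "b2"),
    ("macmv", "lx1", "inl", "lx1", "b1"),
    ("macmv", "ly2", "ly1", "ly2", "a2"),
    ("macs", "ly1", "accum", "ly1", "a1"),
    ("macints", "ly1", "0x0", "ly1", "sca"),
    ("macs", "outl", "ly1", "0x0", "0x0"),
]
PROPAGATED = [
    ("macs", "0x0", "0x0", "inl", "b0"),
    ("macmv", "lx2", "lx1", "lx2", "b2"),
    ("macmv", "lx1", "inl", "lx1", "b1"),
    ("macmv", "ly2", "outl", "ly2", "a2"),
    ("macs", "outl", "accum", "outl", "a1"),
    ("macints", "outl", "0x0", "outl", "sca"),
]


def simulate(code, xs):
    regs = {"b0": 0.3, "b1": -0.45, "b2": 0.175, "a1": 0.425, "a2": -0.18, "sca": 4,
            "lx1": 0.0, "lx2": 0.0, "ly1": 0.0, "ly2": 0.0, "outl": 0.0}
    out = []
    for x in xs:
        regs["inl"] = x
        run_block(code, regs)  # registers persist between samples
        out.append(regs["outl"])
    return out


xs = [1.0, 0.5, -0.25] + [0.0] * 29
print(simulate(ORIGINAL, xs) == simulate(PROPAGATED, xs))  # prints: True
```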

Last edited by Max M.; Mar 28, 2004 at 03:01 PM.
Apr 1, 2004, 06:17 AM   #12
DriverHeaven Addict
 
Join Date: Dec 2002
Posts: 259
Rep Power: 0
eyagos is on a distinguished road

Hi.

Holy, I have been looking at the Wave Generator optimization that you suggested: the macmv is a good idea, but the algorithm needs a macints (which you have omitted), and so macmv can't be used.

And about the interpolated parameters, it is not possible with this algorithm, because I don't see any way to control the phase when you change the frequency. Well, I have seen another algorithm which would be good for this purpose, but it would be less precise and would use more resources. I'll try to write it and see if it is suitable.
Apr 1, 2004, 10:32 AM   #13
DriverHeaven Addict
 
Join Date: Dec 2002
Posts: 259
Rep Power: 0
eyagos is on a distinguished road

Here it is. It is a bit larger, but it produces very good quality, and there are no phase problems when changing the frequency (no clicks). Saw and triangle are inside the code; square is easily obtained from saw. They are not band-limited.

Code:
name "Sinus";
copyright "by eYagos";
engine "kX";
created "01/04/2004";
comment "Sinus generator";
guid "e6580857-eb69-45a6-bb78-9ecf4f74eea1";

; Registers
output Sinus
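control fs=0.041666666666667 ; frequency control = f / 24000
static   saw1=0
temp    saw,saw2,tmp

; Phase
macwn saw1,saw1,fs,1  ;; saw: saw1 = saw1 - fs, wrapping around
tstneg  tmp,saw1,saw1,0  ;; tmp = |saw1| (approx.)
macsn  tmp,tmp,0.5,1 ;; Triangle: |saw1| - 0.5
; Taylor approximation of sin(pi*tri), evaluated on u = pi*tri/5
macs    saw,0,tmp,0.628318530718  ;; u = tri * pi/5
macs    saw2,0,saw,saw   ;; u^2
macs    tmp,-0.124007936508,0.043058311287,saw2   ;; -5^4/7! + (5^6/9!)*u^2
macs    tmp, 0.208333333333,tmp,saw2   ;; 5^2/5! + tmp*u^2
macs    tmp,-0.166666666667,tmp,saw2   ;; -1/3! + tmp*u^2
macs    tmp,0,tmp,saw2
macints tmp,0,tmp,0x19   ;; x25 undoes the 1/25 scaling of u^2
macs    tmp,1,tmp,1 
macs    tmp,0,tmp,saw
macints tmp,0,tmp,0x5    ;; x5 gives sin(pi*tri)
; Output
macs    Sinus,0,tmp,0.25   ;; scale to 0.25 amplitude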
end
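Working through the constants (a reading of the code, not the author's notes): the phase stage turns saw1 into a triangle tri = |saw1| - 0.5, and since sin(pi*(|p| - 1/2)) = -cos(pi*p), the triangle-then-sine trick still gives a pure sinusoid. The polynomial is the degree-9 Taylor series of sin(theta) with theta = pi*tri, evaluated on u = theta/5 so that every constant fits the DSP's [-1, 1) range: 0.124007936508 = 5^4/7!, 0.043058311287 = 5^6/9!, 0.208333333333 = 5^2/5!, and the final x25 and x5 (macints 0x19 and 0x5) undo the scaling. A Python float model, assuming macwn is a wrapping R = A - X*Y and that tstneg yields |saw1| (off by one LSB for negative values on real hardware):

```python
import math

# Float model of the "Sinus" plugin's phase and Taylor stages (no fixed-point effects).


def next_phase(saw1, fs):
    saw1 -= fs         # macwn saw1, saw1, fs, 1
    if saw1 < -1.0:    # wraparound keeps the phase in [-1, 1)
        saw1 += 2.0
    return saw1


def dane_sinus(saw1):
    tri = abs(saw1) - 0.5                     # tstneg + macsn: triangle in [-0.5, 0.5]
    u = tri * 0.628318530718                  # u = (pi/5) * tri, so |5u| <= pi/2
    s = u * u
    t = -0.124007936508 + 0.043058311287 * s  # -5^4/7! + (5^6/9!) * s
    t = 0.208333333333 + t * s                # 5^2/5! + t * s
    t = -0.166666666667 + t * s               # -1/3! + t * s
    t = t * s
    t = t * 25                                # macints tmp, 0, tmp, 0x19
    t = 1 + t                                 # macs tmp, 1, tmp, 1
    t = t * u
    t = t * 5                                 # macints tmp, 0, tmp, 0x5
    return 0.25 * t                           # Sinus = 0.25 * sin(pi * tri)


fs = 1000.0 / 24000.0  # about 1 kHz at a 48 kHz sample rate
saw1, samples, phases = 0.0, [], []
for _ in range(96):
    saw1 = next_phase(saw1, fs)
    phases.append(saw1)
    samples.append(dane_sinus(saw1))
print(max(abs(v + 0.25 * math.cos(math.pi * p)) for v, p in zip(samples, phases)))
```

The truncation error of the degree-9 series at theta = pi/2 is below about 4e-6 before the 0.25 output scaling, so the float model tracks -0.25*cos(pi*saw1) closely; on the DSP, 32-bit rounding adds its own error on top.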

Last edited by eyagos; Apr 1, 2004 at 10:40 AM.
Apr 6, 2004, 05:10 AM Thread Starter   #14
DriverHeaven Newbie
 
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Holy Av is on a distinguished road

Quote:
Originally posted by eyagos
[...] Holy, I have been looking for the Wavegenerator optimization that you sugest: The macmv is a good idea, but the algorithm needs a macints (that you have omited), and so macmv can't be used. [...]
I just wanted to say that the multiplication by 2 is there, even though there is no macints (I removed the macints as an optimization, by replacing it with something else...). If you look at the original code, what it does is:
tmp = f*s1;
tmp = tmp - s0/2;
tmp = tmp*2;
s0 = s1;
s1 = tmp;

And my code does:
accum = f*s1;
accum = accum - s0; s0 = s1;
s1 = accum + f*s1;

The f*s1 is added twice (in the first and last instructions), which is how the multiplication by 2 is done. After simplification, both versions compute:
s1 = f*s1*2 - s0;
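A quick numerical check of the two orderings above in Python, treating both as plain real arithmetic (fixed-point range and rounding are ignored):

```python
import random


def original_step(f, s0, s1):
    tmp = f * s1
    tmp = tmp - s0 / 2
    tmp = tmp * 2        # the macints x2 in Wave Generator 3.0
    return s1, tmp       # s0 = s1; s1 = tmp


def optimized_step(f, s0, s1):
    accum = f * s1
    accum = accum - s0   # ...while s0 takes the old s1
    return s1, accum + f * s1


random.seed(0)
cases = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(1000)]
worst = max(abs(original_step(*c)[1] - optimized_step(*c)[1]) for c in cases)
print(worst)
```

The two differ only by floating-point rounding, since both equal 2*f*s1 - s0.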

I haven't had time to check your sine generator with correct variable frequency. Of course this is interesting, but it is more complicated than what I was thinking of. I was just saying that when you change the volume of a channel, or the master volume, it is not smooth at all; and when you plug the output of the card into a power amplifier (not a pre-amp with volume control), this could be important if you want to change the volume while music is playing.
Apr 6, 2004, 08:17 PM   #15
DriverHeaven Addict
 
Join Date: Dec 2002
Posts: 259
Rep Power: 0
eyagos is on a distinguished road

Oops, you are right. I didn't see that.

About the ramp generators you proposed, I thought you were talking about the wave generator. But no, you were talking about something more general, excuse me. If we only consider volume parameters, I think it could easily be implemented as a separate plugin. Something like this:


Code:
	input   in0,in1
	output  out0,out1
	control gain=1
	static  g=1
	static  smooth=0.01
	
	interp  g,g,smooth,gain   ; g = (1-smooth)*g + smooth*gain: glides toward gain
	macs    out0,0,in0,g
	macs    out1,0,in1,g
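Assuming interp R, A, X, Y computes R = (1 - X)*A + X*Y (the emu10k1 INTERP definition as commonly documented), the plugin is a one-pole smoother: every sample, g moves 1% of the way toward gain, so a volume change becomes an exponential glide instead of a step. A Python model of a step from 1 to 0:

```python
def smooth_gain(targets, smooth=0.01, g=1.0):
    """One-pole smoothing: g <- (1 - smooth)*g + smooth*target, once per sample."""
    out = []
    for target in targets:
        g = (1.0 - smooth) * g + smooth * target  # interp g, g, smooth, gain
        out.append(g)
    return out


ramp = smooth_gain([0.0] * 1000)  # gain control jumps from 1 to 0
first_below_1pct = next(i for i, v in enumerate(ramp) if v < 0.01)
print(first_below_1pct)  # prints: 458
```

With smooth = 0.01 the remaining gap shrinks by a factor of 0.99 per sample, so it takes 459 samples (about 9.6 ms at 48 kHz) to close 99% of a step; a larger smooth value reacts faster but smooths less.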