#1
DriverHeaven Newbie
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Optimized DSP code
I was trying to optimize the Dane code a bit. For the EQ filters (peaking, highshelf, lowshelf, ...) it is possible to reduce the code from 18 instructions to 14. With the same declarations (but with the temp variables removed) the code is:
;left
macs 0x0, 0x0, inl, b0
macmv lx2, lx1, lx2, b2
macmv lx1, inl, lx1, b1
macmv ly2, ly1, ly2, a2
macs ly1, accum, ly1, a1
macints ly1, 0x0, ly1, sca
macs outl, ly1, 0x0, 0x0
;right
macs 0x0, 0x0, inr, b0
macmv rx2, rx1, rx2, b2
macmv rx1, inr, rx1, b1
macmv ry2, ry1, ry2, a2
macs ry1, accum, ry1, a1
macints ry1, 0x0, ry1, sca
macs outr, ry1, 0x0, 0x0
Then I was thinking that two of the lines, those that set 'outl' and 'outr', could be removed by an intelligent compiler when linking this block to another code block (simple copy propagation, which would be easy to implement). A more C-style representation of DSP code would be more readable, and a global code optimiser would be cool, but who has time to do it?
#2
kX Project Lead Programmer and Coordinator
Join Date: Dec 2002
Posts: 3,119
Rep Power: 75
>> but who has time to do it
100% agreed /E
#3
kX Project DSP Engineer
Join Date: Dec 2002
Location: Denmark
Posts: 94
Rep Power: 0
Re: Optimized DSP code
Quote:
Will you test them?
Cheers
Soeren (original author)
#4
DriverHeaven Newbie
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Of course I test all the code I write... except when writing on the blackboard.
I started to write a global "copy propagation" and "dead code elimination" optimizer, and it was able to remove some instructions, so this could be a good thing. Of course it's not finished due to lack of time...
And just another little optimization: how to generate a sine wave in 3 instructions, equivalent to the 5 instructions in Wave Generator 3.0:
macs 0, 0, f, s1
macmv s0, s1, s0, 0x80000000
macs s1, accum, f, s1
Also, I think that ramp generators should be used to interpolate parameters set externally, so that the sound does not 'click' every time we change something (like when we change the volume).
#5
h/h member-shmember
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
>macmv s0, s1, s0, 0x80000000
btw, this will work on K2, but may not always work on the K1 processor: the result (R) register of macmv should not be the same as the X or Y register. Yes, this is another "rule" ;)
#6
DriverHeaven Newbie
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Well... If the instruction is correct on a K2 but is "undefined" on a K1, then I think the instruction should still be used and the compiler should expand it for the K1.
A possible direct expansion of:
macmv a, b, c, d
is:
macmv 0, 0, c, d
macmv a, b, 0, 0
So in fact it would still take 4 instructions instead of 5 on a K1, and 3 on a K2. This expansion is correct except when 'b' is 'accum', and it would be used only if 'a' is the same as 'c' or 'd'. And it's very easy to add to the compiler (well, I have not seen the source, but it should be easy). Also, if there are other easy-to-verify "rules" like that, the compiler should generate warnings for them...
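The expansion and its 'accum' caveat can be checked with a small model. This is only a sketch in Python, not Dane: the instruction semantics in the comments are assumptions (floating point, no saturation, all operands read before the destination is written), and `run`, `expand_macmv` and `compare` are hypothetical helper names.

```python
# Toy floating-point model of macs/macmv (assumed semantics, no saturation):
#   macs  r, a, x, y : r = a + x*y, and accum takes the same value
#   macmv r, a, x, y : accum += x*y, then r = a
# All source operands are read before the destination is written.

def run(program, regs):
    """Run a list of (opcode, r, a, x, y) tuples; return the final accum."""
    accum = 0.0

    def value(operand):
        if operand == "accum":
            return accum
        if isinstance(operand, str):
            return regs[operand]
        return float(operand)  # numeric literal such as 0

    for opcode, r, a, x, y in program:
        va, vx, vy = value(a), value(x), value(y)
        if opcode == "macmv":
            accum += vx * vy
            result = va
        else:  # "macs"
            result = va + vx * vy
            accum = result
        if isinstance(r, str):  # a literal destination discards the result
            regs[r] = result
    return accum


def expand_macmv(r, a, x, y):
    """Two-instruction expansion of 'macmv r, a, x, y' proposed in the post."""
    return [("macmv", 0, 0, x, y), ("macmv", r, a, 0, 0)]


def compare(r, a, x, y):
    """Return [(regs, accum)] for the single macmv and for its expansion."""
    prefix = [("macs", 0, 0, "p", "q")]  # leaves p*q in accum beforehand
    start = {"p": 0.5, "q": 0.5, "s0": 0.25, "s1": 0.75, "k": -1.0}
    results = []
    for body in ([("macmv", r, a, x, y)], expand_macmv(r, a, x, y)):
        regs = dict(start)
        accum = run(prefix + body, regs)
        results.append((regs, accum))
    return results


single, expanded = compare("s0", "s1", "s0", "k")     # R is the same as X
print(single == expanded)                             # True
single, expanded = compare("s0", "accum", "s0", "k")  # A is accum
print(single == expanded)                             # False
```

In the second case the expanded form copies accum after the multiply has already been added to it, which is why the expansion has to be skipped when 'b' is 'accum'.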
#7
h/h member-shmember
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
>And it's very easy to add in the compiler
Just one correction... In kX there are two independent tools (although both are hidden from the user): the "Compiler" and the "Loader".
The "Compiler" (the "Dane" assembler in kX) generates a _portable_, _model independent_ and _position independent_ intermediate "object".
The "Loader" translates this "object" into the actual microcode to be loaded into the DSP. So it is up to the "Loader" to apply any "model dependent" fixes and optimizations (as well as those related to "module inter-connection" and "module microcode offset"), since the "Compiler" is also used to generate the "object" for kxl plug-ins, and there it knows nothing (at "compile-time") about the target processor model, microcode offset, etc.
----
Well, of course this is only a matter of "naming convention": it does not really matter where exactly the optimization takes place (but the "compile-time" vs. "run-time" optimization model is easier to maintain).
^ [knowledgebase, dsp programming, low level]
Last edited by Max M.; Mar 27, 2004 at 08:58 PM.
#8
DriverHeaven Newbie
Join Date: Mar 2004
Posts: 8
Rep Power: 0
If you really want to be correct about the terms, you are not talking about "compile-time" vs. "run-time" but about "compile-time" vs. "load-time".
And if you think of the whole process as a compiler, then the front-end and intermediate code generation are done at compile-time, and the back-end generation of real instructions is done at load-time. So it's part of the compilation process anyway... Doing optimization at run-time is something else; it would be self-modifying code, like we used to write 20 years ago, when resources were very limited.
#9
h/h member-shmember
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
Well, if we are talking about the kX DSP, then my "run-time" term includes both "load-time" and "run-time"... Many optimizations and "fixups" have to be made _after_ a module is loaded (for example during module interconnection and routing)... And this is "run-time" but not "self-modifying code"...
But again, this is more of a "linguistic" discussion: the "terms" do not affect the implementation...
#10
h/h member-shmember
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
In fact, the only reason I mentioned the "compiler" vs. "loader" difference is that the "compiler" is part of the "user-level" library ("kxapi.dll"), and therefore there's no problem making it open-source (for example, one can put the compiler into a separate dll)...
In contrast, the "Loader" is part of the "kernel-level" driver code ("kx.sys"), and it's hard to make its code available for separate development (e.g. you need the full driver sources to be able to test and debug the "loader"); see other threads here for the "kX and open-source story"...
So you can start to design and implement an "advanced optimizing compiler" right now (and you do not depend on anything else...). "Dane" is a _very basic_ non-optimizing assembler/translator, and in fact, if I were to write a new version I would have to rewrite it completely (without using the present sources)...
I think you've got the idea...
Last edited by Max M.; Mar 28, 2004 at 10:56 PM.
#11
h/h member-shmember
Join Date: Dec 2002
Location: Evil Empire
Posts: 2,639
Rep Power: 69
and, btw, a small tip...
(just for reference) although the "copy propagation" mentioned above is a must-have for an "optimizing compiler", you can do it manually and use the optimized code right now...
Code:
macs 0x0, 0x0, inl, b0
macmv lx2, lx1, lx2, b2
macmv lx1, inl, lx1, b1
macmv ly2, outl, ly2, a2
macs outl, accum, outl, a1
macints outl, 0x0, outl, sca
Last edited by Max M.; Mar 28, 2004 at 03:01 PM.
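The manual copy propagation can be sanity-checked with a toy model. The following is a sketch in Python rather than Dane, under assumed instruction semantics (floating point, no saturation, operands read before the destination is written). The register names drop the l/r prefix, and the coefficient values are made up for illustration.

```python
# Toy floating-point model (assumed semantics, no saturation or fixed point):
#   macs / macints r, a, x, y : r = a + x*y, accum takes the same value
#   macmv          r, a, x, y : accum += x*y, then r = a

def run(program, regs):
    """Execute one sample period of (opcode, r, a, x, y) tuples on regs."""
    accum = 0.0

    def value(operand):
        if operand == "accum":
            return accum
        if isinstance(operand, str):
            return regs[operand]
        return float(operand)  # numeric literal such as 0

    for opcode, r, a, x, y in program:
        va, vx, vy = value(a), value(x), value(y)
        if opcode == "macmv":
            accum += vx * vy
            result = va
        else:  # "macs" or "macints"
            result = va + vx * vy
            accum = result
        if isinstance(r, str):  # a literal destination discards the result
            regs[r] = result


# One channel of the 7-instruction filter from the first post.
ORIGINAL = [
    ("macs", 0, 0, "in", "b0"),
    ("macmv", "x2", "x1", "x2", "b2"),
    ("macmv", "x1", "in", "x1", "b1"),
    ("macmv", "y2", "y1", "y2", "a2"),
    ("macs", "y1", "accum", "y1", "a1"),
    ("macints", "y1", 0, "y1", "sca"),
    ("macs", "out", "y1", 0, 0),
]

# Copy propagation by hand: every 'y1' becomes 'out', so the final copy goes away.
PROPAGATED = [
    tuple("out" if field == "y1" else field for field in instruction)
    for instruction in ORIGINAL[:-1]
]


def filter_block(program, samples, coeffs):
    """Feed samples through the program and collect the 'out' register."""
    regs = dict(coeffs, x1=0.0, x2=0.0, y1=0.0, y2=0.0, out=0.0)
    outputs = []
    for sample in samples:
        regs["in"] = sample
        run(program, regs)
        outputs.append(regs["out"])
    return outputs


# Illustrative values only, not real EQ coefficients.
COEFFS = {"b0": 0.2, "b1": 0.1, "b2": 0.05, "a1": 0.3, "a2": -0.1, "sca": 2}
SAMPLES = [1.0, 0.0, 0.5, -0.25, 0.0, 0.0]

print(filter_block(ORIGINAL, SAMPLES, COEFFS) == filter_block(PROPAGATED, SAMPLES, COEFFS))
```

The propagated program is the 6-instruction listing above with `outl` doubling as the previous-output state, which only works because nothing else writes to the output register between samples.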
#12
DriverHeaven Addict
Join Date: Dec 2002
Posts: 259
Rep Power: 0
Hi.
Holy, I have been looking at the Wave Generator optimization that you suggest. The macmv is a good idea, but the algorithm needs a macints (which you have omitted), and so macmv can't be used.
As for the interpolated parameters, that is not possible with this algorithm, because I don't see any way to control the phase when you change the frequency. I have seen another algorithm which would be good for this purpose, but it would be less precise and would use more resources. I'll try to write it and see if it is suitable.
#13
DriverHeaven Addict
Join Date: Dec 2002
Posts: 259
Rep Power: 0
Here it is. It is a bit larger, but it produces very good quality, and there are no phase problems when changing frequency (no clicks). Saw and triangle waves are produced inside the code. Square is easily obtained from saw. They are not band-limited.
Code:
name "Sinus";
copyright "by eYagos";
engine "kX";
created "01/04/2004";
comment "Sinus generator";
guid "e6580857-eb69-45a6-bb78-9ecf4f74eea1";

; Registers
output Sinus
control fs=0.041666666666667 ; frequency control = f / 24000
static saw1=0
temp saw,saw2,tmp

; Phase
macwn saw1,saw1,fs,1 ;; saw
tstneg tmp,saw1,saw1,0
macsn tmp,tmp,0.5,1 ;; Triangle

; Taylor approximation
macs saw,0,tmp,0.628318530718
macs saw2,0,saw,saw
macs tmp,-0.124007936508,0.043058311287,saw2
macs tmp, 0.208333333333,tmp,saw2
macs tmp,-0.166666666667,tmp,saw2
macs tmp,0,tmp,saw2
macints tmp,0,tmp,0x19
macs tmp,1,tmp,1
macs tmp,0,tmp,saw
macints tmp,0,tmp,0x5

; Output
macs Sinus,0,tmp,0.25
end
Last edited by eyagos; Apr 1, 2004 at 10:40 AM.
#14
DriverHeaven Newbie
Join Date: Mar 2004
Posts: 8
Rep Power: 0
Quote:
tmp = f*s1;
tmp = tmp - s0/2;
tmp = tmp*2;
s0 = s1;
s1 = tmp;
And my code does:
accum = f*s1;
accum = accum - s0;
s0 = s1;
s1 = accum + f*s1;
The f*s1 term is added twice (in the first and last instructions), which is how the multiplication by 2 is done. Both versions compute (after simplification):
s1 = f*s1*2 - s0;
I didn't have time to check your sine generator with correct variable frequency. Of course this is interesting, but it is more complicated than what I was thinking of. I was just saying that when you change the volume of a channel, or the master volume, it is not smooth at all; and when you plug the output of the card into a power amplifier (not a pre-amp with volume) this could be important if you want to change the volume while music is playing.
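The claimed equivalence can be checked numerically. The sketch below is plain Python, not Dane, and ignores fixed-point rounding and saturation. It transcribes the two update rules and, as an assumption not stated in the thread, sets f = cos(w) and seeds the state with sin(0) and sin(w), which makes s1 = f*s1*2 - s0 the standard two-term sine recurrence. The function names and the 1 kHz / 48 kHz figures are illustrative only.

```python
import math


def step_original(f, s0, s1):
    """Five-step form quoted above; returns the new (s0, s1)."""
    tmp = f * s1
    tmp = tmp - s0 / 2
    tmp = tmp * 2
    return s1, tmp


def step_optimized(f, s0, s1):
    """Three-instruction form: f*s1 is added twice instead of doubling."""
    accum = f * s1
    accum = accum - s0
    return s1, accum + f * s1


def generate(step, w, count):
    """Generate `count` samples of sin((k + 1) * w) with the given step rule."""
    f = math.cos(w)              # assumed meaning of the 'f' register
    s0, s1 = 0.0, math.sin(w)    # assumed initial state: sin(0), sin(w)
    samples = []
    for _ in range(count):
        samples.append(s1)
        s0, s1 = step(f, s0, s1)
    return samples


w = 2 * math.pi * 1000 / 48000   # illustrative: 1 kHz tone at 48 kHz
a = generate(step_original, w, 200)
b = generate(step_optimized, w, 200)
worst = max(abs(x - y) for x, y in zip(a, b))
print("largest difference between the two forms:", worst)
```

In floating point the two forms agree to rounding error; on the DSP the comparison would also depend on how each form rounds and saturates, which this model does not cover.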
#15
DriverHeaven Addict
Join Date: Dec 2002
Posts: 259
Rep Power: 0
Oops, you are right. I didn't see that.
About the ramp generators you proposed, I thought you were talking about the wave generator. But no, you were talking about something more general, excuse me. If we only think about volume parameters, I think it could easily be implemented as a separate plugin. Something like this:
Code:
input in0,in1
output out0,out1
control gain=1
static g=1
static smooth=0.01
interp g,g,smooth,gain
macs out0,0,in0,g
macs out1,0,in1,g
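To see what this plugin would do to a sudden volume change, here is a small floating-point sketch in Python. It assumes that `interp r, a, x, y` computes r = a + x*(y - a), so each sample moves g a fraction `smooth` of the way toward `gain`. The helper names are hypothetical, and fixed-point effects are ignored.

```python
def interp(a, x, y):
    """Assumed behaviour of 'interp r, a, x, y': move a toward y by fraction x."""
    return a + x * (y - a)


def smooth_gain(targets, smooth=0.01, g=1.0):
    """Return the smoothed gain g for each sample, given per-sample targets."""
    smoothed = []
    for gain in targets:
        g = interp(g, smooth, gain)   # interp g,g,smooth,gain
        smoothed.append(g)
    return smoothed


# The user drops the volume control from 1 to 0 in a single step.
ramp = smooth_gain([0.0] * 500)
largest_jump = max(abs(b - a) for a, b in zip([1.0] + ramp, ramp))
print("first sample gain:", ramp[0])
print("gain after 500 samples:", ramp[-1])
print("largest per-sample change:", largest_jump)
```

Under this assumption the gain decays as (1 - smooth)^n instead of jumping, so the largest change between consecutive samples is bounded by `smooth` times the size of the step, which is what removes the click.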