#1
HardwareHeaven Senior Member
Join Date: Jul 2004
Posts: 452
Rep Power: 0
Eugene, what's the "weak link" in Audigy's ASIO performance?
Hi! I would just like to ask Eugene whether the Audigy's ASIO performance can be improved (even more), or whether some hardware limitation keeps it from reliably reaching 1.5 ms, for instance.

I recently upgraded from a 2000 MHz Athlon XP system with 512 MB of single-channel RAM to an A64 @ 2500 MHz with 1 GB dual-channel, and the minimum usable ASIO latency stayed at 2.67 ms (under heavy load). This is with APIC off and a fresh install on both configurations. I later had to turn APIC on for the A64 system because otherwise the RAID wouldn't work.

(Note: amazingly, even with a relatively recent board like the MSI K8N Neo2 and an A64 @ 2500, APIC-off mode gives better ASIO performance, at least with my Audigy 2.)

With APIC on, the lowest latency at which I can run my Audigy 2 with no glitches under heavy load is 4 ms. Curiously, I have two installations of Windows:

- A stripped-down installation: no firewall, services disabled, all tuned for music.
- A normal office/home installation, full of stuff: antivirus, firewall, fancy crap, games, etc.

In both installations the minimum usable latency (with no glitches) is the same: 4 ms. I therefore conclude the bottleneck is somewhere else. (Unfortunately I can't use APIC off anymore, since my RAID 0 needs APIC on to be recognized by Windows.)

Thanks in advance for any reply to my question.
Last edited by JGSF; Nov 21, 2005 at 04:21 AM.

#2
Tail Razer
Join Date: Jun 2005
Location: Bernyurass, AZ - USA
Posts: 4,027
Rep Power: 50
I'm no expert (Eugene is), but our world is ruled by physics...
At the most basic level, it boils down to the speed at which electrons flow through a conductor (~ the speed of light or slower), and a PC can have miles of wire/conductors that those poor electrons have to travel through. With that said, besides the BIOS and OS 'overhead' needed to keep things going, there are PCI timings, RAM latencies and HDD speed, so there are a lot of potential hardware 'brick walls'. If you have 2.67 ms or even 4 ms latency, be happy; I think it'll be a while before that's reliably and/or economically surpassed in a PC. The day will come (I think) when CPUs and DSPs will be able to perform multiple instructions per clock cycle, as opposed to a single instruction, with some instructions requiring multiple clock cycles to execute. Dual-core / multi-CPU systems are just now (economically) starting to move in this direction. Only then will latencies lower than 1 ms be possible, as it seems bus and device speeds have approached their limits.

#3
HardwareHeaven Senior Member
Join Date: Jul 2004
Posts: 452
Rep Power: 0
You are right, but memory latencies and disk speed are not the bottleneck here (in fact, disk speed doesn't affect ASIO playback in my case, since all my samples are stored in RAM). I was running 2.67 ms on my previous system with half the memory bandwidth (I believe 2000 MB/s is more than enough for sample playback, and I presently have 5000 MB/s).
Although I'm no expert, I think Athlon XPs execute 2 instructions per cycle, or something like that, but please correct me if I'm wrong. As for PCI latency, well... I would rather say bandwidth; that could explain why even in a more powerful system the card won't go below 2.67 ms reliably. Still, there are other cards that can do 1.5 ms, such as the Audiophile 192... Maybe because they have less circuitry than the Audigy? Just my guess... who knows? What I mean with my question is: what could *eventually* be the bottleneck in the Audigy card itself? Cheers

#4
Tail Razer
Join Date: Jun 2005
Location: Bernyurass, AZ - USA
Posts: 4,027
Rep Power: 50
Oh, OK - you just mentioned ASIO performance. Most of us record to a HDD and then play back from it too. But I see now what you are talking about...

Is the A64 or AXP a multi-core CPU? If not, it can only execute one instruction at a time... unless of course you're referring to the SSE/MMX instructions that may have 'replaced' whole routines required by older CPUs, but I am not counting that. As data bus widths increase along with multi-core CPUs, some of the overhead we see now will eventually be eliminated - but we have M$ working very diligently to make sure the OS eats up those performance gains. I've always said the software markets drive the hardware markets, gaming and graphics/animation in particular. Us sound guys, of course, benefit as a side effect.

I remember when Win95 first came out: a big selling point was its ability to multi-task. I thought it was impossible for it to 'multi-task', as CPUs can only execute 1 instruction at a time. I have always seen M$ for what they are since that blatant marketing LIE to sell their new OS. It should have said 'better at task managing than Win 3.1'. But that's less dramatic.

So, let me change the wording of my vision... The day a single-core CPU can, say, perform an advanced mathematical operation, read the interrupt controller's registers and transfer a block of RAM from one location to another, all in 1 clock cycle, will be a good day. Impossible (um, I mean impractical) now, I know. Multi-core CPUs are a step in that direction. Larger data buses will be required to supply all the arguments as well as all those commands to be performed at one time. I have also dreamed of CPUs with 256-bit datapaths and integrated hardware compression that would allow a CPU to read instructions and arguments much like .RAR files can encapsulate several 'commands and arguments' as well as split them into '256-bit volumes' that would be read and executed by the CPU in 1 clock cycle. Enough of my dreams/vision - sorry if it bored you...

I'd think the HDD could still affect your performance if the OS uses a swap file?

Also, bandwidth and latency are not the same. For instance, I may be able to achieve 10 GB/sec transfers, but it may take several clock cycles to initiate the transfer - like the difference between the transfer rate (bandwidth) and the seek time (latency) of a HDD. Not sure how that applies to RAM, but the CAS (I thought) was the RAM's seek-and-respond time, which would equate to a latency (albeit small, but we have to add/stack all this up for the total we see in real-life applications).

A different design, a faster DSP, or faster ADCs/DACs could also account for lower latencies with other cards. Even the drivers can affect latency (doh, how did I forget that one). But the bottleneck on an Audigy (I'll guess) will be the DSP (when referring to playback), as speed = $$$$ when it comes to components like a DSP or a CPU. And, like a CPU, the supporting components have to be specified to accommodate the faster speeds as well. For ASIO recording latency, the ADCs start becoming more of a factor too. DACs are relatively easy to get fast speeds from. That is assuming we're removing the rest of the PC-related components (like the PCI bus interface, CPU, etc.) as possible bottlenecks.

It's also possible the Audigy is not the bottleneck at all on your system. Your finding that changing the APIC setting changes your latency is potential proof of this notion.

Then there's the question of what triggers your samples. Using a MIDI keyboard to trigger samples will be a definite source of latency, as MIDI is limited to 38K baud (or is it 32..??), coupled with the fact there's overhead in that protocol also... It all stacks up when measuring latencies. A PC keyboard triggering samples will also suffer additional latency for the same reasons. With a PC-based sequencer triggering samples, add program execution time into the equation.

It really isn't as simple as your question leads one to believe. There are just sooo many factors involved.

Peace.
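To put a rough number on the MIDI point: the standard MIDI serial rate is 31,250 baud, and each byte goes out as 10 bits (a start bit, 8 data bits and a stop bit). The sketch below estimates the wire time of one message. The helper name `midi_wire_time_ms` is made up for illustration, and the estimate ignores running status and any driver or interface buffering, so treat it as a lower bound rather than a measurement.

```python
MIDI_BAUD = 31_250   # bits per second on a standard MIDI serial link
BITS_PER_BYTE = 10   # 1 start bit + 8 data bits + 1 stop bit


def midi_wire_time_ms(num_bytes: int) -> float:
    """Time needed just to clock `num_bytes` onto the MIDI cable, in ms."""
    return 1000.0 * num_bytes * BITS_PER_BYTE / MIDI_BAUD


# A Note On message is 3 bytes: status, note number, velocity.
note_on_ms = midi_wire_time_ms(3)
print(f"Note On wire time: {note_on_ms:.2f} ms")
```

So a single note costs a little under 1 ms on the cable before the PC sees it, and a chord played "at once" is serialized note after note, which stacks on top of the ASIO buffer latency discussed in this thread.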

#5
DriverHeaven Newbie
Join Date: Mar 2003
Location: Belgium
Posts: 11
Rep Power: 0
The problem is that, as Johnny said, there are cards out there that can get lower latency on the same system, bus, CPU and HDD.
I've heard this many times about the RME ( http://www.rme-audio.com/ ) sound cards as well. This basically makes it either a sound card hardware bottleneck or a driver bottleneck. I don't see how the hardware could be the bottleneck, since if the DSP can process 1024 samples in 23 milliseconds, it should be able to process 64 samples in 1.5 milliseconds. The only thing I can think of is that with a faster DSP you can deliver the 64 samples right before they have to be played, so the time between receiving the samples and being ready to play them is smaller on a better card. The driver could of course also play an important part, and this is more or less borne out by some people on this board saying they get better performance with kX than with Creative's drivers on the same hardware. Basically, I'd also be interested to know where the bottleneck is, but I'm afraid I don't know enough about how the sound card actually works to know for sure.
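The "64 samples in 1.5 ms" and "1024 samples in 23 ms" figures are what you get from buffer length divided by sample rate. A minimal sketch of that arithmetic, assuming a 44.1 kHz sample rate (the posts never state one) and using a hypothetical helper name `buffer_latency_ms`:

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Duration of audio held in one buffer of `frames` samples, in ms."""
    return 1000.0 * frames / sample_rate_hz


# Assumed rate: 44.1 kHz reproduces the two figures quoted in the post.
for frames in (64, 1024):
    print(f"{frames:5d} samples @ 44100 Hz = "
          f"{buffer_latency_ms(frames, 44_100):.2f} ms")

# Assumed rate: 48 kHz reproduces the other latencies mentioned in the thread.
for frames in (128, 192, 256):
    print(f"{frames:5d} samples @ 48000 Hz = "
          f"{buffer_latency_ms(frames, 48_000):.2f} ms")
```

Under those assumptions, 128, 192 and 256 samples at 48 kHz come out at 2.67, 4.00 and 5.33 ms, which lines up with the latency settings people report here. That mapping is an inference from the numbers, not something the drivers' documentation is being quoted on.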

#6
DriverHeaven Junior Member
Join Date: Sep 2005
Posts: 38
Rep Power: 0
Quote:
Quote:
Quote:
Quote:
Also, the current builds of the kX driver are objchk (checked build, aka unoptimized debug). So there's hope we can improve performance when a free build is released.

#7
Tail Razer
Join Date: Jun 2005
Location: Bernyurass, AZ - USA
Posts: 4,027
Rep Power: 50
Quote:
I think this is also marketing 'sleight of hand', because I have serious doubts that you will ever see multiple commands and/or arguments on the data bus of an A64 in 1 clock cycle. I could be wrong, but boy, have I completely fallen off the planet of understanding today's CPUs. I looked on AMD's site for the data sheets on how many clock cycles each instruction requires and could not find them, so I can't confirm it or use it to back up my statements. But I believe they can say this because of the onboard cache RAM that is needed to allow the CPU to execute 'long' instructions and still keep the address and data buses available for DMA, interrupts or pre-fetching the next CPU instruction. The core itself is only executing 1 instruction at a time.

So I still believe (until I see a datasheet from AMD that proves otherwise) that ALL NON-MULTI-CORE CPUs only actually execute 1 instruction at a time, and that many instructions require multiple clock cycles to execute - especially the 3DNow!, SSE and MMX instructions, which I also have to believe are used extensively in audio applications. My guess is those instructions are what make ASIO's low latency even possible these days. Well, that and clever use of DMA.

It's moot anyway, because like you said it's not much of a bottleneck. My point, though, is that all these things add up to the total latency we see in real life.

#8
DriverHeaven Newbie
Join Date: Mar 2003
Location: Belgium
Posts: 11
Rep Power: 0
Quote:
I did find this: Quote:
Besides working on multiple calculations simultaneously, I suppose that fetching data from memory can also be done in the same clock cycle as a calculation. In the end I don't think it really matters how the speed is achieved (faster clock speed, multiple cores, better caches, ...), since with latencies that low at that level (nanoseconds), only the number of instructions per second matters. So the question still remains: what's the difference between having to process 64 samples in 1.5 ms and 1024 samples in 23 ms, and why can some card/driver combos do this and others not, on the same hardware?
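One way to frame that question: the total number of samples per second is the same either way, but a smaller buffer means many more deadlines per second, so any fixed cost per buffer (interrupt handling, scheduling, driver bookkeeping) eats a larger share of each period. The sketch below assumes a 44.1 kHz sample rate and a purely illustrative fixed overhead of 0.5 ms per buffer. That overhead is a made-up number, not a measurement of any card or driver, and the function names are hypothetical.

```python
def callbacks_per_second(frames: int, sample_rate_hz: int) -> float:
    """How many buffers per second must be delivered on time."""
    return sample_rate_hz / frames


def overhead_fraction(frames: int, sample_rate_hz: int,
                      overhead_ms: float) -> float:
    """Share of each buffer period consumed by a fixed per-buffer cost."""
    period_ms = 1000.0 * frames / sample_rate_hz
    return overhead_ms / period_ms


RATE = 44_100          # assumed sample rate
OVERHEAD_MS = 0.5      # illustrative fixed cost per buffer, not measured

for frames in (64, 1024):
    rate = callbacks_per_second(frames, RATE)
    share = overhead_fraction(frames, RATE, OVERHEAD_MS)
    print(f"{frames:5d} samples: {rate:6.1f} buffers/s, "
          f"fixed overhead uses {share:.1%} of each period")
```

With these assumed numbers the same 0.5 ms costs about a third of every 64-sample period but only about 2% of a 1024-sample period, which is one plausible reason a driver with more per-buffer overhead would glitch at small buffers on otherwise identical hardware.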

#9
DriverHeaven Junior Member
Join Date: Sep 2005
Posts: 38
Rep Power: 0
Quote:
All Intel CPUs since the Pentium (1993) and AMD CPUs since the K5 (1995) are superscalar: they can execute more than one instruction per cycle. Of course you're right that there are simple and complex instructions and that their execution times vary, but across the billions of instructions executed, that rarely affects speed. Compilers and the processor itself use many tricks to keep the pipelines from sitting empty, so the CPU is usually busy. A long-running instruction is fine; the worst cases are having nothing to execute (e.g. a cache miss) or having to reload the pipeline and re-execute (e.g. a branch misprediction).

#10
DriverHeaven Lover
Join Date: Mar 2003
Posts: 228
Rep Power: 0
Seems I've read somewhere that even hardware synths, hardware effects, etc. add somewhere between 1 and 4 ms of delay in their processing.
When I first got kX I was amazed that I went from 40 ms with Creative to 5.33 ms with kX. I spent many hours trying to squeeze lower latencies out of my P4 3 GHz, 1 GB dual-channel, 120 GB SATA system, Audigy PC. But at some point I felt I had to accept the 5.33, for it truly is blazing fast (instantaneous to my ears). At that point I decided to stop tweaking and start making music. But don't get me wrong, if I could go to 2 ms or 1 ms I would be very proud of it! I will follow this thread to see if someone can find a magic combination.
Toad

#11
Tail Razer
Join Date: Jun 2005
Location: Bernyurass, AZ - USA
Posts: 4,027
Rep Power: 50
Hmm - I do stand corrected - there are multiple ALUs and FPUs now - interesting.
All my formal training was pre-Pentium, and I never heard since that they contained the additional units... So, to be clear on my understanding: today's CPUs are receiving multiple instructions and the needed arguments from the data bus in 1 clock cycle? If so, I'm having a hard time understanding how that's possible (especially for the advanced instructions), even with a true 64-bit data bus. I guess I'll have to do more reading... the whole 'out-of-order instruction execution' thing sounds like a workaround to gain some efficiency..?? Good discussion though, btw (maybe off topic, but...).

'So the question still remains: What's the difference between having to process 64 samples in 1.5 ms, or 1024 samples in 23 ms, and why can some card/driver combos do this, and others not on the same hardware?'

I obviously can't give a solid answer, but I do know programming techniques can play a HUGE part in performance: source programming techniques as well as the compiler's techniques and algorithms. It's my understanding that assembly is more efficient than compiled C/C++, which, as I understand it, is what kX is written in (prolly most everything driver related)... maybe the lower-latency card has a driver written in raw assembly language?? And now the whole SW patent thing may have prevented those more efficient techniques from being implemented, for fear of legal problems.

#12
HardwareHeaven Extreme Member
Join Date: Jan 2005
Posts: 5,507
Rep Power: 61
Quote:
As for why some cards can do it while others cannot, I am not sure. Are we talking about real-world values here, or marketing material (i.e. theoretical values)? I was reading the manual for a card that claims 1.5 ms, and the troubleshooting section stated that the lowest really usable setting was 3 ms.

#13
Tail Razer
Join Date: Jun 2005
Location: Bernyurass, AZ - USA
Posts: 4,027
Rep Power: 50
Quote:
But that is also part of my point: marketing ALWAYS shows the best-case scenario and NEVER real-world application. Then there is the flat-out LYING factor, which you can find if you just look. For instance, just because I *may* see a setting for 0.5 ms, some may assume it's possible when in reality it *may* not be. Things of this nature are designed to instill confusion to sell more product. I'm not the trusting type anyway... lol

#14
DriverHeaven Newbie
Join Date: Mar 2003
Location: Belgium
Posts: 11
Rep Power: 0
Quote:

#15
Tail Razer
Join Date: Jun 2005
Location: Bernyurass, AZ - USA
Posts: 4,027
Rep Power: 50
What's your software?

#16
HardwareHeaven Extreme Member
Join Date: Jan 2005
Posts: 5,507
Rep Power: 61
Quote:
<edit> Never mind. The point is that if you are saying the difference is the sound card itself, then the only valid head-to-head test would be in the exact same system, using the exact same ASIO driver implementation (which isn't likely to happen with an Audigy and some pro card, etc.), at the same sample rate, etc.
Last edited by Russ; Nov 21, 2005 at 11:01 PM.