HardwareHeaven.com

Jan 25, 2004, 10:15 PM   #1
NewsFactory (News Guru)

Are 64-bit Binaries Really Slower than 32-bit Binaries?

OSNews has posted a new article called Are 64-bit Binaries Really Slower than 32-bit Binaries? Read on and find out!

When running tests, installing operating systems, and compiling software for my Ultra 5, I came to the stunning realization that hey, this system is 64-bit, and all of the operating systems I installed on this Ultra 5 (can) run in 64-bit mode.

I wondered if it would be best to compile my applications in 32-bit mode or 64-bit mode. The modern dogma is that 32-bit applications are faster, and that 64-bit imposes a performance penalty. Time and time again I found people making the assertion that 64-bit binaries were slower, but I found no benchmarks to back that up. It seemed it could be another case of rumor taken as fact.

So I decided to run a few of my own tests to see if indeed 64-bit binaries ran slower than 32-bit binaries, and what the actual performance disparity would ultimately be.




Jan 26, 2004, 01:14 PM   #2
Matth (Flash Banner Hater)

OUCH! - not a good showing for 64-bit there!


Mind you, compiler optimization can only do so much, and if the application does not make significant use of code that actually benefits from 64-bit operations, it probably just makes the instructions larger.
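To put a rough number on the "larger" part: besides code size, pointers themselves double in width when you go from a 32-bit to a 64-bit build, so pointer-heavy data takes more memory and cache. Here is a minimal sketch, assuming a made-up linked-list node (`struct node`, `list_footprint` and `report_node_size` are hypothetical names for illustration, not anything from the OSNews article):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical list node: one pointer plus one int of payload. */
struct node {
    struct node *next;
    int value;
};

/* Bytes needed for n nodes in the current build's data model. */
size_t list_footprint(size_t n)
{
    return n * sizeof(struct node);
}

/* Print the sizes so a 32-bit and a 64-bit build can be compared. */
void report_node_size(void)
{
    /* The node can never be smaller than its two members. */
    assert(sizeof(struct node) >= sizeof(void *) + sizeof(int));
    printf("pointer: %lu bytes, node: %lu bytes, 1000 nodes: %lu bytes\n",
           (unsigned long)sizeof(void *),
           (unsigned long)sizeof(struct node),
           (unsigned long)list_footprint(1000));
}
```

Calling `report_node_size()` from a build made with `gcc -m32` and again from one made with `gcc -m64` shows the difference. On typical ILP32 and LP64 targets the node grows from 8 to 16 bytes, though the exact figures depend on the platform's ABI.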


If you've ever looked at AMD's optimized MEMCPY code (not that I fully understand it), you'll see what hand-coded optimization can do. Unless a compiler has some preset substitutions, it's never going to be able to make that kind of jump.

The AMD MEMCPY has several stages, including loop unrolling with load/store clustering, using eight 8-byte MMX registers as temporary storage (cache-line-sized load/store clustering), and finally 3DNow!-based streaming uncached memory writes for copies larger than the cache.
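For anyone who wants to see the shape of the unrolling and load/store clustering idea, here is a rough sketch in portable C. This is not AMD's code: `copy_unrolled` is a hypothetical name, plain 64-bit integers stand in for the MMX registers, and the streaming uncached-write stage is left out entirely because it needs CPU-specific instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not AMD's routine. Copies n bytes from src to dst.
   The two regions must not overlap. */
void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    /* Main loop, unrolled to move one 64-byte block per iteration. */
    while (n >= 64) {
        uint64_t t[8]; /* stand-ins for eight 8-byte registers */

        /* Clustered loads: read the whole block first. The fixed-size
           memcpy calls are an alignment-safe way to express 8-byte
           loads; compilers normally turn each into a single move. */
        memcpy(&t[0], s +  0, 8);
        memcpy(&t[1], s +  8, 8);
        memcpy(&t[2], s + 16, 8);
        memcpy(&t[3], s + 24, 8);
        memcpy(&t[4], s + 32, 8);
        memcpy(&t[5], s + 40, 8);
        memcpy(&t[6], s + 48, 8);
        memcpy(&t[7], s + 56, 8);

        /* Clustered stores: then write the whole block. */
        memcpy(d +  0, &t[0], 8);
        memcpy(d +  8, &t[1], 8);
        memcpy(d + 16, &t[2], 8);
        memcpy(d + 24, &t[3], 8);
        memcpy(d + 32, &t[4], 8);
        memcpy(d + 40, &t[5], 8);
        memcpy(d + 48, &t[6], 8);
        memcpy(d + 56, &t[7], 8);

        s += 64;
        d += 64;
        n -= 64;
    }

    /* Tail: fewer than 64 bytes left, copy them one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Grouping all the reads before all the writes is the "clustering" part. Whether it beats the library `memcpy` on any given machine is something you would have to measure.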