Visiontek have just released their XTASY series of X800-based boards, and today I'll be taking an in-depth look at their X800 Pro card. This isn't our first look at the X800 Pro: my colleague Stuart gave an X800 Pro board a good going over in a head-to-head against the 6800 Ultra, so if you haven't seen that review, head over here to see his comparisons.


The XTASY X800 Pro comes in a tasteful white box with some futuristic robot artwork - a par-for-the-course concept, but quite nicely done, especially when compared to some of the more garish Far Eastern designs and layouts.

Opening the box reveals the standard set of cables and connectors along with the board itself. Visiontek have supplied an official reference ATI driver CD with this package, though when the cards hit the market they will ship with Visiontek's own CDs. Unfortunately there are no games bundled with this package, but given the competitive pricing of these cards that is understandable.

ABOVE: The X800pro (top) and the X800XT (bottom) - front and rear views

The card is a reference ATI board. If you look at the front view picture you can see the video connector just below the Molex has been removed, and the more astute viewers will also notice from the rear pictures that the Rage Theater chip is gone. This was not the case with the Pro reference board ATI Europe supplied us, as you can see here. This is more than likely a cost-cutting measure that will be applied to all Pro cards across the board.

Graphics technology has been constantly evolving: in 2001 we had fixed-function T&L and DDR, 2002 brought GDDR to the table along with programmable shaders, and last year graphical innovation continued with cinematic rendering, floating point precision and GDDR2. This year is no exception, with ATI continuing its creative push of hardware and innovative technologies - 2004 is ATI's year of "high definition gaming", with the arrival of 3DC compression and temporal AA, powered by low-heat, high-speed GDDR3. I'll be delving into these technologies in much greater detail later in this article.

ATI's goals are to deliver hardware that gives games a fast, immersive, stable platform coupled with the highest image quality possible. The new range of hardware strives to achieve these goals, and today that's what we will be testing.

The Radeon X800 series offers 3DC support, ultra-fast GDDR3 memory on a 256-bit memory interface, and AGP 8x with full Microsoft DirectX 9.0 and OpenGL 1.5 support. The boards use single-slot cooling with VGA, DVI and VIVO display I/O.

To demonstrate the power of the new X800 cards, ATI have worked with a production company called Rhino FX to produce a technology demo called "Double Cross", starring a futuristic Lara Croft-style heroine named "Ruby". This demo pushes even the X800 XT cards to their limits, with 400,000 polygons being rendered in each scene. It uses multiple light sources as well as many movie-style special effects, like depth of field, to give incredible visuals. To give an indication of relative performance, ATI state it runs 2-3x as fast on the X800 series cards as on the last-generation 9800XT boards.

Further into the review there will be game tests comparing the new cards to the older 9800XT, and before that I'm going to take you through some of the ATI technologies in detail. On this page, though, I'll just be skimming over some of the new terminology for those of you who aren't interested in reading page after page on the technologies.

The X800's parallel pixel processing can handle 80 concurrent shader operations per clock, delivering 192 gigaflops and an 8.8 gigapixel per second fill rate. Moving on to the 6 vertex shader pipelines, these can cope with 750 million vertices per second - twice the vertex processing performance of the 9800 series. This extra power enables scene-animating effects like soft shadows and blend shapes.
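As a sanity check, those headline numbers are essentially pipes-times-clock arithmetic. Here is a quick sketch - the clock speed used is an assumption picked purely to illustrate how the quoted fill rate figure falls out, not a confirmed spec:

```python
# Back-of-envelope arithmetic behind the throughput figures quoted above.
# The core clock here is an ASSUMPTION for illustration only.

PIXEL_PIPES = 16          # X800 XT pixel pipelines
CORE_CLOCK_HZ = 550e6     # assumed core clock (illustrative)

# One pixel written per pipeline per clock:
fill_rate_gpix = PIXEL_PIPES * CORE_CLOCK_HZ / 1e9
print(f"Fill rate: {fill_rate_gpix:.1f} Gpix/s")   # Fill rate: 8.8 Gpix/s
```

The 750 million vertices per second figure similarly works out to each of the 6 vertex engines retiring one vertex every few clocks at a core speed in that region.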

HyperZ HD supports HD resolutions and includes hierarchical Z, colour and Z compression, early Z rejection and fast Z clear.

Normal maps are specialised textures composed of light vectors rather than colours. Created from a super high resolution model of an object, they can make low-poly objects appear far more detailed than they really are; the problem is they can consume significant portions of video memory. This is where 3DC comes into play: it solves the massive memory drain of normal maps and improves image quality without sacrificing performance, accentuating the subtle nuances that make characters look more realistic. With a 4:1 compression ratio it optimises memory use and facilitates graceful scaling of artwork down to previous-generation hardware. An example of 3DC in action is the "Ruby - Double Cross" demo, in which over 100 megabytes of data is saved thanks to 3DC compression.
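The core trick is worth a quick sketch. Because a unit-length normal with a non-negative Z can be rebuilt from just its X and Y components, 3DC only has to store two channels; the shader reconstructs the third. This toy version shows only that two-channel idea (the real format additionally block-quantises the two channels, which is where the full 4:1 ratio against 32-bit source textures comes from):

```python
import math

# Minimal sketch of the idea behind 3DC-style normal map compression:
# store (x, y) only, rebuild z = sqrt(1 - x^2 - y^2) at decompress time.
# Real 3DC also quantises the channels into fixed-size blocks.

def compress(normal):
    x, y, _z = normal
    return (x, y)                      # drop the derivable component

def decompress(xy):
    x, y = xy
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

n = (0.6, 0.0, 0.8)                    # a unit-length normal
x, y, z = decompress(compress(n))
print(round(z, 6))                     # ~0.8, recovered from x and y alone
```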

Smartshader HD can handle 1536 instructions and offers a flexible shader architecture with a new high performance compiler.

Smoothvision HD has its own detailed section later for those interested in learning more; for those of you who couldn't care less, ATI have basically improved anti-aliasing quality and performance, with temporal AA, centroid-sample anti-aliasing and programmable sample patterns.
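For a flavour of the temporal AA idea before the detailed section: the hardware alternates between sample patterns on even and odd frames, so at high frame rates the eye blends two cheap patterns into something resembling a more expensive one. A toy sketch (the sample positions here are made up for illustration):

```python
# Toy model of temporal AA: two 2-sample patterns alternated per frame.
# Positions are illustrative, not ATI's actual programmable patterns.

PATTERN_A = [(0.25, 0.25), (0.75, 0.75)]   # even frames
PATTERN_B = [(0.75, 0.25), (0.25, 0.75)]   # odd frames

def samples_for_frame(frame_index):
    return PATTERN_A if frame_index % 2 == 0 else PATTERN_B

# Across any two consecutive frames, all four positions get sampled,
# giving a 4x-like result for 2x per-frame cost:
combined = set(samples_for_frame(0)) | set(samples_for_frame(1))
print(len(combined))                       # 4
```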

Videoshader HD, which ATI class as the best in the industry, provides high quality video processing and acceleration: real-time programmable video effects, video post-processing and filtering, MPEG-1/2/4 encode and decode acceleration, Fullstream video deblocking, WMV9 decode acceleration, high quality resolution scaling, adaptive per-pixel deinterlacing, motion compensation, noise removal filtering and display rotation.

ATI have also improved upon the design of the 9800XT by optimising power consumption and heat output, which should ensure a good market presence in the small form factor sector.

Avg. continuous power draw while looping 3DMark03: X800 Pro - 49W, X800 XT - 65W
Avg. continuous power draw while looping 3DMark01: X800 Pro - 58W, X800 XT - 76W

The X800 series uses a four-channel interleaved 256-bit GDDR3 interface with 50% higher memory clocks compared to the 9800XT.
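The bandwidth implication is simple arithmetic. A quick sketch using the 560MHz X800 XT memory clock quoted later in this article, and the 9800XT's 365MHz memory clock for comparison (the latter is stated here from memory, so treat it as an assumption):

```python
# Peak memory bandwidth = clock x transfers-per-clock x bus width.
# GDDR3 is double data rate, so two transfers per clock.

def peak_bandwidth_gbs(clock_mhz, bus_bits, transfers_per_clock=2):
    return clock_mhz * 1e6 * transfers_per_clock * (bus_bits / 8) / 1e9

x800xt  = peak_bandwidth_gbs(560, 256)   # ~35.8 GB/s
r9800xt = peak_bandwidth_gbs(365, 256)   # ~23.4 GB/s (assumed 365MHz clock)
print(f"X800 XT: {x800xt:.2f} GB/s, 9800XT: {r9800xt:.2f} GB/s")
```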

When ATI were designing the X800 it was important they improved key areas to remain competitive. The pixel shader unit can handle up to 5 floating point shader instructions per clock cycle - two three-component vector ops, two scalar ops and one texture address op - and this shader architecture should make it easy for the compiler in the driver to determine the optimal instruction ordering. It's not just how many ALUs you have, but how busy you can keep them. Another important area is avoiding stalls due to high-latency texture lookup dependencies, with the added benefit that there is no need to drop to partial precision to increase thread count. On the Smartshader side, pixel shader temporary registers have been increased from 12 to 32, with an added facing register, and the hardware is designed to maintain full performance regardless of the number of registers used. The maximum shader instruction count has also been increased from 160 to 1536 - 512 vector, 512 scalar and 512 texture.

The F-buffer now has better memory management and handles multipass shaders efficiently - only affected pixels are processed on each pass, rather than the entire frame.

Hyper Z supports buffering at all resolutions, including HD 1920x1080 and 1600x1200. It can reject up to 25% of occluded pixels per clock for minimal overdraw, and with up to 32 Z/stencil ops per clock it delivers a significant performance gain at high resolutions; this scales with the number of pixel pipelines.
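The win from early rejection is easiest to see in a toy software model: depth-test each fragment before shading it, so occluded fragments never pay the pixel shader cost. (The real hardware does this hierarchically, on tiles of pixels, rather than one pixel at a time as sketched here.)

```python
# Toy model of early-Z rejection: fragments failing the depth test are
# discarded before the (expensive) shading step ever runs.

FAR = 1.0

def render(fragments, width, height, shade):
    z_buffer = {(x, y): FAR for x in range(width) for y in range(height)}
    shaded = 0
    for x, y, z in fragments:            # fragments in submission order
        if z >= z_buffer[(x, y)]:        # behind what's stored: reject early
            continue
        z_buffer[(x, y)] = z
        shade(x, y)                      # only surviving fragments are shaded
        shaded += 1
    return shaded

# Two fragments hit the same pixel; the occluded one is never shaded:
count = render([(0, 0, 0.2), (0, 0, 0.5)], 4, 4, lambda x, y: None)
print(count)                             # 1
```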

Shader Optimisations

Optimisation is an act of balancing shader length and complexity against performance. Using HLSL lets developers concentrate their efforts on algorithmic optimisations; it's worth noting that lower-level optimisation specific to the shader processors is not really necessary, as the runtime optimiser in the ATI driver does the heavy lifting.

Top-to-bottom GPU scalability is something to focus on early in game engine development. Using the right shader for the job is imperative - there is no need to use a 2.0 shader when 1.4 will suffice - and allowing scaling back to lower shader models on lower-end GPUs also comes into play. Not everyone has a high-end X800, so features like normal maps, depth of field effects, motion blur and high dynamic range lighting can be dropped on lesser hardware.

Anti-aliasing alone can mean 70 MB or more of frame buffer usage, and as high resolution textures and normal maps are now commonplace, managing the size of normal maps is becoming a serious issue. This is where 3DC normal map compression comes into action, providing a 4:1 space saving - Double Cross, for example, saves over 100 MB with 3DC technology.
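Those figures are easy to reproduce with rough arithmetic. The resolution, sample count and texture size below are assumptions chosen for illustration:

```python
# Rough sizing behind the "70 MB or more" frame buffer claim and the
# 4:1 normal map saving. All parameters are illustrative assumptions.

W, H = 1600, 1200
SAMPLES = 6                  # 6x multisample AA
BYTES_PER_SAMPLE = 4 + 4     # 32-bit colour + 32-bit Z/stencil per sample

fb_mb = W * H * SAMPLES * BYTES_PER_SAMPLE / 2**20
print(f"AA frame buffer: {fb_mb:.1f} MB")        # comfortably over 70 MB

# A single 2048x2048 32-bit normal map, raw vs 4:1 compressed:
raw_mb = 2048 * 2048 * 4 / 2**20                 # 16 MB raw
compressed_mb = raw_mb / 4                       # 4 MB with 4:1 compression
print(f"Normal map: {raw_mb:.0f} MB -> {compressed_mb:.0f} MB")
```

A game carrying dozens of such maps makes it clear how a demo like Double Cross can claw back 100 MB.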

A key element of the X800's speed comes down to the memory used: GDDR3 is the new kid on the block, and ATI have been active in defining DRAM technology and have invested heavily in JEDEC.

Joe Macri, Director of Technology at ATI Santa Clara, is the chairman of the JC42.3 committee for DRAM. ATI led the JEDEC definition of DDR2, and also led the industry definition of GDDR3 and its subsequent JEDEC standardisation; Joe is also chairing the JEDEC definition of GDDR4. ATI's partners on GDDR3 are Hynix, Micron, Infineon, Elpida and Samsung.

GDDR3 is not GDDR2: the I/O uses POD-18 signalling compared with GDDR2's SSTL-18 with ODT, and the clocking interface has moved from bidirectional differential strobes to a unidirectional DQS. Incredible speeds in excess of 1000MHz are possible (the X800 XT runs at 560/1120MHz). POD stands for Pseudo Open Drain - voltage-based rather than current-based signalling, which reduces the area needed to implement the driver. It runs at 1.8V with a 1.0V swing (Vref at 1.26V, versus 0.9V on DDR2).

GDDR3 is being hailed as the overclocker's dream: lower power obviously means less heat, which aids overclocking - once RAM reaches 105°C it starts leaking and corrupting data. GDDR3 also auto-calibrates, with the I/O constantly recalibrated to provide consistent characteristics across temperature and voltage.


Next: X800 Architecture