DriverHeaven Forums

 




 

R520 Architecture

Pipes, shader units and the like:

First of all, let’s talk about the new X1800 3D architecture. It’s a 16 Pixel Shader processor chip, featuring an Ultra-Threading Dispatch processor and 4 Shader cores. There are also 8 Vertex Shader processors, 16 Texture Address units, 16 Texture units and 16 Render Back-Ends. You have probably noticed the absence of the word “Pipeline” from all of that, and it is intentional: ATI believes that it’s no longer useful to talk about how many pipelines a part has, since its efficiency is what determines its performance, not the number of pipelines. Still, if we really wanted to describe the X1800 in “old” terms, we would say that it is a 16 pipeline part. That should put the “32”, “24”, “32 with 8 broken” and “32 with 16 broken” pipeline myths that have been circulating for the past months to rest once and for all. There is also the X1600 series, which has 12 Pixel Shader processors, and the X1300 series with only 4. As mentioned already, the new series of cards is fully Shader Model 3.0 compliant, and in the following paragraphs we will see the features ATI believes are the key to this new architecture.

One of the buzzwords that everyone heard when the first Shader Model 3.0 hardware was announced was Dynamic Flow Control. ATI fully supports it, with Branching (IF…THEN…ELSE statements), Looping and Subroutines. That allows the X1000 series of cards to execute different paths through the same shader on adjacent pixels. This architecture also gives developers many opportunities for optimization, since parts of a shader that don’t need to be executed can be skipped on the fly. In addition, developers can combine many related shaders into one uber shader, a method that has the advantage of avoiding state change overhead (read: speed penalty) when going from one shader to the next. ATI insists that its dynamic flow control implementation is not a “tick-box” technology, but that it actually solves real problems and helps with performance and detail management.
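To make the idea concrete, here is a minimal CPU-side sketch in Python of what an uber shader with dynamic flow control looks like conceptually. It is not real shader code and not ATI's implementation; the pixel fields (lit, lights, material, albedo) and the instruction counts are made-up assumptions for illustration only.

```python
# Hypothetical per-pixel "uber shader": one routine whose control flow
# depends on per-pixel data, instead of several separate shaders.
# All field names and instruction costs below are illustrative assumptions.

def uber_shader(pixel):
    """Return (colour, instructions_executed) for one pixel."""
    executed = 1  # cost of the first branch test
    if not pixel["lit"]:
        # Early out: pixels in shadow skip all of the lighting work.
        return 0.1 * pixel["albedo"], executed

    colour = 0.0
    # Loop: the iteration count is decided at run time, per pixel.
    for intensity in pixel["lights"]:
        colour += pixel["albedo"] * intensity
        executed += 2

    if pixel["material"] == "shiny":
        # Branch: only shiny pixels pay for the extra specular path.
        colour += 0.5
        executed += 4

    return min(colour, 1.0), executed


def shade(pixels):
    """Shade a list of pixels; adjacent pixels may take different paths."""
    return [uber_shader(p) for p in pixels]
```

Two neighbouring pixels can take completely different routes through the same routine: a shadowed pixel leaves after one "instruction", while a lit, shiny pixel runs the loop and the specular branch. Nothing has to be switched between pixels, which is the state change saving described above.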

To accelerate dynamic flow control, ATI uses a new technique called “Ultra-Threading”. As we said, the key word that characterises every aspect of the new X1000 series is “efficiency”. What ATI wanted to do with this generation of graphics cards was to keep the GPU busy at all times. That gives an actual speed increase, since the GPU uses various techniques to avoid sitting idle while waiting for things to happen (one of the most common reasons a GPU slows down these days). So what is Ultra-Threading? ATI decided to use hundreds of simultaneous threads across multiple cores. Each thread can perform up to 6 different shader instructions on 4 pixels per clock cycle, and the ratio is 16 pixels per thread (on the Radeon X1800). The reason for using many threads is simple, if you stop and think about it. With many threads in flight, the GPU can do many things at once, and whenever one thread has to wait, another can take its place. A single-thread architecture is at a disadvantage here: if you want to do several things, you have to do them one at a time, which takes much longer than overlapping them. You also have Fast Branch Execution thrown into the mix, with dedicated units that handle flow control without adding any overhead to the ALU (Arithmetic Logic Unit), and large multi-ported register arrays, which result in fast thread switching.
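The benefit of keeping many threads in flight can be shown with a toy simulation. This is only a sketch under simplified assumptions, not a model of the real dispatch processor: there is one ALU, every thread alternates between a short burst of ALU work and a long wait (for example a texture fetch), and switching threads is free. The function name and the cycle counts are invented for the example.

```python
def cycles_to_finish(num_threads, rounds, alu_cycles, fetch_latency):
    """Cycles until all threads have finished `rounds` ALU bursts each.

    Toy model (assumption): one shared ALU, free thread switching, and each
    burst is followed by a fetch that must finish before the same thread
    can run its next burst.
    """
    ready_at = [0] * num_threads        # cycle at which each thread may run again
    remaining = [rounds] * num_threads  # ALU bursts still to do per thread
    clock = 0
    while any(remaining):
        runnable = [t for t in range(num_threads)
                    if remaining[t] and ready_at[t] <= clock]
        if not runnable:
            # Every thread is waiting: the ALU idles until one becomes ready.
            clock = min(ready_at[t] for t in range(num_threads) if remaining[t])
            continue
        t = runnable[0]
        clock += alu_cycles                  # run one ALU burst
        remaining[t] -= 1
        ready_at[t] = clock + fetch_latency  # thread now waits on its fetch
    return clock


def alu_utilisation(num_threads, rounds, alu_cycles, fetch_latency):
    """Fraction of cycles in which the ALU was doing useful work."""
    busy = num_threads * rounds * alu_cycles
    return busy / cycles_to_finish(num_threads, rounds, alu_cycles, fetch_latency)
```

With bursts of 4 cycles and a wait of 12 cycles, a single thread needs 52 cycles for 4 bursts and keeps the ALU busy less than a third of the time. Four threads finish 16 bursts in 64 cycles with the ALU never idle: four times the work in roughly the same time, purely because the waiting is overlapped.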

 

Another key aspect, as far as ATI is concerned, is 128-bit Floating Point processing. ATI wants to make no sacrifices here: all shader calculations use 128-bit floating point precision and run at full speed, with no need to drop the precision anymore. This has implications beyond the scope of traditional GPU usage, as we will see later in this article. Last but not least, the Vertex Engine has been upgraded as well, in order to provide support for SM3.0. Dynamic flow control is there, and so is support for much longer shaders (1024 instructions, to be exact, although with dynamic flow control you can theoretically execute billions of instructions).
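As a rough illustration of why precision matters, the Python sketch below assumes that “128-bit” means four 32-bit floating point components per pixel, and compares repeated accumulation at 32 bits per component with a 16-bit half float. The choice of a 16-bit format as the “dropped” precision is an assumption made for the example; the article does not name one. The helper names are invented.

```python
import struct

# Four 32-bit float components (e.g. red, green, blue, alpha) are 128 bits.
BITS_PER_PIXEL = struct.calcsize("<4f") * 8


def round_to(value, fmt):
    """Round a Python float to the storage format `fmt` and back.

    "<f" is a 32-bit IEEE float, "<e" is a 16-bit IEEE half float.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(step, count, fmt):
    """Add `step` to a running total `count` times, rounding after each add."""
    total = 0.0
    for _ in range(count):
        total = round_to(total + step, fmt)
    return total


def accumulation_error(fmt, step=0.001, count=1000):
    """Absolute error against the exact answer step * count."""
    return abs(accumulate(step, count, fmt) - step * count)
```

Adding 0.001 a thousand times should give 1.0. At 32 bits per component the result stays very close to that, while the half float version drifts visibly, because each small addition is rounded to a much coarser grid. Long shaders with many dependent operations accumulate this kind of error in the same way.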

ATI meant for its new architecture to be flexible. To achieve that, it has de-coupled the components of the rendering pipeline, which allows it to create solutions based on the needs of the market. The following table shows exactly what each card series is comprised of:

 

New and enhanced memory controller

With the X1000 series, ATI went back to the drawing board to redesign its memory controller. The goals were to boost both speed and efficiency, to be software upgradeable and future-proof, and of course to simplify the design so that the whole chip could operate at faster clocks. The result is the new 512-bit ring bus memory controller, which ATI claims can do all that it set out to do, and much more. Its features include support for GDDR3 and its successor, GDDR4; a much simpler and more efficient design that allows extreme memory clock scaling; two 256-bit ring buses that run in opposite directions to minimize latency; a crossbar switch that handles memory writes; and one ring stop per pair of memory channels, linked directly to the memory interface. Its design may look somewhat complicated, but what it does is pretty easy to understand. Suppose that a client makes a memory request. The memory controller determines which DRAM will serve the client, and sends the request to it. The DRAM answers the request, and the answer travels along the ring bus until it reaches the ring stop closest to the client. From there, the answer returns to the client. The advantage of this new memory controller is that it’s smart enough to determine which requests to serve first, and it prioritises them accordingly. Furthermore, since the memory parameters can be changed via software, ATI can fine-tune performance with driver updates: the client weights and memory request efficiency are now programmable on a per-application basis.
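Why two rings running in opposite directions help with latency can be sketched in a few lines of Python. This is a simplified model, not ATI's controller logic: ring stops are numbered around the ring, data moves one stop per hop, and the number of stops passed in is whatever you choose (four is used in the usage note purely as an assumption). The function names are invented.

```python
def one_way_hops(src, dst, num_stops):
    """Hops needed if data can only travel clockwise around a single ring."""
    return (dst - src) % num_stops


def ring_hops(src, dst, num_stops):
    """Hops needed with two counter-rotating rings: take the shorter way.

    Returns (hops, direction). Ties go to the clockwise ring.
    """
    clockwise = (dst - src) % num_stops
    counter = (src - dst) % num_stops
    if clockwise <= counter:
        return clockwise, "clockwise"
    return counter, "counter-clockwise"


def worst_case_hops(num_stops, two_rings=True):
    """Largest hop count over every pair of ring stops."""
    pairs = [(s, d) for s in range(num_stops) for d in range(num_stops)]
    if two_rings:
        return max(ring_hops(s, d, num_stops)[0] for s, d in pairs)
    return max(one_way_hops(s, d, num_stops) for s, d in pairs)
```

With four stops, data going from stop 0 to stop 3 needs 3 hops on a single one-way ring, but ring_hops(0, 3, 4) returns (1, "counter-clockwise"). In this model the worst case drops from 3 hops to 2, and the saving grows with the number of stops.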

 



 



 

 


Copyright ©2002-2005 DriverHeaven.net, All rights reserved.
