R520 Architecture
Pipes/Shader units and the like:
First of all, let’s talk about the new X1800 3D architecture. It’s a chip with 16 Pixel Shader processors, featuring an Ultra-Threading Dispatch processor and 4 Shader cores. There are also 8 Vertex Shader processors, 16 Texture Address units, 16 Texture units and 16 Render Back-Ends. You have probably noticed the absence of the word “Pipeline” from all of that, and it is intentional: ATI believes that it’s no longer useful to talk about how many pipelines a part has, since its efficiency is what determines its performance, not the number of pipelines. Still, if we really wanted to describe the X1800 in “old” terms, we would say that it is a 16 pipeline part. That should put the “32 pipelines”, “24 pipelines”, “32 with 8 broken” and “32 with 16 broken” myths that have been circulating the past months to rest once and for all. There is also the X1600 series, which has 12 Pixel Shader processors, and the X1300 series with only 4. As mentioned already, the new series of cards is fully Shader Model 3.0 compliant, and in the following paragraphs we will see the features ATI believes are the key to this new architecture.
One of the buzzwords that everyone has been hearing since the first Shader Model 3.0 hardware was announced is Dynamic Flow Control. ATI fully supports it, with Branching (IF…THEN…ELSE statements), Looping and Subroutines. That allows the X1000 series of cards to execute different paths through the same shader on adjacent pixels. This architecture also gives developers many opportunities for optimization, since parts of a shader that don’t need to be executed can be skipped on the fly. In addition, developers can combine many related shaders into one “uber shader”, a method that avoids state change overhead (read: speed penalty) when going from one shader to the next. ATI insists that its dynamic flow control implementation is not a “tick-box” technology, but one that actually solves real problems and helps with performance and detail management.
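To make the branching idea concrete, here is a toy Python model of an “uber shader”. It is not real shader code: the function name, the pixel fields, the materials and the lighting maths are all invented for illustration. It only shows the three ingredients named above (a branch that skips work, a data-dependent loop, and one shader covering what would otherwise be two).

```python
# Toy model of an "uber shader" with dynamic flow control.
# All names and numbers here are hypothetical, chosen only for illustration.

def uber_shader(pixel):
    """Shade one pixel, choosing a code path from per-pixel data."""
    base = pixel["albedo"]

    # Branch: pixels in shadow skip the lighting loop entirely.
    if pixel["in_shadow"]:
        return round(base * 0.2, 3)

    # Loop: the iteration count depends on per-pixel data.
    light = 0.0
    for intensity in pixel["lights"]:
        light += intensity

    # Branch on material: one shader instead of two separate ones,
    # so no state change is needed between "metal" and "cloth" pixels.
    if pixel["material"] == "metal":
        light *= 1.5

    return round(base * min(light, 1.0), 3)


# Three adjacent pixels that each take a different path through the shader.
pixels = [
    {"albedo": 0.8, "in_shadow": True, "lights": [0.5, 0.4], "material": "metal"},
    {"albedo": 0.8, "in_shadow": False, "lights": [0.3, 0.2], "material": "cloth"},
    {"albedo": 0.8, "in_shadow": False, "lights": [0.3, 0.2], "material": "metal"},
]
shaded = [uber_shader(p) for p in pixels]
```

On real hardware the interesting part is how cheaply neighbouring pixels can diverge like this; the sketch says nothing about that cost.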
For dynamic flow control acceleration, ATI uses a new technique called “Ultra-Threading”. As we said, the key word that characterises every aspect of the new X1000 series is “efficiency”. What ATI wanted to do with this generation of graphics cards was to always keep the GPU busy, no matter what. That gives an actual speed increase, since the GPU uses various techniques so as not to sit idle waiting for things to happen (one of the most typical reasons a GPU slows down these days). So what is Ultra-Threading? ATI decided to use hundreds of simultaneous threads across multiple cores. Each thread can perform up to 6 different shader instructions on 4 pixels per clock cycle, and the ratio is 16 pixels per thread (on the Radeon X1800). The reason for using many threads is simple if you stop and think about it. With many threads you can essentially do many things at once, since a large number of threads promotes parallelism. A single-thread architecture is at a disadvantage here: if you want to do several things, you have to do them one at a time, which takes much more time than doing them all at once. You also have Fast Branch Execution thrown into the mix, with dedicated units that handle flow control without adding any overhead to the ALU (Arithmetic Logic Unit), and large multi-ported register arrays, which result in fast thread switching.
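The “keep the GPU busy” argument can be sketched with a small simulation. The model below is a deliberate simplification and not a description of the R520: it assumes a single ALU, free thread switching, and made-up cycle counts (4 cycles of work followed by 12 cycles of waiting on memory). The function and parameter names are hypothetical.

```python
def alu_utilization(num_threads, work_cycles=4, stall_cycles=12, total_cycles=1600):
    """Fraction of cycles in which a single ALU does useful work.

    Each thread repeats: `work_cycles` of ALU work, then `stall_cycles`
    waiting on memory. Switching between threads is assumed to cost nothing.
    """
    ready_at = [0] * num_threads             # cycle at which each thread may run again
    remaining = [work_cycles] * num_threads  # ALU cycles left before the next stall
    busy = 0
    for cycle in range(total_cycles):
        # Run the first thread that is not waiting; idle if none is ready.
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # The thread issues a fetch and sleeps; another can take over.
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = work_cycles
                break
    return busy / total_cycles


one = alu_utilization(1)    # ALU idles while the only thread waits
two = alu_utilization(2)
four = alu_utilization(4)   # enough threads to cover the whole stall
```

With these assumed numbers, one thread keeps the ALU busy a quarter of the time, and four threads are enough to hide the stall completely. Longer real-world latencies are one reason a design would want far more threads in flight.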
Another key aspect, as far as ATI is concerned, is 128-bit Floating Point processing. ATI wants to make no sacrifices here: all shader calculations use 128-bit floating point precision and run at full speed, without the need to drop the precision anymore. This has implications beyond the scope of traditional GPU usage, as we will see later in this article. Last but not least, the Vertex Engine has been upgraded as well, in order to provide support for SM3.0. Dynamic flow control is there, and so is support for much longer shaders (1024 instructions to be exact, although with dynamic flow control you can theoretically execute billions of instructions).
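The 128-bit figure is usually read as four 32-bit floating point components (red, green, blue, alpha) per value; the text above does not spell that out, so treat the breakdown as an assumption. The short Python sketch below uses the standard `struct` module to show the arithmetic, and compares the rounding error against 16 bits per component purely as an illustration of what “dropping the precision” would cost. The sample colour is arbitrary.

```python
import struct

rgba = (0.1, 0.2, 0.3, 1.0)  # arbitrary sample colour

full = struct.pack("<4f", *rgba)  # four 32-bit floats
half = struct.pack("<4e", *rgba)  # four 16-bit floats

full_bits = len(full) * 8  # 4 components x 32 bits = 128
half_bits = len(half) * 8  # 4 components x 16 bits = 64

# Round-trip error for the first component at each precision.
err_full = abs(struct.unpack("<4f", full)[0] - rgba[0])
err_half = abs(struct.unpack("<4e", half)[0] - rgba[0])
```

Here `err_half` comes out several orders of magnitude larger than `err_full`, which is the kind of loss that accumulates over long shaders.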
ATI meant for its new architecture to be flexible. To achieve that, it has decoupled the components of the rendering pipeline, which allows it to create solutions based on the needs of the market. The following table shows exactly what each card series comprises:

New and enhanced memory controller
With the X1000 series, ATI went back to the drawing board to redesign its memory controller. The goals were to boost both speed and efficiency, to be software upgradeable and future-proof, and of course to simplify the design so that the whole chip could operate at faster clocks. The result was the new 512-bit ring bus memory controller, which ATI claims can do all that it set out to do, and much more. Its features include support for GDDR3 and its successor, GDDR4; a much simpler and more efficient design that allows extreme memory clock scaling; two 256-bit ring buses that run in opposite directions to minimize latency; a crossbar switch that handles memory writes; and one ring stop per pair of memory channels, linked directly to the memory interface.
Its design may look somewhat complicated, but what it does is pretty easy to understand. Suppose that a client makes a memory request for something. The memory controller determines which DRAM will serve the client, and sends the request to it. The DRAM answers the request, and the answer travels via the ring bus until it reaches the ring stop closest to the client. After that, the answer is delivered to the client. The advantage of this new memory controller is that it’s smart enough to determine which requests to serve first, and it prioritises them accordingly. Furthermore, since the memory parameters can be changed via software, ATI can fine-tune performance with driver updates, since the client weights and memory request efficiency are now programmable on a per-application basis.
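Two of the ideas above lend themselves to a small sketch: why two rings running in opposite directions shorten the trip, and what serving requests by programmable client weights might look like. This is a simplified model under stated assumptions, not ATI's actual arbitration logic, which the article does not describe. The number of ring stops, the client names and the weights are all hypothetical.

```python
import heapq

def ring_hops(src, dst, num_stops):
    """Hops from ring stop `src` to `dst`, taking the shorter of the two directions."""
    clockwise = (dst - src) % num_stops
    counter_clockwise = (src - dst) % num_stops
    return min(clockwise, counter_clockwise)

def service_order(requests, weights):
    """Return client names in the order their requests would be served.

    `requests` lists client names in arrival order; `weights` maps each
    client name to a priority weight (higher is served first). Requests
    with equal weight keep their arrival order.
    """
    heap = [(-weights[client], i, client) for i, client in enumerate(requests)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# With 4 stops (an assumed count), stop 0 -> stop 3 is 1 hop going the
# other way round, instead of 3 hops on a single one-directional ring.
hops = ring_hops(0, 3, 4)

# Hypothetical per-application weights, as a driver profile might set them.
weights = {"texture": 3, "z_buffer": 5, "vertex": 1}
order = service_order(["texture", "z_buffer", "vertex", "texture"], weights)
```

Changing the `weights` dictionary changes the service order without touching the ring itself, which mirrors the claim that tuning can happen in software.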