FiringSquad: Home of the Hardcore Gamer - Games, Hardware, Reviews and News

  

NVIDIA GF100 'Fermi' Graphics Architecture Overview
January 17, 2010   Brandon Sandman Bell
GF100 high-level architecture overview


We’re going to start with the high-level, 36,000-foot overview of GF100. From this altitude, GF100 looks somewhat similar to previous GeForce GPUs, but trust us, the differences become more apparent at the low level. Here’s the block diagram:



If you recall previous NVIDIA architectures, you’ll note that some of the terminology from the G80/GT200 days has changed. What NVIDIA once called Streaming Processors (SPs) are now called CUDA Cores. The basic functionality is the same; someone in marketing simply decided that mixing “CUDA” and “Core” sounded better. There is one interesting change, however: the Texture Processing Clusters (TPCs) from previous GPUs have been replaced by more capable Streaming Multiprocessors (SMs) in GF100.

The CUDA Cores (we referred to them as shaders on the previous page) are the small green squares in the block diagram above. Again, GF100 has 512 of them.

Each SM has 32 CUDA Cores, four texture units, NVIDIA’s PolyMorph engine, dedicated caches, and more. Previous architectures combined 8 CUDA Cores per SM. Texture filtering units and their L1 cache were then grouped with these SMs to form the TPC. We’ll be taking a closer look at GF100’s new SMs a little later in this article.

As you can see, the SMs are organized into groups called graphics processing clusters (GPCs). A GPC consists of four SMs and one raster engine. Here’s where you’ll see another key difference between GF100 and GT200. Whereas GT200 was limited to just one raster engine for the whole GPU, each GPC in GF100 has its own raster engine. With the exception of ROP functions, a GPC is essentially its own self-contained GPU, hence the name graphics processing cluster.

GF100 has four GPCs, with each GPC containing 128 CUDA Cores. Add it all up, and you’ve got 512 cores.
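The unit counts quoted so far all fall out of simple multiplication down the GPC/SM hierarchy. The Python sketch below is only an illustration of that arithmetic: the class and field names are made up for this example, and it assumes one PolyMorph engine per SM, as the per-SM description above suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical container for the per-unit counts quoted in the article."""
    gpcs: int = 4              # graphics processing clusters on the chip
    sms_per_gpc: int = 4       # streaming multiprocessors per GPC
    cores_per_sm: int = 32     # CUDA Cores per SM
    tex_units_per_sm: int = 4  # texture units per SM

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cores_per_gpc(self) -> int:
        return self.sms_per_gpc * self.cores_per_sm

    @property
    def cuda_cores(self) -> int:
        return self.sms * self.cores_per_sm

    @property
    def texture_units(self) -> int:
        return self.sms * self.tex_units_per_sm

    @property
    def raster_engines(self) -> int:
        return self.gpcs  # one raster engine per GPC

    @property
    def polymorph_engines(self) -> int:
        return self.sms  # assumption: one PolyMorph engine per SM


gf100 = GpuLayout()
print(gf100.sms, gf100.cores_per_gpc, gf100.cuda_cores)  # 16 128 512
```

With the defaults, the derived totals are 16 SMs, 128 cores per GPC, 512 CUDA Cores, 64 texture units, and four raster engines.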

Tied to all that is the memory subsystem, which consists of six 64-bit memory controllers (384-bit total), L2 cache, and 48 ROPs. The ROPs are organized into six groups of eight and are the dark blue squares in the block diagram above, with each ROP group paired up with its own memory controller.

NVIDIA can scale down the number of GPCs and memory controllers to address different markets. To compete with the Radeon 5700 series in the mainstream market, for example, you could see a cut-down GF100 derivative with just two GPCs (256 shaders total) and a 128-bit memory interface.
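To make the scaling concrete, here is a rough Python sketch that derives shader count, bus width, and ROP count from the number of GPCs and memory controllers. The function name is hypothetical, and the ROP figure for a cut-down part is an assumption: it simply carries over GF100's pairing of one eight-ROP group per 64-bit memory controller, which NVIDIA has not confirmed for any derivative.

```python
SMS_PER_GPC = 4           # from the GF100 description above
CORES_PER_SM = 32
BITS_PER_CONTROLLER = 64  # each memory controller is 64 bits wide
ROPS_PER_CONTROLLER = 8   # assumption: one 8-ROP group per controller


def scaled_config(gpcs: int, memory_controllers: int) -> dict:
    """Return derived totals for a hypothetical GF100-style configuration."""
    return {
        "cuda_cores": gpcs * SMS_PER_GPC * CORES_PER_SM,
        "bus_width_bits": memory_controllers * BITS_PER_CONTROLLER,
        "rops": memory_controllers * ROPS_PER_CONTROLLER,
    }


full = scaled_config(gpcs=4, memory_controllers=6)
cut_down = scaled_config(gpcs=2, memory_controllers=2)

print(full)      # {'cuda_cores': 512, 'bus_width_bits': 384, 'rops': 48}
print(cut_down)  # {'cuda_cores': 256, 'bus_width_bits': 128, 'rops': 16}
```

The full configuration reproduces the 512-core, 384-bit, 48-ROP figures given for GF100, while the two-GPC, two-controller case matches the speculated 256-shader, 128-bit part.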

The following chart highlights some of the key differences between GF100 and its predecessor, GT200:

GeForce GPU Features Comparison

Feature                            GT200                            GF100
# of Transistors                   1.4 billion                      3.0 billion
CUDA Cores                         240                              512
Raster Engines                     1                                4
PolyMorph Engines                  -                                16
Special Function Units (per SM)    2                                4
Texture Units                      80                               64*
ROPs                               32                               48*
Warp Schedulers (per SM)           1                                2
Total Shared Memory                16KB                             Configurable 48KB or 16KB
L1 Texture Cache (per quad)        12KB                             12KB
Dedicated L1 Load/Store Cache      None                             16KB or 48KB
L2 Cache                           256KB (for texture reads only)   768KB (all clients read/write)
Concurrent Kernels                 No                               Up to 16

*Improved Clock Speed
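For a quick sense of scale, the numeric rows of the comparison chart can be turned into GF100-to-GT200 ratios. The snippet below is just a convenience sketch; the dictionary keys are informal labels chosen for this example, and the values are copied from the chart.

```python
# (GT200, GF100) pairs taken from the comparison chart
specs = {
    "transistors (billions)": (1.4, 3.0),
    "CUDA cores": (240, 512),
    "raster engines": (1, 4),
    "texture units": (80, 64),
    "ROPs": (32, 48),
    "L2 cache (KB)": (256, 768),
}

# Ratio of the GF100 figure to the GT200 figure, rounded to two decimals
ratios = {name: round(gf100 / gt200, 2) for name, (gt200, gf100) in specs.items()}

for name, r in ratios.items():
    print(f"{name}: {r:.2f}x")
```

Note that the texture unit ratio comes out below 1.0: GF100 has fewer texture units than GT200, which the chart's footnote attributes to improved clock speed for those units.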


And that’s the quick-and-dirty, high-level overview. We’re now going to go low-level, taking a closer look at the functional blocks that make up GF100.




© 1998-2012 FS Media, Inc. All Rights Reserved