Today NVIDIA is launching two video cards, the GeForce 8800GTX and the GeForce 8800GTS, the first of the next-generation DirectX 10 compliant cards to hit the market. The companion review to this piece, the ASUS 8800GTX, takes the flagship card through its paces. This review covers the EVGA 8800GTS, so that the two reviews together give as complete a picture of the new cards as possible.
NVIDIA is now basically the sole independent provider of graphics chips in the world, as companies like S3, Matrox and VIA are not a factor in the graphics market. ATI is now part of AMD, as the merger was recently completed. As to whether we will continue to see graphics cards from AMD, that is a probability, but not this year. NVIDIA has annual revenues exceeding $2 billion US, and is growing into other markets with its recent purchase of PortalPlayer.
EVGA is one of the top NVIDIA partners here in the United States. The company has a commitment to its customers second to none, with support that features a Lifetime Warranty on its retail graphics cards, a Step-Up program that no other company offers anything comparable to, and generally the best performance of the cards with the same chip.
The first thing that struck me about both the 8800GTS and the 8800GTX cards was the cooling system and the length of the card. The 8800GTS is a little shorter than the ASUS 8800GTX, and that’s a good thing. Fitting the card in the case we use for video card testing was easy and without issue, whereas the 8800GTX required inserting the card sideways. The length of the card is about the same as an X1950XTX card from ATI.
The cooling fan on the 8800GTS is the same as that found on the ASUS 8800GTX: a 29-fin fan that blows air through the funnel that is the heatsink and out through a grill on the card’s back plate. The heatsink also covers the memory chips, keeping all components on the card cool in operation.
The 8800GTS is a PCI Express native card. It’s unlikely NVIDIA will release the 8800GTS or GTX for the AGP platform, which is dead in the water due to the advent of PCI Express. The 8800GTS has a single PCI Express power plug. Note that the 8800GTX has two PCI Express power plugs, making it the first commercially available video card to do so.
The memory configuration of the 8800GTS is quite different from that of earlier cards. Ten 512-megabit memory chips are present, giving a 320-bit memory bus and 640MB of memory. The 8800GTS and 8800GTX have moved away from multiples of four memory chips and the 256-bit memory bus. The effective clock speed of the EVGA 8800GTS memory is 1.6GHz, which with a 320-bit memory bus gives the card a theoretical memory bandwidth of 64GB per second. Once we get to the performance area, you’ll notice areas where more memory bandwidth would be better.
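The capacity and bandwidth figures follow from simple arithmetic, sketched below. The bus width, chip count and effective clock are the values quoted in this review. The per-chip size of 512 megabits (64MB) is an assumption inferred from ten chips adding up to 640MB, and the function names are made up for illustration.

```python
# Figures quoted in the review. The per-chip size is an assumption:
# ten chips totalling 640 MB implies 64 MB (512 megabits) each.
CHIP_COUNT = 10
CHIP_MEGABITS = 512
BUS_WIDTH_BITS = 320
EFFECTIVE_MEM_CLOCK_HZ = 1.6e9  # 800 MHz DDR, two transfers per clock


def memory_capacity_mb(chips: int, megabits_per_chip: int) -> float:
    """Total capacity in megabytes (8 bits per byte)."""
    return chips * megabits_per_chip / 8


def peak_bandwidth_gb_s(bus_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s, taking 1 GB as 1e9 bytes."""
    return bus_bits / 8 * transfers_per_s / 1e9


capacity = memory_capacity_mb(CHIP_COUNT, CHIP_MEGABITS)
bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, EFFECTIVE_MEM_CLOCK_HZ)
print(f"{capacity:.0f} MB, {bandwidth:.1f} GB/s")
```

This is a theoretical peak only; real-world throughput depends on access patterns and will be lower.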
The GTS from EVGA has two dual-link DVI connectors. If you have two of those wonderful Dell 3007 monitors, the 8800GTS and 8800GTX will be able to game at insane resolutions with excellent image quality options. The dual-link DVI connectors can drive two monitors at resolutions of up to 2560×1600, and with NV’s new cards those might even be usable resolutions.
- 2x AA
- 4x AA
- Coverage Sample Anti-Aliasing 8x
- 8xQ AA
- Coverage Sample Anti-Aliasing 16x
- 16xQ anti-aliasing
- near-perfect angle-independent anisotropic filtering
- DirectX 10
- Pixel Shader 4
- Vertex Shader 4
- 96 Streaming processors
- 20 ROPs
- 320-bit memory interface
- 1.2GHz Shader core speed
- 500MHz core speed
- 1.6GHz memory speed
The 8800GTS is a modified version of the 8800GTX. The first difference is that there are 96 Stream processors instead of 128. The second main difference is there is a 320-bit memory bus (5×64-bit partitions) instead of 384-bit on the GeForce 8800 GTX. There are also 20 ROPs instead of the 24 on the 8800GTX. In many ways the GeForce 8800GTS is a slimmed down version of its bigger brother. EVGA 8800GTS is core clocked at 500MHz, the same as the reference speed. As we move further into 2006/7 I would imagine an ACS version, and perhaps other versions to follow.
Traditional Graphics Architecture
The traditional graphics card architecture starts with the CPU, which feeds the graphics chip in the vertex stage. Early video cards (pre-DX7) had the CPU handle the transformation and lighting of the vertices. DX 7 cards had fixed-function T+L hardware that did limited shading operations. With the advent of the GeForce 3, programmable shaders came to this part of the graphics pipeline. The RADEON 9700 from ATI brought programmable pixel shaders 2.0, and the GeForce 6800 brought dynamic flow control, geometry instancing and longer shader programs in Shader Model 3.0.
The next step in the pipeline is the Setup stage, where vertices are made into primitives: shapes like triangles, lines, or points. These primitives are then converted in the rasterization stage into pixel fragments. The next step of the classic graphics pipeline is the ROP stage. In this stage the fragments are subjected to shading, Z-Culling, frame-buffering and anti-aliasing. Fragment shading is the more accurate term here, because fragments are not called pixels until they are written into the framebuffer, but pixel shader is the most common terminology. The final step of the classic graphics pipeline is the memory stage, where the final outputted pixel is sent to be displayed.
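To make the rasterization step concrete, here is a toy software rasterizer that turns one triangle primitive into fragments. It is only a sketch of the idea, assuming a simple edge-function coverage test at pixel centers. It is not how the G80 or any other chip implements the stage, and all of the names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area test: positive when p lies to the left of the edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri: Tuple[Point, Point, Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Return the (x, y) fragments whose pixel centers fall inside the triangle."""
    v0, v1, v2 = tri
    fragments = []
    for y in range(height):
        for x in range(width):
            center = (x + 0.5, y + 0.5)
            w0 = edge(v1, v2, center)
            w1 = edge(v2, v0, center)
            w2 = edge(v0, v1, center)
            # Covered when all three edge tests agree in sign (either winding order).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                fragments.append((x, y))
    return fragments


# One triangle primitive on a 4x4 pixel grid.
triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
frags = rasterize(triangle, 4, 4)
print(len(frags), "fragments:", frags)
```

Each fragment produced here is what the later stages would shade, depth-test and finally write to the framebuffer as a pixel.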
Early graphics cards didn’t have to worry about shading pixels, as they only did 2D. The advent of 3D cards like the Voodoo Graphics card allowed the card to do flat shading and simple texturing to the pixel. The advent of the TNT allowed two textures a clock to be applied to each pixel. The GeForce 256 card was the first commercially successful 3D card to be able to perform some limited shading with its NSR (NVIDIA Shader Rasterizer) such as bump mapping (emboss, Dot Product 3), shadow volumes, volumetric explosion, elevation maps, vertex blending, waves, specular lighting on a per-pixel basis.
The advent of the GeForce3 brought limited programmable pixel shaders into the limelight, as it was the basis for the graphics chip on the Xbox as well as a PC graphics chip. The GeForce DDR did its shading with fixed-function hardware that couldn’t be used for other features. The GeForce 3 could do up to 4 texture address operations for fetching texels (4 texels in a pass), and 8 texture blending operations per pass, totaling 12 pixel shader operations per program. The RADEON 8500 was capable of 22 pixel shader operations in a single pass. The next step was the RADEON 9700 Pro and Shader Model 2.0. The advantages of SM 2.0 include up to 96 Pixel Shader operations per pass (65,536 operations in SM 2.0+ on the GeForce 5 series, but that’s another story altogether), along with features such as 96-bit precision color, HDR, and many others. Vertex Shader 2.0 introduced long VS programs (65,535+ via dynamic branching) and other features.
NVIDIA decided to go with a Unified Shader Architecture with the G80 chip. The benefits of Unified versus discrete Pixel and Vertex Shader units is that the unified shader can be allocated to either Pixel, Vertex, Geometry or Physics shaders as needed by the application. In a scene where more Vertex Shading is required, the G80 can allocate more of its Stream Processors to Vertex Shading.
The 8800GTS has 96 individual Stream processors. Each processor is capable of being used for pixel, vertex, geometry or physics operations. The classical graphics card had many pipeline stages for each of the major stages. The pixel shader alone on the GeForce 7 required over 200 pipeline stages. The G80 architecture streamlines the number of stages because of the nature of the Unified Shader: data leaving the shader core loops back to the top of the shader core for further processing until all shader operations are done, and the fragment is then sent to the ROP to be written to memory. Each Stream Processor is fully decoupled and scalar, can dual-issue a MUL and a MAD (Multiply and Multiply Add), and supports IEEE754 floating-point precision.
Each Stream Processor is clocked at 1.2 GHz. You heard that right, 1.2 GHz. The GeForce 8800GTS has three different clock domains: the core clock, the memory clock and the shader clock. The 8800GTS delivers approximately 346 gigaflops of peak shader power (96 processors × 1.2 GHz × 3 floating-point operations per clock from the dual-issued MAD and MUL), and is much more efficient with its floating point operations than a traditional architecture with instruction issue limitations like the X1950, which can only issue 3+1 instructions in a clock.
In gaming situations, there generally has been more need for pixels than vertices, hence the classical graphics card came with a higher number of pixel shader units than vertex shader units. There are situations where the application needs more vertices than pixels, and the graphics card is limited to the vertex speed. Take for example a GeForce 7800GTX with 24 Pixel Shader units and 8 Vertex Shader Units. There are instances where the card will only go as fast as the vertices are fed.
The Unified Shader architecture on the G80 can allocate the units in a different way. If the application calls for it, it can allocate 24 shader units to vertices and 8 to pixel shading, or assign units to geometry shading or physics shading, which are new with DirectX 10 in Microsoft’s Vista Operating System. The number of shader units applied to each task can be changed dynamically, allowing more efficient use of the card’s resources than the fixed 48:8 (R580) or 24:8 (G70) ratios of earlier video cards.
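A toy model shows why this matters. The sketch below compares a fixed 24:8 pixel-to-vertex split against a unified pool of 32 units, using made-up workload numbers. It assumes every unit does one unit of work per time step and that a fixed design is paced by its slower pool. Neither assumption comes from NVIDIA, and the function names are illustrative.

```python
def fixed_frame_time(vertex_work, pixel_work, vertex_units=8, pixel_units=24):
    """Fixed split: each pool only handles its own work, so the slower pool sets the pace."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_frame_time(vertex_work, pixel_work, total_units=32):
    """Unified pool: any unit can take any work, so the load spreads evenly."""
    return (vertex_work + pixel_work) / total_units


def unified_split(vertex_work, pixel_work, total_units=32):
    """Units handed to vertex and pixel work, in proportion to the load."""
    vertex_units = round(total_units * vertex_work / (vertex_work + pixel_work))
    return vertex_units, total_units - vertex_units


# Hypothetical workloads, in arbitrary units of shader work per frame.
for name, v, p in [("pixel-heavy", 800, 2400), ("vertex-heavy", 2400, 800)]:
    print(name, fixed_frame_time(v, p), unified_frame_time(v, p), unified_split(v, p))
```

In the pixel-heavy case the two designs tie, because the fixed 24:8 ratio happens to match the workload. In the vertex-heavy case the fixed design is held back by its 8 vertex units, while the unified pool shifts to a 24:8 split in favor of vertices and finishes in a third of the time.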
The new image quality enhancing features have been lumped together with a new NVIDIA trademark called Lumenex. Lumenex offers up to 16XQ anti-aliasing, 16x anisotropic filtering that is almost perfect and of higher quality than ATI’s AF, and support for up to 128-bit HDR. Image quality is important to me as a reviewer, and the 8800GTS and GTX deliver the highest possible IQ on the market today.
Here’s a scene from Oblivion magnified. Note that no AA shows jaggies throughout the scene. 2x AA shows a limited smoothing of the jaggies. 4x AA shows much cleaner lines for the chapel. 8x AA looks like 4x AA, but look closely at the lines and see how they blur. 8xQ looks about how 8 samples should look; looking at 2x, 4x and 8xQ you can see the progression from step to step. 16x looks like 4x with cleaner lines on the chapel. 16xQ looks the cleanest of all the modes, but is also the most expensive in terms of performance.
NVIDIA totally redid their Anisotropic Filtering with the 8800 series cards. GeForce 3 cards had mostly angle-independent anisotropic filtering. What this means is that the GeForce 3 and 4 had the best looking anisotropic filtering. NVIDIA decided to change their AF to a more angle-dependent algorithm with the GeForce 5 series. This caused the AF to look worse than the competition by default. With the 8800, the AF pendulum has swung back in NVIDIA’s favor. Here’s a series of pictures of the AFTester. Note how NV’s 16x AF looks almost perfect, while ATI’s 16x AF has a bunch of spikes.
Texturing and Filtering
The GeForce 8800GTS has texturing units that are fully decoupled from the stream processors. The 8800GTS can deliver up to 40 pixels per clock of raw texture filtering horsepower (versus the 24 on the GeForce 7900 GTX), 20 pixels per clock of texture addressing, 20 pixels per clock of 2x anisotropic filtering, and 20 bilinear filtered pixels per clock. What this means is that 2x bilinear anisotropic filtering is almost free on the GeForce 8 (nothing is free in 3D). FP16 bilinear filtering output is 20 pixels per clock, and FP16 2:1 AF is done at 10 pixels per clock.
The texture units are clocked at 500 MHz on the GTS. With 20 pixels per clock of bilinear filtered texels that gives a texel fill rate of 10 Gigatexels a second which is the standard way to derive texel fill rate. However, when 2:1 bilinear anisotropic filtering is applied, two bilinear texels are used, giving the card an effective 20 gigatexel fill rate. Ugh, NVIDIA’s math isn’t important to me, as I like to deal with real numbers.
The GeForce 8800GTS can output 20 pixels (ROPs) in a single clock with color and Z-processing. The 20 ROPs are divided into 5 partitions of 4 (16 subpixel samples). There is a new mode called Z-only processing that allows 160 samples per clock. If 4X MSAA is applied then 40 pixels per clock of Z-only processing is possible. Again, ugh. In terms of pure pixel fillrate, the 8800GTS can output 20×500 MHz=10 Gigapixels a second theoretical, more than enough to fill a 2560×1600 screen many times.
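To put "many times" into a number, the short calculation below takes the 20 ROPs and 500 MHz core clock quoted above and works out how often a 2560×1600 screen could be filled each second. It is a theoretical peak only, assuming one pixel per ROP per clock with no overdraw, blending or memory stalls.

```python
ROPS = 20
CORE_CLOCK_HZ = 500e6
SCREEN_WIDTH, SCREEN_HEIGHT = 2560, 1600

# Theoretical peak: one pixel per ROP per core clock.
pixel_fill_rate = ROPS * CORE_CLOCK_HZ

# How many full screens that rate could cover every second.
screen_pixels = SCREEN_WIDTH * SCREEN_HEIGHT
fills_per_second = pixel_fill_rate / screen_pixels

print(f"{pixel_fill_rate / 1e9:.0f} Gpixels/s, about {fills_per_second:.0f} full-screen fills per second")
```

Real games draw many pixels more than once per frame and apply anti-aliasing, so the practical headroom is far smaller than this ceiling suggests.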
The memory architecture on the 8800GTS is split into 5 partitions of 64-bit memory. This matches well with the 20 outputted pixels per clock. Graphics cards have been stuck on the 256-bit memory bus since the RADEON 9700 Pro and the GeForce 5900 series. The 8800GTS is the first card on the commercial market to sport a 320-bit memory bus. EVGA clocked the memory on their 8800GTS at 800MHz (1.6GHz effective), the same as the reference clock. This provides the EVGA card with up to 64GB per second of memory bandwidth (320 bits ÷ 8 × 1.6GHz).
Shader Model 4- Microsoft Vista and DirectX 10
The GeForce 8800 GTX and GTS are the first video cards on the market to fully support DirectX 10 and Shader Model 4 in Microsoft Vista; there will not be a DirectX 10 for Windows XP. Microsoft will launch Vista early next year and the 8800GTX will be ready for it. DirectX 10 is a brand new API from Microsoft that is radically different from the last iteration, DirectX 9.
Early versions of DirectX had to check whether the hardware was capable of a feature like Pixel Shaders, Hardware Transformation and Lighting, or other features, as the first iteration of DirectX didn’t support these features and the hardware was limited at the time to simple texturing. To determine if hardware supported a feature, DirectX checked the hardware’s “Capability bits” or caps bits. DirectX 10 has gotten rid of caps bits, shortening the process.
DirectX 10 delivers new features including a texture array, predicated draw, and stream out. Texture arrays allow the graphics card to process up to 512 textures in an array at one time. Previously this was done on the processor, thus this frees up the CPU for other tasks. Predicated draw is what was known as an occlusion query in earlier DirectX models. Basically, if an object overlaps another in a scene the area that is covered by the other object is not rendered, saving power.
State Objects have been redefined for DirectX 10. In DirectX 9, state objects were defined for virtually every stage of the graphics pipeline. In DirectX 10, there are five state objects: InputLayout (vertex buffer layout), Sampler, Rasterizer, DepthStencil and Blend. These take the place of various pipeline stages, allowing one draw call when many were required before.
Constants are predefined values used as parameters in shader programs. For example, the number of lights in a scene, along with their intensity, color and position, are all defined by constants. As a scene changes the constants need updating. Constant Buffers allow for up to 4096 constants to be stored in a buffer that can be updated in one call, saving computation time.
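The idea behind a constant buffer can be sketched without any graphics API at all. The snippet below packs the lights of a scene into one contiguous block of bytes that could then be handed over in a single update, instead of setting each value separately. It is an illustration of the concept only: the layout, the `pack_lights` helper and the sample lights are hypothetical, and the 16 bytes per constant (four 32-bit floats) is an assumption about how constants are sized.

```python
import struct

MAX_CONSTANTS = 4096     # limit quoted in the review
BYTES_PER_CONSTANT = 16  # assumption: one constant is four 32-bit floats


def pack_lights(lights):
    """Pack (position, color, intensity) for each light into one contiguous buffer."""
    buf = bytearray()
    for position, color, intensity in lights:
        buf += struct.pack("<4f", *position, 1.0)     # x, y, z plus padding
        buf += struct.pack("<4f", *color, intensity)  # r, g, b plus intensity
    if len(buf) // BYTES_PER_CONSTANT > MAX_CONSTANTS:
        raise ValueError("too many constants for one buffer")
    return bytes(buf)


# Hypothetical scene with two lights.
lights = [
    ((0.0, 10.0, 0.0), (1.0, 1.0, 1.0), 0.5),
    ((5.0, 2.0, -3.0), (1.0, 0.5, 0.25), 0.75),
]
buffer = pack_lights(lights)
print(len(buffer), "bytes,", len(buffer) // BYTES_PER_CONSTANT, "constants")
```

When a light moves, the application rebuilds this one block and sends it in one call, which is the saving the paragraph above describes.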
Here’s a chart of the resources in DirectX 9 and 10
|Resources||DirectX 9||DirectX 10|
|Maximum Texture Size||4096×4096||8192×8192|
DirectX 10 hardware can also do Physics Shading. Physics is normally done on the CPU. With most CPUs, the number of onscreen objects bouncing around, the number of particles being displayed on the screen etc is limited by the CPU. The GeForce 8800 can do Physics Shading to alleviate the burden on the CPU and move it to the graphics card. The Havok Physics engine and the Physics Shader in DirectX 10 allow the physics to be done on the graphics card and the CPU together.
- EVGA 8800GTS video card
- Dark Messiah Might and Magic
- 2 DVI-I to D-Sub adapters
- HDTV Out cable
- User’s Manual
- Driver CD
- EVGA Stickers
- Lifetime Warranty
- Toll-Free Technical Support 24/7
EVGA has teamed with Ubisoft to offer Ubisoft’s latest and greatest game, Dark Messiah Might and Magic with both their 8800GTS and their 8800GTX cards. Might and Magic is a long-running series of games starting from the earliest days of computers. Dark Messiah uses Pixel Shaders 2.0 and Vertex Shader 2.0 programs to show off the monsters, effects and environment of the game.
The hardware bundle is what you would expect from a company like EVGA. Two DVI-I to D-sub adapters means you won’t need to purchase an adapter to run with two CRTs. Of course the CRT is virtually non-existent on the market today, but it’s great that two are included. An HDTV cable is included to allow you to attach the video card to an HDTV. A TV-Out cable is also included.
NVIDIA has done away with the traditional Control Panel with their 96.xx series drivers that are being released with the 8800GTS and 8800GTX cards. The new CP was introduced with the 7900 series last year, but the option was always there to use the old CP. With the advent of Windows Vista, the new interface matches the Vista model pretty closely.
EVGA Support is second to none in the video card industry and this continues with their GeForce 8 series of cards. They offer a Lifetime Warranty for all of their retail cards with an AR, AX, BR, BX, DR, DX, FR, FX, SG, SL or S2 prefix. That means if you have a defective product they will replace it. The second facet of their support is the 24/7 telephone support that’s Toll-Free. The last aspect of their support is their Step-Up program. The Step-Up program allows you to turn in your graphics card for a new one by paying the difference within 90 days of purchase.
Test Setup and Performance
Our test platform is slightly different. I decided to move to the fastest AMD processor available on the market, the FX-62 instead of the 5000+ we used previously. The performance metric of the FX-62 with its dual cores makes it a good choice. One other thing about the performance section, I’ve decided to show performance of the new 16x and 16XQ anti-aliasing modes. Most people will want to use 8XQ or 16x AA with the 8800GTS, as 16xQ mode is limited.
- AMD Athlon FX-62 CPU
- 2 GB Crucial Ballistix DDR2 667MHz memory
- EVGA 8800GTS video card running 96.41 Forceware drivers
- ASUS M2N32-SLI motherboard
- ASUS DVD-ROM
- Ultra X-Finity 600W PSU
- Windows XP with SP2
- DirectX 9.0c
- Doom 3 HQ settings Timedemo 1 1024×768 1280×1024 1600×1200 no AA no AF 4x AA 16x AF
- Quake4 1024×768 HQ settings Custom Timedemo 1280×1024, 1600×1200 no AA no AF
- COD2 Max settings 1280×1024, 1600×1200, 1920×1200, 2048×1536 no AA no AF, 4x AA 16x AF
- F.E.A.R. Max settings 1280×960, 1600×1200, 2048×1536 no AA no AF, 4x AA 16x AF
- Oblivion Max settings 1280×1024, 1600×1200, 1920×1200, 2048×1536 no AA no AF, 4x AA 16x AF
- 3Dmark06 1280×1024, 1600×1200, 1920×1200, 2048×1536 no AA no AF, 4x AA 16x AF
- 3Dmark05 1280×1024, 1600×1200, 1920×1200, 2048×1536, no AA no AF, 4xAA 16x AF
I reran the game performance tests with two of the new AA modes: 16x, which is the new Coverage Sample AA, and 16xQ, which is the highest quality mode of AA available on the GeForce 8800 series. The performance was great throughout in 16x mode, but 16xQ mode takes a lot of resources, meaning the 8800GTS will struggle with it in certain applications. Note that all performance numbers were taken with Transparency SuperSample AA and gamma correct AA turned on, and the AF optimizations turned off. After all, buying an 8800GTS or GTX card to turn off the IQ features is wrong ;). Note also that HDR is enabled in the Oblivion benchmarks.
Oblivion is one of the defining games of the Shader Model 3 era. Using a lot of Pixel and Vertex Shader 2.0 programs, the game is a resource hog. The 8800GTS is capable of playing the game with HDR and MSAA enabled. The 96.64 Forceware drivers enable FSAA and HDR if you force it with the Control Panel. The 8800GTS plays the game very well, all the way through the max resolution with 16x AA and 16x AF.
Dark Messiah Might and Magic is the game that was bundled with the 8800GTS from EVGA. This game is an excellent showcase of the 8800 cards as it uses the same engine as Half-Life 2 using Pixel Shader 2.0 and Vertex Shader 2.0. Role-Playing Games are my favorite genre along with Real-Time Strategy games. The 8800GTS plays this game wonderfully with up to 1280×1024 playable with the max AA + AF settings (16xQ and 16x AF). 2048×1536 is fully playable throughout with 16x AA and 16x AF.
Tomb Raider was one of the first games in the modern computer era that said “3D”. Starring Lara Croft as an adventurer, the Tomb Raider motif has spread to television, movies, action figures, and more. Tomb Raider Legend is the latest game in the series. This game features Pixel and Vertex Shaders throughout. The 8800GTS plays this game wonderfully with full resolution at 16xAA and 16x AF.
The 8800GTS has an MSRP of $449 from EVGA. At this price point, you get better image quality, more features, and faster performance than any other video card on the market today except the 8800GTX, which costs $150 more at retail. The 8800GTS has 25% fewer streaming processors than the 8800GTX, accounting for the performance difference.
16X Coverage Anti-Aliasing allows the graphics card to improve image quality without the serious hit that 16XQ AA would incur. The 8800GTS is the perfect upgrade for those wanting a graphics card today without wanting to pay the $599 premium price for the 8800GTX, as a lot of situations are CPU limited.
In earlier reviews, I’ve talked about the next major step in graphics: DirectX 10 is the next real jumping on point. Unfortunately, Microsoft delayed Vista until next year, and games supporting the new features of the 8800 series won’t appear till then. However, you get performance and image quality that’s second to none in the industry at the moment. DirectX 10 is the future and the 8800GTS and GTX are ready for it.