AMD rarely spoils us with fresh processor architectures. While Intel updates its design every two years, its competitor last did so in 2007, when it released the K10, a reworked version of the old K8. So the arrival of the all-new Bulldozer is a significant event. Over the next few years the architecture will become the basis for all AMD chips, and it is the company's first chance in a long time to compete with Intel in the performance race.
Walking in pairs
In creating Bulldozer, AMD engineers abandoned the proven strategy of refining and partially copying old designs. The structure of the chips is fundamentally different from what we are used to seeing in x86 systems.
The first and most important innovation is the original layout. All top versions of Bulldozer are officially equipped with eight cores. In reality, there are four full-fledged modules, each containing two compute units. It looks like this: two integer clusters (these are what AMD calls cores, and they do the actual calculations) share a front end, a floating-point cluster (FPU) and an enlarged second-level cache.
The benefits of such a tandem are savings in die area, lower power consumption and lower production costs. The downside is that sharing the same blocks hurts final performance. Under heavy load, one front end may not keep up with two cores. AMD does not deny the performance loss: according to the company, the pair is about 20% weaker than a full-fledged dual-core.
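As a rough illustration of AMD's estimate, the arithmetic can be sketched as follows. The 20% figure comes from the text; applying it as a uniform scaling factor to every module, and the function name itself, are simplifying assumptions made for this sketch.

```python
# Back-of-the-envelope estimate of Bulldozer throughput in "full core" equivalents.
# Assumption: the ~20% penalty quoted by AMD applies uniformly to every module.

MODULE_PENALTY = 0.20   # a module is about 20% weaker than a true dual-core
CORES_PER_MODULE = 2


def equivalent_full_cores(modules: int) -> float:
    """Return roughly how many independent cores `modules` Bulldozer modules match."""
    return modules * CORES_PER_MODULE * (1.0 - MODULE_PENALTY)


if __name__ == "__main__":
    for modules in (2, 3, 4):
        print(f"{modules} modules ~ {equivalent_full_cores(modules):.1f} full cores")
```

By this estimate, a four-module "eight-core" chip behaves more like six and a half independent cores under full load.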
Communication difficulties
To eliminate the bottleneck, the front end had to be taught to share its resources efficiently between the two cores. To that end, AMD reworked the branch prediction unit and the instruction decoder, which gained a fourth channel for processing instructions (as in Sandy Bridge) and Branch Fusion technology. The latter fuses certain instructions into a single operation. All of this should speed up the front end and keep the chip from idling.
As for the cores themselves, they comprise an out-of-order unit, a load/store unit, an L1 cache and two computing clusters. The out-of-order execution unit is now equipped with a physical register file. As in Sandy Bridge, it holds the storage addresses of working data, which takes load off the main out-of-order pipeline. The load/store unit received an enlarged buffer, doubled width and the ability to work with virtual addresses, which in theory should speed up access to the L1 data cache. That cache became four times smaller in Bulldozer: 16 KB versus 64 KB in K10. The loss was compensated by speed. L1 associativity rose from two to four ways, which means lookups are twice as effective.
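To make the trade-off between cache size and associativity more concrete, here is a small sketch that computes how many sets each L1 data cache has. The sizes and way counts come from the text; the 64-byte line size is an assumption, since the article does not state it.

```python
# Number of sets in a set-associative cache: size / (line_size * ways).
# Assumption: 64-byte cache lines for both designs (not stated in the article).

LINE_SIZE_BYTES = 64


def cache_sets(size_kb: int, ways: int, line_size: int = LINE_SIZE_BYTES) -> int:
    """Return the number of sets for a cache of `size_kb` kilobytes."""
    return (size_kb * 1024) // (line_size * ways)


k10_sets = cache_sets(64, 2)         # K10 L1 data cache: 64 KB, 2-way
bulldozer_sets = cache_sets(16, 4)   # Bulldozer L1 data cache: 16 KB, 4-way

print(f"K10: {k10_sets} sets, Bulldozer: {bulldozer_sets} sets")
```

Under that assumption the smaller cache has far fewer sets, but each lookup can choose among four candidate lines instead of two.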
There are three computing clusters in one module: two integer clusters and one for floating-point data. Compared to K10, each integer cluster lost one ALU (which performs calculations) and one AGU (which computes memory addresses). In theory, this means lower peak performance. In practice, the change will hardly be noticeable: it is difficult to load the integer clusters fully.
The biggest changes affected the FPU, which is responsible for complex floating-point calculations. Compared to K10 it has become much more powerful: it received a pair of MMX units and a pair of 128-bit FMAC units for addition and multiplication. Unlike in K10, the FMAC units are universal: they can stand in for each other, which improves calculation speed. In addition, they learned to fuse operations into a single step, which increased the accuracy of calculations.
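Why fusing a multiplication and an addition improves accuracy can be shown with a small sketch. It does not use the hardware FMAC units; instead it emulates a fused operation with exact rational arithmetic and a single final rounding, which is an assumption about how to model the effect. The function names and input values are chosen purely for illustration.

```python
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply and round, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused operation: compute a*b+c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# The exact product a*b is 1 - 2**-60, which does not fit in a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("separate:", separate_multiply_add(a, b, c))
print("fused:   ", fused_multiply_add(a, b, c))
```

With two roundings the tiny term is lost and the result collapses to zero, while the single-rounding version keeps it.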
In addition, the FPU received updated instruction sets. First, the processor now works with AVX, which uses 256-bit registers. Both FMAC units are engaged to process them. Second, Bulldozer supports SSE4.2, AES-NI, FMA4 and XOP instructions. The last two sets are unique to AMD. For you and me, all these changes mean one thing: instructions that used to take several clock cycles will now execute in one, which directly affects performance. To feel the speedup, though, the software has to support the new instructions.
Glue and scissors
As a result, each Bulldozer module consists of one front end, L2 and L1 data caches, two integer clusters and a floating-point block. There can be up to four such modules on one die. Each of them has access to a number of shared elements. The first is a dual-channel memory controller with support for DDR3-1866. The second is the L3 cache, whose size, compared with K10, grew from 6 to 8 MB, and whose associativity rose from 48 to 64 ways. Note that, unlike in Sandy Bridge, the L3 cache frequency does not match the core speed. While the top model runs at 3.6 GHz, the last-level cache runs at 2.2 GHz. This leads to noticeable latency that hurts performance. According to AMD, the sacrifice was made for the sake of stable operation at high frequencies.
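For a sense of scale, the theoretical peak bandwidth of such a memory controller can be estimated as below. The transfer rate and channel count come from the text; the 64-bit channel width is an assumption based on standard DDR3 modules, and the result is a ceiling rather than a measured figure.

```python
# Theoretical peak bandwidth of a dual-channel DDR3-1866 memory controller.
# Assumption: each DDR3 channel is 64 bits (8 bytes) wide.

TRANSFERS_PER_SECOND = 1866e6   # DDR3-1866: about 1866 million transfers per second
BYTES_PER_TRANSFER = 8          # 64-bit channel
CHANNELS = 2


def peak_bandwidth_gb_s(transfers: float = TRANSFERS_PER_SECOND,
                        width_bytes: int = BYTES_PER_TRANSFER,
                        channels: int = CHANNELS) -> float:
    """Return the peak bandwidth in gigabytes per second (1 GB = 10**9 bytes)."""
    return transfers * width_bytes * channels / 1e9


print(f"Peak bandwidth: {peak_bandwidth_gb_s():.1f} GB/s")
```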
Ta-da!
Despite the architectural tricks and the 32 nm process, Bulldozer occupies an impressive 315 square millimeters. That is about one and a half times more than the quad-core Sandy Bridge and the top Llano. Fortunately, power consumption stayed within reasonable limits: 125 watts.
In addition to the eight-core models, there are versions with six and four compute units. The lesser models are based on the same eight-core design, but with one or two modules disabled.
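How the lineup follows from disabling modules can be sketched in a few lines. The two cores per module come from the text; the 2 MB of L2 cache per module is an assumption inferred from the L2 figures in Table 1 (8, 6 and 4 MB), and the class and function names are hypothetical.

```python
from dataclasses import dataclass

TOTAL_MODULES = 4
CORES_PER_MODULE = 2
L2_MB_PER_MODULE = 2   # assumption inferred from Table 1: 8 MB across 4 modules


@dataclass(frozen=True)
class Chip:
    enabled_modules: int

    @property
    def cores(self) -> int:
        return self.enabled_modules * CORES_PER_MODULE

    @property
    def l2_mb(self) -> int:
        return self.enabled_modules * L2_MB_PER_MODULE


def derive_chip(disabled_modules: int) -> Chip:
    """Build a lesser model by switching off modules of the full design."""
    if not 0 <= disabled_modules < TOTAL_MODULES:
        raise ValueError("disabled_modules must be between 0 and 3")
    return Chip(TOTAL_MODULES - disabled_modules)


for disabled in (0, 1, 2):
    chip = derive_chip(disabled)
    print(f"{disabled} disabled: {chip.cores} cores, {chip.l2_mb} MB L2")
```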
The base frequency varies from 3.1 to 3.6 GHz. Like Sandy Bridge, Bulldozer has automatic overclocking technology. A dedicated unit responsible for Turbo Core 2.0 monitors the current load on the cores and the TDP level and, as soon as the opportunity arises, raises the processor frequency. On the top chip, with all modules active, the speed can be increased by 300 MHz; if part of the resources is idle, by 600 MHz. Under light load Bulldozer switches to a power-saving mode, which is handled by Cool'n'Quiet technology.
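The decision logic described above can be sketched roughly as follows. The 300 and 600 MHz steps are the top chip's figures from the text; treating any idle module as "part of the resources is idle" and reducing the TDP check to a single flag are simplifying assumptions, and the function name is hypothetical.

```python
# Sketch of the Turbo Core 2.0 decision for the top chip.
# Assumptions: any idle module counts as partial load, and TDP headroom is a
# simple yes/no flag rather than a continuously monitored value.

ALL_ACTIVE_BOOST_MHZ = 300
PARTIAL_LOAD_BOOST_MHZ = 600


def turbo_frequency_mhz(base_mhz: int, active_modules: int,
                        total_modules: int, tdp_headroom: bool) -> int:
    """Return the target clock for the current load and thermal budget."""
    if not 0 <= active_modules <= total_modules:
        raise ValueError("active_modules out of range")
    if not tdp_headroom:
        return base_mhz                           # no budget left: stay at base clock
    if active_modules < total_modules:
        return base_mhz + PARTIAL_LOAD_BOOST_MHZ  # some modules idle: larger boost
    return base_mhz + ALL_ACTIVE_BOOST_MHZ        # everything busy: smaller boost


print(turbo_frequency_mhz(3600, 4, 4, True))
print(turbo_frequency_mhz(3600, 2, 4, True))
```

For the 3.6 GHz model this gives 3.9 GHz with all modules loaded and 4.2 GHz with some idle, the latter matching the Turbo Core figure listed for the FX-8150 in Table 1.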
Manual overclocking is simple. First, the entire lineup has an unlocked multiplier. Second, the newcomers scale well: under liquid nitrogen, the top Bulldozer set a new world record of 8429 MHz.
Companions
Bulldozer runs on Socket AM3+. In essence, this is a slightly updated AM3 with one additional pin. The chipsets for the new socket are called 990FX, 990X and 970. They differ in their PCIe 2.0 controller: the top model has 32 lanes, the lesser ones 16. Both 990FX and 990X support CrossFireX. Among the chipsets' features, we note six SATA Rev. 3 ports and 14 USB 2.0 ports. There is no USB 3.0 controller.
Note that Bulldozer can work on old boards. All that is needed is an updated BIOS. The restrictions: Turbo Core and Cool'n'Quiet respond more slowly, and some of the power-saving features are unavailable.
The Bulldozer architecture turned out to be interesting. AMD has finally stopped copying itself and come up with something really new. Unfortunately, there are few obvious advantages over the competition. There are no eight true cores. Strictly speaking, we have quad-core models with an increased number of compute units, something like Intel Hyper-Threading but at the hardware level. The idea is good, but performance will depend on how fast the front end turned out to be. Of Bulldozer's real advantages, we can single out only the powerful floating-point FPU and the higher clock speeds compared to K10.
Roll out! Sweep!
AMD has announced plans for its next processors. The company expects to refresh the architecture annually, each time achieving roughly a 15 percent gain in performance per watt. If AMD sticks to the plan, in 2012 we will see the Piledriver architecture, a year later Steamroller, and 2014 will be marked by the announcement of Excavator. Quite the construction site.
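What annual gains of about 15 percent would add up to can be worked out with a short sketch. It assumes the gain compounds at exactly 15% per generation, which is a simplification of AMD's stated target rather than a forecast.

```python
# Compounded performance-per-watt gain relative to Bulldozer.
# Assumption: exactly 15% per generation, compounding every year.

ANNUAL_GAIN = 0.15
ROADMAP = ((2012, "Piledriver"), (2013, "Steamroller"), (2014, "Excavator"))


def cumulative_gain(generations: int, annual_gain: float = ANNUAL_GAIN) -> float:
    """Return the performance-per-watt multiplier after `generations` updates."""
    return (1.0 + annual_gain) ** generations


for generation, (year, name) in enumerate(ROADMAP, start=1):
    print(f"{year} {name}: x{cumulative_gain(generation):.2f}")
```

On these assumptions, the third generation would deliver roughly one and a half times Bulldozer's performance per watt.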
The wrong Windows
According to AMD, Windows 7 is unable to unlock the full potential of the new design: the OS scheduler does not take Bulldozer's features into account. For example, for the new processors it is important that related threads be assigned to one module; otherwise the cores will exchange data not through the fast L2 cache but through the third-level cache. Some independent threads are also better handled the same way, to increase the efficiency of Turbo Core 2.0. At the same time, certain tasks put a heavy load on the front end, and it is better to spread them across different modules. Thanks to cooperation with Microsoft, these nuances will be taken into account in the Windows 8 scheduler. However, you should not expect a significant performance gain.
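The idea of module-aware placement can be illustrated with a toy sketch. It is not how the Windows scheduler works; the core numbering (cores 2m and 2m+1 belong to module m), the function names and the round-robin policy are all assumptions made for illustration.

```python
# Toy placement policy: threads that share data go to the same module so that
# they can communicate through the shared L2 cache.
# Assumption: cores 2*m and 2*m + 1 belong to module m (hypothetical numbering).

CORES_PER_MODULE = 2


def module_cores(module: int) -> list[int]:
    """Return the core indices that make up one module."""
    first = module * CORES_PER_MODULE
    return list(range(first, first + CORES_PER_MODULE))


def place_thread_groups(groups: list[list[str]], modules: int = 4) -> dict[str, int]:
    """Map each thread name to a core, keeping every group inside one module."""
    placement: dict[str, int] = {}
    for index, group in enumerate(groups):
        cores = module_cores(index % modules)   # one module per group, round-robin
        for position, thread in enumerate(group):
            placement[thread] = cores[position % len(cores)]
    return placement


print(place_thread_groups([["decode", "render"], ["network"]]))
```

Front-end-heavy tasks would call for the opposite policy, one thread per module, which in this sketch simply means passing each such task as its own group.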
Glossary
Integer cluster – handles operations on integers (1, 2, 10).
Front end – the front-end unit. Receives instructions from the program and translates them into a form the processor understands.
FPU – floating-point cluster. Performs calculations on fractional numbers (1.2345) and on values with exponents (1.2345E-10).
Branch prediction unit – predicts in advance which data and operations the program may need next. Keeps the processor from idling.
Instruction decoder – breaks the program down into micro-operations, which the computing clusters then use.
Out-of-order – out-of-order execution unit. Schedules operations for execution, dispatching only those instructions whose data is ready.
Load/store unit (LSU) – manages the movement of data between the pipeline output and the L1 data cache.
Cache associativity – the way the rows and columns of the cache are linked. The higher the associativity, the slower the lookup, but the more effective it is.
MMX – a set of units for working with numbers up to 8 bytes wide.
Instruction sets – allow a single instruction to perform an operation on several pieces of data.
Table 1
Technical characteristics of AMD Bulldozer processors

Parameter               FX-8150     FX-8120     FX-8100     FX-6100     FX-4100
Number of cores         8           8           8           6           4
Base frequency          3.6 GHz     3.1 GHz     3.1 GHz     3.3 GHz     3.6 GHz
Turbo Core frequency    4.2 GHz     4 GHz       3.7 GHz     3.9 GHz     3.8 GHz
L2 cache                8 MB        8 MB        8 MB        6 MB        4 MB
L3 cache                8 MB        8 MB        8 MB        8 MB        8 MB
Memory support          DDR3-1866   DDR3-1866   DDR3-1866   DDR3-1866   DDR3-1866
Power consumption       125 W       125 W       125 W       95 W        95 W
Process technology      32 nm       32 nm       32 nm       32 nm       32 nm
Socket                  AM3+        AM3+        AM3+        AM3+        AM3+
Price, November 2011    9200 rub.   7200 rub.   unknown     5500 rub.   4000 rub.