5.1
TB/AMP/2010
(a) How to issue multiple instructions, and (b) how to execute them concurrently. Keeping
these two issues in view, instruction-level parallel (ILP) architectures may be further
divided into two classes: (i) Very Long Instruction Word (VLIW) architecture and
(ii) Superscalar architecture.
Superscalar Execution
The salient feature of the Pentium is its superscalar architecture. To execute multiple
instructions concurrently, the Pentium microprocessor issues two instructions in parallel to
two independent integer pipelines, known as the U and V pipelines. Each of these pipelines
has 5 stages, as shown in Fig. 5.2. These pipeline stages are similar to those of the 80486
CPU. The functions of the stages are summarized below.
1. In the prefetch stage of the pipeline, the CPU fetches the instructions from the
instruction cache, which stores the instructions to be executed. In this stage, the CPU
also aligns the code appropriately. This is required because the instructions are of
variable length and the initial opcode bytes of each instruction should be
appropriately aligned. After the prefetch stage, there are two decode stages D1 and
D2.
2. In the D1 stage, the CPU decodes the instruction and generates a control word. For
simple RISC-like instructions involving register data transfer or arithmetic and
logical operations, a single control word may be sufficient to start the execution.
However, the x86 architecture also supports complex CISC instructions, which require
microcoded control sequencing.
3. Thus a second decode stage D2 is required where the control word from D1 stage is
again decoded for final execution. Also the CPU generates addresses for data
memory references in this stage.
4. In the execution stage, known as E stage, the CPU either accesses the data cache for
data operands or executes the arithmetic/logic computations or floating-point
operations in the execution unit.
5. In the final stage of the five stage pipeline, which is the WB (writeback) stage, the
CPU updates the register contents or the status in the flag register, depending upon
the execution result.
Although, as mentioned, the Pentium pipeline structure is similar to that of the 80486,
the Pentium achieves considerable speed-up by integrating additional hardware in each
pipeline stage. For example, while the 80486 may take two clock cycles to decode some
instructions, the Pentium takes only one.
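To make the benefit of two pipelines concrete, the sketch below counts clocks for an idealized instruction stream flowing through the five stages described above (PF, D1, D2, E, WB). It is a simplification, not a model of the real Pentium: it assumes every instruction is pairable and ignores stalls, cache misses, and branch mispredictions. The function names are hypothetical.

```python
# Idealized clock count for the Pentium's U and V integer pipelines.
# Assumptions (not from the text): every instruction is pairable, and there
# are no stalls, cache misses, or branch mispredictions.
import math

STAGES = ("PF", "D1", "D2", "E", "WB")

def timeline(n_instr, pipes=2):
    """Map each instruction index to its list of (clock, stage); clocks start at 1."""
    table = {}
    for i in range(n_instr):
        issue_clock = i // pipes + 1   # a U/V pair enters PF in the same clock
        table[i] = [(issue_clock + s, name) for s, name in enumerate(STAGES)]
    return table

def total_clocks(n_instr, pipes=2):
    """Clocks until the last instruction leaves WB: k + ceil(N / pipes) - 1."""
    if n_instr == 0:
        return 0
    return len(STAGES) + math.ceil(n_instr / pipes) - 1

print(total_clocks(8, pipes=1))  # 12: one pipeline
print(total_clocks(8, pipes=2))  # 8: U and V pipelines together
```

Real programs gain less than this, because not every pair of adjacent instructions can be issued together.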
Fig. 5.3. The 8-byte wide memory banks of the Pentium microprocessor.
Memory selection is accomplished with the bank enable signals (BE7-BE0). These
separate memory banks allow the Pentium to access any single byte, word, doubleword, or
quadword with one memory transfer cycle, provided the data do not cross an 8-byte
(quadword) boundary. As with earlier memory selection logic, we often generate eight
separate write strobes for writing to the memory system.
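As an illustration of bank selection, the sketch below computes which bank enables would be active for an access that fits inside one quadword. Assumptions: the mask is shown active-high (bit n stands for BEn, although the pins themselves are active low), the quadword address is the value carried on A31-A3, and the helper name is hypothetical.

```python
# Which Pentium bank enables are active for a single-cycle access.
# Assumption: bit n of the mask represents BEn, shown active-high here.

def bank_enables(address, size):
    """Return (quadword address on A31-A3, BE mask) for a 1-, 2-, 4- or 8-byte access."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    offset = address & 0x7                 # starting byte lane within the quadword
    if offset + size > 8:
        raise ValueError("access crosses a quadword boundary: two bus cycles needed")
    mask = ((1 << size) - 1) << offset     # one bit per enabled bank
    return address >> 3, mask

quad, mask = bank_enables(0x1002, 4)
print(hex(quad), format(mask, "08b"))      # 0x200 00111100
```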
A new feature added to the Pentium is its capability to check and generate parity for the
address bus (A31-A5) during certain operations. The AP pin provides the system with
parity information, and the APCHK output indicates a bad parity check for the address bus.
The Pentium takes no action when an address parity error is detected. The system must
assess the error and take appropriate action (such as an interrupt), if so desired.
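The parity relationship can be sketched in a few lines. This assumes even parity over A31-A5, so that the 27 address lines plus AP always contain an even number of 1s; the helper names are hypothetical.

```python
# Address parity over A31-A5, assuming even parity (illustrative sketch).

def ap_bit(address):
    """Parity bit that would be driven on AP for this address."""
    a31_a5 = (address >> 5) & ((1 << 27) - 1)   # 27 address lines A31..A5
    return bin(a31_a5).count("1") & 1            # 1 if an odd number of ones

def apchk(address, ap):
    """True when a parity error would be flagged (APCHK asserted)."""
    return ap_bit(address) != ap

print(ap_bit(0x20), ap_bit(0x60), ap_bit(0x1F))  # 1 0 0 (A4-A0 are not covered)
```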
The Pentium can function with a 32-bit wide memory system by using a set of
bidirectional multiplexers (bidirectional buffers used as multiplexers) to convert the
Pentium's 64-bit data bus into a 32-bit data bus. Care must be taken with this arrangement
because software could access a doubleword that crosses the boundary between the lower
and upper halves of the data bus. All doublewords must therefore be stored at doubleword
boundaries, that is, at addresses divisible by 4.
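A short sketch shows why the alignment rule matters: an aligned doubleword sits entirely in one half of the 64-bit bus, while any misaligned one spans two halves. The half labels and function name are our own, for illustration only.

```python
# Which half of the 64-bit data bus a doubleword occupies (illustrative).

def dword_half(address):
    """Return 'lower' (D31-D0) or 'upper' (D63-D32); reject misaligned doublewords."""
    if address % 4 != 0:
        raise ValueError("misaligned doubleword spans two 32-bit halves")
    return "lower" if (address & 0x4) == 0 else "upper"

print(dword_half(0x100), dword_half(0x104))  # lower upper
```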
Input/output System
The input/output system of the Pentium is completely compatible with earlier Intel
microprocessors. The I/O port number appears on address lines A15-A3, with the bank
enable signals selecting the actual memory banks used for the I/O transfer.
Beginning with the 80386 microprocessor, I/O privilege information (the I/O permission bit
map) is included in the task state segment (TSS), and the Pentium uses it when operated in
protected mode. This allows I/O ports to be selectively inhibited. If a blocked I/O location
is accessed, the Pentium generates a type 13 interrupt (general protection fault) to signal
an I/O privilege violation.
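The selective inhibition can be pictured as a bit lookup. In this sketch (a model, not processor code) each I/O port has one bit in the permission bit map, a 1 blocks the port, and a multi-byte access is allowed only if every port it touches is unblocked. On real hardware the map is consulted only when the current privilege level is numerically greater than IOPL (or in virtual 8086 mode); that check is omitted here, and the function name is hypothetical.

```python
# Model of an I/O permission bit map lookup (illustrative only).
# Assumptions: one bit per port, 1 = blocked, and ports beyond the end of
# the map are treated as blocked. The CPL/IOPL check is not modelled.

def io_allowed(bitmap, port, width=1):
    """Return True if every port touched by a `width`-byte access is permitted."""
    for p in range(port, port + width):
        byte_index, bit = p >> 3, p & 7
        if byte_index >= len(bitmap):
            return False                # outside the map: access denied
        if bitmap[byte_index] & (1 << bit):
            return False                # port explicitly blocked
    return True

bitmap = bytearray(8192)                # 65536 ports / 8 bits per byte
bitmap[0x3F8 >> 3] |= 1 << (0x3F8 & 7)  # block port 3F8H only
print(io_allowed(bitmap, 0x3F8))        # False: would cause a type 13 interrupt
print(io_allowed(bitmap, 0x3F9))        # True
```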
CD
Cache disable controls the internal cache. If CD = 1, the cache will not fill
with new data for cache misses, but it will continue to function for cache hits.
If CD = 0, misses will cause the cache to fill with new data.
NW
Not write-through selects the mode of operation for the data cache. If NW =
1, the data cache is inhibited from cache write-through.
AM
Alignment mask enables alignment checking when set. Note that alignment
checking only occurs for protected mode operation when the user is at
privilege level 3.
WP
Write protect protects read-only user-level pages against supervisor-level write
operations. When WP = 1, a supervisor write to a read-only user page causes a page
fault; when WP = 0, the supervisor can write to such pages.
NE
ET
TS
Indicates that the 80386 has switched tasks (in protected mode, changing the
contents of TR places a 1 into TS). If TS = 1, a numeric coprocessor
instruction causes a type 7 (coprocessor not available) interrupt.
EM
MP
PE
Is set to select the protected mode of operation for the 80386. It may also be
cleared to re-enter real mode. In the 80286, this bit could only be set: the
80286 could not return to real mode without a hardware reset, which
precluded its use in most systems that use protected mode.
VME
Virtual mode extension enables support for the virtual interrupt flag in
virtual 8086 mode. If VME = 0, virtual interrupt support is disabled.
PVI
Protected mode virtual interrupt enables support for the virtual interrupt flag
in protected mode.
TSD
DE
PSE
MCE
The Pentium contains new features that are controlled by CR4 and a few bits in CR0.
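The control-register bits listed above can be decoded with a table lookup. The bit positions below are those given in Intel's architecture manuals (PG, CR0 bit 31, is included although it is not described above); the dictionaries and the `decode` helper are illustrative names of our own.

```python
# Decode the CR0 and CR4 bits discussed above (illustrative sketch).
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 6: "MCE"}

def decode(value, names):
    """Return the names of the set bits, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]

print(decode(0x80000011, CR0_BITS))  # ['PE', 'ET', 'PG']
print(decode(0x00000011, CR4_BITS))  # ['VME', 'PSE']
```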
EFLAG Register
The extended flag (EFLAG) register has been changed in the Pentium microprocessor.
Figure 5.5 pictures the contents of the EFLAG register. Four new flag bits have been added
to this register to control or indicate conditions about some of the new features in the
Pentium.
The identification (ID) flag is used to test for the CPUID instruction. If a program can set
and clear the ID flag, the processor supports the CPUID instruction.
VIP
VIF
Virtual interrupt flag is a virtual image of the interrupt flag IF, used with VIP.
AC
VM
Virtual mode flag: If this flag is set, the 80386 enters virtual 8086 mode within
protected mode. It is to be set only when the 80386 is in protected mode. In
this mode, if any privileged instruction is executed, exception 13 is generated.
This bit can be set only in protected mode, using the IRET instruction or a task
switch operation.
RF
Resume flag: This flag is used with the debug register breakpoints. It is checked at
the start of every instruction cycle, and if it is set, any debug fault is ignored
during that instruction cycle. RF is automatically reset after the successful execution
of every instruction except IRET and POPF, which set RF to the value held in the flag
image on the stack. RF is also not automatically cleared after the successful execution
of JMP, CALL, and INT instructions that cause a task switch.
NT
IOP
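The flag layout and the ID-flag test for CPUID can be sketched as follows. Python cannot read EFLAGS itself, so the sketch assumes the two EFLAGS values (before and after a program tries to invert bit 21) were obtained elsewhere, for example by PUSHFD/POPFD assembly code. Bit positions follow Intel's manuals; all names are illustrative.

```python
# Decode EFLAGS and apply the ID-flag test for CPUID (illustrative sketch).
EFLAGS_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
               9: "IF", 10: "DF", 11: "OF", 14: "NT", 16: "RF",
               17: "VM", 18: "AC", 19: "VIF", 20: "VIP", 21: "ID"}
ID_MASK = 1 << 21

def decode_eflags(value):
    """Return (names of set single-bit flags, IOPL field value)."""
    flags = [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]
    iopl = (value >> 12) & 0b11         # IOPL occupies bits 12 and 13
    return flags, iopl

def cpuid_supported(before, after_toggle):
    """`after_toggle` is EFLAGS read back after a program tried to invert bit 21.
    If the bit really changed, the processor supports CPUID."""
    return bool((before ^ after_toggle) & ID_MASK)

print(decode_eflags(0x00203202))        # (['IF', 'ID'], 3)
```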
the Pentium microprocessor. Pay close attention to the way the linear address is used with
this scheme. Notice that the leftmost 10 bits of the linear address select an entry in the page
directory (just as with 4K pages). Unlike 4K pages, there are no page tables; instead, the
page directory addresses a 4M-byte memory page.
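The difference between the two page sizes shows up in how a 32-bit linear address is split. A minimal sketch (function names hypothetical):

```python
# Split a 32-bit linear address for 4K and 4M paging (illustrative sketch).

def split_4k(linear):
    """(directory index, page-table index, offset) for 4K pages: 10/10/12 bits."""
    linear &= 0xFFFFFFFF
    return linear >> 22, (linear >> 12) & 0x3FF, linear & 0xFFF

def split_4m(linear):
    """(directory index, offset) for 4M pages: 10/22 bits, no page table."""
    linear &= 0xFFFFFFFF
    return linear >> 22, linear & 0x3FFFFF

print([hex(x) for x in split_4k(0x12345678)])  # ['0x48', '0x345', '0x678']
print([hex(x) for x in split_4m(0x12345678)])  # ['0x48', '0x345678']
```

In both cases the leftmost 10 bits select the same page-directory entry; with 4M pages the remaining 22 bits are used directly as the offset into the page.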
System-Management Mode
The system-management mode (SMM) is on the same level as protected mode, real mode,
and virtual mode, but it is provided to function as a manager. SMM is not intended to
be used by applications or as a general system-level feature; it is intended for
high-level system functions such as power management and security, which most
Pentium-based systems use during operation.
Access to SMM is accomplished via a new external hardware interrupt applied to the
SMI pin on the Pentium. When the SMM interrupt is activated, the processor begins
executing system-level software in an area of memory called the system management RAM
(SMRAM), which also holds the SMM state dump record. The SMI interrupt disables all other
interrupts that are normally handled by user applications and the operating system. A return
from the SMM interrupt is accomplished with a new instruction, RSM, which returns to the
interrupted program at the point of the interruption.
The SMM interrupt calls the software, initially stored at memory location 38000H, using
CS=3000H and EIP = 8000H. This initial state can be changed using a jump to any location
within the first 1M byte of memory. An environment similar to real-mode memory
addressing is entered by the management mode interrupt, but it is different because, instead
of being able to address the first 1M of memory, SMM mode allows the Pentium to treat the
memory system as a flat, 4G-byte system.
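The link between CS = 3000H, EIP = 8000H and location 38000H is the familiar real-mode rule: shift the segment left four bits and add the offset. A small sketch of that arithmetic, which also checks the size of the dump record area quoted below (helper name hypothetical):

```python
# Real-mode style address arithmetic for the SMM entry point (illustrative).

def real_mode_address(segment, offset):
    """Physical address = segment * 16 + offset."""
    return (segment << 4) + offset

print(hex(real_mode_address(0x3000, 0x8000)))  # 0x38000

dump_bytes = 0x3FFFF - 0x3FFA8 + 1              # inclusive range 3FFA8H-3FFFFH
print(dump_bytes)                               # 88
```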
In addition to executing software that begins at location 38000H, the SMM interrupt also
stores the state of the Pentium in what is called a dump record. The dump record is stored at
memory locations 3FFA8H through 3FFFFH, with an area at locations 3FE00H through
3FEF7H that is reserved by Intel. The dump record allows a Pentium-based system to enter
a sleep mode and reactivate at the point of program interruption. This requires that the
SMRAM be powered during the sleep period. Many laptop computers have a separate
battery to power the SMRAM for many hours during sleep mode.
The halt auto restart and I/O trap restart data are used when SMM is exited by the
RSM instruction. These data allow the RSM instruction to return to the halt state or to
restart the interrupted I/O instruction. If neither a halt nor an I/O operation is in effect
upon entering SMM, the RSM instruction reloads the state of the machine from the state
dump and returns to the point of interruption.
The SMM mode can be used by the system before the normal operating system is placed in
the memory and executed. It can also periodically be used to manage the system, provided
that normal software doesn't exist at locations 38000H-3FFFFH. If the system relocates the
SMRAM before booting the normal operating system, it becomes available for use in
addition to the normal system.
The base address of the SMRAM is changed by modifying the value in the state dump base
address registers (locations 3FEF8H through 3FEFBH) after the first SMM interrupt. When
the first RSM instruction is executed, returning control back to the interrupted system, the
new value from these locations changes the base address of the SMM interrupt for all future
uses. For example, if the state dump base address is changed to 000E8000H, all subsequent
SMM interrupts use locations E8000H-EFFFFH for the Pentium state dump. These locations
are compatible with DOS and Windows.
PENTIUM II
Pentium II is also a 32-bit processor, with a 64-bit data bus and a 36-bit address bus that
can address up to 64 GB of physical memory. It is essentially a Pentium Pro processor with
on-chip MMX (multimedia extension). It is available with internal clock ratings from
233 MHz to 450 MHz.
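The 64 GB figure follows directly from the width of the address bus; a quick check (helper name hypothetical):

```python
# Physical address space versus number of address lines.
GIB = 2 ** 30

def addressable_bytes(address_lines):
    return 2 ** address_lines

print(addressable_bytes(32) // GIB)  # 4  (32 address lines)
print(addressable_bytes(36) // GIB)  # 64 (36 address lines, Pentium II)
```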
The features of the Pentium II processor are:
(i)
(ii)
Integrated primary (L1) 16-KB instruction cache and 16-KB write-back data cache.
(iii)
(iv)
(v)
(vi)
Quick Start and Deep Sleep modes provide extremely low power dissipation.
(vii)
Low-power GTL+ processor system bus interface (GTL: Gunning Transceiver
Logic).
(viii)
(ix)
The features are indicated in the EDX register after executing the CPUID instruction with a
1 in EAX. Only two new features are returned in EDX for the Pentium II. Bit position 11
indicates whether the microprocessor supports the two new fast call instructions,
SYSENTER and SYSEXIT. Bit position 23 indicates whether the microprocessor supports
the MMX instruction set. Most of the remaining bits are identical to those of earlier versions
of the microprocessor and are not described here, but a few deserve mention. Bit 16 indicates
whether the microprocessor supports the page attribute table, or PAT. Bit 17 indicates whether
the microprocessor supports the 36-bit page size extension found with the Pentium Pro and
Pentium II microprocessors, which allows memory above 4G through 64G to be addressed.
Finally, bit 24 indicates whether the fast floating-point save and restore instructions
(FXSAVE and FXRSTOR) are implemented.
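Python cannot execute CPUID directly, so this sketch assumes the EDX value has already been obtained (for example, by assembly code run with EAX = 1) and only decodes the bits discussed above. The short names are Intel's feature mnemonics; the helper is hypothetical.

```python
# Decode the CPUID (EAX = 1) EDX feature bits discussed above.
CPUID1_EDX_BITS = {11: "SEP", 16: "PAT", 17: "PSE-36", 23: "MMX", 24: "FXSR"}

def pentium_ii_features(edx):
    """Names of the listed feature bits that are set in EDX."""
    return [name for bit, name in sorted(CPUID1_EDX_BITS.items()) if edx & (1 << bit)]

print(pentium_ii_features((1 << 11) | (1 << 23)))  # ['SEP', 'MMX']
```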
SYSENTER and SYSEXIT Instructions
The SYSENTER and SYSEXIT instructions use the fast call facility introduced in the
Pentium II microprocessor. Please note that these instructions function only in protected
mode: SYSENTER transfers control to ring zero (privilege level 0), and SYSEXIT, which must
be executed in ring 0, returns to ring 3. Windows operates in ring 0, but does not allow
applications access to ring 0. These new instructions are meant for operating system
software.
The SYSENTER instruction uses some of the model-specific registers to store CS, EIP, and
ESP to execute a fast call to a procedure defined by the model-specific registers. The fast
call is different from a regular call because it does not push the return address onto the
stack as a regular call does. Table 5.2 illustrates the model-specific registers used with
SYSENTER and SYSEXIT. Note that the model-specific registers are read with the RDMSR
instruction and written with the WRMSR instruction.
TABLE 5.2 The model-specific registers used with
SYSENTER and SYSEXIT.
To use the RDMSR or WRMSR instructions, place the register number in the ECX register.
If WRMSR is used, place the new data for the register in EDX:EAX. For the SYSENTER
registers, you need to use only the EAX register, but place a zero into EDX. If the
RDMSR instruction is used, the data are returned in the EDX:EAX register pair.
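The EDX:EAX convention can be shown as plain bit arithmetic. The MSR numbers 174H-176H are the SYSENTER register addresses given in Intel's manuals; the entry-point value is a made-up example, and the dictionary and helpers are illustrative, not a driver interface.

```python
# How a 64-bit MSR value maps onto EDX:EAX for RDMSR/WRMSR (illustrative).
SYSENTER_MSRS = {"SYSENTER_CS": 0x174, "SYSENTER_ESP": 0x175, "SYSENTER_EIP": 0x176}

def to_edx_eax(value):
    """Split a 64-bit value into the (EDX, EAX) pair written by WRMSR."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF

def from_edx_eax(edx, eax):
    """Rebuild the 64-bit value that RDMSR returns in EDX:EAX."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)

ecx = SYSENTER_MSRS["SYSENTER_EIP"]
edx, eax = to_edx_eax(0x80401000)       # hypothetical kernel entry offset
print(hex(ecx), hex(edx), hex(eax))     # 0x176 0x0 0x80401000
```

Because the SYSENTER registers hold 32-bit values, EDX is simply zero, as the text notes.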
To use the SYSENTER instruction, first load the model-specific registers with the address
of the system entrance point, using the SYSENTER_CS and SYSENTER_EIP registers. This
would normally be an entry point in the operating system, such as Windows or Windows NT.
Note that this instruction is meant as a system instruction to access code or software in ring
0. The stack segment register is loaded with the value placed into SYSENTER_CS plus 8.
In other words, the selector pair addressed by the SYSENTER_CS selector value is loaded
into CS and SS. The stack offset (ESP) is loaded from SYSENTER_ESP.
The SYSEXIT instruction loads CS and SS with the selector pair addressed by
SYSENTER_CS plus 16 and 24, respectively. Table 5.3 illustrates the selectors from the
global descriptor table, as addressed by SYSENTER_CS. In addition to loading the code and
stack segment selectors and the memory segments that they represent, the SYSEXIT
instruction passes the value in EDX to the EIP register and the value in ECX to the ESP
register. The SYSEXIT instruction returns control to the application in ring 3. As
mentioned, these instructions appear to have been designed for quick entrance to and return
from the Windows or Windows NT operating systems on the personal computer.
TABLE 5.3 Selectors addressed by the SYSENTER_CS selector value.
To use SYSENTER and SYSEXIT, the SYSENTER instruction must pass the return address
to the system. This is accomplished by loading the EDX register with the return offset and
by placing the segment address in the global descriptor table at location SYSENTER_CS+16.
The stack segment is transferred by placing the stack segment descriptor at
SYSENTER_CS+24 and loading the stack offset (ESP) into ECX.
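The selector arithmetic behind SYSENTER and SYSEXIT can be written out as follows. This follows Intel's instruction reference as we understand it: SYSENTER clears the RPL bits of SYSENTER_CS, and SYSEXIT sets them to 3. The function names are hypothetical, and 0010H is only an example value.

```python
# Selector arithmetic for SYSENTER / SYSEXIT (illustrative sketch).

def sysenter_selectors(sysenter_cs):
    """Ring-0 CS and SS loaded by SYSENTER (EIP, ESP come from SYSENTER_EIP/ESP)."""
    cs0 = (sysenter_cs & 0xFFFF) & 0xFFFC   # RPL forced to 0
    ss0 = cs0 + 8                           # next descriptor in the GDT
    return cs0, ss0

def sysexit_selectors(sysenter_cs):
    """Ring-3 CS and SS loaded by SYSEXIT (EIP <- EDX, ESP <- ECX)."""
    cs3 = ((sysenter_cs & 0xFFFF) + 16) | 3 # SYSENTER_CS + 16, RPL 3
    ss3 = cs3 + 8                           # equals (SYSENTER_CS + 24) | 3
    return cs3, ss3

print([hex(s) for s in sysenter_selectors(0x10)])  # ['0x10', '0x18']
print([hex(s) for s in sysexit_selectors(0x10)])   # ['0x23', '0x2b']
```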
FXSAVE and FXRSTOR Instructions
The last two new instructions added to the Pentium II microprocessor are the FXSAVE and
FXRSTOR instructions, which are almost identical to the FSAVE and FRSTOR instructions.
The main difference is that the FXSAVE instruction is designed to properly store the state of
the MMX machine, while FSAVE properly stores the state of the floating-point
coprocessor. The FSAVE instruction stores the entire tag field, while the FXSAVE
instruction stores only the valid bits of the tag field. These valid bits are used to reconstruct
the full tag field when the FXRSTOR instruction executes. This means that if the MMX
state of the machine is saved, use the FXSAVE instruction; if the floating-point state of the
machine is saved, use the FSAVE instruction. For new applications, it is recommended that
the FXSAVE and FXRSTOR instructions be used to save the MMX state and
floating-point state of the machine. Do not use the FSAVE and FRSTOR instructions in new
applications.
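The difference between the two tag formats can be sketched as a conversion. The full tag word gives each of the eight registers a 2-bit tag, with 11 meaning empty; the abridged form keeps one valid bit per register. Restoring the full tag also requires examining the register contents (valid, zero, or special), which this sketch does not attempt. The function name is hypothetical.

```python
# Convert an FSAVE-style 16-bit tag word to the 8-bit abridged form (sketch).

def abridged_tag(full_tag):
    """Bit n = 1 if register n is not empty (its 2-bit tag is not 11b)."""
    result = 0
    for reg in range(8):
        tag = (full_tag >> (2 * reg)) & 0b11
        if tag != 0b11:
            result |= 1 << reg
    return result

print(hex(abridged_tag(0xFFFF)), hex(abridged_tag(0xFFFC)))  # 0x0 0x1
```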
5. P-III employs dynamic execution technology, which has already been discussed.
6. A 512-Kbyte unified, non-blocking level 2 cache has been used.
7. Eight 64-bit wide Intel MMX registers along with a set of 57 instructions for multimedia
applications are available.
Chip Sets
The chip set for the Pentium III is different from that of the Pentium II. The Pentium III uses an
Intel 810, 815, or 820 chipset. The 815 is most commonly found in newer systems that use
the Pentium III. A few other vendor chip sets are available, but problems with drivers for
new peripherals, such as the video cards, have been reported. An 840 chip set also was
developed for the Pentium III, but Intel does not make it available.
Bus
The Coppermine version of the Pentium III increases the bus speed to either 100 MHz or
133 MHz. The faster version allows transfers between the microprocessor and the memory at
higher speeds. Suppose that a 1-GHz microprocessor uses a 133-MHz memory bus. You
might think that the memory bus speed could be made faster to improve performance. However,
the connections between the microprocessor and the memory preclude using a higher speed
for the memory. If it is decided to use a 200-MHz bus speed, we must recognize that a
wavelength at 200 MHz is 300,000,000/200,000,000 = 3/2 meter. A quarter-wave antenna is 1/4
of a wavelength long, so at 200 MHz an antenna is about 14.8 inches. We do not want to radiate
energy at 200 MHz, so we need to keep the printed circuit board connections shorter than
1/4 wavelength. In practice, we would keep the connections to no more than 1/10 of
1/4 wavelength. This means that the connections in a 200-MHz system should be no longer than
1.48 inches. This size would present the main board manufacturer with a problem when placing
the sockets for a 200-MHz memory system.
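The trace-length estimate above can be reproduced with a few lines of arithmetic. This sketch uses the same rounded speed of light (3 x 10^8 m/s) and the same 1/10-of-a-quarter-wave rule of thumb; the function name is hypothetical.

```python
# Maximum board-trace length from the quarter-wave rule of thumb above.
C = 300_000_000        # speed of light in m/s, rounded as in the text
M_PER_INCH = 0.0254

def max_trace_inches(freq_hz, fraction=1 / 10):
    wavelength = C / freq_hz            # metres
    quarter_wave = wavelength / 4       # quarter-wave antenna length
    return quarter_wave * fraction / M_PER_INCH

print(round(max_trace_inches(200e6, fraction=1), 1))  # 14.8 (quarter wave)
print(round(max_trace_inches(200e6), 2))              # 1.48
```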
It is possible to approach or even exceed a 200-MHz memory system if we develop a new
technology for interconnecting the microprocessor, chipset, and memory. At present the
memory functions in bursts of four 64-bit numbers each time we read the main memory.
This burst of 32 bytes is read into the cache. The main memory requires 3 wait states at
100 MHz to access the first 64-bit number and then zero wait states for each of the three
remaining 64-bit numbers, for a total of seven 100-MHz bus clocks (70 ns). This means we are
reading data at 70 ns / 32 = 2.1875 ns per byte, which is a transfer rate of about 457M bytes
per second. This is much slower than the clock on a 1-GHz microprocessor, but because most
programs are cyclic and the instructions are stored in the internal cache, we can and often do
approach the operating frequency of the microprocessor.
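The burst arithmetic can be checked with a short function. It assumes, as the seven-clock total implies, that the first beat takes one bus clock plus the wait states and each later beat takes one clock; the function name and parameters are hypothetical.

```python
# Burst-read throughput for the 4 x 64-bit burst described above (sketch).

def burst_rate(bus_mhz, wait_states, beats=4, bytes_per_beat=8):
    clocks = (1 + wait_states) + (beats - 1)  # first beat + waits, then 1 clock each
    burst_ns = clocks * 1000 / bus_mhz        # one bus clock lasts 1000/MHz ns
    burst_bytes = beats * bytes_per_beat
    ns_per_byte = burst_ns / burst_bytes
    mbytes_per_s = burst_bytes / burst_ns * 1000  # bytes per ns -> MB per s
    return clocks, burst_ns, ns_per_byte, mbytes_per_s

clocks, burst_ns, ns_per_byte, rate = burst_rate(100, 3)
print(clocks, burst_ns, ns_per_byte, round(rate))  # 7 70.0 2.1875 457
```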
PENTIUM IV
The most recent version of the Pentium Pro architecture microprocessor is the Pentium 4
microprocessor from Intel. The Pentium 4 was released initially in November 2000 with a
speed of 1.3 GHz. It is currently available in speeds up to 2.0 GHz. There are two packages
available for this integrated microprocessor, the 423-pin PGA and the 478-pin FC-PGA2.
Both versions use 0.18-micron technology for fabrication. As with earlier versions of the
Pentium, the Pentium 4 uses a 100-MHz memory bus clock, but because the bus is quad pumped,
the effective bus speed can approach 400 MHz.
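Quad pumping means four data transfers per bus clock. Assuming the 64-bit (8-byte) data path mentioned below, the peak transfer rate works out as follows; the function name is hypothetical, and real sustained rates are lower.

```python
# Peak front-side-bus bandwidth for a quad-pumped 64-bit bus (sketch).

def fsb_bandwidth(base_mhz, pumps, width_bytes=8):
    transfers_per_s = base_mhz * 1e6 * pumps
    return transfers_per_s * width_bytes   # bytes per second

print(fsb_bandwidth(100, 4) / 1e9)   # 3.2 (GB/s), versus 0.8 GB/s unpumped
```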
Memory Interface
The memory interface to the Pentium 4 typically uses the Intel 850 chipset. The 850
provides a dual-pipe memory bus to the microprocessor with each pipe interfaced to a 32-bit
wide section of the memory. The two pipes function together to comprise the 64-bit wide
data path to the microprocessor. Because of the dual pipe arrangement, the memory must be
populated with pairs of RDRAM memory devices operating at either 600 MHz or 800 MHz.
According to Intel this arrangement provides a 300% increase in speed over a memory
populated with PC-100 memory.
special microinstruction cache (the trace cache) holds 12K micro-operations. This technology
removes the instruction decoder from the main path that supplies the microinstruction stream
to the execution unit, which increases performance.
RISC Architecture
The complexity of the instructions supported by CISC processors kept increasing as
more and more sophisticated processors were designed and marketed. This resulted in an
increase in processor die size to accommodate the large microcode required by the complex
instructions. The larger size in turn meant more cost, since it consumes more silicon. Also,
as the chip size increases, the power consumption increases, resulting in more heating of the
chip. This in turn requires more elaborate cooling arrangements.
If a processor supports a set of simpler instructions that do not require complex
decoding, its design becomes simpler, with an associated reduction in cost and power
consumption. The execution of these instructions also becomes very fast.
As the name implies, the Reduced Instruction Set Computer, or RISC as it is popularly known,
is a type of architecture that utilizes a small, highly optimized set of instructions, rather than
the more specialized set of instructions often found in other types of architectures. Typically,
every instruction is executed in a single clock after it is fetched and decoded, so these
instructions execute very fast. In a CISC design, a lot of die space is consumed by microcode
that could otherwise be used for enhanced features. RISC processors are therefore smaller and
consume less energy, and more of them can be produced per silicon wafer.
(i)
RISC instructions, being simple, can be hard-wired, while CISC architectures may
have to use micro-programming in order to implement complex instructions.
(ii)
A set of simple instructions results in reduced complexity of the control unit and
the data-path; as a consequence, the processor can work at a high clock frequency
and thus yields higher speed.
(iii)
(iv)
(v)
(vi)
Shorter design cycle: A new RISC processor can be designed and tested more
quickly since RISC processors are simpler than corresponding CISC processors.
(vii)
The application programmers who use the microprocessor's instructions will find it
easier to develop code with a smaller and optimized instruction set.
(viii)
(ii)
Same-length instructions: Each instruction is of the same length, so that it may be
fetched in a single operation. The traditional microprocessors from Intel or
Motorola support variable length instructions.
(iii)
(iv)
(v)
Very few addressing modes and formats: Unlike CISC processors, where the
number of addressing modes is very high, RISC processors have far fewer
addressing modes and support only a few instruction formats.
(vi)
(vii)
(viii)
Load and Store architecture: The RISC architecture is primarily a Load and
Store architecture implying that all the memory accesses take place using Load
or Store type operations.
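The load/store idea can be illustrated with a tiny interpreter in which only LOAD and STORE touch memory and arithmetic works on registers alone. The opcode names and tuple encoding are hypothetical, chosen only for this sketch; they do not correspond to any particular RISC instruction set.

```python
# Minimal load/store machine: only LOAD and STORE access memory (sketch).

def run(program, memory, n_regs=8):
    regs = [0] * n_regs
    for op, *args in program:
        if op == "LOAD":             # LOAD rd, addr   : rd <- mem[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":          # STORE rs, addr  : mem[addr] <- rs
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":            # ADD rd, ra, rb  : registers only
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs

mem = {0x100: 7, 0x104: 5, 0x108: 0}
# Where a CISC instruction could use a memory operand directly, the
# load/store machine needs explicit loads, a register add, and a store.
prog = [("LOAD", 1, 0x100), ("LOAD", 2, 0x104), ("ADD", 3, 1, 2), ("STORE", 3, 0x108)]
regs = run(prog, mem)
print(mem[0x108])  # 12
```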