
Computer Organization & Architecture
Lecture #13: Computer Evolution and Performance

The evolution of computers has been characterized by increasing processor speed, decreasing component size, increasing memory size, and increasing I/O capacity and speed. One factor responsible for the great increase in processor speed is the shrinking size of microprocessor components; this reduces the distance between components and hence increases speed. However, the true gains in speed in recent years have come from the organization of the processor, including heavy use of pipelining and parallel execution techniques and the use of speculative execution techniques, which results in the tentative execution of future instructions that might be needed. All of these techniques are designed to keep the processor busy as much of the time as possible.

A critical issue in computer system design is balancing the performance of the various elements, so that gains in performance in one area are not handicapped by a lag in other areas. In particular, processor speed has increased more rapidly than memory access time. A variety of techniques are used to compensate for this mismatch, including caches, wider data paths from memory to processor, and more intelligent memory chips.

A Brief History of Computers

The First Generation: Vacuum Tubes

ENIAC - Electronic Numerical Integrator And Computer
- Designed by John Mauchly and John Presper Eckert
- University of Pennsylvania, 1943 to 1946
- Developed for calculating artillery firing tables
- Generally regarded as the first electronic computer
- Enormous!!!
  o 30 tons
  o 1500 square feet of floor space
  o 18,000 tubes
  o 140 kW of power
- 5000 additions per second
- Decimal number system
- 20 accumulators of 10 digits
- Programmed manually with switches and cables
- Disassembled in 1955

The von Neumann Machine

The task of entering and altering programs for the ENIAC was extremely tedious. The programming process could be facilitated if the program could be represented in a form suitable for storing in memory alongside the data. Then, a computer could get its instructions by reading them from memory, and a program could be set or altered by setting the values of a portion of memory.

- Developed by John von Neumann
- Princeton Institute for Advanced Studies (IAS), 1945 to 1952
- Prototype of all subsequent general-purpose computers
- IAS computer
  o Stored-program concept
  o Main memory stores both data and instructions
  o Arithmetic and logic unit (ALU) capable of operating on binary data
  o Control unit, which interprets and executes the instructions in memory
  o Input and output (I/O) equipment operated by the control unit

Shown below is the general structure of the IAS computer:

With rare exceptions, all of today's computers have this same general structure and function and are referred to as von Neumann machines.

IAS details
- 1000 words of storage, 40 bits each, holding both data and instructions
- Each word holds either one number or two 20-bit instructions

Shown below is the number word format:

Shown below is the instruction word format:

The control unit operates the IAS by fetching instructions from memory and executing them one at a time. The control unit and the ALU contain storage locations, called registers:

- Memory buffer register (MBR): contains a word to be stored in memory, or is used to receive a word from memory.
- Memory address register (MAR): specifies the address in memory of the word to be written from or read into the MBR.
- Instruction register (IR): contains the 8-bit opcode of the instruction being executed.
- Instruction buffer register (IBR): employed to hold temporarily the right-hand instruction from a word in memory.
- Program counter (PC): contains the address of the next instruction pair to be fetched from memory.
- Accumulator (AC) and multiplier-quotient (MQ): employed to hold temporarily operands and results of ALU operations.
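To make the two word formats above concrete, here is a minimal sketch (Python) that decodes a 40-bit IAS word either as a sign-magnitude number or as a pair of 20-bit instructions. The function names and the sample word are illustrative assumptions, not part of the original notes.

    # Minimal sketch (Python) of the two 40-bit word layouts described above.
    # Assumption: the leftmost bit of a number word is the sign bit, and an
    # instruction word holds a left and a right 20-bit instruction, each an
    # 8-bit opcode followed by a 12-bit address.

    def decode_number(word):
        """Interpret a 40-bit word as a sign-magnitude number."""
        sign = (word >> 39) & 0x1            # leftmost bit: 0 = positive, 1 = negative
        magnitude = word & ((1 << 39) - 1)   # remaining 39 bits
        return -magnitude if sign else magnitude

    def decode_instructions(word):
        """Interpret a 40-bit word as a (left, right) pair of 20-bit instructions."""
        def split(instr):
            return (instr >> 12) & 0xFF, instr & 0xFFF   # (8-bit opcode, 12-bit address)
        left = (word >> 20) & 0xFFFFF        # leftmost 20 bits
        right = word & 0xFFFFF               # rightmost 20 bits
        return split(left), split(right)

    # Hypothetical example: opcode 1 with address 100, then opcode 5 with address 101.
    word = (0x01 << 32) | (100 << 20) | (0x05 << 12) | 101
    print(decode_instructions(word))         # ((1, 100), (5, 101))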

Shown below is the expanded structure of IAS Computer:

Shown below is the IAS instruction cycle:

The IAS operates by repetitively performing an instruction cycle. Each instruction cycle consists of two subcycles:

- Fetch cycle: the opcode of the next instruction is loaded into the IR and the address portion is loaded into the MAR. This instruction may be taken from the IBR, or it can be obtained from memory by loading a word into the MBR, and then down to the IBR, IR, and MAR.
- Execute cycle: the control circuitry interprets the opcode and executes the instruction by sending out the appropriate control signals to cause data to be moved or an operation to be performed by the ALU.

The IAS computer had 21 instructions, which can be grouped as follows:

- Data transfer: move data between memory and ALU registers or between two ALU registers.
- Unconditional branch: used to facilitate repetitive operations.
- Conditional branch: the branch can be made dependent on a condition, thus allowing decision points.
- Arithmetic: operations performed by the ALU.
- Address modify: permits addresses to be computed in the ALU and then inserted into instructions stored in memory.
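As a rough illustration of the fetch subcycle just described, the sketch below (Python) models how an instruction is taken either from the IBR or from memory via the MBR. The memory contents and the point at which the PC is incremented are simplifying assumptions, not a cycle-accurate model of the IAS.

    # Simplified sketch (Python) of the IAS fetch subcycle described above.
    # The memory contents are hypothetical; execution of the fetched opcode is omitted.

    memory = {0: (0x01 << 32) | (100 << 20) | (0x05 << 12) | 101}  # addr -> 40-bit word

    PC = 0        # program counter: address of the next instruction pair
    IBR = None    # instruction buffer register: holds a pending right-hand instruction

    def fetch():
        """Return (opcode, address) of the next instruction to execute."""
        global PC, IBR
        if IBR is not None:                  # right-hand instruction already buffered
            instr, IBR = IBR, None
            PC += 1                          # done with this word; advance to the next
        else:
            MAR = PC                         # address of the next instruction pair
            MBR = memory[MAR]                # load the 40-bit word from memory
            instr = (MBR >> 20) & 0xFFFFF    # left-hand instruction is used first
            IBR = MBR & 0xFFFFF              # right-hand instruction kept for next time
        IR = (instr >> 12) & 0xFF            # 8-bit opcode
        MAR = instr & 0xFFF                  # 12-bit address field
        return IR, MAR

    print(fetch())   # left instruction of word 0  -> (1, 100)
    print(fetch())   # right instruction, from IBR -> (5, 101)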

Commercial Computers

- 1947: Eckert-Mauchly Computer Corporation formed to manufacture computers commercially
- 1950: UNIVAC I (Universal Automatic Computer) commissioned by the Bureau of the Census
  o First successful commercial computer
  o Used for both scientific and commercial applications
- Eckert-Mauchly Computer Corporation became part of the UNIVAC division of Sperry-Rand Corporation
- Late 1950s: UNIVAC II released, with greater memory capacity and higher performance than UNIVAC I; upward compatible
- IBM: major manufacturer of punched-card processing equipment
- 1953: IBM 701, IBM's first electronic stored-program computer
  o Scientific applications
- 1955: IBM 702 introduced
  o Business applications
- The IBM 700/7000 series established IBM as the overwhelmingly dominant computer manufacturer

The Second Generation: Transistors

- Transistors replaced vacuum tubes
  o Smaller
  o Cheaper
  o Less heat
  o Same functionality
  o Solid-state device made from silicon (sand)
- Invented at Bell Labs in 1947
- Fully transistorized computers commercially available in the late 1950s
- NCR and RCA were the first to produce small transistor machines
- IBM 7000 series
- Digital Equipment Corporation (DEC) PDP-1
- High-level programming languages
- Provision of system software with computers

Third Generation: Integrated Circuits

- A single, self-contained transistor is a discrete component
- The manufacturing process was very expensive and cumbersome using discrete components
- Early second-generation computers contained about 10,000 transistors, expanding to hundreds of thousands in newer machines
- 1958: the integrated circuit was invented
- IBM System/360
- DEC PDP-8

Microelectronics
- Means "small electronics"
- A computer consists of logic gates, memory cells, and interconnections
- These can be manufactured on a semiconductor such as silicon
- Many transistors can be produced on a single wafer of silicon

Shown below is the relationship between Wafer, Chip, and Gate

The table below shows a summary of technology generations:


Generation   Dates        Technology    Speed (operations per second)
1            1946-1957    Vacuum tube   40,000
2            1958-1964    Transistor    200,000
3            1965-1971    SSI and MSI   1,000,000
4            1972-1977    LSI           10,000,000
5            1978-        VLSI          100,000,000

Moore's Law

- Gordon Moore, cofounder of Intel, observed in 1965 that the number of transistors on a chip will double every year
- Since the 1970s, the number of transistors has doubled every 18 months
- The cost of a chip has remained virtually unchanged, so the cost of computer logic and memory circuitry has fallen at a dramatic rate
- Higher packing density means a shorter electrical path and increased operating speed
- Computers become smaller and available in more environments
- Reduced power and cooling requirements
- Fewer interconnections means an increase in reliability

Shown below is the growth in CPU transistor count:

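As a rough arithmetic check on the 18-month doubling rate quoted above, the snippet below projects a transistor count forward in time. The roughly 29,000-transistor baseline (the 8086 of 1978) is an assumed figure used only to illustrate the calculation.

    # Back-of-the-envelope sketch (Python) of the doubling rule quoted above:
    # count(t) = count(0) * 2 ** (months_elapsed / 18).
    # The ~29,000-transistor baseline for 1978 is an assumption for illustration.

    def projected_transistors(baseline, years, months_per_doubling=18):
        return baseline * 2 ** (years * 12 / months_per_doubling)

    for years in (3, 6, 9):
        print(years, "years:", round(projected_transistors(29_000, years)))
    # 3 years -> 116,000    6 years -> 464,000    9 years -> 1,856,000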
IBM System/360 Series (see Table 2.4)

- 1964
- Replaced the 7000 series; not compatible with it
- Industry's first planned family of computers
  o Similar or identical instruction sets
  o Similar or identical operating systems (O/S)
  o Increasing speed
  o Increasing number of I/O ports (more terminal connections)
  o Increasing memory size
  o Increasing cost
- Multiplexed switch structure (see Figure 2.5)

DEC PDP-8 (see Table 2.5)

- 1964
- First minicomputer (named after the miniskirt)
- Did not need an air-conditioned room
- Small enough to sit on a lab bench
- Could not do everything that a mainframe computer could
  o $16,000 versus $100,000+ for an IBM System/360
- Original equipment manufacturers (OEMs) would integrate the PDP-8 as part of an integrated system package
- Introduced the bus structure that is virtually universal for all minicomputers and microcomputers
  o Omnibus: 96 signal paths carrying control, address, and data signals

Shown below is the Omnibus:

Semiconductor Memory

1950s and 1960s: core memory
- Tiny rings of ferromagnetic material strung on grids of fine wire suspended on small screens inside the computer
- Magnetized one way for a one and the other way for a zero
- Relatively fast: about 1 millionth of a second to read a stored bit
- Expensive and bulky
- Destructive read
  o Data erased during a read
  o Extra circuits required to restore the data after each read

1970: Fairchild produced the first relatively capacious semiconductor memory
- About the size of a single core, it held 256 bits of memory
- Nondestructive read
- Much faster than core: 70 billionths of a second to read a stored bit
- Cost initially much higher than core; this changed around 1974
- Since then there have been 11 generations, each providing four times the storage density of the previous one
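As a quick check on the "four times the density per generation" figure above, the snippet below compounds it across 11 generations. The 1 Kbit capacity assumed for the first generation is not stated in the notes; it is only an illustrative baseline.

    # Quick arithmetic sketch (Python) for the density growth quoted above.
    # The 1 Kbit generation-1 capacity is an assumed, illustrative baseline.

    capacity_bits = 1024                # assumed generation-1 chip capacity (1 Kbit)
    for generation in range(2, 12):     # generations 2 through 11
        capacity_bits *= 4              # four times the density each generation
    print(capacity_bits // 2**30, "Gbit per chip after 11 generations")   # -> 1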

Microprocessors

- 1971: Intel 4004 (4 bit)
  o First microprocessor
  o All CPU components on a single chip
  o Designed for specific applications
- 1972: Intel 8008 (8 bit)
  o Twice as complex as the 4004
  o Designed for specific applications
- 1974: Intel 8080 (8 bit)
  o First general-purpose microprocessor

Table 2.6 shows the evolution of the Intel microprocessors.

Designing for Performance

Microprocessor Speed
- Chipmakers release a new generation of chips every three years, each with four times as many transistors
- Memory chipmakers have quadrupled the capacity of dynamic random-access memory (DRAM) every three years
- Microprocessor speed boosts that come from reducing the distance between circuits have improved performance four- or fivefold every three years since Intel launched the x86 family in 1978

The raw speed of the microprocessor will not achieve its potential unless it is fed a constant stream of work to do in the form of computer instructions. While the chipmakers have been busy learning how to fabricate chips of greater and greater density, the processor designers must come up with ever more elaborate techniques for feeding the monster.

- Branch prediction: the processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
- Data flow analysis: the processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.

- Speculative execution: using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations.

Performance Balance

While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components. Nowhere is the problem created by such mismatches more critical than in the interface between processor and main memory.

Shown below is the evolution of DRAM and processor characteristics:

While processor speed and memory capacity have grown rapidly, the speed with which data can be transferred between main memory and the processor has not. The interface between processor and main memory is the most critical pathway in the entire computer, because it is responsible for carrying a constant flow of program instructions and data between memory chips and the processor.
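To put a rough number on that "constant flow", here is a small back-of-the-envelope calculation of the bandwidth the processor-memory interface must sustain just to keep instructions arriving; the instruction rate and instruction size are invented for illustration.

    # Back-of-the-envelope sketch (Python): bandwidth the processor-memory path
    # must carry for instruction fetch alone. All figures are assumed for illustration.

    instructions_per_second = 100_000_000   # assumed sustained execution rate
    bytes_per_instruction = 4               # assumed average instruction size

    fetch_bandwidth = instructions_per_second * bytes_per_instruction
    print(fetch_bandwidth / 1_000_000, "MB/s for instruction fetch alone")   # -> 400.0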

Shown below are the trends in DRAM use:

The amount of main memory is going up, but DRAM density is going up faster. The net result is that, on average, the number of DRAMs per system is going down. This has an effect on transfer rates, because there is less opportunity for parallel transfer of data. There are a number of ways that a system architect can attack this problem:

- Increase the number of bits that are retrieved at one time (wider data paths)
- Change the DRAM interface to include a cache or other buffering techniques
- Reduce the frequency of memory access by including one or more levels of cache, both on- and off-chip, between the processor and main memory (a worked example of why this helps follows this list)
- Increase the interconnect bandwidth between processors and memory, using higher-speed buses and a hierarchy of buses to buffer and structure data flow
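As the forward reference in the cache item above suggests, here is a small worked example of why inserting a cache reduces the average time the processor waits on memory; the latencies and hit rate are invented purely for illustration.

    # Worked example (Python) of average memory access time with a cache.
    # The latencies and hit rate below are assumed, illustrative figures.

    cache_time = 2      # ns for an access satisfied by the cache (assumed)
    memory_time = 60    # ns for an access that must go to DRAM (assumed)
    hit_rate = 0.95     # fraction of accesses found in the cache (assumed)

    average = hit_rate * cache_time + (1 - hit_rate) * memory_time
    print(f"average access time: {average:.1f} ns, versus {memory_time} ns uncached")
    # -> average access time: 4.9 ns, versus 60 ns uncached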

Pentium and PowerPC Evolution

Pentium Evolution
- 8080
  o First general-purpose microprocessor
  o 8-bit data path
  o Used in the first personal computer, the Altair
- 8086
  o Much more powerful
  o 16 bit
  o Instruction cache, prefetching a few instructions
  o The 8088 (8-bit external bus) was used in the first IBM PC
- 80286
  o 16 Mbyte memory addressable, up from 1 Mbyte
- 80386
  o 32 bit
  o Support for multitasking
- 80486
  o Sophisticated, powerful cache and instruction pipelining
  o Built-in math coprocessor
- Pentium
  o Introduced superscalar techniques
  o Multiple instructions executed in parallel
- Pentium Pro
  o Increased superscalar organization
  o Aggressive register renaming
  o Branch prediction (a minimal sketch of one such scheme follows this list)
  o Data flow analysis
  o Speculative execution
- Pentium II
  o MMX technology for video, audio, and graphics processing
- Pentium III
  o Additional floating-point instructions for graphics
- Pentium 4
  o Named with Arabic rather than Roman numerals
  o Additional floating-point and multimedia enhancements
- Itanium
  o 64 bit
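As noted next to the Pentium Pro entry above, here is a minimal sketch of one simple branch-prediction scheme: a 2-bit saturating counter kept per branch address. This particular scheme, and the outcome stream used to exercise it, are assumptions for illustration; the notes do not describe how any specific processor predicts branches.

    # Minimal sketch (Python) of a 2-bit saturating-counter branch predictor.
    # The scheme and the sample outcome stream are illustrative assumptions.

    from collections import defaultdict

    counters = defaultdict(lambda: 2)   # per-branch state 0..3; start "weakly taken"

    def predict(branch_addr):
        return counters[branch_addr] >= 2            # predict taken for states 2 and 3

    def update(branch_addr, taken):
        c = counters[branch_addr]
        counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)

    # Hypothetical outcome stream for one branch: mostly taken, with one not-taken.
    outcomes = [True, True, False, True, True]
    correct = 0
    for taken in outcomes:
        correct += predict(0x40) == taken
        update(0x40, taken)
    print(correct, "of", len(outcomes), "outcomes predicted correctly")   # -> 4 of 5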

PowerPC Evolution
- 601
  o Introduced the market to the PowerPC architecture
  o 32 bit
- 603
  o Used for low-end desktop and portable computers
  o 32 bit
  o Lower cost and a more efficient implementation
- 604
  o Used for desktop computers and low-end servers
  o 32 bit
  o Used advanced superscalar design techniques
- 620
  o Used in high-end servers
  o 64 bit
- 740/750 (G3)
  o Two levels of cache in the main processor, a significant performance improvement over machines with off-chip cache
- G4
  o Increased the parallelism and internal speed of the processor
