
Prepared by B.Rajalingam M.E., AP/CSE

CS2253 Computer Organization and Architecture

UNIT I BASIC STRUCTURE OF COMPUTERS

Functional units - Basic operational concepts - Bus structures - Performance and metrics - Instructions and instruction sequencing - Hardware/Software interface - Instruction set architecture - Addressing modes - RISC / CISC - ALU design - Fixed point and floating point operations.

1. FUNCTIONAL UNITS

A computer consists of five functionally independent main parts:
1. Input
2. Memory
3. Arithmetic and logic
4. Output
5. Control unit

The operation of a computer can be summarized as follows:
- The computer accepts programs and data through an input unit and stores them in the memory.
- The stored data are processed by the arithmetic and logic unit under program control.
- The processed data are delivered through the output unit.
- All of the above activities are directed by the control unit.
The information is either stored in the computer's memory for later use or used immediately by the ALU to perform the desired operations. Instructions are explicit commands that:
- Manage the transfer of information within a computer, as well as between the computer and its I/O devices.
- Specify the arithmetic and logic operations to be performed.
To execute a program, the processor fetches the instructions one after another and performs the desired operations. The processor accepts only machine language programs. To obtain the machine language program, a compiler is used.
Note: A compiler is software (a translator) which converts a high-level language program (the source program) into a machine language program (the object program).
1. Input unit: The computer accepts coded information through the input unit. The input can come from human operators, from electromechanical devices such as keyboards, or from other computers over communication lines. Examples of input devices are keyboards, joysticks, trackballs, and mice, which are used as graphic input devices in conjunction with displays. Microphones can be used to capture audio input, which is then sampled and converted into digital codes for storage and processing.
Keyboard: It is the most common input device. Whenever a key is pressed, the corresponding letter or digit is automatically translated into its binary code and transmitted over a cable to the memory of the computer.
2. Memory unit: The memory unit is used to store programs as well as data. Memory is classified into primary and secondary storage.
Primary storage: Also called main memory, it operates at high speed and is expensive. It is made up of a large number of semiconductor storage cells, each capable of storing one bit of information. These cells are grouped together in fixed-size units called words. This facilitates reading or writing the content of one word (n bits) in a single basic operation, instead of reading or writing one bit per operation. Each word is associated with a distinct address that identifies its location; a given word is accessed by specifying its address.
Word length: The number of bits in each word is called the word length of the computer. Typical word lengths range from 16 to 64 bits. Programs must reside in the primary memory during execution.
RAM: It stands for Random Access Memory. Memory in which any location can be reached in a short and fixed amount of time by specifying its address is called random-access memory.


Memory access time: The time required to access one word is called the memory access time. This time is fixed and independent of the location of the word being accessed. It typically ranges from a few nanoseconds (ns) to about 100 ns.
Caches: Caches are small, fast RAM units. They are tightly coupled with the processor and are often contained on the same integrated-circuit (IC) chip to achieve high performance.
Secondary storage: It is slower and cheaper than primary memory, and its capacity is high. It is used to store information that is not accessed frequently. Examples of secondary storage devices are magnetic tapes and disks, optical disks (CD-ROMs), and floppy disks.
3. Arithmetic and logic unit: The arithmetic and logic unit (ALU) and the control unit together form the processor. The actual execution of most computer operations takes place in the ALU. Example: Suppose two numbers located in the memory are to be added. They are brought into the processor, and the actual addition is carried out by the ALU.
Registers: Registers are high-speed storage elements available in the processor. Each register can store one word of data. When operands are brought into the processor for any operation, they are stored in registers. Accessing data in a register is faster than accessing data in memory.
4. Output unit: The function of the output unit is to deliver processed results to the outside world in human-understandable form. Examples of output devices are graphical displays and printers, such as inkjet, laser, and dot-matrix printers; the laser printer is the fastest of these.
5. Control unit: The control unit coordinates the operation of the memory, the arithmetic and logic unit, the input unit, and the output unit. It sends control signals to the other units and senses their states. Example: Data transfers between the processor and the memory are controlled by the control unit through timing signals, which are signals that determine when a given action is to take place.
The control unit is a well-defined, physically separate unit that interacts with the other parts of the machine. A set of control lines carries the signals used for timing and synchronization of events in all units.


2. BASIC OPERATIONAL CONCEPTS

To perform a given task on a computer, an appropriate program must be stored in the memory. Individual instructions are brought from the memory into the processor, which executes the specified operations. Data to be used as operands are also stored in the memory. Consider the instruction

Add LOCA, R0

This instruction adds the operand at memory location LOCA to the operand in register R0 and stores the result back in R0. The original content of LOCA is preserved, whereas the content of R0 is overwritten. This instruction requires the following steps:
1) The instruction is fetched from the memory into the processor.
2) The operand at LOCA is fetched and added to the content of R0.
3) The resulting sum is stored in register R0.
The above Add instruction combines a memory access operation with an ALU operation. The same task can be performed with a two-instruction sequence:

Load LOCA, R1
Add R1, R0

1. The first instruction transfers the contents of memory location LOCA into register R1.
2. The second instruction adds the contents of R1 and R0 and places the sum into R0.
3. The first instruction overwrites the previous content of R1 but preserves the value in LOCA; the second instruction overwrites the previous content of R0.
Connection between the memory and the processor
Transfers between the memory and the processor are started by sending the address of the memory location to be accessed to the memory unit and issuing the appropriate control signals. The data are then transferred to or from the memory. The processor contains a number of registers, in addition to the ALU and the control unit, for different purposes. Various registers are:
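As a rough illustration (not part of the original notes), the effect of the two-instruction sequence can be traced in Python by modelling the memory and the registers as dictionaries; the initial values 25 and 17 are arbitrary:

```python
# Toy trace of: Load LOCA, R1 ; Add R1, R0
memory = {"LOCA": 25}            # operand stored at memory location LOCA
registers = {"R0": 17, "R1": 0}

# Load LOCA, R1 : R1 <- [LOCA]  (old content of R1 is overwritten)
registers["R1"] = memory["LOCA"]

# Add R1, R0 : R0 <- [R1] + [R0]  (old content of R0 is overwritten)
registers["R0"] = registers["R1"] + registers["R0"]

print(registers["R0"])   # 42
print(memory["LOCA"])    # 25  (LOCA is preserved)
```

The trace confirms the point made above: the memory operand survives, while both destination registers lose their previous contents.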
- Instruction register (IR)
- Program counter (PC)
- Memory address register (MAR)
- Memory data register (MDR)
The figure below shows how the memory and the processor can be connected.


Instruction register (IR): The IR holds the instruction that is currently being executed by the processor. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.
Program counter (PC): The PC is a special-purpose register that contains the address of the next instruction to be fetched and executed. During the execution of one instruction, the PC is updated to point to the next instruction. It keeps track of the execution of a program.
Memory address register (MAR): The MAR holds the address of the memory location to be accessed.
Memory data register (MDR): The MDR contains the data to be written into, or read from, the memory location pointed to by the MAR. Together, the MAR and MDR facilitate communication between the memory and the processor.
Operating steps
1. Initially the program resides in the memory (usually placed there through the input unit), and the PC is set to point to the first instruction of the program.
2. The contents of the PC are transferred to the MAR, and a Read control signal is sent to the memory. The addressed word (in this case the first instruction of the program) is read out of the memory and loaded into the MDR register.
3. Next, the contents of the MDR are transferred to the IR. At this point the instruction is ready to be decoded and executed.
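The first three operating steps can be sketched as a few Python assignments; this is a toy model (not from the notes), with a hypothetical one-word-per-address memory holding instruction strings:

```python
# Toy model of the instruction-fetch steps; memory maps addresses to
# instruction words (represented here as strings for readability).
memory = {0: "Add LOCA, R0", 1: "Move R0, LOCB"}
PC = 0                 # step 1: PC points to the first instruction

MAR = PC               # step 2: contents of PC transferred to MAR
MDR = memory[MAR]      # step 2: Read brings the addressed word into MDR
IR = MDR               # step 3: instruction moves to IR for decoding

print(IR)   # Add LOCA, R0
```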


4. If the instruction involves an operation to be performed by the ALU, it is necessary to obtain the required operands. If an operand resides in the memory (it could also be in a general-purpose register in the processor), it is fetched by sending its address to the MAR and initiating a read cycle; the operand is read into the MDR and then transferred to the ALU. After one or more operands are fetched in this way, the ALU can perform the desired operation.
5. If the result of this operation is to be stored in the memory, the result is sent to the MDR, the address of the memory location where the result is to be stored is sent to the MAR, and a write cycle is initiated.
6. During the execution of the current instruction, the contents of the PC are incremented to point to the next instruction. Thus, as soon as the execution of the current instruction is completed, a new instruction fetch can be started.
Note: In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Thus some machine instructions with the ability to handle I/O transfers are provided.
Interruption
Normal execution of a program may be interrupted if some device requires urgent service from the processor. For example, a monitoring device in a computer-controlled industrial process may detect a dangerous condition. In order to deal with that situation immediately, the normal execution of the current program must be interrupted. To do this, the device raises an interrupt signal. An interrupt is a request from an I/O device for service by the processor. The processor provides the requested service by executing an appropriate interrupt-service routine. Because servicing an interrupt may alter the internal state of the processor, that state must be saved in memory locations before the interrupt is serviced; normally the contents of the PC are saved. When the interrupt-service routine is completed, the state is restored and the execution of the interrupted program continues.

3. BUS STRUCTURES
A bus is a group of lines that serves as a connection path over which the individual parts of a computer transfer data among themselves. To achieve a reasonable speed of operation, a computer must be organized so that all its units can handle one full word of data at a given time. When a word of data is transferred over a bus, all its bits are transferred in parallel, that is, simultaneously over many wires, or lines, one bit per line. The bus must have separate lines for carrying data, address, and control signals. In a single-bus structure, one bus interconnects all the units; since the bus can be used for only one transfer at a time, only two units can actively use the bus at any given time.


The advantage of using a single-bus structure is its low cost and its flexibility for attaching peripheral devices.
Multiple bus structure
Systems using multiple buses allow two or more transfers to be carried out at the same time. This concurrency leads to higher performance, but at increased cost.
The use of buffer registers
The speeds of the various devices connected to a common bus vary. Input and output devices such as keyboards and printers are relatively slow compared to the processor and to storage devices such as optical disks. Consider, for example, the transfer of an encoded character from the processor to a character printer. If the processor had to wait for the slow printer, its efficiency would be severely reduced. To overcome this problem, a buffer register is used.
Buffer registers
A buffer register is an electronic register included with a device to hold information during a transfer. When the processor sends a set of characters to a printer, the contents are transferred to the printer buffer (the buffer register of the printer). Once the printer buffer is loaded, the processor and the bus are no longer needed, and the processor can be released for other activity.
Purpose of a buffer register:
- It prevents a high-speed processor from being locked to a slow I/O device.
- It smooths out timing differences among slow and fast devices.
- It allows the processor to switch rapidly from one device to another.

4. PERFORMANCE AND METRICS
The performance of a computer can be measured by the speed with which it can execute programs. The speed of the computer is affected by:
- The hardware design and the machine language instructions of the computer. (Programs are usually written in a high-level language.)
- The compiler, which translates the high-level language into machine language.
For best performance, it is necessary to design the compiler, the machine instruction set, and the hardware in a coordinated way. Consider a time-line diagram describing how the operating system overlaps processing, disk transfers, and printing for several programs to make the best possible use of the resources available. If a program starts at time t0 and finishes at time t5, the total time required to execute it is t5 - t0. This is called the elapsed time, and it is a measure of the performance of the entire computer system. It is affected by the speed of the processor, the disk, and the printer. To discuss the performance of the processor, we should consider only the periods during which the processor is active.

The elapsed time for the execution of a program depends on the hardware involved in its execution. This hardware includes the processor and the memory, which are usually connected by a bus (as shown in the bus structure diagram).

When the execution of a program starts, all program instructions and the required data are stored in the main memory. As execution proceeds, instructions are fetched from the main memory one by one by the processor, and a copy is placed in the cache. When the execution of an instruction calls for data located in the main memory, the data are fetched and a copy is placed in the cache. If the same instruction or data item is needed later, it is read directly from the cache. The processor and a small cache memory are fabricated on a single IC chip. The speed of such a chip is much greater than the speed at which instructions and data can be fetched from the main memory. A program is executed faster if the movement of instructions and data between the main memory and the processor is minimized, which is achieved by using the cache.
To evaluate performance, we can discuss:
- Processor clock
- Basic performance equation
- Pipelining and superscalar operation
- Clock rate
- Instruction set: CISC and RISC
- Compiler
- Performance measurement
Processor clock
Processor circuits are controlled by a timing signal called a clock. The clock defines regular time intervals called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle. The length of one clock cycle is P, and this parameter affects processor performance. The clock rate is its inverse:

R = 1/P

The clock rate is measured in cycles per second. One cycle per second is called a hertz (Hz); a million cycles per second is a megahertz (MHz), and a billion cycles per second is a gigahertz (GHz). Processors used in today's personal computers and workstations have clock rates from a few hundred million to over a billion cycles per second. For example, 500 million cycles per second is abbreviated to 500 MHz, and 1250 million cycles per second to 1.25 GHz. The corresponding clock periods are 2 ns and 0.8 ns, respectively.
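The relation P = 1/R can be checked numerically; the helper below (not from the notes) returns the clock period in nanoseconds for the two example clock rates:

```python
# Clock rate R and clock period P are reciprocals: P = 1/R.
def clock_period_ns(rate_hz):
    """Return the clock period in nanoseconds for a clock rate in hertz."""
    return 1e9 / rate_hz

print(clock_period_ns(500e6))    # 2.0 ns for a 500 MHz clock
print(clock_period_ns(1.25e9))   # 0.8 ns for a 1.25 GHz clock
```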
Basic performance equation
Let T be the time required for the processor to execute a program written in a high-level language. The compiler generates a machine language object program corresponding to the source program. Assume that complete execution of the program requires the execution of N machine language instructions, and that the average number of basic steps needed to execute one machine instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R cycles per second, the program execution time is given by

T = (N x S) / R

This is often called the basic performance equation. To achieve high performance, T should be reduced, which can be done by reducing N and S and by increasing R. The value of N is reduced if the source program is compiled into fewer machine instructions.
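A quick sketch of the basic performance equation, using made-up values for N, S and R (10 million instructions, 4 steps each, a 500 MHz clock):

```python
# Basic performance equation: T = (N x S) / R
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Program execution time in seconds."""
    return (n_instructions * steps_per_instruction) / clock_rate_hz

t = execution_time(10_000_000, 4, 500e6)
print(t)   # 0.08 seconds
```

Doubling R (or halving N or S) halves T, which is exactly the trade-off discussed above.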


The value of S is reduced if instructions have a smaller number of basic steps to perform, or if the execution of instructions is overlapped. The value of R can be increased by using a higher-frequency clock, i.e., by reducing the time required to complete a basic step. N, S and R are not independent factors; changing one may affect another.
Pipelining and superscalar operation
Pipelining is a technique of overlapping the execution of successive instructions, and it improves performance. Consider the instruction

Add R1, R2, R3

This instruction adds the contents of registers R1 and R2 and places the sum into R3. The contents of R1 and R2 are first transferred to the inputs of the ALU; after the addition is performed, the result is transferred to register R3. The processor can read the next instruction from memory while the addition of the current instruction is being performed, and while the result is being transferred to R3, the operands required for the next instruction can be transferred to the ALU. This process of overlapping instruction execution is called pipelining. If all instructions are overlapped to the maximum degree, the effective value of S is 1: individual instructions still require several clock cycles to complete, but for the purpose of computing T the effective value of S is 1. This maximum overlap is not always achievable.
Superscalar operation
A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of operation is called superscalar execution, and it makes it possible to reduce the effective value of S to less than 1. Parallel execution must preserve the logical correctness of programs.
That is, the results produced must be the same as those produced by serial execution of the program.
Clock rate
There are two possibilities for increasing the clock rate, R. First, improving the integrated-circuit (IC) technology makes logic circuits faster, which reduces the time needed to complete a basic step. This allows the clock period, P, to be reduced and the clock rate, R, to be increased. Second, reducing the amount of processing done in one basic step also makes it possible to reduce the clock period, P. However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase. Increases in the value of R by improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. In the presence of a cache, the percentage of accesses to the main memory is small, so much of the performance improvement is realized. The value of T is reduced by the same factor as R is increased, because S and N are not affected.
Instruction set: CISC and RISC
CISC: Complex Instruction Set Computers
RISC: Reduced Instruction Set Computers
Simple instructions require a small number of basic steps to execute; complex instructions involve a large number of steps. For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task. This could lead to a large value for N and a small value for S. On the other hand, if individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. It is not obvious which choice is better. Processors with simple instructions are called Reduced Instruction Set Computers (RISC), and processors with more complex instructions are referred to as Complex Instruction Set Computers (CISC). The choice of instruction set is strongly influenced by the use of pipelining, because with pipelining the effective value of S is close to 1.
Compiler
A compiler translates a high-level language program into a sequence of machine instructions. To reduce N, we need to have a suitable machine instruction set and a compiler that makes good use of it. An optimizing compiler takes advantage of various features of the target processor to reduce the product N x S, which is the total number of clock cycles needed to execute a program. The number of cycles depends not only on the choice of instructions, but also on the order in which they appear in the program. The compiler may rearrange program instructions to achieve better performance without changing the logic of the program. The compiler and the processor are closely linked in their architectures; ideally they are designed at the same time.
Performance Measurement
The computer community adopted the idea of measuring computer performance using benchmark programs. To make comparisons possible, standardized programs must be used. The performance measure is the time it takes a computer to execute a given benchmark program. The benchmark suites are selected and published by a nonprofit organization called the System Performance Evaluation Corporation (SPEC).

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)

The test is repeated for all the programs in the SPEC suite, and the geometric mean of the results is computed. Let SPECi be the rating for program i in the suite.


The overall SPEC rating for the computer is given by the geometric mean

SPEC rating = (SPEC1 x SPEC2 x ... x SPECn)^(1/n)

where n is the number of programs in the suite.
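The geometric mean can be computed directly; the two per-program ratings below are invented purely for illustration:

```python
# Overall SPEC rating = (SPEC1 x SPEC2 x ... x SPECn)^(1/n)
def overall_spec_rating(ratings):
    """Geometric mean of the per-program SPEC ratings."""
    product = 1.0
    for r in ratings:
        product *= r
    return product ** (1.0 / len(ratings))

print(overall_spec_rating([2.0, 8.0]))   # 4.0
```

Note that the geometric mean (4.0 here) is smaller than the arithmetic mean (5.0), so one unusually fast program cannot dominate the overall rating.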

5. INSTRUCTION AND INSTRUCTION SEQUENCING
A computer must have instructions capable of performing four types of basic operations:
- Data transfers between the memory and the processor registers
- Arithmetic and logic operations on data
- Program sequencing and control
- I/O transfers
To understand the first two types of instruction, we need some notation.
Register Transfer Notation (RTN)
Data transfers can be represented by the standard notation given below. Processor registers are represented by names such as R0, R1, R2; addresses of memory locations by names such as LOC, PLACE, MEM; and I/O registers by names such as DATAIN and DATAOUT. The contents of a memory location or register are denoted by placing square brackets around its name.
Example 1: R1 ← [LOC]
This expression states that the contents of memory location LOC are transferred into the processor register R1.
Example 2: R3 ← [R1] + [R2]
This expression states that the contents of processor registers R1 and R2 are added and the result is stored into the processor register R3.
This type of notation is known as Register Transfer Notation (RTN). Note that the right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.
Assembly Language Notation
To represent machine instructions, assembly language uses statements such as the following. To transfer the data from memory location LOC to processor register R1:

Move LOC, R1

To add two numbers in registers R1 and R2 and place their sum in register R3:

Add R1, R2, R3

BASIC INSTRUCTION TYPES


The operation of adding two numbers is a fundamental capability in any computer. The statement

C = A + B

in a high-level language program is a command to the computer to add the current values of the two variables A and B, and to assign the sum to a third variable, C. When the program containing this statement is compiled, the three variables A, B, and C are assigned distinct locations in the memory. Hence the above high-level language statement requires the action

C ← [A] + [B]

to take place in the computer. Here [A] and [B] represent the contents of locations A and B, respectively. To carry out this action, the contents of memory locations A and B are fetched from the memory and transferred into the processor, where their sum is computed. The result is then sent back to the memory and stored in location C.
A basic instruction can be represented in several ways:
- 3-address instructions
- 2-address instructions
- 1-address instructions
- 0-address instructions
Let us first assume that this action is to be accomplished by a single machine instruction, and that this instruction contains the memory addresses of all three operands A, B, and C. This three-address instruction can be represented symbolically as

Add A, B, C

Operands A and B are called the source operands, C is called the destination operand, and Add is the operation to be performed on the operands. A general instruction of this type has the format

Operation Source1, Source2, Destination

If k bits are needed to specify the memory address of each operand, the encoded form of the above instruction must contain 3k bits for addressing purposes, in addition to the bits needed to denote the Add operation. For a modern processor with a 32-bit address space, a 3-address instruction is too large to fit in one word of reasonable length. Thus, a format that allows multiple words to be used for a single instruction would be needed to represent an instruction of this type.
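The 3k-bit operand field size mentioned above is easy to verify for k = 32 (the arithmetic below is a sketch, not part of the notes):

```python
# With k-bit memory addresses, a three-address instruction needs 3k bits
# for its operands alone, before counting the opcode bits.
def operand_bits(address_bits, num_addresses=3):
    return num_addresses * address_bits

k = 32                   # 32-bit address space, as in the text
print(operand_bits(k))   # 96 bits; far more than one 32-bit word
```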
An alternative approach is to use a sequence of simpler instructions to perform the same task, with each instruction having only one or two operands. Suppose that two-address instructions of the form

Operation Source, Destination

are available. An Add instruction of this type is

Add A, B

which performs the operation B ← [A] + [B]. When the sum is calculated, the result is sent to the memory and stored in location B, replacing the original contents of this location. This means that operand B is both a source and a destination. A single two-address instruction cannot be used to solve our original problem, which is to add the contents of locations A and B, without destroying either of them, and to place the sum in location C. The problem can be solved by using another two-address instruction that copies the contents of one memory location into another. Such an instruction is

Move B, C

which performs the operation C ← [B], leaving the contents of location B unchanged. The word "Move" is a misnomer here; it should be "Copy." However, this instruction name is deeply entrenched in computer nomenclature. The operation C ← [A] + [B] can now be performed by the two-instruction sequence

Move B, C
Add A, C

In all the instructions given above, the source operands are specified first, followed by the destination. This order is used in the assembly language expressions for machine instructions in many computers, but there are also many computers in which the order of the source and destination operands is reversed. Unfortunately, no single convention has been adopted by all manufacturers; in fact, even for a particular computer, its assembly language may use a different order for different instructions. We have defined three- and two-address instructions, but even two-address instructions will not normally fit into one word for usual word lengths and address sizes.
Another possibility is to have machine instructions that specify only one memory operand. When a second operand is needed, as in the case of an Add instruction, it is understood implicitly to be in a unique location. A processor register, usually called the accumulator, may be used for this purpose. Thus, the one-address instruction

Add A

means: add the contents of memory location A to the contents of the accumulator register and place the sum back into the accumulator. Let us also introduce the one-address instructions

Load A
Store A

The Load instruction copies the contents of memory location A into the accumulator, and the Store instruction copies the contents of the accumulator into memory location A.
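The two-address sequence Move B,C followed by Add A,C described above can be traced with a dictionary-modelled memory (the values 10 and 32 are chosen arbitrarily):

```python
# Toy trace of: Move B, C ; Add A, C
memory = {"A": 10, "B": 32, "C": 0}

# Move B, C : C <- [B]  (B unchanged)
memory["C"] = memory["B"]

# Add A, C : C <- [A] + [C]  (two-address Add overwrites its destination)
memory["C"] = memory["A"] + memory["C"]

print(memory["C"])   # 42; A and B are both preserved
```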
Using only one-address instructions, the operation C ← [A] + [B] can be performed by executing the sequence of instructions

Load A
Add B
Store C

Note that the operand specified in the instruction may be a source or a destination, depending on the instruction. In the Load instruction, address A specifies the source operand, and the destination location, the accumulator, is implied. On the other hand, C denotes the destination location in the Store instruction, whereas the source, the accumulator, is implied.
Some early computers were designed around a single-accumulator structure. Most modern computers have a number of general-purpose processor registers, typically 8 to 32, and even considerably more in some cases. Access to data in these registers is much faster than to data stored in memory locations because the registers are inside the processor. Because the number of registers is relatively small, only a few bits are needed to specify which register takes part in an operation. For example, for 32 registers, only 5 bits are needed. This is much less than the number of bits needed to give the address of a location in the memory. Because the use of registers allows faster processing and results in shorter instructions, registers are used to store data temporarily in the processor during processing.
Let Ri represent a general-purpose register. The instructions

Load A, Ri
Store Ri, A
Add A, Ri

are generalizations of the Load, Store, and Add instructions for the single-accumulator case, in which register Ri performs the function of the accumulator. Even in these cases, when only one memory address is directly specified in an instruction, the instruction may not fit into one word.
When a processor has several general-purpose registers, many instructions involve only operands that are in the registers. In fact, in many modern processors, computations can be performed directly only on data held in processor registers. Instructions such as

Add Ri, Rj
Add Ri, Rj, Rk

are of this type. In both of these instructions, the source operands are the contents of registers Ri and Rj. In the first instruction, Rj also serves as the destination register, whereas in the second instruction a third register, Rk, is used as the destination. Such instructions, where only register names are contained in the instruction, will normally fit into one word.
It is often necessary to transfer data between different locations. This is achieved with the instruction

Move Source, Destination

which places a copy of the contents of Source into Destination. When data are moved to or from a processor register, the Move instruction can be used rather than the Load or Store instructions, because the order of the source and destination operands determines which operation is intended.
Thus,
Move A,Ri is the same as Load A,Ri
and
Move Ri,A is the same as Store Ri,A
In processors where arithmetic operations are allowed only on operands that are in processor registers, the C = A + B task can be performed by the instruction sequence
Move A,Ri
Move B,Rj
Add Ri,Rj
Move Rj,C
In processors where one operand may be in the memory but the other must be in a register, an instruction sequence for the required task would be
Move A,Ri
Add B,Ri
Move Ri,C
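The three-instruction sequence above can be sketched in Python, modeling memory as a dictionary of named locations and register Ri as a plain variable. The addresses and data values are assumed purely for illustration:

```python
# Hypothetical sketch: C = A + B on a machine where one operand may be in
# memory but the other must be in a register. Sample values assumed.
memory = {"A": 10, "B": 32, "C": 0}

ri = memory["A"]           # Move A,Ri : load the first operand into Ri
ri = ri + memory["B"]      # Add B,Ri  : add the memory operand B to Ri
memory["C"] = ri           # Move Ri,C : store the result at location C

assert memory["C"] == 42
```

Only the Add touches both a register and a memory operand, which is exactly the restriction the text describes.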


The speed with which a given task is carried out depends on the time it takes to transfer instructions from memory into the processor and to access the operands referenced by these instructions. Transfers that involve the memory are much slower than transfers within the processor. Hence, a substantial increase in speed is achieved when several operations are performed in succession on data in processor registers without the need to copy data to or from the memory. When machine language programs are generated by compilers from high-level languages, it is important to minimize the frequency with which data is moved back and forth between the memory and processor registers. We use the task C ← [A] + [B] as an example of instruction format. The diagram shows a possible program segment for this task as it appears in the memory of a computer. We have assumed that the computer allows one memory operand per instruction and has a number of processor registers. We assume that the word length is 32 bits and the memory is byte addressable. The three instructions of the program are in successive word locations, starting at location i. Since each instruction is 4 bytes long, the second and third instructions start at addresses i + 4 and i + 8. For simplicity, we also assume that a full memory address can be directly specified in a single-word instruction, although this is not usually possible for address space sizes and word lengths of current processors.

Execution steps of the above program: The processor contains a register called the program counter (PC), which holds the address of the instruction to be executed next. To begin executing a program, the address of its first instruction (i in our example) must be placed into the PC. Then, the processor control circuits use the information in the PC to fetch and


execute instructions, one at a time, in the order of increasing addresses. This is called straight-line sequencing. During the execution of each instruction, the PC is incremented by 4 to point to the next instruction. Thus, after the Move instruction at location i + 8 is executed, the PC contains the value i + 12, which is the address of the first instruction of the next program segment. Executing a given instruction is a two-phase procedure. In the first phase, called instruction fetch, the instruction is fetched from the memory location whose address is in the PC. This instruction is placed in the instruction register (IR) in the processor. At the start of the second phase, called instruction execute, the instruction in IR is examined to determine which operation is to be performed. The specified operation is then performed by the processor. This often involves fetching operands from the memory or from processor registers, performing an arithmetic or logic operation, and storing the result in the destination location. At some point during this two-phase procedure, the contents of the PC are advanced to point to the next instruction. When the execute phase of an instruction is completed, the PC contains the address of the next instruction, and a new instruction fetch phase can begin. In most processors, the execute phase itself is divided into a small number of distinct phases corresponding to fetching operands, performing the operation, and storing the result.
BRANCHING
Consider the task of adding a list of n numbers. The addresses of the memory locations containing the n numbers are symbolically given as NUM1, NUM2, . . . , NUMn, and a separate Add instruction is used to add each number to the contents of register R0. After all the numbers have been added, the result is placed in memory location SUM. Instead of using a long list of Add instructions, it is possible to place a single Add instruction in a program loop.
The loop is a straight-line sequence of instructions executed as many times as needed. It starts at location LOOP and ends at the instruction Branch>0. During each pass through this loop, the address of the next list entry is determined, and that entry is fetched and added to R0. Now, we concentrate on how to create and control a program loop. Assume that the number of entries in the list, n, is stored in memory location N. Register R1 is used as a counter to determine the number of times the loop is executed. Hence, the contents of location N are loaded into register R1 at the beginning of the program. Then, within the body of the loop, the instruction Decrement R1 reduces the contents of R1 by 1 each time through the loop. (A similar type of operation is performed by an Increment instruction, which adds 1 to its operand.) Execution of the loop is repeated as long as the result of the decrement operation is greater than zero.
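As a sketch of this loop-control mechanism, the following Python models R0 and R1 as variables and renders Decrement R1 plus Branch>0 LOOP as an explicit test at the bottom of the loop. The list contents and n are assumed sample data:

```python
# Adding a list of n numbers with a counter, mirroring the program loop:
# R1 counts down from N; the loop repeats while the decremented value > 0.
NUM = [3, 1, 4, 1, 5]          # NUM1 .. NUMn (sample data)
N = len(NUM)

r0 = 0                         # Clear R0 (accumulates the sum)
r1 = N                         # Move N, R1 (loop counter)
i = 0
while True:
    r0 += NUM[i]               # Add NUMi, R0 (loop body at LOOP)
    i += 1
    r1 -= 1                    # Decrement R1
    if not r1 > 0:             # Branch>0 LOOP: repeat while R1 > 0
        break
SUM = r0                       # Move R0, SUM

assert SUM == 14
```

After the nth pass the decrement leaves R1 at zero, the branch falls through, and the final Move stores the sum, just as the text describes.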


We now introduce branch instructions. This type of instruction loads a new value into the program counter. As a result, the processor fetches and executes the instruction at this new address, called the branch target, instead of the instruction at the location that follows the branch instruction in sequential address order. A conditional branch instruction causes a branch only if a specified condition is satisfied. If the condition is not satisfied, the PC is incremented in the normal way, and the next instruction in sequential address order is fetched and executed. In the above program, the instruction, Branch>0 LOOP (branch if greater than 0) is a conditional branch instruction that causes a branch to location LOOP if the result of the immediately preceding instruction, which is the decremented value in register R1, is greater than zero. This means that the loop is repeated as long as there are entries in the list that are yet to be added to R0. At the end of the nth pass through the loop, the Decrement instruction produces a value of zero, and, hence, branching does not occur. Instead, the Move instruction is fetched and executed. It moves the final result from R0 into memory location SUM. The capability to test conditions and subsequently choose one of a set of alternative ways to continue computation has many more applications than just loop control. Such a capability is found in the instruction sets of all computers and is fundamental to the programming of most nontrivial tasks.
CONDITION CODES
The processor keeps track of information about the results of various operations for use by subsequent conditional branch instructions. This is accomplished by recording the required information in individual bits, often called condition code flags. These flags are usually grouped together in a special processor register called the condition code register or status register.
Individual condition code flags are set to 1 or cleared to 0, depending on the outcome of the operation performed.


Four commonly used flags are N (negative), Z (zero), V (overflow), and C (carry).

The N and Z flags indicate whether the result of an arithmetic or logic operation is negative or zero. The N and Z flags may also be affected by instructions that transfer data, such as Move, Load, or Store. This makes it possible for a later conditional branch instruction to cause a branch based on the sign and value of the operand that was moved. Some computers also provide a special Test instruction that examines a value in a register or in the memory and sets or clears the N and Z flags accordingly. The V flag indicates whether overflow has taken place. Overflow occurs when the result of an arithmetic operation is outside the range of values that can be represented by the number of bits available for the operands. The processor sets the V flag to allow the programmer to test whether overflow has occurred and branch to an appropriate routine that corrects the problem. Instructions such as Branch If Overflow are provided for this purpose. A program interrupt may occur automatically as a result of the V bit being set, and the operating system will resolve what to do. The C flag is set to 1 if a carry occurs from the most significant bit position during an arithmetic operation. This flag makes it possible to perform arithmetic operations on operands that are longer than the word length of the processor. Such operations are used in multiple-precision arithmetic. The instruction Branch>0 is an example of a branch instruction that tests one or more of the condition flags. It causes a branch if the value tested is neither negative nor zero. That is, the branch is taken if neither N nor Z is 1. Many other conditional branch instructions are provided to enable a variety of conditions to be tested. The conditions are given as logic expressions involving the condition code flags. In some computers, the condition code flags are affected automatically by instructions that perform arithmetic or logic operations. However, this is not always the case.
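The flag definitions can be illustrated with a short Python function that performs an addition at a fixed word length and derives N, Z, V, and C from the result. An 8-bit word is assumed purely for readability; the same rules apply at any word length:

```python
# Compute the N, Z, V, C condition code flags for a fixed-width addition,
# following the definitions in the text (8-bit word chosen for illustration).
def add_with_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    raw = (a & mask) + (b & mask)
    result = raw & mask
    n = (result >> (bits - 1)) & 1       # N: sign bit of the result
    z = int(result == 0)                 # Z: result is zero
    c = int(raw > mask)                  # C: carry out of the most significant bit
    sa = (a >> (bits - 1)) & 1           # sign of operand a
    sb = (b >> (bits - 1)) & 1           # sign of operand b
    v = int(sa == sb and n != sa)        # V: same-sign operands, result sign differs
    return result, {"N": n, "Z": z, "V": v, "C": c}

# 0x7F + 0x01 overflows in 8-bit two's complement: result 0x80 with N=1, V=1
res, flags = add_with_flags(0x7F, 0x01)
assert res == 0x80 and flags["N"] == 1 and flags["V"] == 1 and flags["C"] == 0
```

Note how V and C are distinct: 0x7F + 0x01 sets V but not C, while 0xFF + 0x01 produces a carry (C = 1) and a zero result (Z = 1) without signed overflow.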


A number of computers have two versions of an Add instruction, for example. One version, Add, does not affect the flags, but a second version, Add Set CC, does. This provides the programmer (and the compiler) with more flexibility when preparing programs for pipelined execution.
GENERATING MEMORY ADDRESSES
The purpose of the instruction block at LOOP is to add a different number from the list during each pass through the loop. Hence, the Add instruction in that block must refer to a different address during each pass. How are the addresses to be specified? The memory operand address cannot be given directly in a single Add instruction in the loop. Otherwise, it would need to be modified on each pass through the loop. As one possibility, suppose that a processor register, Ri, is used to hold the memory address of an operand. If it is initially loaded with the address NUM1 before the loop is entered and is then incremented by 4 on each pass through the loop, it can provide the needed capability. This situation, and many others like it, gives rise to the need for flexible ways to specify the address of an operand. The instruction set of a computer typically provides a number of such methods, called addressing modes. While the details differ from one computer to another, the underlying concepts are the same.
6. HARDWARE
The traffic-light controller is a very simple special-purpose computer system requiring only a few of the physical hardware components that constitute a general-purpose computer system. The four major hardware blocks of a general-purpose computer system are its memory unit (MU), arithmetic and logic unit (ALU), input/output unit (IOU), and control unit (CU). Input/output (I/O) devices input and output data into and out of the memory unit. In some systems, I/O devices send and receive data into and from the ALU rather than the MU. Programs reside in the memory unit.
The ALU processes the data taken from the memory unit (or the ALU) and stores the processed data back in the memory unit (or the ALU). The control unit coordinates the activities of the other three units. It retrieves instructions from programs resident in the MU, decodes these instructions, and directs the ALU to perform the corresponding processing steps. It also oversees I/O operations. A keyboard and a mouse are the most common input devices nowadays.


A video display and a printer are the most common output devices. Scanners are used to input data from hardcopy sources. Magnetic tapes and disks are used as I/O devices. These devices are also used as memory devices to increase the capacity of the MU. The console is a special-purpose I/O device that permits the system operator to interact with the computer system. In modern-day computer systems, the console is typically a dedicated terminal.
7. SOFTWARE
The hardware components of a computer system are electronic devices in which the basic unit of information is either a 0 or a 1, corresponding to two states of an electronic signal. For instance, in one of the popular hardware technologies a 0 is represented by 0 V while a 1 is represented by 5 V. Programs and data must therefore be expressed using this binary alphabet consisting of 0 and 1. Programs written using only these binary digits are machine language programs. At this level of programming, operations such as ADD and SUBTRACT are each represented by a unique pattern of 0s and 1s, and the computer hardware is designed to interpret these sequences. Programming at this level is tedious since the programmer has to work with sequences of 0s and 1s and needs to have very detailed knowledge of the computer structure. The tedium of machine language programming is partially alleviated by using symbols such as ADD and SUB rather than patterns of 0s and 1s for these operations. Programming at the symbolic level is called assembly language programming. An assembly language programmer also is required to have a detailed knowledge of the machine structure, because the operations permitted in the assembly language are primitive and the instruction format and capabilities depend on the hardware organization of the machine. An assembler program is used to translate assembly language programs into machine language.
Use of high-level programming languages such as FORTRAN, COBOL, C, and JAVA further reduces the requirement of an intimate knowledge of the machine organization. A compiler program is needed to translate a high-level language program into the machine language. A separate compiler is needed for each high-level language used in programming the computer system. Note that the assembler


and the compiler are also programs written in one of those languages and can translate an assembly or high-level language program, respectively, into the machine language. The below figure shows the sequence of operations that occurs once a program is developed. A program written in either the assembly language or a high-level language is called a source program. An assembly language source program is translated by the assembler into the machine language program. This machine language program is the object code. A compiler converts a high-level language source into object code. The object code ordinarily resides on an intermediate device such as a magnetic disk or tape. A loader program loads the object code from the intermediate device into the memory unit. The data required by the program will be either available in the memory or supplied by an input device during the execution of the program. The effect of program execution is the production of processed data or results.

System operations such as selecting the appropriate compiler for translating the source into object code; loading the object code into the memory unit; and starting, stopping, and accounting for computer system usage are done automatically by the system. A set of supervisory programs that permit such automatic operation is usually provided by the computer system manufacturer. This set, called the operating system, receives the information it needs through a set of command language statements from the user and manages the overall operation of the computer system. The operating system and other utility programs used in the system may reside in a memory block that is typically read-only. Special devices are needed to write these programs into read-only memory. Such programs and commonly used data are termed firmware. The below figure is a simple rendering of the complete hardware/software environment of a general-purpose computer system.


Figure: Hardware and software components.
Definition
Software is a collection of programs written to solve problems using a computer. Software is of two types:
System software
Application software

System software performs the following functions:
Receiving and interpreting user commands.
Entering and editing application programs and storing them as files in secondary storage devices, e.g., text editors.
Managing the storage and retrieval of files in secondary storage devices.
Running standard application programs such as word processors or spreadsheets, with data supplied by the user.
Controlling I/O units to receive input and produce output.
Translating source programs into object programs, e.g., compilers.


Linking and running user-written programs.
Compiler
A compiler is system software that translates a high-level language program (source program), such as one written in C or C++, into a machine language program (object program).
Text editor
It is used for entering and editing application programs. The user can use commands that allow entering the statements of a source program and saving them as a file in secondary storage memory. A file can be referred to by a name chosen by the user.
Operating System
An operating system is a large program made up of a collection of routines. It is used to control the sharing of and interaction among various computer units as they execute application programs. Other tasks of the OS are:
To assign memory and magnetic disk space to program and data files.
To move data between memory and disk units.
To handle I/O operations.
Steps involved in running an application program:
1) Transfer the program to be executed from secondary storage into main memory.
2) Start executing the program.
3) Read the required data for the program from memory and perform the specified computation on the data.
4) Print the result.
Role of the operating system in running the program:
1. When the executing program requires some data from memory, it sends a request to the operating system. The operating system fetches the requested data and passes control back to the program, which then proceeds to perform the required computation.
2. When the computation is completed and the results are ready to be printed, the program again sends a request to the operating system. An OS routine makes the printer print the result.
The below time line diagram illustrates the sharing of the processor execution time. In this diagram, during the period t0 to t1, the OS initiates loading of the application program from disk to main memory, waits until loading completes, and then passes execution control to the application program.
The same activity occurs during the periods t2 to t3 and t4 to t5. During t1 to t2 and t3 to t4, the processor performs actual execution of the program.


From t4 to t5, the OS transfers the file from main memory to the printer to print the result. During this period the processor is free and can execute the next program until printing is completed. Thus the operating system manages the concurrent execution of several programs to make the best possible use of computer resources, and this is called multiprogramming or multitasking.
MEMORY LOCATIONS AND ADDRESSES
Computer memory consists of millions of storage cells. Each cell can store a bit (0 or 1) of information. Usually n bits are grouped together, so that such a group of bits can be stored and retrieved in a single basic operation. Each group of n bits is called a word of information, and n is called the word length. Thus the memory of a computer can be schematically represented as a collection of words.

Figure: Memory words
Characteristics of word length: Word length of modern computers ranges from 16 to 64 bits.


If the word length of a computer is 32 bits, then a single word can store a 32-bit 2's-complement number or four ASCII characters, each occupying 8 bits (a unit of 8 bits is called a byte). Machine instructions may require one or more words for their representation. The format for encoding the machine instructions into a memory word

Address and name representations to store information: Accessing the memory to store or retrieve a single item of information, either a word or a byte, requires distinct names or addresses for each item location. Normally, numbers from 0 through 2^k − 1, for some suitable value of k, are used as the addresses of successive locations in the memory. The 2^k addresses constitute the address space of the computer, and the memory can have up to 2^k addressable locations. For example: A 24-bit address generates an address space of 2^24 (16,777,216) locations. This number is usually written as 16M (16 mega), where 1M is the number 2^20 (1,048,576). A 32-bit address creates an address space of 2^32 or 4G (4 giga) locations, where 1G is 2^30. Other notational conventions that are commonly used are K (kilo) for the number 2^10 (1,024), and T (tera) for the number 2^40.
Byte Addressability
A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits. It is impractical to assign distinct addresses to individual bit locations in the memory. The practical assignment is to have successive addresses refer to successive byte locations in the memory. Byte-addressable memory is one in which successive addresses refer to successive byte locations in the memory.
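The address-space figures quoted above follow directly from the rule that a k-bit address reaches 2^k locations, as this small Python check confirms:

```python
# The standard size prefixes, defined as powers of two.
K, M, G, T = 2**10, 2**20, 2**30, 2**40

assert 2**24 == 16 * M == 16_777_216   # 24-bit address: 16M locations
assert 2**32 == 4 * G                  # 32-bit address: 4G locations
assert K == 1_024 and M == 1_048_576  # K (kilo) and M (mega) as powers of 2
```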


Thus each byte in a memory is addressed as 0, 1, 2, …, and if the word length of the machine is 32 bits, successive words are located at addresses 0, 4, 8, …, with each word consisting of four bytes.
Big-Endian and Little-Endian Assignments
There are two ways of assigning byte addresses. They are
Big-endian assignment
Little-endian assignment
Big-endian
The big-endian assignment is used when lower byte addresses are used for the more significant bytes (the leftmost bytes) of the word.
Little-endian
The little-endian assignment is used when lower byte addresses are used for the less significant bytes (the rightmost bytes) of the word.
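A quick way to see the two assignments concretely is Python's struct module, which can lay out the same 32-bit word either way; the byte at index 0 of the packed result is the byte stored at the lowest address:

```python
import struct

# Byte layout of the 32-bit word 0x12345678 at byte addresses 0..3
# under the big-endian and little-endian assignments.
word = 0x12345678
big    = struct.pack(">I", word)   # big-endian: MSB at the lowest address
little = struct.pack("<I", word)   # little-endian: LSB at the lowest address

assert list(big)    == [0x12, 0x34, 0x56, 0x78]
assert list(little) == [0x78, 0x56, 0x34, 0x12]
```

Both layouts hold the same value; only the mapping from byte addresses to byte significance differs.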

In both big-endian and little-endian assignments, byte addresses 0, 4, 8, …, are taken as the addresses of successive words (i.e., the word length is 4 bytes) in the memory, and these are the addresses used when specifying memory read and write operations for words.
Word alignment
There are two kinds of addresses:
Aligned address
Unaligned address
Aligned address
Words are said to be aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word. For example,


In a 32-bit (4-byte) word length machine, the number of bytes in a word is 4. In this case words are said to have aligned addresses if they begin at addresses 0, 4, 8, 12, …, i.e., multiples of the number of bytes in a word. Similarly, if the word length is 16 bits (2 bytes), aligned words begin at byte addresses 0, 2, 4, …
Unaligned address
Words are said to be unaligned in memory if they do not begin at a byte address that is a multiple of the number of bytes in a word.
Accessing numbers, characters, and character strings
A number occupies one word. It can be accessed in the memory by specifying its word address. A character occupies one byte. It can be accessed in the memory by specifying its byte address.
Accessing strings
The beginning of the string is indicated by giving the byte address of its first character. Successive byte locations contain successive characters of the string. There are two ways to indicate the length of the string:
A special control character with the meaning "end of string" can be used as the last character in the string.
A separate memory word location or processor register can contain a number indicating the length of the string in bytes.
MEMORY OPERATIONS
To execute an instruction, the processor control circuits must cause the word containing the instruction to be transferred from the memory to the processor. Operands and results must also be moved between the memory and the processor. Thus, two basic memory operations are needed:
Load
Store
Load
The load operation transfers a copy of the contents of a specific memory location to the processor. To start a load operation, the processor sends the address of the desired location to the memory. The memory reads the data stored at that address and sends them to the processor.


Store
The store operation transfers an item of information from a processor register to a specific memory location. The processor sends the address of the desired memory location to the memory, together with the data to be written into that location.

An information item of either one word or one byte can be transferred between the processor and the memory in a single operation. A processor register can hold one word of information at a time.
INSTRUCTION SET ARCHITECTURE
The instruction set architecture is the interface between the high-level language and the machine language. It has the following parts:
Instruction set
Addressing modes
Instruction formats
Instruction representation
Instructions
Logical instructions: AND, OR, XOR, Shift
Arithmetic instructions
Data types
Integers: Unsigned, Signed, Byte, Short, Long


Real numbers: Single-precision (float), Double-precision (double)
Operations: Addition, Subtraction, Multiplication, Division
Data transfer instructions
Register transfer: Move
Memory transfer: Load, Store
I/O transfer: In, Out
Control transfer instructions
Unconditional branch
Conditional branch
Procedure call
Return
Addressing modes
Specification of operands in instructions. Different addressing modes:
Register direct: Value of operand in a register
Register indirect: Address of operand in a register
Immediate: Value of operand
Memory direct: Address of operand
Indexed: Base register, Index register
Relative: Base register, Displacement
Indexed relative: Base register, Index register, Displacement
Instruction formats
3-operand instructions: ADD op1, op2, op3; op1 ← op2 + op3
2-operand instructions: ADD op1, op2; op1 ← op1 + op2
1-operand instructions: INC op1; op1 ← op1 + 1
Types of operands:
Register operands
Memory operands specified using addressing modes
Effect of instruction format:
Instruction length
Number of instructions for a program
Complexity of instruction decoding (control unit)
Complex Instruction Set Computer (CISC) processors: 2-operand and 1-operand instructions. Any instruction can use memory operands. Many addressing modes. Complex instruction formats: varying-length instructions, microprogrammed control unit.
Reduced Instruction Set Computer (RISC) processors: 3-operand, 2-operand, and 1-operand instructions.


Load/Store Architecture (LSA) processors: Only memory transfer instructions (Load and Store) can use memory operands. All other instructions can use register operands only. A few addressing modes. Simple instruction formats: fixed-length instructions. Hardwired control unit.
ADDRESSING MODES
In general, a program operates on data that reside in the computer's memory. These data can be organized in a variety of ways. If we want to keep track of students' names, we can write them in a list. If we want to associate information with each name, for example to record telephone numbers or marks in various courses, we may organize this information in the form of a table. Programmers use organizations called data structures to represent the data used in computations. These include lists, linked lists, arrays, queues, and so on. Programs are normally written in a high-level language, which enables the programmer to use constants, local and global variables, pointers, and arrays. When translating a high-level language program into assembly language, the compiler must be able to implement these constructs using the facilities provided in the instruction set of the computer in which the program will be run. The different ways in which the location of an operand is specified in an instruction are referred to as addressing modes.


1. IMPLEMENTATION OF VARIABLES AND CONSTANTS
Variables and constants are the simplest data types and are found in almost every computer program. In assembly language, a variable is represented by allocating a register or a memory location to hold its value. Thus, the value can be changed as needed using appropriate instructions. We access an operand by specifying the name of the register or the address of the memory location where the operand is located.
Register mode
The operand is the contents of a processor register; the name (address) of the register is given in the instruction. It is used to access the variables in the program.
Absolute mode
The operand is in a memory location; the address of this location is given explicitly in the instruction. It is also called Direct mode. It is also used to access the variables in the program. An example instruction using the register and absolute modes is
Move LOC, R2
The processor registers are used as temporary storage locations, where the data in a register are accessed using the Register mode. The Absolute mode can represent global variables in a program. A declaration such as
Integer A, B;
in a high-level language program will cause the compiler to allocate a memory location to each of the variables A and B. Absolute mode can be used to access the variables in the program.
Immediate mode
Address and data constants can be represented in assembly language using the Immediate mode. The operand is given explicitly in the instruction. For example, the instruction
Move 200immediate, R0
(where the subscript denotes the Immediate mode) places the value 200 in register R0. Clearly, the Immediate mode is only used to specify the value of a source operand. Using a subscript to denote the Immediate mode is not appropriate in assembly languages. A common convention is to use the sharp sign (#) in front of the value to indicate that this value is to be used as an immediate operand.
Hence, we write the instruction above in the form
Move #200, R0
Constant values are used frequently in high-level language programs. For example, the statement
A = B + 6
contains the constant 6. Assuming that A and B have been declared earlier as variables and may be accessed using the Absolute mode, this statement may be compiled as follows:
Move B, R1
Add #6, R1
Move R1, A
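The compiled three-instruction sequence behaves like the following Python sketch, with memory modeled as a dictionary and R1 as a variable; the value of B is an assumed sample:

```python
# A = B + 6 compiled with Absolute mode for B and Immediate mode for 6.
memory = {"B": 36, "A": 0}

r1 = memory["B"]       # Move B, R1  (Absolute mode: B's address is in the instruction)
r1 = r1 + 6            # Add #6, R1  (Immediate mode: the constant 6 is in the instruction)
memory["A"] = r1       # Move R1, A

assert memory["A"] == 42
```

The constant 6 never needs a memory access of its own, which is the point of the Immediate mode.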


Constants are also used in assembly language to increment a counter, test for some bit pattern, and so on.
INDIRECTION AND POINTERS In the addressing modes that follow, the instruction does not give the operand or its address explicitly. Instead, it provides information from which the memory address of the operand can be determined. We refer to this address as the effective address (EA) of the operand.
Indirect mode The effective address of the operand is the contents of a register or memory location whose address appears in the instruction. We denote indirection by placing the name of the register or the memory address given in the instruction in parentheses. To execute an instruction such as Add (R1), R0, the processor uses the value B, which is in register R1, as the effective address of the operand. It requests a read operation from the memory to read the contents of location B. The value read is the desired operand, which the processor adds to the contents of register R0. Indirect addressing through a memory location is also possible. In this case, the processor first reads the contents of memory location A, then requests a second read operation using the value B as an address to obtain the operand. The register or memory location that contains the address of an
operand is called a pointer. Consider the analogy of a treasure hunt: in the instructions for the hunt you may be told to go to a house at a given address. Instead of finding the treasure there, you find a note that gives you another address where you will find the treasure. By changing the note, the location of the treasure can be changed, but the instructions for the hunt remain the same. Changing the note is equivalent to changing the contents of a pointer in a computer program. For example, by changing the contents of register R1 or location A, the same Add instruction fetches different operands to add to register R0.
Let us now return to the program for adding a list of numbers. Indirect addressing can be used to access successive numbers in the list. Register R2 is used as a pointer to the numbers in the list, and the operands are accessed indirectly through R2. The initialization section of the program loads the counter value n from memory location N into R1 and uses the immediate addressing mode to place the address value NUM1, which is the address of the first number in the list, into R2. Then it clears R0 to 0. The first two instructions in the loop implement the unspecified instruction block starting at LOOP. The first time through the loop, the instruction Add (R2), R0 fetches the operand at location NUM1 and adds it to R0. The second Add instruction adds 4 to the contents of the pointer R2, so that it will contain the address value NUM2 when the first Add instruction is executed in the second pass through the loop.
Consider the C-language statement
A = *B;
where B is a pointer variable. This statement may be compiled into
Move B, R1
Move (R1), A
Using indirect addressing through memory, the same action can be achieved with
Move (B), A
Despite its apparent simplicity, indirect addressing through memory has proven to be of limited usefulness as an addressing mode, and it is seldom found in modern computers.
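Both flavors of indirection can be mimicked with memory modeled as a Python dict from addresses to contents. This is a behavioral sketch only; the addresses 3000 and 1000 and the operand value 55 are made up.

```python
# mem maps addresses to contents (all values illustrative).
mem = {3000: 1000,   # location A holds a pointer: the address of the operand
       1000: 55}     # the operand itself, at location B = 1000
reg = {"R0": 5, "R1": 1000}

# Add (R1), R0 -- indirect through a register: EA = [R1], one memory read
reg["R0"] += mem[reg["R1"]]

# Add (A), R0 -- indirect through memory location A (address 3000):
ea = mem[3000]            # first read: fetch the pointer stored at A
reg["R0"] += mem[ea]      # second read: fetch the operand itself

print(reg["R0"])  # 5 + 55 + 55 = 115
```

The two reads in the second form are exactly why indirect-through-memory is a poor fit for pipelined processors, as noted below.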
An instruction that involves accessing the memory twice to get an operand is not well suited to pipelined execution. Indirect addressing through registers is used extensively. The program shows the flexibility it provides. Also, when absolute addressing is not available, indirect addressing through registers makes it possible to access global variables by first loading the operand's address in a register.
INDEXING AND ARRAYS Indexed addressing is useful in dealing with lists and arrays.
Index mode The effective address of the operand is generated by adding a constant value to the contents of a register. The register used may be either a special register provided for this purpose or, more commonly, any one of a set of general-purpose registers in the processor. In either case, it is referred to as an index register. We indicate the Index mode symbolically as X(Ri), where X denotes the constant value contained in the instruction and Ri is the name of the register involved. The effective address of the operand is given by
EA = X + [Ri]
The contents of the index register are not changed in the process of generating the effective address. In an assembly language program, the constant X may be given either as an explicit number or as a symbolic name representing a numerical value. When the instruction is translated into machine code, the constant X is given as part of the instruction and is usually represented by fewer bits than the word length of the computer. Since X is a signed integer, it must be sign-extended to the register length before being added to the contents of the register. The index register, Ri, contains the address of a memory location, and the value X defines an offset (also called a displacement) from this address to the location where the operand is found. An alternative use: the constant X corresponds to a memory address, and the contents of the index register define the offset to the operand. In either case, the effective address is the sum of two values; one is given explicitly in the instruction, and the other is stored in a register.
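The EA computation, including the sign extension of X, can be sketched in Python. The 16-bit field width and the register value are illustrative assumptions, not fixed by the text.

```python
def sign_extend(x, bits=16):
    """Sign-extend a `bits`-wide two's-complement field to a Python int."""
    if x & (1 << (bits - 1)):
        x -= 1 << bits
    return x

R1 = 1020        # index register holds a base address (made-up value)
X = 0xFFFC       # 16-bit constant from the instruction: the pattern for -4

ea = sign_extend(X) + R1   # EA = X + [R1]
print(ea)  # 1016
```

Note that R1 itself is unchanged by the computation, matching the rule that generating the effective address does not modify the index register.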

Indexed addressing To see the usefulness of indexed addressing, consider a simple example involving a list of test scores for students taking a given course. Assume that the list of scores begins at location LIST. A four-word memory block comprises a record that stores the relevant information for each student. Each record consists of the student's identification number (ID), followed by the scores the student earned on three tests. There are n students in the class, and the value n is stored in location N immediately in front of the list. The addresses given in the figure for the student IDs and test scores assume that the memory is byte-addressable and that the word length is 32 bits. We should note that the list represents a two-dimensional array having n rows and four columns. Each row contains the entries for one student, and the columns give the IDs and test scores.

Fig: A list of students' marks
Suppose that we wish to compute the sum of all scores obtained on each of the tests and store these three sums in memory locations SUM1, SUM2, and SUM3. In the body of the loop, the program uses the Index addressing mode. To access each of the three scores in a student's record, register R0 is used as the index register. Before the loop is entered, R0 is set to point to the ID location of the first student record; thus, it contains the address LIST. On the first pass through the loop, the test scores of the first student are added to the running sums held in registers R1, R2, and R3, which are initially cleared to 0. These scores are accessed using the Index addressing modes 4(R0), 8(R0), and 12(R0). The index register R0 is then incremented by 16 to point to the ID location of the second student. Register R4, initialized to contain the value n, is decremented by 1 at the end of each pass through the loop. When the contents of R4 reach 0, all student records have been accessed, and the loop terminates. Until then, the
conditional branch instruction transfers control back to the start of the loop to process the next record. The last three instructions transfer the accumulated sums from registers R1, R2, and R3, into memory locations SUM1, SUM2, and SUM3, respectively. It should be emphasized that the contents of the index register, R0, are not changed when it is used in the Index addressing mode to access the scores. The contents of R0 are changed only by the last Add instruction in the loop, to move from one student record to the next. In general, the Index mode facilitates access to an operand whose location is defined relative to a reference point within the data structure in which the operand appears. In the example just given, the ID locations of successive student records are the reference points, and the test scores are the operands accessed by the Index addressing mode.
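The loop just described can be simulated in Python. The record layout (16 bytes per student, scores at offsets 4, 8, and 12) follows the text; the starting address and the actual scores are invented for the example.

```python
# Byte-addressable memory, 32-bit words; each student record is 4 words:
# [ID, test1, test2, test3].  Addresses and score values are illustrative.
LIST = 100
records = [(1, 70, 80, 90), (2, 60, 65, 75), (3, 88, 92, 96)]
mem = {}
for i, rec in enumerate(records):
    for j, word in enumerate(rec):
        mem[LIST + 16 * i + 4 * j] = word

n = len(records)
R0, (R1, R2, R3), R4 = LIST, (0, 0, 0), n
while R4 > 0:
    R1 += mem[4 + R0]    # Add 4(R0), R1  -- test 1
    R2 += mem[8 + R0]    # Add 8(R0), R2  -- test 2
    R3 += mem[12 + R0]   # Add 12(R0), R3 -- test 3
    R0 += 16             # advance to the next student record
    R4 -= 1              # decrement the counter; loop while > 0

print(R1, R2, R3)  # 218 237 261
```

As in the text, R0 is changed only by the explicit increment at the end of each pass, never by the Index-mode accesses themselves.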

Fig: Indexed addressing used in accessing test scores in the list
We have introduced the most basic form of indexed addressing. Several variations of this basic form provide for very efficient access to memory operands in practical programming situations. For example, a second register may be used to contain the offset X, in which case we can write the Index mode as (Ri, Rj). The effective address is the sum of the contents of registers Ri and Rj. The second register is usually called the base register. This form of indexed addressing provides more flexibility in accessing operands, because both components of the effective address can be changed. As an example of where this flexibility may be useful, consider again the student record data structure shown in the figure. In the above program, we used different index values in the three Add instructions at the beginning of the loop to access different test scores. Suppose each record contains a large number of items, many more than the three test scores of that example. In this case, we would need the ability to
replace the three Add instructions with one instruction inside a second (nested) loop. Just as the successive starting locations of the records (the reference points) are maintained in the pointer register R0, offsets to the individual items relative to the contents of R0 could be maintained in another register. The contents of that register would be incremented in successive passes through the inner loop. Yet another version of the Index mode uses two registers plus a constant, which can be denoted as X(Ri, Rj). In this case, the effective address is the sum of the constant X and the contents of registers Ri and Rj. This added flexibility is useful in accessing multiple components inside each item in a record, where the beginning of an item is specified by the (Ri, Rj) part of the addressing mode. In other words, this mode implements a three-dimensional array.
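The base-plus-index-plus-constant computation EA = X + [Ri] + [Rj] can be illustrated in a few lines of Python. The record contents, addresses, and offsets below are made up for the example.

```python
# One student record of four 32-bit words starting at address 116
# (values illustrative): [ID, test1, test2, test3].
mem = {116 + 4 * k: v for k, v in enumerate([2, 60, 65, 75])}

Ri = 116   # base register: start of the current record
Rj = 4     # index register: offset of the first test score in the record
X = 4      # constant displacement: one more word, i.e. the second score

ea = X + Ri + Rj   # EA for the X(Ri, Rj) mode
print(mem[ea])     # 65
```

In an inner loop, Rj would be incremented to step through the items of one record while Ri stays fixed at the record's reference point.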

RELATIVE ADDRESSING We have defined the Index mode using general-purpose processor registers. A useful version of this mode is obtained if the program counter, PC, is used instead of a general-purpose register. Then, X(PC) can be used to address a memory location that is X bytes away from the location presently pointed to by the program counter. Since the addressed location is identified relative to the program counter, which always identifies the current execution point in a program, the name Relative mode is associated with this type of addressing.
Relative mode The effective address is determined by the Index mode using the program counter in place of the general-purpose register Ri. This mode can be used to access data operands, but its most common use is to specify the target address in branch instructions. An instruction such as
Branch>0 LOOP
causes program execution to go to the branch target location identified by the name LOOP if the branch condition is satisfied. This location can be computed by specifying it as an offset from the current value of the program counter. Since the branch target may be either before or after the branch instruction, the offset is given as a signed number. Recall that during the execution of an instruction, the processor increments the PC to point to the next instruction. Most computers use this updated value in computing the effective address in the Relative mode. For example, suppose that the Relative mode is used to generate the branch target address LOOP in the Branch instruction of the program in Figure 2.12. Assume that the four instructions of the loop body, starting at LOOP, are located at memory locations 1000, 1004, 1008, and 1012. Hence, the updated contents of the PC at the time the branch target address is generated will be 1016. To branch to location LOOP (1000), the offset value needed is X = -16. Assembly languages allow branch instructions to be written using labels to denote the branch target.
When the assembler program processes such an instruction, it computes the required offset value, -16 in this case, and generates the corresponding machine instruction using the addressing mode -16(PC).
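The assembler's offset calculation for this example can be checked directly in Python, using the addresses given in the text.

```python
# The Branch>0 LOOP instruction is the last of the four loop instructions.
branch_addr = 1012   # address of the branch instruction itself
target = 1000        # address of the label LOOP

# The processor has already updated the PC to branch_addr + 4 when the
# effective address is computed, so the assembler emits:
offset = target - (branch_addr + 4)
print(offset)  # -16
```

The resulting machine instruction uses the addressing mode -16(PC), exactly as stated above.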

ADDITIONAL MODES We have given a number of common versions of the Index mode, not all of which may be found in any one computer. Although these modes suffice for general computation, many computers provide additional modes intended to aid certain programming tasks. The two modes described next are useful for accessing data items in successive locations in the memory.
Auto increment mode The effective address of the operand is the contents of a register specified in the instruction. After accessing the operand, the contents of this register are automatically incremented to point to the next item in a list. We denote the Auto increment mode by putting the specified register in parentheses, to show that the contents of the register are used as the effective address, followed by a plus sign to indicate that these contents are to be incremented after the operand is accessed. Thus, the Auto increment mode is written as (Ri)+. Implicitly, the increment amount is 1 when the mode is given in this form; but in a byte-addressable memory, an increment of 1 would only be useful in accessing successive bytes of some list. To access successive words in a byte-addressable memory with a 32-bit word length, the increment must be 4. Computers that have the Auto increment mode automatically increment the contents of the register by a value that corresponds to the size of the accessed operand. Thus, the increment is 1 for byte-sized operands, 2 for 16-bit operands, and 4 for 32-bit operands. Since the size of the operand is usually specified as part of the operation code of an instruction, it is sufficient to indicate the Auto increment mode as (Ri)+. If the Auto increment mode is available, it can be used in the first Add instruction of the list-summing program, and the second Add instruction can then be eliminated. The modified program is shown in the figure below. As a companion to the Auto increment mode, another useful mode accesses the items of a list in the reverse order.
Auto decrement mode The contents of a register specified in the instruction are first automatically decremented and are then used as the effective address of the operand. We denote the Auto decrement mode by putting the specified register in parentheses, preceded by a minus sign to indicate that the contents of the register are to be decremented before being used as the effective address. Thus, we write -(Ri)
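A behavioral Python sketch of the two modes for 4-byte operands (the addresses and list values are invented):

```python
# Three 32-bit words in a byte-addressable memory (illustrative values).
mem = {2000: 10, 2004: 20, 2008: 30}
Ri = 2000

# (Ri)+ : use [Ri] as the EA, THEN add the operand size (4).
total = 0
for _ in range(3):
    total += mem[Ri]   # e.g. Add (Ri)+, R0
    Ri += 4
# Ri now points one word past the end of the list: 2012.

# -(Ri) : subtract the size FIRST, then use [Ri] as the EA.
rev = []
for _ in range(3):
    Ri -= 4
    rev.append(mem[Ri])  # e.g. Move -(Ri), ...

print(total, rev)  # 60 [30, 20, 10]
```

Because the increment happens after the access and the decrement before it, a pointer walked up with (Ri)+ can be walked straight back down with -(Ri), which is also the behavior needed for pushing and popping a stack.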

Fig: The Autoincrement addressing mode used in the program

In this mode, operands are accessed in descending address order. The reader may wonder why the address is decremented before it is used in the Autodecrement mode and incremented after it is used in the Autoincrement mode. The actions performed by the Autoincrement and Autodecrement addressing modes can obviously be achieved by using two instructions, one to access the operand and the other to increment or decrement the register that contains the operand address. Combining the two operations in one instruction reduces the number of instructions needed to perform the desired task.
RISC RISC stands for Reduced Instruction Set Computer. IBM was the first company to define the RISC architecture, in the 1970s; the research was further developed by the universities of Berkeley and Stanford to give basic architectural models. RISC can be described as a philosophy with three basic principles:
(i) All instructions will be executed in a single cycle.
(ii) Memory will only be accessed via load and store instructions.
(iii) All execution units will be hardwired, with no microcoding.
The instruction set is the hardware language in which the software tells the processor what to do. The chip area vacated by dropping complex instruction support can be used in ways that accelerate the performance of the more commonly used instructions, and it becomes easier to optimize the design. Basically the philosophy is that instructions are handled in parts:
Fetch the instruction
Get the arguments
Perform the action
Write back the result
For an instruction such as r0 = r1 + r2, this means fetching the operands r1 and r2, performing the addition, and writing the result back to r0.
RISC CHARACTERISTICS
Simple instruction set
Same-length instructions
1 machine-cycle instructions
CPU Register Overview
The R4000 has 32 general-purpose registers, a program counter (PC) register, and 2 registers that hold the results of integer multiply and divide operations (HI and LO). The R4000 has no Program Status Word (PSW) register as such; this function is covered by the status and cause registers incorporated within the system control coprocessor (CP0).
CPU Instruction Set Overview Each CPU instruction is 32 bits long. There are three instruction formats:
immediate (I-type)
jump (J-type)
register (R-type)
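All three formats divide the 32-bit word into fixed fields. For the R-type format, the standard MIPS field widths (6-5-5-5-5-6 bits) can be extracted with shifts and masks; the sketch below decodes the word for add $3, $1, $2.

```python
def decode_rtype(word):
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,   # opcode, bits 31..26
        "rs":    (word >> 21) & 0x1F,   # first source register
        "rt":    (word >> 16) & 0x1F,   # second source register
        "rd":    (word >> 11) & 0x1F,   # destination register
        "shamt": (word >> 6)  & 0x1F,   # shift amount
        "funct": word & 0x3F,           # function code, bits 5..0
    }

# add $3, $1, $2  ->  op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x20
fields = decode_rtype(0x00221820)
print(fields)
```

I-type instructions replace the rd/shamt/funct fields with a single 16-bit immediate, and J-type instructions use one 26-bit target field after the opcode.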

Memory Management Unit (MMU) The MIPS R4000 processor provides a full-featured MMU which uses an on-chip translation lookaside buffer (TLB) to translate virtual addresses into physical addresses.
System Control Coprocessor (CP0) CP0 translates virtual addresses into physical addresses and manages exceptions and transitions between kernel, supervisor, and user states. CP0 also controls the cache subsystem, as well as providing diagnostic control and error recovery facilities.
Floating Point Unit (FPU), CP1 The R4000 has an on-chip floating-point unit designated CP1. The FPU extends the CPU instruction set to perform arithmetic operations on floating-point values. The FPU features include:
Full 64-bit operation
Load and store instruction set
Tightly coupled coprocessor interface

CISC CISC, which stands for Complex Instruction Set Computer, is a philosophy for designing chips that are easy to program and which make efficient use of memory. Each instruction in a CISC instruction set might perform a series of operations inside the processor. This reduces the number of instructions required to implement a given program, and allows the programmer to learn a small but flexible set of instructions. Since the earliest machines were programmed in assembly language and memory was slow and expensive, the CISC philosophy made sense, and was commonly implemented in such large computers as the PDP-11 and the DECsystem 10 and 20 machines. Most common microprocessor designs, including the Intel(R) 80x86 and Motorola 68K series, also follow the CISC philosophy. As we shall see, recent changes in software and hardware technology have forced a re-examination of CISC. But first, let's take a closer look at the decisions which led to CISC.
CISC philosophy 1: Use Microcode The earliest processor designs used dedicated (hardwired) logic to decode and execute each instruction in the processor's instruction set. This worked well for simple designs with few registers, but made more complex architectures hard to build, as control path logic can be hard to implement. So, designers switched tactics: they built some simple logic to control the data paths between the various elements of the processor, and used a simplified microcode instruction set to control the data path logic. This type of implementation is known as a microprogrammed implementation. In a microprogrammed system, the main processor has some built-in memory (typically ROM) which contains groups of microcode instructions which correspond with each machine-language instruction. When a machine-language instruction arrives at the central processor, the processor executes the corresponding series of microcode
instructions. Because instructions could be retrieved up to 10 times faster from a local ROM than from main memory, designers began to put as many instructions as possible into microcode. In fact, some processors could be ordered with custom microcode which would replace frequently used but slow routines in certain applications. There are some real advantages to a microcoded implementation: Since the microcode memory can be much faster than main memory, an instruction set can be implemented in microcode without losing much speed over a purely hard-wired implementation. New chips are easier to implement and require fewer transistors than implementing the same instruction set with dedicated logic, and a microprogrammed design can be modified to handle entirely new instruction sets quickly. Using microcoded instruction sets, the IBM 360 series was able to offer the same programming model across a range of different hardware configurations. Some machines were optimized for scientific computing, while others were optimized for business computing. However, since they all shared the same instruction set, programs could be moved from machine to machine without recompilation (but with a possible increase or decrease in performance depending on the underlying hardware). This kind of flexibility and power made microcoding the preferred way to build new computers for quite some time.
CISC philosophy 2: Build "rich" instruction sets One of the consequences of using a microprogrammed design is that designers could build more functionality into each instruction. This not only cut down on the total number of instructions required to implement a program, and therefore made more efficient use of a slow main memory, but it also made the assembly-language programmer's life simpler. Soon, designers were enhancing their instruction sets with instructions aimed specifically at the assembly-language programmer.
Such enhancements included string manipulation operations, special looping constructs, and special addressing modes for indexing through tables in memory. For example:
ABCD Add Decimal with Extend
ADDA Add Address
ADDX Add with Extend
ASL Arithmetic Shift Left
CAS Compare and Swap Operands
NBCD Negate Decimal with Extend
EORI Logical Exclusive OR Immediate
TAS Test Operand and Set
CISC philosophy 3: Build high-level instruction sets Once designers started building programmer-friendly instruction sets, the logical next step was to build instruction sets which map directly from high-level languages. Not only does this simplify the compiler writer's task, but it also allows compilers to emit fewer instructions per line of source code. Modern CISC microprocessors, such as the
68000, implement several such instructions, including routines for creating and removing stack frames with a single call. For example:
DBcc Test Condition, Decrement and Branch
ROXL Rotate with Extend Left
RTR Return and Restore Codes
SBCD Subtract Decimal with Extend
SWAP Swap Register Words
CMP2 Compare Register against Upper and Lower Bounds
The rise of CISC The three CISC design decisions, use microcode, build rich instruction sets, and build high-level instruction sets, taken together led to the CISC philosophy, which drove all computer designs until the late 1980s and is still in major use today. (Note that "CISC" didn't enter the computer designer's vocabulary until the advent of RISC; it was simply the way that everybody designed computers.) The next section discusses the common characteristics that all CISC designs share, and how those characteristics affect the operation of a CISC machine.
Characteristics of a CISC design While the chips that emerged from the 1970s and 1980s followed their own unique design paths, most were bound by what we are calling the "CISC Design Decisions". These chips all have similar instruction sets and similar hardware architectures. In general terms, the instruction sets are designed for the convenience of the assembly-language programmer and the hardware designs are fairly complex.
Instruction sets The design constraints that led to the development of CISC (small amounts of slow memory, and the fact that most early machines were programmed in assembly language) give CISC instruction sets some common characteristics:
A 2-operand format, where instructions have a source and a destination. For example, the add instruction "add #5, D0" would add the number 5 to the contents of register D0 and place the result in register D0.
Register-to-register, register-to-memory, and memory-to-register commands.
Multiple addressing modes for memory, including specialized modes for indexing through arrays.
Variable-length instructions, where the length often varies according to the addressing mode.
Hardware architectures Most CISC hardware architectures have several characteristics in common:
Complex instruction-decoding logic, driven by the need for a single instruction to support multiple addressing modes.
A small number of general-purpose registers. This is the direct result of having instructions which can operate
directly on memory and the limited amount of chip space not dedicated to instruction decoding, execution, and microcode storage.
Several special-purpose registers. Many CISC designs set aside special registers for the stack pointer, interrupt handling, and so on. This can simplify the hardware design somewhat, at the expense of making the instruction set more complex.
A "condition code" register which is set as a side effect of most instructions. This register reflects whether the result of the last operation is less than, equal to, or greater than zero, and records if certain error conditions occur.
The ideal CISC machine CISC processors were designed to execute each instruction completely before beginning the next instruction. Even so, most processors break the execution of an instruction into several definite stages; as soon as one stage is finished, the processor passes the result to the next stage:
1. An instruction is fetched from main memory.
2. The instruction is decoded: the controlling code from the microprogram identifies the type of operation to be performed, where to find the data on which to perform the operation, and where to put the result.
3. If necessary, the processor reads in additional information from memory.
4. The instruction is executed: the controlling code from the microprogram determines the circuitry/hardware that will perform the operation.
5. The results are written to memory.
In an ideal CISC machine, each complete instruction would require only one clock cycle (which means that each stage would complete in a fraction of a cycle). In fact, this is the maximum possible speed for a machine that executes one instruction at a time.
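These stages can be caricatured in a few lines of Python. The instruction set and memory contents below are entirely invented; in a real CISC machine the decode and execute stages are driven by microcode, as described above.

```python
# Toy fetch/decode/execute loop (illustrative only; invented instruction set).
mem = {0: ("LOAD", "R0", 100),   # R0 <- mem[100]
       4: ("ADD", "R0", 104),    # R0 <- R0 + mem[104]
       8: ("HALT",),
       100: 7, 104: 35}
reg, pc = {"R0": 0}, 0

while True:
    instr = mem[pc]              # stage 1: fetch
    op = instr[0]                # stage 2: decode
    if op == "LOAD":
        reg[instr[1]] = mem[instr[2]]   # stage 3: memory read; stage 4: execute
    elif op == "ADD":
        reg[instr[1]] += mem[instr[2]]
    elif op == "HALT":
        break
    pc += 4                      # advance to the next instruction

print(reg["R0"])  # 42
```

Each trip around the `while` loop plays the role of one complete instruction; an ideal CISC machine would finish each trip in a single clock cycle.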

A realistic CISC machine In reality, some instructions may require more than one clock per stage. However, a CISC design can tolerate this slowdown, since the idea behind CISC is to keep the total number of cycles small by having complicated things happen within each cycle.
Advantages of CISC At the time of their initial development, CISC machines used available technologies to optimize computer performance. Microprogramming is as easy as assembly language to implement, and much less expensive than hardwiring a control unit. The ease of microcoding new instructions allowed designers to make CISC machines upwardly compatible: a new computer could run the same programs as earlier computers because the new computer would contain a superset of the instructions of the earlier computers. As each instruction became more capable, fewer instructions could be used to implement a given task. This made more efficient use of the relatively slow main memory. Because microprogram instruction sets can be written to match the constructs of high-level languages, the compiler does not have to be as complicated.
Disadvantages of CISC
Earlier generations of a processor family were generally contained as a subset in every new version, so the instruction set and chip hardware become more complex with each generation of computers. So that as many instructions as possible could be stored in memory with the least possible wasted space, individual instructions could be of almost any length; this means that different instructions will take different amounts of clock time to execute, slowing down the overall performance of the machine. Many specialized instructions aren't used frequently enough to justify their existence; approximately 20% of the available instructions are used in a typical program. CISC instructions typically set the condition codes as a side effect of the instruction. Not only does setting the condition codes take time, but programmers have to remember to examine the condition code bits before a subsequent instruction changes them.
ALU DESIGN An Arithmetic and Logic Unit (ALU) is a combinational circuit that performs logic and arithmetic micro-operations on a pair of n-bit operands (e.g. A[3:0] and B[3:0]). The operations performed by an ALU are controlled by a set of function-select inputs. In this lab you will design a 4-bit ALU with 3 function-select inputs: Mode M, Select S1 and S0. The mode input M selects between a Logic (M=0) and an Arithmetic (M=1) operation. The functions performed by the ALU are specified in Table I.

Figure 1: Block diagram of the 4-bit ALU.

When doing arithmetic, you need to decide how to represent negative numbers. As is commonly done in digital systems, negative numbers are represented in two's complement. This has a number of advantages over the sign-and-magnitude representation, such as easy addition or subtraction of mixed positive and negative numbers. Also, the number zero has a unique representation in two's complement. The two's complement of an n-bit number N is defined as
2^n - N = ((2^n - 1) - N) + 1
The last expression gives us an easy way to find the two's complement: take the bit-wise complement of the number and add 1 to it. As an example, to represent the number -5, we take the two's complement of 5 (= 0101) as follows:
  0 1 0 1   (5)
  1 0 1 0   (bit-wise complement)
+       1
---------
  1 0 1 1   (two's complement, -5)
Numbers represented in two's complement lie within the range -(2^(n-1)) to +(2^(n-1) - 1). For a 4-bit number this means that the number is in the range -8 to +7. There is a potential problem we still need to be aware of when working with two's complement, i.e. overflow and underflow, as is illustrated in the example below.

Both calculations give the wrong results (-7 instead of +9, or +7 instead of -9), which is caused by the fact that the result +9 or -9 is out of the allowable range for a 4-bit two's complement number. Whenever the result is larger than +7 or smaller than -8 there is an overflow or underflow, and the result of the addition or subtraction is wrong. Overflow and underflow can be easily detected when the carry out of the most significant stage (i.e. C4) is different from the carry out of the previous stage (i.e. C3). You can assume that the inputs A and B are in two's complement when they are presented to the input of the ALU.
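Both the complement rule and the carry-based overflow test can be verified in Python for 4-bit operands. This is a sketch for checking your understanding, not part of the lab deliverables.

```python
BITS = 4

def twos_complement(n, bits=BITS):
    """Bit-wise complement plus one, kept within `bits` bits."""
    return ((~n) + 1) & ((1 << bits) - 1)

def add_with_overflow(a, b, bits=BITS):
    """Add two `bits`-bit patterns; overflow when carry-out of the MSB
    differs from carry-in to the MSB (C4 != C3 in the 4-bit case)."""
    mask_low = (1 << (bits - 1)) - 1                   # bits below the sign bit
    c_in = ((a & mask_low) + (b & mask_low)) >> (bits - 1)
    s = a + b
    c_out = s >> bits
    return s & ((1 << bits) - 1), c_in != c_out

print(bin(twos_complement(5)))  # 0b1011, i.e. -5 as in the example above

# 7 + 2 = 9 overflows in 4 bits: the result pattern 1001 reads as -7.
result, overflow = add_with_overflow(0b0111, 0b0010)
print(bin(result), overflow)  # 0b1001 True
```

Running a safe case such as 3 + 2 gives 0b0101 with the overflow flag False, confirming that the C4 versus C3 comparison only fires when the true result leaves the -8..+7 range.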

b. Design strategies When designing the ALU we will follow the principle "Divide and Conquer" in order to use a modular design that consists of smaller, more manageable blocks, some of which can be re-used. Instead of designing the 4-bit ALU as one circuit we will first design a one-bit ALU, also called a bit-slice. These bit-slices can then be put together to make a 4-bit ALU. There are different ways to design a bit-slice of the ALU. One method consists of writing the truth table for the one-bit ALU. This table has 6 inputs (M, S1, S0, C0, Ai and Bi) and two outputs, Fi and Ci+1. This can be done, but may be tedious when it has to be done by hand. An alternative way is to split the ALU into two modules, one Logic and one Arithmetic module. Designing each module separately will be easier than designing a bit-slice as one unit. A possible block diagram of the ALU is shown in Figure 2. It consists of three modules: a 2:1 MUX, a Logic unit and an Arithmetic unit.
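The bit-slice idea can be prototyped in Python before committing to gates or VHDL. Since Table I is not reproduced here, the function table below is an assumption (AND, OR, XOR, NOT A for the Logic unit, and a full adder for the Arithmetic unit).

```python
def bit_slice(m, s1, s0, a, b, c_in):
    """One bit-slice: Logic unit + Arithmetic unit + 2:1 MUX on mode M.
    The operation table is assumed, not taken from the lab's Table I."""
    # Logic unit: operation chosen by S1 S0 (00=AND, 01=OR, 10=XOR, 11=NOT A)
    logic = [a & b, a | b, a ^ b, a ^ 1][(s1 << 1) | s0]
    # Arithmetic unit: a full adder on Ai, Bi and the incoming carry
    total = a + b + c_in
    arith, c_out = total & 1, total >> 1
    # 2:1 MUX: M = 0 selects logic, M = 1 selects arithmetic
    return (arith if m else logic), c_out

def alu4(m, s1, s0, a, b, c0=0):
    """Ripple four identical slices together into a 4-bit ALU."""
    result, c = 0, c0
    for i in range(4):
        f, c = bit_slice(m, s1, s0, (a >> i) & 1, (b >> i) & 1, c)
        result |= f << i
    return result, c

print(alu4(1, 0, 0, 0b0101, 0b0011))  # (0b1000, 0) : 5 + 3 = 8
```

The four slices are wired exactly as the hardware would be: the carry out of each slice feeds the carry in of the next, and the mode and select lines fan out to every slice.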

Figure 2: Block diagram of a bit-slice ALU c. Displaying the results In order to easily see the output of the ALU you will display the results on the seven-segment displays and the LEDs (LD). 1. The result of the logic operation can be displayed on the LEDs (LD). Use one of these LEDs to display the overflow flag V as well. 2. Since you are working with a 4-bit representation of 2's complement numbers, the maximum positive number is +7 and the most negative number is -8. Thus a single seven-segment display can be used to show the magnitude of the number. Use another seven-segment display for the minus sign (e.g. use segment g).


3. There is one complication when using more than one of the seven-segment displays on the Digilab board, as can be seen from the connections of the LED segments of the displays. You will notice that the four seven-segment displays share the same cathodes (A, B, ..., G). This implies that one cannot directly connect the signals for the segments of the magnitude and sign to these terminals, since that would short the outputs of the gates, which would damage the FPGA! How could you solve this problem? Sketch a possible solution in your lab notebook. (Hint: You can alternate the signals applied to the cathodes between those of the magnitude and sign displays. If you do this faster than 30 times per second the eye will not notice the flickering. You will also need to alternate the anode signals.) What type of circuit will be needed to accomplish this? You can make use of an on-chip clock, called OSC4, that provides clock signals of 8MHz, 500KHz, 490Hz and 15Hz. 4. Figure 3 shows a schematic of the overall system, consisting of the ALU, decoder and switching circuit, and displays on the Digital lab board.

Figure 3: Overall system, including the 4-bit ALU and display units. d. Tasks: Do the following tasks prior to coming to the lab. Write the answers to all questions in your lab notebook prior to coming to the lab. There is no on-line submission for the pre-lab. Ask the TA to sign the pre-lab section in your lab notebook at the start of the lab session. You will also need to include the answers to the pre-lab questions in your lab report. 1. Design the MUX. You can choose to design the MUX with gates or by writing HDL (VHDL) code. Choose one of the two methods and write the design down in your lab notebook. 2. Design the Logic unit. Here you also have several choices for designing this unit: a. Write the truth table, derive the K-map and give the minimum gate implementation b. Use a 4:1 MUX and gates c. Write an HDL file As part of the pre-lab, you can choose any of the three methods. Briefly justify why you chose a particular design method. Explain the design procedure


and give the logic diagram or the HDL file. In case you use a MUX, you also need to give the schematic or the HDL file for the MUX. 3. Design the arithmetic unit. Again, here you have different choices to design and implement the arithmetic unit. A particularly attractive method is one that makes use of previously designed modules, such as your full adder. The arithmetic unit basically performs additions on a set of inputs. By choosing the proper inputs, one can perform a range of operations. This approach is shown in Figure 4. The only blocks that need to be designed are the A Logic and B Logic circuits. You can make use of your previously designed full adder (MYFA).

Figure 4: Schematic block diagram of the arithmetic unit. a. Give the truth tables for the Xi and Yi functions, with S1, S0 and Ai as inputs for Xi, and S1, S0 and Bi as inputs for Yi. Fill out the following tables. Notice that in definition table I of the ALU, the variable C0 acts as the carry input. Depending on the value of C0, one performs the function on the odd or even entries of definition table I. As an example, the first entry is "transfer A" (for C0=0) while the second one is "A+1" (for C0=1); similarly for A + B and A + B + 1, etc.

b. Give the K-map for Xi and Yi functions. Find the minimum realization for Xi and Yi. c. Draw the logic diagram for Xi and Yi. d. Design the circuit that detects over- or underflow.
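The Figure 4 structure (A Logic and B Logic blocks feeding a full-adder chain) can also be sketched in software. In the Python model below, the selections Xi = Ai and Yi in {Bi, ~Bi, 0, 1} are my own assumptions for illustration; the pre-lab asks you to derive the actual Xi/Yi tables from definition table I.

```python
# Sketch of the arithmetic unit of Figure 4: A Logic and B Logic
# blocks select the full-adder operands, so one adder chain performs
# several operations. Xi/Yi selections below are assumed examples.

def full_adder(x, y, c):
    return x ^ y ^ c, (x & y) | (x & c) | (y & c)

def a_logic(s1, s0, ai):
    return ai                          # assumed: A passes straight through

def b_logic(s1, s0, bi):
    return [bi, 1 - bi, 0, 1][s1 * 2 + s0]   # B, ~B, 0, or 1

def arithmetic_unit(s1, s0, a_bits, b_bits, c0):
    """a_bits/b_bits are LSB-first bit lists; returns (f_bits, carries)."""
    c, f, carries = c0, [], [c0]
    for ai, bi in zip(a_bits, b_bits):
        s, c = full_adder(a_logic(s1, s0, ai), b_logic(s1, s0, bi), c)
        f.append(s)
        carries.append(c)
    return f, carries

# With S1S0 = 01 and C0 = 1 the unit computes A + ~B + 1 = A - B:
f, cs = arithmetic_unit(0, 1, [1, 0, 1, 0], [1, 1, 0, 0], 1)   # 5 - 3
print(f)                 # [0, 1, 0, 0] -> 0010 = 2
print(cs[4] != cs[3])    # overflow check (C4 != C3): False
```

This shows why the A Logic / B Logic approach is attractive: subtraction and increment come for free from the same adder chain, selected purely by the operand logic and C0.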


4. Design the decoder for the seven-segment displays. Remember that the segments of the display are active-low. The decoders should be designed in such a way that when the Logic Mode (M=0) is selected, only the LEDs are active, and when the Arithmetic Mode (M=1) is selected, only the seven-segment displays are active.
FIXED AND FLOATING-POINT OPERATION
Definition: An arithmetic operation performed on floating-point numbers ("this computer can perform a million flops per second"). Floating point hardware was standard throughout the 7090/94 family. The 7090 had single precision (36-bit) floating point operations, while the 7094/7094 II machines also provided double precision (72-bit) floating point instructions. The fraction was considered normalized if Bit 9 (or Bit 18 in double precision) contained the first 1-bit of the fraction, so that the floating point word was positioned to have no leading zero bits. The characteristic for single precision numbers consisted of eight bits (Bits 1-8) and defined the exponent of the number. Since the exponent could be either positive or negative, but the hardware sign bit was already allocated for the fraction, the exponent was algebraically signed in so-called excess form, where the characteristic was formed by adding +128 to the exponent (e.g., an exponent of +12 would be coded as 140 and -30 would be coded as 98). The allowable range for the single precision exponent was -128 (decimal) to +127 (decimal), which yielded a floating point range between approximately 10E-39 and 10E+39 (decimal). As an example, single precision floating point 10.00 (decimal) was represented as 204500000000 (octal), which yielded a sign bit of 0; a characteristic of 204 (octal); and a mantissa of 500000000 (octal).
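The conversion just described can be sketched in Python. The decoder below is my own illustration of the 36-bit format (1 sign bit, 8-bit excess-128 characteristic, 27-bit fraction); it is not part of the original notes.

```python
# Sketch of a decoder for the IBM 7090-style 36-bit single precision
# word: sign bit, excess-128 characteristic, 27-bit normalized fraction.

def decode_7090(octal_word):
    w = int(octal_word, 8)                 # 36-bit word from its octal form
    sign = -1 if (w >> 35) & 1 else 1      # bit S
    characteristic = (w >> 27) & 0xFF      # bits 1-8, excess-128
    fraction = (w & ((1 << 27) - 1)) / (1 << 27)   # bits 9-35
    return sign * fraction * 2 ** (characteristic - 128)

print(decode_7090("204500000000"))   # 10.0
print(decode_7090("171400000000"))   # 0.00390625
print(decode_7090("605500000000"))   # -20.0
```

Running the examples from the text through this decoder reproduces the stated decimal values.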
The zero sign bit indicated an algebraically positive number; the 204 (octal) or 132 (decimal) characteristic indicated, after subtracting 128 (decimal), an exponent of 4; and the mantissa of 500000000 (octal) indicated a fraction of (2 ** -1) + (2 ** -3) or 0.625 (decimal). Therefore, the floating point number was (2 ** 4) * (0.625) or 10.00. Other floating point examples: 0.00390625 (decimal) was represented by 171400000000 (octal); 44.00 (decimal) was represented by 206540000000 (octal); and -20.00 (decimal) was represented by 605500000000 (octal).
IEEE STANDARD 754 FLOATING POINT NUMBERS
IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based PCs, Macintoshes, and most Unix platforms. This section gives a brief overview of IEEE floating point and its representation.
Floating Point Numbers
There are several ways to represent real numbers on computers. Fixed point places a radix point somewhere in the middle of the digits, and is equivalent to using integers that represent portions of some unit. For example, one might represent 1/100ths of a unit; if you have four decimal digits, you could represent 10.82, or 00.01. Another approach is to use rationals, and


represent every number as the ratio of two integers. Floating-point representation, the most common solution, basically represents reals in scientific notation. Scientific notation represents numbers as a base number and an exponent. For example, 123.456 could be represented as 1.23456 × 10^2. In hexadecimal, the number 123.abc might be represented as 1.23abc × 16^2. Floating point solves a number of representation problems. Fixed point has a fixed window of representation, which limits it from representing very large or very small numbers. Also, fixed point is prone to a loss of precision when two large numbers are divided. Floating point, on the other hand, employs a sort of "sliding window" of precision appropriate to the scale of the number. This allows it to represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease.
Storage Layout
IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The mantissa is composed of the fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and need not be stored. The layout for single (32-bit) and double (64-bit) precision floating-point values is as follows; the number of bits for each field is shown, with bit ranges in square brackets:

        Sign     Exponent      Fraction
Single  1 [31]   8 [30-23]     23 [22-0]
Double  1 [63]   11 [62-52]    52 [51-0]
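The single-precision field layout can be inspected directly with Python's struct module. This sketch is my own illustration, not from the original article:

```python
# Sketch: extract the sign, biased exponent, and fraction fields of a
# value packed as IEEE 754 single precision.

import struct

def fields(x):
    bits, = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit  [31]
    exponent = (bits >> 23) & 0xFF    # 8 bits [30-23], stored with bias 127
    fraction = bits & 0x7FFFFF        # 23 bits [22-0]
    return sign, exponent, fraction

print(fields(1.0))    # (0, 127, 0): true exponent 0 stored as 127
print(fields(-2.0))   # (1, 128, 0): true exponent 1, sign bit set
print(fields(0.5))    # (0, 126, 0): true exponent -1
```

The biased exponents printed here anticipate the bias discussion below.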

The Sign Bit The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number. The Exponent The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or 73. For reasons discussed later, exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers. For double precision, the exponent field is 11 bits, and has a bias of 1023. The Mantissa The mantissa, also known as the significand, represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits. To find out the value of the implicit leading bit, consider that any number can be


expressed in scientific notation in many different ways. For example, the number five can be represented as any of these:
5.00 × 10^0
0.05 × 10^2
5000 × 10^-3
In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically puts the radix point after the first non-zero digit. In normalized form, five is represented as 5.0 × 10^0. A nice little optimization is available to us in base two, since the only possible non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to represent it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.
Putting it All Together
1. The sign bit is 0 for positive, 1 for negative.
2. The exponent's base is two.
3. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision.
4. The mantissa is interpreted as 1.f, where f is the field of fraction bits and the leading 1 is implicit.
Ranges of Floating-Point Numbers
Let's consider single-precision floats for a second. Note that we're taking essentially a 32-bit number and re-jiggering the fields to cover a much broader range. Something has to give, and it's precision. For example, regular 32-bit integers, with all precision centered around zero, can precisely store integers with 32 bits of resolution. Single-precision floating point, on the other hand, is unable to match this resolution with its 24 bits. It does, however, approximate this value by effectively truncating from the lower end. For example:
11110000 11001100 10101010 00001111    // 32-bit integer
= +1.1110000 11001100 10101010 x 2^31  // Single-precision float
= 11110000 11001100 10101010 00000000  // Corresponding value
This approximates the 32-bit value, but doesn't yield an exact representation.
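Putting the four rules together, a normalized single-precision value can be decoded from its raw bits by hand. The following sketch is my own illustration of that procedure:

```python
# Sketch: decode a normalized single-precision value from its raw bits
# using the rules above: sign, remove the exponent bias of 127, and
# restore the implicit leading 1 of the mantissa.

def decode_single(bits):
    sign = -1.0 if bits >> 31 else 1.0
    exponent = ((bits >> 23) & 0xFF) - 127        # remove the bias
    mantissa = 1.0 + (bits & 0x7FFFFF) / 2**23    # implicit leading 1
    return sign * mantissa * 2.0 ** exponent

print(decode_single(0x3F800000))   # 1.0
print(decode_single(0xC0A00000))   # -5.0
print(decode_single(0x40490FDB))   # ~3.1415927 (pi rounded to single)
```

As a cross-check, the same bit patterns fed to a hardware float decoder (e.g. Python's struct) give identical values.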
On the other hand, besides the ability to represent fractional components (which integers lack completely), the floating-point value can represent numbers around 2^127, compared to 32-bit integers' maximum value around 2^32. The range of positive floating point numbers can be split into normalized numbers (which preserve the full precision of the mantissa) and denormalized numbers (discussed later) which use only a portion of the fraction's precision.


Since the sign of floating point numbers is given by a special leading bit, the range for negative numbers is given by the negation of the above values. There are five distinct numerical ranges that single-precision floating-point numbers are not able to represent:
1. Negative numbers less than -(2 - 2^-23) × 2^127 (negative overflow)
2. Negative numbers greater than -2^-149 (negative underflow)
3. Zero
4. Positive numbers less than 2^-149 (positive underflow)
5. Positive numbers greater than (2 - 2^-23) × 2^127 (positive overflow)
Overflow means that values have grown too large for the representation, much in the same way that you can overflow integers. Underflow is a less serious problem because it just denotes a loss of precision, which is guaranteed to be closely approximated by zero. Here's a table of the effective range (excluding infinite values) of IEEE floating-point numbers:
        Denormalized                     Normalized                      Approximate Decimal
Single  ±2^-149 to (1-2^-23)×2^-126     ±2^-126 to (2-2^-23)×2^127      ±~10^-44.85 to ~10^38.53
Double  ±2^-1074 to (1-2^-52)×2^-1022   ±2^-1022 to (2-2^-52)×2^1023    ±~10^-323.3 to ~10^308.3
Note that the extreme values occur (regardless of sign) when the exponent is at the maximum value for finite numbers (2^127 for single precision, 2^1023 for double), and the mantissa is filled with 1s (including the normalizing 1 bit).
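These limits can be verified by nudging values past them. Python floats are double precision, so the single-precision boundaries can be computed exactly and then round-tripped through a 32-bit encoding; this sketch is my own illustration:

```python
# Sketch: probe the single-precision limits quoted above by
# round-tripping values through a 32-bit IEEE 754 encoding.

import struct

def to_single(x):
    """Round-trip a value through IEEE single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

largest = (2 - 2**-23) * 2.0**127     # largest finite single
smallest = 2.0**-149                  # smallest positive denormal

print(to_single(largest))             # 3.4028234663852886e+38
print(to_single(smallest))            # 1.401298464324817e-45
print(to_single(smallest / 2))        # 0.0  (positive underflow)
try:
    to_single(largest * 2)            # positive overflow
except (OverflowError, struct.error):
    print("too large for single precision")
```

Underflow quietly rounds to zero, matching the remark above that it is "closely approximated by zero", while overflow is a hard failure.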


Special Values
IEEE reserves exponent field values of all 0s and all 1s to denote special values in the floating-point scheme.
Zero
As mentioned above, zero is not directly representable in the straight format, due to the assumption of a leading 1 (we'd need to specify a true zero mantissa to yield a value of zero). Zero is a special value denoted with an exponent field of zero and a fraction field of zero. Note that -0 and +0 are distinct values, though they both compare as equal.
Denormalized
If the exponent is all 0s, but the fraction is non-zero (else it would be interpreted as zero), then the value is a denormalized number, which does not have an assumed leading 1 before the binary point. Thus, this represents a number (-1)^s × 0.f × 2^-126, where s is the sign bit and f is the fraction. For double precision, denormalized numbers are of the form (-1)^s × 0.f × 2^-1022. From this you can interpret zero as a special type of denormalized number.
Infinity
The values +infinity and -infinity are denoted with an exponent of all 1s and a fraction of all 0s. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.
Not A Number
The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaNs are represented by a bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN). A QNaN is a NaN with the most significant fraction bit set. QNaNs propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined. An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations.
SNaN's can be handy to assign to uninitialized variables to trap premature usage. Semantically, QNaN's denote indeterminate operations, while SNaN's denote invalid operations. Special Operations


Operations on special numbers are well-defined by IEEE. In the simplest case, any operation with a NaN yields a NaN result. Other operations are as follows:
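The operations table is not reproduced here; the following sketch (my own, not from the original article) constructs the special bit patterns described above and demonstrates a few of the IEEE-defined results, using the fact that Python floats follow IEEE 754 double precision:

```python
# Sketch: build the special single-precision bit patterns (zero,
# denormal, infinity, NaN) and exercise IEEE-defined operations.

import math
import struct

def from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(from_bits(0x00000000))   # 0.0
print(from_bits(0x80000000))   # -0.0  (distinct pattern, compares equal)
print(from_bits(0x00000001))   # 1.401298464324817e-45 (denormal, 2**-149)
print(from_bits(0x7F800000))   # inf   (exponent all 1s, fraction 0)
print(from_bits(0x7FC00000))   # nan   (exponent all 1s, fraction non-zero)

# Operations on special values (note: Python raises an exception for
# 0.0/0.0 instead of returning NaN, unlike raw IEEE hardware):
inf = math.inf
print(inf + 1.0)    # inf
print(-5.0 / inf)   # -0.0
print(inf - inf)    # nan (indeterminate)
print(0.0 * inf)    # nan (indeterminate)
```

The indeterminate forms (inf - inf, 0 × inf) are exactly the cases where a QNaN "pops out" as described above.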

To sum up, the following are the corresponding values for a given representation:



PART-A (2 MARKS)

1. Define Computer Architecture.
Computer architecture deals with the structure and behaviour of a computer, including the information formats, the instruction sets and the various techniques used for memory addressing. It can be defined as the functional operation of the individual hardware units, the flow of information between them and the control of these functions so that they work coherently and smoothly.
2. What are the functional units in a computer?
1. Input Unit 2. Output Unit 3. Memory Unit 4. Arithmetic and Logic Unit 5. Control Unit
3. Define CISC.
A computer with a large number of instructions is called a Complex Instruction Set Computer, or CISC for short, e.g. the VAX of Digital Equipment Corporation (DEC) and the IBM System/370.
4. Define RISC.
A computer with a limited number of instructions is known as a Reduced Instruction Set Computer, or RISC for short, e.g. the Alpha processor of DEC and the 88000 series of Motorola.

6. What are the two advantages of Load/Store Architecture?
1. Simple fixed-length encodings, which simplify decoding. 2. A similar number of clock cycles per instruction, which simplifies control.



7. Are load/store architectures more typical of RISC or CISC architectures?
Load/store architectures are more typical of RISC.
8. Assuming all other factors are equal, is a load/store architecture likely to have a higher or lower CPI than a memory-memory architecture?
Since going to memory increases the number of cycles needed by an instruction, a load/store architecture will have a lower CPI.
9. Define Response Time.
Response time is the time spent to complete an event or an operation. It is also referred to as the execution time or latency.
10. Define Throughput.
Throughput is the amount of work done in a given time, that is, the amount of processing that can be accomplished during a given interval of time.
11. What is meant by performance of a system?
The performance of the processor is measured by the elapsed time, which is the time spent from the start of execution to the completion of a program.
12. What is meant by clock? Define the terms clock interval and clock rate.
Clock - The heart of the processor circuit is the timing signal it generates, called the clock. Clock Interval - The clock defines regular time intervals called clock intervals. Clock Rate - The length of one clock cycle is an important parameter of processor performance. The inverse of the clock cycle length is called the clock rate, which is expressed in cycles per second.

13. Give the basic performance equation.
The basic performance equation is given by T = (N × S) / R, where
T - Performance parameter of an application program


N - Number of machine language instructions required to complete the execution of a program
S - Average number of basic steps required to execute one machine instruction
R - Clock rate of the processor in cycles per second
14. Define Pipelining.
Pipelining is a technique of splitting a sequential process into sub-operations, each being executed in a dedicated segment that operates concurrently with all other segments.
15. How can the clock rate be increased?
There are two ways by which the clock rate can be increased: 1. Improved integrated circuits make the logic circuits faster, thus reducing the time taken to complete a basic step. 2. Reducing the amount of processing in one basic step helps to reduce the clock period t.
16. State Amdahl's Law.
Amdahl's law states that the performance improvement to be gained by using a faster mode of execution is limited by the fraction of time the faster mode can be used. Using this law, the performance gain that can be obtained by improving some portion of the computer can be calculated using the following formula:
Speedup = (Performance for the entire task using the enhancement when possible) / (Performance for the entire task without using the enhancement)
17. What are the types of benchmarks available for evaluating the performance?
1. Actual target workload 2. Real full program based benchmarks 3. Small kernel based benchmarks 4. Micro-benchmarks
18. What is SPEC?


A milestone in performance evaluation was the formation of the System Performance Evaluation Cooperative (SPEC) group in 1988. SPEC consists of representatives from various computer-related organizations such as Apollo, Hewlett-Packard, DEC, MIPS and SUN.
19. Define SPEC rating.
The SPEC rating is a measure of the combined effect of all the factors affecting performance, including the compiler, the OS, the processor and the memory of the computer being tested, because the actual execution time is measured:
SPEC rating = (Running time on the reference computer) / (Running time of the computer under test)
20. What is Speedup?
Speedup is a measure of how fast a task will run using the machine with the enhancement as opposed to the original machine without the enhancement:
Speedup = (Execution time for the entire task without using the enhancement) / (Execution time for the entire task using the enhancement when possible)
21. Define Cycles Per Instruction (CPI).
A program consists of a number of CPU instructions, represented by the Instruction Count (IC). If the number of clock cycles and the instruction count are known, then CPI can be calculated as
CPI = CPU clock cycles for a program / Instruction Count
22. What are the various factors affecting the CPU time?
1. Clock cycles or clock rate (C) 2. Clock cycles Per Instruction (CPI) 3. Instruction Count (IC)
23. What are the advantages of full application benchmarks?
1. Portable and widely used 2. Measurements useful in reality
24. What is Instruction Set and Instruction Set Architecture?


The operation of a CPU is determined by the instructions it executes, called machine instructions. The collection of such instructions is called the instruction set of a particular CPU. The complete instruction set is commonly referred to as the Instruction Set Architecture.
25. What are the different types of addressing modes used in instruction set design?
Immediate addressing, direct addressing, register direct addressing, register indirect addressing, displacement addressing, relative addressing, base register addressing and indexing.
26. What is a good ISA?
To design a good ISA, two factors must be considered. 1. A good ISA should define a set of instructions that can be implemented efficiently both in current and future technologies. This results in a cost-effective design over several generations. 2. A good ISA should provide a clean target for compiled code and it should be backward compatible.
27. What are the different types of data used in an instruction?
1. Addresses 2. Numbers 3. Characters (alphanumeric) 4. Logical data (true or false situation)
28. What is a subroutine or a called procedure?
The procedure is an important innovation in the development of programming languages. It is a self-contained program incorporated into a larger program. It is sometimes called a subroutine.


29. What are the advantages of a called procedure?
The two principal reasons for using subroutines are economy and modularity. The advantages of a called procedure are: 1. A procedure may be called from more than one location 2. A procedure call may appear inside another procedure, allowing nesting of procedures.
30. Classify the ISA according to internal storage.
The ISA can be classified, according to the storage used for implementation of the instructions, as follows: 1. Using a stack 2. Using an accumulator 3. Using register and memory 4. Using two registers
31. What is an addressing mode?
The addressing mode specifies a rule for interpreting or translating the address field of the instruction into an effective address from where the operand is actually referenced.
PART-B (16 MARKS)
1. Draw the single bus and three bus organization of the data path inside a processor.
2. What do you know about bits, bytes, nibbles and words? What are big-endian and little-endian assignments of addresses?
3. Write notes on instruction formats.
4. Explain about instructions and instruction sequencing.
5. (a) Explain in detail the different instruction types and instruction sequencing. (OR) (b) Explain the different types of addressing modes with suitable examples.
6. (a) Illustrate Booth's algorithm with an example. (OR)


(b) Design a 4-bit carry-lookahead adder and explain its operation with an example.
7. Explain various instruction formats in detail.
8. Explain the instruction cycle, highlighting the sub-cycles and the sequence of steps to be followed.
9. Explain about addressing modes.
10. Registers R1 and R2 of a computer contain the decimal values 1200 and 2400 respectively. What is the effective address of the memory operand in each of the following instructions? i. Load 20(R1), R5 ii. Add (R2), R5 iii. Move #3000, R5 iv. Sub (R1)+, R5
11. Explain how the processor is interfaced with the memory with a neat block diagram and explain how they communicate.
12. What are the disadvantages in using a ripple carry adder?
13. Design a binary multiplier using a sequential adder. Explain its operation.
14. Draw the diagram of a carry-lookahead adder and explain the carry-lookahead principle.
15. Design a 4-bit binary adder/subtractor and explain its functions.
16. Give the algorithm for multiplication of signed 2's complement numbers and illustrate with an example.
17. Design an array multiplier that multiplies two 4-bit numbers and explain its operation.
18. Write the algorithm for division of floating point numbers and illustrate with an example.
19. Write about the CSA method of fast multiplication. Prove how it is faster with an example.
20. Draw the circuit for integer division and explain.
21. Explain the working of a floating point adder/subtractor. With a detailed flow chart, explain how floating point addition/subtraction is performed.


22. Give the IEEE standard double precision floating point format.
23. Explain the representation of floating point numbers in detail.
