
Total No. of Pages: 1 Roll Number:..

B.E. (COE) 6th SEMESTER MID SEMESTER EXAMINATION, MARCH 2011


COE 315:- ADVANCED COMPUTER ARCHITECTURE
TIME: 1.5 HOURS MAX. MARKS: 20
NOTE: Attempt all questions. Assume suitable missing data, if any, and specify it clearly.
----------------------------------------------------------------------------------------------------------------------------------
Question 1 [2]
In the CPU performance equation, how do the following technologies affect its different parameters?
(i) RISC (ii) Superscalar
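For reference, the CPU performance equation that the question refers to is commonly written as the product of three parameters. The question asks which of these factors each technology tends to change:

```latex
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
```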

Question 2 [2]
What type of branch predictor would you recommend for a branch with the following behavior and why?
T, T, T, T, N, T, T, T, T, T, T, N, T, T, T, T, T, T, T, N, N, N, N, N, T, N, N, N, N, N, N, T, N, N, N, N
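One way to reason about this question is to replay the given outcome sequence through candidate predictors and count mispredictions. The sketch below compares a 1-bit "last outcome" predictor with a 2-bit saturating counter. It is a sketch, not an answer key, and it assumes that both predictors start in the taken state. The function names are hypothetical.

```python
# Outcome sequence copied from the question (T = taken, N = not taken).
OUTCOMES = ("T, T, T, T, N, T, T, T, T, T, T, N, T, T, T, T, T, T, T, N, "
            "N, N, N, N, T, N, N, N, N, N, N, T, N, N, N, N").split(", ")


def one_bit_mispredictions(outcomes, initial="T"):
    """1-bit predictor: always predict the most recent actual outcome."""
    last, misses = initial, 0
    for actual in outcomes:
        if actual != last:
            misses += 1
        last = actual
    return misses


def two_bit_mispredictions(outcomes, initial_state=3):
    """2-bit saturating counter: states 0-1 predict N, states 2-3 predict T."""
    state, misses = initial_state, 0
    for actual in outcomes:
        predicted = "T" if state >= 2 else "N"
        if predicted != actual:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if actual == "T" else max(state - 1, 0)
    return misses


print(len(OUTCOMES), one_bit_mispredictions(OUTCOMES), two_bit_mispredictions(OUTCOMES))
```

Under these starting assumptions, the script counts 9 mispredictions for the 1-bit scheme and 6 for the 2-bit counter over the 36 outcomes. Changing `initial` or `initial_state` shows how sensitive these counts are to the assumed starting state.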

Question 3 [2]
Why can loop unrolling improve performance? Are there any potential downsides to using loop unrolling?
Explain in brief.
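The transformation itself can be sketched at source level. Python will not show the hardware speedup, but the sketch does show where the savings and costs come from. The unrolled version performs one loop test and branch per four elements instead of one per element. It pays for this with a longer body and a separate cleanup loop for the leftover elements. The unroll factor of 4 and the function names are illustrative assumptions.

```python
def sum_rolled(values):
    """Baseline loop: one add plus one loop test/branch per element."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def sum_unrolled_by_4(values):
    """Same computation unrolled 4x: one loop test/branch per four elements."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled body: four element reads per iteration, fewer branches.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Cleanup loop for the n % 4 leftover elements (extra code = larger footprint).
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```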

Question 4 [2]
Consider a pipeline that has four stages with the following time requirements: Fetch (30 ns); Decode (20 ns);
Execute (20 ns); Writeback (30 ns). Assume that every instruction in the instruction set uses all four
stages and that all stages are clocked by a common clock. What is the latency of an instruction flowing
through the pipeline? Also, what is the idealized throughput of the pipeline?
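Here is a quick arithmetic sketch of the usual reasoning. Because the clock is shared, its period must accommodate the slowest stage. Latency is therefore the number of stages times that period, and the ideal throughput is one instruction per clock. The sketch assumes zero pipeline-register (latch) overhead, which the question does not mention. The variable names are illustrative.

```python
STAGE_NS = {"Fetch": 30, "Decode": 20, "Execute": 20, "Writeback": 30}

# A common clock must accommodate the slowest stage.
clock_ns = max(STAGE_NS.values())
# Every instruction passes through all four stages, one clock each.
latency_ns = len(STAGE_NS) * clock_ns
# Ideal steady state: one instruction completes per clock.
# 1000 ns / clock_ns = instructions per microsecond = MIPS.
throughput_mips = 1e3 / clock_ns

print(clock_ns, latency_ns, round(throughput_mips, 2))  # prints: 30 120 33.33
```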

Question 5 [2]
Consider two machines, machine A and machine B. For both machines, all instructions except for loads take
one cycle. Loads take one cycle plus an additional "cache penalty." Machine A has a clock rate of 1.0 GHz
and a cache penalty of 5 cycles. Machine B has a clock rate of 2.0 GHz and a cache penalty of 20 cycles.
Loads are 33% of all instructions. Which machine is faster, A or B, and by how much?
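Below is a minimal sketch of the comparison. It assumes that both machines execute the same instruction count and that every load pays the full cache penalty. Under those assumptions, average time per instruction (CPI × cycle time) decides which machine is faster. The helper name is hypothetical.

```python
LOAD_FRACTION = 0.33


def time_per_instruction_ns(clock_ghz, cache_penalty_cycles):
    # Every instruction takes 1 cycle; loads add the cache penalty on top.
    cpi = 1 + LOAD_FRACTION * cache_penalty_cycles
    cycle_time_ns = 1 / clock_ghz
    return cpi * cycle_time_ns


t_a = time_per_instruction_ns(clock_ghz=1.0, cache_penalty_cycles=5)   # CPI = 2.65
t_b = time_per_instruction_ns(clock_ghz=2.0, cache_penalty_cycles=20)  # CPI = 7.6
speedup_of_a_over_b = t_b / t_a
print(round(t_a, 3), round(t_b, 3), round(speedup_of_a_over_b, 3))
```

With these inputs, machine A takes about 2.65 ns per instruction and B about 3.8 ns, so A comes out roughly 1.43 times faster despite its slower clock.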

Question 6 [2]
A processor runs at 2 GHz and has a CPI of 1.2, excluding stall cycles due to cache misses. Load
and store instructions make up 30% of all instructions. The processor has an I-cache and a D-cache. The hit
time is 1 clock cycle for both caches. The I-cache has a 2% miss rate. The D-cache has a 5% miss rate on
load and store instructions. The miss penalty is 50 ns, which is the time to access and transfer a cache block
between main memory and the processor. What is the average memory access time (AMAT), in clock cycles, for
instruction accesses and for data accesses? What is the number of stall cycles per instruction, and what is the overall CPI?
(AMAT = hit time + miss rate * miss penalty)
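The sketch below walks through the arithmetic using the formula given. The first step is to convert the 50 ns miss penalty into clock cycles. The sketch then assumes that the 1-cycle hit time is already included in the base CPI of 1.2, so only miss penalties count as stall cycles. Every instruction makes one instruction fetch, while only 30% also make a data access. The constant names are illustrative.

```python
CLOCK_GHZ = 2.0
BASE_CPI = 1.2
MEM_REF_FRACTION = 0.30      # loads + stores per instruction
HIT_TIME = 1                 # cycles, both caches
I_MISS, D_MISS = 0.02, 0.05
MISS_PENALTY_NS = 50

# 50 ns at 2 GHz (0.5 ns per cycle) is 100 clock cycles.
miss_penalty = MISS_PENALTY_NS * CLOCK_GHZ

# AMAT = hit time + miss rate * miss penalty
amat_i = HIT_TIME + I_MISS * miss_penalty
amat_d = HIT_TIME + D_MISS * miss_penalty

# Assumption: hits are already in BASE_CPI, so only misses stall.
stalls_per_instr = I_MISS * miss_penalty + MEM_REF_FRACTION * D_MISS * miss_penalty
cpi = BASE_CPI + stalls_per_instr

print(miss_penalty, amat_i, amat_d, stalls_per_instr, cpi)
```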

Question 7 [3]
Suppose we have a pipelined architecture that has 7 stages and on a particular benchmark, it has the
following dynamic instruction mix: LOADS: 24%; STORES: 15%; ALU: 37%; BRANCHES (taken): 19%;
BRANCHES (not taken): 5%. Suppose we are using dynamic branch prediction and that each misprediction
costs a 2-cycle penalty. Assume there are no additional hazards. What branch prediction accuracy rate (in
%) is needed in order to achieve a throughput of 0.90 instructions per cycle?
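One way to set up the calculation is to invert the target IPC to obtain a target CPI. With no other hazards, the ideal CPI is 1, so any excess over 1 must come from misprediction stalls. That stall CPI equals branch fraction × misprediction rate × penalty. The sketch assumes that both taken and not-taken branches go through the predictor. The names are illustrative.

```python
TARGET_IPC = 0.90
BRANCH_FRACTION = 0.19 + 0.05   # taken + not-taken branches
MISPREDICT_PENALTY = 2          # cycles

target_cpi = 1 / TARGET_IPC                 # about 1.111
allowed_stall_cpi = target_cpi - 1          # ideal CPI is 1 with no other hazards
# stall CPI = branch fraction * misprediction rate * penalty
max_mispredict_rate = allowed_stall_cpi / (BRANCH_FRACTION * MISPREDICT_PENALTY)
required_accuracy = 1 - max_mispredict_rate

print(f"{required_accuracy:.2%}")  # prints: 76.85%
```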

Question 8 [5]
Consider the following assembly code:
Loop:  LOAD  R2, 0(R1)      ; R2 = memory[R1]
       LOAD  R3, 4(R1)      ; R3 = memory[R1+4]
       ADD   R4, R2, R3     ; R4 = R2 + R3
       STORE R4, 0(R1)      ; memory[R1] = R4
       ADDI  R1, R1, 4      ; R1 = R1 + 4
       ADDI  R2, R1, -400   ; for comparing R1 to 400
       BEQZ  R2, Loop       ; if R2 = 0 then goto Loop
Assume that: R1 is initially 0; branches resolve in the ID stage; WB writes registers in the first half of a
cycle and ID reads registers in the second half of the same cycle; branches are handled by stalling until they
are resolved; all functional units take one cycle; and there are no structural hazards. For the classic 5-stage
pipeline (IF-ID-EX-MEM-WB) without forwarding, how many cycles will this loop take?
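To count cycles mechanically, one option is a small in-order timing model that encodes the stated assumptions. An instruction completes ID only once every source register has been written back, and a same-cycle WB/ID pair is allowed because of the split-cycle register file. A branch blocks fetch until it resolves in ID. The sketch below is one reading of those assumptions, not an official solution. It counts cycles up to the last instruction's WB, and the `Instr` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Instr:
    text: str
    dests: Tuple[str, ...]
    srcs: Tuple[str, ...]
    is_branch: bool = False


def cycles_without_forwarding(trace):
    """Return the cycle in which the last instruction of `trace` completes WB."""
    ready = {}          # register -> WB cycle of its most recent producer
    prev_id_done = 0    # cycle in which the previous instruction left ID
    fetch = 1           # first cycle the current instruction occupies IF
    wb = 0
    for ins in trace:
        # Enter ID after IF, once the previous instruction has left ID.
        id_enter = max(fetch + 1, prev_id_done + 1)
        # Stall in ID until every source has been written back (same cycle is OK).
        id_done = max([id_enter] + [ready.get(r, 0) for r in ins.srcs])
        wb = id_done + 3              # EX, MEM, WB: one cycle each
        for r in ins.dests:
            ready[r] = wb
        prev_id_done = id_done
        # Next fetch: when IF frees up, or only after a branch resolves in ID.
        fetch = id_done + 1 if ins.is_branch else id_enter
    return wb


# One dynamic pass through the loop body, starting with R1 = 0.
LOOP_BODY = [
    Instr("LOAD R2, 0(R1)", ("R2",), ("R1",)),
    Instr("LOAD R3, 4(R1)", ("R3",), ("R1",)),
    Instr("ADD R4, R2, R3", ("R4",), ("R2", "R3")),
    Instr("STORE R4, 0(R1)", (), ("R4", "R1")),
    Instr("ADDI R1, R1, 4", ("R1",), ("R1",)),
    Instr("ADDI R2, R1, -400", ("R2",), ("R1",)),
    Instr("BEQZ R2, Loop", (), ("R2",), is_branch=True),
]

print(cycles_without_forwarding(LOOP_BODY))
```

With `BEQZ` as written and R1 = 0 on entry, R2 is -396 after the first pass, so the branch falls through. Under this model the single pass finishes WB in cycle 19. To analyse a different reading of the loop, repeat `LOOP_BODY` in the trace as many times as that reading implies.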
