VTU Previous Year Question Papers BE CS Eighth Semester
Advanced Computer Architecture June 2012
Note: Answer FIVE full questions, selecting at least TWO questions from each part.
1 a. Define computer architecture. Illustrate the seven dimensions of an ISA.
b. Explain in brief the measuring, reporting and summarizing of computer system performance.
c. Assume a disk subsystem with the following components and MTTF:
10 disks, each rated at 1,000,000-hour MTTF.
1 SCSI controller, 500,000-hour MTTF.
1 power supply, 200,000-hour MTTF.
1 fan, 200,000-hour MTTF.
1 SCSI cable, 1,000,000-hour MTTF.
Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole.
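A worked sketch of this calculation: with exponentially distributed lifetimes and independent failures, component failure rates simply add, so the system MTTF is the reciprocal of the summed rates. (The numbers below are the ones given in the question.)

```python
# Q1(c) worked sketch: system MTTF under exponential, independent failures.
# Failure rate of each component = 1 / MTTF; system rate = sum of rates.

component_mttfs = (
    [1_000_000] * 10   # 10 disks, 1,000,000-hour MTTF each
    + [500_000]        # 1 SCSI controller
    + [200_000]        # 1 power supply
    + [200_000]        # 1 fan
    + [1_000_000]      # 1 SCSI cable
)

failure_rate = sum(1 / m for m in component_mttfs)  # failures per hour
mttf_system = 1 / failure_rate

# 23 failures per million hours -> MTTF of about 43,478 hours (~5 years)
print(f"System failure rate: {failure_rate * 1_000_000:.0f} per million hours")
print(f"System MTTF: {mttf_system:.0f} hours")
```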
2 a. Explain how pipelining is implemented in MIPS.
b. Explain the different techniques for reducing pipeline branch penalties.
c. What are the major hurdles of pipelining? Explain briefly.
d. Consider an unpipelined RISC processor. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
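A worked sketch of this speedup, using the figures stated in the question: the unpipelined machine's average instruction time is the frequency-weighted cycle count times the 1 ns clock, while the pipelined machine completes one instruction per (stretched) clock cycle.

```python
# Q2(d) worked sketch: pipeline speedup with clock-overhead penalty.

clock_ns = 1.0
frequencies = {"alu": 0.40, "branch": 0.20, "memory": 0.40}
cycles = {"alu": 4, "branch": 4, "memory": 5}

# Unpipelined: average time per instruction.
avg_unpipelined_ns = sum(frequencies[op] * cycles[op] * clock_ns
                         for op in frequencies)          # 4.4 ns

# Pipelined: ideal CPI of 1, but the clock stretches by 0.2 ns.
pipelined_ns = clock_ns + 0.2                            # 1.2 ns

speedup = avg_unpipelined_ns / pipelined_ns              # ~3.67x
print(f"Unpipelined: {avg_unpipelined_ns:.1f} ns/instruction")
print(f"Pipelined:   {pipelined_ns:.1f} ns/instruction")
print(f"Speedup:     {speedup:.2f}x")
```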
3 a. What are the basic compiler techniques for exposing ILP? Explain briefly.
b. Explain Tomasulo's algorithm, sketching the basic structure of a MIPS floating point unit.
c. Explain true data dependence, name dependence and control dependence with an example code fragment.
4 a. Explain exploiting ILP using dynamic scheduling, multiple issue and speculation.
b. Explain Pentium 4 pipeline supporting multiple issue with speculation.
c. Suppose we have a VLIW processor that can issue two memory references, two FP operations, and one integer operation or branch in every clock cycle. Show an unrolled version of the loop x(i) = x(i) + s for such a processor. Unroll as many times as necessary to eliminate any stalls. Ignore delayed branches.
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      DADDUI R1, R1, #-8
      BNE    R1, R2, Loop
5 a. Explain basic schemes for enforcing coherence.
b. Explain the performance of symmetric shared-memory multiprocessors.
c. Suppose we have an application running on a 32-processor multiprocessor that takes 200 ns to handle a reference to remote memory. For this application, assume that all references except those involving communication hit in the local memory hierarchy, which is slightly optimistic. Processors are stalled on a remote request, and the processor clock rate is 2 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference?
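A worked sketch of this comparison, using the figures stated in the question: first convert the 200 ns remote latency into clock cycles at 2 GHz, then add the stall cycles contributed by remote references to the base CPI.

```python
# Q5(c) worked sketch: effective CPI with remote communication stalls.

clock_ghz = 2.0
remote_latency_ns = 200
base_cpi = 0.5
remote_fraction = 0.002      # 0.2% of instructions touch remote memory

remote_cycles = remote_latency_ns * clock_ghz             # 400 cycles/reference
effective_cpi = base_cpi + remote_fraction * remote_cycles  # 0.5 + 0.8 = 1.3
speedup = effective_cpi / base_cpi                          # 2.6x

print(f"Remote reference cost:   {remote_cycles:.0f} cycles")
print(f"CPI with communication:  {effective_cpi:.1f}")
print(f"No-communication case is {speedup:.1f}x faster")
```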
6 a. Explain the six basic cache optimization techniques.
b. Given the data below, what is the impact of second-level cache associativity on its miss penalty?
Hit time L2 for direct mapped = 10 clock cycles.
Two-way set associativity increases hit time by 0.1 clock cycles, to 10.1 clock cycles.
Local miss rate L2 for direct mapped = 25%.
Local miss rate L2 for two-way set associative = 20%.
Miss penalty L2 = 200 clock cycles.
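A worked sketch using these figures: the miss penalty seen by the first-level cache is the L2 hit time plus the L2 local miss rate times the L2 miss penalty, i.e. the average memory access time of the L2 cache. (In a real design the fractional 10.1-cycle hit time would be rounded to whole clock cycles; the raw arithmetic is shown here.)

```python
# Q6(b) worked sketch: L1 miss penalty for two L2 organizations.

l2_miss_penalty = 200    # clock cycles to main memory

# Direct-mapped L2: 10-cycle hit time, 25% local miss rate.
penalty_direct = 10 + 0.25 * l2_miss_penalty      # 60 clock cycles

# Two-way set-associative L2: 10.1-cycle hit time, 20% local miss rate.
penalty_two_way = 10.1 + 0.20 * l2_miss_penalty   # 50.1 clock cycles

print(f"L1 miss penalty, direct-mapped L2: {penalty_direct:.1f} cycles")
print(f"L1 miss penalty, two-way L2:       {penalty_two_way:.1f} cycles")
```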
c. What are the techniques for fast address translation? Explain.
7 a. Explain any 3 advanced cache optimization techniques.
b. Explain memory technology and optimizations.
c. Assume that the hit time of a two-way set associative first-level data cache is 1.1 times faster than that of a four-way set associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume that the miss penalty is 10 clock cycles to the L2 cache for the two-way set associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?
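One way to read this question is that, measured in the two-way design's clock cycles, a four-way hit costs 1.1 cycles (since the cache is the critical path, higher associativity stretches the cycle). A minimal sketch under that assumption:

```python
# Q7(c) worked sketch. Assumption: all times in two-way clock cycles, so the
# two-way hit costs 1.0 cycles and the four-way hit costs 1.1 cycles; the
# miss penalty to L2 is 10 cycles in both cases, and L2 never misses.

miss_penalty = 10

amat_two_way = 1.0 + 0.049 * miss_penalty    # 1.49 cycles
amat_four_way = 1.1 + 0.044 * miss_penalty   # 1.54 cycles

faster = "two-way" if amat_two_way < amat_four_way else "four-way"
print(f"AMAT two-way:  {amat_two_way:.2f} cycles")
print(f"AMAT four-way: {amat_four_way:.2f} cycles")
print(f"Faster: {faster}")  # the two-way cache, despite its higher miss rate
```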
8 a. Explain detecting and enhancing loop-level parallelism for VLIW.
b. Explain the Intel IA-64 architecture with a neat diagram.
c. Explain hardware support for exposing parallelism for VLIW and EPIC.