Why is the cache miss penalty greater in a deeply pipelined processor?
Is it because the stall lasts longer if the miss occurs at a late stage of the pipeline? Or because there are simply too many instructions in the pipeline?
Usually you implement a deeper pipeline to reduce the cycle time of each pipe stage.
Consider two in-order single-issue pipelined processor microarchitectures.
uA1 has a 5 stage pipeline and a 2 ns cycle time. uA2 has a 10 stage pipeline and a 1 ns cycle time.
A full cache miss must (at least) load an entire cache line from DRAM. Assume that takes 100 ns, including row activation, burst reads of the line words, and row precharge.
When uA1 takes a cache miss, it stalls for 100 ns, i.e. 50 clock cycles, i.e. 50 issue slots. When uA2 takes a cache miss, it stalls for 100 ns, i.e. 100 clock cycles, i.e. 100 issue slots.
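Here the cache miss penalty (expressed in instruction issue slots missed) is twice as large in the more deeply pipelined processor: the DRAM latency in nanoseconds is the same for both, but the shorter cycle time means more cycles elapse while waiting.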
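The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the numbers assumed in this answer (100 ns miss latency, 2 ns and 1 ns cycle times); the function name `miss_penalty_cycles` is made up for the example, and it assumes an in-order single-issue machine where every stalled cycle is one lost issue slot.

```python
def miss_penalty_cycles(miss_latency_ns, cycle_time_ns):
    """Clock cycles stalled for one cache miss.

    Assumes an in-order, single-issue pipeline, so each stalled
    cycle is also one lost instruction issue slot.
    """
    return miss_latency_ns / cycle_time_ns


DRAM_LATENCY_NS = 100  # assumed full-miss latency from the example above

# uA1: 5-stage pipeline, 2 ns cycle; uA2: 10-stage pipeline, 1 ns cycle
ua1_penalty = miss_penalty_cycles(DRAM_LATENCY_NS, cycle_time_ns=2)
ua2_penalty = miss_penalty_cycles(DRAM_LATENCY_NS, cycle_time_ns=1)

print("uA1 lost issue slots:", ua1_penalty)
print("uA2 lost issue slots:", ua2_penalty)
print("ratio uA2/uA1:", ua2_penalty / ua1_penalty)
```

Note that the pipeline depth itself does not appear in the formula; it matters only through the shorter cycle time that the deeper pipeline makes possible.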