Instruction Scheduling for the Intel® Itanium® 2 Processor
On the Itanium® processor, if an integer instruction (ALU, st, ld) uses
the output of an MM instruction (variable shifts, etc.) before that MM
instruction has completed, the pipeline is flushed. A pipeline flush
costs a penalty of ten cycles. To avoid it, the compiler must insert
blocks of nops with stop bits after shift operations. These blocks are
needed because MM instructions have an average latency of four cycles,
so integer instructions that use the outputs of MM instructions must be
placed at least four cycles after the MM instructions issue.
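The padding rule above can be sketched as a small cost model. This is a minimal illustration, not the compiler's actual scheduler. It assumes (these are not from the text) that one instruction group issues per cycle and that the compiler fills the gap between the MM producer and its integer consumer first with independent instruction groups and then with nop-only groups. The function name nop_groups_needed is hypothetical; only the four-cycle distance and the ten-cycle flush penalty come from the text.

```python
# Hypothetical cost model for first-generation Itanium scheduling.
# Assumptions (not from the source): one instruction group issues per cycle,
# and independent work is used before any nop-only groups are inserted.

MM_TO_INT_DISTANCE = 4    # consumer must issue >= 4 cycles after the MM issue (from the text)
FLUSH_PENALTY_CYCLES = 10  # cost of a pipeline flush (from the text)


def nop_groups_needed(independent_groups: int) -> int:
    """Return how many nop-only groups (each ended by a stop bit) must sit
    between an MM producer and its integer consumer."""
    if independent_groups < 0:
        raise ValueError("independent_groups must be non-negative")
    # Groups strictly between the producer (cycle 0) and the consumer (cycle 4).
    gap = MM_TO_INT_DISTANCE - 1
    return max(0, gap - independent_groups)


for k in range(5):
    print(f"{k} independent groups -> {nop_groups_needed(k)} nop groups")
```

Under these assumptions the worst case (no independent work at all) wastes three cycles on nop groups, which is still cheaper than the ten-cycle flush that an early use would trigger.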
On the Itanium 2 processor, these operations are scoreboarded, which
removes the risk of flushing the pipeline. As a result:
- The latency for such a use is three cycles instead of four.
- A use that comes too early simply stalls until the data is ready.
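The scoreboarded behavior can be contrasted with the first-generation rule in a similar sketch. Again this is a simplified model under stated assumptions, not a description of the hardware pipeline: one instruction group per cycle, and "distance" means how many cycles after the MM instruction the integer consumer is scheduled to issue. The names stall_cycles and consumer_start are hypothetical; the three-cycle latency and the four-cycle minimum distance on the Itanium processor come from the text.

```python
# Hypothetical model of scoreboarded MM-to-integer forwarding on Itanium 2.
# Assumption (not from the source): one instruction group issues per cycle.

ITANIUM2_MM_LATENCY = 3      # from the text
ITANIUM_MIN_DISTANCE = 4     # static distance required on the Itanium processor (from the text)


def stall_cycles(distance: int) -> int:
    """Cycles the consumer waits on the scoreboard when it is scheduled
    `distance` cycles after the MM producer."""
    if distance < 1:
        raise ValueError("the consumer must issue at least one cycle after the producer")
    return max(0, ITANIUM2_MM_LATENCY - distance)


def consumer_start(distance: int) -> int:
    """Cycle, relative to the MM issue, at which the consumer actually proceeds."""
    return distance + stall_cycles(distance)


for d in range(1, 5):
    print(f"scheduled at +{d}: stalls {stall_cycles(d)}, proceeds at +{consumer_start(d)}")
print(f"Itanium static schedule: earliest use at +{ITANIUM_MIN_DISTANCE}")
```

In this model the compiler no longer needs nop padding for correctness: a consumer placed right after the shift just waits, and it proceeds one cycle earlier than the earliest legal placement on the Itanium processor.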
The example on the next page compares the assembly code generated with
and without the -G2 option.