| Before HLO | After HLO (loop unrolling) |
| for (j=1; j<1000; j++) { y(j) = y(j) + a*x(j) } | for (j=1; j<1000; j+=2) { |
The code on the left uses ldf (load floating-point), an Itanium® architecture assembly instruction that loads a single floating-point value into a register.
The code on the right uses ldfp (load floating-point pair), an Itanium® architecture assembly instruction that loads two floating-point values into two registers simultaneously. Because the unrolled loop advances two elements per iteration, the two adjacent values it needs can be fetched with one ldfp instead of two separate ldf instructions.
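The transformation can be sketched by hand in C. This is only an illustration of unrolling by two, not the compiler's actual output: the function names `axpy_rolled` and `axpy_unrolled2` are hypothetical, the arrays are 0-based C arrays rather than the `y(j)` notation used above, and the cleanup step for an odd element count is an assumption added so the sketch is correct for any `n`. Whether a paired load is actually emitted for the unrolled body is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: one element of x and y is processed per iteration,
 * so each iteration needs its own single-value loads. */
void axpy_rolled(size_t n, double a, const double *x, double *y) {
    assert(x != NULL && y != NULL);
    for (size_t j = 0; j < n; j++) {
        y[j] = y[j] + a * x[j];
    }
}

/* Unrolled by two: each iteration touches elements j and j+1, which are
 * adjacent in memory. This adjacency is what gives a compiler the chance
 * to fetch both values with a single paired load. */
void axpy_unrolled2(size_t n, double a, const double *x, double *y) {
    assert(x != NULL && y != NULL);
    size_t j = 0;
    for (; j + 1 < n; j += 2) {
        y[j]     = y[j]     + a * x[j];
        y[j + 1] = y[j + 1] + a * x[j + 1];
    }
    /* Cleanup (an assumption of this sketch): handle the last element
     * when n is odd, so no element is skipped or read out of bounds. */
    if (j < n) {
        y[j] = y[j] + a * x[j];
    }
}
```

Both functions produce the same result for any `n`; the unrolled version simply runs about half as many loop iterations, each doing twice the work.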