Vectorization Examples

This section presents simple examples of common issues in vector programming.

Argument Aliasing: A Vector Copy

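The loop in the "Vectorizable Copy Due to Unproven Distinction" example, a vector copy operation, vectorizes even though the compiler cannot prove at compile time that dest[i] and src[i] are distinct. Because the two pointers might alias, the compiler generates multi-version code that checks for overlap at run time.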

Vectorizable Copy Due to Unproven Distinction
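A minimal sketch of such a copy loop follows. The function name vec_copy, the float element type, and the size_t length are assumptions made for illustration. The pointers carry no aliasing information, so a caller could legally pass overlapping buffers.

```c
#include <stddef.h>

/* Hypothetical copy routine: dest and src are plain pointers, so the
   compiler must allow for the possibility that they overlap. */
void vec_copy(float *dest, const float *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dest[i] = src[i];
    }
}
```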

The restrict keyword in the "Using restrict to Prove Vectorizable Distinction" example asserts that the pointers refer to distinct objects. The compiler can therefore vectorize the loop without generating multi-version code.

Using restrict to Prove Vectorizable Distinction
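A sketch of the same copy with restrict-qualified parameters might look like this. The name vec_copy_restrict is hypothetical, and the code assumes a C99 or later compiler. Note that restrict is a promise made by the programmer: passing overlapping buffers to this function is undefined behavior.

```c
#include <stddef.h>

/* Hypothetical variant: restrict promises that dest and src never
   overlap, so no run-time overlap check is needed. */
void vec_copy_restrict(float *restrict dest, const float *restrict src,
                       size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dest[i] = src[i];
    }
}
```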

Data Alignment

A data structure or array that is 16 bytes or larger should be aligned so that each structure or array element begins at an address that is a multiple of 16.
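One way to request this alignment in standard C is the C11 _Alignas specifier, sketched below. The names vec4, samples, points, and is_aligned16 are assumptions made for illustration; compilers also offer their own alignment attributes, which are not shown here.

```c
#include <stdint.h>

/* A 16-byte structure whose every instance starts on a 16-byte boundary. */
struct vec4 {
    _Alignas(16) float v[4];
};

/* An array whose base address is a multiple of 16. */
_Alignas(16) float samples[8];

/* Each element of this array is 16-byte aligned because of the struct. */
struct vec4 points[2];

/* Returns nonzero when p is a multiple of 16. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```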

The "Misaligned Data Crossing 16-Byte Boundary" figure shows the effect of a data cache unit (DCU) split due to misaligned data. The code loads the misaligned data across a 16-byte boundary, which results in an additional memory access and a six- to twelve-cycle stall. You can avoid these stalls if you know that the data is aligned and you tell the compiler to assume alignment.

Misaligned Data Crossing 16-Byte Boundary

For example, if you know that elements a[0] and b[0] are aligned on a 16-byte boundary, then the following loop can be vectorized with the alignment option on (#pragma vector aligned):

Alignment of Pointers is Known
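A sketch of such a loop is shown below. The function name copy_aligned and its parameters are hypothetical. The pragma is the one named in the text; compilers that do not recognize it typically ignore it, in which case the loop is still correct but the alignment hint has no effect.

```c
/* Hypothetical routine. The caller must guarantee that a[0] and b[0]
   are 16-byte aligned; the pragma tells the vectorizer to rely on that. */
void copy_aligned(float *a, const float *b, int n)
{
#pragma vector aligned
    for (int i = 0; i < n; i++) {
        a[i] = b[i];
    }
}
```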

After vectorization, the loop is executed as shown in the "Vector and Scalar Clean-up Iterations" figure.

Vector and Scalar Clean-up Iterations

Both vector iterations, a[0:3] = b[0:3]; and a[4:7] = b[4:7];, can be implemented with aligned moves if the elements a[0] and b[0] (or, likewise, a[4] and b[4]) are 16-byte aligned.
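The split into vector iterations and scalar clean-up can be written out by hand as a rough model of what the vectorizer produces. In the sketch below, the name copy_strip_mined and the constant VLEN are assumptions, and each 4-element memcpy stands in for one 16-byte vector move; it is not the code the compiler actually emits.

```c
#include <string.h>

enum { VLEN = 4 };  /* four floats fill one 16-byte vector */

/* Hand-written model of a vectorized copy of n floats. */
void copy_strip_mined(float *a, const float *b, int n)
{
    int i = 0;

    /* Vector iterations: each pass copies one full group of 4 elements. */
    for (; i + VLEN <= n; i += VLEN) {
        memcpy(&a[i], &b[i], VLEN * sizeof(float));
    }

    /* Scalar clean-up iterations: the leftover n % 4 elements. */
    for (; i < n; i++) {
        a[i] = b[i];
    }
}
```

With n equal to 10, the first loop handles elements 0 through 7 in two passes and the second loop handles elements 8 and 9.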

Caution

If you give the vectorizer incorrect alignment information, the generated code can behave unexpectedly. Specifically, using aligned moves on unaligned data results in a run-time exception (on x86 processors, a general-protection fault).

Data Alignment Examples

The "Loop Unaligned Due to Unknown Variable Value at Compile Time" example contains a loop that vectorizes, but only with unaligned memory instructions. The compiler can align the local arrays, but because lb is not known at compile time, the correct alignment of the first access cannot be determined.

Loop Unaligned Due to Unknown Variable Value at Compile Time
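A sketch of such a loop follows. The function name scale_add_from, the array size N, the initial values, and the returned sum are assumptions added so that the example is complete and checkable; only the loop that starts at lb corresponds to the situation described in the text.

```c
enum { N = 64 };  /* assumed array length */

/* Hypothetical routine: the update loop starts at lb, whose value is
   only known at run time, so a2[lb] may not be 16-byte aligned. */
float scale_add_from(int lb)
{
    float a2[N], x2[N], y2[N];  /* local arrays the compiler can align */

    for (int i = 0; i < N; i++) {
        a2[i] = 1.0f;
        x2[i] = 2.0f;
        y2[i] = 3.0f;
    }

    /* Vectorizable, but the starting alignment is unknown. */
    for (int i = lb; i < N; i++) {
        a2[i] = a2[i] * x2[i] + y2[i];
    }

    float sum = 0.0f;
    for (int i = 0; i < N; i++) {
        sum += a2[i];
    }
    return sum;
}
```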

If you know that lb is a multiple of 4, you can align the loop with #pragma vector aligned as shown in the "Alignment Due to Assertion of Variable as Multiple of 4" example.

Alignment Due to Assertion of Variable as Multiple of 4
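A sketch of the asserted version is shown below. As before, the function name scale_add_aligned, the size N, the initial values, and the returned sum are assumptions for illustration. The arrays are explicitly aligned with C11 _Alignas so that the promise made by the pragma actually holds, and the assertion also rejects negative values of lb.

```c
#include <assert.h>

enum { N = 64 };  /* assumed array length */

/* Hypothetical routine: lb must be a non-negative multiple of 4, so
   a2[lb], x2[lb], and y2[lb] all sit on 16-byte boundaries. */
float scale_add_aligned(int lb)
{
    _Alignas(16) float a2[N], x2[N], y2[N];

    for (int i = 0; i < N; i++) {
        a2[i] = 1.0f;
        x2[i] = 2.0f;
        y2[i] = 3.0f;
    }

    /* Check the constraint before promising alignment to the compiler. */
    assert(lb >= 0 && lb % 4 == 0);

#pragma vector aligned
    for (int i = lb; i < N; i++) {
        a2[i] = a2[i] * x2[i] + y2[i];
    }

    float sum = 0.0f;
    for (int i = 0; i < N; i++) {
        sum += a2[i];
    }
    return sum;
}
```

Keep in mind that assert is compiled out when NDEBUG is defined, so a release build would no longer catch a bad value of lb.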

The assertion checks at run time that the constraint is satisfied, that is, that lb is a multiple of 4.