C++ Classes and SIMD Operations

The C++ SIMD classes are based on the concept of operating on arrays, or vectors, of data in parallel. Consider the addition of two vectors, A and B, where each vector contains four elements. With a conventional loop, the elements A[i] and B[i] from each array are summed one pair at a time, as shown in the following example.

Typical Method of Adding Elements Using a Loop

short a[4], b[4], c[4];
for (int i = 0; i < 4; i++) /* needs four iterations */
    c[i] = a[i] + b[i]; /* returns c[0], c[1], c[2], c[3] */

The following example produces the same results with a single operation using the Ivec classes.

SIMD Method of Adding Elements Using Ivec Classes

Is16vec4 ivecA, ivecB, ivecC; /* needs one iteration */
ivecC = ivecA + ivecB; /* returns ivecC0, ivecC1, ivecC2, ivecC3 */
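
The idea behind the one-line addition can be sketched with a small wrapper class that overloads operator+. This is a minimal illustration only, not Intel's implementation: the class name Short4Sketch, its constructor argument order, and the macro SKETCH_HAS_SSE2 are hypothetical. It also substitutes SSE2 intrinsics for the MMX instructions used by the Ivec classes and falls back to a scalar loop where SSE2 is not available.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical stand-in for a class such as Is16vec4; not Intel's implementation.
class Short4Sketch {
public:
    // Elements are given lowest lane first (an assumption of this sketch).
    Short4Sketch(std::int16_t e0, std::int16_t e1, std::int16_t e2, std::int16_t e3)
        : lanes_{e0, e1, e2, e3} {}

    // Element access, intended for inspection only.
    std::int16_t operator[](int i) const {
        assert(i >= 0 && i < 4);
        return lanes_[i];
    }

    // One overloaded operator replaces the four-iteration loop.
    friend Short4Sketch operator+(const Short4Sketch& a, const Short4Sketch& b) {
        Short4Sketch r(0, 0, 0, 0);
#ifdef SKETCH_HAS_SSE2
        // Load four 16-bit lanes (64 bits) from each operand and add them at once.
        __m128i va = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(a.lanes_));
        __m128i vb = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(b.lanes_));
        _mm_storel_epi64(reinterpret_cast<__m128i*>(r.lanes_), _mm_add_epi16(va, vb));
#else
        // Scalar fallback with the same wraparound behavior.
        for (int i = 0; i < 4; ++i)
            r.lanes_[i] = static_cast<std::int16_t>(a.lanes_[i] + b.lanes_[i]);
#endif
        return r;
    }

private:
    std::int16_t lanes_[4];
};
```

With this sketch, `Short4Sketch c = a + b;` reads the same as the Ivec example, and the caller never sees the intrinsic calls.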

Available Classes

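The Intel C++ SIMD classes provide data parallelism that is not easily implemented using the typical mechanisms of C++. The following table lists the available classes by instruction set, together with the signedness, data type, element size in bits, number of elements, and header file of each class.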

SIMD Vector Classes

Instruction Set / Class   Signedness    Data Type   Size (bits)   Elements   Header File

MMX(TM) technology (available for IA-32- and Itanium®-based systems)
  I64vec1                 unspecified   __m64       64            1          ivec.h
  I32vec2                 unspecified   int         32            2          ivec.h
  Is32vec2                signed        int         32            2          ivec.h
  Iu32vec2                unsigned      int         32            2          ivec.h
  I16vec4                 unspecified   short       16            4          ivec.h
  Is16vec4                signed        short       16            4          ivec.h
  Iu16vec4                unsigned      short       16            4          ivec.h
  I8vec8                  unspecified   char        8             8          ivec.h
  Is8vec8                 signed        char        8             8          ivec.h
  Iu8vec8                 unsigned      char        8             8          ivec.h

Streaming SIMD Extensions (available for IA-32- and Itanium-based systems)
  F32vec4                 signed        float       32            4          fvec.h
  F32vec1                 signed        float       32            1          fvec.h

Streaming SIMD Extensions 2 (available for IA-32-based systems only)
  F64vec2                 signed        double      64            2          dvec.h
  I128vec1                unspecified   __m128i     128           1          dvec.h
  I64vec2                 unspecified   long int    64            2          dvec.h
  Is64vec2                signed        long int    64            2          dvec.h
  Iu64vec2                unsigned      long int    64            2          dvec.h
  I32vec4                 unspecified   int         32            4          dvec.h
  Is32vec4                signed        int         32            4          dvec.h
  Iu32vec4                unsigned      int         32            4          dvec.h
  I16vec8                 unspecified   short       16            8          dvec.h
  Is16vec8                signed        short       16            8          dvec.h
  Iu16vec8                unsigned      short       16            8          dvec.h
  I8vec16                 unspecified   char        8             16         dvec.h
  Is8vec16                signed        char        8             16         dvec.h
  Iu8vec16                unsigned      char        8             16         dvec.h
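
Each class name encodes the element size and the element count, and for the packed classes the two multiply to the register width: 64 bits for the classes in ivec.h and 128 bits for the classes in fvec.h and dvec.h. F32vec1 is the exception, because it uses a single element of a wider register. The arithmetic can be sketched as follows; the helper lane_count is hypothetical and is not part of any of the class headers.

```cpp
// Hypothetical helper, not part of ivec.h, fvec.h, or dvec.h.
// Returns how many elements of element_bits fit in a register of register_bits.
constexpr int lane_count(int register_bits, int element_bits) {
    return register_bits / element_bits;
}

// Compile-time checks against three class names from the table.
static_assert(lane_count(64, 16) == 4, "Is16vec4: four 16-bit elements in 64 bits");
static_assert(lane_count(128, 8) == 16, "Is8vec16: sixteen 8-bit elements in 128 bits");
static_assert(lane_count(128, 64) == 2, "F64vec2: two 64-bit elements in 128 bits");
```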

Most classes offer similar functionality for all data types and are represented by all available intrinsics. However, some capabilities do not translate from one data type to another without a performance penalty, and are therefore excluded from individual classes.

Note

Intrinsics that take immediate values and cannot be expressed easily in classes are not implemented (for example, _mm_shuffle_ps, _mm_shuffle_pi16, _mm_extract_pi16, and _mm_insert_pi16).
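
One likely reason these intrinsics do not map onto operator syntax is that the immediate must be a compile-time constant. The following sketch shows one way to call _mm_shuffle_ps directly, passing the immediate as a template argument. The function shuffle_sketch and the macro SKETCH_HAS_SSE are hypothetical and are not part of the class library. The scalar branch spells out the element selection for builds without Streaming SIMD Extensions.

```cpp
#include <array>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical helper: Imm is a template argument because the intrinsic
// requires its selector to be known at compile time.
template <int Imm>
std::array<float, 4> shuffle_sketch(const std::array<float, 4>& a,
                                    const std::array<float, 4>& b) {
    static_assert(Imm >= 0 && Imm <= 255, "selector must fit in 8 bits");
    std::array<float, 4> out{};
#ifdef SKETCH_HAS_SSE
    __m128 va = _mm_loadu_ps(a.data());
    __m128 vb = _mm_loadu_ps(b.data());
    _mm_storeu_ps(out.data(), _mm_shuffle_ps(va, vb, Imm));
#else
    // Two low result elements come from a, two high result elements from b;
    // each 2-bit field of Imm picks the source element.
    out[0] = a[Imm & 3];
    out[1] = a[(Imm >> 2) & 3];
    out[2] = b[(Imm >> 4) & 3];
    out[3] = b[(Imm >> 6) & 3];
#endif
    return out;
}
```

For example, `shuffle_sketch<0x1B>(a, b)` selects elements 3 and 2 of a, followed by elements 1 and 0 of b.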

Access to Classes Using Header Files

The required class header files are installed in the include directory with the Intel® C++ Compiler. To enable the classes, use the #include directive in your program file as shown in the table that follows.

Include Directives for Enabling Classes

Instruction Set Extension       Include Directive
MMX Technology                  #include <ivec.h>
Streaming SIMD Extensions       #include <fvec.h>
Streaming SIMD Extensions 2     #include <dvec.h>

Each header file in the table includes the file listed above it. To use both the Ivec and Fvec classes, you need to include only fvec.h. Similarly, to use all the classes, including those for the Streaming SIMD Extensions 2, you need to include only dvec.h.

Usage Precautions

When using the C++ classes, follow the general guidelines below. More detailed usage rules for each class are listed in Integer Vector Classes and Floating-point Vector Classes.

Clear MMX Registers

If you use both the Ivec and Fvec classes at the same time, your program could mix MMX instructions, called by Ivec classes, with Intel x87 architecture floating-point instructions, called by Fvec classes. Floating-point instructions exist in the following Fvec functions:

Note

MMX registers are aliased on the floating-point registers, so you should clear the MMX state with the EMMS instruction intrinsic before issuing an x87 floating-point instruction, as in the following example.

ivecA = ivecA & ivecB; /* Ivec logical operation that uses MMX instructions */
empty (); /* clear state */
cout << f32vec4a; /* F32vec4 operation that uses x87 floating-point instructions */

Caution

Failure to clear the MMX registers can result in incorrect execution or poor performance due to an incorrect register state.
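
The same ordering can be sketched with raw intrinsics, assuming that the empty() call in the example above corresponds to the _mm_empty intrinsic, which issues EMMS. The function half_of_masked and the macro SKETCH_HAS_MMX are hypothetical. The sketch uses long double for the floating-point step because that type is commonly handled by the x87 unit on IA-32 systems, and it falls back to plain integer code where MMX intrinsics are unavailable.

```cpp
#include <cstdint>
#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
#define SKETCH_HAS_MMX 1
#endif

// Hypothetical helper: ANDs two 16-bit values using MMX, clears the MMX
// state, and only then performs floating-point arithmetic.
long double half_of_masked(std::int16_t x, std::int16_t y) {
#ifdef SKETCH_HAS_MMX
    __m64 a = _mm_set_pi16(0, 0, 0, x);      // x goes in the lowest element
    __m64 b = _mm_set_pi16(0, 0, 0, y);      // y goes in the lowest element
    __m64 r = _mm_and_si64(a, b);            // logical operation in MMX registers
    int low = _mm_cvtsi64_si32(r) & 0xFFFF;  // extract the lowest 16 bits
    _mm_empty();                             // clear MMX state before floating-point code
#else
    int low = (x & y) & 0xFFFF;              // scalar fallback
#endif
    return static_cast<long double>(low) * 0.5L;
}
```

The important detail is the position of the _mm_empty() call: after the last MMX operation and before the first floating-point operation.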

Follow EMMS Instruction Guidelines

Intel strongly recommends that you follow the guidelines for using the EMMS instruction. Refer to this topic before coding with the Ivec classes.