Blue Gene/Q vector data type for C/C++
IBM XL C/C++ for Blue Gene/Q, V12.0 (technology preview)

Contents

Chapter 1. Blue Gene/Q vector data type
Chapter 2. Vector type declaration
    Vector types (IBM extension)
    typedef definitions
    Vector literals (IBM extension)
Chapter 3. Initialization of vectors (IBM extension)
Chapter 4. Macros related to the platform
Chapter 5. Compiler option reference
    -qflttrap
Chapter 6. Aligning data
    The __align type qualifier (IBM extension)
    The aligned variable attribute
    The aligned type attribute
    The packed variable attribute
    The packed type attribute
Chapter 7. Quad vector usage
    Pointer arithmetic
    Type conversions
    Overload resolution
    Parameter declarations
Chapter 8. Quad vector operators
    Address operator &
    Indirection operator *
    The __alignof__ operator (IBM extension)
    The sizeof operator
    The typeof operator (IBM extension)
    Assignment operators
    Vector subscripting operator [ ] (IBM extension)
Chapter 9. Inline assembly statements (IBM extension)
    Supported and unsupported constructs
    Restrictions on inline assembly statements
    Examples of inline assembly statements
Chapter 10. Vector built-in functions
    Load and store functions
        vec_ld, vec_lda
        vec_ldia, vec_ldiaa
        vec_ldiz, vec_ldiza
        vec_lds, vec_ldsa
        vec_ld2, vec_ld2a
        vec_st, vec_sta
        vec_sts, vec_stsa
        vec_st2, vec_st2a
    Unary arithmetic functions
        vec_abs
        vec_neg
        vec_nabs
        vec_re
        vec_res
        vec_rsqrte
        vec_rsqrtes
        vec_swsqrt, vec_swsqrt_nochk
        vec_swsqrts, vec_swsqrts_nochk
    Binary arithmetic functions
        vec_add
        vec_cpsgn
        vec_mul
        vec_sub
        vec_swdiv, vec_swdiv_nochk
        vec_swdivs, vec_swdivs_nochk
        vec_xmul
    Multiply-add functions
        vec_madd
        vec_msub
        vec_nmadd
        vec_nmsub
        vec_xmadd
        vec_xxmadd
        vec_xxcpnmadd
        vec_xxnpmadd
    Round functions
        vec_ceil
        vec_floor
        vec_round
        vec_rsp
        vec_trunc
    Conversion functions
        vec_cfid
        vec_cfidu
        vec_ctid
        vec_ctidu
        vec_ctidz
        vec_ctiduz
        vec_ctiw
        vec_ctiwu
        vec_ctiwz
        vec_ctiwuz
    Comparison functions
        vec_cmpgt
        vec_cmplt
        vec_cmpeq
        vec_sel
        vec_tstnan
    Element manipulation functions
        vec_extract
        vec_insert
        vec_gpci
        vec_lvsl
        vec_lvsr
        vec_perm
        vec_promote
        vec_sldw
        vec_splat
        vec_splats
    Logical functions
        vec_and
        vec_andc
        vec_logical
        vec_nand
        vec_nor
        vec_not
        vec_or
        vec_orc
        vec_xor
Chapter 11. Using the Mathematical Acceleration Subsystem libraries (MASS)
    Using the scalar library
    Using the SIMD library
    Compiling and linking a program with MASS
    Using libmass.a with the math system library
Index

Chapter 1. Blue Gene/Q vector data type

The quad processing extension (QPX) floating-point unit of Blue Gene/Q supports operations on vectors of four IEEE 754 double-precision floating-point elements. In this document, the Blue Gene/Q vector data type is referred to as the quad vector type. The Blue Gene/Q vector data type has an underlying representation of a four-element double-precision floating-point array.

Note: QPX is similar to the vector multimedia extension (VMX) and the vector scalar extension (VSX), but the instruction sets and data types are different.

On Blue Gene/Q, the XL C/C++ compiler provides a set of built-in functions that are optimized for the QPX floating-point unit. These built-in functions provide an almost one-to-one correspondence with the QPX instruction set. In addition, the XL C/C++ compiler includes a set of built-in functions that are optimized for the PowerPC® architecture. For a full description of these functions, see the following document:
- Built-in functions for POWER® and PowerPC architectures in XL C/C++ for Linux, V12.1 Compiler Reference

Chapter 2. Vector type declaration

Vector types (IBM extension)

XL C/C++ supports vector processing technologies through language extensions. In the extended syntax, type qualifiers and storage class specifiers can precede the keyword vector4double (or its alternative spelling, __vector4double) in a declaration.

Most of the legal forms of the syntax are captured in the following diagram. Some variations have been omitted from the diagram for the sake of clarity: type qualifiers such as const and storage class specifiers such as static can appear in any order within the declaration, as long as neither immediately follows the keyword vector (or __vector).

Vector declaration syntax

    [type_qualifier] [storage_class_specifier] vector4double
    [type_qualifier] [storage_class_specifier] __vector4double

The following table lists the supported vector data types and the size and possible values for each type.

Table 1. Vector data types

    Type            Interpretation of content   Range of values
    --------------  --------------------------  --------------------------------------------------------
    vector4double   4 double                    IEEE 754 double (64 bit) precision floating-point values

The vector4double type must be aligned on 32-byte boundaries. The Blue Gene/Q technology does not generate exceptions for unaligned vector types. Instead, the load and store operations truncate addresses to a 16 or 32-byte boundary.
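
To make the effect of address truncation concrete, the following is a minimal sketch in portable C of what rounding an address down to a boundary means arithmetically. The helper truncate_address is hypothetical and is not part of XL C/C++ or the QPX built-in functions; it does not model which operations use the 16-byte boundary and which use the 32-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: round an address down to the
   nearest multiple of `boundary`. The boundary must be a power of 2. */
static uintptr_t truncate_address(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    /* Clearing the low-order bits discards the misaligned part. */
    return addr & ~(boundary - 1);
}

/* An access aimed at 0x1008 would effectively use 0x1000 when the
   boundary is 32 bytes, so the data read or written is not the data
   that the misaligned pointer appears to designate. */
static void demo_truncation(void)
{
    assert(truncate_address(0x1008, 32) == 0x1000);
    assert(truncate_address(0x1018, 16) == 0x1010);
}
```

Because no exception is raised, a misaligned vector pointer silently addresses different bytes, which is why keeping the default 32-byte alignment matters.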
You can alter the alignment of a vector type with alignment modifiers, but it is highly recommended that you do not alter that alignment.

Aggregates containing vector types must be aligned on 32-byte boundaries and padded, if necessary, so that each member that has a vector type is aligned on a 32-byte boundary.

Variable length arrays can contain vector data types as data members.

typedef definitions

A typedef declaration lets you define your own identifiers that can be used in place of type specifiers such as int, float, and double. A typedef declaration does not reserve storage. The names you define using typedef are not new data types, but synonyms for the data types or combinations of data types they represent.

The name space for a typedef name is the same as other identifiers. When an object is defined using a typedef identifier, the properties of the defined object are exactly the same as if the object were defined by explicitly listing the data type associated with the identifier.

IBM extension: typedef definitions are extended to handle vector types. A vector type can be used in a typedef definition, and the new type name can be used in the usual ways, except for declaring other vectors. In a vector declaration context, a typedef name is disallowed as a type specifier. The following example illustrates a typical usage of typedef with vector types:

    typedef vector4double vdt;
    vdt v1;

Examples of typedef definitions

The following statements define LENGTH as a synonym for int and then use this typedef to declare length, width, and height as integer variables:

    typedef int LENGTH;
    LENGTH length, width, height;

The preceding declarations are equivalent to the following declaration:

    int length, width, height;

Similarly, typedef can be used to define a structure, union, or C++ class.
For example:

    typedef struct {
        int scruples;
        int drams;
        int grains;
    } WEIGHT;

The structure WEIGHT can then be used in the following declarations:

    WEIGHT chicken, cow, horse, whale;

In the following example, the type of yds is "pointer to function with no parameters, returning int".

    typedef int SCROLL(void);
    extern SCROLL *yds;

In the following typedef definitions, the token struct is part of the type name: the type of ex1 is struct a; the type of ex2 is struct b.

    typedef struct a { char x; } ex1, *ptr1;
    typedef struct b { char x; } ex2, *ptr2;

Type ex1 is compatible with the type struct a and the type of the object pointed to by ptr1. Type ex1 is not compatible with char, ex2, or struct b.

C++ only: In C++, a typedef name must be different from any class type name declared within the same scope. A typedef name can be the same as a class type name only if that typedef is a synonym of the class name.

A C++ class defined in a typedef definition without being named is given a dummy name. Such a class cannot have constructors or destructors. Consider the following example:

    typedef class {
        ~Trees();
    } Trees;

In this example, an unnamed class is defined in a typedef definition. Trees is an alias for the unnamed class, but not the class type name. So you cannot define a destructor ~Trees() for this unnamed class; if you do, the compiler issues an error.

C++0x only: Declaring typedef names as friends

The C++0x standard introduces the extended friend declarations feature, with which you can declare typedef names as friends. For more information, see Extended friend declarations.

Vector literals (IBM extension)

A vector literal is a constant expression for which the value is interpreted as a vector type. The data type of a vector literal is represented by a parenthesized vector type, and its value is a set of constant expressions that represent the vector elements and are enclosed in parentheses or braces.
When all vector elements have the same value, the value of the literal can be represented by a single constant expression. You can initialize vector types with vector literals.

Vector literal syntax

    ( vector_type ) ( literal_list )
    ( vector_type ) { literal_list }

    literal_list:
        constant_expression [, constant_expression]...

The vector_type is vector4double. The literal_list can be either of the following expressions:
- A single constant expression
- A comma-separated list of constant expressions

The delimiters around literal_list determine how the constant expressions are interpreted and the expected number of constant expressions. Use one of the following types of characters for the delimiters:
- Parentheses
  The list must include exactly one or four constant expressions.
  - With one constant expression, all elements of the vector are initialized to the specified value.
  - With four constant expressions, each element of the vector is initialized to the corresponding specified value.
- Braces
  The number of constant expressions can be less than the number of elements in the vector. Each unspecified element is set to 0.0.

The following table shows the possible combinations where c1, c2, c3, and c4 are constant expressions.

Table 2. Quad vector literals

    Literal                             Result
    ----------------------------------  --------------------
    (vector4double) (c1)                (c1, c1, c1, c1)
    (vector4double) (c1, c2)            Compile-time error
    (vector4double) (c1, c2, c3)        Compile-time error
    (vector4double) (c1, c2, c3, c4)    (c1, c2, c3, c4)
    (vector4double) {c1}                (c1, 0.0, 0.0, 0.0)
    (vector4double) {c1, c2}            (c1, c2, 0.0, 0.0)
    (vector4double) {c1, c2, c3}        (c1, c2, c3, 0.0)
    (vector4double) {c1, c2, c3, c4}    (c1, c2, c3, c4)

Note: All the constant expressions in the initializer list must have a type that is appropriate for the vector literal. If that is not the case, the compiler converts the expression to a type that is compatible with the vector literal type.
If the constant expression used to initialize the vector element has a value that cannot be represented in the destination format (the vector type), the compiler truncates that value. If you specify the -qinfo=trd option, the compiler generates a message stating that the value is not preserved.

Example

    (vector4double) (3.0);
    // Assign the double-precision floating-point value 3.0 to all four
    // elements of the vector.

    (vector4double) (10.0,20.0,30.0,40.0);
    // Assign the double-precision floating-point values 10.0, 20.0, 30.0, and 40.0
    // to the four elements of the vector.

    (vector4double) {10.0};
    // Assign the double-precision floating-point value 10.0 to the first element
    // of the vector. The other elements are set to 0.0.

    (vector4double) {10.0,20.0};
    // Assign the double-precision floating-point values 10.0 and 20.0
    // to the first and second elements of the vector. The other
    // elements are set to 0.0.

    (vector4double) {10.0,20.0,30.0,40.0};
    // Assign the double-precision floating-point values 10.0, 20.0, 30.0, and 40.0
    // to the four elements of the vector.

Chapter 3. Initialization of vectors (IBM extension)

A vector type is initialized by a vector literal or any expression having the same vector type. For example:

    vector4double v1;
    vector4double v2 = (vector4double) (10.);
    vector4double v3 = (vector4double) (1.0, 2.0, 3.0, 4.0);
    v1 = v2;

With XL C/C++, you can initialize a vector type with an initializer list. This feature is an extension for compatibility with GNU C.

Vector initializer list syntax

    vector_type identifier = { initializer [, initializer]... } ;

The number of values in a braced initializer list must be less than or equal to the number of elements of the vector type. Any uninitialized element is initialized to zero.
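
The zero-fill rule for short initializer lists mirrors the rule that standard C applies to aggregates. The following sketch uses a hypothetical stand-in type, quad_stand_in, because vector4double is available only with XL C/C++ on Blue Gene/Q; it shows the analogous behavior for a four-element array of doubles and is not a model of the vector type itself.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles in a struct. */
typedef struct { double e[4]; } quad_stand_in;

/* Supply two initializers for four elements. Standard C sets the
   elements without an initializer to 0.0. The initializers are not
   constant expressions, which is allowed for automatic objects. */
static quad_stand_in make_partial(double a, double b)
{
    quad_stand_in v = { { a, b } };
    return v;
}

static void demo_partial_init(void)
{
    quad_stand_in v = make_partial(1.0, 2.0);
    assert(v.e[0] == 1.0 && v.e[1] == 2.0);
    assert(v.e[2] == 0.0 && v.e[3] == 0.0);
}
```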
The following are examples of vector initialization using initializer lists:

    vector4double v1 = {1.0};             // initialize the first element of v1 with 1.0
                                          // and the remaining three elements with 0.0
    vector4double v2 = {1.0,2.0};         // initialize the first two elements of v2 with 1.0
                                          // and 2.0, and the remaining two elements with 0.0
    vector4double v3 = {1.0,2.0,3.0,4.0}; // equivalent to the vector literal
                                          // (vector4double) (1.0,2.0,3.0,4.0)

Unlike vector literals, the values in the initializer list do not have to be constant expressions unless the initialized vector variable has static duration. Thus, the following is legal:

    double i=1.0;
    double foo() { return 2.0; }

    int main()
    {
        vector4double v1 = {i, foo()};
        return 0;
    }

Chapter 4. Macros related to the platform

The following predefined macros are provided to facilitate porting applications between platforms. All platform-related predefined macros are unprotected and can be undefined or redefined without warning unless otherwise specified.

Table 3. Platform-related predefined macros

Each entry lists the predefined macro name, followed by its description, its predefined value, and the conditions under which it is predefined.

__bg__
    Description: Indicates that this is a Blue Gene® platform.
    Predefined value: 1
    Predefined under the following conditions: Always predefined for all Blue Gene platforms.

__bgq__
    Description: Indicates that the architecture is the processor of Blue Gene/Q.
    Predefined value: 1
    Predefined under the following conditions: Predefined when the architecture is the processor of Blue Gene/Q.

_BIG_ENDIAN, __BIG_ENDIAN__
    Description: Indicates that the platform is big-endian (that is, the most significant byte is stored at the memory location with the lowest address).
    Predefined value: 1
    Predefined under the following conditions: Always predefined.

__ELF__
    Description: Indicates that the ELF object model is in effect.
    Predefined value: 1
    Predefined under the following conditions: Always predefined for the Linux platform.

__GXX_WEAK__ (C++)
    Description: Indicates that weak symbols are supported (used for template instantiation by the linker).
    Predefined value: 1
    Predefined under the following conditions: Always predefined.

__powerpc, __powerpc__
    Description: Indicates that the target is a Power architecture.
    Predefined value: 1
    Predefined under the following conditions: Predefined when the target is a Power architecture.

__PPC, __PPC__
    Description: Indicates that the target is a Power architecture.
    Predefined value: 1
    Predefined under the following conditions: Predefined when the target is a Power architecture.

__THW_BLUEGENE, __THW_BLUEGENE__
    Description: Indicates that the target architecture is Blue Gene.
    Predefined value: 1
    Predefined under the following conditions: Predefined when the target is Blue Gene.

__TOS_BGQ__
    Description: Indicates that the target architecture is the processor of Blue Gene/Q.
    Predefined value: 1
    Predefined under the following conditions: Predefined when the target is the processor of Blue Gene/Q.

__unix, __unix__
    Description: Indicates that the operating system is a variety of UNIX.
    Predefined value: 1
    Predefined under the following conditions: Always predefined.

__VECTOR4DOUBLE__
    Description: Indicates the support of vector data types on Blue Gene/Q.
    Predefined value: 1
    Predefined under the following conditions: Predefined on Blue Gene/Q.

Chapter 5. Compiler option reference

-qflttrap

Category

Error checking and debugging

Pragma equivalent

#pragma options [no]flttrap

Purpose

Determines what types of floating-point exceptions to detect at run time. The program receives a SIGFPE signal when the corresponding exception occurs.

Syntax

    -qnoflttrap
    -qflttrap[=suboption[:suboption]...]

    where suboption is one of:
        zero | zerodivide
        und | underflow
        ov | overflow
        inv | invalid
        inex | inexact
        enable | en
        nanq
        qpxstore | qpxs

Defaults

-qnoflttrap

Parameters

enable, en
    Inserts a trap when the specified exceptions (overflow, underflow, zerodivide, invalid, or inexact) occur. You must specify this suboption if you want to turn on exception trapping without modifying your source code. If any of the specified exceptions occur, a SIGTRAP or SIGFPE signal is sent to the process with the precise location of the exception.

inexact, inex
    Enables the detection of floating-point inexact operations. If a floating-point inexact operation occurs, an inexact operation exception status flag is set in the Floating-Point Status and Control Register (FPSCR).

invalid, inv
    Enables the detection of floating-point invalid operations. If a floating-point invalid operation occurs, an invalid operation exception status flag is set in the FPSCR.

nanq
    Generates code to detect Not a Number Quiet (NaNQ) and Not a Number Signalling (NaNS) exceptions before and after each floating-point operation, including assignment, and after each call to a function returning a floating-point result, and to trap if the value is a NaN. Trapping code is generated regardless of whether the enable suboption is specified.

overflow, ov
    Enables the detection of floating-point overflow. If a floating-point overflow occurs, an overflow exception status flag is set in the FPSCR.

qpxstore, qpxs
    Enables the detection of Not a Number (NaN) or infinity values in Quad Processing eXtension (QPX) vectors. To detect NaN or infinity values, the compiler generates stores with indicating instructions for QPX vectors in registers. The indicating vector stores are used for stores that result from either QPX store intrinsics or assignment operators.

underflow, und
    Enables the detection of floating-point underflow. If a floating-point underflow occurs, an underflow exception status flag is set in the FPSCR.

zerodivide, zero
    Enables the detection of floating-point division by zero. If a floating-point zero-divide occurs, a zero-divide exception status flag is set in the FPSCR.

Usage

Specifying the -qflttrap option with no suboptions is equivalent to:

    -qflttrap=overflow:underflow:zerodivide:invalid:inexact

Exceptions will be detected by the hardware, but trapping is not enabled.

It is recommended that you use the enable suboption whenever compiling the main program with -qflttrap. This ensures that the compiler generates the code to automatically enable floating-point exception trapping, without requiring that you include calls to the appropriate floating-point exception library functions in your code.

If you specify -qflttrap more than once, both with and without suboptions, the -qflttrap without suboptions is ignored.

The -qflttrap option is recognized during linking with IPA. Specifying the option at the link step overrides the compile-time setting.
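
When exceptions are detected but trapping is not enabled, the program keeps running with the IEEE 754 default results. The following is a minimal sketch of that non-trapping behavior in portable C, assuming an IEEE 754 environment in which traps are disabled (the default on most platforms). The helper divide is hypothetical, and the sketch does not use -qflttrap or read the FPSCR.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: divide at run time. The volatile copies keep the
   compiler from folding the division into a compile-time constant. */
static double divide(double num, double den)
{
    volatile double n = num, d = den;
    return n / d;
}

static void demo_untrapped_division(void)
{
    /* Division by zero: the default result is infinity. */
    assert(isinf(divide(1.0, 0.0)));
    /* 0/0 is an invalid operation: the default result is a NaN. */
    assert(isnan(divide(0.0, 0.0)));
}
```

With trapping enabled, the same operations would deliver a signal instead of returning these values.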

If your program contains signalling NaNs, you should use the -qfloat=nans option along with -qflttrap to trap any exceptions.

The compiler exhibits the behavior illustrated in the following examples when the -qflttrap option is specified together with an optimization option:
- with -O2:
  - 1/0 generates a div0 exception and has a result of infinity
  - 0/0 generates an invalid operation
- with -O3 or greater:
  - 1/0 generates a div0 exception and has a result of infinity
  - 0/0 returns zero multiplied by the result of the previous division.

Note: Due to the transformations performed and the exception handling support of some vector instructions, use of -qsimd=auto may change the location where an exception is caught or even cause the compiler to miss catching an exception.

Predefined macros

None.

Example

    #include <stdio.h>

    int main()
    {
        float x, y, z;
        x = 5.0;
        y = 0.0;
        z = x / y;
        printf("%f", z);
    }

When you compile this program with the following command, the program stops when the division is performed.

    xlc -qflttrap=zerodivide:enable divide_by_zero.c

The zerodivide suboption identifies the type of exception to guard against. The enable suboption causes a SIGFPE signal to be generated when the exception occurs.

Related information
- -qfloat
- -qarch

Chapter 6. Aligning data

XL C/C++ provides many mechanisms for specifying data alignment at the levels of individual variables, members of aggregates, entire aggregates, and entire compilation units. If you are porting applications between different platforms, or between 32-bit and 64-bit modes, you need to take into account the differences between alignment settings available in the different environments, to prevent possible data corruption and deterioration in performance. In particular, vector types have special alignment requirements which, if not followed, can produce incorrect results.
That is, vectors need to be aligned on a 32-byte boundary.

Using alignment modes, you can set alignment defaults for all data types for a compilation unit (or subsection of a compilation unit), by specifying a predefined suboption.

Using alignment modifiers, you can set the alignment for specific variables or data types within a compilation unit, by specifying the exact number of bytes that should be used for the alignment.

"Using alignment modes" discusses the default alignment modes for all data types on the different platforms and addressing models; the suboptions and pragmas you can use to change or override the defaults; and rules for the alignment modes for simple variables, aggregates, and bit fields.

"Using alignment modifiers" discusses the different specifiers, pragmas, and attributes you can use in your source code to override the alignment mode currently in effect, for specific variable declarations. It also provides the rules governing the precedence of alignment modes and modifiers during compilation.

The __align type qualifier (IBM extension)

The __align qualifier is a language extension that allows you to specify an explicit alignment for an aggregate or a static (or global) variable. The specified byte boundary affects the alignment of an aggregate as a whole, not that of its members. The __align qualifier can be applied to an aggregate definition nested within another aggregate definition, but not to individual elements of an aggregate. The alignment specification is ignored for parameters and automatic variables.

The __align type qualifier can also be used with vector types. As with the aligned attribute, the alignment of a vector type cannot be reduced using the __align type qualifier.
A declaration takes one of the following forms:

__align qualifier syntax for simple variables

    type_specifier __align ( int_constant ) declarator

__align qualifier syntax for structures or unions

    __align ( int_constant ) struct [tag_identifier] { member_declaration_list } ;
    __align ( int_constant ) union [tag_identifier] { member_declaration_list } ;

where int_constant is a positive integer value indicating the byte-alignment boundary. Legal values are powers of 2 up to 32768.

The following restrictions and limitations apply:
- The __align qualifier cannot be used where the size of the variable alignment is smaller than the size of the type alignment.
- Not all alignments may be representable in an object file.
- The __align qualifier cannot be applied to the following:
  - Individual elements within an aggregate definition.
  - Individual elements of an array.
  - Variables of incomplete type.
  - Aggregates declared but not defined.
  - Other types of declarations or definitions, such as a typedef, a function, or an enumeration.

Examples using the __align qualifier

Applying __align to static or global variables:

    // varA is aligned on a 1024-byte boundary and padded with 1020 bytes
    int __align(1024) varA;

    int main()
    {...}

    // varB is aligned on a 512-byte boundary and padded with 508 bytes
    static int __align(512) varB;

    // Error
    int __align(128) functionB( );

    // Error
    typedef int __align(128) T;

    // Error
    __align enum C {a, b, c};

Applying __align to align and pad aggregate tags without affecting aggregate members:

    // Struct structA is aligned on a 1024-byte boundary
    // with size including padding of 1024 bytes.
    __align(1024) struct structA
    {
        int i;
        int j;
    };

    // Union unionA is aligned on a 1024-byte boundary
    // with size including padding of 1024 bytes.
    __align(1024) union unionA
    {
        int i;
        int j;
    };

Applying __align to a structure or union, where the size and alignment of the aggregate using the structure or union is affected:

    // sizeof(struct S) == 128
    __align(128) struct S {int i;};

    // sarray is aligned on 128-byte boundary with sizeof(sarray) == 1280
    struct S sarray[10];

    // Error: alignment of variable is smaller than alignment of type
    struct S __align(64) svar;

    // s2 is aligned on 128-byte boundary with sizeof(s2) == 256
    struct S2 {struct S s1; int a;} s2;

Applying __align to an array:

In the following example, only arrayA is aligned on a 64-byte boundary, and elements within that array are aligned according to the alignment of AnyType. Padding is applied before the beginning of the array and does not affect the size of the array member itself.

    AnyType __align(64) arrayA[10];

Applying __align where the size of the variable alignment differs from the size of the type alignment:

    __align(64) struct S {int i;};

    // Error: alignment of variable is smaller than alignment of type.
    struct S __align(32) s1;

    // s2 is aligned on 128-byte boundary
    struct S __align(128) s2;

    // Error
    struct S __align(16) s3[10];

    // Error
    int __align(1) s4;

    // Error
    __align(1) struct S {int i;};

The aligned variable attribute

With the aligned variable attribute, you can override the default memory alignment mode to specify a minimum memory alignment value, expressed as a number of bytes, for any of the following types of variables:
- Non-aggregate variables
- Aggregate variables (such as structures, classes, or unions)
- Selected member variables

The attribute is typically used to increase the alignment of the given variable.

aligned variable attribute syntax

    __attribute__ (( aligned [ ( alignment_factor ) ] ))
    __attribute__ (( __aligned__ [ ( alignment_factor ) ] ))

The alignment_factor is the number of bytes, specified as a constant expression that evaluates to a positive power of 2.
You can specify a value up to a maximum of 1 GB. If you omit the alignment factor, and its enclosing parentheses, the compiler automatically uses 16 bytes. If you specify an alignment factor greater than the maximum, the compiler uses the default alignment in effect and ignores your specification.

When you apply the aligned attribute to a member variable in a bit field structure, the attribute specification is applied to the bit field container. If the default alignment of the container is greater than the alignment factor, the default alignment is used.

The aligned attribute can be applied to the following types of variables:
- static vector variables
- auto vector variables
- Aggregate members that have a vector type

The alignment of auto variables is limited to the maximal stack alignment:
- 32 for functions containing vector data types
- 16 for other functions

The aligned attribute cannot be used to decrease the natural alignment of any type, including vector types. The aligned attribute is ignored with a warning message when the alignment factor is less than 32 for vector types.
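
As an illustration of raising a variable's alignment to the 32-byte boundary that quad vectors require, the following sketch applies the attribute to an ordinary array of doubles. It assumes a GCC-compatible compiler that accepts the same attribute spelling; the names buffer and is_aligned are hypothetical, and the array is only a stand-in for vector4double data.

```c
#include <assert.h>
#include <stdint.h>

/* A static array of four doubles placed on a 32-byte boundary. */
static double buffer[4] __attribute__((__aligned__(32)));

/* Hypothetical helper: report whether p lies on the given power-of-2
   byte boundary. */
static int is_aligned(const void *p, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((uintptr_t)p & (boundary - 1)) == 0;
}

static void demo_aligned_variable(void)
{
    /* The array as a whole starts on a 32-byte boundary... */
    assert(is_aligned(buffer, 32));
    /* ...but its second element, 8 bytes further on, does not. */
    assert(!is_aligned(&buffer[1], 32));
}
```

A check such as is_aligned can be useful in debug builds before passing a pointer to code that assumes 32-byte alignment.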

Example

In the following example, the structures first_address and second_address are set to an alignment of 16 bytes:

    struct address {
        int street_no;
        char *street_name;
        char *city;
        char *prov;
        char *postal_code;
    } first_address __attribute__((__aligned__(16))) ;

    struct address second_address __attribute__((__aligned__(16))) ;

In the following example, only the members first_address.prov and first_address.postal_code are set to an alignment of 16 bytes:

    struct address {
        int street_no;
        char *street_name;
        char *city;
        char *prov __attribute__((__aligned__(16))) ;
        char *postal_code __attribute__((__aligned__(16))) ;
    } first_address ;

The aligned type attribute

With the aligned type attribute, you can override the default alignment mode to specify a minimum alignment value, expressed as a number of bytes, for a structure, class, union, enumeration, or other user-defined type created in a typedef declaration. The aligned attribute is typically used to increase the alignment of any variables declared of the type to which the attribute applies.

aligned type attribute syntax

    __attribute__ (( aligned [ ( alignment_factor ) ] ))
    __attribute__ (( __aligned__ [ ( alignment_factor ) ] ))

The alignment_factor is the number of bytes, specified as a constant expression that evaluates to a positive power of 2. You can specify a value up to a maximum of 1048576 bytes. If you omit the alignment factor (and its enclosing parentheses), the compiler automatically uses 16 bytes. If you specify an alignment factor greater than the maximum, the attribute specification is ignored, and the compiler uses the default alignment in effect.

The alignment value that you specify is applied to all instances of the type. Also, the alignment value applies to the variable as a whole; if the variable is an aggregate, the alignment value applies to the aggregate as a whole, not to the individual members of the aggregate.
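
The distinction between aligning an aggregate as a whole and aligning its members can be checked directly. The following is a minimal sketch assuming a GCC-compatible compiler; the type struct sample and the helper functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The attribute raises the alignment of the type as a whole to 32 bytes. */
struct __attribute__((__aligned__(32))) sample {
    int i;
    int j;
};

static size_t sample_alignment(void) { return __alignof__(struct sample); }
static size_t sample_size(void)      { return sizeof(struct sample); }

static void demo_aligned_type(void)
{
    /* Every instance is aligned on a 32-byte boundary, and the size is
       padded to a multiple of that alignment. */
    assert(sample_alignment() == 32);
    assert(sample_size() % 32 == 0);
    /* The members keep their usual layout: j directly follows i. */
    assert(offsetof(struct sample, j) == sizeof(int));
}
```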

The aligned attribute cannot be used to decrease the natural alignment of any type, including vector types. The aligned attribute is ignored with a warning when the alignment factor is less than 32 for vector types.

Example

In all of the following examples, the aligned attribute is applied to the structure type A. Because a is declared as a variable of type A, it also receives the alignment specification, as do any other instances declared of type A.

    struct __attribute__((__aligned__(8))) A {};

    struct __attribute__((__aligned__(8))) A {} a;

    typedef struct __attribute__((__aligned__(8))) A {} a;

The packed variable attribute

The variable attribute packed allows you to override the default alignment mode, to reduce the alignment for all members of an aggregate, or selected members of an aggregate, to the smallest possible alignment: one byte for a member and one bit for a bit field member.

The packed attribute can be applied to aggregate members that have a vector type. In that case, the attribute reduces the member alignment to one byte.

Note: The compiler does not generate warnings if the vector members are not aligned on 32-byte boundaries.

packed variable attribute syntax

    __attribute__ (( packed ))
    __attribute__ (( __packed__ ))

The packed type attribute

The packed type attribute specifies that the minimum alignment should be used for the members of a structure, class, union, or enumeration type. For structure, class, or union types, the alignment is one byte for a member and one bit for a bit field member. For enumeration types, the alignment is the smallest size that will accommodate the range of values in the enumeration. All members of all instances of that type will use the minimum alignment.

The packed attribute can be applied to aggregate members that have a vector type. In that case, the attribute reduces the member alignment to one byte.

Note: The compiler does not generate warnings if the vector members are not aligned on 32-byte boundaries.
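
The effect of packing on member placement can be seen by comparing member offsets. The following is a minimal sketch assuming a GCC-compatible compiler; it uses a plain double member as a stand-in for a vector member, and the structure names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: padding is inserted so that value is naturally aligned. */
struct natural_record {
    char tag;
    double value;
};

/* Packed layout: every member has one-byte alignment, so value
   immediately follows tag. */
struct __attribute__((__packed__)) packed_record {
    char tag;
    double value;
};

static void demo_packed_type(void)
{
    assert(offsetof(struct natural_record, value) > 1);
    assert(offsetof(struct packed_record, value) == 1);
    assert(sizeof(struct packed_record) == 1 + sizeof(double));
}
```

Applied to a vector member, the same placement would leave the member off its 32-byte boundary without any diagnostic, so packed aggregates that contain vectors need particular care.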
packed type attribute syntax

    __attribute__ ((packed))
    __attribute__ ((__packed__))

Unlike the aligned type attribute, the packed type attribute is not allowed in a typedef declaration.

Chapter 7. Quad vector usage

This section describes how the quad vector type is integrated in the XL C/C++ compiler.

Pointer arithmetic

You can perform a limited number of arithmetic operations on pointers. These operations are:
- Increment and decrement
- Addition and subtraction
- Comparison
- Assignment

The increment (++) operator increases the value of a pointer by the size of the data object the pointer refers to. For example, if the pointer refers to the second element in an array, the ++ makes the pointer refer to the third element in the array.

The decrement (--) operator decreases the value of a pointer by the size of the data object the pointer refers to. For example, if the pointer refers to the second element in an array, the -- makes the pointer refer to the first element in the array.

You can add an integer to a pointer but you cannot add a pointer to a pointer. If the pointer p points to the first element in an array, the following expression causes the pointer to point to the third element in the same array:

    p = p + 2;

If you have two pointers that point to the same array, you can subtract one pointer from the other. This operation yields the number of elements in the array that separate the two addresses that the pointers refer to.

You can compare two pointers with the following operators: ==, !=, <, >, <=, and >=. Comparisons with the relational operators <, >, <=, and >= are defined only when the pointers point to elements of the same array. Comparisons with the == and != operators can be performed even when the pointers point to elements of different arrays.

You can assign to a pointer the address of a data object, the value of another compatible pointer, or the NULL pointer.

(IBM extension) Pointer arithmetic is defined for pointers to vector types.
Given:

    vector4double *v;

the expression v + 1 represents a pointer to the vector following v.

Type conversions

An expression of a given type is implicitly converted when it is used in the following situations:
- As an operand of an arithmetic or logical operation.
- As a condition in an if statement or an iteration statement (such as a for loop). The expression will be converted to a Boolean (or an integer in C89).
- In a switch statement. The expression is converted to an integral type.
- As an assignment is made to an lvalue that has a different type than the assigned value.
- As an initialization. This includes the following types:
    - A function is provided an argument value that has a different type than the parameter.
    - The value specified in the return statement of a function has a different type from the defined return type for the function.

(C only) The implicit conversion result is an rvalue.

(C++ only) The implicit conversion result belongs to one of the following value categories, depending on the type of the converted expression:
- An lvalue if the type is an lvalue reference type or (C++0x) an rvalue reference to a function type
- (C++0x) An xvalue if the type is an rvalue reference to an object type
- An rvalue (C++0x: a prvalue) in other cases

You can perform explicit type conversions using a cast expression, as described in Cast expressions.

Vector type casts (IBM extension)

In the Blue Gene/Q environment, vector types cannot be converted to other vector types or other compiler intrinsics data types.

Overload resolution

The process of selecting the most appropriate overloaded function or operator is called overload resolution.

Suppose that f is an overloaded function name. When you call the overloaded function f(), the compiler creates a set of candidate functions. This set of functions includes all of the functions named f that can be accessed from the point where you called f().
The compiler may include as a candidate function an alternative representation of one of those accessible functions named f to facilitate overload resolution.

After creating a set of candidate functions, the compiler creates a set of viable functions. This set of functions is a subset of the candidate functions. The number of parameters of each viable function agrees with the number of arguments you used to call f().

The compiler chooses the best viable function, the function declaration that the C++ runtime environment will use when you call f(), from the set of viable functions. The compiler does this by comparing implicit conversion sequences. An implicit conversion sequence is the sequence of conversions required to convert an argument in a function call to the type of the corresponding parameter in a function declaration. The implicit conversion sequences are ranked; some implicit conversion sequences are better than others. The best viable function is the one whose parameters all have either better or equal-ranked implicit conversion sequences than all of the other viable functions. The compiler rejects a program for which it finds more than one best viable function. Implicit conversion sequences are described in more detail in Implicit conversion sequences.

When a variable length array is a function parameter, the leftmost array dimension does not distinguish functions among candidate functions. In the following, the second definition of f is not allowed because void f(int []) has already been defined.

    void f(int a[*]) {}
    void f(int a[5]) {} // illegal

However, array dimensions other than the leftmost in a variable length array do differentiate candidate functions when the variable length array is a function parameter.
For example, the overload set for function f might comprise the following:

    void f(int a[][5]) {}
    void f(int a[][4]) {}
    void f(int a[][g]) {} // assume g is a global int

but cannot include

    void f(int a[][g2]) {} // illegal, assuming g2 is a global int

because having candidate functions with second-level array dimensions g and g2 creates ambiguity about which function f should be called: neither g nor g2 is known at compile time.

(IBM extension) If you are using vector data types, the argument in the function call must have exactly the same type as the vector parameter of the function. Example:

    int f(vector4double) // (function 1)
    {
        return 1;
    }

    int f(double) // (function 2)
    {
        return 2;
    }

For f((vector4double)(1.0)), overload resolution finds that function 1 is the best candidate function. If function 1 is not in the candidate function list, the compiler does not find a matching candidate function.

You can override an exact match by using an explicit cast. In the following example, the second call to f() matches with f(void*):

    void f(int) { };
    void f(void*) { };

    int main()
    {
        f(0xaabb);            // matches f(int);
        f((void*) 0xaabb);    // matches f(void*)
    }

Parameter declarations

The function declarator includes the list of parameters that can be passed to the function when it is called by another function, or by itself.

(C++ only) In C++, the parameter list of a function is referred to as its signature. The name and signature of a function uniquely identify it. As the word itself suggests, the function signature is used by the compiler to distinguish among the different instances of overloaded functions.

Function parameter declaration syntax

    ( [parameter [, parameter]...] [[,] ...] )

    parameter:
        [register] type_specifier [declarator]

(C++ only) An empty argument list in a function declaration or definition indicates a function that takes no arguments.
To explicitly indicate that a function does not take any arguments, you can declare the function in two ways: with an empty parameter list, or with the keyword void:

    int f(void);
    int f();

(C only) An empty argument list in a function definition indicates that the function takes no arguments. An empty argument list in a function declaration indicates that the function may take any number or type of arguments. Thus,

    int f()
    {
        ...
    }

indicates that function f takes no arguments. However,

    int f();

simply indicates that the number and type of parameters is not known. To explicitly indicate that a function does not take any arguments, you can replace the argument list with the keyword void.

    int f(void);

An ellipsis at the end of the parameter specifications is used to specify that a function has a variable number of parameters. The number of parameters is equal to, or greater than, the number of parameter specifications.

    int f(int, ...);

(C++ only) The comma before the ellipsis is optional. In addition, a parameter declaration is not required before the ellipsis.

(C only) At least one parameter declaration, as well as a comma before the ellipsis, are both required in C.

(IBM extension) Functions with a variable number of parameters do not accept parameters that have the vector4double type.

Parameter types

In a function declaration, or prototype, the type of each parameter must be specified. (C++ only) In the function definition, the type of each parameter must also be specified. (C only) In the function definition, if the type of a parameter is not specified, it is assumed to be int.

A variable of a user-defined type may be declared in a parameter declaration, as in the following example, in which x is declared for the first time:

    struct X { int i; };
    void print(struct X x);

(C only) The user-defined type can also be defined within the parameter declaration. (C++ only) The user-defined type cannot be defined within the parameter declaration.
    void print(struct X { int i; } x); // legal in C
    void print(struct X { int i; } x); // error in C++

Parameter names

In a function definition, each parameter must have an identifier. In a function declaration, or prototype, specifying an identifier is optional. Thus, the following example is legal in a function declaration:

    int func(int,long);

The following constraints apply to the use of parameter names in C++ function declarations:
- Two parameters cannot have the same name within a single declaration.
- If a parameter name is the same as a name outside the function, the name outside the function is hidden and cannot be used in the parameter declaration.

In the following example, the third parameter name intersects is meant to have enumeration type subway_line, but this name is hidden by the name of the first parameter. The declaration of the function subway() causes a compile-time error, because subway_line is not a valid type name. The first parameter name subway_line hides the namespace scope enum type and cannot be used again in the third parameter.

    enum subway_line {yonge, university, spadina, bloor};
    int subway(char * subway_line, int stations, subway_line intersects);

Static array indices in function parameter declarations (C only)

Except in certain contexts, an unsubscripted array name (for example, region instead of region[4]) represents a pointer whose value is the address of the first element of the array, provided that the array has previously been declared. An array type in the parameter list of a function is also converted to the corresponding pointer type. Information about the size of the argument array is lost when the array is accessed from within the function body.

To preserve this information, which is useful for optimization, you may declare the index of the argument array using the static keyword. The constant expression specifies the minimum size of the array that the pointer refers to, which the compiler can use as an assumption for optimizations.
This particular usage of the static keyword is tightly restricted. The keyword may only appear in the outermost array type derivation and only in function parameter declarations. If the caller of the function does not abide by these restrictions, the behavior is undefined.

Note: This feature is C99 specific.

The following examples show how the feature can be used.

    void foo(int arr [static 10]);       /* arr points to the first of at least 10 ints */
    void foo(int arr [const 10]);        /* arr is a const pointer */
    void foo(int arr [static const i]);  /* arr points to at least i ints; i is computed at run time. */
    void foo(int arr [const static i]);  /* alternate syntax to previous example */
    void foo(int arr [const]);           /* const pointer to int */

Chapter 8. Quad vector operators

The following operators support the quad vector type.

Address operator &

The & (address) operator yields a pointer to its operand. The operand must be an lvalue, a function designator, or a qualified name. It cannot be a bit field. (C only) It cannot have the storage class register.

If the operand is an lvalue or function, the resulting type is a pointer to the expression type. For example, if the expression has type int, the result is a pointer to an object having type int.

If the operand is a qualified name and the member is not static, the result is a pointer to a member of class and has the same type as the member. The result is not an lvalue.

If p_to_y is defined as a pointer to an int and y as an int, the following expression assigns the address of the variable y to the pointer p_to_y:

    p_to_y = &y;

(IBM extension) The address operator has been extended to handle vector types, provided that vector support is enabled. The result of the address operator applied to a vector type can be stored in a pointer to a compatible vector type. The address of a vector type can be used to initialize a pointer to vector type if both sides of the initialization have compatible types.
A pointer to void can also be initialized with the address of a vector type.

(C++ only) The ampersand symbol & is used in C++ as a reference declarator in addition to being the address operator. The meanings are related but not identical.

    int target;
    int &rTarg = target;  // rTarg is a reference to an integer.
                          // The reference is initialized to refer to target.
    void f(int*& p);      // p is a reference to a pointer

If you take the address of a reference, it returns the address of its target. Using the previous declarations, &rTarg is the same memory address as &target.

You may take the address of a register variable.

You can use the & operator with overloaded functions only in an initialization or assignment where the left side uniquely determines which version of the overloaded function is used.

(IBM extension) The address of a label can be taken using the GNU C address operator &&. The label can thus be used as a value.

Indirection operator *

The * (indirection) operator determines the value referred to by the pointer-type operand. The operand cannot be a pointer to an incomplete type. If the operand points to an object, the operation yields an lvalue referring to that object. If the operand points to a function, the result is a function designator in C or, in C++, an lvalue referring to the function to which the operand points. Arrays and functions are converted to pointers.

The type of the operand determines the type of the result. For example, if the operand is a pointer to an int, the result has type int.

Do not apply the indirection operator to any pointer that contains an address that is not valid, such as NULL. The result is not defined.

If p_to_y is defined as a pointer to an int and y as an int, the expressions:

    p_to_y = &y;
    *p_to_y = 3;

cause the variable y to receive the value 3.

(IBM extension) The indirection operator * has been extended to handle pointer to vector types, provided that vector support is enabled.
A vector pointer should point to a memory location that has 32-byte alignment. However, the compiler does not enforce this constraint. Dereferencing a vector pointer maintains the vector type and its 32-byte alignment. If a program dereferences a vector pointer that does not contain a 32-byte aligned address, the behavior is undefined. See the following example:

    vector4double v1;
    vector4double *pv1;  // must hold a valid, 32-byte aligned address before it is dereferenced
    v1 = *pv1;           // legal, copies the data that pv1 points to into v1.

The __alignof__ operator (IBM extension)

The __alignof__ operator is a language extension to C99 and Standard C++ that returns the number of bytes used in the alignment of its operand. The operand can be an expression or a parenthesized type identifier. If the operand is an expression representing an lvalue, the number returned by __alignof__ represents the alignment that the lvalue is known to have. The type of the expression is determined at compile time, but the expression itself is not evaluated. If the operand is a type, the number represents the alignment usually required for the type on the target platform.

The __alignof__ operator may not be applied to the following:
- An lvalue representing a bit field
- A function type
- An undefined structure or class
- An incomplete type (such as void)

__alignof__ operator syntax

    __alignof__ unary_expression
    __alignof__ ( type-id )

If type-id is a reference or a referenced type, the result is the alignment of the referenced type. If type-id is an array, the result is the alignment of the array element type. If type-id is a fundamental type, the result is implementation-defined.

For example, on Blue Gene/Q, __alignof__(long) returns 8.

The operand of __alignof__ can be a vector type, provided that vector support is enabled.
For example,

    vector4double v1 = (vector4double) {1., 2., 3., 4.};
    vector4double *pv1 = &v1;
    __alignof__(v1);             // vector type alignment: 32
    __alignof__(&v1);            // address of vector alignment: 8
    __alignof__(*pv1);           // dereferenced pointer to vector alignment: 32
    __alignof__(pv1);            // pointer to vector alignment: 8
    __alignof__(vector4double);  // vector type alignment: 32

When the aligned attribute is applied to a vector variable, the value returned by __alignof__ is the actual alignment of the variable. The actual alignment is greater than the value specified in the aligned attribute when that value is less than the natural alignment of the vector type. For example:

    vector4double v1 __attribute__((aligned(4)));  // The aligned attribute is ignored
                                                   // because the alignment factor is
                                                   // less than the natural alignment
                                                   // of v1
    int alignment = __alignof__(v1);               // alignment will be 32, not 4

The sizeof operator

The sizeof operator yields the size in bytes of the operand, which can be an expression or the parenthesized name of a type.

sizeof operator syntax

    sizeof expr
    sizeof ( type-name )

The result for either kind of operand is not an lvalue, but a constant integer value. The type of the result is the unsigned integral type size_t defined in the header file stddef.h.

Except in preprocessor directives, you can use a sizeof expression wherever an integral constant is required. One of the most common uses for the sizeof operator is to determine the size of objects that are referred to during storage allocation, input, and output functions.

Another use of sizeof is in porting code across platforms. You can use the sizeof operator to determine the size that a data type represents. For example:

    sizeof(int);

The sizeof operator applied to a type name yields the amount of memory that can be used by an object of that type, including any internal or trailing padding.
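The role of padding in the result of sizeof can be sketched with an ordinary structure. This is a portable illustration rather than anything specific to Blue Gene/Q; the structure padded and the helper functions are hypothetical names, and the exact amount of padding depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aggregate whose members need different alignments. */
struct padded {
    char c;
    int  i;
    char d;
};

/* Sum of the member sizes, ignoring any padding. */
size_t member_bytes(void)
{
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Internal plus trailing padding that sizeof includes for the type. */
size_t padding_bytes(void)
{
    return sizeof(struct padded) - member_bytes();
}

/* Storage needed for n contiguous objects, for example for malloc. */
size_t array_bytes(size_t n)
{
    return n * sizeof(struct padded);
}

/* Self-check: sizeof covers the members plus padding, and arrays scale. */
void check_padded_size(void)
{
    assert(sizeof(struct padded) >= member_bytes());
    assert(array_bytes(10) == sizeof(struct padded[10]));
}
```

Because trailing padding is part of sizeof(struct padded), multiplying it by an element count gives the exact size of an array of that type, which is why sizeof is the usual way to compute allocation sizes.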
(IBM extension) The operand of the sizeof operator can be a vector variable, a vector type, or the result of dereferencing a pointer to vector type, provided that vector support is enabled. In these cases, the return value of sizeof is always 32.

    vector4double v1;
    vector4double *pv1 = &v1;
    sizeof(v1);             // size of vector type: 32
    sizeof(&v1);            // size of address of vector: 8
    sizeof(*pv1);           // size of dereferenced pointer to vector: 32
    sizeof(pv1);            // size of pointer to vector: 8
    sizeof(vector4double);  // size of vector type: 32

For compound types, results are as follows:

    Operand                  Result
    An array                 The result is the total number of bytes in the array. For example, in an array with 10 elements, the size is equal to 10 times the size of a single element. The compiler does not convert the array to a pointer before evaluating the expression.
    A class (C++ only)       The result is always nonzero. It is equal to the number of bytes in an object of that class, also including any padding required for placing class objects in an array.
    A reference (C++ only)   The result is the size of the referenced object.

The sizeof operator cannot be applied to:
- A bit field
- A function type
- An undefined structure or class
- An incomplete type (such as void)

The sizeof operator applied to an expression yields the same result as if it had been applied to only the name of the type of the expression. At compile time, the compiler analyzes the expression to determine its type. None of the usual type conversions that occur in the type analysis of the expression are directly attributable to the sizeof operator. However, if the operand contains operators that perform conversions, the compiler does take these conversions into consideration in determining the type. For example, the second line of the following sample causes the usual arithmetic conversions to be performed. Assuming that a short uses 2 bytes of storage and an int uses 4 bytes,

    short x; ... sizeof (x)      /* the value of sizeof operator is 2 */
    short x; ... sizeof (x + 1)  /* value is 4, result of addition is type int */

The result of the expression x + 1 has type int, so sizeof (x + 1) is equivalent to sizeof(int). The value is also 4 if x has type char, short, or int or any enumeration type.

(C++0x) sizeof... is a unary expression operator introduced by the variadic template feature. This operator accepts an expression that names a parameter pack as its operand. It then expands the parameter pack and returns the number of arguments provided for the parameter pack. Consider the following example:

    template<typename...T> void foo(T...args){
        int v = sizeof...(args);
    }

In this example, the variable v is assigned the number of arguments provided for the parameter pack args.

Notes:
- The operand of the sizeof... operator must be an expression that names a parameter pack.
- The operand of the sizeof operator cannot be an expression that names a parameter pack or a pack expansion.

For more information, see Variadic templates (C++0x).

The typeof operator (IBM extension)

The typeof operator returns the type of its argument, which can be an expression or a type. The language feature provides a way to derive the type from an expression. Given an expression e, __typeof__(e) can be used anywhere a type name is needed, for example in a declaration or in a cast. The alternate spelling of the keyword, __typeof__, is recommended.

The typeof operator is extended to accept a vector type as its operand, when vector support is enabled.

typeof operator syntax

    __typeof__ ( expr )
    __typeof__ ( type-name )
    typeof ( expr )
    typeof ( type-name )

A typeof construct itself is not an expression, but the name of a type. A typeof construct behaves like a type name defined using typedef, although the syntax resembles that of sizeof.

The following examples illustrate its basic syntax.
For an expression e:

    int e;
    __typeof__(e + 1) j;     /* the same as declaring int j; */
    e = (__typeof__(e)) f;   /* the same as casting e = (int) f; */

Using a typeof construct is equivalent to declaring a typedef name. Given

    int T[2];
    int i[2];

you can write

    __typeof__(i) a;        /* all three constructs have the same meaning */
    __typeof__(int[2]) a;
    __typeof__(T) a;

The behavior of the code is as if you had declared int a[2];.

Examples with vectors:

    vector4double v1 = (vector4double) {1., 2., 3., 4.};
    __typeof__(v1) w1;             // w1 has the vector4double type
    __typeof__(vector4double) w2;  // w2 has the vector4double type

For a bit field, typeof represents the underlying type of the bit field. For example, given int m:2;, typeof(m) is int. Because the bit field property is not preserved, n in typeof(m) n; is the same as int n, but not int n:2.

The typeof operator can be nested inside sizeof and itself. The following declarations of arr as an array of pointers to int are equivalent:

    int *arr[10];                          /* traditional C declaration */
    __typeof__(__typeof__ (int *)[10]) a;  /* equivalent declaration */

The typeof operator can be useful in macro definitions where expression e is a parameter. For example,

    #define SWAP(a,b) { __typeof__(a) temp; temp = a; a = b; b = temp; }

Note: The typeof and __typeof__ keywords are supported as follows:
- (C only) The __typeof__ keyword is recognized under compilation with the xlc invocation command or the -qlanglvl=extc89, -qlanglvl=extc99, or -qlanglvl=extended options. The typeof keyword is only recognized under compilation with -qkeyword=typeof.
- (C++ only) The typeof and __typeof__ keywords are recognized by default.

Assignment operators

An assignment expression stores a value in the object designated by the left operand. There are two types of assignment operators:
- Simple assignment operator =
- Compound assignment operators

The left operand in all assignment expressions must be a modifiable lvalue.
The type of the expression is the type of the left operand. The value of the expression is the value of the left operand after the assignment has completed.

(C only) The result of an assignment expression is not an lvalue. (C++ only) The result of an assignment expression is an lvalue.

All assignment operators have the same precedence and have right-to-left associativity.

Simple assignment operator =

The simple assignment operator has the following form:

    lvalue = expr

The operator stores the value of the right operand expr in the object designated by the left operand lvalue.

The left operand must be a modifiable lvalue. The type of an assignment operation is the type of the left operand.

If the left operand is not a class type or a vector type, the right operand is implicitly converted to the type of the left operand. This converted type will not be qualified by const or volatile.

If the left operand is a class type, that type must be complete. The copy assignment operator of the left operand will be called.

If the left operand is an object of reference type, the compiler will assign the value of the right operand to the object denoted by the reference.

(IBM extension) The assignment operator has been extended to permit operands of vector type. Both sides of an assignment expression must be of the same vector type.

Compound assignment operators

The compound assignment operators consist of a binary operator and the simple assignment operator. They perform the operation of the binary operator on both operands and store the result of that operation into the left operand, which must be a modifiable lvalue.
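How a compound assignment combines the binary operation with the store can be sketched in standard C. The function and variable names below are hypothetical; the sketch relies only on two guarantees of the C language, namely that the whole right operand is evaluated before the operation is applied, and that the left operand is evaluated a single time.

```c
#include <assert.h>

/* Counts how many times the left operand expression is evaluated. */
static int counted_calls;

/* Returns p unchanged and records that it was called. */
int *counted(int *p)
{
    ++counted_calls;
    return p;
}

/* a *= b + c multiplies a by the whole sum (b + c). */
int scale_by_sum(int a, int b, int c)
{
    a *= b + c;
    return a;
}

/* The left operand *counted(&value) is evaluated only once,
   although the operation both reads and writes it. */
int add_once(int start, int delta)
{
    int value = start;

    counted_calls = 0;
    *counted(&value) += delta;
    assert(counted_calls == 1);
    return value;
}
```

With these definitions, scale_by_sum(2, 3, 4) yields 14, that is 2 * (3 + 4), rather than 10, which would be (2 * 3) + 4.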
The following table shows the operand types of compound assignment expressions:

    Operator                    Left operand     Right operand
    += or -=                    Arithmetic       Arithmetic
    += or -=                    Pointer          Integral type
    *=, /=, and %=              Arithmetic       Arithmetic
    <<=, >>=, &=, ^=, and |=    Integral type    Integral type

Note that the expression

    a *= b + c

is equivalent to

    a = a * (b + c)

and not

    a = a * b + c

The following table lists the compound assignment operators and shows an expression using each operator:

    Operator    Example              Equivalent expression
    +=          index += 2           index = index + 2
    -=          *pointer -= 1        *pointer = *pointer - 1
    *=          bonus *= increase    bonus = bonus * increase
    /=          time /= hours        time = time / hours
    %=          allowance %= 1000    allowance = allowance % 1000
    <<=         result <<= num       result = result << num
    >>=         form >>= 1           form = form >> 1
    &=          mask &= 2            mask = mask & 2
    ^=          test ^= pre_test     test = test ^ pre_test
    |=          flag |= ON           flag = flag | ON

Although the equivalent expression column shows each left operand (from the example column) twice, it is in effect evaluated only once.

(C++ only) In addition to the table of operand types, an expression is implicitly converted to the cv-unqualified type of the left operand if it is not of class type. However, if the left operand is of class type, the class becomes complete, and assignment to objects of the class behaves as a copy assignment operation. Compound expressions and conditional expressions are lvalues in C++, which allows them to be a left operand in a compound assignment expression.

(C only) When GNU C language features have been enabled, compound expressions and conditional expressions are allowed as lvalues, provided that their operands are lvalues.
The following compound assignment of the compound expression (a, b) is legal under GNU C, provided that expression b, or more generally, the last expression in the sequence, is an lvalue:

    (a,b) += 5 /* Under GNU C, this is equivalent to a, (b += 5) */

Vector subscripting operator [ ] (IBM extension)

Access to individual elements of a vector data type is provided through the use of square brackets, similar to how array elements are accessed. The vector data type is followed by a set of square brackets containing the position of the element. The position of the first element is 0. The type of the result is the type of the elements contained in the vector type.

Example:

    vector4double v1 = (vector4double) {1.0, 2.0, 3.0, 4.0};
    double d1, d2, d3, d4;
    d1 = v1[0]; // d1=1.0
    d2 = v1[1]; // d2=2.0
    d3 = v1[2]; // d3=3.0
    d4 = v1[3]; // d4=4.0

Note: You can also access and manipulate individual elements of vectors with the following intrinsic functions:
- vec_extract
- vec_insert
- vec_promote
- vec_splats

Chapter 9. Inline assembly statements (IBM extension)

Under extended language levels, the compiler provides full support for embedded assembly code fragments among C and C++ source statements. This extension has been implemented for use in general system programming code, and in the operating system kernel and device drivers, which were originally developed with GNU C.

The keyword asm stands for assembly code. When strict language levels are used in compilation, the C compiler recognizes and ignores the keyword asm in a declaration. The C++ compiler always recognizes the keyword.
The syntax is as follows:

asm statement syntax (statement in local scope)

    asm | __asm | __asm__   [volatile]   ( code_format_string : output : input : clobbers )

    input:
        [modifier] constraint ( C_expression ) [, input]

    output:
        modifier constraint ( C_expression ) [, output]

asm statement syntax (statement in global scope)

    asm | __asm | __asm__   [volatile]   ( code_format_string )

volatile
    The qualifier volatile instructs the compiler to perform only minimal optimizations on the assembly block. The compiler cannot move any instructions across the implicit fences surrounding the assembly block. See Example 1 for detailed usage information.

code_format_string
    The code_format_string is the source text of the asm instructions and is a string literal similar to a printf format specifier.

    Operands are referred to in the %integer format, where integer refers to the sequential number of the input or output operand. See Example 1 for detailed usage information.

    To increase readability, each operand can be given a symbolic name enclosed in brackets. In the assembler code section, you can refer to each operand in the %[symbolic_name] format, where the symbolic_name is referenced in the operand list. You can use any name, including existing C or C++ symbols, for a symbolic operand, because the symbolic operand names have no relation to any C or C++ identifiers. However, no two operands in the same assembly statement can use the same symbolic name. See Example 2 for detailed usage information.

output
    The output consists of zero, one or more output operands, separated by commas. Each operand consists of a constraint(C_expression) pair. The output operand must be constrained by the = or + modifier (described below), and, optionally, by an additional % or & modifier.

input
    The input consists of zero, one or more input operands, separated by commas. Each operand consists of a constraint(C_expression) pair.

clobbers
    clobbers is a comma-separated list of register names enclosed in double quotes.
    If an asm instruction updates registers that are not listed in the input or output of the asm statement, the registers must be listed as clobbered registers. The following register names are valid:

    r0 to r31     General purpose registers
    f0 to f31     Floating-point registers
    lr            Link register
    ctr           Loop count, decrement and branching register
    fpscr         Floating-point status and control register
    xer           Fixed-point exception register
    cr0 to cr7    Condition registers. Example 3 shows a typical use of condition registers in the clobbers.
    v0 to v31     Vector registers (on selected processors only)

    In addition to the register names, cc and memory can also be used in the list of clobbered registers. The usage information of cc and memory is listed as follows:

    cc
        Add cc to the list of clobbered registers if assembler instructions can alter the condition code register.

    memory
        Add memory to the list of clobbered registers if assembler instructions can change a memory location in an unpredictable fashion. The memory clobber ensures that the data used after the completion of the assembly statement is valid and synchronized.

        However, the memory clobber can result in many unnecessary reloads, reducing the benefits of hardware prefetching. Thus, the memory clobber can impose a performance penalty and should be used with caution. See Example 4 and Example 1 for the detailed usage information.

modifier
    The modifier can be one of the following operators:

    =
        Indicates that the operand is write-only for this instruction. The previous value is discarded and replaced by output data. See Example 5 for detailed usage information.

    +
        Indicates that the operand is both read and written by the instruction. See Example 6 for detailed usage information.

    &
        Indicates that the operand may be modified before the instruction is finished using the input operands; a register that is used as input should not be reused here.
    %    Declares the instruction to be commutative for this operand and the following operand. This means that the order of this operand and the next may be swapped when generating the instruction. This modifier can be used on an input or output operand, but cannot be specified on the last operand. See Example 7 for detailed usage information.

constraint
    The constraint is a string literal that describes the kind of operand that is permitted, one character per constraint. The following constraints are supported:

    b    Use a general register other than zero. Some instructions treat the designation of register 0 specially, and do not behave as expected if the compiler chooses r0. For these instructions, the designation of r0 does not mean that r0 is used. Instead, it means that the literal value 0 is specified. See Example 8 for detailed usage information.
    c    Use the CTR register.
    f    Use a floating-point register. See Example 7 for detailed usage information.
    g    Use a general register, memory, or immediate operand. In POWER, there are no instructions where a register, memory specifier, or immediate operand can be used interchangeably. However, this constraint is tolerated where it is possible to do so.
    h    Use the CTR or LINK register.
    i    Use an immediate integer or string literal operand.
    l    Use the CTR register.
    m    Use a memory operand supported by the machine. You can use this constraint for operands of the form D(R), where D is a displacement and R is a register. See Example 9 for detailed usage information.
    n    Use an immediate integer.
    o    Use a memory operand that is offsetable. This means that the memory operand can be addressed by adding an integer to a base address. In POWER, memory operands are always offsetable, so the constraints o and m can be used interchangeably.
    r    Use a general register. See Example 5 for detailed usage information.
    s    Use a string literal operand.
    v    Use a vector register.
         In the inline assembly statements, the input or output operands can be of the vector4double type. To allocate a register for an operand of the vector4double type, you must use the v constraint. See Example 10 for detailed usage information.

    0, 1, 2, ...
         A matching constraint. Allocate the same register in output as in the corresponding input.

    I, J, K, L, M, N, O, P
         Constant values. Fold the expression in the operand and substitute the value into the % specifier. These constraints specify a maximum value for the operand, as follows:

         - I: signed 16-bit
         - J: unsigned 16-bit shifted left 16 bits
         - K: unsigned 16-bit constant
         - L: signed 16-bit shifted left 16 bits
         - M: unsigned constant greater than 31
         - N: unsigned constant that is an exact power of 2
         - O: zero
         - P: signed whose negation is a signed 16-bit constant

C_expression
    The C_expression is a C or C++ expression whose value is used as the operand for the asm instruction. Output operands must be modifiable lvalues. The C_expression must be consistent with the constraint specified on it. For example, if i is specified, the operand must be an integer constant number.

    Note: If pointer expressions are used in input or output, the assembly instructions should honor the ANSI aliasing rule (see Type-based aliasing for more information). This means that indirect addressing using values in pointer expression operands should be consistent with the pointer types; otherwise, you must disable the -qalias=ansi option during compilation.
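The ranges listed for the constant constraints I through P can be written out as a small range check. The following is only a sketch: fits_constant_constraint is a hypothetical helper, not part of XL C/C++, and the exact boundaries (for example, reading "shifted left 16 bits" as "low 16 bits are zero") are assumptions drawn from the wording of the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an XL C/C++ facility): reports whether value
 * lies in the range that the list above gives for a constraint letter.
 * The boundaries are an interpretation of that list. */
bool fits_constant_constraint(char letter, int64_t value)
{
    switch (letter) {
    case 'I': /* signed 16-bit */
        return value >= -32768 && value <= 32767;
    case 'J': /* unsigned 16-bit shifted left 16 bits */
        return value >= 0 && value <= 0xFFFF0000LL && (value & 0xFFFF) == 0;
    case 'K': /* unsigned 16-bit constant */
        return value >= 0 && value <= 0xFFFF;
    case 'L': /* signed 16-bit shifted left 16 bits */
        return value >= -32768LL * 65536 && value <= 32767LL * 65536
               && (value & 0xFFFF) == 0;
    case 'M': /* unsigned constant greater than 31 */
        return value > 31;
    case 'N': /* unsigned constant that is an exact power of 2 */
        return value > 0 && (value & (value - 1)) == 0;
    case 'O': /* zero */
        return value == 0;
    case 'P': /* the negation of value is a signed 16-bit constant */
        return value >= -32767 && value <= 32768;
    default:  /* not a constant constraint letter */
        return false;
    }
}
```

Under these assumptions, the constant 15 used with the K constraint in Example 6 is in range, while a negative constant would not be.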
Supported and unsupported constructs

Supported constructs

The inline assembly statements support the following constructs:
- All the instruction statements listed in the Assembler Language Reference
- All extended instruction mnemonics
- Label definitions
- Branches to labels

Unsupported constructs

The inline assembly statements do not support the following constructs:
- Pseudo-operation statements, which are assembly statements that begin with a dot (.), such as .function
- Branches between different asm blocks

In addition, some constraints originating from the GNU compiler are not supported, but are tolerated where it is possible. For example, constraints S and T are treated as immediates, but the compiler issues a warning message stating that they are unsupported.

Restrictions on inline assembly statements

The following restrictions apply to the use of inline assembly statements:
- The assembler instructions must be self-contained within an asm statement. The asm statement can only be used to generate instructions. All connections to the rest of the program must be established through the output and input operand list.
- Referencing an external symbol directly, without going through the operand list, is not supported.
- Assembler instructions requiring a pair of registers are not specifiable by any constraints, and are therefore not supported. For example, you cannot use the %f constraint for a long double operand.
- The shared register file between the floating-point scalar and the vector registers on POWER7® is not modeled as shared in inline assembly statements. You must specify registers f0-f31 and v0-v31 in the clobbers list. There is no combined x0-x63.
- Operand replacements (such as %0, %1, and so on) can use an optional x before the number or symbolic name to indicate that a VSX register reference must be used.
  For example, a vector operand %1 allocated to register v0 is replaced with 0 (for use in VMX instructions). The same operand used as %x1 in the assembly text is replaced with 32 (for use in VSX instructions). Note that this restriction applies only to architectures that support the VSX architecture extension, such as POWER7.

Examples of inline assembly statements

Example 1: The following example illustrates the usage of the volatile keyword.

    #include <stdio.h>

    inline bool acquireLock(int *lock){
      bool returnvalue = false;
      int lockval;
      asm volatile(
        /*--------a fence here-----*/

        " 0: lwarx %0,0,%2   \n" // Loads the word and reserves
                                 // a memory location for the subsequent
                                 // stwcx. instruction.

        "    cmpwi %0,0      \n" // Compares the lock value to 0.
        "    bne- 1f         \n" // If it is 0, you can acquire
                                 // the lock. Otherwise, you fail to get
                                 // the lock and must try again later.

        "    ori %0,%0,1     \n" // Sets the lock to 1.
        "    stwcx. %0,0,%2  \n" // Tries to conditionally store 1
                                 // into the lock word to acquire
                                 // the lock.

        "    bne- 0b         \n" // Reservation was lost. Try again.

        "    isync           \n" // Lock acquired. The isync instruction
                                 // implements an import barrier to
                                 // ensure that the instructions that
                                 // access the shared region guarded by
                                 // this lock are executed only after
                                 // they acquire the lock.

        "    ori %1,%1,1     \n" // Sets the return value for the function
                                 // acquireLock to true.

        " 1:                 \n" // Did not get the lock. Will return false.

        /*------a fence here------*/

        : "+r" (lockval),
          "+r" (returnvalue)
        : "r"  (lock)            // Lock is the address of the lock in
                                 // memory.

        : "cr0"                  // cr0 is clobbered by cmpwi and stwcx.
      );

      return returnvalue;
    }

    int main()
    {
      int myLock = 0;            // The lock starts out free.
      if(acquireLock(&myLock)){
        printf("got it!\n");
      }else{
        printf("someone else got it\n");
      }
      return 0;
    }

In this example, %0 refers to the first operand "+r"(lockval), %1 refers to the second operand "+r"(returnvalue), and %2 refers to the third operand "r"(lock).
The assembly statement uses a lock to control access to the shared storage; no instruction can access the shared storage before acquiring the lock.

The volatile keyword implies fences around the assembly instruction group, so that no assembly instructions can be moved out of or around the assembly block.

Without the volatile keyword, the compiler can move the instructions around for optimization. This might cause some instructions to access the shared storage without acquiring the lock.

It is unnecessary to use the memory clobber in this assembly statement, because the instructions do not modify memory in an unexpected way. If you use the memory clobber, the program is still functionally correct. However, the memory clobber results in many unnecessary reloads, imposing a performance penalty.

Example 2: The following example illustrates the use of the symbolic names for input and output operands.

    int a ;
    int b = 1, c = 2, d = 3 ;
    __asm(" addc %[result], %[first], %[second]"
          : [result] "=r" (a)
          : [first]  "r"  (b),
            [second] "r"  (d)
         );

In this example, %[result] refers to the output operand variable a, %[first] refers to the input operand variable b, and %[second] refers to the input operand variable d.

Example 3: The following example shows a typical use of condition registers in the clobbers.

    asm (" add. %0,%1,%2 \n"
         : "=r" (c)
         : "r"  (a),
           "r"  (b)
         : "cr0"
        );

In this example, apart from the registers listed in the input and output of the assembly statement, the add. instruction also affects the condition register field 0. Therefore, you must inform the compiler about this by adding cr0 to the clobbers.

Example 4: The following example shows the usage of the memory clobber.

    asm volatile (" dcbz 0, %0 \n"
                  : "=r"(b)
                  :
                  : "memory"
                 );

In this example, the instruction dcbz clears a cache block, and might have changed the variables in the memory location. There is no way for the compiler to know which variables have been changed.
Therefore, the compiler assumes that all data might be aliased with the memory changed by that instruction. As a result, everything that is needed must be reloaded from memory after the completion of the assembly statement. The memory clobber ensures program correctness at the expense of program performance, because the compiler might reload data that had nothing to do with the assembly statement.

Example 5: The following example shows the usage of the = modifier and the r constraint.

    int a ;
    int b = 100 ;
    int c = 200 ;
    asm(" add %0, %1, %2"
        : "=r" (a)
        : "r"  (b),
          "r"  (c)
       );

The add instruction adds the contents of two general purpose registers. The %0, %1, and %2 operands are substituted by the C expressions in the output/input operand fields.

The output operand uses the = modifier to indicate that a modifiable operand is required; it uses the r constraint to indicate that a general purpose register is required. Likewise, the r constraint in the input operands indicates that general purpose registers are required. Within these restrictions, the compiler is free to choose any registers to substitute for %0, %1, and %2.

Note: If the compiler chooses r0 for the second operand, the add instruction uses the literal value 0 and yields an unexpected result. Thus, to prevent the compiler from choosing r0 for the second operand, you can use the b constraint to denote the second operand.

Example 6: The following example shows the usage of the + modifier and the K constraint.

    asm (" addi %0,%0,%2"
         : "+r" (a)
         : "r"  (a),
           "K"  (15)
        );

This assembly statement adds operand %0 and operand %2, and writes the result to operand %0. The output operand uses the + modifier to indicate that operand %0 can be read and written by the instruction. The K constraint indicates that the value loaded to operand %2 must be an unsigned 16-bit constant value.
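The constraint list describes matching constraints (0, 1, 2, ...) but none of the numbered examples uses one. As a hedged sketch, the read-and-write operand of Example 6 can also be expressed with a write-only output plus an input tied to it by the matching constraint "0". The wrapper name add_fifteen and the preprocessor guard are assumptions made for illustration; the b constraint is chosen because addi treats r0 as the literal value 0. The assembly path is compiled only for Power targets, and a plain C fallback keeps the sketch buildable elsewhere.

```c
/* Hypothetical wrapper (illustration only): adds the constant 15 to value. */
int add_fifteen(int value)
{
#if defined(__powerpc__) || defined(__PPC__)
    /* %0 is the output, %1 is the input tied to %0 by the matching
     * constraint "0", and %2 is the constant. "b" keeps the compiler from
     * choosing r0, which addi would read as the literal value 0. */
    __asm__ (" addi %0,%0,%2"
             : "=b" (value)
             : "0"  (value),
               "K"  (15)
            );
#else
    /* Portable fallback so that the sketch also builds on other targets. */
    value += 15;
#endif
    return value;
}
```

Whether the compiler generates the same code for the + form and the matching-constraint form is not stated in this document; the sketch only shows how the operand list is written.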
Example 7: The following example shows the usage of the % modifier and the f constraint.

    asm(" fadd %0, %1, %2"
        : "=f" (c)
        : "%f" (a),
          "f"  (b)
       );

This assembly statement adds operands a and b, and writes the result to operand c. The % modifier indicates that operands a and b can be switched if the compiler can generate better code in doing so. Each operand has the f constraint, which indicates that a floating-point register is required.

Example 8: The following example shows the usage of the b constraint.

    char res[8]={'a','b','c','d','e','f','g','h'};
    char a='y';
    int index=7;
    asm (" stbx %0,%1,%2 \n"
         :
         : "r" (a),
           "b" (index),
           "r" (res)
        );

In this example, the b constraint instructs the compiler to choose a general register other than r0 for the input operand %1. The result string of this program is abcdefgy. However, if you use the r constraint and the compiler chooses r0 for %1, this instruction produces an incorrect result string ybcdefgh. For instructions that treat the designation of r0 specially, it is therefore important to denote the input operands with the b constraint.

Example 9: The following example shows the usage of the m constraint.

    asm (" stb %1,%0 \n"
         : "=m" (res)
         : "r"  (a)
        );

In this example, the syntax of the instruction stb is stb RS,D(RA), where D is a displacement and RA is a register. D+RA forms an effective address, which is calculated from D(RA). You do not need to manually construct effective addresses by specifying the register and displacement separately. You can use a single constraint m or o to refer to the two operands in the instruction, regardless of what the correct offset should be and whether it is an offset off the stack or off the TOC (Table of Contents). This allows the compiler to choose the right register (r1 for an automatic variable, for instance) and apply the right displacement automatically.

Example 10: The following example shows the usage of the v constraint.
    vector4double rv, av, bv, cv;
    ...
    __asm(" qvfadd 4, %1, %2  \n"
          " qvfadd %0, 4, %3  \n"
          /* output register */
          : "=v" (rv)
          /* input registers */
          : "v" (av), "v" (bv), "v" (cv)
          /* clobbered register */
          : "f4"
         );

In this example, the inline assembly statement adds the operands av, bv, and cv, and writes the result to the rv operand. A temporary vector register, register 4, is used to store the sum of the operands av and bv. The v constraint instructs the compiler to allocate a vector register for the av, bv, cv, or rv operand. The QPX registers and floating-point scalar registers are physically the same registers, so you can list only the floating-point registers that are altered as the clobbered registers.

Chapter 10. Vector built-in functions

Individual elements of vectors can be accessed by using the Quad Processing Extension (QPX) built-in functions. This section provides an alphabetical reference to the QPX built-in functions. You can use these functions to manipulate vectors.

You must specify appropriate compiler options for your architecture when you use the built-in functions.

This section uses pseudocode description to represent function syntax, as shown below:

    d=func_name(a, b, c)

In the description,
- d represents the return value of the function.
- a, b, and c represent the arguments of the function.
- func_name is the name of the function.

For example, the syntax for the function vector4double vec_add(vector4double, vector4double); is represented by d=vec_add(a, b).

Some built-in functions depend on the value of the floating-point status and control register (FPSCR). For information on the FPSCR, see FPSCR functions.
Floating-point operands for logical functions

In the quad vector logical functions, such as vec_and, floating-point operands are interpreted in the following ways:
- Any value that is greater than or equal to zero (both positive zero and negative zero) is interpreted as the true logical value.
- Any value that is less than zero is interpreted as the false logical value.
- NaN is interpreted as false.

In the result values, floating-point boolean values are as follows:
- true is 1.0.
- false is -1.0.

Load and store functions

With the load and store functions, you can load quad vectors from memory and store them to memory.

vec_ld, vec_lda

Purpose

Loads a vector from the given memory address.

Syntax

    d=vec_ld(a, b)
    d=vec_lda(a, b)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a       b
    vector4double    long    long*
                             unsigned long*
                             long long*
                             unsigned long long*
                             float*
                             _Complex float*
                             double*
                             _Complex double*

Result value

The effective address (EA) is the sum of a and b. The effective address is truncated to an n-byte alignment depending on the type of b as shown in the following table. The result is the content of the n bytes of memory starting at the effective address.

    Type of b                                                  n
    long*, unsigned long*, long long*, unsigned long long*     32
    float*, _Complex float*                                    16
    double*, _Complex double*                                  32

vec_lda generates an exception (SIGBUS) if the effective address is not aligned to the appropriate memory boundary indicated in the table.

If b is a pointer to a variable of the single-precision floating-point type or single-precision complex type, the values loaded from memory are converted to double precision before being saved to the result value.

Formula

The following table shows the formulas depending on the type of b.
    Type of b: long*, unsigned long*, long long*, unsigned long long*
        d[0]=Memory[EA]
        d[1]=Memory[EA+8]
        d[2]=Memory[EA+16]
        d[3]=Memory[EA+24]

    Type of b: float*, _Complex float*
        d[0]=(double) Memory_SP[EA]
        d[1]=(double) Memory_SP[EA+4]
        d[2]=(double) Memory_SP[EA+8]
        d[3]=(double) Memory_SP[EA+12]

    Type of b: double*, _Complex double*
        d[0]=Memory[EA]
        d[1]=Memory[EA+8]
        d[2]=Memory[EA+16]
        d[3]=Memory[EA+24]

Note: Memory_SP[] is a single-precision floating-point array.

Example

In each of the following cases, d is (10.0, 20.0, 30.0, 40.0).

    Type of b                                                  Memory values
    long*, unsigned long*, long long*, unsigned long long*     0x4024000000000000, 0x4034000000000000, 0x403E000000000000, 0x4044000000000000
    float*                                                     10.0f, 20.0f, 30.0f, 40.0f
    _Complex float*                                            (10.0f, 20.0f) (30.0f, 40.0f)
    double*                                                    10.0, 20.0, 30.0, 40.0
    _Complex double*                                           (10.0, 20.0) (30.0, 40.0)

vec_ldia, vec_ldiaa

Purpose

Loads a vector from four 4-byte signed integer values at the given memory address, with sign extension to 8-byte signed integer values.

Syntax

    d=vec_ldia(a, b)
    d=vec_ldiaa(a, b)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a       b
    vector4double    long    int*

Result value

The effective address (EA) is the sum of a and b. The effective address is truncated to a 16-byte alignment. The contents of the 16 bytes starting at the effective address are loaded from memory. They are then converted from four 4-byte signed integer values to four 8-byte signed integer values before being saved in the result value.

vec_ldiaa generates an exception (SIGBUS) if the effective address is not aligned to a 16-byte memory boundary.

Formula

    d[0] = (long) Memory_4B[EA]
    d[1] = (long) Memory_4B[EA+4]
    d[2] = (long) Memory_4B[EA+8]
    d[3] = (long) Memory_4B[EA+12]

Note: Memory_4B[] is a 4-byte signed integer array.
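The vec_ldia formula above can be modelled in portable scalar C to make the address truncation and the sign extension concrete. This is only a sketch: emulate_ldia is a hypothetical name, it writes four 64-bit integers into an array instead of returning a vector4double, and it assumes that the byte offset a is simply added to the address held in b.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the vec_ldia formula (hypothetical helper, not the
 * QPX built-in). The results are written to d[0..3]. */
void emulate_ldia(long a, const int32_t *b, int64_t d[4])
{
    /* EA is the sum of a and b, truncated to a 16-byte boundary. */
    uintptr_t ea = ((uintptr_t)b + (uintptr_t)a) & ~(uintptr_t)15;

    for (int i = 0; i < 4; ++i) {
        int32_t word;
        /* Memory_4B[EA + 4*i]: read one 4-byte signed integer. */
        memcpy(&word, (const void *)(ea + 4u * (uintptr_t)i), sizeof word);
        /* Sign extension from 4 bytes to 8 bytes. */
        d[i] = (int64_t)word;
    }
}
```

With a 16-byte-aligned array holding 10, -20, 30, -40, calling the model with a = 4 produces the same four values as a = 0, because the effective address is truncated back to the start of the array.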
Example

    Memory values: (10, -20, 30, -40)

    Convert result values d to IEEE floating point numbers using:
    d2 = vec_cfid(d)

    d2: (10.0, -20.0, 30.0, -40.0)

vec_ldiz, vec_ldiza

Purpose

Loads a vector from four 4-byte unsigned integer values at the given memory address, with zero extension to 8-byte unsigned integer values.

Syntax

    d=vec_ldiz(a, b)
    d=vec_ldiza(a, b)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a       b
    vector4double    long    unsigned*

Result value

The effective address (EA) is the sum of a and b. The effective address is truncated to a 16-byte alignment. The contents of the 16 bytes starting at the effective address are loaded from memory. Each of their four 4-byte integer values is extended with zeros to fill 8-byte integer values before being saved in the result value.

vec_ldiza generates an exception (SIGBUS) if the effective address is not aligned to a 16-byte memory boundary.

Formula

    d[0] = (unsigned long) Memory_4B[EA]
    d[1] = (unsigned long) Memory_4B[EA+4]
    d[2] = (unsigned long) Memory_4B[EA+8]
    d[3] = (unsigned long) Memory_4B[EA+12]

Note: Memory_4B[] is a 4-byte integer array.

Example

    Memory values: (10, 20, 30, 40)

    Convert result values d to IEEE floating point numbers using:
    d2 = vec_cfid(d)

    d2: (10.0, 20.0, 30.0, 40.0)

vec_lds, vec_ldsa

Purpose

Loads a vector from a single floating-point or complex value at the given memory address.

Syntax

    d=vec_lds(a, b)
    d=vec_ldsa(a, b)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a       b
    vector4double    long    double* (only for vec_lds)
                             float* (only for vec_lds)
                             _Complex double*
                             _Complex float*

Result value

The effective address (EA) is the sum of a and b. If b is a pointer to a complex value, the effective address is truncated to an n-byte alignment depending on the type of b as shown in the following table.
The loaded value or complex value is replicated to fill the result.

    Type of b           n
    _Complex double*    16
    _Complex float*     8

vec_ldsa generates an exception (SIGBUS) if the effective address is not aligned to the appropriate memory boundary indicated in the table.

If b is a pointer to a variable of the single-precision floating-point type or single-precision complex type, the values loaded from memory are converted to double precision before being saved to the result value.

Formula

The following table shows the formulas depending on the type of b.

    Type of b    double*       float*                    _Complex double*    _Complex float*
    d[0]         Memory[EA]    (double) Memory_SP[EA]    Memory[EA]          (double) Memory_SP[EA]
    d[1]         Memory[EA]    (double) Memory_SP[EA]    Memory[EA+8]        (double) Memory_SP[EA+4]
    d[2]         Memory[EA]    (double) Memory_SP[EA]    Memory[EA]          (double) Memory_SP[EA]
    d[3]         Memory[EA]    (double) Memory_SP[EA]    Memory[EA+8]        (double) Memory_SP[EA+4]

Note: Memory_SP[] is a single-precision floating-point array.

Example

    Type of b           Memory values     d
    double*             10.0              (10.0, 10.0, 10.0, 10.0)
    float*              10.0f             (10.0, 10.0, 10.0, 10.0)
    _Complex double*    (10.0, 20.0)      (10.0, 20.0, 10.0, 20.0)
    _Complex float*     (10.0f, 20.0f)    (10.0, 20.0, 10.0, 20.0)

vec_ld2, vec_ld2a

Purpose

Loads a vector from two floating-point values at a given memory address.

Syntax

    d=vec_ld2(a, b)
    d=vec_ld2a(a, b)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a       b
    vector4double    long    double*
                             float*

Result value

The effective address (EA) is the sum of a and b. The effective address is truncated to an n-byte alignment depending on the type of b as shown in the following table. n bytes of memory are loaded from memory starting at the effective address and replicated to fill the result.

    Type of b    n
    double*      16
    float*       8

vec_ld2a generates an exception (SIGBUS) if the effective address is not aligned to the appropriate memory boundary indicated in the table.
If b is a pointer to a variable of the single-precision floating-point type, the values loaded from memory are converted to double precision before being saved to the result value.

Formula

The following table shows the formulas depending on the type of b.

    Type of b    double*         float*
    d[0]         Memory[EA]      (double) Memory_SP[EA]
    d[1]         Memory[EA+8]    (double) Memory_SP[EA+4]
    d[2]         Memory[EA]      (double) Memory_SP[EA]
    d[3]         Memory[EA+8]    (double) Memory_SP[EA+4]

Note: Memory_SP[] is a single-precision floating-point array.

Example

    Type of b    Memory values    d
    double*      10.0, 20.0       (10.0, 20.0, 10.0, 20.0)
    float*       10.0f, 20.0f     (10.0, 20.0, 10.0, 20.0)

vec_st, vec_sta

Purpose

Stores a vector to memory at the given address.

Syntax

    vec_st(a, b, c)
    vec_sta(a, b, c)

Argument types

The following table describes the types of the function arguments.

    a                b       c
    vector4double    long    int*
                             unsigned*
                             long*
                             unsigned long*
                             long long*
                             unsigned long long*
                             float*
                             _Complex float*
                             double*
                             _Complex double*

Result

The effective address (EA) is the sum of b and c. The effective address is truncated to an n-byte alignment depending on the type of c as shown in the following table. The value of a is then stored at the effective address.

    Type of c                                                  n
    int*, unsigned*                                            16
    long*, unsigned long*, long long*, unsigned long long*     32
    float*, _Complex float*                                    16
    double*, _Complex double*                                  32

vec_sta generates an exception (SIGBUS) if the effective address is not aligned to the appropriate memory boundary indicated in the table.

If c is a pointer to a variable of single-precision floating-point type or single-precision complex type, the elements of a are converted to single precision before being saved to memory.

If c is a pointer to a variable of 4-byte integer type, the four low-order bytes of the elements of a are saved to memory.

Formula

The following table shows the formulas depending on the type of c.
    Type of c: int*, unsigned*
        Memory_4B[EA]=a[0] bits 32:63
        Memory_4B[EA+4]=a[1] bits 32:63
        Memory_4B[EA+8]=a[2] bits 32:63
        Memory_4B[EA+12]=a[3] bits 32:63

    Type of c: long*, unsigned long*, long long*, unsigned long long*
        Memory[EA]=a[0]
        Memory[EA+8]=a[1]
        Memory[EA+16]=a[2]
        Memory[EA+24]=a[3]

    Type of c: float*, _Complex float*
        Memory_SP[EA]=(float) a[0]
        Memory_SP[EA+4]=(float) a[1]
        Memory_SP[EA+8]=(float) a[2]
        Memory_SP[EA+12]=(float) a[3]

    Type of c: double*, _Complex double*
        Memory[EA]=a[0]
        Memory[EA+8]=a[1]
        Memory[EA+16]=a[2]
        Memory[EA+24]=a[3]

Notes:
- Memory_SP[] is a single-precision floating-point array.
- Memory_4B[] is a 4-byte integer array.

Examples

    Type of c                                                  a                           Memory values
    int*, unsigned*                                            (10, 20, 30, 40)            10, 20, 30, 40
    long*, unsigned long*, long long*, unsigned long long*     (10.0, 20.0, 30.0, 40.0)    0x4024000000000000, 0x4034000000000000, 0x403E000000000000, 0x4044000000000000
    float*                                                     (10.0, 20.0, 30.0, 40.0)    10.0f, 20.0f, 30.0f, 40.0f
    _Complex float*                                            (10.0, 20.0, 30.0, 40.0)    (10.0f, 20.0f) (30.0f, 40.0f)
    double*                                                    (10.0, 20.0, 30.0, 40.0)    10.0, 20.0, 30.0, 40.0
    _Complex double*                                           (10.0, 20.0, 30.0, 40.0)    (10.0, 20.0) (30.0, 40.0)

vec_sts, vec_stsa

Purpose

Stores the first element or the first two elements of a quad vector to memory at the given address.

Syntax

    vec_sts(a, b, c)
    vec_stsa(a, b, c)

Argument types

The following table describes the types of the function arguments.

    a                b       c
    vector4double    long    double* (only for vec_sts)
                             float* (only for vec_sts)
                             _Complex double*
                             _Complex float*

Result

The effective address (EA) is the sum of b and c. If c is a pointer to a complex value, the effective address is truncated to an n-byte alignment depending on the type of c as shown in the following table. The value of a is then stored to the effective address as follows:
- If c is a pointer to a variable of floating-point type, the first element of a is stored to memory.
- If c is a pointer to a variable of complex type, the first two elements of a are stored to memory.
    Type of c           n
    _Complex double*    16
    _Complex float*     8

vec_stsa generates an exception (SIGBUS) if the effective address is not aligned to the appropriate memory boundary indicated in the table.

If c is a pointer to a variable of single-precision floating-point type or single-precision complex type, the elements of a are converted to single precision before being saved to memory.

Formula

The following tables show the formulas depending on the type of c.

    Type of c           Formula
    double*             Memory[EA] = a[0]
    _Complex double*    Memory[EA] = a[0]
                        Memory[EA+8] = a[1]
    float*              Memory_SP[EA] = (float) a[0]
    _Complex float*     Memory_SP[EA] = (float) a[0]
                        Memory_SP[EA+4] = (float) a[1]

Note: Memory_SP[] is a single-precision floating-point array.

Examples

In each of the following cases, a is (10.0, 20.0, 30.0, 40.0).

    Type of c           Memory values
    double*             10.0
    _Complex double*    (10.0, 20.0)
    float*              10.0f
    _Complex float*     (10.0f, 20.0f)

vec_st2, vec_st2a

Purpose

Stores the first two elements of a quad vector to memory at the given address.

Syntax

    vec_st2(a, b, c)
    vec_st2a(a, b, c)

Argument types

The following table describes the types of the function arguments.

    a                b       c
    vector4double    long    double*
                             float*

Result

The effective address (EA) is the sum of b and c. The effective address is truncated to an n-byte alignment depending on the type of c as shown in the following table. The first two elements of a are then stored at the effective address.

    Type of c    n
    double*      16
    float*       8

vec_st2a generates an exception (SIGBUS) if the effective address is not aligned to the appropriate memory boundary indicated in the table.

If c is a pointer to a variable of single-precision floating-point type, the elements of a are converted to single precision before being saved to memory.

Formula

The following table shows the formulas depending on the type of c.

    Type of c    Formula
    double*      Memory[EA]=a[0]
                 Memory[EA+8]=a[1]
    float*       Memory_SP[EA]=(float) a[0]
                 Memory_SP[EA+4]=(float) a[1]
Note: Memory_SP[] is a single-precision floating-point array.

Examples

In each of the following cases, a is (10.0, 20.0, 30.0, 40.0).

    Type of c    Memory values
    double*      10.0, 20.0
    float*       10.0f, 20.0f

Unary arithmetic functions

This section provides a reference to the quad vector unary arithmetic functions.

vec_abs

Purpose

Returns a vector containing the absolute values of the contents of the given vector.

Syntax

    d=vec_abs(a)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a
    vector4double    vector4double

Result value

The value of each element of the result is the absolute value of the corresponding element of a.

Formula

    d[0] = |a[0]|
    d[1] = |a[1]|
    d[2] = |a[2]|
    d[3] = |a[3]|

Example

    a = (10.0, -20.0, 30.0, -40.0)
    d: (10.0, 20.0, 30.0, 40.0)

vec_neg

Purpose

Returns a vector containing the negated value of the corresponding elements in the given vector.

Syntax

    d=vec_neg(a)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a
    vector4double    vector4double

Result value

This function multiplies the value of each element in the given vector by -1.0 and then assigns the result to the corresponding elements in the result vector.

Formula

    d[0] = -a[0]
    d[1] = -a[1]
    d[2] = -a[2]
    d[3] = -a[3]

Example

    a = ( 10.0, -20.0,  30.0, -40.0)
    d: (-10.0,  20.0, -30.0,  40.0)

vec_nabs

Purpose

Returns a vector containing the results of performing a negative-absolute operation using the given vector.

Syntax

    d=vec_nabs(a)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a
    vector4double    vector4double

Result value

This function computes the absolute value of each element in the given vector and then assigns the negated value of the result to the corresponding elements in the result vector.
Formula

    d[0] = -|a[0]|
    d[1] = -|a[1]|
    d[2] = -|a[2]|
    d[3] = -|a[3]|

Example

    a = ( 10.0, -20.0,  30.0, -40.0)
    d: (-10.0, -20.0, -30.0, -40.0)

vec_re

Purpose

Returns a vector containing estimates of the reciprocals of the corresponding elements of the given vector.

Syntax

    d=vec_re(a)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a
    vector4double    vector4double

Result value

Each element of the result contains the estimated value of the reciprocal of the corresponding element of a.

Note: The precision guarantee is specified by the following expression, where x is the value of each element of a and r is the value of the corresponding element of the result value:

    | (r - 1/x) / (1/x) | ≤ 1/256

Special operands

Special operands are handled as follows:

    Operand      Estimate         Exception
    -Infinity    -0               None
    -0           -Infinity (1)    ZX
    +0           +Infinity (1)    ZX
    +Infinity    +0               None
    SNaN         QNaN (2)         VXSNAN
    QNaN         QNaN             None

    (1) No result if FPSCR[ZE] = 1.
    (2) No result if FPSCR[VE] = 1.

Formula

    d[0] = 1 / a[0]
    d[1] = 1 / a[1]
    d[2] = 1 / a[2]
    d[3] = 1 / a[3]

Example

    a = (2.0, 4.0, 5.0, 8.0)
    d: (0.5, 0.25, 0.2, 0.125)

vec_res

Purpose

Returns a vector containing estimates of the reciprocals of the corresponding elements of the given vector.

Syntax

    d=vec_res(a)

Result and argument types

The following table describes the types of the returned value and the function arguments.

    d                a
    vector4double    vector4double

Result value

The double-precision elements of a are first truncated to single-precision values. An estimate of the reciprocal of each single-precision element of a is then converted to double precision and saved in the corresponding element of the result.
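The data flow just described for vec_res (narrow each element to single precision, take the reciprocal, widen the result) can be modelled in scalar C. This is a sketch under stated assumptions, not the QPX built-in: emulate_res is a hypothetical name, an exact single-precision division stands in for the hardware estimate, and the C conversion to float rounds rather than truncates.

```c
/* Scalar model of the vec_res data flow (hypothetical helper, not the
 * QPX built-in). */
void emulate_res(const double a[4], double d[4])
{
    for (int i = 0; i < 4; ++i) {
        float x = (float)a[i];  /* narrow to single precision (C rounds here) */
        float r = 1.0f / x;     /* exact division stands in for the estimate */
        d[i] = (double)r;       /* widen the result back to double precision */
    }
}
```

Because the division in this model is exact in single precision, its relative error is far below the 1/256 bound that the document gives for the reciprocal estimate; the values returned by the real built-in can differ from the model within that bound.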
Note: The precision guarantee is specified by the following expression, where x is the value of each element of a and r is the value of the corresponding element of the result value:

| (r - 1/x) / (1/x) | ≤ 1/256

Special operands
Special operands are handled as follows:

Operand    Estimate    Exception
-Infinity  -0          None
-0         -Infinity¹  ZX
+0         +Infinity¹  ZX
+Infinity  +0          None
SNaN       QNaN²       VXSNAN
QNaN       QNaN        None

1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.

Formula
d[0] = (double) (1 / (float) a[0])
d[1] = (double) (1 / (float) a[1])
d[2] = (double) (1 / (float) a[2])
d[3] = (double) (1 / (float) a[3])

Example
a = (2.0, 4.0, 5.0, 8.0)
d: (0.5, 0.25, 0.2, 0.125)

vec_rsqrte

Purpose
Returns a vector containing estimates of the reciprocal square roots of the corresponding elements of the given vector.

Syntax
d=vec_rsqrte(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of the result contains the estimated value of the reciprocal square root of the corresponding element of a.

Note: The precision guarantee is specified by the following expression, where x is the value of each element of a and r is the value of the corresponding element of the result value:

| (r - 1/√x) / (1/√x) | ≤ 1/32

Special operands
Special operands are handled as follows:

Operand    Estimate    Exception
-Infinity  QNaN²       VXSQRT
< 0        QNaN²       VXSQRT
-0         -Infinity¹  ZX
+0         +Infinity¹  ZX
+Infinity  +0          None
SNaN       QNaN²       VXSNAN
QNaN       QNaN        None

1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.

Formula
d[0] = 1 / √a[0]
d[1] = 1 / √a[1]
d[2] = 1 / √a[2]
d[3] = 1 / √a[3]

Example
a = (4.0, 16.0, 25.0, 64.0)
d: (0.5, 0.25, 0.2, 0.125)

vec_rsqrtes

Purpose
Returns a vector containing estimates of the reciprocal square roots of the corresponding elements of the given vector.
Syntax
d=vec_rsqrtes(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
The double-precision elements of a are first truncated to single-precision values. An estimate of the reciprocal square root of each single-precision element of a is then converted to double precision and saved in the corresponding element of the result.

Note: The precision guarantee is specified by the following expression, where x is the value of each element of a and r is the value of the corresponding element of the result value:

| (r - 1/√x) / (1/√x) | ≤ 1/32

Special operands
Special operands are handled as follows:

Operand    Estimate    Exception
-Infinity  QNaN²       VXSQRT
< 0        QNaN²       VXSQRT
-0         -Infinity¹  ZX
+0         +Infinity¹  ZX
+Infinity  +0          None
SNaN       QNaN²       VXSNAN
QNaN       QNaN        None

1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.

Formula
d[0] = (double) (1 / √(float) a[0])
d[1] = (double) (1 / √(float) a[1])
d[2] = (double) (1 / √(float) a[2])
d[3] = (double) (1 / √(float) a[3])

Example
a = (4.0, 16.0, 25.0, 64.0)
d: (0.5, 0.25, 0.2, 0.125)

vec_swsqrt, vec_swsqrt_nochk

Purpose
Returns a vector containing the square root of each element in the given vector.

Syntax
d=vec_swsqrt(a)
d=vec_swsqrt_nochk(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

For vec_swsqrt_nochk, the compiler does not check the validity of the arguments. You must ensure that the following condition is satisfied where x represents each element of a:
• 2^-969 <= x < Infinity

Result value
The result value is a quad vector that contains the square root of each element of a.

When the following options are used, the result is bitwise identical to the IEEE square root.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE square root.

Formula
d[0] = √a[0]
d[1] = √a[1]
d[2] = √a[2]
d[3] = √a[3]

Example
a = ( 4.0, 9.0, 16.0, 25.0)
d: ( 2.0, 3.0, 4.0, 5.0)

vec_swsqrts, vec_swsqrts_nochk

Purpose
Returns a vector containing estimates of the square roots of the corresponding elements of the given vector.

Syntax
d=vec_swsqrts(a)
d=vec_swsqrts_nochk(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

For vec_swsqrts_nochk, the compiler does not check the validity of the arguments. You must ensure that the following condition is satisfied where x represents each element of a:
• 2^-102 <= x < Infinity

Result value
The double-precision elements of a are first truncated to single-precision values. The square root of each single-precision element of a is then converted to double precision and saved in the corresponding element of the result.

When the following options are used, the result is bitwise identical to the IEEE square root.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE square root.

Formula
d[0] = (double) √((float) a[0])
d[1] = (double) √((float) a[1])
d[2] = (double) √((float) a[2])
d[3] = (double) √((float) a[3])

Example
a = ( 4.0, 9.0, 16.0, 25.0)
d: ( 2.0, 3.0, 4.0, 5.0)

Binary arithmetic functions

This section provides a reference to the quad vector binary arithmetic functions.

vec_add

Purpose
Returns a vector containing the sums of each set of corresponding elements of the given vectors.

Syntax
d=vec_add(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.
d              a              b
vector4double  vector4double  vector4double

Result value
The value of each element of the result is the sum of the corresponding elements of a and b.

Formula
d[0] = a[0] + b[0]
d[1] = a[1] + b[1]
d[2] = a[2] + b[2]
d[3] = a[3] + b[3]

Example
a = (10.0, 20.0, 30.0, 40.0)
b = (50.0, 60.0, 70.0, 80.0)
d: (60.0, 80.0, 100.0, 120.0)

vec_cpsgn

Purpose
Returns a vector by copying the sign of the elements in vector a to the sign of the corresponding elements in vector b.

Syntax
d=vec_cpsgn(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

Result value
The values of the elements of the result are obtained by copying the sign of the elements in a to the sign of the corresponding elements in b.

Formula
d[0] = (double) { sign(a[0]), mantissa(b[0]), exponent(b[0]) }
d[1] = (double) { sign(a[1]), mantissa(b[1]), exponent(b[1]) }
d[2] = (double) { sign(a[2]), mantissa(b[2]), exponent(b[2]) }
d[3] = (double) { sign(a[3]), mantissa(b[3]), exponent(b[3]) }

Example
a = ( -1.0, 2.0, -3.0, 4.0)
b = ( 1.5e10, 2.5e15, 3.5e20, 4.5e25)
d: (-1.5e10, 2.5e15, -3.5e20, 4.5e25)

vec_mul

Purpose
Returns a vector containing the results of performing a multiply operation using the given vectors.

Syntax
d=vec_mul(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

Result value
The values of the elements of the result are obtained by multiplying the elements of a and the corresponding elements of b.
Formula
d[0] = a[0] × b[0]
d[1] = a[1] × b[1]
d[2] = a[2] × b[2]
d[3] = a[3] × b[3]

Example
a = (10.0, 20.0, 30.0, 40.0)
b = (50.0, 60.0, 70.0, 80.0)
d: (500.0, 1200.0, 2100.0, 3200.0)

vec_sub

Purpose
Returns a vector containing the result of subtracting each element of b from the corresponding element of a.

Syntax
d=vec_sub(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of subtracting the value of the corresponding element of b from the value of the corresponding element of a.

Formula
d[0] = a[0] - b[0]
d[1] = a[1] - b[1]
d[2] = a[2] - b[2]
d[3] = a[3] - b[3]

Example
a = (50.0, 60.0, 70.0, 80.0)
b = (10.0, 20.0, 30.0, 40.0)
d: (40.0, 40.0, 40.0, 40.0)

vec_swdiv, vec_swdiv_nochk

Purpose
Returns a vector containing the result of dividing each element of a by the corresponding element of b.

Syntax
d=vec_swdiv(a, b)
d=vec_swdiv_nochk(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

For vec_swdiv_nochk, the compiler does not check the validity of the arguments. You must ensure that the following conditions are satisfied where x represents each element of a and y represents the corresponding element of b:
• 2^-1021 ≤ |y| ≤ 2^1020
• If x ≠ 0.0:
    2^-969 ≤ |x| < Infinity
    2^-1020 ≤ |x / y| ≤ 2^1022

Result value
The values of the elements of the result are obtained by dividing the elements of a by the corresponding elements of b.

When the following options are used, the result is bitwise identical to the IEEE division.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE division.
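Because vec_swdiv_nochk leaves argument validation to the caller, it can help to test the stated conditions in scalar code first. The following is a minimal sketch in portable C, not part of the XL C/C++ built-in set: the names abs_d and swdiv_nochk_args_ok are hypothetical, and the helper checks a single (x, y) pair, so a caller would apply it to each of the four element pairs of a and b.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper, not an XL C/C++ built-in: absolute value
   computed without the math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Returns true if one element pair (x from a, y from b) satisfies the
   conditions listed for vec_swdiv_nochk. NaN inputs fail the range
   comparisons and are therefore rejected. */
bool swdiv_nochk_args_ok(double x, double y)
{
    double ay = abs_d(y);

    /* 2^-1021 <= |y| <= 2^1020 */
    if (!(ay >= 0x1p-1021 && ay <= 0x1p1020))
        return false;

    if (x != 0.0) {
        double ax = abs_d(x);
        double aq = abs_d(x / y);

        /* 2^-969 <= |x| < Infinity */
        if (!(ax >= 0x1p-969 && ax <= DBL_MAX))
            return false;

        /* 2^-1020 <= |x / y| <= 2^1022 */
        if (!(aq >= 0x1p-1020 && aq <= 0x1p1022))
            return false;
    }
    return true;
}
```

The bounds are written as C99 hexadecimal floating-point constants so that each power of two is exact. A quotient that overflows to infinity fails the last test, which is the conservative outcome for this sketch.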
Formula
d[0] = a[0] / b[0]
d[1] = a[1] / b[1]
d[2] = a[2] / b[2]
d[3] = a[3] / b[3]

Example
a = (50.0, 1.0, 30.0, 40.0)
b = (10.0, 5.0, -1.0, 80.0)
d: ( 5.0, 0.2, -30.0, 0.5)

vec_swdivs, vec_swdivs_nochk

Purpose
Returns a vector containing the result of dividing each element of a by the corresponding element of b.

Syntax
d=vec_swdivs(a, b)
d=vec_swdivs_nochk(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

For vec_swdivs_nochk, the compiler does not check the validity of the arguments. You must ensure that the following conditions are satisfied where x represents each element of a and y represents the corresponding element of b:
• 2^-125 ≤ |y| ≤ 2^124
• If x ≠ 0:
    2^-102 ≤ |x| < Infinity
    2^-124 ≤ |x / y| ≤ 2^126

Result value
The double-precision elements of a and b are first truncated to single-precision values. The result of dividing the single-precision elements of a by the corresponding single-precision elements of b is then converted to double precision and saved in the corresponding elements of the result.

When the following options are used, the result is bitwise identical to the IEEE division.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE division.

Formula
d[0] = (double) ( (float) a[0] / (float) b[0] )
d[1] = (double) ( (float) a[1] / (float) b[1] )
d[2] = (double) ( (float) a[2] / (float) b[2] )
d[3] = (double) ( (float) a[3] / (float) b[3] )

Example
a = (50.0, 1.0, 30.0, 40.0)
b = (10.0, 5.0, -1.0, 80.0)
d: ( 5.0, 0.2, -30.0, 0.5)

vec_xmul

Purpose
Returns a vector containing the result of cross multiplying the first and the third elements of a by the elements of b.
Syntax
d=vec_xmul(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

Result value
The values of the elements of the result are obtained by cross multiplying the first and the third elements of a by the elements of b.

Formula
d[0] = a[0] × b[0]
d[1] = a[0] × b[1]
d[2] = a[2] × b[2]
d[3] = a[2] × b[3]

Example
a = (10.0, 0.0, 30.0, 0.0)
b = (50.0, 60.0, 70.0, 80.0)
d: (500.0, 600.0, 2100.0, 2400.0)

Multiply-add functions

This section provides a reference to the quad vector multiply-add functions.

vec_madd

Purpose
Returns a vector containing the results of performing a fused multiply-add operation for each corresponding set of elements of the given vectors.

Syntax
d=vec_madd(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The value of each element of the result is the product of the values of the corresponding elements of a and b, added to the value of the corresponding element of c.

Formula
d[0] = ( a[0] × b[0] ) + c[0]
d[1] = ( a[1] × b[1] ) + c[1]
d[2] = ( a[2] × b[2] ) + c[2]
d[3] = ( a[3] × b[3] ) + c[3]

Example
a = (10.0, 10.0, 10.0, 10.0)
b = ( 1.0, 2.0, 3.0, 4.0)
c = (20.0, 20.0, 20.0, 20.0)
d: (30.0, 40.0, 50.0, 60.0)

vec_msub

Purpose
Returns a vector containing the results of performing a multiply-subtract operation using the given vectors.

Syntax
d=vec_msub(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The values of the elements of the result are the product of the values of the corresponding elements of a and b, minus the values of the corresponding elements of c.
Formula
d[0] = ( a[0] × b[0] ) - c[0]
d[1] = ( a[1] × b[1] ) - c[1]
d[2] = ( a[2] × b[2] ) - c[2]
d[3] = ( a[3] × b[3] ) - c[3]

Example
a = ( 10.0, 10.0, 10.0, 10.0)
b = ( 1.0, 2.0, 3.0, 4.0)
c = ( 20.0, 20.0, 20.0, 20.0)
d: (-10.0, 0.0, 10.0, 20.0)

vec_nmadd

Purpose
Returns a vector containing the results of performing a negative multiply-add operation on the given vectors.

Syntax
d=vec_nmadd(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The value of each element of the result is the product of the corresponding elements of a and b, added to the corresponding elements of c, and then multiplied by -1.0.

Formula
d[0] = - ( ( a[0] × b[0] ) + c[0] )
d[1] = - ( ( a[1] × b[1] ) + c[1] )
d[2] = - ( ( a[2] × b[2] ) + c[2] )
d[3] = - ( ( a[3] × b[3] ) + c[3] )

Example
a = ( 10.0, 10.0, 10.0, 10.0)
b = ( 1.0, 2.0, 3.0, 4.0)
c = ( 20.0, 20.0, 20.0, 20.0)
d: (-30.0, -40.0, -50.0, -60.0)

vec_nmsub

Purpose
Returns a vector containing the results of performing a negative multiply-subtract operation on the given vectors.

Syntax
d=vec_nmsub(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The value of each element of the result is the product of the corresponding elements of a and b, subtracted from the corresponding element of c.
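The four multiply-add variants described so far (vec_madd, vec_msub, vec_nmadd, vec_nmsub) differ only in where the sign is applied. A scalar sketch in portable C may make the relationship easier to see. The names madd1, msub1, nmadd1, nmsub1, and apply4 are hypothetical stand-ins rather than built-ins, and the arithmetic is written as a separate multiply and add, so rounding can differ from a fused operation.

```c
/* Hypothetical one-element models, not XL C/C++ built-ins.
   Written as a separate multiply and add; a fused operation rounds
   once instead of twice, so low-order bits may differ. */
double madd1(double a, double b, double c)  { return  (a * b) + c; }
double msub1(double a, double b, double c)  { return  (a * b) - c; }
double nmadd1(double a, double b, double c) { return -((a * b) + c); }
/* nmsub: the product a*b subtracted from c. */
double nmsub1(double a, double b, double c) { return -((a * b) - c); }

/* Applies a one-element model to each of the four elements. */
void apply4(double (*op)(double, double, double),
            const double a[4], const double b[4], const double c[4],
            double d[4])
{
    for (int i = 0; i < 4; ++i)
        d[i] = op(a[i], b[i], c[i]);
}
```

With a = (10.0, 10.0, 10.0, 10.0), b = (1.0, 2.0, 3.0, 4.0), and c = (20.0, 20.0, 20.0, 20.0), calling apply4 with each model yields values that compare equal to the Example results given for the corresponding function.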
Formula
d[0] = - ( ( a[0] × b[0] ) - c[0] )
d[1] = - ( ( a[1] × b[1] ) - c[1] )
d[2] = - ( ( a[2] × b[2] ) - c[2] )
d[3] = - ( ( a[3] × b[3] ) - c[3] )

Example
a = (10.0, 10.0, 10.0, 10.0)
b = ( 1.0, 2.0, 3.0, 4.0)
c = (20.0, 20.0, 20.0, 20.0)
d: (10.0, 0.0, -10.0, -20.0)

vec_xmadd

Purpose
Returns a vector containing the results of performing a fused cross multiply-add operation for each corresponding set of elements of the given vectors.

Syntax
d=vec_xmadd(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The values of the elements of the result are the product of the values of the first and the third elements of a and the elements of b, added to the values of the corresponding elements of c.

Formula
d[0] = ( a[0] × b[0] ) + c[0]
d[1] = ( a[0] × b[1] ) + c[1]
d[2] = ( a[2] × b[2] ) + c[2]
d[3] = ( a[2] × b[3] ) + c[3]

Example
a = ( 1.0, 0.0, 3.0, 0.0)
b = ( 5.0, 10.0, 15.0, 20.0)
c = (10.0, 10.0, 10.0, 10.0)
d: (15.0, 20.0, 55.0, 70.0)

vec_xxmadd

Purpose
Returns a vector containing the results of performing a fused double cross multiply-add operation for each corresponding set of elements of the given vectors.

Syntax
d=vec_xxmadd(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The values of the elements of the result are specified in the formula.
Formula
d[0] = ( a[1] × b[1] ) + c[0]
d[1] = ( a[0] × b[1] ) + c[1]
d[2] = ( a[3] × b[3] ) + c[2]
d[3] = ( a[2] × b[3] ) + c[3]

Example
a = ( 1.0, 2.0, 3.0, 4.0)
b = ( 0.0, 10.0, 0.0, 20.0)
c = ( 10.0, 10.0, 10.0, 10.0)
d: ( 30.0, 20.0, 90.0, 70.0)

vec_xxcpnmadd

Purpose
Returns a vector containing the results of performing a fused double cross conjugate multiply/add for each corresponding set of elements of the given vectors.

Syntax
d=vec_xxcpnmadd(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The values of the elements of the result are specified in the formula.

Formula
d[0] =   ( ( a[1] × b[1] ) + c[0] )
d[1] = - ( ( a[0] × b[1] ) + c[1] )
d[2] =   ( ( a[3] × b[3] ) + c[2] )
d[3] = - ( ( a[2] × b[3] ) + c[3] )

Example
a = ( 1.0, 2.0, 3.0, 4.0)
b = ( 0.0, 10.0, 0.0, 20.0)
c = ( 10.0, 10.0, 10.0, 10.0)
d: ( 30.0, -20.0, 90.0, -70.0)

vec_xxnpmadd

Purpose
Returns a vector containing the results of performing a fused double cross complex multiply-add operation for each corresponding set of elements of the given vectors.

Syntax
d=vec_xxnpmadd(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b              c
vector4double  vector4double  vector4double  vector4double

Result value
The values of the elements of the result are specified in the formula.

Formula
d[0] = - ( ( a[1] × b[1] ) + c[0] )
d[1] =   ( ( a[0] × b[1] ) + c[1] )
d[2] = - ( ( a[3] × b[3] ) + c[2] )
d[3] =   ( ( a[2] × b[3] ) + c[3] )

Example
a = ( 1.0, 2.0, 3.0, 4.0)
b = ( 0.0, 10.0, 0.0, 20.0)
c = ( 10.0, 10.0, 10.0, 10.0)
d: (-30.0, 20.0, -90.0, 70.0)

Round functions

With the round functions, you can round the elements of quad vectors.
vec_ceil

Purpose
Returns a vector containing the smallest representable floating-point integral values greater than or equal to the values of the corresponding elements of the given vector.

Syntax
d=vec_ceil(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of the result contains the smallest representable floating-point integral value greater than or equal to the value of the corresponding element of a.

Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-5.0, -2.0, 3.0, 6.0)

vec_floor

Purpose
Returns a vector containing the largest representable floating-point integral values less than or equal to the values of the corresponding elements of the given vector.

Syntax
d=vec_floor(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of the result contains the largest representable floating-point integral value less than or equal to the value of the corresponding element of a.

Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-6.0, -3.0, 2.0, 5.0)

vec_round

Purpose
Returns a vector containing the rounded values of the corresponding elements of the given vector.

Syntax
d=vec_round(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of the result contains the value of the corresponding element of a, rounded to the nearest representable floating-point integer.

Formula
For each element of a:
If a[n] < 0, d[n] = (a[n] - 0.5), truncated to the nearest integral value.
If a[n] > 0, d[n] = (a[n] + 0.5), truncated to the nearest integral value.
If a[n] EQ 0, d[n] = 0.

Note: EQ is the equal operator.
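The vec_round formula can be traced by hand with a small scalar model. The following sketch is portable C rather than the built-in itself; the name round1 is hypothetical, and the truncation step is done with a cast to long long, which assumes the shifted value fits in that type. NaN inputs are outside the scope of the sketch.

```c
/* Hypothetical one-element model of the vec_round formula, not an
   XL C/C++ built-in. Converting to long long truncates toward zero;
   the sketch assumes the shifted value fits in a long long. */
double round1(double a)
{
    if (a < 0.0)
        return (double)(long long)(a - 0.5);
    if (a > 0.0)
        return (double)(long long)(a + 0.5);
    return 0.0; /* a EQ 0; NaN is not handled by this sketch */
}
```

For instance, round1(-2.3) first forms -2.8 and then truncates it toward zero, giving -2.0, while round1(5.8) forms 6.3 and truncates it to 6.0.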
Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-6.0, -2.0, 2.0, 6.0)

vec_rsp

Purpose
Returns a vector containing the single-precision values of the corresponding elements of the given vector.

Syntax
d=vec_rsp(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
The value of each element of the result contains the single-precision value of the corresponding element of a.

Formula
d[0] = (double) ( (float) a[0] )
d[1] = (double) ( (float) a[1] )
d[2] = (double) ( (float) a[2] )
d[3] = (double) ( (float) a[3] )

vec_trunc

Purpose
Returns a vector containing the truncated values of the corresponding elements of the given vector.

Syntax
d=vec_trunc(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of the result contains the value of the corresponding element of a, truncated to an integral value.

Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-5.0, -2.0, 2.0, 5.0)

Conversion functions

With the conversion functions, you can convert quad vectors to integer vectors.

vec_cfid

Purpose
Returns a vector in which each element is the floating-point equivalent of the 64-bit signed integer in the corresponding element of a, rounded to double precision using the rounding mode specified by FPSCR[RN].

Syntax
d=vec_cfid(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
The value of each element of the result is the floating-point representation of the 64-bit signed integer in the corresponding element of a, rounded to double precision using the rounding mode specified by FPSCR[RN].
Example
FPSCR[RN] = DFP_ROUND_TO_NEAREST_WITH_TIES_TO_EVEN
a = ( 1, -1, 2, -2)
d: ( 1.0, -1.0, 2.0, -2.0)

Related functions
• FPSCR functions

vec_cfidu

Purpose
Returns a vector in which each element is the floating-point equivalent of the 64-bit unsigned integer in the corresponding element of a, rounded to double precision using the rounding mode specified by FPSCR[RN].

Syntax
d=vec_cfidu(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
The value of each element of the result is the floating-point representation of the 64-bit unsigned integer in the corresponding element of a, rounded to double precision using the rounding mode specified by FPSCR[RN].

Example
FPSCR[RN] = DFP_ROUND_TO_NEAREST_WITH_TIES_TO_EVEN
a = ( 1, 2, 3, 4)
d: ( 1.0, 2.0, 3.0, 4.0)

Related functions
• FPSCR functions

vec_ctid

Purpose
Converts a quad vector to 64-bit signed integer values.

Syntax
d=vec_ctid(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded to a floating-point integral value according to FPSCR[RN]. The corresponding element of the result vector is then set to one of the following values:
• If the rounded value is greater than 2^63-1, the result is the maximal long integer (0x7FFF FFFF FFFF FFFF).
• If the rounded value is less than -2^63, the result is the minimal long integer (0x8000 0000 0000 0000).
• Otherwise, the result is the 64-bit signed integer value equivalent to the rounded value.

Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, -2.9, 9.0e20, -5.0e25)
d: ( 2, -2, 0x7FFF FFFF FFFF FFFF, 0x8000 0000 0000 0000)

Related functions
• FPSCR functions

vec_ctidu

Purpose
Converts a quad vector to 64-bit unsigned integer values.
Syntax
d=vec_ctidu(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded to a floating-point integral value according to FPSCR[RN]. The corresponding element of the result vector is then set to one of the following values:
• If the rounded value is greater than 2^64-1, the result is the maximal unsigned long integer (0xFFFF FFFF FFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000 0000 0000).
• Otherwise, the result is the 64-bit unsigned integer value equivalent to the rounded value.

Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, 1.9, 9.0e22, -5.0e25)
d: ( 2, 2, 0xFFFF FFFF FFFF FFFF, 0)

Related functions
• FPSCR functions

vec_ctidz

Purpose
Converts a quad vector to 64-bit signed integer values with rounding toward zero.

Syntax
d=vec_ctidz(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded toward zero to a floating-point integral value. The corresponding element of the result vector is then set to one of the following values:
• If the rounded value is greater than 2^63-1, the result is the maximal long integer (0x7FFF FFFF FFFF FFFF).
• If the rounded value is less than -2^63, the result is the minimal long integer (0x8000 0000 0000 0000).
• Otherwise, the result is the 64-bit signed integer value equivalent to the rounded value.

Example
a = (1.6, -1.9, 9.0e20, -5.0e25)
d: ( 1, -1, 0x7FFF FFFF FFFF FFFF, 0x8000 0000 0000 0000)

vec_ctiduz

Purpose
Converts a quad vector to 64-bit unsigned integer values with rounding toward zero.

Syntax
d=vec_ctiduz(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.
d              a
vector4double  vector4double

Result value
Each element of a is rounded toward zero to a floating-point integral value. The corresponding element of the result vector is then set to one of the following values:
• If the rounded value is greater than 2^64-1, the result is the maximal unsigned long integer (0xFFFF FFFF FFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000 0000 0000).
• Otherwise, the result is the 64-bit unsigned integer value equivalent to the rounded value.

Example
a = (1.6, -8.8, 9.0e22, -5.0e25)
d: ( 1, 0, 0xFFFF FFFF FFFF FFFF, 0)

vec_ctiw

Purpose
Converts a quad vector to 32-bit signed integer values.

Syntax
d=vec_ctiw(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded to a floating-point integral value according to FPSCR[RN]. The four low-order bytes of the corresponding element of the result vector then contain one of the following values:
• If the rounded value is greater than 2^31-1, the result is the maximal integer (0x7FFF FFFF).
• If the rounded value is less than -2^31, the result is the minimal integer (0x8000 0000).
• Otherwise, the result is the 32-bit signed integer value equivalent to the rounded value.

Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, -2.9, 9.0e11, -5.0e12)
d: ( 2, -2, 0x7FFF FFFF, 0x8000 0000)

Related functions
• FPSCR functions

vec_ctiwu

Purpose
Converts a quad vector to 32-bit unsigned integer values.

Syntax
d=vec_ctiwu(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded to a floating-point integral value according to FPSCR[RN].
The four low-order bytes of the corresponding element of the result vector then contain one of the following values:
• If the rounded value is greater than 2^32-1, the result is the maximal unsigned integer (0xFFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000).
• Otherwise, the result is the 32-bit unsigned integer value equivalent to the rounded value.

Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, 1.9, 9.0e11, -5.0e12)
d: ( 2, 2, 0xFFFF FFFF, 0)

Related functions
• FPSCR functions

vec_ctiwz

Purpose
Converts a quad vector to 32-bit signed integer values with rounding toward zero.

Syntax
d=vec_ctiwz(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded toward zero to a floating-point integral value. The four low-order bytes of the corresponding element of the result vector then contain one of the following values:
• If the rounded value is greater than 2^31-1, the result is the maximal integer (0x7FFF FFFF).
• If the rounded value is less than -2^31, the result is the minimal integer (0x8000 0000).
• Otherwise, the result is the 32-bit signed integer value equivalent to the rounded value.

Example
a = (1.6, -1.9, 9.0e11, -5.0e12)
d: ( 1, -1, 0x7FFF FFFF, 0x8000 0000)

vec_ctiwuz

Purpose
Converts a quad vector to 32-bit unsigned integer values with rounding toward zero.

Syntax
d=vec_ctiwuz(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a
vector4double  vector4double

Result value
Each element of a is rounded toward zero to a floating-point integral value. The four low-order bytes of the corresponding element of the result vector then contain one of the following values:
• If the rounded value is greater than 2^32-1, the result is the maximal unsigned integer (0xFFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000).
• Otherwise, the result is the 32-bit unsigned integer value equivalent to the rounded value.

Example
a = (1.6, -1.9, 9.0e11, -5.0e12)
d: ( 1, 0, 0xFFFF FFFF, 0)

Comparison functions

With the comparison functions, you can compare quad vectors. In the result values, floating-point boolean values are as follows:
• True is 1.0.
• False is -1.0.

vec_cmpgt

Purpose
Returns a vector containing the results of a greater-than comparison between each set of corresponding elements of the given vectors.

Syntax
d=vec_cmpgt(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

Result value
The value of each element of the result is 1.0 if the corresponding element of a is greater than the corresponding element of b. Otherwise, the value is -1.0.

Formula
If (a[0] > b[0]) Then d[0] = 1.0 Else d[0] = -1.0
If (a[1] > b[1]) Then d[1] = 1.0 Else d[1] = -1.0
If (a[2] > b[2]) Then d[2] = 1.0 Else d[2] = -1.0
If (a[3] > b[3]) Then d[3] = 1.0 Else d[3] = -1.0

Example
a = (10.0, 20.0, 30.0, -40.0)
b = (20.0, -10.0, 10.0, 80.0)
d: (-1.0, 1.0, 1.0, -1.0)

vec_cmplt

Purpose
Returns a vector containing the results of a less-than comparison between each set of corresponding elements of the given vectors.

Syntax
d=vec_cmplt(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

d              a              b
vector4double  vector4double  vector4double

Result value
The value of each element of the result is 1.0 if the corresponding element of a is less than the corresponding element of b. Otherwise, the value is -1.0.

Formula
If (a[0] < b[0]) Then d[0] = 1.0 Else d[0] = -1.0
If (a[1] < b[1]) Then d[1] = 1.0 Else d[1] = -1.0
If (a[2] < b[2]) Then d[2] = 1.0 Else d[2] = -1.0
If (a[3] < b[3]) Then d[3] = 1.0 Else d[3] = -1.0
Vector built-in functions 85 Example a = (20.0, -10.0, 10.0, 80.0) b = (10.0, 20.0, 30.0, -40.0) d: (-1.0, 1.0, 1.0, -1.0) vec_cmpeq Purpose Returns a vector containing the results of comparing each set of corresponding elements of the given vectors for equality. Syntax d=vec_cmpeq(a, b) Result and argument types The following table describes the types of the returned value and the function arguments. d a b vector4double vector4double vector4double Result value The value of each element of the result is 1.0 if the corresponding element of a is equal to the corresponding element of b. Otherwise, the value is -1.0. Formula If If If If (a[0] (a[1] (a[2] (a[3] EQ EQ EQ EQ b[0]) b[1]) b[2]) b[3]) Then Then Then Then d[0] d[1] d[2] d[3] = = = = 1.0 1.0 1.0 1.0 Else Else Else Else d[0] d[1] d[2] d[3] = = = = -1.0 -1.0 -1.0 -1.0 Note: EQ is the equal operator. Example a = (10.0, -10.0, -10.0, 80.0) b = (10.0, 20.0, -10.0, -40.0) d: ( 1.0, -1.0, 1.0, -1.0) vec_sel Purpose Returns a vector containing the value of either a or b depending on the value of c. Syntax d=vec_sel(a, b, c) Result and argument types The following table describes the types of the returned value and the function arguments. 86 Blue Gene/Q vector data type for C/C++ d a b c vector4double vector4double vector4double vector4double Result value The value of each element of the result is equal to the corresponding element of b if the corresponding element of c is greater than or equal to zero (regardless of sign), or the value is equal to the corresponding element of a if the corresponding element of c is less than zero or NaN. 
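The selection rule above combines naturally with the 1.0 / -1.0 boolean convention of the comparison functions. The following is a minimal portable C sketch of that behavior, not the built-in functions themselves: the quad struct and the emu_ names are hypothetical stand-ins introduced only for illustration, and the code assumes that ordinary C comparisons model the element-wise semantics described in this section.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles in plain C. */
typedef struct { double e[4]; } quad;

/* Element-wise greater-than: 1.0 means true, -1.0 means false. */
quad emu_cmpgt(quad a, quad b)
{
    quad d;
    for (int i = 0; i < 4; i++)
        d.e[i] = (a.e[i] > b.e[i]) ? 1.0 : -1.0;
    return d;
}

/* Element-wise select: take b where c >= 0, otherwise take a.
   A NaN in c fails the >= test, so it selects a. */
quad emu_sel(quad a, quad b, quad c)
{
    quad d;
    for (int i = 0; i < 4; i++)
        d.e[i] = (c.e[i] >= 0.0) ? b.e[i] : a.e[i];
    return d;
}

/* Element-wise maximum built from the two helpers above:
   where a > b the mask is 1.0, so the second data operand (a) is taken. */
quad emu_max(quad a, quad b)
{
    return emu_sel(b, a, emu_cmpgt(a, b));
}

/* Reproduces the vec_sel example from this section. */
void emu_check_sel_example(void)
{
    quad a = {{20.0, 20.0, 20.0, 20.0}};
    quad b = {{10.0, 10.0, 10.0, 10.0}};
    quad c = {{ 1.0, -1.0,  2.5, -2.5}};
    quad d = emu_sel(a, b, c);
    assert(d.e[0] == 10.0 && d.e[1] == 20.0);
    assert(d.e[2] == 10.0 && d.e[3] == 20.0);
}
```

With the real built-in functions, the same compare-then-select pattern would presumably be written as vec_sel(b, a, vec_cmpgt(a, b)).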
Formula

    If (c[0] ≥ 0) Then d[0] = b[0] Else d[0] = a[0]
    If (c[1] ≥ 0) Then d[1] = b[1] Else d[1] = a[1]
    If (c[2] ≥ 0) Then d[2] = b[2] Else d[2] = a[2]
    If (c[3] ≥ 0) Then d[3] = b[3] Else d[3] = a[3]

Example

    a = (20.0, 20.0, 20.0, 20.0)
    b = (10.0, 10.0, 10.0, 10.0)
    c = ( 1.0, -1.0,  2.5, -2.5)
    d: (10.0, 20.0, 10.0, 20.0)

vec_tstnan

Purpose
Returns a vector whose elements depend on whether the value of the corresponding element of a or b is NaN.

Syntax
d=vec_tstnan(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is 1.0 if the corresponding element of a or b is a NaN. Otherwise, the value is -1.0.

Formula

    If ((a[0] EQ NaN) or (b[0] EQ NaN)) Then d[0] = 1.0 Else d[0] = -1.0
    If ((a[1] EQ NaN) or (b[1] EQ NaN)) Then d[1] = 1.0 Else d[1] = -1.0
    If ((a[2] EQ NaN) or (b[2] EQ NaN)) Then d[2] = 1.0 Else d[2] = -1.0
    If ((a[3] EQ NaN) or (b[3] EQ NaN)) Then d[3] = 1.0 Else d[3] = -1.0

Note: EQ is the equal operator.

Example

    a = (10.0, 20.0,  NaN, 40.0)
    b = (50.0,  NaN, 70.0, 80.0)
    d: (-1.0,  1.0,  1.0, -1.0)

Element manipulation functions

With the element manipulation functions, you can manipulate vectors at the element level. For example, you can permute elements.

vec_extract

Purpose
Returns the value of element b from the vector a.

Syntax
d=vec_extract(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d       a              b
    double  vector4double  int

Result value
This function uses modulo arithmetic on b to determine the element number. For example, if b is out of range, the compiler uses b modulo the number of elements in the vector to determine the element position.

Formula

    d = a[b MOD 4]

Note: MOD is the modulo operator.
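The modulo indexing in the formula above can be sketched in portable C. This is an illustration only: the quad struct and the emu_extract name are hypothetical, and the treatment of negative indexes (wrapping to a non-negative remainder) is an assumption that the text does not confirm.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles in plain C. */
typedef struct { double e[4]; } quad;

/* Reduce any int index to 0..3.
   Assumption: MOD yields a non-negative remainder, so -1 maps to 3.
   C's % operator would give -1 here, hence the bit mask instead. */
static unsigned wrap_index(int i)
{
    return (unsigned)i & 3u;
}

/* d = a[b MOD 4] */
double emu_extract(quad a, int b)
{
    return a.e[wrap_index(b)];
}

/* Checks an in-range index and an out-of-range index that wraps. */
void emu_check_extract(void)
{
    quad a = {{10.0, 20.0, 30.0, 40.0}};
    assert(emu_extract(a, 1) == 20.0);  /* in range */
    assert(emu_extract(a, 5) == 20.0);  /* 5 MOD 4 = 1 */
}
```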
Example

    a = (10.0, 20.0, 30.0, 40.0)
    b = 1
    d: 20.0

vec_insert

Purpose
Returns a copy of the vector b with the value of its element c replaced by a.

Syntax
d=vec_insert(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a       b              c
    vector4double  double  vector4double  int

Result value
This function uses modulo arithmetic on c to determine the element number. For example, if c is out of range, the compiler uses c modulo the number of elements in the vector to determine the element position.

Formula

    If ((c MOD 4) EQ 0) Then d[0] = a Else d[0] = b[0]
    If ((c MOD 4) EQ 1) Then d[1] = a Else d[1] = b[1]
    If ((c MOD 4) EQ 2) Then d[2] = a Else d[2] = b[2]
    If ((c MOD 4) EQ 3) Then d[3] = a Else d[3] = b[3]

Notes:
• MOD is the modulo operator.
• EQ is the equal operator.

Example

    a = 50.0
    b = (10.0, 20.0, 30.0, 40.0)
    c = 1
    d: (10.0, 50.0, 30.0, 40.0)

vec_gpci

Purpose
Returns a vector containing the results of dispersing the 12-bit literal a to be used as the control value for a permute instruction.

Syntax
d=vec_gpci(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a
    vector4double  int, a value in 0x000 - 0xFFF

Result value
The value of each element of the result has a sign bit set to 0, an exponent set to 0x400, and a mantissa where bits 0:2 are taken from the 12-bit literal a as shown in the formula.

Formula

    d[0] = (double) {sign = 0, mantissa_{0:2} = a_{0:2},  exponent = 0x400}
    d[1] = (double) {sign = 0, mantissa_{0:2} = a_{3:5},  exponent = 0x400}
    d[2] = (double) {sign = 0, mantissa_{0:2} = a_{6:8},  exponent = 0x400}
    d[3] = (double) {sign = 0, mantissa_{0:2} = a_{9:11}, exponent = 0x400}

Example
Shifting the elements of a given vector to the left by one step and rotating around requires the pattern 1–2–3–0.
It can be obtained by the following code:

    pattern = vec_gpci(0x298);
    v = vec_perm(v,v,pattern);

Fortran:

    pattern = vec_gpci(Z’298’)
    v = vec_perm(v,v,pattern)

With the pattern 1–2–3–0, the vector (0.0, 1.0, 2.0, 3.0) becomes (1.0, 2.0, 3.0, 0.0).

vec_lvsl

Purpose
Returns a vector useful for aligning non-aligned data.

Syntax
d=vec_lvsl(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a     b
    vector4double  long  double*
                         _Complex double*
                         float*
                         _Complex float*

Result value
The result value is a quad vector. The elements of the quad vector are generated in the following ways:
• Sign: 0
• Mantissa:
  1. For the first element, the mantissa is the result of the following operations:
     – If b is a pointer to a double-precision floating-point value or complex value:
       a. Add a and b.
       b. Mask the result of the previous step with 0b11000.
       c. Take the integer value of bits 58 - 60 from the result of the previous step.
     – If b is a pointer to a single-precision floating-point value or complex value:
       a. Add a and b.
       b. Multiply the result of the previous step by two.
       c. Mask the result of the previous step with 0b11000.
       d. Take the integer value of bits 58 - 60 from the result of the previous step.
  2. The mantissa is incremented by one for each subsequent element. The mantissa is seen as a 3-bit value for the increment operation. That is, incrementing 0b111 produces 0b000.
• Exponent: 0x400

You can use the result as an argument of the vec_perm function.
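The mantissa steps listed for vec_lvsl can be traced with ordinary integer arithmetic. The sketch below is a hedged illustration in portable C, not the built-in function: the function names are hypothetical, the argument ea stands for the sum of a and b, and the code assumes that bits 58 - 60 use big-endian numbering of a 64-bit value (bit 63 is the least significant), so that extracting them is a right shift by 3 followed by a 3-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* First-element mantissa for a pointer to double or _Complex double.
   ea is the effective address a + b. */
unsigned lvsl_offset_double(uint64_t ea)
{
    uint64_t aa = ea & 0x18u;            /* mask with 0b11000 */
    return (unsigned)((aa >> 3) & 7u);   /* bits 58:60 */
}

/* First-element mantissa for a pointer to float or _Complex float. */
unsigned lvsl_offset_float(uint64_t ea)
{
    uint64_t aa = (ea * 2u) & 0x18u;     /* double the address, then mask */
    return (unsigned)((aa >> 3) & 7u);   /* bits 58:60 */
}

/* Mantissas of the four result elements: each is one more than the
   previous, wrapping as a 3-bit value (0b111 + 1 = 0b000). */
void lvsl_mantissas(unsigned offset, unsigned m[4])
{
    assert(offset < 8u);
    for (int i = 0; i < 4; i++)
        m[i] = (offset + (unsigned)i) & 7u;
}
```

Under these assumptions, a double array that starts 8 bytes past a 32-byte boundary gives the mantissas (1, 2, 3, 4), which is the vec_perm pattern that selects elements 1 to 4 of two concatenated vectors.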
Formula
The following formula is applicable if b is a pointer to a double-precision floating-point value or complex value:

    EA = a + b
    AA = EA AND 0b11000
    Offset = AA_{58:60}
    d[0] = (double) {sign = 0, mantissa = Offset,                exponent = 0x400}
    d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111,  exponent = 0x400}
    d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111,  exponent = 0x400}
    d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111,  exponent = 0x400}

The following formula is applicable if b is a pointer to a single-precision floating-point value or complex value:

    EA = a + b
    AA = (EA × 2) AND 0b11000
    Offset = AA_{58:60}
    d[0] = (double) {sign = 0, mantissa = Offset,                exponent = 0x400}
    d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111,  exponent = 0x400}
    d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111,  exponent = 0x400}
    d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111,  exponent = 0x400}

Note: AND is the bitwise AND operator.

Example: Loading 8-byte aligned vectors

    // my_array is an array of the double type
    vector4double v, v1, v2, vp;
    v1 = vec_ld(0,my_array);    // Load the left part of the vector
    v2 = vec_ld(32,my_array);   // Load the right part of the vector
    vp = vec_lvsl(0,my_array);  // Generate control value
    v = vec_perm(v1,v2,vp);     // Generate the aligned vector

Example: Loading 4-byte aligned vectors

    // my_array is an array of the float type
    vector4double v, v1, v2, vp;
    v1 = vec_ld(0,my_array);    // Load the left part of the vector
    v2 = vec_ld(16,my_array);   // Load the right part of the vector
    vp = vec_lvsl(0,my_array);  // Generate control value
    v = vec_perm(v1,v2,vp);     // Generate the aligned vector

vec_lvsr

Purpose
Returns a vector useful for aligning non-aligned data.

Syntax
d=vec_lvsr(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a     b
    vector4double  long  double*
                         _Complex double*
                         float*
                         _Complex float*

Result value
The result value is a quad vector. The elements of the quad vector are generated in the following ways:
• Sign: 0
• Mantissa:
  1. For the first element, the mantissa is the result of the following operations:
     – If b is a pointer to a double-precision floating-point value or complex value:
       a. Add a and b.
       b. Mask the result of the previous step with 0b11000.
       c. Subtract the result of the previous step from 32.
       d. Take the integer value of bits 58 - 60 from the result of the previous step.
     – If b is a pointer to a single-precision floating-point value or complex value:
       a. Add a and b.
       b. Mask the result of the previous step with 0b1100.
       c. Subtract the result of the previous step from 16.
       d. Take the integer value of bits 59 - 61 from the result of the previous step.
  2. The mantissa is incremented by one for each subsequent element. The mantissa is seen as a 3-bit value for the increment operation. That is, incrementing 0b111 produces 0b000.
• Exponent: 0x400

You can use the result as an argument of the vec_perm function.
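The vec_lvsr steps can be traced in the same way with integer arithmetic. This is a sketch under stated assumptions rather than the built-in function: the function names are hypothetical, ea stands for the sum of a and b, and bit positions are assumed to use big-endian numbering of a 64-bit value, so bits 58 - 60 correspond to a right shift by 3 and bits 59 - 61 to a right shift by 2, each followed by a 3-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* First-element mantissa for a pointer to double or _Complex double.
   ea is the effective address a + b. */
unsigned lvsr_offset_double(uint64_t ea)
{
    uint64_t aa = 32u - (ea & 0x18u);    /* 32 - (EA AND 0b11000) */
    return (unsigned)((aa >> 3) & 7u);   /* bits 58:60 */
}

/* First-element mantissa for a pointer to float or _Complex float. */
unsigned lvsr_offset_float(uint64_t ea)
{
    uint64_t aa = 16u - (ea & 0xCu);     /* 16 - (EA AND 0b1100) */
    return (unsigned)((aa >> 2) & 7u);   /* bits 59:61 */
}

/* Mantissas of the four result elements, wrapping as 3-bit values. */
void lvsr_mantissas(unsigned offset, unsigned m[4])
{
    assert(offset < 8u);
    for (int i = 0; i < 4; i++)
        m[i] = (offset + (unsigned)i) & 7u;
}
```

Under these assumptions the first mantissa ranges from 4 (aligned address) down to 1 (address three elements past the boundary).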
Formula
The following formula is applicable if b is a pointer to a double-precision floating-point value or complex value:

    EA = a + b
    AA = 32 - (EA AND 0b11000)
    Offset = AA_{58:60}
    d[0] = (double) {sign = 0, mantissa = Offset,                exponent = 0x400}
    d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111,  exponent = 0x400}
    d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111,  exponent = 0x400}
    d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111,  exponent = 0x400}

The following formula is applicable if b is a pointer to a single-precision floating-point value or complex value:

    EA = a + b
    AA = 16 - (EA AND 0b1100)
    Offset = AA_{59:61}
    d[0] = (double) {sign = 0, mantissa = Offset,                exponent = 0x400}
    d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111,  exponent = 0x400}
    d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111,  exponent = 0x400}
    d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111,  exponent = 0x400}

Note: AND is the bitwise AND operator.

Example: Storing 8-byte aligned vectors

    void my_vec_store(vector4double v, double *arr)
    {
        vector4double v1, v2, v3, p, m1, m2, m3;

        /* generate insert masks */
        p = vec_lvsr(0,arr);
        m1 = vec_cmplt(p,p);      /* generate vector of all FALSE */
        m2 = vec_neg(m1);         /* generate vector of all TRUE */
        m3 = vec_perm(m1,m2,p);

        /* get existing data */
        v1 = vec_ld(0,arr);
        v2 = vec_ld(0,arr+4);

        /* permute and insert */
        v3 = vec_perm(v,v,p);
        v1 = vec_sel(v1,v3,m3);
        v2 = vec_sel(v3,v2,m3);

        /* store data back */
        vec_st(0,arr,v1);
        vec_st(0,arr+4,v2);
    }

Example: Storing 4-byte aligned vectors

    void my_vec_store(vector4double v, float *arr)
    {
        vector4double v1, v2, v3, p, m1, m2, m3;

        /* generate insert masks */
        p = vec_lvsr(0,arr);
        m1 = vec_cmplt(p,p);      /* generate vector of all FALSE */
        m2 = vec_neg(m1);         /* generate vector of all TRUE */
        m3 = vec_perm(m1,m2,p);

        /* get existing data */
        v1 = vec_ld(0,arr);
        v2 = vec_ld(0,arr+4);

        /* permute and insert */
        v3 = vec_perm(v,v,p);
        v1 = vec_sel(v1,v3,m3);
        v2 = vec_sel(v3,v2,m3);

        /* store data back */
        vec_st(0,arr,v1);
        vec_st(0,arr+4,v2);
    }

vec_perm

Purpose
Returns a vector that contains some elements of two vectors, in the order specified by a third vector.

Syntax
d=vec_perm(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b              c
    vector4double  vector4double  vector4double  vector4double

Result value
The value of each element of the result is the element of the concatenation of a and b that is specified by bits 0:2 of the mantissa of the corresponding element of c. Each element of c must have an exponent equal to 0x400, or the corresponding element of the result is undefined.

Note: The following functions generate control values that can be used for c:
• “vec_gpci” on page 89
• “vec_lvsl” on page 90
• “vec_lvsr” on page 92

Formula

    Concat = ( a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3] )
    d[0] = Concat[Mantissa02(c[0])]
    d[1] = Concat[Mantissa02(c[1])]
    d[2] = Concat[Mantissa02(c[2])]
    d[3] = Concat[Mantissa02(c[3])]

Note: Mantissa02 is a function that returns the integer that is equivalent to bits 0:2 of the mantissa of its argument.

Example
If a = (10.0, 20.0, 30.0, 40.0), b = (50.0, 60.0, 70.0, 80.0), and the mantissas of the elements of c = (2,3,4,5), the result value is (30.0, 40.0, 50.0, 60.0).

vec_promote

Purpose
Returns a vector with a in element position b.

Syntax
d=vec_promote(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a       b
    vector4double  double  int

Result value
The result is a vector with a in element position b. This function uses modulo arithmetic on b to determine the element number. For example, if b is out of range, the compiler uses b modulo the number of elements in the vector to determine the element position.
The other elements of the vector are undefined.

Formula

    d[b MOD 4] = a

Note: MOD is the modulo operator.

Example

    a = 50.0
    b = 1
    d: (X, 50.0, Y, Z)   // X, Y, and Z are undefined values

vec_sldw

Purpose
Returns a vector by concatenating a and b, and then left-shifting the result vector by multiples of 8 bytes. c specifies the offset for the shifting operation.

Syntax
d=vec_sldw(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b              c
    vector4double  vector4double  vector4double  int, a value in 0 - 3

Result value
After left-shifting the concatenated a and b by multiples of 8 bytes specified by c, the function takes the four leftmost 8-byte values and forms the result vector.

Formula

    Concat = ( a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3] )
    d[0] = Concat[c]
    d[1] = Concat[c+1]
    d[2] = Concat[c+2]
    d[3] = Concat[c+3]

Example

    a = (10.0, 20.0, 30.0, 40.0)
    b = (50.0, 60.0, 70.0, 80.0)
    c = 2
    d: (30.0, 40.0, 50.0, 60.0)

vec_splat

Purpose
Returns a vector that has all of its elements set to a given value.

Syntax
d=vec_splat(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  int, a value in 0 - 3

Result value
The value of each element of the result is the value of the element of a specified by b.

Formula

    d[0] = a[b]
    d[1] = a[b]
    d[2] = a[b]
    d[3] = a[b]

Example

    a = (10.0, 20.0, 30.0, 40.0)
    b = 1
    d: (20.0, 20.0, 20.0, 20.0)

vec_splats

Purpose
Returns a vector of which the value of each element is set to a.

Syntax
d=vec_splats(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a
    vector4double  double

Result value
The value of each element of the result is a.
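The broadcast behavior of vec_splats and vec_splat, and the concatenate-and-shift behavior of vec_sldw described earlier in this section, can be emulated in a few lines of portable C. The sketch is illustrative only: the quad struct and the emu_ names are hypothetical stand-ins, and the range checks simply mirror the documented 0 - 3 argument range.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles in plain C. */
typedef struct { double e[4]; } quad;

/* Every element of the result is the scalar a. */
quad emu_splats(double a)
{
    quad d;
    for (int i = 0; i < 4; i++)
        d.e[i] = a;
    return d;
}

/* Every element of the result is element b of a. */
quad emu_splat(quad a, int b)
{
    assert(b >= 0 && b <= 3);
    return emu_splats(a.e[b]);
}

/* Concatenate a and b, then take four elements starting at index c. */
quad emu_sldw(quad a, quad b, int c)
{
    assert(c >= 0 && c <= 3);
    double concat[8];
    for (int i = 0; i < 4; i++) {
        concat[i] = a.e[i];
        concat[i + 4] = b.e[i];
    }
    quad d;
    for (int i = 0; i < 4; i++)
        d.e[i] = concat[c + i];
    return d;
}
```

Applied to the vec_sldw example values, emu_sldw with c = 2 yields (30.0, 40.0, 50.0, 60.0).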
Formula

    d[0] = a
    d[1] = a
    d[2] = a
    d[3] = a

Example

    a = 50.0
    d: (50.0, 50.0, 50.0, 50.0)

Logical functions

With the logical functions, you can perform logical operations between quad vectors.

vec_and

Purpose
Returns a vector containing the results of performing a logical AND operation between the given vectors.

Syntax
d=vec_and(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical AND operation between the corresponding elements of a and b.

Formula

    d[0] = a[0] AND b[0]
    d[1] = a[1] AND b[1]
    d[2] = a[2] AND b[2]
    d[3] = a[3] AND b[3]

Example

    a = (-1.0, -1.0,  1.0, 1.0)
    b = (-1.0,  1.0, -1.0, 1.0)
    d: (-1.0, -1.0, -1.0, 1.0)

vec_andc

Purpose
Returns a vector containing the results of performing a logical AND operation between a and the complement of b.

Syntax
d=vec_andc(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical AND operation between the corresponding element of a and the complement of the corresponding element of b.

Formula

    d[0] = a[0] AND NOT (b[0])
    d[1] = a[1] AND NOT (b[1])
    d[2] = a[2] AND NOT (b[2])
    d[3] = a[3] AND NOT (b[3])

Example

    a = (-1.0, -1.0,  1.0,  1.0)
    b = (-1.0,  1.0, -1.0,  1.0)
    d: (-1.0, -1.0,  1.0, -1.0)

vec_logical

Purpose
Returns a vector containing the results of performing a logical operation between a and b, using the truth table specified by c.

Syntax
d=vec_logical(a, b, c)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b              c
    vector4double  vector4double  vector4double  int, a value in the range of [0x0, 0xF]

Result value
The value of each element of the result is the result of the logical operation between the corresponding elements of a and b, using the truth table specified by c. The following table shows how to read the truth table in c for the nth element of a and b.

    a[n]   b[n]   Binary result
    False  False  c0
    True   False  c1
    False  True   c2
    True   True   c3

The result value is calculated from the binary result.

    Binary result  Result value
    0              -1.0 (False)
    1               1.0 (True)

Formula

    If (a[n] < 0.0) AND (b[n] < 0.0)
        If (c0 EQ 0), d[n] = -1.0 Else d[n] = 1.0
    If (a[n] ≥ 0.0) AND (b[n] < 0.0)
        If (c1 EQ 0), d[n] = -1.0 Else d[n] = 1.0
    If (a[n] < 0.0) AND (b[n] ≥ 0.0)
        If (c2 EQ 0), d[n] = -1.0 Else d[n] = 1.0
    If (a[n] ≥ 0.0) AND (b[n] ≥ 0.0)
        If (c3 EQ 0), d[n] = -1.0 Else d[n] = 1.0

Notes:
• EQ is the equal operator.
• In this function, NaN is considered to be less than zero.

Example
You can use the values for c from the following table to replicate some usual logical operators.

    Binary  c    Operator
    0001    0x1  AND
    0110    0x6  XOR
    0111    0x7  OR
    1000    0x8  NOR
    1110    0xE  NAND

vec_nand

Purpose
Returns a vector containing the results of performing a logical NOT operation of the result of a logical AND operation between the given vectors.

Syntax
d=vec_nand(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical NOT operation of a logical AND operation between the corresponding elements of a and b.
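NAND is one of the operators that the vec_logical operator table earlier in this section lists (c = 0xE), so the truth-table lookup is a convenient way to see how the logical functions relate. The following per-element sketch in portable C follows the vec_logical Formula, where a zero bit yields -1.0 and a set bit yields 1.0. The function name is hypothetical, and the code assumes that c0 is the most significant of the four bits of c, which is consistent with the operator table (AND is 0001, with c3 covering the True/True case).

```c
#include <assert.h>

/* One element of a truth-table logical operation.
   An operand counts as True when it is >= 0.0; negative values and
   NaN count as False, because NaN fails the >= comparison. */
double emu_logical_elem(double a, double b, int c)
{
    assert(c >= 0x0 && c <= 0xF);
    int a_true = (a >= 0.0);
    int b_true = (b >= 0.0);
    int n = a_true + 2 * b_true;    /* 0..3, selects c0..c3 */
    int bit = (c >> (3 - n)) & 1;   /* assumption: c0 is the top bit */
    return bit ? 1.0 : -1.0;
}

/* NAND expressed through the truth table 0b1110. */
double emu_nand_elem(double a, double b)
{
    return emu_logical_elem(a, b, 0xE);
}
```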
Formula

    d[0] = NOT (a[0] AND b[0])
    d[1] = NOT (a[1] AND b[1])
    d[2] = NOT (a[2] AND b[2])
    d[3] = NOT (a[3] AND b[3])

Example

    a = (-1.0, -1.0,  1.0,  1.0)
    b = (-1.0,  1.0, -1.0,  1.0)
    d: ( 1.0,  1.0,  1.0, -1.0)

vec_nor

Purpose
Returns a vector containing the results of performing a logical NOT operation of the result of a logical OR operation between the given vectors.

Syntax
d=vec_nor(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical NOT operation of a logical OR operation between the corresponding elements of a and b.

Formula

    d[0] = NOT (a[0] OR b[0])
    d[1] = NOT (a[1] OR b[1])
    d[2] = NOT (a[2] OR b[2])
    d[3] = NOT (a[3] OR b[3])

Example

    a = (-1.0, -1.0,  1.0,  1.0)
    b = (-1.0,  1.0, -1.0,  1.0)
    d: ( 1.0, -1.0, -1.0, -1.0)

vec_not

Purpose
Returns a vector containing the result of a logical NOT operation on the given vector.

Syntax
d=vec_not(a)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a
    vector4double  vector4double

Result value
The value of each element of the result is the result of a logical NOT operation of the corresponding element of a.

Formula

    d[0] = NOT a[0]
    d[1] = NOT a[1]
    d[2] = NOT a[2]
    d[3] = NOT a[3]

Example

    a = (-1.0, -2.0,  1.0,  2.0)
    d: ( 1.0,  1.0, -1.0, -1.0)

vec_or

Purpose
Returns a vector containing the results of performing a logical OR operation between the given vectors.

Syntax
d=vec_or(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical OR operation between the corresponding elements of a and b.
Formula

    d[0] = a[0] OR b[0]
    d[1] = a[1] OR b[1]
    d[2] = a[2] OR b[2]
    d[3] = a[3] OR b[3]

Example

    a = (-1.0, -1.0,  1.0, 1.0)
    b = (-1.0,  1.0, -1.0, 1.0)
    d: (-1.0,  1.0,  1.0, 1.0)

vec_orc

Purpose
Returns a vector containing the result of performing a logical OR operation between a and the complement of b.

Syntax
d=vec_orc(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical OR operation between the corresponding element of a and the complement of the corresponding element of b.

Formula

    d[0] = a[0] OR NOT (b[0])
    d[1] = a[1] OR NOT (b[1])
    d[2] = a[2] OR NOT (b[2])
    d[3] = a[3] OR NOT (b[3])

Example

    a = (-1.0, -1.0,  1.0, 1.0)
    b = (-1.0,  1.0, -1.0, 1.0)
    d: ( 1.0, -1.0,  1.0, 1.0)

vec_xor

Purpose
Returns a vector containing the results of performing a logical exclusive OR operation between the given vectors.

Syntax
d=vec_xor(a, b)

Result and argument types
The following table describes the types of the returned value and the function arguments.

    d              a              b
    vector4double  vector4double  vector4double

Result value
The value of each element of the result is the result of a logical exclusive OR between the corresponding elements of a and b.

Formula

    d[0] = a[0] XOR b[0]
    d[1] = a[1] XOR b[1]
    d[2] = a[2] XOR b[2]
    d[3] = a[3] XOR b[3]

Example

    a = (-1.0, -1.0,  1.0,  1.0)
    b = (-1.0,  1.0, -1.0,  1.0)
    d: (-1.0,  1.0,  1.0, -1.0)

Chapter 11. Using the Mathematical Acceleration Subsystem libraries (MASS)

XL C/C++ is shipped with a set of Mathematical Acceleration Subsystem (MASS) libraries for high-performance mathematical computing.
The MASS libraries consist of a library of scalar C/C++ functions described in “Using the scalar library,” a set of vector libraries tuned for specific architectures described in “Using the vector libraries,” and a SIMD library described in “Using the SIMD library” on page 108. The functions contained in both scalar and vector libraries are automatically called at certain levels of optimization, but you can also call them explicitly in your programs. Note that the accuracy and exception handling might not be identical in MASS functions and system library functions.

The MASS functions must run with the default rounding mode and floating-point exception trapping settings.

When you compile programs with any of the following sets of options:
• -qhot -qignerrno -qnostrict
• -qhot -O3
• -O4
• -O5
the compiler automatically attempts to vectorize calls to system math functions by calling the equivalent MASS vector functions (with the exceptions of functions vdnint, vdint, vcosisin, vscosisin, vqdrt, vsqdrt, vrqdrt, vsrqdrt, vpopcnt4, vpopcnt8, vexp2, vexp2m1, vsexp2, vsexp2m1, vlog2, vlog21p, vslog2, and vslog21p). If it cannot vectorize, it automatically tries to call the equivalent MASS scalar functions. For automatic vectorization or scalarization, the compiler uses versions of the MASS functions contained in the XLOPT library libxlopt.a.

In addition to any of the preceding sets of options, when the -qipa option is in effect, if the compiler cannot vectorize, it tries to inline the MASS scalar functions before deciding to call them.

“Compiling and linking a program with MASS” on page 111 describes how to compile and link a program that uses the MASS libraries, and how to selectively use the MASS scalar library functions in conjunction with the regular system libraries.
Related external information
Mathematical Acceleration Subsystem website, available at http://www.ibm.com/software/awdtools/mass/

Using the scalar library

The MASS scalar library libmass.a contains an accelerated set of frequently used math intrinsic functions that provide improved performance over the corresponding standard system library functions. The MASS scalar functions are used when explicitly linking libmass.a.

If you want to explicitly call the MASS scalar functions, you can take the following steps:
1. Provide the prototypes for the functions (except anint, cosisin, dnint, sincos, and rsqrt), by including math.h in your source files.
2. Provide the prototypes for anint, cosisin, dnint, sincos, and rsqrt, by including mass.h in your source files.
3. Link the MASS scalar library libmass.a with your application. For instructions, see “Compiling and linking a program with MASS” on page 111.

The MASS scalar functions accept double-precision parameters and return a double-precision result, or accept single-precision parameters and return a single-precision result, except sincos, which gives two double-precision results. They are summarized in Table 4.

Table 4. MASS scalar functions

Double-precision function | Single-precision function | Description | Double-precision function prototype | Single-precision function prototype
acos | acosf | Returns the arccosine of x | double acos (double x); | float acosf (float x);
acosh | acoshf | Returns the hyperbolic arccosine of x | double acosh (double x); | float acoshf (float x);
 | anint | Returns the rounded integer value of x | | float anint (float x);
asin | asinf | Returns the arcsine of x | double asin (double x); | float asinf (float x);
asinh | asinhf | Returns the hyperbolic arcsine of x | double asinh (double x); | float asinhf (float x);
atan2 | atan2f | Returns the arctangent of x/y | double atan2 (double x, double y); | float atan2f (float x, float y);
atan | atanf | Returns the arctangent of x | double atan (double x); | float atanf (float x);
atanh | atanhf | Returns the hyperbolic arctangent of x | double atanh (double x); | float atanhf (float x);
cbrt | cbrtf | Returns the cube root of x | double cbrt (double x); | float cbrtf (float x);
copysign | copysignf | Returns x with the sign of y | double copysign (double x, double y); | float copysignf (float x, float y);
cos | cosf | Returns the cosine of x | double cos (double x); | float cosf (float x);
cosh | coshf | Returns the hyperbolic cosine of x | double cosh (double x); | float coshf (float x);
cosisin | | Returns a complex number with the real part the cosine of x and the imaginary part the sine of x | double _Complex cosisin (double); |
dnint | | Returns the nearest integer to x (as a double) | double dnint (double x); |
erf | erff | Returns the error function of x | double erf (double x); | float erff (float x);
erfc | erfcf | Returns the complementary error function of x | double erfc (double x); | float erfcf (float x);
exp | expf | Returns the exponential function of x | double exp (double x); | float expf (float x);
expm1 | expm1f | Returns (the exponential function of x) - 1 | double expm1 (double x); | float expm1f (float x);
hypot | hypotf | Returns the square root of x^2 + y^2 | double hypot (double x, double y); | float hypotf (float x, float y);
lgamma | lgammaf | Returns the natural logarithm of the absolute value of the Gamma function of x | double lgamma (double x); | float lgammaf (float x);
log | logf | Returns the natural logarithm of x | double log (double x); | float logf (float x);
log10 | log10f | Returns the base 10 logarithm of x | double log10 (double x); | float log10f (float x);
log1p | log1pf | Returns the natural logarithm of (x + 1) | double log1p (double x); | float log1pf (float x);
pow | powf | Returns x raised to the power y | double pow (double x, double y); | float powf (float x, float y);
rsqrt | | Returns the reciprocal of the square root of x | double rsqrt (double x); |
sin | sinf | Returns the sine of x | double sin (double x); | float sinf (float x);
sincos | | Sets *s to the sine of x and *c to the cosine of x | void sincos (double x, double* s, double* c); |
sinh | sinhf | Returns the hyperbolic sine of x | double sinh (double x); | float sinhf (float x);
sqrt | | Returns the square root of x | double sqrt (double x); |
tan | tanf | Returns the tangent of x | double tan (double x); | float tanf (float x);
tanh | tanhf | Returns the hyperbolic tangent of x | double tanh (double x); | float tanhf (float x);

Notes:
• The trigonometric functions (sin, cos, tan) return NaN (Not-a-Number) for large arguments (where the absolute value is greater than 2^50 × pi).
• In some cases, the MASS functions are not as accurate as the libm.a library, and they might handle edge cases differently (sqrt(Inf), for example).
• See the Mathematical Acceleration Subsystem website for accuracy comparisons with libm.a.

Related external information
Mathematical Acceleration Subsystem website, available at http://www.ibm.com/software/awdtools/mass/

Using the SIMD library

The MASS SIMD library libmass_simd.a contains a set of frequently used math intrinsic functions that provide improved performance over the corresponding standard system library functions.

If you want to use the MASS SIMD functions, you can do so as follows:
1. Provide the prototypes for the functions by including mass_simd.h in your source files.
2. Link the MASS SIMD library libmass_simd.a with your application. For instructions, see “Compiling and linking a program with MASS” on page 111.

The single/double-precision MASS SIMD functions accept single/double-precision arguments and return single/double-precision results. They are summarized in Table 5.

Table 5. MASS SIMD functions

Double-precision function | Single-precision function | Description | Double-precision function prototype | Single-precision function prototype
acosd4 | acosf4 | Computes the arc cosine of each element of vx. | vector4double acosd4 (vector4double vx); | vector4double acosf4 (vector4double vx);
acoshd4 | acoshf4 | Computes the arc hyperbolic cosine of each element of vx. | vector4double acoshd4 (vector4double vx); | vector4double acoshf4 (vector4double vx);
asind4 | asinf4 | Computes the arc sine of each element of vx. | vector4double asind4 (vector4double vx); | vector4double asinf4 (vector4double vx);
asinhd4 | asinhf4 | Computes the arc hyperbolic sine of each element of vx. | vector4double asinhd4 (vector4double vx); | vector4double asinhf4 (vector4double vx);
atand4 | atanf4 | Computes the arc tangent of each element of vx. | vector4double atand4 (vector4double vx); | vector4double atanf4 (vector4double vx);
atan2d4 | atan2f4 | Computes the arc tangent of each element of vy/vx. | vector4double atan2d4 (vector4double vx, vector4double vy); | vector4double atan2f4 (vector4double vx, vector4double vy);
atanhd4 | atanhf4 | Computes the arc hyperbolic tangent of each element of vx. | vector4double atanhd4 (vector4double vx); | vector4double atanhf4 (vector4double vx);
cbrtd4 | cbrtf4 | Computes the cube root of each element of vx. | vector4double cbrtd4 (vector4double vx); | vector4double cbrtf4 (vector4double vx);
cosd4 | cosf4 | Computes the cosine of each element of vx. | vector4double cosd4 (vector4double vx); | vector4double cosf4 (vector4double vx);
coshd4 | coshf4 | Computes the hyperbolic cosine of each element of vx. | vector4double coshd4 (vector4double vx); | vector4double coshf4 (vector4double vx);
cosisind4 | cosisinf4 | Computes the cosine and sine of each element of x, and stores the results in y and z as follows: cosisind2 (x,y,z) sets y and z to {cos(x1), sin(x1)} and {cos(x2), sin(x2)} where x={x1,x2}. cosisinf4 (x,y,z) sets y and z to {cos(x1), sin(x1), cos(x2), sin(x2)} and {cos(x3), sin(x3), cos(x4), sin(x4)} where x={x1,x2,x3,x4}. | void cosisind4 (vector4double x, vector4double *y, vector4double *z) | void cosisinf4 (vector4double x, vector4double *y, vector4double *z)
divd4 | divf4 | Computes the quotient vx/vy. | vector4double divd4 (vector4double vx, vector4double vy); | vector4double divf4 (vector4double vx, vector4double vy);
erfcd4 | erfcf4 | Computes the complementary error function of each element of vx. | vector4double erfcd4 (vector4double vx); | vector4double erfcf4 (vector4double vx);
erfd4 | erff4 | Computes the error function of each element of vx. | vector4double erfd4 (vector4double vx); | vector4double erff4 (vector4double vx);
expd4 | expf4 | Computes the exponential function of each element of vx. | vector4double expd4 (vector4double vx); | vector4double expf4 (vector4double vx);
exp2d4 | exp2f4 | Computes 2 raised to the power of each element of vx. | vector4double exp2d4 (vector4double vx); | vector4double exp2f4 (vector4double vx);
expm1d4 | expm1f4 | Computes (the exponential function of each element of vx) - 1. | vector4double expm1d4 (vector4double vx); | vector4double expm1f4 (vector4double vx);
exp2m1d4 | exp2m1f4 | Computes (2 raised to the power of each element of vx) - 1. | vector4double exp2m1d4 (vector4double vx); | vector4double exp2m1f4 (vector4double vx);
hypotd4 | hypotf4 | For each element of vx and the corresponding element of vy, computes sqrt(x*x+y*y). | vector4double hypotd4 (vector4double vx, vector4double vy); | vector4double hypotf4 (vector4double vx, vector4double vy);
lgammad4 | lgammaf4 | Computes the natural logarithm of the absolute value of the Gamma function of each element of vx. | vector4double lgammad4 (vector4double vx); | vector4double lgammaf4 (vector4double vx);
logd4 | logf4 | Computes the natural logarithm of each element of vx. | vector4double logd4 (vector4double vx); | vector4double logf4 (vector4double vx);
log2d4 | log2f4 | Computes the base-2 logarithm of each element of vx. | vector4double log2d4 (vector4double vx); | vector4double log2f4 (vector4double vx);
log10d4 | log10f4 | Computes the base-10 logarithm of each element of vx. | vector4double log10d4 (vector4double vx); | vector4double log10f4 (vector4double vx);
log1pd4 | log1pf4 | Computes the natural logarithm of each element of (vx + 1). | vector4double log1pd4 (vector4double vx); | vector4double log1pf4 (vector4double vx);
log21pd4 | log21pf4 | Computes the base-2 logarithm of each element of (vx + 1). | vector4double log21pd4 (vector4double vx); | vector4double log21pf4 (vector4double vx);
powd4 | powf4 | Computes each element of vx raised to the power of the corresponding element of vy. | vector4double powd4 (vector4double vx, vector4double vy); | vector4double powf4 (vector4double vx, vector4double vy);
qdrtd4 | qdrtf4 | Computes the quad root of each element of vx. | vector4double qdrtd4 (vector4double vx); | vector4double qdrtf4 (vector4double vx);
rcbrtd4 | rcbrtf4 | Computes the reciprocal of the cube root of each element of vx. | vector4double rcbrtd4 (vector4double vx); | vector4double rcbrtf4 (vector4double vx);
recipd4 | recipf4 | Computes the reciprocal of each element of vx. | vector4double recipd4 (vector4double vx); | vector4double recipf4 (vector4double vx);
rqdrtd4 | rqdrtf4 | Computes the reciprocal of the quad root of each element of vx. | vector4double rqdrtd4 (vector4double vx); | vector4double rqdrtf4 (vector4double vx);
rsqrtd4 | rsqrtf4 | Computes the reciprocal of the square root of each element of vx. | vector4double rsqrtd4 (vector4double vx); | vector4double rsqrtf4 (vector4double vx);
sincosd4 | sincosf4 | Computes the sine and cosine of each element of vx. | void sincosd4 (vector4double vx, vector4double *vs, vector4double *vc); | void sincosf4 (vector4double vx, vector4double *vs, vector4double *vc);
sind4 | sinf4 | Computes the sine of each element of vx. | vector4double sind4 (vector4double vx); | vector4double sinf4 (vector4double vx);
sinhd4 | sinhf4 | Computes the hyperbolic sine of each element of vx. | vector4double sinhd4 (vector4double vx); | vector4double sinhf4 (vector4double vx);
sqrtd4 | sqrtf4 | Computes the square root of each element of vx. | vector4double sqrtd4 (vector4double vx); | vector4double sqrtf4 (vector4double vx);

Table 5.
MASS SIMD functions (continued) Doubleprecision function Singleprecision function Description Double-precision function prototype Single-precision function prototype tand4 tanf4 Computes the tangent of each element of vx. vector4double tand4 (vector4double vx); vector4double tanf4 (vector4double vx); tanhd4 tanhf4 Computes the hyperbolic tangent of each element of vx. vector4double tanhd4 (vector4double vx); vector4double tanhf4 (vector4double vx); Compiling and linking a program with MASS To compile an application that calls the functions in the MASS libraries, specify one or more of the following keywords on the -l linker option: v mass v massv v mass_simd For example, if the MASS libraries are installed in the default directory, you can specify one of the following: Link with scalar library libmass.a and vector library libmassv.a bgxlc progc.c -o progc -lmass -lmassv Link with SIMD library libmass_simd.a bgxlc progc.c -o progc -lmass_simd Using libmass.a with the math system library If you want to use the libmass.a scalar library for some functions and the normal math library libm.a for other functions, follow this procedure to compile and link your program: 1. Use the ar command to extract the object files of the desired functions from libmass.a. For most functions, the object file name is the function name followed by .s64.o. 1 For example, to extract the object file for the tan function, the command would be: ar -x tan.s64.o libmass.a 2. Archive the extracted object files into another library: ar -qv libfasttan.a tan.s64.o ranlib libfasttan.a 3. Create the final executable using xlc, specifying -lfasttan instead of -lmass: xlc sample.c -o sample -Ldir_containing_libfasttan -lfasttan This links only the tan function from MASS (now in libfasttan.a) and the remainder of the math functions from the standard system library. Exceptions: 1. The sin and cos functions are both contained in the object file sincos.s64.o. 
   The cosisin and sincos functions are both contained in the object file cosisin.s64.o.
2. The XL C/C++ pow function is contained in the object file dxy.s64.o.

Note: The cos and sin functions will both be exported if either one is exported. cosisin and sincos will both be exported if either one is exported.