Blue Gene/Q vector data type for C/C++
IBM XL C/C++ for Blue Gene/Q, V12.0 (technology preview)
Contents
Chapter 1. Blue Gene/Q vector data type
Chapter 2. Vector type declaration
  Vector types (IBM extension)
  typedef definitions
  Vector literals (IBM extension)
Chapter 3. Initialization of vectors (IBM extension)
Chapter 4. Macros related to the platform
Chapter 5. Compiler option reference
  -qflttrap
Chapter 6. Aligning data
  The __align type qualifier (IBM extension)
  The aligned variable attribute
  The aligned type attribute
  The packed variable attribute
  The packed type attribute
Chapter 7. Quad vector usage
  Pointer arithmetic
  Type conversions
  Overload resolution
  Parameter declarations
Chapter 8. Quad vector operators
  Address operator &
  Indirection operator *
  The __alignof__ operator (IBM extension)
  The sizeof operator
  The typeof operator (IBM extension)
  Assignment operators
  Vector subscripting operator [ ] (IBM extension)
Chapter 9. Inline assembly statements (IBM extension)
  Supported and unsupported constructs
  Restrictions on inline assembly statements
  Examples of inline assembly statements
Chapter 10. Vector built-in functions
  Load and store functions
    vec_ld, vec_lda
    vec_ldia, vec_ldiaa
    vec_ldiz, vec_ldiza
    vec_lds, vec_ldsa
    vec_ld2, vec_ld2a
    vec_st, vec_sta
    vec_sts, vec_stsa
    vec_st2, vec_st2a
  Unary arithmetic functions
    vec_abs
    vec_neg
    vec_nabs
    vec_re
    vec_res
    vec_rsqrte
    vec_rsqrtes
    vec_swsqrt, vec_swsqrt_nochk
    vec_swsqrts, vec_swsqrts_nochk
  Binary arithmetic functions
    vec_add
    vec_cpsgn
    vec_mul
    vec_sub
    vec_swdiv, vec_swdiv_nochk
    vec_swdivs, vec_swdivs_nochk
    vec_xmul
  Multiply-add functions
    vec_madd
    vec_msub
    vec_nmadd
    vec_nmsub
    vec_xmadd
    vec_xxmadd
    vec_xxcpnmadd
    vec_xxnpmadd
  Round functions
    vec_ceil
    vec_floor
    vec_round
    vec_rsp
    vec_trunc
  Conversion functions
    vec_cfid
    vec_cfidu
    vec_ctid
    vec_ctidu
    vec_ctidz
    vec_ctiduz
    vec_ctiw
    vec_ctiwu
    vec_ctiwz
    vec_ctiwuz
  Comparison functions
    vec_cmpgt
    vec_cmplt
    vec_cmpeq
    vec_sel
    vec_tstnan
  Element manipulation functions
    vec_extract
    vec_insert
    vec_gpci
    vec_lvsl
    vec_lvsr
    vec_perm
    vec_promote
    vec_sldw
    vec_splat
    vec_splats
  Logical functions
    vec_and
    vec_andc
    vec_logical
    vec_nand
    vec_nor
    vec_not
    vec_or
    vec_orc
    vec_xor
Chapter 11. Using the Mathematical Acceleration Subsystem libraries (MASS)
  Using the scalar library
  Using the SIMD library
  Compiling and linking a program with MASS
  Using libmass.a with the math system library
Index
Chapter 1. Blue Gene/Q vector data type
The quad processing extension (QPX) floating-point unit of Blue Gene/Q supports
operations on vectors of four IEEE 754 double-precision floating-point elements. In
this document, the Blue Gene/Q vector data type is referred to as quad vector
type.
The Blue Gene/Q vector data type has an underlying representation of a
four-element double-precision floating-point array.
Note: QPX is similar to the vector multimedia extension (VMX) and the vector scalar
extension (VSX), but its instruction set and data types are different.
On Blue Gene/Q, the XL C/C++ compiler provides a set of built-in functions that
are optimized for the QPX floating-point unit. These built-in functions provide an
almost one-to-one correspondence with the QPX instruction set.
In addition, the XL C/C++ compiler includes a set of built-in functions that are
optimized for the PowerPC® architecture. For a full description of these functions,
see the following document:
• Built-in functions for POWER® and PowerPC architectures in XL C/C++ for Linux,
V12.1 Compiler Reference
Chapter 2. Vector type declaration
Vector types (IBM extension)
XL C/C++ supports vector processing technologies through language extensions.
In the extended syntax, type qualifiers and storage class specifiers can precede the
keyword vector4double (or its alternative spelling, __vector4double) in a
declaration.
Most of the legal forms of the syntax are captured in the following diagram. Some
variations have been omitted from the diagram for the sake of clarity: type
qualifiers such as const and storage class specifiers such as static can appear in
any order within the declaration, as long as neither immediately follows the
keyword vector4double (or __vector4double).
Vector declaration syntax
[type_qualifier | storage_class_specifier]... {vector4double | __vector4double}
The following table lists the supported vector data types and the size and possible
values for each type.
Table 1. Vector data types
Type | Interpretation of content | Range of values
vector4double | 4 double | IEEE 754 double-precision (64-bit) floating-point values
The vector4double type must be aligned on 32-byte boundaries. The Blue Gene/Q
technology does not generate exceptions for unaligned vector types; instead, the load
and store operations truncate addresses to a 16-byte or 32-byte boundary. You can
alter the alignment of a vector type with alignment modifiers, but it is strongly
recommended that you do not. Aggregates containing vector types must be aligned on
32-byte boundaries and padded, if necessary, so that each member that has a
vector type is aligned on a 32-byte boundary. Variable length arrays can contain
vector data types as elements.
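The size, representation, and aggregate padding rules above can be checked on an ordinary GCC or Clang host with a small model. This is a sketch under stated assumptions: vector4double exists only in XL C/C++ for Blue Gene/Q, so the typedef name `vec4d_t` is hypothetical and uses the GCC/Clang `vector_size` extension with an explicit 32-byte alignment as a stand-in. The structure `particle` and the helper `vec4d_to_array` are also illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for vector4double: four doubles (32 bytes),
   explicitly aligned on a 32-byte boundary (GCC/Clang extension). */
typedef double vec4d_t __attribute__((vector_size(32), aligned(32)));

/* An aggregate with a vector member: the compiler inserts padding after
   `mass` so that `pos` starts on a 32-byte boundary. */
struct particle {
    double  mass;   /* offset 0, 8 bytes */
    vec4d_t pos;    /* offset 32, after 24 bytes of padding */
};

/* Copies the four elements of a vector into a plain double array,
   reflecting the four-element array representation of the type. */
void vec4d_to_array(const vec4d_t *v, double out[4]) {
    assert(v != NULL && out != NULL);
    assert((uintptr_t)v % 32 == 0);  /* vector objects must be 32-byte aligned */
    memcpy(out, v, sizeof *v);
}
```

Passing the vector by pointer rather than by value keeps the sketch free of ABI differences between hosts that do and do not have 256-bit vector registers.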
typedef definitions
A typedef declaration lets you define your own identifiers that can be used in
place of type specifiers such as int, float, and double. A typedef declaration does
not reserve storage. The names you define using typedef are not new data types,
but synonyms for the data types or combinations of data types they represent.
The name space for a typedef name is the same as that of other identifiers. When an
object is defined using a typedef identifier, the properties of the defined object are
exactly the same as if the object were defined by explicitly listing the data type
associated with the identifier.
As an IBM extension, typedef definitions can handle vector types. A vector type can
be used in a typedef definition, and the new type name can be used in the usual
ways, except for declaring other vectors. In a vector declaration context, a typedef
name is not allowed as a type specifier. The following example illustrates a typical
usage of typedef with vector types:
typedef vector4double vdt;
vdt v1;
Examples of typedef definitions
The following statements define LENGTH as a synonym for int and then use this
typedef to declare length, width, and height as integer variables:
typedef int LENGTH;
LENGTH length, width, height;
The preceding declarations are equivalent to the following declaration:
int length, width, height;
Similarly, typedef can be used to define a structure, union, or C++ class. For
example:
typedef struct {
int scruples;
int drams;
int grains;
} WEIGHT;
The structure WEIGHT can then be used in the following declarations:
WEIGHT chicken, cow, horse, whale;
In the following example, the type of yds is "pointer to function with no
parameters, returning int".
typedef int SCROLL(void);
extern SCROLL *yds;
In the following typedef definitions, the token struct is part of the type name: the
type of ex1 is struct a; the type of ex2 is struct b.
typedef struct a { char x; } ex1, *ptr1;
typedef struct b { char x; } ex2, *ptr2;
Type ex1 is compatible with the type struct a and the type of the object pointed
to by ptr1. Type ex1 is not compatible with char, ex2, or struct b.
In C++, a typedef name must be different from any class type name declared
within the same scope. A typedef name can be the same as a class type name only
if the typedef is a synonym for that class.
A C++ class defined in a typedef definition without being named is given a
dummy name. Such a class cannot have constructors or destructors. Consider the
following example:
typedef class {
~Trees(); // Error: Trees is not the name of the class
} Trees;
In this example, an unnamed class is defined in a typedef definition. Trees is an
alias for the unnamed class, but not the class type name. Therefore you cannot
declare a destructor ~Trees() for this unnamed class; if you do, the compiler issues
an error.
Declaring typedef names as friends
The C++0x standard introduces the extended friend declarations feature, which lets
you declare typedef names as friends. For more information, see
Extended friend declarations.
Vector literals (IBM extension)
A vector literal is a constant expression for which the value is interpreted as a
vector type. The data type of a vector literal is represented by a parenthesized
vector type, and its value is a set of constant expressions that represent the vector
elements and are enclosed in parentheses or braces. When all vector elements have
the same value, the value of the literal can be represented by a single constant
expression. You can initialize vector types with vector literals.
Vector literal syntax
(vector_type) (literal_list)
(vector_type) {literal_list}
literal_list:
constant_expression [, constant_expression]...
The vector_type is vector4double.
The literal_list can be either of the following expressions:
• A single constant expression
• A comma-separated list of constant expressions
The delimiters around literal_list determine how the constant expressions are
interpreted and the expected number of constant expressions. Use one of the
following types of characters for the delimiters:
• Parentheses
The list must include exactly one or four constant expressions.
– With one constant expression, all elements of the vector are initialized to the
specified value.
– With four constant expressions, each element of the vector is initialized to the
corresponding specified value.
• Braces
The number of constant expressions can be less than the number of elements in
the vector. Each unspecified element is set to 0.0.
The following table shows the possible combinations where c1, c2, c3, and c4 are
constant expressions.
Table 2. Quad vector literals
Literal | Result
(vector4double) (c1) | (c1, c1, c1, c1)
(vector4double) (c1, c2) | Compile-time error
(vector4double) (c1, c2, c3) | Compile-time error
(vector4double) (c1, c2, c3, c4) | (c1, c2, c3, c4)
(vector4double) {c1} | (c1, 0.0, 0.0, 0.0)
(vector4double) {c1, c2} | (c1, c2, 0.0, 0.0)
(vector4double) {c1, c2, c3} | (c1, c2, c3, 0.0)
(vector4double) {c1, c2, c3, c4} | (c1, c2, c3, c4)
Note: All the constant expressions in the literal list must have a type that is
appropriate for the vector literal. If that is not the case, the compiler converts each
expression to a type that is compatible with the vector literal type. If a constant
expression used to initialize a vector element has a value that cannot be
represented in the destination format (the vector type), the compiler truncates that
value. If you specify the -qinfo=trd option, the compiler generates a message
stating that the value is not preserved.
Example
(vector4double) (3.0);
// Assign the double-precision floating-point value 3.0 to all four
// elements that constitute the vector.
(vector4double) (10.0,20.0,30.0,40.0);
// Assign the double-precision floating-point values 10.0, 20.0, 30.0, and 40.0
// to the four elements that constitute the vector.
(vector4double) {10.0};
// Assign the double-precision floating-point value 10.0 to the first element
// of the vector. The other elements are set to 0.0.
(vector4double) {10.0,20.0};
// Assign the double-precision floating-point values 10.0 and 20.0
// to the first and second elements of the vector. The other
// elements are set to 0.0.
(vector4double) {10.0,20.0,30.0,40.0};
// Assign the double-precision floating-point values 10.0, 20.0, 30.0, and 40.0
// to the four elements that constitute the vector.
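GCC and Clang accept a compound literal of a vector type written with braces, and they zero-fill the unspecified elements in the same way as the braced rows of Table 2. The sketch below uses that extension as a stand-in to check those results on another compiler; the typedef `vec4d_t` and both function names are hypothetical, not part of XL C/C++. The parenthesized splat form `(vector4double) (c1)` has no direct GCC equivalent, so the model spells the value out once per element.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double (GCC/Clang vector extension). */
typedef double vec4d_t __attribute__((vector_size(32)));

/* Returns element i of the braced literal {10.0, 20.0}; as with
   (vector4double) {10.0, 20.0}, elements 2 and 3 are set to 0.0. */
double braced_literal_element(int i) {
    assert(i >= 0 && i < 4);
    vec4d_t v = (vec4d_t){10.0, 20.0};
    return v[i];
}

/* Models the splat form (vector4double) (c): every element equals c. */
double splat_literal_element(double c, int i) {
    assert(i >= 0 && i < 4);
    vec4d_t v = (vec4d_t){c, c, c, c};
    return v[i];
}
```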
Chapter 3. Initialization of vectors (IBM extension)
A vector type is initialized by a vector literal or any expression having the same
vector type. For example:
vector4double v1;
vector4double v2 = (vector4double) (10.);
vector4double v3 = (vector4double) (1.0, 2.0, 3.0, 4.0);
v1 = v2;
With XL C/C++, you can initialize a vector type with an initializer list. This feature
is an extension for compatibility with GNU C.
Vector initializer list syntax
vector_type identifier = {initializer [, initializer]...};
The number of values in a braced initializer list must be less than or equal to the
number of elements of the vector type. Any uninitialized element will be initialized
to zero.
The following are examples of vector initialization using initializer lists:
vector4double v1 = {1.0};
// initialize the first element of v1 with 1.0
// and the remaining three elements with 0.0
vector4double v2 = {1.0,2.0};
// initialize the first two elements of v2 with 1.0
// and 2.0, and the remaining two elements with 0.0
vector4double v3 = {1.0,2.0,3.0,4.0};
// equivalent to the vector literal
// (vector4double) (1.0,2.0,3.0,4.0)
Unlike vector literals, the values in the initializer list do not have to be constant
expressions unless the initialized vector variable has static storage duration. Thus,
the following is legal:
double i=1.0;
double foo() { return 2.0; }
int main()
{
vector4double v1 = {i, foo()};
return 0;
}
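Because the initializer-list form exists for compatibility with GNU C, the same rules can be exercised with GCC's own vector extension. In this sketch the typedef `vec4d_t` is a hypothetical stand-in for vector4double, and `init_from_runtime_values` is an illustrative name. It initializes an automatic vector from a variable and a function call, as in the example above, and copies the elements out so that the zero-filled tail can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vector4double (GCC/Clang vector extension). */
typedef double vec4d_t __attribute__((vector_size(32)));

double i = 1.0;
double foo(void) { return 2.0; }

/* Builds {i, foo()} at run time; elements 2 and 3 are not listed in the
   initializer, so they are initialized to zero. */
void init_from_runtime_values(double out[4]) {
    assert(out != NULL);
    vec4d_t v1 = {i, foo()};  /* non-constant initializers: automatic storage */
    for (int k = 0; k < 4; ++k)
        out[k] = v1[k];
}
```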
Chapter 4. Macros related to the platform
The following predefined macros are provided to facilitate porting applications
between platforms. All platform-related predefined macros are unprotected and
can be undefined or redefined without warning unless otherwise specified.
Table 3. Platform-related predefined macros
Predefined macro name | Description | Predefined value | Predefined under the following conditions
__bg__ | Indicates that this is a Blue Gene® platform. | 1 | Always predefined for all Blue Gene platforms.
__bgq__ | Indicates that the architecture is the processor of Blue Gene/Q. | 1 | Predefined when the architecture is the processor of Blue Gene/Q.
_BIG_ENDIAN, __BIG_ENDIAN__ | Indicates that the platform is big-endian (that is, the most significant byte is stored at the memory location with the lowest address). | 1 | Always predefined.
__ELF__ | Indicates that the ELF object model is in effect. | 1 | Always predefined for the Linux platform.
__GXX_WEAK__ | Indicates that weak symbols are supported (used for template instantiation by the linker). | 1 | Always predefined.
__powerpc, __powerpc__ | Indicates that the target is a Power architecture. | 1 | Predefined when the target is a Power architecture.
__PPC, __PPC__ | Indicates that the target is a Power architecture. | 1 | Predefined when the target is a Power architecture.
__THW_BLUEGENE, __THW_BLUEGENE__ | Indicates that the target architecture is Blue Gene. | 1 | Predefined when the target is Blue Gene.
__TOS_BGQ__ | Indicates that the target architecture is the processor of Blue Gene/Q. | 1 | Predefined when the target is the processor of Blue Gene/Q.
__unix, __unix__ | Indicates that the operating system is a variety of UNIX. | 1 | Always predefined.
__VECTOR4DOUBLE__ | Indicates the support of vector data types on Blue Gene/Q. | 1 | Predefined on Blue Gene/Q.
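The macros in Table 3 are typically tested at compile time to choose between a QPX code path and portable code. The following sketch assumes hypothetical function names (`qpx_code_path`, `uses_generic_path`) and path labels. With XL C/C++ on Blue Gene/Q, where both __bgq__ and __VECTOR4DOUBLE__ are predefined, it would report the QPX path; on any other host it falls back to generic code.

```c
#include <assert.h>
#include <string.h>

/* Reports which code path the predefined macros select for this build. */
const char *qpx_code_path(void) {
#if defined(__bgq__) && defined(__VECTOR4DOUBLE__)
    return "qpx";              /* Blue Gene/Q with the vector4double type */
#elif defined(__bg__)
    return "bluegene-scalar";  /* another Blue Gene platform */
#else
    return "generic";          /* portable fallback */
#endif
}

/* Returns 1 if this build uses the portable fallback path. */
int uses_generic_path(void) {
    const char *path = qpx_code_path();
    assert(path != NULL);
    return strcmp(path, "generic") == 0;
}
```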
Chapter 5. Compiler option reference
-qflttrap
Category
Error checking and debugging
Pragma equivalent
#pragma options [no]flttrap
Purpose
Determines what types of floating-point exceptions to detect at run time.
The program receives a SIGFPE signal when the corresponding exception occurs.
Syntax
-qnoflttrap
-qflttrap[=suboption[:suboption]...]
where suboption is one of the following:
zerodivide | zero
underflow | und
overflow | ov
invalid | inv
inexact | inex
enable | en
nanq
qpxstore | qpxs
Defaults
-qnoflttrap
Parameters
enable, en
Inserts a trap when the specified exceptions (overflow, underflow, zerodivide,
invalid, or inexact) occur. You must specify this suboption if you want to turn
on exception trapping without modifying your source code. If any of the
specified exceptions occur, a SIGTRAP or SIGFPE signal is sent to the process
with the precise location of the exception.
inexact, inex
Enables the detection of floating-point inexact operations. If a floating-point
inexact operation occurs, an inexact operation exception status flag is set in the
Floating-Point Status and Control Register (FPSCR).
invalid, inv
Enables the detection of floating-point invalid operations. If a floating-point
invalid operation occurs, an invalid operation exception status flag is set in the
FPSCR.
nanq
Generates code that detects Not a Number Quiet (NaNQ) and Not a Number
Signalling (NaNS) values before and after each floating-point operation,
including assignment, and after each call to a function that returns a
floating-point result, and traps if the value is a NaN. Trapping code is generated
regardless of whether the enable suboption is specified.
overflow, ov
Enables the detection of floating-point overflow. If a floating-point overflow
occurs, an overflow exception status flag is set in the FPSCR.
qpxstore, qpxs
Enables the detection of Not a Number (NaN) or infinity values in Quad
Processing eXtension (QPX) vectors.
To detect NaN or infinity values, the compiler generates indicating store
instructions when QPX vectors held in registers are stored. Indicating vector stores
are used both for stores that result from QPX store built-in functions and for stores
that result from assignment operators.
underflow, und
Enables the detection of floating-point underflow. If a floating-point underflow
occurs, an underflow exception status flag is set in the FPSCR.
zerodivide, zero
Enables the detection of floating-point division by zero. If a floating-point
zero-divide occurs, a zero-divide exception status flag is set in the FPSCR.
Usage
Specifying the -qflttrap option with no suboptions is equivalent to
-qflttrap=overflow:underflow:zerodivide:invalid:inexact. Exceptions are detected by
the hardware, but trapping is not enabled.
It is recommended that you use the enable suboption whenever compiling the
main program with -qflttrap. This ensures that the compiler will generate the code
to automatically enable floating-point exception trapping, without requiring that
you include calls to the appropriate floating-point exception library functions in
your code.
If you specify -qflttrap more than once, both with and without suboptions, the
-qflttrap without suboptions is ignored.
The -qflttrap option is recognized during linking with IPA. Specifying the option
at the link step overrides the compile-time setting.
If your program contains signalling NaNs, you should use the -qfloat=nans option
along with -qflttrap to trap any exceptions.
The compiler behaves as illustrated in the following examples when the
-qflttrap option is specified together with an optimization option:
• With -O2:
– 1/0 generates a div0 exception and has a result of infinity.
– 0/0 generates an invalid operation exception.
• With -O3 or greater:
– 1/0 generates a div0 exception and has a result of infinity.
– 0/0 returns zero multiplied by the result of the previous division.
Note: Due to the transformations performed and the exception handling support
of some vector instructions, use of -qsimd=auto may change the location where an
exception is caught or even cause the compiler to miss catching an exception.
Predefined macros
None.
Example
#include <stdio.h>
int main()
{
float x, y, z;
x = 5.0;
y = 0.0;
z = x / y;
printf("%f", z);
return 0;
}
When you compile this program with the following command, the program stops
when the division is performed.
xlc -qflttrap=zerodivide:enable divide_by_zero.c
The zerodivide suboption identifies the type of exception to guard against. The
enable suboption causes a SIGFPE signal to be generated when the exception
occurs.
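The nanq and qpxstore suboptions make the compiler check for NaN or infinity values automatically. In code built without them, an explicit check can serve a similar diagnostic purpose. The following is a minimal software sketch, not what the compiler generates: the function name `first_nonfinite4` is hypothetical, and it inspects a four-element double array (the representation of a quad vector) with the standard C99 `isfinite` macro.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Returns the index of the first NaN or infinity among the four
   elements, or -1 if every element is finite. */
int first_nonfinite4(const double v[4]) {
    assert(v != NULL);
    for (int k = 0; k < 4; ++k)
        if (!isfinite(v[k]))
            return k;
    return -1;
}
```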
Related information
• -qfloat
• -qarch
Chapter 6. Aligning data
XL C/C++ provides many mechanisms for specifying data alignment at the levels
of individual variables, members of aggregates, entire aggregates, and entire
compilation units. If you are porting applications between different platforms, or
between 32-bit and 64-bit modes, you need to take into account the differences
between alignment settings available in the different environments, to prevent
possible data corruption and deterioration in performance. In particular, vector
types have special alignment requirements which, if not followed, can produce
incorrect results: vectors must be aligned on 32-byte boundaries.
Using alignment modes, you can set alignment defaults for all data types for a
compilation unit (or subsection of a compilation unit), by specifying a predefined
suboption.
Using alignment modifiers, you can set the alignment for specific variables or data
types within a compilation unit, by specifying the exact number of bytes that
should be used for the alignment.
Using alignment modes discusses the default alignment modes for all data types
on the different platforms and addressing models; the suboptions and pragmas
you can use to change or override the defaults; and rules for the alignment modes
for simple variables, aggregates, and bit fields.
Using alignment modifiers discusses the different specifiers, pragmas, and
attributes you can use in your source code to override the alignment mode
currently in effect, for specific variable declarations. It also provides the rules
governing the precedence of alignment modes and modifiers during compilation.
The __align type qualifier (IBM extension)
The __align qualifier is a language extension that allows you to specify an explicit
alignment for an aggregate or a static (or global) variable. The specified byte
boundary affects the alignment of an aggregate as a whole, not that of its
members. The __align qualifier can be applied to an aggregate definition nested
within another aggregate definition, but not to individual elements of an
aggregate. The alignment specification is ignored for parameters and automatic
variables.
The __align type qualifier can also be used with vector types. Similar to the
aligned attribute, the alignment of a vector type cannot be reduced using the
__align type qualifier.
A declaration takes one of the following forms:
__align qualifier syntax for simple variables
type_specifier __align (int_constant) declarator
__align qualifier syntax for structures or unions
__align (int_constant) {struct | union} tag_identifier {member_declaration_list};
where int_constant is a positive integer value indicating the byte-alignment
boundary. Legal values are powers of 2 up to 32768.
The following restrictions and limitations apply:
• The __align qualifier cannot be used where the size of the variable alignment is
smaller than the size of the type alignment.
• Not all alignments may be representable in an object file.
• The __align qualifier cannot be applied to the following:
– Individual elements within an aggregate definition.
– Individual elements of an array.
– Variables of incomplete type.
– Aggregates declared but not defined.
– Other types of declarations or definitions, such as a typedef, a function, or an
enumeration.
Examples using the __align qualifier
Applying __align to static or global variables:
// varA is aligned on a 1024-byte boundary and padded with 1020 bytes
int __align(1024) varA;
int main()
{...}
// varB is aligned on a 512-byte boundary and padded with 508 bytes
static int __align(512) varB;
// Error
int __align(128) functionB( );
// Error
typedef int __align(128) T;
// Error
__align enum C {a, b, c};
Applying __align to align and pad aggregate tags without affecting aggregate
members:
// Struct structA is aligned on a 1024-byte boundary
// with size including padding of 1024 bytes.
__align(1024) struct structA
{
int i;
int j;
};
// Union unionA is aligned on a 1024-byte boundary
// with size including padding of 1024 bytes.
__align(1024) union unionA
{
int i;
int j;
};
Applying __align to a structure or union, where the size and alignment of the
aggregate using the structure or union is affected:
// sizeof(struct S) == 128
__align(128) struct S {int i;};
// sarray is aligned on 128-byte boundary with sizeof(sarray) == 1280
struct S sarray[10];
// Error: alignment of variable is smaller than alignment of type
struct S __align(64) svar;
// s2 is aligned on 128-byte boundary with sizeof(s2) == 256
struct S2 {struct S s1; int a;} s2;
Applying __align to an array:
In the following example, only arrayA is aligned on a 64-byte boundary, and
elements within that array are aligned according to the alignment of AnyType.
Padding is applied before the beginning of the array and does not affect the size of
the array member itself.
AnyType __align(64) arrayA[10];
Applying __align where the size of the variable alignment differs from the size of
the type alignment:
__align(64) struct S {int i;};
// Error: alignment of variable is smaller than alignment of type.
struct S __align(32) s1;
// s2 is aligned on 128-byte boundary
struct S __align(128) s2;
// Error
struct S __align(16) s3[10];
// Error
int __align(1) s4;
// Error
__align(1) struct S {int i;};
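__align is specific to XL C/C++, but for the structure case above the GCC/Clang aligned type attribute is expected to yield the same sizes, so the arithmetic in the comments can be checked on another compiler. This is a sketch under that assumption, not an implementation of __align itself; the helper `is_aligned_to` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for __align(128) struct S {int i;}; using the GCC/Clang
   aligned type attribute: sizeof(struct S) is rounded up to 128. */
struct __attribute__((aligned(128))) S { int i; };

struct S sarray[10];                   /* 10 elements x 128 bytes = 1280 */
struct S2 { struct S s1; int a; } s2;  /* 128 + 4, rounded up to 256 */

/* Returns 1 if p lies on a boundary of `align` bytes (a power of 2). */
int is_aligned_to(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```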
The aligned variable attribute
With the aligned variable attribute, you can override the default memory
alignment mode to specify a minimum memory alignment value, expressed as a
number of bytes, for any of the following types of variables:
• Non-aggregate variables
• Aggregate variables (such as structures, classes, or unions)
• Selected member variables
The attribute is typically used to increase the alignment of the given variable.
aligned variable attribute syntax
__attribute__ (( {aligned | __aligned__} [(alignment_factor)] ))
The alignment_factor is the number of bytes, specified as a constant expression that
evaluates to a positive power of 2. You can specify a value up to a maximum of 1
GB. If you omit the alignment factor (and its enclosing parentheses), the compiler
automatically uses 16 bytes. If you specify an alignment factor greater than the
maximum, the compiler uses the default alignment in effect and ignores your
specification.
When you apply the aligned attribute to a member variable in a bit field structure,
the attribute specification is applied to the bit field container. If the default
alignment of the container is greater than the alignment factor, the default
alignment is used.
The aligned attribute can be applied to the following types of variables:
• static vector variables
• auto vector variables
• Aggregate members that have a vector type
The alignment of auto variables is limited to the maximal stack alignment:
• 32 bytes for functions containing vector data types
• 16 bytes for other functions
The aligned attribute cannot be used to decrease the natural alignment of any
type, including vector types. The aligned attribute is ignored with a warning
message when the alignment factor is less than 32 for vector types.
Example
In the following example, the structures first_address and second_address are set
to an alignment of 16 bytes:
struct address {
int street_no;
char *street_name;
char *city;
char *prov;
char *postal_code;
} first_address __attribute__((__aligned__(16))) ;
struct address second_address __attribute__((__aligned__(16))) ;
In the following example, only the members first_address.prov and
first_address.postal_code are set to an alignment of 16 bytes:
struct address {
int street_no;
char *street_name;
char *city;
char *prov __attribute__((__aligned__(16))) ;
char *postal_code __attribute__((__aligned__(16))) ;
} first_address ;
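The effect of the member-level attributes in the second example can be confirmed with offsetof. This sketch reuses the struct address declaration from above (the aligned attribute syntax is the same in GCC and Clang); because the exact offsets depend on the data model (32-bit or 64-bit pointers), it checks only that the attributed members fall on 16-byte multiples. The function name `attributed_members_aligned` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct address {
    int street_no;
    char *street_name;
    char *city;
    char *prov __attribute__((__aligned__(16)));
    char *postal_code __attribute__((__aligned__(16)));
} first_address;

/* Returns 1 if both attributed members start on 16-byte offsets. */
int attributed_members_aligned(void) {
    size_t prov = offsetof(struct address, prov);
    size_t code = offsetof(struct address, postal_code);
    assert(code > prov);  /* members keep their declaration order */
    return prov % 16 == 0 && code % 16 == 0;
}
```

Aligning individual members also raises the alignment of the enclosing structure to at least 16 bytes, which is why first_address itself ends up on a 16-byte boundary.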
The aligned type attribute
With the aligned type attribute, you can override the default alignment mode to
specify a minimum alignment value, expressed as a number of bytes, for a
structure, class, union, enumeration, or other user-defined type created in a
typedef declaration. The aligned attribute is typically used to increase the
alignment of any variables declared of the type to which the attribute applies.
aligned type attribute syntax
__attribute__ (( {aligned | __aligned__} [(alignment_factor)] ))
The alignment_factor is the number of bytes, specified as a constant expression that
evaluates to a positive power of 2. You can specify a value up to a maximum of
1048576 bytes. If you omit the alignment factor (and its enclosing parentheses), the
compiler automatically uses 16 bytes. If you specify an alignment factor greater
than the maximum, the attribute specification is ignored, and the compiler uses the
default alignment in effect.
The alignment value that you specify is applied to all instances of the type. Also,
the alignment value applies to the variable as a whole; if the variable is an
aggregate, the alignment value applies to the aggregate as a whole, not to the
individual members of the aggregate.
The aligned attribute cannot be used to decrease the natural alignment of any
type, including vector types. The aligned attribute is ignored with a warning when
the alignment factor is less than 32 for vector types.
Example
In all of the following examples, the aligned attribute is applied to the structure
type A. Where a is declared as a variable of type A, as in the second example, it also
receives the alignment specification, as do any other instances declared of type A.
struct __attribute__((__aligned__(8))) A {};
struct __attribute__((__aligned__(8))) A {} a;
typedef struct __attribute__((__aligned__(8))) A {} a;
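The following is a minimal sketch of these rules. It assumes a GCC- or Clang-compatible compiler that accepts the same __attribute__ syntax. The member c, the tag B_tag, the typedef name B, and the helper functions are hypothetical. The examples above use empty structures, which are a GNU extension in C, so this sketch gives each structure a char member instead. It checks that the type, a variable of the type, and a typedef name for the type all report 8-byte alignment.

```c
#include <stddef.h>

/* Hypothetical type: the attribute raises the alignment of a struct whose
   only member is a char (natural alignment 1) to 8 bytes. */
struct __attribute__((__aligned__(8))) A { char c; };

/* A variable of type A receives the alignment of its type. */
struct A a;

/* In a typedef, the name B denotes the aligned structure type. */
typedef struct __attribute__((__aligned__(8))) B_tag { char c; } B;

size_t alignment_of_type_A(void)     { return __alignof__(struct A); }
size_t alignment_of_variable_a(void) { return __alignof__(a); }
size_t alignment_of_typedef_B(void)  { return __alignof__(B); }

/* The size is padded up to a multiple of the alignment. */
size_t size_of_type_A(void)          { return sizeof(struct A); }
```

Because the size is rounded up to the alignment, an array of struct A places every element on an 8-byte boundary.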
The packed variable attribute
The packed variable attribute allows you to override the default alignment mode and
reduce the alignment of all members, or of selected members, of an aggregate to the
smallest possible alignment: one byte for a member and one bit for a bit field
member.
When applied to an aggregate member that has a vector type, the packed attribute
reduces that member's alignment to one byte.
Note: The compiler does not generate warnings if the vector members are not
aligned on 32-byte boundaries.
packed variable attribute syntax
__attribute__((packed))
__attribute__((__packed__))
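The sketch below compares a default layout with one in which a single member carries the packed variable attribute. It assumes a GCC- or Clang-compatible compiler, and the structure and function names are hypothetical. With the attribute, the int member moves from its natural boundary to offset 1.

```c
#include <stddef.h>

/* Default layout: the int member is padded up to the natural alignment of int. */
struct plain {
    char tag;
    int  value;
};

/* Variable attribute on one member: only 'value' is reduced to 1-byte alignment. */
struct packed_member {
    char tag;
    int  value __attribute__((__packed__));
};

size_t offset_in_plain(void)  { return offsetof(struct plain, value); }
size_t offset_in_packed(void) { return offsetof(struct packed_member, value); }
```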
The packed type attribute
The packed type attribute specifies that the minimum alignment should be used for
the members of a structure, class, union, or enumeration type. For structure, class,
or union types, the alignment is one byte for a member and one bit for a bit field
member. For enumeration types, the alignment is the smallest size that will
accommodate the range of values in the enumeration. All members of all instances of
that type will use the minimum alignment.
When applied to an aggregate member that has a vector type, the packed attribute
reduces that member's alignment to one byte.
Note: The compiler does not generate warnings if the vector members are not
aligned on 32-byte boundaries.
packed type attribute syntax
__attribute__((packed))
__attribute__((__packed__))
Unlike the aligned type attribute, the packed type attribute is not allowed in a
typedef declaration.
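The following minimal sketch applies the packed type attribute to a structure tag, not to a typedef, in keeping with the restriction above. It assumes a GCC- or Clang-compatible compiler, and every type and function name is hypothetical. The packed structure has no padding between members. The packed enumeration uses the smallest type that holds its values.

```c
#include <stddef.h>

/* Type attribute: every member of every instance uses minimum alignment. */
struct __attribute__((__packed__)) packed_record {
    char   tag;
    int    value;
    double ratio;
};

/* Same members with the default layout, for comparison. */
struct plain_record {
    char   tag;
    int    value;
    double ratio;
};

/* On an enumeration, packed selects the smallest size that holds all values. */
enum __attribute__((__packed__)) level { LOW, MEDIUM, HIGH };

size_t packed_record_size(void) { return sizeof(struct packed_record); }
size_t plain_record_size(void)  { return sizeof(struct plain_record); }
size_t level_size(void)         { return sizeof(enum level); }
```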
Chapter 7. Quad vector usage
This section describes how the quad vector type is integrated into the XL C/C++
compiler.
Pointer arithmetic
You can perform a limited number of arithmetic operations on pointers. These
operations are:
- Increment and decrement
- Addition and subtraction
- Comparison
- Assignment
The increment (++) operator increases the value of a pointer by the size of the data
object the pointer refers to. For example, if the pointer refers to the second element
in an array, the ++ makes the pointer refer to the third element in the array.
The decrement (--) operator decreases the value of a pointer by the size of the
data object the pointer refers to. For example, if the pointer refers to the second
element in an array, the -- makes the pointer refer to the first element in the array.
You can add an integer to a pointer but you cannot add a pointer to a pointer.
If the pointer p points to the first element in an array, the following expression
causes the pointer to point to the third element in the same array:
p = p + 2;
If you have two pointers that point to the same array, you can subtract one pointer
from the other. This operation yields the number of elements in the array that
separate the two addresses that the pointers refer to.
You can compare two pointers with the following operators: ==, !=, <, >, <=,
and >=.
Relational comparisons (<, >, <=, and >=) between pointers are defined only when
the pointers point to elements of the same array. Pointer comparisons using the ==
and != operators can be performed even when the pointers point to elements of
different arrays.
You can assign to a pointer the address of a data object, the value of another
compatible pointer, or the NULL pointer.
IBM extension: Pointer arithmetic is defined for pointers to vector types. Given:
vector4double *v;
the expression v + 1 represents a pointer to the vector that immediately follows the
one v points to.
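Because vector4double is available only in XL C/C++ for Blue Gene/Q, this sketch assumes a GCC- or Clang-compatible compiler. As a stand-in, it uses the hypothetical type v4d, a GCC/Clang generic vector of four doubles. The sketch shows that pointer arithmetic on a vector pointer moves in whole-vector steps of 32 bytes, and that subtracting two vector pointers counts vectors rather than bytes.

```c
#include <stddef.h>

/* Stand-in for vector4double (assumption: GCC/Clang generic vector extension). */
typedef double v4d __attribute__((vector_size(32)));

/* v + 1 advances by one whole vector (sizeof(v4d) bytes), not by one double. */
ptrdiff_t bytes_to_next_vector(v4d *v)
{
    return (char *)(v + 1) - (char *)v;
}

/* Subtracting two pointers into the same array counts whole vectors. */
ptrdiff_t vectors_between(const v4d *first, const v4d *last)
{
    return last - first;
}
```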
Type conversions
An expression of a given type is implicitly converted when it is used in the
following situations:
- As an operand of an arithmetic or logical operation.
- As a condition in an if statement or an iteration statement (such as a for loop).
  The expression will be converted to a Boolean (or an integer in C89).
- In a switch statement. The expression is converted to an integral type.
- When an assignment is made to an lvalue that has a different type than the
  assigned value.
- As an initialization. This includes the following cases:
  – A function is provided an argument value that has a different type than the
    parameter.
  – The value specified in the return statement of a function has a different type
    from the defined return type for the function.
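The following small C sketch touches three of the situations listed above: an argument passed to a parameter of another type, a return value converted to the declared return type, and a condition tested against zero. The function names are hypothetical.

```c
/* Hypothetical helper with a double parameter and a double result. */
static double half(double x)
{
    return x / 2;
}

int truncated_half(int n)
{
    /* Argument n: int -> double parameter.
       Result of half(): double -> int return type (truncates toward zero). */
    return half(n);
}

int is_nonzero(double d)
{
    if (d)          /* condition: d is compared against zero */
        return 1;
    return 0;
}
```

For example, truncated_half(7) computes half(7.0) == 3.5 and then returns 3.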
In C, the implicit conversion result is an rvalue.
In C++, the implicit conversion result belongs to one of the following value
categories, depending on the type of the converted expression:
- An lvalue if the type is an lvalue reference type or (C++0x) an rvalue reference
  to a function type
- (C++0x) An xvalue if the type is an rvalue reference to an object type
- An rvalue (in C++0x terms, a prvalue) in other cases
You can perform explicit type conversions using a cast expression, as described in
Cast expressions.
Vector type casts (IBM extension)
In the Blue Gene/Q environment, vector types cannot be converted to other vector
types or to other compiler intrinsic data types.
Overload resolution
The process of selecting the most appropriate overloaded function or operator is
called overload resolution.
Suppose that f is an overloaded function name. When you call the overloaded
function f(), the compiler creates a set of candidate functions. This set of functions
includes all of the functions named f that can be accessed from the point where
you called f(). The compiler may include as a candidate function an alternative
representation of one of those accessible functions named f to facilitate overload
resolution.
After creating a set of candidate functions, the compiler creates a set of viable
functions. This set of functions is a subset of the candidate functions. The number
of parameters of each viable function agrees with the number of arguments you
used to call f().
The compiler chooses the best viable function, the function declaration that the C++
runtime environment will use when you call f(), from the set of viable functions.
The compiler does this by comparing implicit conversion sequences. An implicit
conversion sequence is the sequence of conversions required to convert an argument
in a function call to the type of the corresponding parameter in a function
declaration.
The implicit conversion sequences are ranked; some implicit conversion sequences
are better than others. The best viable function is the one whose parameters all
have either better or equal-ranked implicit conversion sequences than all of the
other viable functions. The compiler rejects a program in which it finds more than
one best viable function. Implicit conversion sequences are described in more
detail in Implicit conversion sequences.
When a variable length array is a function parameter, the leftmost array dimension
does not distinguish functions among candidate functions. In the following, the
second definition of f is not allowed because void f(int []) has already been
defined.
void f(int a[*]) {}
void f(int a[5]) {} // illegal
However, array dimensions other than the leftmost in a variable length array do
differentiate candidate functions when the variable length array is a function
parameter. For example, the overload set for function f might comprise the
following:
void f(int a[][5]) {}
void f(int a[][4]) {}
void f(int a[][g]) {}  // assume g is a global int
but cannot include
void f(int a[][g2]) {} // illegal, assuming g2 is a global int
because having candidate functions with second-level array dimensions g and g2
creates ambiguity about which function f should be called: neither g nor g2 is
known at compile time.
IBM extension: If you are using vector data types, the argument in the call must be
of exactly the same vector type as the corresponding parameter.
Example:
int f(vector4double)  // (function 1)
{
    return 1;
}
int f(double)         // (function 2)
{
    return 2;
}
For f((vector4double)(1.0)), overload resolution selects function 1 as the best
candidate function. If function 1 is not in the candidate function list, the compiler
does not find a matching candidate function.
You can override an exact match by using an explicit cast. In the following
example, the second call to f() matches with f(void*):
void f(int) { }
void f(void*) { }
int main() {
    f(0xaabb);          // matches f(int)
    f((void*) 0xaabb);  // matches f(void*)
}
Parameter declarations
The function declarator includes the list of parameters that can be passed to the
function when it is called by another function, or by itself.
In C++, the parameter list of a function is referred to as its signature. The
name and signature of a function uniquely identify it. As the word itself suggests,
the function signature is used by the compiler to distinguish among the different
instances of overloaded functions.
Function parameter declaration syntax
( )
( parameter_list )
( parameter_list , ... )
parameter_list:
parameter
parameter_list , parameter
parameter:
register(optional) type_specifier declarator(optional)
In C++, an empty argument list in a function declaration or definition indicates a
function that takes no arguments. To explicitly indicate that a function does not
take any arguments, you can declare the function in two ways: with an empty
parameter list, or with the keyword void:
int f(void);
int f();
In C, an empty argument list in a function definition indicates that a function
takes no arguments. An empty argument list in a function declaration indicates
that a function may take any number or type of arguments. Thus,
int f()
{
...
}
indicates that function f takes no arguments. However,
int f();
simply indicates that the number and type of parameters are not known. To
explicitly indicate that a function does not take any arguments, you can replace the
argument list with the keyword void.
int f(void);
An ellipsis at the end of the parameter specifications is used to specify that a
function has a variable number of parameters. The number of arguments in a call is
equal to, or greater than, the number of parameter specifications.
int f(int, ...);
In C++, the comma before the ellipsis is optional. In addition, a parameter
declaration is not required before the ellipsis.
In C, at least one parameter declaration and a comma before the ellipsis are both
required.
IBM extension: Functions with a variable number of parameters do not accept
parameters that have the vector4double type.
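The following is a minimal C sketch of a function with a variable number of parameters, written with the standard <stdarg.h> macros. The function name and its count-first convention are hypothetical. In C, the named parameter count is required before the ellipsis, as stated above. Following the restriction above, only int arguments are passed through the variable part.

```c
#include <stdarg.h>

/* 'count' is the required named parameter before the ellipsis;
   it tells the function how many int arguments follow. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int k = 0; k < count; k++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes three arguments in the variable part and returns 6.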
Parameter types
In a function declaration, or prototype, the type of each parameter must be
specified. In C++, the type of each parameter must also be specified in the function
definition. In C, if the type of a parameter is not specified in the function
definition, it is assumed to be int.
A variable of a user-defined type may be declared in a parameter declaration, as in
the following example, in which x is declared for the first time:
struct X { int i; };
void print(struct X x);
In C, the user-defined type can also be defined within the parameter declaration.
In C++, the user-defined type cannot be defined within the parameter declaration.
void print(struct X { int i; } x);  // legal in C
void print(struct X { int i; } x);  // error in C++
Parameter names
In a function definition, each parameter must have an identifier. In a function
declaration, or prototype, specifying an identifier is optional. Thus, the following
example is legal in a function declaration:
int func(int,long);
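The following short C sketch pairs the unnamed-parameter prototype above with a definition in which every parameter has an identifier. The parameter names and the function body are hypothetical.

```c
/* Prototype: parameter names are optional. */
int func(int, long);

/* Definition: each parameter must have an identifier. */
int func(int count, long scale)
{
    return (int)(count * scale);
}
```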
The following constraints apply to the use of parameter names in C++ function
declarations:
- Two parameters cannot have the same name within a single declaration.
- If a parameter name is the same as a name outside the function, the name
  outside the function is hidden and cannot be used in the parameter declaration.
In the following example, the third parameter name intersects is meant to have
enumeration type subway_line, but this name is hidden by the name of the first
parameter. The declaration of the function subway() causes a compile-time error,
because subway_line is not a valid type name. The first parameter name
subway_line hides the namespace scope enum type and cannot be used again in
the third parameter.
enum subway_line {yonge, university, spadina, bloor};
int subway(char * subway_line, int stations, subway_line intersects);
Static array indices in function parameter declarations (C only)
Except in certain contexts, an unsubscripted array name (for example, region
instead of region[4]) represents a pointer whose value is the address of the first
element of the array, provided that the array has previously been declared. An
array type in the parameter list of a function is also converted to the corresponding
pointer type. Information about the size of the argument array is lost when the
array is accessed from within the function body.
To preserve this information, which is useful for optimization, you may declare the
index of the argument array using the static keyword. The size expression then
specifies the minimum number of elements that the argument is assumed to point to,
an assumption that the compiler can use for optimizations. This usage of the static
keyword is tightly restricted: the keyword may only appear in the outermost array
type derivation and only in function parameter declarations. If the caller of the
function passes a pointer to fewer elements than specified, the behavior is undefined.
Note: This feature is C99 specific.
The following examples show how the feature can be used.
void foo(int arr [static 10]);       /* arr points to the first of at least 10 ints */
void foo(int arr [const 10]);        /* arr is a const pointer */
void foo(int arr [static const i]);  /* arr points to at least i ints;
                                        i is computed at run time. */
void foo(int arr [const static i]);  /* alternate syntax to previous example */
void foo(int arr [const]);           /* const pointer to int */
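The following minimal C99 sketch shows both forms in function definitions: a constant bound, and a bound computed at run time from an earlier parameter. The function names are hypothetical. The caller is responsible for passing at least as many elements as each bound promises.

```c
#include <stddef.h>

/* The caller promises that arr points to at least 10 ints. */
int sum_first_ten(const int arr[static 10])
{
    int total = 0;
    for (size_t k = 0; k < 10; k++)
        total += arr[k];
    return total;
}

/* The bound can be computed at run time when it names an earlier parameter. */
int sum_first_n(size_t n, const int arr[static n])
{
    int total = 0;
    for (size_t k = 0; k < n; k++)
        total += arr[k];
    return total;
}
```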
Chapter 8. Quad vector operators
The following operators support the quad vector type.
Address operator &
The & (address) operator yields a pointer to its operand. The operand must be an
lvalue, a function designator, or a qualified name. It cannot be a bit field.
In C, it cannot have the storage class register.
If the operand is an lvalue or function, the resulting type is a pointer to the
expression type. For example, if the expression has type int, the result is a pointer
to an object having type int.
If the operand is a qualified name and the member is not static, the result is a
pointer to a member of class and has the same type as the member. The result is
not an lvalue.
If p_to_y is defined as a pointer to an int and y as an int, the following
expression assigns the address of the variable y to the pointer p_to_y :
p_to_y = &y;
IBM extension: The address operator has been extended to handle vector types, provided
that vector support is enabled. The result of the address operator applied to a
vector type can be stored in a pointer to a compatible vector type. The address of a
vector type can be used to initialize a pointer to vector type if both sides of the
initialization have compatible types. A pointer to void can also be initialized with
the address of a vector type.
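The following sketch assumes a GCC- or Clang-compatible compiler, because vector4double exists only in XL C/C++ for Blue Gene/Q. It uses the hypothetical generic vector type v4d as a stand-in. It initializes a pointer of the same vector type and a pointer to void from the address of a vector, and reads an element through the vector pointer.

```c
/* Stand-in for vector4double (assumption: GCC/Clang generic vector extension). */
typedef double v4d __attribute__((vector_size(32)));

static v4d v1 = {1.0, 2.0, 3.0, 4.0};

/* Both sides have compatible types: a pointer to v4d initialized with &v1. */
static v4d *pv1 = &v1;

/* A pointer to void can also be initialized with the address of a vector. */
static void *pvoid = &v1;

double first_element_through_pointer(void)
{
    return (*pv1)[0];
}

int pointers_agree(void)
{
    return (void *)pv1 == pvoid;
}
```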
In C++, the ampersand symbol & is used as a reference declarator in addition to
being the address operator. The meanings are related but not identical.
int target;
int &rTarg = target;  // rTarg is a reference to an integer.
                      // The reference is initialized to refer to target.
void f(int*& p);      // p is a reference to a pointer
If you take the address of a reference, it returns the address of its target. Using the
previous declarations, &rTarg is the same memory address as &target.
In C++, you may take the address of a register variable.
You can use the & operator with overloaded functions only in an initialization or
assignment where the left side uniquely determines which version of the
overloaded function is used.
IBM extension: The address of a label can be taken using the GNU C address operator
&&. The label can thus be used as a value.
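This sketch assumes a compiler that supports the GNU C labels-as-values extension, which both GCC and Clang provide. It stores label addresses in a table and jumps through them with a computed goto. The function, label names, and opcode numbering are hypothetical, and the caller must pass an op of 0, 1, or 2.

```c
/* Hypothetical dispatcher: op must be 0, 1, or 2. */
int apply(int op, int x)
{
    /* &&label yields the address of the label as a void *. */
    static void *const targets[] = { &&do_increment, &&do_double, &&do_negate };

    goto *targets[op];      /* computed goto: jump to the stored address */

do_increment:
    return x + 1;
do_double:
    return x * 2;
do_negate:
    return -x;
}
```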
Indirection operator *
The * (indirection) operator determines the value referred to by the pointer-type
operand. The operand cannot be a pointer to an incomplete type. If the operand
points to an object, the operation yields an lvalue referring to that object. If the
operand points to a function, the result is a function designator in C or, in C++, an
lvalue referring to the object to which the operand points. Arrays and functions are
converted to pointers.
The type of the operand determines the type of the result. For example, if the
operand is a pointer to an int, the result has type int.
Do not apply the indirection operator to any pointer that contains an address that
is not valid, such as NULL. The result is not defined.
If p_to_y is defined as a pointer to an int and y as an int, the expressions:
p_to_y = &y;
*p_to_y = 3;
cause the variable y to receive the value 3.
IBM extension: The indirection operator * has been extended to handle pointers to
vector types, provided that vector support is enabled. A vector pointer should point
to a memory location that has 32-byte alignment. However, the compiler does not
enforce this constraint. Dereferencing a vector pointer maintains the vector type
and its 32-byte alignment. If a program dereferences a vector pointer that does not
contain a 32-byte aligned address, the behavior is undefined. See the following
example:
vector4double v0 = (vector4double) {1., 2., 3., 4.};
vector4double v1;
vector4double *pv1 = &v0;  // pv1 holds the 32-byte aligned address of v0
v1 = *pv1;                 // legal, copies the data pointed to by pv1 into v1
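Because the compiler does not enforce the 32-byte rule, a program may want to check alignment itself before it dereferences a vector pointer. The following sketch assumes a GCC- or Clang-compatible compiler with C11 aligned_alloc. It uses the hypothetical generic vector type v4d as a stand-in for vector4double, and the helper names are also hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for vector4double (assumption: GCC/Clang generic vector extension). */
typedef double v4d __attribute__((vector_size(32)));

/* Nonzero when p lies on a 32-byte boundary. */
int is_vector_aligned(const void *p)
{
    return ((uintptr_t)p % 32) == 0;
}

/* C11 aligned_alloc returns 32-byte aligned storage; the size must be a
   multiple of the alignment, which count * sizeof(v4d) always is. */
v4d *alloc_vectors(size_t count)
{
    return aligned_alloc(32, count * sizeof(v4d));
}

/* Dereferencing copies the whole vector; element 2 is the third double. */
double third_element(const v4d *pv)
{
    v4d copy = *pv;
    return copy[2];
}
```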
The __alignof__ operator (IBM extension)
The __alignof__ operator is a language extension to C99 and Standard C++ that
returns the number of bytes used in the alignment of its operand. The operand can
be an expression or a parenthesized type identifier. If the operand is an expression
representing an lvalue, the number returned by __alignof__ represents the
alignment that the lvalue is known to have. The type of the expression is
determined at compile time, but the expression itself is not evaluated. If the
operand is a type, the number represents the alignment usually required for the
type on the target platform.
The __alignof__ operator may not be applied to the following:
- An lvalue representing a bit field
- A function type
- An undefined structure or class
- An incomplete type (such as void)
__alignof__ operator syntax
__alignof__ unary_expression
__alignof__ ( type-id )
If type-id is a reference or a referenced type, the result is the alignment of the
referenced type. If type-id is an array, the result is the alignment of the array
element type. If type-id is a fundamental type, the result is implementation-defined.
For example, on Blue Gene/Q, __alignof__(long) returns 8.
The operand of __alignof__ can be a vector type, provided that vector support is
enabled. For example,
vector4double v1 = (vector4double) {1., 2., 3., 4.};
vector4double *pv1 = &v1;
__alignof__(v1);             // vector type alignment: 32
__alignof__(&v1);            // address of vector alignment: 8
__alignof__(*pv1);           // dereferenced pointer to vector alignment: 32
__alignof__(pv1);            // pointer to vector alignment: 8
__alignof__(vector4double);  // vector type alignment: 32
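The same queries can be sketched with a compiler you may have at hand. This assumes GCC or Clang and uses the hypothetical generic vector type v4d as a stand-in for vector4double. Both compilers align a 32-byte generic vector to 32 bytes. The pointer result is compared with __alignof__(void *) so that the sketch does not assume a 64-bit target.

```c
#include <stddef.h>

/* Stand-in for vector4double (assumption: GCC/Clang generic vector extension). */
typedef double v4d __attribute__((vector_size(32)));

static v4d v1 = {1.0, 2.0, 3.0, 4.0};
static v4d *pv1 = &v1;

size_t align_of_vector_type(void)    { return __alignof__(v4d); }   /* 32 */
size_t align_of_dereference(void)    { return __alignof__(*pv1); }  /* 32 */
size_t align_of_vector_pointer(void) { return __alignof__(pv1); }   /* pointer alignment */
```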
When the aligned attribute is applied to a vector variable, the value returned by
__alignof__ is the actual alignment of the variable. If the alignment factor in the
aligned attribute is less than the natural alignment of the vector type, the attribute
is ignored and the actual alignment is the natural alignment. For example:
vector4double v1 __attribute__((aligned(4)));  // The aligned attribute is ignored
                                               // because the alignment factor is
                                               // less than the natural alignment
                                               // of v1
int alignment = __alignof__(v1);               // alignment will be 32, not 4
The sizeof operator
The sizeof operator yields the size in bytes of the operand, which can be an
expression or the parenthesized name of a type.
sizeof operator syntax
sizeof expr
sizeof ( type-name )
The result for either kind of operand is not an lvalue, but a constant integer value.
The type of the result is the unsigned integral type size_t defined in the header
file stddef.h.
Except in preprocessor directives, you can use a sizeof expression wherever an
integral constant is required. One of the most common uses for the sizeof operator
is to determine the size of objects that are referred to during storage allocation,
input, and output functions.
Another use of sizeof is in porting code across platforms. You can use the sizeof
operator to determine the size that a data type represents. For example:
sizeof(int);
The sizeof operator applied to a type name yields the amount of memory that can
be used by an object of that type, including any internal or trailing padding.
IBM extension: The operand of the sizeof operator can be a vector variable, a vector
type, or the result of dereferencing a pointer to vector type, provided that vector
support is enabled. In these cases, the return value of sizeof is always 32.
vector4double v1;
vector4double *pv1 = &v1;
sizeof(v1);             // size of vector type: 32
sizeof(&v1);            // size of address of vector: 8
sizeof(*pv1);           // size of dereferenced pointer to vector: 32
sizeof(pv1);            // size of pointer to vector: 8
sizeof(vector4double);  // size of vector type: 32
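The following is a portable sketch of the same sizeof results. It assumes GCC or Clang and uses the hypothetical generic vector type v4d as a stand-in for vector4double. Dividing the vector size by the element size gives the number of elements. The pointer size is compared with sizeof(void *) rather than hard-coded as 8.

```c
#include <stddef.h>

/* Stand-in for vector4double (assumption: GCC/Clang generic vector extension). */
typedef double v4d __attribute__((vector_size(32)));

static v4d v1;
static v4d *pv1 = &v1;

size_t size_of_vector(void)         { return sizeof(v1); }    /* 32 */
size_t size_of_dereference(void)    { return sizeof(*pv1); }  /* 32 */
size_t size_of_vector_pointer(void) { return sizeof(pv1); }   /* pointer size */

/* Number of double elements in one vector: 32 / 8 = 4. */
size_t elements_per_vector(void)    { return sizeof(v4d) / sizeof(double); }
```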
For compound types, results are as follows:
- An array: The result is the total number of bytes in the array. For example, in
  an array with 10 elements, the size is equal to 10 times the size of a single
  element. The compiler does not convert the array to a pointer before evaluating
  the expression.
- A class (C++ only): The result is always nonzero. It is equal to the number of
  bytes in an object of that class, also including any padding required for placing
  class objects in an array.
- A reference (C++ only): The result is the size of the referenced object.
The sizeof operator cannot be applied to:
- A bit field
- A function type
- An undefined structure or class
- An incomplete type (such as void)
The sizeof operator applied to an expression yields the same result as if it had
been applied to only the name of the type of the expression. At compile time, the
compiler analyzes the expression to determine its type. None of the usual type
conversions that occur in the type analysis of the expression are directly
attributable to the sizeof operator. However, if the operand contains operators that
perform conversions, the compiler does take these conversions into consideration
in determining the type. For example, the second line of the following sample
causes the usual arithmetic conversions to be performed. Assuming that a short
uses 2 bytes of storage and an int uses 4 bytes,
short x; ... sizeof (x)       /* the value of sizeof operator is 2 */
short x; ... sizeof (x + 1)   /* value is 4, result of addition is type int */
The expression x + 1 has type int, so sizeof(x + 1) is equivalent to sizeof(int).
The value is also 4 if x has type char, short, or int or any enumeration type.
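The following C sketch shows the integer promotions that take place inside a sizeof operand. It also shows that the operand is only analyzed for its type and is never evaluated, so the increment has no effect. The variable and function names are hypothetical. The results are compared with sizeof(short) and sizeof(int) rather than with the literal 2 and 4.

```c
#include <stddef.h>

static short x;
static char  c;

size_t size_of_short_operand(void) { return sizeof(x); }      /* sizeof(short) */
size_t size_of_promoted_sum(void)  { return sizeof(x + 1); }  /* short promoted to int */
size_t size_of_char_sum(void)      { return sizeof(c + c); }  /* char promoted to int */

/* The operand of sizeof is not evaluated, so counter stays 0. */
int counter_after_sizeof(void)
{
    int counter = 0;
    (void)sizeof(counter++);
    return counter;
}
```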
C++0x: sizeof... is a unary expression operator introduced by the variadic
template feature. This operator accepts an expression that names a parameter pack
as its operand. It then expands the parameter pack and returns the number of
arguments provided for the parameter pack. Consider the following example:
template<typename...T> void foo(T...args){
int v = sizeof...(args);
}
In this example, the variable v is assigned the number of arguments provided for
the parameter pack args.
Notes:
- The operand of the sizeof... operator must be an expression that names a
  parameter pack.
- The operand of the sizeof operator cannot be an expression that names a
  parameter pack or a pack expansion.
For more information, see Variadic templates (C++0x).
The typeof operator (IBM extension)
The typeof operator returns the type of its argument, which can be an expression
or a type. The language feature provides a way to derive the type from an
expression. Given an expression e, __typeof__(e) can be used anywhere a type
name is needed, for example in a declaration or in a cast. The alternate spelling of
the keyword, __typeof__, is recommended.
The typeof operator is extended to accept a vector type as its operand, when
vector support is enabled.
typeof operator syntax
__typeof__ ( expr )
__typeof__ ( type-name )
typeof ( expr )
typeof ( type-name )
A typeof construct itself is not an expression, but the name of a type. A typeof
construct behaves like a type name defined using typedef, although the syntax
resembles that of sizeof.
The following examples illustrate its basic syntax. For an expression e:
int e;
__typeof__(e + 1) j;    /* the same as declaring int j; */
e = (__typeof__(e)) f;  /* the same as casting e = (int) f; */
Using a typeof construct is equivalent to declaring a typedef name. Given
typedef int T[2];
int i[2];
you can write
__typeof__(i) a;
__typeof__(int[2]) a;
__typeof__(T) a;
/* all three constructs have the same meaning */
The behavior of the code is as if you had declared int a[2];.
Examples with vectors:
vector4double v1 = (vector4double) {1., 2., 3., 4.};
__typeof__(v1) w1; // w1 has the vector4double type
__typeof__(vector4double) w2; // w2 has the vector4double type
For a bit field, typeof represents the underlying type of the bit field. For example,
given int m:2;, typeof(m) is int. Since the bit field property is not preserved, n in
typeof(m) n; is the same as int n, but not int n:2.
The typeof operator can be nested inside sizeof and itself. The following
declarations of arr as an array of pointers to int are equivalent:
int *arr[10];                          /* traditional C declaration */
__typeof__(__typeof__ (int *)[10]) a;  /* equivalent declaration */
The typeof operator can be useful in macro definitions where expression e is a
parameter. For example,
#define SWAP(a,b) { __typeof__(a) temp; temp = a; a = b; b = temp; }
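The following is a variant of the SWAP macro above, wrapped in do { } while (0) so that SWAP(a, b); behaves as a single statement, for example after an unbraced if. It assumes a GCC- or Clang-compatible compiler for __typeof__. The helper function names are hypothetical. Like the original, the macro would misbehave if an argument were itself named temp.

```c
/* __typeof__(a) gives temp the same type as the first argument. */
#define SWAP(a, b) do { __typeof__(a) temp = (a); (a) = (b); (b) = temp; } while (0)

void swap_ints(int *p, int *q)          { SWAP(*p, *q); }
void swap_doubles(double *p, double *q) { SWAP(*p, *q); }
```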
Note:
1. The typeof and __typeof__ keywords are supported as follows:
- In C, the __typeof__ keyword is recognized under compilation with the xlc
  invocation command or the -qlanglvl=extc89, -qlanglvl=extc99, or
  -qlanglvl=extended options. The typeof keyword is only recognized under
  compilation with -qkeyword=typeof.
- In C++, the typeof and __typeof__ keywords are recognized by default.
Assignment operators
An assignment expression stores a value in the object designated by the left operand.
There are two types of assignment operators:
- “Simple assignment operator =”
- “Compound assignment operators”
The left operand in all assignment expressions must be a modifiable lvalue. The
type of the expression is the type of the left operand. The value of the expression
is the value of the left operand after the assignment has completed.
In C, the result of an assignment expression is not an lvalue. In C++, the
result of an assignment expression is an lvalue.
All assignment operators have the same precedence and have right-to-left
associativity.
Simple assignment operator =
The simple assignment operator has the following form:
lvalue = expr
The operator stores the value of the right operand expr in the object designated by
the left operand lvalue.
The left operand must be a modifiable lvalue. The type of an assignment operation
is the type of the left operand.
If the left operand is not a class type or a vector type, the right operand is
implicitly converted to the type of the left operand. This converted type will not be
qualified by const or volatile.
If the left operand is a class type, that type must be complete. The copy
assignment operator of the left operand will be called.
If the left operand is an object of reference type, the compiler will assign the value
of the right operand to the object denoted by the reference.
IBM extension: The assignment operator has been extended to permit operands of
vector type. Both sides of an assignment expression must be of the same vector type.
Compound assignment operators
The compound assignment operators consist of a binary operator and the simple
assignment operator. They perform the operation of the binary operator on both
operands and store the result of that operation into the left operand, which must
be a modifiable lvalue.
The following table shows the operand types of compound assignment
expressions:
Operator                    Left operand     Right operand
+= or -=                    Arithmetic       Arithmetic
+= or -=                    Pointer          Integral type
*= and /=                   Arithmetic       Arithmetic
%=                          Integral type    Integral type
<<=, >>=, &=, ^=, and |=    Integral type    Integral type
Note that the expression
a *= b + c
is equivalent to
a = a * (b + c)
and not
a = a * b + c
The following table lists the compound assignment operators and shows an
expression using each operator:
Operator   Example              Equivalent expression
+=         index += 2           index = index + 2
-=         *pointer -= 1        *pointer = *pointer - 1
*=         bonus *= increase    bonus = bonus * increase
/=         time /= hours        time = time / hours
%=         allowance %= 1000    allowance = allowance % 1000
<<=        result <<= num       result = result << num
>>=        form >>= 1           form = form >> 1
&=         mask &= 2            mask = mask & 2
^=         test ^= pre_test     test = test ^ pre_test
|=         flag |= ON           flag = flag | ON
Although the equivalent expression column shows the left operand (from the
example column) twice, the left operand is in effect evaluated only once.
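The following small C sketch illustrates two points made above: a *= b + c multiplies by the whole right operand, and a left operand with a side effect is evaluated only once. The function names are hypothetical.

```c
/* a *= b + c is a = a * (b + c), not a = a * b + c. */
int scale(int a, int b, int c)
{
    a *= b + c;
    return a;
}

/* The left operand arr[(*i)++] is evaluated once, so *i advances by one
   and only the element at the original index changes. */
void add_ten_at(int *arr, int *i)
{
    arr[(*i)++] += 10;
}
```

For example, scale(2, 3, 4) returns 2 * (3 + 4) == 14, not 2 * 3 + 4 == 10.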
In C++, in addition to the table of operand types, an expression is implicitly
converted to the cv-unqualified type of the left operand if it is not of class type.
However, if the left operand is of class type, the class must be complete, and
assignment to objects of the class behaves as a copy assignment operation.
Compound expressions and conditional expressions are lvalues in C++, which
allows them to be a left operand in a compound assignment expression.
IBM extension (C only): When GNU C language features have been enabled,
compound expressions and conditional expressions are allowed as lvalues, provided
that their operands are lvalues. The following compound assignment of the
compound expression (a, b) is legal under GNU C, provided that expression b, or
more generally, the last expression in the sequence, is an lvalue:
(a,b) += 5  /* Under GNU C, this is equivalent to a, (b += 5) */
Vector subscripting operator [ ] (IBM extension)
Access to individual elements of a vector data type is provided through the use of
square brackets, similar to how array elements are accessed. The vector data type
is followed by a set of square brackets containing the position of the element. The
position of the first element is 0. The type of the result is the type of the elements
contained in the vector type.
Example:
vector4double v1 = (vector4double) {1.0, 2.0, 3.0, 4.0};
double d1, d2, d3, d4;
d1 = v1[0];  // d1=1.0
d2 = v1[1];  // d2=2.0
d3 = v1[2];  // d3=3.0
d4 = v1[3];  // d4=4.0
Note: You can also access and manipulate individual elements of vectors with the
following intrinsic functions:
- vec_extract
- vec_insert
- vec_promote
- vec_splats
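The following sketch assumes GCC or Clang and uses the hypothetical generic vector type v4d as a stand-in for vector4double. It reads each element with the subscript operator, with positions running from 0 to 3. The stand-in also accepts a subscripted element on the left of =. The text above documents only access to elements, so treat element assignment on vector4double as an assumption, or use the intrinsics listed above instead.

```c
/* Stand-in for vector4double (assumption: GCC/Clang generic vector extension). */
typedef double v4d __attribute__((vector_size(32)));

/* Reads each element with the subscript operator; positions run 0 to 3. */
double sum_elements(const v4d *v)
{
    double total = 0.0;
    for (int k = 0; k < 4; k++)
        total += (*v)[k];
    return total;
}

/* Element assignment through a subscript (accepted by the GCC/Clang stand-in). */
void set_element(v4d *v, int k, double value)
{
    (*v)[k] = value;
}
```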
Chapter 9. Inline assembly statements (IBM extension)
Under extended language levels, the compiler provides full support for embedded
assembly code fragments among C and C++ source statements. This extension has
been implemented for use in general system programming code, and in the
operating system kernel and device drivers, which were originally developed with
GNU C.
The keyword asm stands for assembly code. When strict language levels are used
in compilation, the C compiler recognizes and ignores the keyword asm in a
declaration. The C++ compiler always recognizes the keyword.
The syntax is as follows:
asm statement syntax (statement in local scope; square brackets enclose optional parts)
asm [volatile] ( code_format_string [: [output] [: [input] [: clobbers]]] )
The keyword asm can also be written as __asm or __asm__.
input:
[modifier] constraint ( C_expression ) [, ...]
output:
modifier constraint ( C_expression ) [, ...]
asm statement syntax (statement in global scope)
asm ( code_format_string )
The keyword asm can also be written as __asm or __asm__.
volatile
The qualifier volatile instructs the compiler to perform only minimal
optimizations on the assembly block. The compiler cannot move any
instructions across the implicit fences surrounding the assembly block. See
Example 1 for detailed usage information.
code_format_string
The code_format_string is the source text of the asm instructions and is a
string literal similar to a printf format specifier.
Operands are referred to in the %integer format, where integer refers to the
sequential number of the input or output operand. See Example 1 for
detailed usage information.
To increase readability, each operand can be given a symbolic name
enclosed in brackets. In the assembler code section, you can refer to each
operand in the %[symbolic_name] format, where the symbolic_name is
referenced in the operand list. You can use any name, including existing C
or C++ symbols, for a symbolic operand, because the symbolic operand
names have no relation to any C or C++ identifiers. However, no two
operands in the same assembly statement can use the same symbolic name.
See Example 2 for detailed usage information.
output
The output consists of zero, one or more output operands, separated by
commas. Each operand consists of a constraint(C_expression) pair. The
output operand must be constrained by the = or + modifier (described
below), and, optionally, by an additional % or & modifier.
input
The input consists of zero, one or more input operands, separated by
commas. Each operand consists of a constraint(C_expression) pair.
clobbers
clobbers is a comma-separated list of register names, each enclosed in double quotes. If an asm instruction updates registers that are not listed in the input or output of the asm statement, those registers must be listed as clobbered registers. The following register names are valid:
r0 to r31
General purpose registers
f0 to f31
Floating-point registers
lr
Link register
ctr
Loop count, decrement and branching register
fpscr
Floating-point status and control register
xer
Fixed-point exception register
cr0 to cr7
Condition registers. Example 3 shows a typical use of condition
registers in the clobbers.
v0 to v31
Vector registers (on selected processors only)
In addition to the register names, cc and memory can also be used in the list of clobbered registers, as follows:
cc
Add cc to the list of clobbered registers if assembler instructions
can alter the condition code register.
memory
Add memory to the list of clobbered registers if assembler
instructions can change a memory location in an unpredictable
fashion. The memory clobber ensures that the data used after the
completion of the assembly statement is valid and synchronized.
However, the memory clobber can result in many unnecessary
reloads, reducing the benefits of hardware prefetching. Thus, the
memory clobber can impose a performance penalty and should be
used with caution. See Example 4 and Example 1 for the detailed
usage information.
modifier
The modifier can be one of the following operators:
=
Indicates that the operand is write-only for this instruction. The
previous value is discarded and replaced by output data. See
Example 5 for detailed usage information.
+
Indicates that the operand is both read and written by the
instruction. See Example 6 for detailed usage information.
&
Indicates that the operand may be modified before the instruction
is finished using the input operands; a register that is used as
input should not be reused here.
%
Declares the instruction to be commutative for this operand and
the following operand. This means that the order of this operand
and the next may be swapped when generating the instruction.
This modifier can be used on an input or output operand, but
cannot be specified on the last operand. See Example 7 for detailed
usage information.
constraint
The constraint is a string literal that describes the kind of operand that is
permitted, one character per constraint. The following constraints are
supported:
b
Use a general register other than zero. Some instructions treat the
designation of register 0 specially, and do not behave as expected if
the compiler chooses r0. For these instructions, the designation of
r0 does not mean that r0 is used. Instead, it means that the literal
value 0 is specified. See Example 8 for detailed usage information.
c
Use the CTR register.
f
Use a floating-point register. See Example 7 for detailed usage
information.
g
Use a general register, memory, or immediate operand. In POWER,
there are no instructions where a register, memory specifier, or
immediate operand can be used interchangeably. However, this
constraint is tolerated where it is possible to do so.
h
Use the CTR or LINK register.
i
Use an immediate integer or string literal operand.
l
Use the LINK register.
m
Use a memory operand supported by the machine. You can use
this constraint for operands of the form D(R), where D is a
displacement and R is a register. See Example 9 for detailed usage
information.
n
Use an immediate integer.
o
Use a memory operand that is offsetable. This means that the
memory operand can be addressed by adding an integer to a base
address. In POWER, memory operands are always offsetable, so
the constraints o and m can be used interchangeably.
r
Use a general register. See Example 5 for detailed usage
information.
s
Use a string literal operand.
v
Use a vector register. In the inline assembly statements, the input
or output operands can be of the vector4double type. To allocate a
register for an operand of the vector4double type, you must use
the v constraint. See Example 10 for detailed usage information.
0, 1, 2, ...
A matching constraint. Allocate the same register for this input operand as for the output operand with the given number.
I, J, K, L, M, N, O, P
Constant values. Fold the expression in the operand and substitute the value into the % specifier. These constraints restrict the value of the operand as follows:
- I: signed 16-bit constant
- J: unsigned 16-bit constant shifted left 16 bits
- K: unsigned 16-bit constant
- L: signed 16-bit constant shifted left 16 bits
- M: unsigned constant greater than 31
- N: unsigned constant that is an exact power of 2
- O: zero
- P: signed constant whose negation is a signed 16-bit constant
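To make the ranges above concrete, the following sketch checks whether a constant satisfies one of these constraint letters. The fits_constraint helper is hypothetical; the exact bounds for J and L assume a 32-bit operand, as in the GCC POWER port, and the compiler performs its own checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether the constant v satisfies the
   constraint letter, following the list above. J and L assume a 32-bit
   operand (an assumption borrowed from the GCC POWER port). */
static bool fits_constraint(char letter, int64_t v)
{
    switch (letter) {
    case 'I': /* signed 16-bit */
        return v >= INT16_MIN && v <= INT16_MAX;
    case 'J': /* unsigned 16-bit shifted left 16 bits: low half is zero */
        return (v & 0xFFFF) == 0 && v >= 0 && v <= 0xFFFF0000LL;
    case 'K': /* unsigned 16-bit */
        return v >= 0 && v <= UINT16_MAX;
    case 'L': /* signed 16-bit shifted left 16 bits */
        return (v & 0xFFFF) == 0 && v >= INT32_MIN && v <= INT32_MAX;
    case 'M': /* unsigned constant greater than 31 */
        return v > 31;
    case 'N': /* exact power of 2 */
        return v > 0 && (v & (v - 1)) == 0;
    case 'O': /* zero */
        return v == 0;
    case 'P': /* negation is a signed 16-bit constant */
        return -v >= INT16_MIN && -v <= INT16_MAX;
    default:
        assert(!"unknown constraint letter");
        return false;
    }
}
```

For instance, the constant 15 used with K in Example 6 satisfies the check, while -1 does not.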
C_expression
The C_expression is a C or C++ expression whose value is used as the
operand for the asm instruction. Output operands must be modifiable
lvalues. The C_expression must be consistent with the constraint specified
on it. For example, if i is specified, the operand must be an integer
constant number.
Note: If pointer expressions are used in input or output, the assembly instructions
should honor the ANSI aliasing rule (see Type-based aliasing for more
information). This means that indirect addressing using values in pointer
expression operands should be consistent with the pointer types; otherwise, you
must disable the -qalias=ansi option during compilation.
Supported and unsupported constructs
Supported constructs
The inline assembly statements support the following constructs:
- All the instruction statements listed in the Assembler Language Reference
- All extended instruction mnemonics
- Label definitions
- Branches to labels
Unsupported constructs
The inline assembly statements do not support the following constructs:
- Pseudo-operation statements, which are assembly statements that begin with a dot (.), such as .function
- Branches between different asm blocks
In addition, some constraints originating from the GNU compiler are not
supported, but are tolerated where it is possible. For example, constraints S and T
are treated as immediates, but the compiler issues a warning message stating that
they are unsupported.
Restrictions on inline assembly statements
The following restrictions are on the use of inline assembly statements:
- The assembler instructions must be self-contained within an asm statement. The asm statement can only be used to generate instructions. All connections to the rest of the program must be established through the output and input operand lists.
- Referencing an external symbol directly, without going through the operand list, is not supported.
- Assembler instructions that require a pair of registers cannot be specified by any constraint and are therefore not supported. For example, you cannot use the f constraint for a long double operand.
- The register file shared between the floating-point scalar registers and the vector registers on POWER7® is not modeled as shared in inline assembly statements. You must specify registers f0-f31 and v0-v31 in the clobbers list. There is no combined x0-x63 naming.
- Operand replacements (such as %0, %1, and so on) can use an optional x before the number or symbolic name to indicate that a VSX register reference must be used. For example, a vector operand %1 allocated to register v0 is replaced with 0 (for use in VMX instructions). The same operand used as %x1 in the assembly text is replaced with 32 (for use in VSX instructions). This applies only to architectures that support the VSX architecture extension, such as POWER7.
Examples of inline assembly statements
Example 1: The following example illustrates the usage of the volatile keyword.
#include <stdio.h>
#include <stdbool.h>   /* needed for bool in C; harmless in C++ */

static inline bool acquireLock(int *lock){
  bool returnvalue = false;
  int lockval;
  asm volatile(
    /*--------a fence here-----*/

    "   0: lwarx %0,0,%2   \n" // Loads the word and reserves; reserves a
                               // memory location for the subsequent stwcx.
                               // instruction.
    "      cmpwi %0,0      \n" // Compares the lock value to 0.
    "      bne- 1f         \n" // If it is 0, you can acquire
                               // the lock. Otherwise, you fail to get
                               // the lock and must try again later.
    "      ori %0,%0,1     \n" // Sets the lock to 1.
    "      stwcx. %0,0,%2  \n" // Tries to conditionally store 1
                               // into the lock word to acquire
                               // the lock.
    "      bne- 0b         \n" // Reservation was lost. Try again.
    "      isync           \n" // Lock acquired. The isync instruction
                               // implements an import barrier to ensure
                               // that the instructions that access the
                               // shared region guarded by this lock are
                               // executed only after they acquire the
                               // lock.
    "      ori %1,%1,1     \n" // Sets the return value for the function
                               // acquireLock to true.
    "   1:                 \n" // Did not get the lock. Will return false.

    /*------a fence here------*/

    : "+r" (lockval),
      "+r" (returnvalue)
    : "r" (lock)               // Lock is the address of the lock in
                               // memory.
    : "cr0"                    // cr0 is clobbered by cmpwi and stwcx.
  );

  return returnvalue;
}

int main()
{
  int myLock = 0;              // 0 means that the lock is free.
  if(acquireLock(&myLock)){
    printf("got it!\n");
  }else{
    printf("someone else got it\n");
  }
  return 0;
}
In this example, %0 refers to the first operand "+r"(lockval), %1 refers to the
second operand "+r"(returnvalue), and %2 refers to the third operand "r"(lock).
The assembly statement uses a lock to control access to the shared storage; no
instruction can access the shared storage before acquiring the lock.
The volatile keyword implies fences around the assembly instruction group, so
that no assembly instructions can be moved out of or around the assembly block.
Without the volatile keyword, the compiler can move the instructions around for
optimization. This might cause some instructions to access the shared storage
without acquiring the lock.
It is unnecessary to use the memory clobber in this assembly statement, because the
instructions do not modify memory in an unexpected way. If you use the memory
clobber, the program is still functionally correct. However, the memory clobber
results in many unnecessary reloads, imposing a performance penalty.
Example 2: The following example illustrates the use of the symbolic names for
input and output operands.
int a ;
int b = 1, c = 2, d = 3 ;
__asm("   addc %[result], %[first], %[second]"
      : [result] "=r" (a)
      : [first]  "r"  (b),
        [second] "r"  (d)
);
In this example, %[result] refers to the output operand variable a, %[first] refers
to the input operand variable b, and %[second] refers to the input operand variable
d.
Example 3: The following example shows a typical use of condition registers in the
clobbers.
asm ("   add. %0,%1,%2 \n"
     : "=r" (c)
     : "r" (a),
       "r" (b)
     : "cr0"
);
In this example, apart from the registers listed in the input and output of the
assembly statement, the add. instruction also affects the condition register field 0.
Therefore, you must inform the compiler about this by adding cr0 to the clobbers.
Example 4: The following example shows the usage of the memory clobber.
asm volatile ("   dcbz 0, %0 \n"
     :                 /* no output operands */
     : "r" (b)         /* b holds an address inside the block to clear */
     : "memory"
);
In this example, the dcbz instruction zeroes a data cache block, which might change variables stored in that block. The compiler has no way to know which variables have been changed. Therefore, the compiler assumes that all data might be aliased with the memory changed by that instruction.
As a result, everything that is needed must be reloaded from memory after the
completion of the assembly statement. The memory clobber ensures program
correctness at the expense of program performance, because the compiler might
reload data that had nothing to do with the assembly statement.
Example 5: The following example shows the usage of the = modifier and the r
constraint.
int a ;
int b = 100 ;
int c = 200 ;
asm("   add %0, %1, %2"
    : "=r" (a)
    : "r" (b),
      "r" (c)
);
The add instruction adds the contents of two general purpose registers. The %0, %1,
and %2 operands are substituted by the C expressions in the output/input operand
fields.
The output operand uses the = modifier to indicate that a modifiable operand is
required; it uses the r constraint to indicate that a general purpose register is
required. Likewise, the r constraint in the input operands indicates that general
purpose registers are required. Within these restrictions, the compiler is free to
choose any registers to substitute for %0, %1, and %2.
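Note: The add instruction reads r0 as an ordinary register, so the r constraint is safe here. Some instructions, however, such as addi and the base register operand of load and store instructions, interpret r0 as the literal value 0. If the compiler chooses r0 for such an operand, the instruction yields an unexpected result. To prevent the compiler from choosing r0 for that operand, use the b constraint.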
Example 6: The following example shows the usage of the + modifier and the K
constraint.
asm("   addi %0,%0,%2"
    : "+r" (a)
    : "r" (a),
      "K" (15)
);
This assembly statement adds operand %0 and operand %2, and writes the result to
operand %0. The output operand uses the + modifier to indicate that operand %0
can be read and written by the instruction. The K constraint indicates that the value
loaded to operand %2 must be an unsigned 16-bit constant value.
Example 7: The following example shows the usage of the % modifier and the f
constraint.
asm("   fadd %0, %1, %2"
    : "=f" (c)
    : "%f" (a),
      "f" (b)
);
This assembly statement adds operands a and b and writes the result to operand c. The % modifier indicates that operands a and b can be swapped if the compiler can generate better code by doing so. Each operand has the f constraint, which indicates that a floating-point register is required.
Example 8: The following example shows the usage of the b constraint.
char res[8]={'a','b','c','d','e','f','g','h'};
char a='y';
int index=7;
asm ("   stbx %0,%1,%2 \n"
     :
     : "r" (a),
       "b" (index),
       "r" (res)
);
In this example, the b constraint instructs the compiler to choose a general register
other than r0 for the input operand %1. The result string of this program is
abcdefgy. However, if you use the r constraint and the compiler chooses r0 for %1,
this instruction produces an incorrect result string ybcdefgh. For instructions that
treat the designation of r0 specially, it is therefore important to denote the input
operands with the b constraint.
Example 9: The following example shows the usage of the m constraint.
asm ("   stb %1,%0 \n"
     : "=m" (res)
     : "r" (a)
);
In this example, the syntax of the instruction stb is stb RS,D(RA), where D is a displacement and RA is a register. The effective address is calculated from D(RA) as D + (RA). You do not need to manually construct effective addresses by specifying the register and displacement separately.
You can use a single constraint m or o to refer to the two operands in the
instruction, regardless of what the correct offset should be and whether it is an
offset off the stack or off the TOC (Table of Contents). This allows the compiler to
choose the right register (r1 for an automatic variable, for instance) and apply the
right displacement automatically.
Example 10: The following example shows the usage of the v constraint.
vector4double rv, av, bv, cv;
...
__asm("   qvfadd 4, %1, %2 \n"
      "   qvfadd %0, 4, %3 \n"
      /* output register */
      : "=v" (rv)
      /* input registers */
      : "v" (av), "v" (bv), "v" (cv)
      /* clobbered register */
      : "f4"
);
In this example, the inline assembly statement adds the operands av, bv, and cv and writes the result to the rv operand. A temporary vector register, register 4, is used to store the sum of the operands av and bv.
The v constraint instructs the compiler to allocate a vector register for each of the av, bv, cv, and rv operands. The QPX registers and the floating-point scalar registers are physically the same registers, so it is sufficient to list the altered floating-point registers (here, f4) as the clobbered registers.
Chapter 10. Vector built-in functions
Individual elements of vectors can be accessed by using the Quad Processing
Extension (QPX) built-in functions. This section provides an alphabetical reference
to the QPX built-in functions. You can use these functions to manipulate vectors.
You must specify appropriate compiler options for your architecture when you use
the built-in functions.
This section uses the following pseudocode notation to represent function syntax:
d=func_name(a, b, c)
In the description:
- d represents the return value of the function.
- a, b, and c represent the arguments of the function.
- func_name is the name of the function.
For example, the syntax for the function vector4double vec_add(vector4double,
vector4double); is represented by d=vec_add(a, b).
Some built-in functions depend on the value of the floating-point status and
control register (FPSCR). For information on the FPSCR, see FPSCR functions.
Floating-point operands for logical functions
In the quad vector logical functions, such as vec_and, floating-point operands are
interpreted in the following ways:
- Any value that is greater than or equal to zero (including both positive zero and negative zero) is interpreted as the logical value true.
- Any value that is less than zero is interpreted as the logical value false.
- NaN is interpreted as false.
In the result values, floating-point boolean values are as follows:
- true is 1.0.
- false is -1.0.
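The interpretation rules above can be written as a scalar model. In this sketch, fp_to_bool, bool_to_fp, and and_model are hypothetical helpers, and and_model assumes that vec_and combines its operands element by element with a logical AND under these rules; it is not the QPX implementation.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Values >= 0.0 (including -0.0) are true; negative values and NaN are false. */
static bool fp_to_bool(double x)
{
    if (isnan(x))
        return false;
    return x >= 0.0;   /* -0.0 >= 0.0 holds, so negative zero is true */
}

/* In result values, true is encoded as 1.0 and false as -1.0. */
static double bool_to_fp(bool b)
{
    return b ? 1.0 : -1.0;
}

/* Hypothetical element-wise model of a logical AND over four elements. */
static void and_model(const double a[4], const double b[4], double d[4])
{
    for (int i = 0; i < 4; i++)
        d[i] = bool_to_fp(fp_to_bool(a[i]) && fp_to_bool(b[i]));
}
```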
Load and store functions
With the load and store functions, you can load quad vectors from memory and
store them to memory.
vec_ld, vec_lda
Purpose
Loads a vector from the given memory address.
Syntax
d=vec_ld(a, b)
d=vec_lda(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d             | a    | b
vector4double | long | long*, unsigned long*, long long*, unsigned long long*,
              |      | float*, _Complex float*, double*, _Complex double*
Result value
The effective address (EA) is the sum of a and b. The effective address is truncated
to an n-byte alignment depending on the type of b as shown in the following table.
The result is the content of the n bytes of memory starting at the effective address.
Type of b                                              | n
long*, unsigned long*, long long*, unsigned long long* | 32
float*, _Complex float*                                | 16
double*, _Complex double*                              | 32
vec_lda generates an exception (SIGBUS) if the effective address is not aligned to
the appropriate memory boundary indicated in the table.
If b is a pointer to a variable of the single-precision floating-point type or
single-precision complex type, the values loaded from memory are converted to
double precision before being saved to the result value.
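The effective-address truncation and the per-type conversions described above can be modeled in portable C. This sketch is not the vec_ld built-in: truncate_ea and the ld_model_* functions are hypothetical, a is treated as a byte offset, and the vector result is modeled as an array of four doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncates an address down to a multiple of n (n must be a power of two). */
static uintptr_t truncate_ea(uintptr_t ea, uintptr_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ea & ~(n - 1);
}

/* b of type long* (or another 8-byte integer pointer type): n = 32.
   The doublewords are copied unchanged, so their bit patterns become
   the elements of d. */
static void ld_model_bits(long a, const uint64_t *b, double d[4])
{
    uintptr_t ea = truncate_ea((uintptr_t)b + (uintptr_t)a, 32);
    memcpy(d, (const void *)ea, 4 * sizeof(double));
}

/* b of type double*: n = 32, four 8-byte elements. */
static void ld_model_double(long a, const double *b, double d[4])
{
    uintptr_t ea = truncate_ea((uintptr_t)b + (uintptr_t)a, 32);
    memcpy(d, (const void *)ea, 4 * sizeof(double));
}

/* b of type float*: n = 16, four 4-byte elements, each converted
   to double precision. */
static void ld_model_float(long a, const float *b, double d[4])
{
    uintptr_t ea = truncate_ea((uintptr_t)b + (uintptr_t)a, 16);
    float tmp[4];
    memcpy(tmp, (const void *)ea, sizeof tmp);
    for (int i = 0; i < 4; i++)
        d[i] = (double)tmp[i];
}
```

For example, with a 32-byte-aligned buffer buf, ld_model_double(8, buf, d) reads from buf itself, because buf + 8 is truncated back to the 32-byte boundary.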
Formula
The following table shows the formulas depending on the type of b.
Type of b                          | Formula
long*, unsigned long*, long long*, | d[0]=Memory[EA]
unsigned long long*                | d[1]=Memory[EA+8]
                                   | d[2]=Memory[EA+16]
                                   | d[3]=Memory[EA+24]
float*, _Complex float*            | d[0]=(double) Memory_SP[EA]
                                   | d[1]=(double) Memory_SP[EA+4]
                                   | d[2]=(double) Memory_SP[EA+8]
                                   | d[3]=(double) Memory_SP[EA+12]
double*, _Complex double*          | d[0]=Memory[EA]
                                   | d[1]=Memory[EA+8]
                                   | d[2]=Memory[EA+16]
                                   | d[3]=Memory[EA+24]
Note: Memory_SP[] is a single-precision floating-point array.
Example
Type of b                       | Memory values                           | d
long*, unsigned long*,          | 0x4024000000000000, 0x4034000000000000, | (10.0, 20.0, 30.0, 40.0)
long long*, unsigned long long* | 0x403E000000000000, 0x4044000000000000  |
float*                          | 10.0f, 20.0f, 30.0f, 40.0f              | (10.0, 20.0, 30.0, 40.0)
_Complex float*                 | (10.0f, 20.0f) (30.0f, 40.0f)           | (10.0, 20.0, 30.0, 40.0)
double*                         | 10.0, 20.0, 30.0, 40.0                  | (10.0, 20.0, 30.0, 40.0)
_Complex double*                | (10.0, 20.0) (30.0, 40.0)               | (10.0, 20.0, 30.0, 40.0)
vec_ldia, vec_ldiaa
Purpose
Loads a vector from four 4-byte signed integer values at the given memory
address, with sign extension to 8-byte signed integer values.
Syntax
d=vec_ldia(a, b)
d=vec_ldiaa(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d             | a    | b
vector4double | long | int*
Result value
The effective address (EA) is the sum of a and b. The effective address is truncated
to a 16-byte alignment. The contents of the 16 bytes starting at the effective address
are loaded from memory. They are then converted from four 4-byte signed integer
values to four 8-byte signed integer values before being saved in the result value.
vec_ldiaa generates an exception (SIGBUS) if the effective address is not aligned to
a 16-byte memory boundary.
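The following sketch models the sign extension performed by vec_ldia in portable C. It is an illustration under stated assumptions, not the built-in: ldia_model is hypothetical, int is assumed to be 4 bytes (written as int32_t), and the doublewords of the result are shown as int64_t values rather than as the bit patterns held in a vector4double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vec_ldia: EA = a + b truncated to 16 bytes,
   four 4-byte signed integers read and sign-extended to 8 bytes. */
static void ldia_model(long a, const int32_t *b, int64_t d[4])
{
    assert(b != NULL);
    uintptr_t ea = ((uintptr_t)b + (uintptr_t)a) & ~(uintptr_t)15;
    int32_t words[4];
    memcpy(words, (const void *)ea, sizeof words);
    for (int i = 0; i < 4; i++)
        d[i] = (int64_t)words[i];   /* sign extension: -20 stays -20 */
}
```

The upper 32 bits of a negative element are filled with ones, which is why vec_cfid can later convert each doubleword directly to the matching floating-point value.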
Formula
d[0] = (long) Memory_4B[EA]
d[1] = (long) Memory_4B[EA+4]
d[2] = (long) Memory_4B[EA+8]
d[3] = (long) Memory_4B[EA+12]
Note: Memory_4B[] is a 4-byte signed integer array.
Example
Memory values: (10, -20, 30, -40)
Converting the result values d to IEEE floating-point numbers with d2 = vec_cfid(d) gives:
d2: (10.0, -20.0, 30.0, -40.0)
vec_ldiz, vec_ldiza
Purpose
Loads a vector from four 4-byte unsigned integer values at the given memory
address, with zero extension to 8-byte unsigned integer values.
Syntax
d=vec_ldiz(a, b)
d=vec_ldiza(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d             | a    | b
vector4double | long | unsigned*
Result value
The effective address (EA) is the sum of a and b. The effective address is truncated
to a 16-byte alignment. The contents of the 16 bytes starting at the effective address
are loaded from memory. Each of their four 4-byte integer values is extended with
zeros to fill 8-byte integer values before being saved in the result value.
vec_ldiza generates an exception (SIGBUS) if the effective address is not aligned to
a 16-byte memory boundary.
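A matching sketch for vec_ldiz shows the difference from vec_ldia: the same 32-bit pattern 0xFFFFFFEC that vec_ldia would sign-extend to -20 is zero-extended here to 4294967276. The ldiz_model function is hypothetical, unsigned is assumed to be 4 bytes (written as uint32_t), and the result doublewords are shown as uint64_t values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vec_ldiz: EA = a + b truncated to 16 bytes,
   four 4-byte unsigned integers read and zero-extended to 8 bytes. */
static void ldiz_model(long a, const uint32_t *b, uint64_t d[4])
{
    assert(b != NULL);
    uintptr_t ea = ((uintptr_t)b + (uintptr_t)a) & ~(uintptr_t)15;
    uint32_t words[4];
    memcpy(words, (const void *)ea, sizeof words);
    for (int i = 0; i < 4; i++)
        d[i] = (uint64_t)words[i];   /* zero extension: high 32 bits are 0 */
}
```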
Formula
d[0] = (unsigned long) Memory_4B[EA]
d[1] = (unsigned long) Memory_4B[EA+4]
d[2] = (unsigned long) Memory_4B[EA+8]
d[3] = (unsigned long) Memory_4B[EA+12]
Note: Memory_4B[] is a 4-byte integer array.
Example
Memory values: (10, 20, 30, 40)
Converting the result values d to IEEE floating-point numbers with d2 = vec_cfid(d) gives:
d2: (10.0, 20.0, 30.0, 40.0)
vec_lds, vec_ldsa
Purpose
Loads a vector from a single floating-point or complex value at the given memory
address.
Syntax
d=vec_lds(a, b)
d=vec_ldsa(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d             | a    | b
vector4double | long | double* (only for vec_lds), float* (only for vec_lds),
              |      | _Complex double*, _Complex float*
Result value
The effective address (EA) is the sum of a and b. If b is a pointer to a complex
value, the effective address is truncated to an n-byte alignment depending on the
type of b as shown in the following table. The loaded value or complex value is
replicated to fill the result.
Type of b        | n
_Complex double* | 16
_Complex float*  | 8
vec_ldsa generates an exception (SIGBUS) if the effective address is not aligned to
the appropriate memory boundary indicated in the table.
If b is a pointer to a variable of the single-precision floating-point type or
single-precision complex type, the values loaded from memory are converted to
double precision before being saved to the result value.
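The replication rules for vec_lds can be sketched in the same way. This is a hypothetical portable model, not the built-in: lds_model_double and lds_model_cfloat are illustrative names, only two of the four pointer types are shown, the result is an array of four doubles, and the _Complex float case relies on the C rule that a complex value is stored as its real part followed by its imaginary part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* b of type double*: EA = a + b is not truncated; the value is
   replicated to all four elements. */
static void lds_model_double(long a, const double *b, double d[4])
{
    assert(b != NULL);
    double x;
    memcpy(&x, (const void *)((uintptr_t)b + (uintptr_t)a), sizeof x);
    for (int i = 0; i < 4; i++)
        d[i] = x;
}

/* b of type _Complex float*: EA truncated to 8 bytes; the real and
   imaginary parts are converted to double and replicated as
   (re, im, re, im). */
static void lds_model_cfloat(long a, const float _Complex *b, double d[4])
{
    assert(b != NULL);
    uintptr_t ea = ((uintptr_t)b + (uintptr_t)a) & ~(uintptr_t)7;
    float parts[2];   /* parts[0] = real, parts[1] = imaginary */
    memcpy(parts, (const void *)ea, sizeof parts);
    d[0] = d[2] = (double)parts[0];
    d[1] = d[3] = (double)parts[1];
}
```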
Formula
The following table shows the formulas depending on the type of b.
Type of b        | d[0]                   | d[1]                     | d[2]                   | d[3]
double*          | Memory[EA]             | Memory[EA]               | Memory[EA]             | Memory[EA]
float*           | (double) Memory_SP[EA] | (double) Memory_SP[EA]   | (double) Memory_SP[EA] | (double) Memory_SP[EA]
_Complex double* | Memory[EA]             | Memory[EA+8]             | Memory[EA]             | Memory[EA+8]
_Complex float*  | (double) Memory_SP[EA] | (double) Memory_SP[EA+4] | (double) Memory_SP[EA] | (double) Memory_SP[EA+4]
Note: Memory_SP[] is a single-precision floating-point array.
Example
Type of b        | Memory values  | d
double*          | 10.0           | (10.0, 10.0, 10.0, 10.0)
float*           | 10.0f          | (10.0, 10.0, 10.0, 10.0)
_Complex double* | (10.0, 20.0)   | (10.0, 20.0, 10.0, 20.0)
_Complex float*  | (10.0f, 20.0f) | (10.0, 20.0, 10.0, 20.0)
vec_ld2, vec_ld2a
Purpose
Loads a vector from two floating-point values at a given memory address.
Syntax
d=vec_ld2(a, b)
d=vec_ld2a(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d             | a    | b
vector4double | long | double*, float*
Result value
The effective address (EA) is the sum of a and b. The effective address is truncated
to an n-byte alignment depending on the type of b as shown in the following table.
n bytes of memory are loaded from memory starting at the effective address and
replicated to fill the result.
Type of b | n
double*   | 16
float*    | 8
vec_ld2a generates an exception (SIGBUS) if the effective address is not aligned to
the appropriate memory boundary indicated in the table.
If b is a pointer to a variable of the single-precision floating-point type, the values
loaded from memory are converted to double precision before being saved to the
result value.
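A similar hypothetical model of vec_ld2 shows the truncation followed by replication of the loaded pair. The function names are illustrative, not part of the XL API, and the result is modeled as an array of four doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* b of type double*: EA truncated to 16 bytes; the two doubles are
   replicated as (x0, x1, x0, x1). */
static void ld2_model_double(long a, const double *b, double d[4])
{
    assert(b != NULL);
    uintptr_t ea = ((uintptr_t)b + (uintptr_t)a) & ~(uintptr_t)15;
    double pair[2];
    memcpy(pair, (const void *)ea, sizeof pair);
    d[0] = d[2] = pair[0];
    d[1] = d[3] = pair[1];
}

/* b of type float*: EA truncated to 8 bytes; the two floats are
   converted to double and replicated. */
static void ld2_model_float(long a, const float *b, double d[4])
{
    assert(b != NULL);
    uintptr_t ea = ((uintptr_t)b + (uintptr_t)a) & ~(uintptr_t)7;
    float pair[2];
    memcpy(pair, (const void *)ea, sizeof pair);
    d[0] = d[2] = (double)pair[0];
    d[1] = d[3] = (double)pair[1];
}
```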
Formula
The following table shows the formulas depending on the type of b.
Type of b | d[0]                   | d[1]                     | d[2]                   | d[3]
double*   | Memory[EA]             | Memory[EA+8]             | Memory[EA]             | Memory[EA+8]
float*    | (double) Memory_SP[EA] | (double) Memory_SP[EA+4] | (double) Memory_SP[EA] | (double) Memory_SP[EA+4]
Note: Memory_SP[] is a single-precision floating-point array.
Example
Type of b | Memory values | d
double*   | 10.0, 20.0    | (10.0, 20.0, 10.0, 20.0)
float*    | 10.0f, 20.0f  | (10.0, 20.0, 10.0, 20.0)
vec_st, vec_sta
Purpose
Stores a vector to memory at the given address.
Syntax
vec_st(a, b, c)
vec_sta(a, b, c)
Argument types
The following table describes the types of the function arguments.
a             | b    | c
vector4double | long | int*, unsigned*, long*, unsigned long*, long long*,
              |      | unsigned long long*, float*, _Complex float*,
              |      | double*, _Complex double*
Result
The effective address (EA) is the sum of b and c. The effective address is truncated
to an n-byte alignment depending on the type of c as shown in the following table.
The value of a is then stored at the effective address.
Type of c                                              | n
int*, unsigned*                                        | 16
long*, unsigned long*, long long*, unsigned long long* | 32
float*, _Complex float*                                | 16
double*, _Complex double*                              | 32
vec_sta generates an exception (SIGBUS) if the effective address is not aligned to
the appropriate memory boundary indicated in the table.
If c is a pointer to a variable of single-precision floating-point type or
single-precision complex type, the elements of a are converted to single precision
before being saved to memory.
If c is a pointer to a variable of 4-byte integer type, the four low-order bytes of the
elements of a are saved to memory.
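The store direction can be modeled as well. In this hypothetical sketch of vec_st, st_ea and the st_model_* functions are illustrative names. For the int* case the vector elements are modeled as raw 64-bit doubleword patterns (uint64_t), since that store copies the low-order bits rather than converting values, and unsigned is assumed to be 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* EA = b + c, truncated down to an n-byte boundary. */
static uintptr_t st_ea(long b, const void *c, uintptr_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ((uintptr_t)c + (uintptr_t)b) & ~(n - 1);
}

/* c of type int* or unsigned*: n = 16; stores the low-order 32 bits
   (bits 32:63) of each doubleword. */
static void st_model_int(const uint64_t a[4], long b, uint32_t *c)
{
    uint32_t words[4];
    for (int i = 0; i < 4; i++)
        words[i] = (uint32_t)(a[i] & 0xFFFFFFFFu);
    memcpy((void *)st_ea(b, c, 16), words, sizeof words);
}

/* c of type float*: n = 16; each element is converted to single precision. */
static void st_model_float(const double a[4], long b, float *c)
{
    float vals[4];
    for (int i = 0; i < 4; i++)
        vals[i] = (float)a[i];
    memcpy((void *)st_ea(b, c, 16), vals, sizeof vals);
}

/* c of type double*: n = 32; the doublewords are stored unchanged. */
static void st_model_double(const double a[4], long b, double *c)
{
    memcpy((void *)st_ea(b, c, 32), a, 4 * sizeof(double));
}
```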
Formula
The following table shows the formulas depending on the type of c.
Type of c                          | Formula
int*, unsigned*                    | Memory_4B[EA]=a[0] bits 32:63
                                   | Memory_4B[EA+4]=a[1] bits 32:63
                                   | Memory_4B[EA+8]=a[2] bits 32:63
                                   | Memory_4B[EA+12]=a[3] bits 32:63
long*, unsigned long*, long long*, | Memory[EA]=a[0]
unsigned long long*                | Memory[EA+8]=a[1]
                                   | Memory[EA+16]=a[2]
                                   | Memory[EA+24]=a[3]
float*, _Complex float*            | Memory_SP[EA]=(float) a[0]
                                   | Memory_SP[EA+4]=(float) a[1]
                                   | Memory_SP[EA+8]=(float) a[2]
                                   | Memory_SP[EA+12]=(float) a[3]
double*, _Complex double*          | Memory[EA]=a[0]
                                   | Memory[EA+8]=a[1]
                                   | Memory[EA+16]=a[2]
                                   | Memory[EA+24]=a[3]
Notes:
- Memory_SP[] is a single-precision floating-point array.
- Memory_4B[] is a 4-byte integer array.
Examples
Type of c                       | a                        | Memory values
int*, unsigned*                 | (10, 20, 30, 40)         | 10, 20, 30, 40
long*, unsigned long*,          | (10.0, 20.0, 30.0, 40.0) | 0x4024000000000000, 0x4034000000000000,
long long*, unsigned long long* |                          | 0x403E000000000000, 0x4044000000000000
float*                          | (10.0, 20.0, 30.0, 40.0) | 10.0f, 20.0f, 30.0f, 40.0f
_Complex float*                 | (10.0, 20.0, 30.0, 40.0) | (10.0f, 20.0f) (30.0f, 40.0f)
double*                         | (10.0, 20.0, 30.0, 40.0) | 10.0, 20.0, 30.0, 40.0
_Complex double*                | (10.0, 20.0, 30.0, 40.0) | (10.0, 20.0) (30.0, 40.0)
vec_sts, vec_stsa
Purpose
Stores the first element or the first two elements of a quad vector to memory at the
given address.
Syntax
vec_sts(a, b, c)
vec_stsa(a, b, c)
Argument types
The following table describes the types of the function arguments.
a             | b    | c
vector4double | long | double* (only for vec_sts), float* (only for vec_sts),
              |      | _Complex double*, _Complex float*
Result
The effective address (EA) is the sum of b and c. If c is a pointer to a complex
value, the effective address is truncated to an n-byte alignment depending on the
type of c as shown in the following table. The value of a is then stored to the
effective address as follows:
- If c is a pointer to a variable of floating-point type, the first element of a is stored to memory.
- If c is a pointer to a variable of complex type, the first two elements of a are stored to memory.
Type of c        | n
_Complex double* | 16
_Complex float*  | 8
vec_stsa generates an exception (SIGBUS) if the effective address is not aligned to
the appropriate memory boundary indicated in the table.
If c is a pointer to a variable of single-precision floating-point type or
single-precision complex type, the elements of a are converted to single precision
before being saved to memory.
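A hypothetical model of vec_sts for two of the four pointer types shows which elements reach memory. The names sts_model_double and sts_model_cfloat are illustrative, not part of the XL API; the double* and _Complex float* cases are sketched, and the other two follow the formula table in the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* c of type double*: EA = b + c is not truncated; only a[0] is stored. */
static void sts_model_double(const double a[4], long b, double *c)
{
    assert(c != NULL);
    memcpy((void *)((uintptr_t)c + (uintptr_t)b), &a[0], sizeof(double));
}

/* c of type _Complex float*: EA truncated to 8 bytes; a[0] and a[1] are
   converted to single precision and stored as the real and imaginary parts. */
static void sts_model_cfloat(const double a[4], long b, float _Complex *c)
{
    assert(c != NULL);
    uintptr_t ea = ((uintptr_t)c + (uintptr_t)b) & ~(uintptr_t)7;
    float parts[2] = { (float)a[0], (float)a[1] };
    memcpy((void *)ea, parts, sizeof parts);
}
```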
Formula
The following table shows the formulas depending on the type of c.
Type of c        | Formula
double*          | Memory[EA]=a[0]
_Complex double* | Memory[EA]=a[0]
                 | Memory[EA+8]=a[1]
float*           | Memory_SP[EA]=(float) a[0]
_Complex float*  | Memory_SP[EA]=(float) a[0]
                 | Memory_SP[EA+4]=(float) a[1]
Note: Memory_SP[] is a single-precision floating-point array.
Examples
Type of c        | a                        | Memory values
double*          | (10.0, 20.0, 30.0, 40.0) | 10.0
_Complex double* | (10.0, 20.0, 30.0, 40.0) | (10.0, 20.0)
float*           | (10.0, 20.0, 30.0, 40.0) | 10.0f
_Complex float*  | (10.0, 20.0, 30.0, 40.0) | (10.0f, 20.0f)
vec_st2, vec_st2a
Purpose
Stores the first two elements of a quad vector to memory at the given address.
Syntax
vec_st2(a, b, c)
vec_st2a(a, b, c)
Argument types
The following table describes the types of the function arguments.
a             | b    | c
vector4double | long | double*, float*
Result
The effective address (EA) is the sum of b and c. The effective address is truncated
to an n-byte alignment depending on the type of c as shown in the following table.
The first two elements of a are then stored at the effective address.
Type of c | n
double*   | 16
float*    | 8
vec_st2a generates an exception (SIGBUS) if the effective address is not aligned to
the appropriate memory boundary indicated in the table.
If c is a pointer to a variable of single-precision floating-point type, the elements of
a are converted to single precision before being saved to memory.
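The following hypothetical sketch models vec_st2 for both pointer types; the function names are illustrative only. Only a[0] and a[1] are written, so memory beyond the stored pair is left untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* c of type double*: EA truncated to 16 bytes; stores a[0] and a[1]. */
static void st2_model_double(const double a[4], long b, double *c)
{
    assert(c != NULL);
    uintptr_t ea = ((uintptr_t)c + (uintptr_t)b) & ~(uintptr_t)15;
    memcpy((void *)ea, a, 2 * sizeof(double));
}

/* c of type float*: EA truncated to 8 bytes; stores a[0] and a[1]
   converted to single precision. */
static void st2_model_float(const double a[4], long b, float *c)
{
    assert(c != NULL);
    uintptr_t ea = ((uintptr_t)c + (uintptr_t)b) & ~(uintptr_t)7;
    float pair[2] = { (float)a[0], (float)a[1] };
    memcpy((void *)ea, pair, sizeof pair);
}
```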
Formula
The following table shows the formulas depending on the type of c.
Type of c | Formula
double*   | Memory[EA]=a[0]
          | Memory[EA+8]=a[1]
float*    | Memory_SP[EA]=(float) a[0]
          | Memory_SP[EA+4]=(float) a[1]
Note: Memory_SP[] is a single-precision floating-point array.
Examples
Type of c | a                        | Memory values
double*   | (10.0, 20.0, 30.0, 40.0) | 10.0, 20.0
float*    | (10.0, 20.0, 30.0, 40.0) | 10.0f, 20.0f
Unary arithmetic functions
This section provides a reference to the quad vector unary arithmetic functions.
vec_abs
Purpose
Returns a vector containing the absolute values of the contents of the given vector.
Syntax
d=vec_abs(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
The value of each element of the result is the absolute value of the corresponding
element of a.
Formula
d[0] = |a[0]|
d[1] = |a[1]|
d[2] = |a[2]|
d[3] = |a[3]|
Example
a = (10.0, -20.0, 30.0, -40.0)
d: (10.0, 20.0, 30.0, 40.0)
vec_neg
Purpose
Returns a vector containing the negated values of the corresponding elements of the
given vector.
Syntax
d=vec_neg(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
This function multiplies the value of each element in the given vector by -1.0 and
then assigns the result to the corresponding elements in the result vector.
Formula
d[0] = -a[0]
d[1] = -a[1]
d[2] = -a[2]
d[3] = -a[3]
Example
a = ( 10.0, -20.0, 30.0, -40.0)
d: (-10.0, 20.0, -30.0, 40.0)
vec_nabs
Purpose
Returns a vector containing the results of performing a negative-absolute operation
using the given vector.
Syntax
d=vec_nabs(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
This function computes the absolute value of each element in the given vector and
then assigns the negated value of the result to the corresponding elements in the
result vector.
Formula
d[0] = -|a[0]|
d[1] = -|a[1]|
d[2] = -|a[2]|
d[3] = -|a[3]|
Example
a = ( 10.0, -20.0, 30.0, -40.0)
d: (-10.0, -20.0, -30.0, -40.0)
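The three unary sign operations reduce to a scalar C expression per element. The sketch below mirrors the formulas for vec_abs, vec_neg, and vec_nabs. It uses a hypothetical quad struct in place of vector4double and is not the compiler's implementation.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for vector4double. */
typedef struct { double v[4]; } quad;

/* vec_abs:  d[i] = |a[i]| */
static quad abs_model(quad a) {
    quad d;
    for (int i = 0; i < 4; i++) d.v[i] = fabs(a.v[i]);
    return d;
}

/* vec_neg:  d[i] = -a[i] */
static quad neg_model(quad a) {
    quad d;
    for (int i = 0; i < 4; i++) d.v[i] = -a.v[i];
    return d;
}

/* vec_nabs: d[i] = -|a[i]| */
static quad nabs_model(quad a) {
    quad d;
    for (int i = 0; i < 4; i++) d.v[i] = -fabs(a.v[i]);
    return d;
}
```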
vec_re
Purpose
Returns a vector containing estimates of the reciprocals of the corresponding
elements of the given vector.
Syntax
d=vec_re(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the estimated value of the reciprocal of the
corresponding element of a.
Note:
The precision guarantee is specified by the following expression, where x is the
value of each element of a and r is the value of the corresponding element of the
result value:
| (r-1/x) / (1/x) | ≤ 1/256
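The precision guarantee can be written as a predicate. The helper below is hypothetical and not part of the XL API. It checks whether a candidate estimate r for a finite, nonzero x satisfies the documented 1/256 relative bound. It does not cover the special operands, which are handled separately.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if r meets the documented vec_re bound
   |(r - 1/x) / (1/x)| <= 1/256 for a finite, nonzero x. */
static bool re_estimate_ok(double x, double r) {
    double exact = 1.0 / x;
    return fabs((r - exact) / exact) <= 1.0 / 256.0;
}
```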
Special operands
Special operands are handled as follows:
Operand      Estimate        Exception
-Infinity    -0              None
-0           -Infinity [1]   ZX
+0           +Infinity [1]   ZX
+Infinity    +0              None
SNaN         QNaN [2]        VXSNAN
QNaN         QNaN            None
1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.
Formula
d[0] = 1 / a[0]
d[1] = 1 / a[1]
d[2] = 1 / a[2]
d[3] = 1 / a[3]
Example
a = (2.0, 4.0, 5.0, 8.0)
d: (0.5, 0.25, 0.2, 0.125)
vec_res
Purpose
Returns a vector containing estimates of the reciprocals of the corresponding
elements of the given vector.
Syntax
d=vec_res(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
The double-precision elements of a are first truncated to single-precision values. An
estimate of the reciprocal of each single-precision element of a is then converted to
double precision and saved in the corresponding element of the result.
Note:
The precision guarantee is specified by the following expression, where x is the
value of each element of a and r is the value of the corresponding element of the
result value:
| (r-1/x) / (1/x) | ≤ 1/256
Special operands
Special operands are handled as follows:
Operand      Estimate        Exception
-Infinity    -0              None
-0           -Infinity [1]   ZX
+0           +Infinity [1]   ZX
+Infinity    +0              None
SNaN         QNaN [2]        VXSNAN
QNaN         QNaN            None
1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.
Formula
d[0] = (double) (1 / (float) a[0])
d[1] = (double) (1 / (float) a[1])
d[2] = (double) (1 / (float) a[2])
d[3] = (double) (1 / (float) a[3])
Example
a = (2.0, 4.0, 5.0, 8.0)
d: (0.5, 0.25, 0.2, 0.125)
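The sketch below shows what vec_res approximates. It computes the single-precision reference value from the formula, (double) (1 / (float) a[i]). The quad struct is a hypothetical stand-in for vector4double. The C conversion to float is assumed to match the hardware's conversion to single precision. The built-in itself returns only an estimate of this value, within the 1/256 bound above.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double. */
typedef struct { double v[4]; } quad;

/* Reference model: convert each element to single precision, take the
   reciprocal in single precision, then widen the result to double.
   The float temporary forces rounding to single precision. */
static quad res_reference(quad a) {
    quad d;
    for (int i = 0; i < 4; i++) {
        float r = 1.0f / (float)a.v[i];
        d.v[i] = r;
    }
    return d;
}
```

The widened single-precision value of 1/5 is (double)0.2f, not the double constant 0.2. Comparisons in test code should therefore allow for single-precision error.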
vec_rsqrte
Purpose
Returns a vector containing estimates of the reciprocal square roots of the
corresponding elements of the given vector.
Syntax
d=vec_rsqrte(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the estimated value of the reciprocal square
root of the corresponding element of a.
Note:
The precision guarantee is specified by the following expression, where x is the
value of each element of a and r is the value of the corresponding element of the
result value:
| (r - 1/√x) / (1/√x) | ≤ 1/32
Special operands
Special operands are handled as follows:
Operand      Estimate        Exception
-Infinity    QNaN [2]        VXSQRT
<0           QNaN [2]        VXSQRT
-0           -Infinity [1]   ZX
+0           +Infinity [1]   ZX
+Infinity    +0              None
SNaN         QNaN [2]        VXSNAN
QNaN         QNaN            None
1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.
Formula
d[0] = 1 / √a[0]
d[1] = 1 / √a[1]
d[2] = 1 / √a[2]
d[3] = 1 / √a[3]
Example
a = (4.0, 16.0, 25.0, 64.0)
d: (0.5, 0.25, 0.2, 0.125)
vec_rsqrtes
Purpose
Returns a vector containing estimates of the reciprocal square roots of the
corresponding elements of the given vector.
Syntax
d=vec_rsqrtes(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
The double-precision elements of a are first truncated to single-precision values. An
estimate of the reciprocal square root of each single-precision element of a is then
converted to double precision and saved in the corresponding element of the
result.
Note:
The precision guarantee is specified by the following expression, where x is the
value of each element of a and r is the value of the corresponding element of the
result value:
| (r - 1/√x) / (1/√x) | ≤ 1/32
Special operands
Special operands are handled as follows:
Operand      Estimate        Exception
-Infinity    QNaN [2]        VXSQRT
<0           QNaN [2]        VXSQRT
-0           -Infinity [1]   ZX
+0           +Infinity [1]   ZX
+Infinity    +0              None
SNaN         QNaN [2]        VXSNAN
QNaN         QNaN            None
1. No result if FPSCR[ZE] = 1.
2. No result if FPSCR[VE] = 1.
Formula
d[0] = (double) (1 / √((float) a[0]))
d[1] = (double) (1 / √((float) a[1]))
d[2] = (double) (1 / √((float) a[2]))
d[3] = (double) (1 / √((float) a[3]))
Example
a = (4.0, 16.0, 25.0, 64.0)
d: (0.5, 0.25, 0.2, 0.125)
vec_swsqrt, vec_swsqrt_nochk
Purpose
Returns a vector containing the square root of each element in the given vector.
Syntax
d=vec_swsqrt(a)
d=vec_swsqrt_nochk(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
For vec_swsqrt_nochk, the compiler does not check the validity of the arguments.
You must ensure that the following condition is satisfied where x represents each
element of a:
• 2^-969 <= x < Infinity
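The caller is responsible for validity with vec_swsqrt_nochk, so one option is to test the documented precondition explicitly. The helper below is hypothetical and not part of the XL API. It checks a single element against 2^-969 <= x < Infinity.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical check of the vec_swsqrt_nochk precondition for one
   element: 2^-969 <= x < Infinity. Zero, negative values, very small
   values, infinities, and NaNs all fail. */
static bool swsqrt_nochk_arg_ok(double x) {
    return x >= ldexp(1.0, -969) && x < INFINITY;
}
```

NaN fails this test because every ordered comparison with NaN is false. A caller could use such a test to decide between vec_swsqrt and vec_swsqrt_nochk.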
Result value
The result value is a quad vector that contains the square root of each element of a.
When the following options are used, the result is bitwise identical to the IEEE
square root.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE square root.
Formula
d[0] = √a[0]
d[1] = √a[1]
d[2] = √a[2]
d[3] = √a[3]
Example
a = ( 4.0, 9.0, 16.0, 25.0)
d: ( 2.0, 3.0,  4.0,  5.0)
vec_swsqrts, vec_swsqrts_nochk
Purpose
Returns a vector containing estimates of the square roots of the corresponding
elements of the given vector.
Syntax
d=vec_swsqrts(a)
d=vec_swsqrts_nochk(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
For vec_swsqrts_nochk, the compiler does not check the validity of the arguments.
You must ensure that the following condition is satisfied where x represents each
element of a:
• 2^-102 <= x < Infinity
Result value
The double-precision elements of a are first truncated to single-precision values.
The square root of each single-precision element of a is then converted to
double precision and saved in the corresponding element of the result.
When the following options are used, the result is bitwise identical to the IEEE
square root.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE square root.
Formula
d[0] = (double) √((float) a[0])
d[1] = (double) √((float) a[1])
d[2] = (double) √((float) a[2])
d[3] = (double) √((float) a[3])
Example
a = ( 4.0, 9.0, 16.0, 25.0)
d: ( 2.0, 3.0,  4.0,  5.0)
Binary arithmetic functions
This section provides a reference to the quad vector binary arithmetic functions.
vec_add
Purpose
Returns a vector containing the sums of each set of corresponding elements of the
given vectors.
Syntax
d=vec_add(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
Result value
The value of each element of the result is the sum of the corresponding elements
of a and b.
Formula
d[0] = a[0] + b[0]
d[1] = a[1] + b[1]
d[2] = a[2] + b[2]
d[3] = a[3] + b[3]
Example
a = (10.0, 20.0, 30.0, 40.0)
b = (50.0, 60.0, 70.0, 80.0)
d: (60.0, 80.0, 100.0, 120.0)
vec_cpsgn
Purpose
Returns a vector whose elements take their signs from the corresponding elements of
vector a and their magnitudes from the corresponding elements of vector b.
Syntax
d=vec_cpsgn(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
Result value
Each element of the result has the sign of the corresponding element of a and the
exponent and mantissa of the corresponding element of b.
Formula
d[0] = (double) { sign(a[0]), mantissa(b[0]), exponent(b[0]) }
d[1] = (double) { sign(a[1]), mantissa(b[1]), exponent(b[1]) }
d[2] = (double) { sign(a[2]), mantissa(b[2]), exponent(b[2]) }
d[3] = (double) { sign(a[3]), mantissa(b[3]), exponent(b[3]) }
Example
a = (   -1.0,     2.0,    -3.0,     4.0)
b = ( 1.5e10, 2.5e15, 3.5e20, 4.5e25)
d: (-1.5e10, 2.5e15, -3.5e20, 4.5e25)
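In portable C, the same per-element operation is provided by C99 copysign. This sketch uses a hypothetical quad struct for vector4double. Note that the argument order is reversed: vec_cpsgn(a, b) takes the sign from its first argument, while copysign(x, y) takes the magnitude from its first argument.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for vector4double. */
typedef struct { double v[4]; } quad;

/* d[i] has the sign of a[i] and the exponent and mantissa of b[i].
   copysign(mag, sgn) returns |mag| with the sign bit of sgn. */
static quad cpsgn_model(quad a, quad b) {
    quad d;
    for (int i = 0; i < 4; i++)
        d.v[i] = copysign(b.v[i], a.v[i]);
    return d;
}
```

Because only the sign bit is copied, a negative zero in a also produces a negative result.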
vec_mul
Purpose
Returns a vector containing the results of performing a multiply operation using
the given vectors.
Syntax
d=vec_mul(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
Result value
The value of each element of the result is the product of the corresponding
elements of a and b.
Formula
d[0] = a[0] × b[0]
d[1] = a[1] × b[1]
d[2] = a[2] × b[2]
d[3] = a[3] × b[3]
Example
a = (10.0, 20.0, 30.0, 40.0)
b = (50.0, 60.0, 70.0, 80.0)
d: (500.0, 1200.0, 2100.0, 3200.0)
vec_sub
Purpose
Returns a vector containing the result of subtracting each element of b from the
corresponding element of a.
Syntax
d=vec_sub(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
Result value
The value of each element of the result is the result of subtracting the value of the
corresponding element of b from the value of the corresponding element of a.
Formula
d[0] = a[0] - b[0]
d[1] = a[1] - b[1]
d[2] = a[2] - b[2]
d[3] = a[3] - b[3]
Example
a = (50.0, 60.0, 70.0, 80.0)
b = (10.0, 20.0, 30.0, 40.0)
d: (40.0, 40.0, 40.0, 40.0)
vec_swdiv, vec_swdiv_nochk
Purpose
Returns a vector containing the result of dividing each element of a by the
corresponding element of b.
Syntax
d=vec_swdiv(a, b)
d=vec_swdiv_nochk(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
For vec_swdiv_nochk, the compiler does not check the validity of the arguments.
You must ensure that the following conditions are satisfied where x represents each
element of a and y represents the corresponding element of b:
• 2^-1021 ≤ |y| ≤ 2^1020
• If x ≠ 0.0:
    2^-969 ≤ |x| < Infinity
    2^-1020 ≤ |x / y| ≤ 2^1022
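These conditions can be checked element by element before choosing vec_swdiv_nochk. The helper below is a hypothetical sketch, not part of the XL API. It computes the quotient x / y in ordinary double arithmetic, so results exactly at a boundary may be affected by rounding.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical check of the vec_swdiv_nochk preconditions for one
   element x of a and the corresponding element y of b. */
static bool swdiv_nochk_args_ok(double x, double y) {
    double ay = fabs(y);
    if (!(ay >= ldexp(1.0, -1021) && ay <= ldexp(1.0, 1020)))
        return false;                     /* 2^-1021 <= |y| <= 2^1020 */
    if (x != 0.0) {
        double ax = fabs(x);
        double aq = fabs(x / y);
        if (!(ax >= ldexp(1.0, -969) && ax < INFINITY))
            return false;                 /* 2^-969 <= |x| < Infinity */
        if (!(aq >= ldexp(1.0, -1020) && aq <= ldexp(1.0, 1022)))
            return false;                 /* 2^-1020 <= |x/y| <= 2^1022 */
    }
    return true;
}
```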
Result value
The values of the elements of the result are obtained by dividing the elements of a
by the corresponding elements of b.
When the following options are used, the result is bitwise identical to the IEEE
division.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE division.
Formula
d[0] = a[0] / b[0]
d[1] = a[1] / b[1]
d[2] = a[2] / b[2]
d[3] = a[3] / b[3]
Example
a = (50.0, 1.0, 30.0, 40.0)
b = (10.0, 5.0, -1.0, 80.0)
d: ( 5.0, 0.2, -30.0,  0.5)
vec_swdivs, vec_swdivs_nochk
Purpose
Returns a vector containing the result of dividing each element of a by the
corresponding element of b.
Syntax
d=vec_swdivs(a, b)
d=vec_swdivs_nochk(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
For vec_swdivs_nochk, the compiler does not check the validity of the arguments.
You must ensure that the following conditions are satisfied where x represents each
element of a and y represents the corresponding element of b:
• 2^-125 ≤ |y| ≤ 2^124
• If x ≠ 0:
    2^-102 ≤ |x| < Infinity
    2^-124 ≤ |x / y| ≤ 2^126
Result value
The double-precision elements of a and b are first truncated to single-precision
values. The result of dividing the single-precision elements of a by the
corresponding single-precision elements of b is then converted to double precision
and saved in the corresponding elements of the result.
When the following options are used, the result is bitwise identical to the IEEE
division.
• -qstrict=precision
• -qstrict=ieeefp
• -qstrict=zerosigns
• -qstrict=operationprecision
Otherwise, the result might differ slightly from the IEEE division.
Formula
d[0] = (double) ((float) a[0] / (float) b[0])
d[1] = (double) ((float) a[1] / (float) b[1])
d[2] = (double) ((float) a[2] / (float) b[2])
d[3] = (double) ((float) a[3] / (float) b[3])
Example
a = (50.0, 1.0, 30.0, 40.0)
b = (10.0, 5.0, -1.0, 80.0)
d: ( 5.0, 0.2, -30.0,  0.5)
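The single-precision division in the formula can be modeled in portable C. The quad struct is a hypothetical stand-in for vector4double. The float temporary is assumed to reproduce the conversion to single precision that the text describes.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double. */
typedef struct { double v[4]; } quad;

/* d[i] = (double) ((float) a[i] / (float) b[i]) */
static quad swdivs_model(quad a, quad b) {
    quad d;
    for (int i = 0; i < 4; i++) {
        float q = (float)a.v[i] / (float)b.v[i];  /* single-precision quotient */
        d.v[i] = q;                               /* widened back to double */
    }
    return d;
}
```

In the example, the second element is (double)0.2f, which is the widened single-precision quotient rather than the double constant 0.2.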
vec_xmul
Purpose
Returns a vector containing the result of cross multiplying the first and the third
elements of a by the elements of b.
Syntax
d=vec_xmul(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b
vector4double   vector4double   vector4double
Result value
The values of the elements of the result are obtained by cross multiplying the first
and the third elements of a by the elements of b.
Formula
d[0] = a[0] × b[0]
d[1] = a[0] × b[1]
d[2] = a[2] × b[2]
d[3] = a[2] × b[3]
Example
a = (10.0,  0.0, 30.0,  0.0)
b = (50.0, 60.0, 70.0, 80.0)
d: (500.0, 600.0, 2100.0, 2400.0)
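The formula shows that only a[0] and a[2] participate: each is broadcast across one pair of elements of b. The portable sketch below makes this explicit. It uses a hypothetical quad struct and is not the built-in.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double. */
typedef struct { double v[4]; } quad;

/* vec_xmul: a[0] scales b[0..1], a[2] scales b[2..3]; a[1], a[3] unused. */
static quad xmul_model(quad a, quad b) {
    quad d;
    d.v[0] = a.v[0] * b.v[0];
    d.v[1] = a.v[0] * b.v[1];
    d.v[2] = a.v[2] * b.v[2];
    d.v[3] = a.v[2] * b.v[3];
    return d;
}
```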
Multiply-add functions
This section provides a reference to the quad vector multiply-add functions.
vec_madd
Purpose
Returns a vector containing the results of performing a fused multiply-add
operation for each corresponding set of elements of the given vectors.
Syntax
d=vec_madd(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The value of each element of the result is the product of the values of the
corresponding elements of a and b, added to the value of the corresponding
element of c.
Formula
d[0] = (a[0] × b[0]) + c[0]
d[1] = (a[1] × b[1]) + c[1]
d[2] = (a[2] × b[2]) + c[2]
d[3] = (a[3] × b[3]) + c[3]
Example
a = (10.0, 10.0, 10.0, 10.0)
b = ( 1.0, 2.0, 3.0, 4.0)
c = (20.0, 20.0, 20.0, 20.0)
d: (30.0, 40.0, 50.0, 60.0)
vec_msub
Purpose
Returns a vector containing the results of performing a multiply-subtract operation
using the given vectors.
Syntax
d=vec_msub(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The value of each element of the result is the product of the corresponding
elements of a and b, minus the corresponding element of c.
Formula
d[0] = (a[0] × b[0]) - c[0]
d[1] = (a[1] × b[1]) - c[1]
d[2] = (a[2] × b[2]) - c[2]
d[3] = (a[3] × b[3]) - c[3]
Example
a = ( 10.0, 10.0, 10.0, 10.0)
b = ( 1.0, 2.0, 3.0, 4.0)
c = ( 20.0, 20.0, 20.0, 20.0)
d: (-10.0, 0.0, 10.0, 20.0)
vec_nmadd
Purpose
Returns a vector containing the results of performing a negative multiply-add
operation on the given vectors.
Syntax
d=vec_nmadd(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The value of each element of the result is the product of the corresponding
elements of a and b, added to the corresponding elements of c, and then
multiplied by -1.0.
Formula
d[0] = -((a[0] × b[0]) + c[0])
d[1] = -((a[1] × b[1]) + c[1])
d[2] = -((a[2] × b[2]) + c[2])
d[3] = -((a[3] × b[3]) + c[3])
Example
a = ( 10.0, 10.0, 10.0, 10.0)
b = (  1.0,   2.0,   3.0,   4.0)
c = ( 20.0, 20.0, 20.0, 20.0)
d: (-30.0, -40.0, -50.0, -60.0)
vec_nmsub
Purpose
Returns a vector containing the results of performing a negative multiply-subtract
operation on the given vectors.
Syntax
d=vec_nmsub(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The value of each element of the result is the product of the corresponding
elements of a and b, subtracted from the corresponding element of c.
Formula
d[0] = -((a[0] × b[0]) - c[0])
d[1] = -((a[1] × b[1]) - c[1])
d[2] = -((a[2] × b[2]) - c[2])
d[3] = -((a[3] × b[3]) - c[3])
Example
a = (10.0, 10.0, 10.0, 10.0)
b = ( 1.0,  2.0,  3.0,  4.0)
c = (20.0, 20.0, 20.0, 20.0)
d: (10.0, 0.0, -10.0, -20.0)
vec_xmadd
Purpose
Returns a vector containing the results of performing a fused cross multiply-add
operation for each corresponding set of elements of the given vectors.
Syntax
d=vec_xmadd(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
Elements 0 and 1 of the result are the products of a[0] and the corresponding
elements of b; elements 2 and 3 are the products of a[2] and the corresponding
elements of b. Each product is added to the corresponding element of c.
Formula
d[0] = (a[0] × b[0]) + c[0]
d[1] = (a[0] × b[1]) + c[1]
d[2] = (a[2] × b[2]) + c[2]
d[3] = (a[2] × b[3]) + c[3]
Example
a = ( 1.0, 0.0, 3.0, 0.0)
b = ( 5.0, 10.0, 15.0, 20.0)
c = (10.0, 10.0, 10.0, 10.0)
d: (15.0, 20.0, 55.0, 70.0)
vec_xxmadd
Purpose
Returns a vector containing the results of performing a fused double cross
multiply-add operation for each corresponding set of elements of the given vectors.
Syntax
d=vec_xxmadd(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The values of the elements of the result are specified in the formula.
Formula
d[0] = (a[1] × b[1]) + c[0]
d[1] = (a[0] × b[1]) + c[1]
d[2] = (a[3] × b[3]) + c[2]
d[3] = (a[2] × b[3]) + c[3]
Example
a = ( 1.0,  2.0,  3.0,  4.0)
b = ( 0.0, 10.0,  0.0, 20.0)
c = (10.0, 10.0, 10.0, 10.0)
d: (30.0, 20.0, 90.0, 70.0)
vec_xxcpnmadd
Purpose
Returns a vector containing the results of performing a fused double cross
conjugate multiply/add for each corresponding set of elements of the given
vectors.
Syntax
d=vec_xxcpnmadd(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The values of the elements of the result are specified in the formula.
Formula
d[0] =   (a[1] × b[1]) + c[0]
d[1] = -((a[0] × b[1]) + c[1])
d[2] =   (a[3] × b[3]) + c[2]
d[3] = -((a[2] × b[3]) + c[3])
Example
a = ( 1.0,   2.0,  3.0,   4.0)
b = ( 0.0,  10.0,  0.0,  20.0)
c = (10.0,  10.0, 10.0,  10.0)
d: (30.0, -20.0, 90.0, -70.0)
vec_xxnpmadd
Purpose
Returns a vector containing the results of performing a fused double cross complex
multiply-add operation for each corresponding set of elements of the given vectors.
Syntax
d=vec_xxnpmadd(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a               b               c
vector4double   vector4double   vector4double   vector4double
Result value
The values of the elements of the result are specified in the formula.
Formula
d[0] = -((a[1] × b[1]) + c[0])
d[1] =   (a[0] × b[1]) + c[1]
d[2] = -((a[3] × b[3]) + c[2])
d[3] =   (a[2] × b[3]) + c[3]
Example
a = (  1.0,  2.0,   3.0,  4.0)
b = (  0.0, 10.0,   0.0, 20.0)
c = ( 10.0, 10.0,  10.0, 10.0)
d: (-30.0, 20.0, -90.0, 70.0)
Round functions
With the round functions, you can round the elements of quad vectors.
vec_ceil
Purpose
Returns a vector containing the smallest representable floating-point integral values
greater than or equal to the values of the corresponding elements of the given
vector.
Syntax
d=vec_ceil(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the smallest representable floating-point
integral value greater than or equal to the value of the corresponding element of a.
Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-5.0, -2.0, 3.0, 6.0)
vec_floor
Purpose
Returns a vector containing the largest representable floating-point integral values
less than or equal to the values of the corresponding elements of the given vector.
Syntax
d=vec_floor(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the largest representable floating-point integral
value less than or equal to the value of the corresponding element of a.
Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-6.0, -3.0, 2.0, 5.0)
vec_round
Purpose
Returns a vector containing the rounded values of the corresponding elements of
the given vector.
Syntax
d=vec_round(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the value of the corresponding element of a,
rounded to the nearest representable floating-point integer.
Formula
For each element n of a:
If a[n] < 0, d[n] = (a[n] - 0.5), truncated toward zero to an integral value.
If a[n] > 0, d[n] = (a[n] + 0.5), truncated toward zero to an integral value.
If a[n] EQ 0, d[n] = 0.
Note: EQ is the equal operator.
Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-6.0, -2.0, 2.0, 6.0)
vec_rsp
Purpose
Returns a vector containing the single-precision values of the corresponding
elements of the given vector.
Syntax
d=vec_rsp(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the value of the corresponding element of a,
converted to single precision and then back to double precision.
Formula
d[0] = (double) ((float) a[0])
d[1] = (double) ((float) a[1])
d[2] = (double) ((float) a[2])
d[3] = (double) ((float) a[3])
vec_trunc
Purpose
Returns a vector containing the truncated values of the corresponding elements of
the given vector.
Syntax
d=vec_trunc(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of the result contains the value of the corresponding element of a,
truncated to an integral value.
Example
a = (-5.8, -2.3, 2.3, 5.8)
d: (-5.0, -2.0, 2.0, 5.0)
Conversion functions
With the conversion functions, you can convert between floating-point values and
the 64-bit or 32-bit integer values held in quad vectors.
vec_cfid
Purpose
Returns a vector in which each element is the floating-point equivalent of the
64-bit signed integer in the corresponding element of a, rounded to double
precision using the rounding mode specified by FPSCR[RN].
Syntax
d=vec_cfid(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
The value of each element of the result is the floating-point representation of the
64-bit signed integer in the corresponding element of a, rounded to double
precision using the rounding mode specified by FPSCR[RN].
Example
FPSCR[RN] = DFP_ROUND_TO_NEAREST_WITH_TIES_TO_EVEN
a = (  1,    -1,   2,    -2)
d: ( 1.0, -1.0, 2.0, -2.0)
Related functions
• FPSCR functions
vec_cfidu
Purpose
Returns a vector in which each element is the floating-point equivalent of the
64-bit unsigned integer in the corresponding element of a, rounded to double
precision using the rounding mode specified by FPSCR[RN].
Syntax
d=vec_cfidu(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
The value of each element of the result is the floating-point representation of the
64-bit unsigned integer in the corresponding element of a, rounded to double
precision using the rounding mode specified by FPSCR[RN].
Example
FPSCR[RN] = DFP_ROUND_TO_NEAREST_WITH_TIES_TO_EVEN
a = (  1,   2,   3,   4)
d: ( 1.0, 2.0, 3.0, 4.0)
Related functions
• FPSCR functions
vec_ctid
Purpose
Converts a quad vector to 64-bit signed integer values.
Syntax
d=vec_ctid(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of a is rounded to a floating-point integral value according to
FPSCR[RN]. The corresponding element of the result vector is then set to one of the
following values:
• If the rounded value is greater than 2^63-1, the result is the maximum long integer
(0x7FFF FFFF FFFF FFFF).
• If the rounded value is less than -2^63, the result is the minimum long integer
(0x8000 0000 0000 0000).
• Otherwise, the result is the 64-bit signed integer value equivalent to the rounded
value.
Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, -2.9, 9.0e20, -5.0e25)
d: (  2,   -2, 0x7FFF FFFF FFFF FFFF, 0x8000 0000 0000 0000)
Related functions
• FPSCR functions
vec_ctidu
Purpose
Converts a quad vector to 64-bit unsigned integer values.
Syntax
d=vec_ctidu(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of a is rounded to a floating-point integral value according to
FPSCR[RN]. The corresponding element of the result vector is then set to one of the
following values:
• If the rounded value is greater than 2^64-1, the result is the maximum unsigned long
integer (0xFFFF FFFF FFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000 0000 0000).
• Otherwise, the result is the 64-bit unsigned integer value equivalent to the
rounded value.
Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, 1.9, 9.0e22, -5.0e25)
d: (  2,   2, 0xFFFF FFFF FFFF FFFF, 0)
Related functions
• FPSCR functions
vec_ctidz
Purpose
Converts a quad vector to 64-bit signed integer values with rounding toward zero.
Syntax
d=vec_ctidz(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of a is rounded toward zero to a floating-point integral value. The
corresponding element of the result vector is then set to one of the following
values:
• If the rounded value is greater than 2^63-1, the result is the maximum long integer
(0x7FFF FFFF FFFF FFFF).
• If the rounded value is less than -2^63, the result is the minimum long integer
(0x8000 0000 0000 0000).
• Otherwise, the result is the 64-bit signed integer value equivalent to the rounded
value.
Example
a = (1.6, -1.9, 9.0e20, -5.0e25)
d: (  1,   -1, 0x7FFF FFFF FFFF FFFF, 0x8000 0000 0000 0000)
vec_ctiduz
Purpose
Converts a quad vector to 64-bit unsigned integer values with rounding toward
zero.
Syntax
d=vec_ctiduz(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of a is rounded toward zero to a floating-point integral value. The
corresponding element of the result vector is then set to one of the following
values:
• If the rounded value is greater than 2^64-1, the result is the maximum unsigned long
integer (0xFFFF FFFF FFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000 0000 0000).
• Otherwise, the result is the 64-bit unsigned integer value equivalent to the
rounded value.
Example
a = (1.6, -8.8, 9.0e22, -5.0e25)
d: (  1,    0, 0xFFFF FFFF FFFF FFFF, 0)
vec_ctiw
Purpose
Converts a quad vector to 32-bit signed integer values.
Syntax
d=vec_ctiw(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of a is rounded to a floating-point integral value according to
FPSCR[RN]. The four low-order bytes of the corresponding element of the result
vector then contain one of the following values:
• If the rounded value is greater than 2^31-1, the result is the maximum integer
(0x7FFF FFFF).
• If the rounded value is less than -2^31, the result is the minimum integer (0x8000 0000).
• Otherwise, the result is the 32-bit signed integer value equivalent to the rounded
value.
Example
FPSCR[RN] = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, -2.9, 9.0e11, -5.0e12)
d: (  2,   -2, 0x7FFF FFFF, 0x8000 0000)
Related functions
• FPSCR functions
vec_ctiwu
Purpose
Converts a quad vector to 32-bit unsigned integer values.
Syntax
d=vec_ctiwu(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d               a
vector4double   vector4double
Result value
Each element of a is rounded to a floating-point integral value according to
FPSCR[RN]. The four low-order bytes of the corresponding element of the result
vector then contain one of the following values:
• If the rounded value is greater than 2^32-1, the result is the maximum unsigned integer
(0xFFFF FFFF).
• If the rounded value is less than 0, the result is 0 (0x0000 0000).
• Otherwise, the result is the 32-bit unsigned integer value equivalent to the
rounded value.
Example
FPSCRRN = DFP_ROUND_TOWARD_POSITIVE_INFINITY
a = (1.4, 1.9, 9.0e11, -5.0e12)
d: (2, 2, 0xFFFF FFFF, 0)
Related functions
v FPSCR functions
vec_ctiwz
Purpose
Converts a quad vector to 32-bit signed integer values with rounding toward zero.
Syntax
d=vec_ctiwz(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
Result value
Each element of a is rounded toward zero to a floating-point integral value. The
four low-order bytes of the corresponding element of the result vector then contain
one of the following values:
v If the rounded value is greater than 2^31 - 1, the result is the maximal integer (0x7FFF FFFF).
v If the rounded value is less than -2^31, the result is the minimal integer (0x8000 0000).
v Otherwise, the result is the 32-bit signed integer value equivalent to the rounded
value.
Example
a = (1.6, -1.9, 9.0e11, -5.0e12)
d: (1, -1, 0x7FFF FFFF, 0x8000 0000)
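The vector4double built-ins are available only with the Blue Gene/Q compiler. The following is therefore a minimal portable C sketch of the per-element rule for vec_ctiw and vec_ctiwz, not the compiler's implementation. The names rmode, round_integral, and ctiw_model are hypothetical. Only the directed rounding modes used in the examples are modeled. NaN inputs are excluded because this section does not specify their result.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical: only the directed rounding modes used in the examples.
   Round-to-nearest is deliberately not modeled here. */
typedef enum { RM_TOWARD_ZERO, RM_TOWARD_POS_INF, RM_TOWARD_NEG_INF } rmode;

/* Round x to an integral double value under a directed rounding mode. */
static double round_integral(double x, rmode m)
{
    /* Every double with magnitude >= 2^52 is already integral. */
    if (x >= 4503599627370496.0 || x <= -4503599627370496.0)
        return x;
    double t = (double)(int64_t)x;     /* C conversion truncates toward zero */
    if (m == RM_TOWARD_POS_INF && t < x) t += 1.0;
    if (m == RM_TOWARD_NEG_INF && t > x) t -= 1.0;
    return t;
}

/* Hypothetical scalar model of one element of vec_ctiw (m = the FPSCRRN
   mode) or vec_ctiwz (m = RM_TOWARD_ZERO). */
static int32_t ctiw_model(double x, rmode m)
{
    assert(!isnan(x));                 /* NaN inputs are not covered here */
    double r = round_integral(x, m);
    if (r > 2147483647.0)  return INT32_MAX;   /* > 2^31 - 1: 0x7FFF FFFF */
    if (r < -2147483648.0) return INT32_MIN;   /* < -2^31:    0x8000 0000 */
    return (int32_t)r;                 /* exact: r is integral and in range */
}
```

Passing RM_TOWARD_POS_INF reproduces the vec_ctiw example, and RM_TOWARD_ZERO reproduces the vec_ctiwz example.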
vec_ctiwuz
Purpose
Converts a quad vector to 32-bit unsigned integer values with rounding toward
zero.
Syntax
d=vec_ctiwuz(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
Result value
Each element of a is rounded toward zero to a floating-point integral value. The
four low-order bytes of the corresponding element of the result vector then contain
one of the following values:
v If the rounded value is greater than 2^32 - 1, the result is the maximal unsigned integer
(0xFFFF FFFF).
v If the rounded value is less than 0, the result is 0 (0x0000 0000).
v Otherwise, the result is the 32-bit unsigned integer value equivalent to the
rounded value.
Example
a = (1.6, -1.9, 9.0e11, -5.0e12)
d: (1, 0, 0xFFFF FFFF, 0)
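The unsigned conversions clamp instead of wrapping around. Below is a minimal portable C sketch of one element of vec_ctiwu and vec_ctiwuz, not the compiler's implementation. The names round_integral and ctiwu_model are hypothetical. Only rounding toward zero and toward positive infinity (the modes used in the two examples) are modeled, and NaN inputs are excluded.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Round toward zero (vec_ctiwuz) or toward +infinity (the FPSCRRN mode
   used in the vec_ctiwu example). Other rounding modes are not modeled. */
static double round_integral(double x, int toward_pos_inf)
{
    if (x >= 4503599627370496.0 || x <= -4503599627370496.0)
        return x;                      /* already integral (|x| >= 2^52) */
    double t = (double)(int64_t)x;     /* truncation toward zero */
    if (toward_pos_inf && t < x) t += 1.0;
    return t;
}

/* Hypothetical scalar model of one element of vec_ctiwu / vec_ctiwuz. */
static uint32_t ctiwu_model(double x, int toward_pos_inf)
{
    assert(!isnan(x));                 /* NaN inputs are not covered here */
    double r = round_integral(x, toward_pos_inf);
    if (r > 4294967295.0) return UINT32_MAX;   /* > 2^32 - 1: 0xFFFF FFFF */
    if (r < 0.0)          return 0;            /* negative values clamp to 0 */
    return (uint32_t)r;
}
```

Note that -1.9 first rounds toward zero to -1.0 and only then clamps to 0, which matches the second element of the vec_ctiwuz example.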
Comparison functions
With the comparison functions, you can compare quad vectors.
In the result values, floating-point boolean values are as follows:
v True is 1.0.
v False is -1.0.
vec_cmpgt
Purpose
Returns a vector containing the results of a greater-than comparison between each
set of corresponding elements of the given vectors.
Syntax
d=vec_cmpgt(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is 1.0 if the corresponding element of a is
greater than the corresponding element of b. Otherwise, the value is -1.0.
Formula
If (a[0] > b[0]) Then d[0] = 1.0 Else d[0] = -1.0
If (a[1] > b[1]) Then d[1] = 1.0 Else d[1] = -1.0
If (a[2] > b[2]) Then d[2] = 1.0 Else d[2] = -1.0
If (a[3] > b[3]) Then d[3] = 1.0 Else d[3] = -1.0
Example
a = (10.0, 20.0, 30.0, -40.0)
b = (20.0, -10.0, 10.0, 80.0)
d: (-1.0, 1.0, 1.0, -1.0)
vec_cmplt
Purpose
Returns a vector containing the results of a less-than comparison between each set
of corresponding elements of the given vectors.
Syntax
d=vec_cmplt(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is 1.0 if the corresponding element of a is
less than the corresponding element of b. Otherwise, the value is -1.0.
Formula
If (a[0] < b[0]) Then d[0] = 1.0 Else d[0] = -1.0
If (a[1] < b[1]) Then d[1] = 1.0 Else d[1] = -1.0
If (a[2] < b[2]) Then d[2] = 1.0 Else d[2] = -1.0
If (a[3] < b[3]) Then d[3] = 1.0 Else d[3] = -1.0
Example
a = (20.0, -10.0, 10.0, 80.0)
b = (10.0, 20.0, 30.0, -40.0)
d: (-1.0, 1.0, 1.0, -1.0)
vec_cmpeq
Purpose
Returns a vector containing the results of comparing each set of corresponding
elements of the given vectors for equality.
Syntax
d=vec_cmpeq(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is 1.0 if the corresponding element of a is
equal to the corresponding element of b. Otherwise, the value is -1.0.
Formula
If (a[0] EQ b[0]) Then d[0] = 1.0 Else d[0] = -1.0
If (a[1] EQ b[1]) Then d[1] = 1.0 Else d[1] = -1.0
If (a[2] EQ b[2]) Then d[2] = 1.0 Else d[2] = -1.0
If (a[3] EQ b[3]) Then d[3] = 1.0 Else d[3] = -1.0
Note: EQ is the equal operator.
Example
a = (10.0, -10.0, -10.0, 80.0)
b = (10.0, 20.0, -10.0, -40.0)
d: (1.0, -1.0, 1.0, -1.0)
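The three comparison functions share one pattern: an element-wise relational test whose outcome is encoded as the floating-point boolean 1.0 or -1.0. The portable C sketch below models that pattern. The type v4d (a stand-in for vector4double) and the *_model names are hypothetical. In this C model any comparison involving NaN yields False, because that is how C relational operators behave. The text above does not state the hardware result for NaN operands, so use vec_tstnan when that case matters.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Floating-point booleans used by the comparison functions. */
#define QPX_TRUE   1.0
#define QPX_FALSE (-1.0)

/* Scalar model of vec_cmpgt(a, b). */
static v4d cmpgt_model(v4d a, v4d b)
{
    v4d d;
    for (int i = 0; i < 4; i++)
        d.e[i] = (a.e[i] > b.e[i]) ? QPX_TRUE : QPX_FALSE;
    return d;
}

/* Scalar model of vec_cmplt(a, b). */
static v4d cmplt_model(v4d a, v4d b)
{
    v4d d;
    for (int i = 0; i < 4; i++)
        d.e[i] = (a.e[i] < b.e[i]) ? QPX_TRUE : QPX_FALSE;
    return d;
}

/* Scalar model of vec_cmpeq(a, b). */
static v4d cmpeq_model(v4d a, v4d b)
{
    v4d d;
    for (int i = 0; i < 4; i++)
        d.e[i] = (a.e[i] == b.e[i]) ? QPX_TRUE : QPX_FALSE;
    return d;
}
```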
vec_sel
Purpose
Returns a vector containing the value of either a or b depending on the value of c.
Syntax
d=vec_sel(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
c: vector4double
Result value
The value of each element of the result is equal to the corresponding element of b
if the corresponding element of c is greater than or equal to zero (this includes both
+0.0 and -0.0), or equal to the corresponding element of a if the corresponding
element of c is less than zero or NaN.
Formula
If (c[0] ≥ 0) Then d[0] = b[0] Else d[0] = a[0]
If (c[1] ≥ 0) Then d[1] = b[1] Else d[1] = a[1]
If (c[2] ≥ 0) Then d[2] = b[2] Else d[2] = a[2]
If (c[3] ≥ 0) Then d[3] = b[3] Else d[3] = a[3]
Example
a = (20.0, 20.0, 20.0, 20.0)
b = (10.0, 10.0, 10.0, 10.0)
c = ( 1.0, -1.0, 2.5, -2.5)
d: (10.0, 20.0, 10.0, 20.0)
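The selection rule maps neatly onto a single C comparison. In C, (c >= 0.0) is true for both +0.0 and -0.0 and false for NaN, which is exactly the rule stated above. The sketch below is a portable scalar model, not the compiler's implementation. The type v4d and the function sel_model are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Scalar model of vec_sel(a, b, c): take b where c >= 0 (including -0.0),
   otherwise take a (c negative or NaN). */
static v4d sel_model(v4d a, v4d b, v4d c)
{
    v4d d;
    for (int i = 0; i < 4; i++)
        d.e[i] = (c.e[i] >= 0.0) ? b.e[i] : a.e[i];
    return d;
}
```

True is 1.0 (not negative) and False is -1.0, so a comparison result can be passed directly as c. For example, vec_sel(a, b, vec_cmpgt(x, y)) takes b where x > y and a elsewhere.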
vec_tstnan
Purpose
Returns a vector whose elements indicate whether the corresponding element of a or b
is NaN.
Syntax
d=vec_tstnan(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is 1.0 if the corresponding element of a or b
is a NaN; otherwise, the value is -1.0.
Formula
If ((a[0] EQ NaN) or (b[0] EQ NaN)) Then d[0] = 1.0 Else d[0] = -1.0
If ((a[1] EQ NaN) or (b[1] EQ NaN)) Then d[1] = 1.0 Else d[1] = -1.0
If ((a[2] EQ NaN) or (b[2] EQ NaN)) Then d[2] = 1.0 Else d[2] = -1.0
If ((a[3] EQ NaN) or (b[3] EQ NaN)) Then d[3] = 1.0 Else d[3] = -1.0
Note: EQ is the equal operator.
Example
a = (10.0, 20.0, NaN, 40.0)
b = (50.0, NaN, 70.0, 80.0)
d: (-1.0, 1.0, 1.0, -1.0)
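The "EQ NaN" test in the formula cannot be written literally as an equality comparison in C, because NaN compares unequal to every value, including itself. A portable scalar model therefore uses isnan from math.h. This is a sketch, not the compiler's implementation. The type v4d and the function tstnan_model are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Scalar model of vec_tstnan(a, b): 1.0 where either input is NaN. */
static v4d tstnan_model(v4d a, v4d b)
{
    v4d d;
    for (int i = 0; i < 4; i++)
        /* a.e[i] == NAN would always be false, so test with isnan. */
        d.e[i] = (isnan(a.e[i]) || isnan(b.e[i])) ? 1.0 : -1.0;
    return d;
}
```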
Element manipulation functions
With the element manipulation functions, you can manipulate vectors at the
element level. For example, you can permute elements.
vec_extract
Purpose
Returns the value of element b from the vector a.
Syntax
d=vec_extract(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: double
a: vector4double
b: int
Result value
This function uses modulo arithmetic on b to determine the element number.
That is, if b is out of range, the compiler uses b modulo the number of
elements in the vector to determine the element position.
Formula
d = a[b MOD 4]
Note: MOD is the modulo operator.
Example
a = (10.0, 20.0, 30.0, 40.0)
b = 1
d: 20.0
vec_insert
Purpose
Returns a copy of the vector b with the value of its element c replaced by a.
Syntax
d=vec_insert(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: double
b: vector4double
c: int
Result value
This function uses modulo arithmetic on c to determine the element number.
That is, if c is out of range, the compiler uses c modulo the number of
elements in the vector to determine the element position.
Formula
If ((c MOD 4) EQ 0) Then d[0] = a Else d[0] = b[0]
If ((c MOD 4) EQ 1) Then d[1] = a Else d[1] = b[1]
If ((c MOD 4) EQ 2) Then d[2] = a Else d[2] = b[2]
If ((c MOD 4) EQ 3) Then d[3] = a Else d[3] = b[3]
Notes:
v MOD is the modulo operator.
v EQ is the equal operator.
Example
a = 50.0
b = (10.0, 20.0, 30.0, 40.0)
c = 1
d: (10.0, 50.0, 30.0, 40.0)
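The modulo rule in vec_extract and vec_insert can be modeled in portable C as shown below. This is a sketch, not the compiler's implementation. The type v4d and the functions elem_index, extract_model, and insert_model are hypothetical. The text above does not say how a negative position is reduced. This sketch assumes the nonnegative residue, computed as b & 3 on two's-complement integers, so -1 selects element 3.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Element position for any int: the nonnegative residue modulo 4.
   Treating negative positions this way is an assumption of this sketch. */
static int elem_index(int pos) { return pos & 3; }

/* Scalar model of vec_extract(a, b): d = a[b MOD 4]. */
static double extract_model(v4d a, int b) { return a.e[elem_index(b)]; }

/* Scalar model of vec_insert(a, b, c): copy of b with element c replaced by a. */
static v4d insert_model(double a, v4d b, int c)
{
    b.e[elem_index(c)] = a;   /* b is passed by value, so the caller's vector is unchanged */
    return b;
}
```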
vec_gpci
Purpose
Returns a vector containing the results of dispersing the 12-bit literal a into four 3-bit
fields, to be used as the control value for a permute operation such as vec_perm.
Syntax
d=vec_gpci(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: int, a value in 0x000 - 0xFFF
Result value
The value of each element of the result has a sign bit set to 0, an exponent set to
0x400, and a mantissa where bits 0:2 are taken from the 12-bit literal a as shown in
the formula.
Formula
d[0] = (double) {sign = 0, mantissa bits 0:2 = bits 0:2 of a, exponent = 0x400}
d[1] = (double) {sign = 0, mantissa bits 0:2 = bits 3:5 of a, exponent = 0x400}
d[2] = (double) {sign = 0, mantissa bits 0:2 = bits 6:8 of a, exponent = 0x400}
d[3] = (double) {sign = 0, mantissa bits 0:2 = bits 9:11 of a, exponent = 0x400}
Example
Shifting the elements of a vector to the left by one position and rotating the first
element around to the end requires the pattern 1–2–3–0. In binary, 0x298 is
001 010 011 000, that is, the 3-bit fields 1, 2, 3, and 0. The pattern can be
obtained with the following code:
pattern = vec_gpci(0x298);
v = vec_perm(v, v, pattern);
Fortran:
pattern = vec_gpci(Z'298')
v = vec_perm(v, v, pattern)
With the pattern 1–2–3–0, the vector (0.0, 1.0, 2.0, 3.0) becomes (1.0, 2.0, 3.0, 0.0).
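Two details of vec_gpci can be checked in portable C. The first is how four element indices pack into the 12-bit literal, with bits 0:2 being the most significant. The second is what double each control element actually holds. The sketch below assumes the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits) and assumes that double and uint64_t share byte order, which holds on common platforms. Under that layout, exponent 0x400 means 2^1, so an element whose mantissa bits 0:2 hold k has the value 2.0 + k/4. The names gpci_literal and gpci_element are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack four element indices (each 0-7) into the 12-bit vec_gpci literal:
   index 0 goes to bits 0:2 (most significant), index 3 to bits 9:11. */
static int gpci_literal(int i0, int i1, int i2, int i3)
{
    assert(i0 >= 0 && i0 < 8 && i1 >= 0 && i1 < 8 &&
           i2 >= 0 && i2 < 8 && i3 >= 0 && i3 < 8);
    return (i0 << 9) | (i1 << 6) | (i2 << 3) | i3;
}

/* Build one control element: sign 0, exponent 0x400, mantissa bits 0:2 = idx,
   all other mantissa bits 0. Mantissa bits 0:2 are binary64 bits 51..49. */
static double gpci_element(int idx)
{
    assert(idx >= 0 && idx < 8);
    uint64_t bits = (UINT64_C(0x400) << 52) | ((uint64_t)idx << 49);
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret the bit pattern as a double */
    return d;
}
```

For the rotation pattern 1–2–3–0, gpci_literal(1, 2, 3, 0) yields 0x298, matching the example.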
vec_lvsl
Purpose
Returns a permute control vector that can be used with vec_perm to realign data loaded from an address that is not vector-aligned.
Syntax
d=vec_lvsl(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: long
b: double*, _Complex double*, float*, or _Complex float*
Result value
The result value is a quad vector. The elements of the quad vector are generated in
the following ways:
v Sign: 0
v Mantissa:
1. For the first element, the mantissa is the result of the following operations:
– If b is a pointer to a double-precision floating-point value or complex
value:
a. Add a and b.
b. Mask the result of the previous step with 0b11000.
c. Take the integer value of bits 58 - 60 from the result of the previous
step.
– If b is a pointer to a single-precision floating-point value or complex
value:
a. Add a and b.
b. Multiply the result of the previous step by two.
c. Mask the result of the previous step with 0b11000.
d. Take the integer value of bits 58 - 60 from the result of the previous
step.
2. The mantissa is incremented by one for each subsequent element.
The mantissa is seen as a 3-bit value for the increment operation. That is,
incrementing 0b111 produces 0b000.
v Exponent: 0x400
You can use the result as an argument of the vec_perm function.
Formula
The following formula is applicable if b is a pointer to a double-precision
floating-point value or complex value:
EA = a + b
AA = EA AND 0b11000
Offset = bits 58:60 of AA
d[0] = (double) {sign = 0, mantissa = Offset, exponent = 0x400}
d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111, exponent = 0x400}
d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111, exponent = 0x400}
d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111, exponent = 0x400}
The following formula is applicable if b is a pointer to a single-precision
floating-point value or complex value:
EA = a + b
AA = (EA × 2) AND 0b11000
Offset = bits 58:60 of AA
d[0] = (double) {sign = 0, mantissa = Offset, exponent = 0x400}
d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111, exponent = 0x400}
d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111, exponent = 0x400}
d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111, exponent = 0x400}
Note:
v AND is the bitwise AND operator.
Example: Loading 8-byte aligned vectors
// my_array is an array of the double type
vector4double v, v1, v2, vp;
v1 = vec_ld(0, my_array);    // Load the left part of the vector
v2 = vec_ld(32, my_array);   // Load the right part of the vector
vp = vec_lvsl(0, my_array);  // Generate control value
v = vec_perm(v1, v2, vp);    // Generate the aligned vector
Example: Loading 4-byte aligned vectors
// my_array is an array of the float type
vector4double v, v1, v2, vp;
v1 = vec_ld(0, my_array);    // Load the left part of the vector
v2 = vec_ld(16, my_array);   // Load the right part of the vector
vp = vec_lvsl(0, my_array);  // Generate control value
v = vec_perm(v1, v2, vp);    // Generate the aligned vector
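The lvsl offset arithmetic can be checked in portable C. Bits 58:60 of a 64-bit value (bit 0 being the most significant) are the bits with weights 32, 16, and 8, so their integer value is (x >> 3) & 7. The sketch below models the four indices that vec_lvsl encodes, not the compiler's implementation. The names bits_58_60 and lvsl_indices are hypothetical, and ea stands for the effective address a + b as an integer. The load examples above appear to rely on vec_ld reading the aligned block that contains the address. Under that assumption, a double array that starts 8 bytes past a 32-byte boundary yields indices (1, 2, 3, 4), which select exactly the four wanted doubles from the concatenation of v1 and v2.

```c
#include <assert.h>
#include <stdint.h>

/* Integer value of bits 58:60 of x, with bit 0 the most significant. */
static unsigned bits_58_60(uint64_t x) { return (unsigned)((x >> 3) & 7); }

/* Hypothetical model of the element indices that vec_lvsl(a, b) places in
   mantissa bits 0:2. is_single selects the float / _Complex float rule. */
static void lvsl_indices(uint64_t ea, int is_single, unsigned idx[4])
{
    uint64_t aa = is_single ? ((ea * 2) & 0x18) : (ea & 0x18);
    unsigned offset = bits_58_60(aa);
    for (int i = 0; i < 4; i++)
        idx[i] = (offset + i) & 7;     /* 3-bit wraparound increment */
}
```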
vec_lvsr
Purpose
Returns a permute control vector that can be used with vec_perm to realign data before it is stored to an address that is not vector-aligned.
Syntax
d=vec_lvsr(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: long
b: double*, _Complex double*, float*, or _Complex float*
Result value
The result value is a quad vector. The elements of the quad vector are generated in
the following ways:
v Sign: 0
v Mantissa:
1. For the first element, the mantissa is the result of the following operations:
– If b is a pointer to a double-precision floating-point value or complex
value:
a. Add a and b.
b. Mask the result of the previous step with 0b11000.
c. Subtract the result of the previous step from 32.
d. Take the integer value of bits 58 - 60 from the result of the previous
step.
– If b is a pointer to a single-precision floating-point value or complex
value:
a. Add a and b.
b. Mask the result of the previous step with 0b1100.
c. Subtract the result of the previous step from 16.
d. Take the integer value of bits 59 - 61 from the result of the previous
step.
2. The mantissa is incremented by one for each subsequent element.
The mantissa is seen as a 3-bit value for the increment operation. That is,
incrementing 0b111 produces 0b000.
v Exponent: 0x400
You can use the result as an argument of the vec_perm function.
Formula
The following formula is applicable if b is a pointer to a double-precision
floating-point value or complex value:
EA = a + b
AA = 32 – (EA AND 0b11000)
Offset = bits 58:60 of AA
d[0] = (double) {sign = 0, mantissa = Offset, exponent = 0x400}
d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111, exponent = 0x400}
d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111, exponent = 0x400}
d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111, exponent = 0x400}
The following formula is applicable if b is a pointer to a single-precision
floating-point value or complex value:
EA = a + b
AA = 16 – (EA AND 0b1100)
Offset = bits 59:61 of AA
d[0] = (double) {sign = 0, mantissa = Offset, exponent = 0x400}
d[1] = (double) {sign = 0, mantissa = (Offset+1) AND 0b111, exponent = 0x400}
d[2] = (double) {sign = 0, mantissa = (Offset+2) AND 0b111, exponent = 0x400}
d[3] = (double) {sign = 0, mantissa = (Offset+3) AND 0b111, exponent = 0x400}
Note:
v AND is the bitwise AND operator.
Example: Storing 8-byte aligned vectors
void my_vec_store(vector4double v, double *arr)
{
    vector4double v1, v2, v3, p, m1, m2, m3;
    /* generate insert masks */
    p = vec_lvsr(0, arr);
    m1 = vec_cmplt(p, p);       /* generate vector of all FALSE */
    m2 = vec_neg(m1);           /* generate vector of all TRUE */
    m3 = vec_perm(m1, m2, p);
    /* get existing data */
    v1 = vec_ld(0, arr);
    v2 = vec_ld(0, arr + 4);
    /* permute and insert */
    v3 = vec_perm(v, v, p);
    v1 = vec_sel(v1, v3, m3);
    v2 = vec_sel(v3, v2, m3);
    /* store data back */
    vec_st(0, arr, v1);
    vec_st(0, arr + 4, v2);
}
Example: Storing 4-byte aligned vectors
void my_vec_store(vector4double v, float *arr)
{
    vector4double v1, v2, v3, p, m1, m2, m3;
    /* generate insert masks */
    p = vec_lvsr(0, arr);
    m1 = vec_cmplt(p, p);       /* generate vector of all FALSE */
    m2 = vec_neg(m1);           /* generate vector of all TRUE */
    m3 = vec_perm(m1, m2, p);
    /* get existing data */
    v1 = vec_ld(0, arr);
    v2 = vec_ld(0, arr + 4);
    /* permute and insert */
    v3 = vec_perm(v, v, p);
    v1 = vec_sel(v1, v3, m3);
    v2 = vec_sel(v3, v2, m3);
    /* store data back */
    vec_st(0, arr, v1);
    vec_st(0, arr + 4, v2);
}
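The store examples depend on two derived facts. First, vec_lvsr produces indices 4 through 7 for an aligned address. Second, m3 = vec_perm(all FALSE, all TRUE, p) is True exactly where an index is 4 or more, because indices 4 to 7 select from the second operand. The portable sketch below models both, not the compiler's implementation. The names lvsr_indices and insert_mask are hypothetical, and ea stands for a + b as an integer. For a double array 8 bytes past a 32-byte boundary, the mask is (False, True, True, True). The first vec_sel therefore keeps the one existing element that precedes arr and replaces the other three.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the element indices that vec_lvsr(a, b) places in
   mantissa bits 0:2. is_single selects the float / _Complex float rule. */
static void lvsr_indices(uint64_t ea, int is_single, unsigned idx[4])
{
    unsigned offset;
    if (is_single)
        offset = (unsigned)(((16 - (ea & 0xC)) >> 2) & 7);   /* bits 59:61 */
    else
        offset = (unsigned)(((32 - (ea & 0x18)) >> 3) & 7);  /* bits 58:60 */
    for (int i = 0; i < 4; i++)
        idx[i] = (offset + i) & 7;     /* 3-bit wraparound increment */
}

/* m3 = vec_perm(all FALSE, all TRUE, p): an index below 4 selects from the
   first operand (-1.0), an index of 4 or more from the second (1.0). */
static void insert_mask(const unsigned idx[4], double m3[4])
{
    for (int i = 0; i < 4; i++)
        m3[i] = (idx[i] >= 4) ? 1.0 : -1.0;
}
```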
vec_perm
Purpose
Returns a vector that contains some elements of two vectors, in the order specified
by a third vector.
Syntax
d=vec_perm(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
c: vector4double
Result value
The value of each element of the result is the element of the concatenation of a and
b that is specified by bits 0:2 of the mantissa of the corresponding element of c.
Each element of c must have an exponent equal to 0x400, or the corresponding
element of the result is undefined.
Note: The following functions generate control values that can be used for c:
v “vec_gpci” on page 89
v “vec_lvsl” on page 90
v “vec_lvsr” on page 92
Formula
Concat = ( a[0], a[1], a[2], a[3],
b[0], b[1], b[2], b[3] )
d[0] = Concat[Mantissa02(c[0])]
d[1] = Concat[Mantissa02(c[1])]
d[2] = Concat[Mantissa02(c[2])]
d[3] = Concat[Mantissa02(c[3])]
Note:
Mantissa02 is a function that returns the integer that is equivalent to the bits 0:2
of the mantissa of its argument.
Example
If a = (10.0, 20.0, 30.0, 40.0), b = (50.0, 60.0, 70.0, 80.0), and bits 0:2 of the
mantissas of the elements of c are (2, 3, 4, 5), the result value is (30.0, 40.0, 50.0,
60.0).
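A portable C model of vec_perm must read bits 0:2 of each control element's mantissa directly. It assumes the IEEE 754 binary64 layout, where those bits are bits 51 to 49, and that double and uint64_t share byte order. With exponent 0x400, an index k corresponds to the value 2.0 + k/4, so the example's control vector is (2.5, 2.75, 3.0, 3.25). This is a sketch, not the compiler's implementation. The type v4d and the functions mantissa_0_2 and perm_model are hypothetical. The assert stands in for the undefined result when an exponent is not 0x400.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Integer value of mantissa bits 0:2 (the three most significant fraction bits). */
static unsigned mantissa_0_2(double c)
{
    uint64_t bits;
    memcpy(&bits, &c, sizeof bits);
    assert(((bits >> 52) & 0x7FF) == 0x400);   /* otherwise the result is undefined */
    return (unsigned)((bits >> 49) & 7);
}

/* Scalar model of vec_perm(a, b, c): d[i] = Concat[Mantissa02(c[i])]. */
static v4d perm_model(v4d a, v4d b, v4d c)
{
    double concat[8] = { a.e[0], a.e[1], a.e[2], a.e[3],
                         b.e[0], b.e[1], b.e[2], b.e[3] };
    v4d d;
    for (int i = 0; i < 4; i++)
        d.e[i] = concat[mantissa_0_2(c.e[i])];
    return d;
}
```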
vec_promote
Purpose
Returns a vector with a in element position b.
Syntax
d=vec_promote(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: double
b: int
Result value
The result is a vector with a in element position b. This function uses modulo
arithmetic on b to determine the element number. That is, if b is out of range,
the compiler uses b modulo the number of elements in the vector to determine the
element position. The other elements of the vector are undefined.
Formula
d[b MOD 4] = a
Note: MOD is the modulo operator.
Example
a = 50.0
b = 1
d: ( X, 50.0, Y, Z) // X, Y, and Z are undefined values
vec_sldw
Purpose
Returns a vector by concatenating a and b, and then left-shifting the result vector
by multiples of 8 bytes. c specifies the offset for the shifting operation.
Syntax
d=vec_sldw(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
c: int, a value in 0 - 3
Result value
After left-shifting the concatenated a and b by multiples of 8 bytes specified by c,
the function takes the four leftmost 8-byte values and forms the result vector.
Formula
Concat = ( a[0], a[1], a[2], a[3],
b[0], b[1], b[2], b[3] )
d[0] = Concat[c]
d[1] = Concat[c+1]
d[2] = Concat[c+2]
d[3] = Concat[c+3]
Example
a = (10.0, 20.0, 30.0, 40.0)
b = (50.0, 60.0, 70.0, 80.0)
c = 2
d: (30.0, 40.0, 50.0, 60.0)
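The vec_sldw formula is a fixed window of four consecutive elements over the concatenation of a and b. The portable sketch below models it, not the compiler's implementation. The type v4d and the function sldw_model are hypothetical. Comparing this formula with the vec_perm formula shows that vec_sldw(a, b, c) selects the same elements as vec_perm with indices c, c+1, c+2, c+3. In particular, vec_sldw(v, v, 1) gives the same 1–2–3–0 rotation as the vec_gpci(0x298) example.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Scalar model of vec_sldw(a, b, c): d[i] = Concat[c + i]. */
static v4d sldw_model(v4d a, v4d b, int c)
{
    assert(c >= 0 && c <= 3);          /* c must be a value in 0 - 3 */
    double concat[8] = { a.e[0], a.e[1], a.e[2], a.e[3],
                         b.e[0], b.e[1], b.e[2], b.e[3] };
    v4d d;
    for (int i = 0; i < 4; i++)
        d.e[i] = concat[c + i];
    return d;
}
```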
vec_splat
Purpose
Returns a vector that has all of its elements set to the value of element b of a.
Syntax
d=vec_splat(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: int, a value in 0 - 3
Result value
The value of each element of the result is the value of the element of a specified by
b.
Formula
d[0] = a[b]
d[1] = a[b]
d[2] = a[b]
d[3] = a[b]
Example
a = (10.0, 20.0, 30.0, 40.0)
b = 1
d: (20.0, 20.0, 20.0, 20.0)
vec_splats
Purpose
Returns a vector with each element set to a.
Syntax
d=vec_splats(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: double
Result value
The value of each element of the result is a.
Formula
d[0] = a
d[1] = a
d[2] = a
d[3] = a
Example
a = 50.0
d: (50.0, 50.0, 50.0, 50.0)
Logical functions
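With the logical functions, you can perform logical operations between quad
vectors. As with the comparison functions, 1.0 represents True and -1.0 represents
False in the result. An input element is treated as True if it is greater than or
equal to zero, and as False otherwise (including NaN).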
vec_and
Purpose
Returns a vector containing the results of performing a logical AND operation
between the given vectors.
Syntax
d=vec_and(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical AND operation
between the corresponding elements of a and b.
Formula
d[0] = a[0] AND b[0]
d[1] = a[1] AND b[1]
d[2] = a[2] AND b[2]
d[3] = a[3] AND b[3]
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: (-1.0, -1.0, -1.0, 1.0)
vec_andc
Purpose
Returns a vector containing the results of performing a logical AND operation
between a and the complement of b.
Syntax
d=vec_andc(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical AND operation
between the corresponding element of a and the complement of the corresponding
element of b.
Formula
d[0] = a[0] AND NOT (b[0])
d[1] = a[1] AND NOT (b[1])
d[2] = a[2] AND NOT (b[2])
d[3] = a[3] AND NOT (b[3])
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: (-1.0, -1.0, 1.0,-1.0)
vec_logical
Purpose
Returns a vector containing the results of performing a logical operation between a
and b, using the truth table specified by c.
Syntax
d=vec_logical(a, b, c)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
c: int, a value in the range [0x0, 0xF]
Result value
The value of each element of the result is the result of the logical operation
between the corresponding elements of a and b, using the truth table specified by
c.
The following table shows how to read the truth table in c for the nth element of a
and b.
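a[n]    b[n]    Binary result
False   False   c0
True    False   c1
False   True    c2
True    True    c3
Here c0 is the most significant (leftmost) bit of the 4-bit value c, and c3 is the
least significant bit.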
The result value is calculated from the binary result as follows:
Binary result   Result value
0               -1.0 (False)
1               1.0 (True)
Formula
If (a[n] < 0.0) AND (b[n] < 0.0): If (c0 EQ 0), d[n] = -1.0 Else d[n] = 1.0
If (a[n] ≥ 0.0) AND (b[n] < 0.0): If (c1 EQ 0), d[n] = -1.0 Else d[n] = 1.0
If (a[n] < 0.0) AND (b[n] ≥ 0.0): If (c2 EQ 0), d[n] = -1.0 Else d[n] = 1.0
If (a[n] ≥ 0.0) AND (b[n] ≥ 0.0): If (c3 EQ 0), d[n] = -1.0 Else d[n] = 1.0
Notes:
v EQ is the equal operator.
v In this function, NaN is considered to be less than zero.
Example
You can use the values for c from the following table to replicate some common
logical operators.
Binary   c     Operator
0001     0x1   AND
0110     0x6   XOR
0111     0x7   OR
1000     0x8   NOR
1110     0xE   NAND
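The truth-table lookup can be expressed in a few lines of portable C. The sketch below is a scalar model of vec_logical, not the compiler's implementation. The type v4d and the function logical_model are hypothetical. The test reproduces the AND, XOR, OR, NOR, and NAND examples of the separate logical functions using the c values from the table above. From the same truth-table definition, vec_andc and vec_orc correspond to c = 0x4 (0100) and c = 0xD (1101), and the test checks those two examples as well.

```c
#include <assert.h>

/* Hypothetical stand-in for vector4double: four doubles. */
typedef struct { double e[4]; } v4d;

/* Scalar model of vec_logical(a, b, c). An element is True when it is
   >= 0.0; NaN fails that test and therefore counts as False, matching the
   note that NaN is considered to be less than zero. */
static v4d logical_model(v4d a, v4d b, int c)
{
    assert(c >= 0x0 && c <= 0xF);
    v4d d;
    for (int i = 0; i < 4; i++) {
        int ta = a.e[i] >= 0.0;
        int tb = b.e[i] >= 0.0;
        /* Row of the truth table: (F,F)->c0, (T,F)->c1, (F,T)->c2, (T,T)->c3. */
        int row = (tb << 1) | ta;
        int bit = (c >> (3 - row)) & 1;    /* c0 is the most significant bit */
        d.e[i] = bit ? 1.0 : -1.0;
    }
    return d;
}
```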
vec_nand
Purpose
Returns a vector containing the results of performing a logical NOT operation of
the result of a logical AND operation between the given vectors.
Syntax
d=vec_nand(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical NOT operation of a
logical AND operation between the corresponding elements of a and b.
Formula
d[0] = NOT (a[0] AND b[0])
d[1] = NOT (a[1] AND b[1])
d[2] = NOT (a[2] AND b[2])
d[3] = NOT (a[3] AND b[3])
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: ( 1.0, 1.0, 1.0,-1.0)
vec_nor
Purpose
Returns a vector containing the results of performing a logical NOT operation of
the result of a logical OR operation between the given vectors.
Syntax
d=vec_nor(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical NOT operation of a
logical OR operation between the corresponding elements of a and b.
Formula
d[0] = NOT (a[0] OR b[0])
d[1] = NOT (a[1] OR b[1])
d[2] = NOT (a[2] OR b[2])
d[3] = NOT (a[3] OR b[3])
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: ( 1.0, -1.0, -1.0,-1.0)
vec_not
Purpose
Returns a vector containing the result of a logical NOT operation on the given
vector.
Syntax
d=vec_not(a)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
Result value
The value of each element of the result is the result of a logical NOT operation of
the corresponding element of a.
Formula
d[0] = NOT a[0]
d[1] = NOT a[1]
d[2] = NOT a[2]
d[3] = NOT a[3]
Example
a = (-1.0, -2.0, 1.0, 2.0)
d: ( 1.0, 1.0, -1.0, -1.0)
vec_or
Purpose
Returns a vector containing the results of performing a logical OR operation
between the given vectors.
Syntax
d=vec_or(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical OR operation
between the corresponding elements of a and b.
Formula
d[0] = a[0] OR b[0]
d[1] = a[1] OR b[1]
d[2] = a[2] OR b[2]
d[3] = a[3] OR b[3]
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: (-1.0, 1.0, 1.0, 1.0)
vec_orc
Purpose
Returns a vector containing the result of performing a logical OR operation
between a and the complement of b.
Syntax
d=vec_orc(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical OR operation
between the corresponding element of a and the complement of the corresponding
element of b.
Formula
d[0] = a[0] OR NOT (b[0])
d[1] = a[1] OR NOT (b[1])
d[2] = a[2] OR NOT (b[2])
d[3] = a[3] OR NOT (b[3])
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: ( 1.0, -1.0, 1.0, 1.0)
vec_xor
Purpose
Returns a vector containing the results of performing a logical exclusive OR
operation between the given vectors.
Syntax
d=vec_xor(a, b)
Result and argument types
The following table describes the types of the returned value and the function
arguments.
d: vector4double
a: vector4double
b: vector4double
Result value
The value of each element of the result is the result of a logical exclusive OR
operation between the corresponding elements of a and b.
Formula
d[0] = a[0] XOR b[0]
d[1] = a[1] XOR b[1]
d[2] = a[2] XOR b[2]
d[3] = a[3] XOR b[3]
Example
a = (-1.0, -1.0, 1.0, 1.0)
b = (-1.0, 1.0, -1.0, 1.0)
d: (-1.0, 1.0, 1.0, -1.0)
Chapter 11. Using the Mathematical Acceleration Subsystem libraries (MASS)
XL C/C++ is shipped with a set of Mathematical Acceleration Subsystem (MASS)
libraries for high-performance mathematical computing.
The MASS libraries consist of a library of scalar C/C++ functions described in
“Using the scalar library,” a set of vector libraries tuned for specific architectures
described in “Using the vector libraries,” and a SIMD library described in “Using the
SIMD library” on page 108. The functions contained in both the scalar and vector
libraries are automatically called at certain levels of optimization, but you can also
call them explicitly in your programs. Note that the accuracy and exception
handling of MASS functions might not be identical to those of the system library functions.
The MASS functions must run with the default rounding mode and floating-point
exception trapping settings.
When you compile programs with any of the following sets of options:
v -qhot -qignerrno -qnostrict
v -qhot -O3
v -O4
v -O5
the compiler automatically attempts to vectorize calls to system math functions by
calling the equivalent MASS vector functions (with the exception of the functions
vdnint, vdint, vcosisin, vscosisin, vqdrt, vsqdrt, vrqdrt, vsrqdrt, vpopcnt4,
vpopcnt8, vexp2, vexp2m1, vsexp2, vsexp2m1, vlog2, vlog21p, vslog2, and vslog21p).
If it cannot vectorize, it automatically tries to call the equivalent MASS scalar
functions. For automatic vectorization or scalarization, the compiler uses versions
of the MASS functions contained in the XLOPT library libxlopt.a.
In addition to any of the preceding sets of options, when the -qipa option is in
effect, if the compiler cannot vectorize, it tries to inline the MASS scalar functions
before deciding to call them.
“Compiling and linking a program with MASS” on page 111 describes how to
compile and link a program that uses the MASS libraries, and how to selectively
use the MASS scalar library functions in conjunction with the regular system
libraries.
Related external information
Mathematical Acceleration Subsystem website, available at
http://www.ibm.com/software/awdtools/mass/
Using the scalar library
The MASS scalar library libmass.a contains an accelerated set of frequently used
math intrinsic functions that provide improved performance over the
corresponding standard system library functions. The MASS scalar functions are
used when explicitly linking libmass.a.
If you want to explicitly call the MASS scalar functions, you can take the following
steps:
1. Provide the prototypes for the functions (except anint, cosisin, dnint, sincos,
and rsqrt), by including math.h in your source files.
2. Provide the prototypes for anint, cosisin, dnint, sincos, and rsqrt, by
including mass.h in your source files.
3. Link the MASS scalar library libmass.a with your application. For instructions,
see “Compiling and linking a program with MASS” on page 111.
The MASS scalar functions accept double-precision parameters and return a
double-precision result, or accept single-precision parameters and return a
single-precision result. The exception is sincos, which returns two double-precision
results through its pointer arguments. The functions are summarized in Table 4.
Table 4. MASS scalar functions
Each entry lists the double-precision function and, where one exists, the
single-precision function, followed by the description and the prototypes.

acos / acosf: Returns the arccosine of x.
    double acos (double x);
    float acosf (float x);
acosh / acoshf: Returns the hyperbolic arccosine of x.
    double acosh (double x);
    float acoshf (float x);
anint: Returns the rounded integer value of x.
    float anint (float x);
asin / asinf: Returns the arcsine of x.
    double asin (double x);
    float asinf (float x);
asinh / asinhf: Returns the hyperbolic arcsine of x.
    double asinh (double x);
    float asinhf (float x);
atan2 / atan2f: Returns the arctangent of x/y.
    double atan2 (double x, double y);
    float atan2f (float x, float y);
atan / atanf: Returns the arctangent of x.
    double atan (double x);
    float atanf (float x);
atanh / atanhf: Returns the hyperbolic arctangent of x.
    double atanh (double x);
    float atanhf (float x);
cbrt / cbrtf: Returns the cube root of x.
    double cbrt (double x);
    float cbrtf (float x);
copysign / copysignf: Returns x with the sign of y.
    double copysign (double x, double y);
    float copysignf (float x, float y);
cos / cosf: Returns the cosine of x.
    double cos (double x);
    float cosf (float x);
cosh / coshf: Returns the hyperbolic cosine of x.
    double cosh (double x);
    float coshf (float x);
cosisin: Returns a complex number with the real part the cosine of x and the
imaginary part the sine of x.
    double _Complex cosisin (double);
dnint: Returns the nearest integer to x (as a double).
    double dnint (double x);
erf / erff: Returns the error function of x.
    double erf (double x);
    float erff (float x);
erfc / erfcf: Returns the complementary error function of x.
    double erfc (double x);
    float erfcf (float x);
exp / expf: Returns the exponential function of x.
    double exp (double x);
    float expf (float x);
expm1 / expm1f: Returns (the exponential function of x) - 1.
    double expm1 (double x);
    float expm1f (float x);
hypot / hypotf: Returns the square root of x^2 + y^2.
    double hypot (double x, double y);
    float hypotf (float x, float y);
lgamma / lgammaf: Returns the natural logarithm of the absolute value of the Gamma
function of x.
    double lgamma (double x);
    float lgammaf (float x);
log / logf: Returns the natural logarithm of x.
    double log (double x);
    float logf (float x);
log10 / log10f: Returns the base 10 logarithm of x.
    double log10 (double x);
    float log10f (float x);
log1p / log1pf: Returns the natural logarithm of (x + 1).
    double log1p (double x);
    float log1pf (float x);
pow / powf: Returns x raised to the power y.
    double pow (double x, double y);
    float powf (float x, float y);
rsqrt: Returns the reciprocal of the square root of x.
    double rsqrt (double x);
sin / sinf: Returns the sine of x.
    double sin (double x);
    float sinf (float x);
sincos: Sets *s to the sine of x and *c to the cosine of x.
    void sincos (double x, double* s, double* c);
sinh / sinhf: Returns the hyperbolic sine of x.
    double sinh (double x);
    float sinhf (float x);
sqrt: Returns the square root of x.
    double sqrt (double x);
tan / tanf: Returns the tangent of x.
    double tan (double x);
    float tanf (float x);
tanh / tanhf: Returns the hyperbolic tangent of x.
    double tanh (double x);
    float tanhf (float x);
Notes:
- The trigonometric functions (sin, cos, tan) return NaN (Not-a-Number) for large arguments (where the absolute value is greater than 2^50 * pi).
- In some cases, the MASS functions are not as accurate as the libm.a library, and they might handle edge cases differently (sqrt(Inf), for example).
- See the Mathematical Acceleration Subsystem website for accuracy comparisons with libm.a.
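The first two notes can be turned into caller-side checks. The sketch below is illustrative only and is not part of MASS: `mass_trig_arg_ok` and `ulp_distance` are hypothetical helpers, the range check assumes the threshold in the first note is 2^50 * pi, and `ulp_distance` expresses how far a MASS result is from the libm.a result for the same input in units in the last place (ULPs), which is one common way to report such accuracy differences.

```c
#include <stdint.h>
#include <string.h>

/* Assumed reading of the first note: MASS sin, cos and tan return NaN
   when |x| > 2^50 * pi. 0x1p50 is the C99 hexadecimal literal for 2^50. */
#define MASS_TRIG_LIMIT (0x1p50 * 3.141592653589793)

/* Hypothetical helper: returns 1 if x is within the assumed range,
   0 otherwise. NaN and infinities fail the comparison and return 0. */
static int mass_trig_arg_ok(double x)
{
    double ax = x < 0.0 ? -x : x;      /* |x| without needing libm */
    return ax <= MASS_TRIG_LIMIT;
}

/* Map the bit pattern of a double onto a signed integer so that
   adjacent representable doubles map to adjacent integers and
   +0.0 and -0.0 both map to 0. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);          /* well-defined type punning */
    return i < 0 ? INT64_MIN - i : i;  /* reverse the order of negatives */
}

/* Hypothetical helper: distance between a and b in ULPs
   (0 if equal, 1 if adjacent). Meaningless if either input is NaN. */
static uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for far-apart values. */
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

In practice, a caller would compute the same function with MASS and with libm.a for a set of inputs, filter the inputs with `mass_trig_arg_ok` for sin, cos and tan, and report the largest `ulp_distance` observed.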
Related external information
Mathematical Acceleration Subsystem website, available at
http://www.ibm.com/software/awdtools/mass/
Using the SIMD library
The MASS SIMD library libmass_simd.a contains a set of frequently used math
intrinsic functions that provide improved performance over the corresponding
standard system library functions. If you want to use the MASS SIMD functions,
you can do so as follows:
1. Provide the prototypes for the functions by including mass_simd.h in your
source files.
2. Link the MASS SIMD library libmass_simd.a with your application. For
instructions, see “Compiling and linking a program with MASS” on page 111.
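The double-precision MASS SIMD functions accept double-precision arguments and return double-precision results; likewise, the single-precision functions accept single-precision arguments and return single-precision results. Both sets of prototypes use the vector4double type. The functions are summarized in Table 5.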
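Each SIMD function applies its scalar operation independently to the four elements of its vector4double argument(s). The portable sketch below models that element-wise behavior with a hypothetical `vec4_ref` structure and the hypothetical helpers `divd4_ref` and `recipd4_ref`; it does not use the QPX vector4double type or the MASS library, and only shows which element of the result comes from which input elements.

```c
/* Hypothetical stand-in for vector4double: four doubles, element 0 first. */
typedef struct {
    double e[4];
} vec4_ref;

/* Element-wise quotient, mirroring the description of divd4:
   result element i is vx.e[i] / vy.e[i]. */
static vec4_ref divd4_ref(vec4_ref vx, vec4_ref vy)
{
    vec4_ref r;
    for (int i = 0; i < 4; i++)
        r.e[i] = vx.e[i] / vy.e[i];
    return r;
}

/* Element-wise reciprocal, mirroring the description of recipd4:
   result element i is 1 / vx.e[i]. */
static vec4_ref recipd4_ref(vec4_ref vx)
{
    vec4_ref r;
    for (int i = 0; i < 4; i++)
        r.e[i] = 1.0 / vx.e[i];
    return r;
}
```

On Blue Gene/Q itself, the corresponding calls would be divd4(vx, vy) and recipd4(vx) on vector4double values, with mass_simd.h included and libmass_simd.a linked as described above.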
Table 5. MASS SIMD functions
Double-precision function
Single-precision function
Description
Double-precision function prototype
Single-precision function prototype
acosd4
acosf4
Computes the arc cosine of each element of vx.
vector4double acosd4 (vector4double vx);
vector4double acosf4 (vector4double vx);
acoshd4
acoshf4
Computes the arc hyperbolic cosine of each element of vx.
vector4double acoshd4 (vector4double vx);
vector4double acoshf4 (vector4double vx);
asind4
asinf4
Computes the arc sine of each element of vx.
vector4double asind4 (vector4double vx);
vector4double asinf4 (vector4double vx);
asinhd4
asinhf4
Computes the arc hyperbolic sine of each element of vx.
vector4double asinhd4 (vector4double vx);
vector4double asinhf4 (vector4double vx);
atand4
atanf4
Computes the arc tangent of each element of vx.
vector4double atand4 (vector4double vx);
vector4double atanf4 (vector4double vx);
atan2d4
atan2f4
Computes the arc tangent of each element of vy/vx.
vector4double atan2d4 (vector4double vx, vector4double vy);
vector4double atan2f4 (vector4double vx, vector4double vy);
atanhd4
atanhf4
Computes the arc hyperbolic tangent of each element of vx.
vector4double atanhd4 (vector4double vx);
vector4double atanhf4 (vector4double vx);
cbrtd4
cbrtf4
Computes the cube root of each element of vx.
vector4double cbrtd4 (vector4double vx);
vector4double cbrtf4 (vector4double vx);
cosd4
cosf4
Computes the cosine of each element of vx.
vector4double cosd4 (vector4double vx);
vector4double cosf4 (vector4double vx);
coshd4
coshf4
Computes the hyperbolic cosine of each element of vx.
vector4double coshd4 (vector4double vx);
vector4double coshf4 (vector4double vx);
cosisind4
cosisinf4
Computes the cosine and sine of each element of x, and stores the results in y and z as follows: cosisind2 (x,y,z) sets y and z to {cos(x1), sin(x1)} and {cos(x2), sin(x2)} where x={x1,x2}. cosisinf4 (x,y,z) sets y and z to {cos(x1), sin(x1), cos(x2), sin(x2)} and {cos(x3), sin(x3), cos(x4), sin(x4)} where x={x1,x2,x3,x4}.
void cosisind4 (vector4double x, vector4double *y, vector4double *z);
void cosisinf4 (vector4double x, vector4double *y, vector4double *z);
divd4
divf4
Computes the quotient vx/vy.
vector4double divd4 (vector4double vx, vector4double vy);
vector4double divf4 (vector4double vx, vector4double vy);
erfcd4
erfcf4
Computes the complementary error function of each element of vx.
vector4double erfcd4 (vector4double vx);
vector4double erfcf4 (vector4double vx);
erfd4
erff4
Computes the error function of each element of vx.
vector4double erfd4 (vector4double vx);
vector4double erff4 (vector4double vx);
expd4
expf4
Computes the exponential function of each element of vx.
vector4double expd4 (vector4double vx);
vector4double expf4 (vector4double vx);
exp2d4
exp2f4
Computes 2 raised to the power of each element of vx.
vector4double exp2d4 (vector4double vx);
vector4double exp2f4 (vector4double vx);
expm1d4
expm1f4
Computes (the exponential function of each element of vx) - 1.
vector4double expm1d4 (vector4double vx);
vector4double expm1f4 (vector4double vx);
exp2m1d4
exp2m1f4
Computes (2 raised to the power of each element of vx) - 1.
vector4double exp2m1d4 (vector4double vx);
vector4double exp2m1f4 (vector4double vx);
hypotd4
hypotf4
For each element of vx and the corresponding element of vy, computes sqrt(x*x+y*y).
vector4double hypotd4 (vector4double vx, vector4double vy);
vector4double hypotf4 (vector4double vx, vector4double vy);
lgammad4
lgammaf4
Computes the natural logarithm of the absolute value of the Gamma function of each element of vx.
vector4double lgammad4 (vector4double vx);
vector4double lgammaf4 (vector4double vx);
logd4
logf4
Computes the natural logarithm of each element of vx.
vector4double logd4 (vector4double vx);
vector4double logf4 (vector4double vx);
log2d4
log2f4
Computes the base-2 logarithm of each element of vx.
vector4double log2d4 (vector4double vx);
vector4double log2f4 (vector4double vx);
log10d4
log10f4
Computes the base-10 logarithm of each element of vx.
vector4double log10d4 (vector4double vx);
vector4double log10f4 (vector4double vx);
log1pd4
log1pf4
Computes the natural logarithm of each element of (vx + 1).
vector4double log1pd4 (vector4double vx);
vector4double log1pf4 (vector4double vx);
log21pd4
log21pf4
Computes the base-2 logarithm of each element of (vx + 1).
vector4double log21pd4 (vector4double vx);
vector4double log21pf4 (vector4double vx);
powd4
powf4
Computes each element of vx raised to the power of the corresponding element of vy.
vector4double powd4 (vector4double vx, vector4double vy);
vector4double powf4 (vector4double vx, vector4double vy);
qdrtd4
qdrtf4
Computes the quad root of each element of vx.
vector4double qdrtd4 (vector4double vx);
vector4double qdrtf4 (vector4double vx);
rcbrtd4
rcbrtf4
Computes the reciprocal of the cube root of each element of vx.
vector4double rcbrtd4 (vector4double vx);
vector4double rcbrtf4 (vector4double vx);
recipd4
recipf4
Computes the reciprocal of each element of vx.
vector4double recipd4 (vector4double vx);
vector4double recipf4 (vector4double vx);
rqdrtd4
rqdrtf4
Computes the reciprocal of the quad root of each element of vx.
vector4double rqdrtd4 (vector4double vx);
vector4double rqdrtf4 (vector4double vx);
rsqrtd4
rsqrtf4
Computes the reciprocal of the square root of each element of vx.
vector4double rsqrtd4 (vector4double vx);
vector4double rsqrtf4 (vector4double vx);
sincosd4
sincosf4
Computes the sine and cosine of each element of vx.
void sincosd4 (vector4double vx, vector4double *vs, vector4double *vc);
void sincosf4 (vector4double vx, vector4double *vs, vector4double *vc);
sind4
sinf4
Computes the sine of each element of vx.
vector4double sind4 (vector4double vx);
vector4double sinf4 (vector4double vx);
sinhd4
sinhf4
Computes the hyperbolic sine of each element of vx.
vector4double sinhd4 (vector4double vx);
vector4double sinhf4 (vector4double vx);
sqrtd4
sqrtf4
Computes the square root of each element of vx.
vector4double sqrtd4 (vector4double vx);
vector4double sqrtf4 (vector4double vx);
tand4
tanf4
Computes the tangent of each element of vx.
vector4double tand4 (vector4double vx);
vector4double tanf4 (vector4double vx);
tanhd4
tanhf4
Computes the hyperbolic tangent of each element of vx.
vector4double tanhd4 (vector4double vx);
vector4double tanhf4 (vector4double vx);
Compiling and linking a program with MASS
To compile an application that calls the functions in the MASS libraries, specify
one or more of the following keywords on the -l linker option:
- mass
- massv
- mass_simd
For example, if the MASS libraries are installed in the default directory, you can
specify one of the following:
Link with scalar library libmass.a and vector library libmassv.a
bgxlc progc.c -o progc -lmass -lmassv
Link with SIMD library libmass_simd.a
bgxlc progc.c -o progc -lmass_simd
Using libmass.a with the math system library
If you want to use the libmass.a scalar library for some functions and the normal
math library libm.a for other functions, follow this procedure to compile and link
your program:
1. Use the ar command to extract the object files of the desired functions from
libmass.a. For most functions, the object file name is the function name
followed by .s64.o (see the exceptions below). For example, to extract the object file for the tan function,
the command would be:
ar -x libmass.a tan.s64.o
2. Archive the extracted object files into another library:
ar -qv libfasttan.a tan.s64.o
ranlib libfasttan.a
3. Create the final executable using bgxlc, specifying -lfasttan instead of -lmass:
bgxlc sample.c -o sample -Ldir_containing_libfasttan -lfasttan
This links only the tan function from MASS (now in libfasttan.a) and the
remainder of the math functions from the standard system library.
Exceptions:
1. The sin and cos functions are both contained in the object file sincos.s64.o. The
cosisin and sincos functions are both contained in the object file cosisin.s64.o.
2. The XL C/C++ pow function is contained in the object file dxy.s64.o.
Note: The cos and sin functions will both be exported if either one is exported.
cosisin and sincos will both be exported if either one is exported.
