The Vector Addition example shows how to use Qt to add two vectors using OpenCL.
We start by creating a QCLContext object and initializing it for the default OpenCL computing device (usually the GPU):
QCLContext context;
if (!context.create()) {
    fprintf(stderr, "Could not create OpenCL context for the GPU\n");
    return 1;
}
Next we need some input data. We are going to add two 2048-element vectors of integers together. The first vector will hold the numbers 0..2047 and the second vector will hold the numbers 2048..1. The following code creates and initializes our vectors:
QCLVector<int> input1 = context.createVector<int>(2048);
QCLVector<int> input2 = context.createVector<int>(2048);
for (int index = 0; index < 2048; ++index) {
    input1[index] = index;
    input2[index] = 2048 - index;
}
At this point, the data is still in the CPU's address space. Because the vectors were created as QCLVector instances, they will be transferred to the GPU automatically when we execute the OpenCL program later. We also need somewhere for the GPU to store the results:
QCLVector<int> output = context.createVector<int>(2048);
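The input pattern chosen above has a useful property: element `index` of the first vector is `index` and the matching element of the second is `2048 - index`, so every element-wise sum should be 2048. The following is a minimal host-only sketch of that arithmetic. It uses `std::vector` as a stand-in for QCLVector, and the helper names `fillInputs` and `sumsAreConstant` are hypothetical, introduced only for illustration:

```cpp
#include <cassert>
#include <vector>

// Fill two host-side vectors with the same pattern the example uses.
// std::vector stands in for QCLVector<int> (an assumption for illustration).
void fillInputs(std::vector<int> &input1, std::vector<int> &input2)
{
    assert(input1.size() == input2.size());
    const int size = static_cast<int>(input1.size());
    for (int index = 0; index < size; ++index) {
        input1[index] = index;         // 0, 1, ..., size - 1
        input2[index] = size - index;  // size, size - 1, ..., 1
    }
}

// Returns true if every element-wise sum equals the vector length.
bool sumsAreConstant(const std::vector<int> &input1, const std::vector<int> &input2)
{
    const int size = static_cast<int>(input1.size());
    for (int index = 0; index < size; ++index) {
        if (input1[index] + input2[index] != size)
            return false;
    }
    return true;
}
```

Calling `fillInputs` on two 2048-element vectors and then `sumsAreConstant` should return true, which is why the final check in this example can compare every output element against the single value 2048.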
Next, we build the program and locate the vectorAdd entry point, or "kernel" in OpenCL terminology:
QCLProgram program = context.buildProgramFromSourceFile(":/vectoradd.cl");
QCLKernel kernel = program.createKernel("vectorAdd");
The OpenCL source code for our program has been supplied as a Qt resource file called vectoradd.cl. Before we continue with the C++ code, let's have a look at the OpenCL source code:
__kernel void vectorAdd(__global __read_only int *input1,
                        __global __read_only int *input2,
                        __global __write_only int *output)
{
    unsigned int index = get_global_id(0);
    output[index] = input1[index] + input2[index];
}
The function vectorAdd has two arguments for the input vectors and a third argument for the output vector. The body of the function may look a little strange at first glance for a vector addition. It fetches an array index and then sets that output location to the sum of the two input locations. It isn't looping over the whole vector as would normally be expected in C++ code.
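For comparison, this is roughly what the conventional C++ version with an explicit loop might look like. It is a plain CPU sketch, not part of the example's source; the function name `vectorAddSerial` and the use of `std::vector` are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conventional serial vector addition: the loop over indices is written
// explicitly, whereas the OpenCL kernel handles a single index per call.
std::vector<int> vectorAddSerial(const std::vector<int> &input1,
                                 const std::vector<int> &input2)
{
    assert(input1.size() == input2.size());
    std::vector<int> output(input1.size());
    for (std::size_t index = 0; index < input1.size(); ++index)
        output[index] = input1[index] + input2[index];
    return output;
}
```

The loop body is the same statement as the kernel body; the difference is only in who supplies the index.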
The magic happens with get_global_id(), which fetches the index value from an outer loop that is provided for us by the OpenCL environment. Most OpenCL kernels look like this: they fetch the identifiers for the current "work item", and then process just that item. Back in our C++ code, we specify the number of work items with QCLKernel::setGlobalWorkSize():
kernel.setGlobalWorkSize(2048);
This tells the OpenCL implementation how many work items to run, that is, how many times the kernel should be invoked, with a different global identifier passed to each instance. These invocations can be executed in parallel to speed up the calculations. In C++, we execute the kernel, passing it the three vectors, as follows:
kernel(input1, input2, output);
Behind the scenes, QCLKernel will make sure that the contents of our input vectors are transferred to the GPU before the kernel begins execution.
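To make the "outer loop" idea concrete, the sketch below emulates the launch on the CPU. It is only a conceptual model under stated assumptions: `vectorAddItem` and `launchVectorAdd` are hypothetical names, the `globalId` parameter plays the role of get_global_id(0), and the loop runs serially, whereas a real OpenCL implementation is free to run the work items in parallel and in any order:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for the kernel body: processes exactly one work item.
// globalId plays the role of get_global_id(0) in the OpenCL source.
void vectorAddItem(std::size_t globalId,
                   const int *input1, const int *input2, int *output)
{
    output[globalId] = input1[globalId] + input2[globalId];
}

// Hypothetical launcher that models the loop the OpenCL runtime provides:
// one call per work item, each with a distinct global identifier.
void launchVectorAdd(std::size_t globalWorkSize,
                     const std::vector<int> &input1,
                     const std::vector<int> &input2,
                     std::vector<int> &output)
{
    assert(input1.size() >= globalWorkSize);
    assert(input2.size() >= globalWorkSize);
    assert(output.size() >= globalWorkSize);
    for (std::size_t id = 0; id < globalWorkSize; ++id)
        vectorAddItem(id, input1.data(), input2.data(), output.data());
}
```

Because each work item writes only to its own output element, the calls do not depend on one another, which is what allows them to run concurrently on a real device.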
Finally, we read back the results from the output buffer and check that the answer is what we expected:
for (int index = 0; index < 2048; ++index) {
    if (output[index] != 2048) {
        fprintf(stderr, "Answer at index %d is %d, should be %d\n",
                index, output[index], 2048);
        return 1;
    }
}
printf("Answer is correct: %d\n", 2048);
The first time that the host refers to the contents of the output vector, the data is automatically transferred from the GPU to the host.
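The transfer-on-first-access behaviour can be pictured with a small host-only model. This is a hypothetical illustration, not the QCLVector implementation: the class `LazyHostView`, its members, and the transfer counter are all invented here, and an ordinary `std::vector` stands in for device memory:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a vector whose host copy is filled lazily.
// m_device stands in for GPU memory; nothing here talks to a real device.
class LazyHostView
{
public:
    explicit LazyHostView(std::vector<int> deviceData)
        : m_device(std::move(deviceData)), m_hostValid(false), m_transfers(0) {}

    // The first read copies the "device" data to the host; later reads
    // reuse the host copy without another transfer.
    int operator[](std::size_t index)
    {
        if (!m_hostValid) {
            m_host = m_device;
            m_hostValid = true;
            ++m_transfers;
        }
        return m_host[index];
    }

    // Number of simulated device-to-host transfers so far.
    int transferCount() const { return m_transfers; }

private:
    std::vector<int> m_device;
    std::vector<int> m_host;
    bool m_hostValid;
    int m_transfers;
};
```

In this model, reading many elements in a loop, as the verification code does, triggers a single copy on the first read rather than one per element.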
If all goes well, the following should be printed by the program:
Answer is correct: 2048
| Copyright © 2010 Nokia Corporation | QtOpenCL Documentation |