Despite the lack of updates on this site, I have been hard at work on a new provider, Brahma.OpenCL. I am very excited about all the possibilities that OpenCL brings to the table. I will try to summarize some of the new features that OpenCL will bring to Brahma.
- Different memory pools – OpenCL supports the idea of host memory, device memory and host-addressable memory. Brahma will allow DataParallelArrays to be created in any of these memory pools, and arrays in host-addressable memory (and host memory) will keep only one copy of the data.
- Asynchronous operations – OpenCL supports the idea of asynchronous operations (data-transfers, concurrent kernel execution) and now so does Brahma. Brahma will also allow the user to introduce “fences” in any command queue for fork-join parallelism.
- Free memory-layout – Because memory layouts vary across applications (Morton coding, bricking, overlapped bricks or other index-based arbitrary data structures), DataParallelArrays will now be independent of layout (you provide a layout manager, or use a stock one). This means DataParallelArrays can now have arbitrary dimensionality.
- Arbitrary data – This one is perhaps the most exciting. DataParallelArrays can now contain any kind of blittable struct. So, if you’re working with an equation that requires two floats and an int, called a, b and factor, you can create a struct with those members and use them in your calculations!
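To give a feel for what a custom layout manager computes, here is a minimal sketch of a 2D Morton (Z-order) layout that maps an (x, y) index to a linear offset by interleaving the bits of the two coordinates. The IMemoryLayout interface and MortonLayout2D class are hypothetical names chosen for illustration; Brahma’s actual layout-manager API may look quite different.

```csharp
using System;

// Hypothetical layout contract (not Brahma's real API): maps a
// multi-dimensional index to a linear element offset.
public interface IMemoryLayout
{
    int Length { get; }
    int Offset(params int[] index);
}

// Z-order (Morton) layout for a square 2D grid whose side is a power of two.
public sealed class MortonLayout2D : IMemoryLayout
{
    private readonly int _side;

    public MortonLayout2D(int side)
    {
        // Limit side to 2^15 so side * side and every offset fit in an int.
        if (side <= 0 || side > 32768 || (side & (side - 1)) != 0)
            throw new ArgumentException("side must be a power of two no larger than 32768");
        _side = side;
    }

    public int Length => _side * _side;

    public int Offset(params int[] index)
    {
        if (index.Length != 2) throw new ArgumentException("expected (x, y)");
        int x = index[0], y = index[1];
        if ((uint)x >= (uint)_side || (uint)y >= (uint)_side)
            throw new ArgumentOutOfRangeException(nameof(index));
        // Bits of x land in even positions, bits of y in odd positions.
        return (int)(Spread((uint)x) | (Spread((uint)y) << 1));
    }

    // Inserts a zero bit between each of the low 16 bits of v.
    private static uint Spread(uint v)
    {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }
}
```

Because neighbouring (x, y) cells end up close together in memory, a Z-order layout can improve locality for 2D neighbourhood access; a linear layout would simply return y * width + x.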
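The sample code in this post uses a Coefficients struct but never shows its declaration. A plausible one, assuming the provider only needs a sequentially laid-out struct with no managed references, might look like this. The BlittableCheck helper is a hypothetical illustration, not part of Brahma, and the unmanaged constraint only approximates "blittable" (it rejects reference-type fields but still admits bool and char).

```csharp
using System;
using System.Runtime.InteropServices;

// Two floats and an int, laid out sequentially with no managed references,
// so each element can be copied to device memory as raw bytes.
[StructLayout(LayoutKind.Sequential)]
public struct Coefficients
{
    public float a;
    public float b;
    public int factor;
}

public static class BlittableCheck
{
    // The 'unmanaged' constraint rejects structs that contain references at
    // compile time; the return value is the per-element size in bytes.
    public static int ElementSize<T>() where T : unmanaged => Marshal.SizeOf<T>();
}
```

A device buffer for n such elements would then need n * 12 bytes.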
And, to whet your appetite, here’s what Brahma.OpenCL code will look like:
// Create a compiled, named query
CompiledQuery query =
    _provider.Compile("q1", (d1, d2) =>
        from val1 in d1
        from val2 in d2
        select new Coefficients
        {
            a = val1.a + val2.a,
            b = 0f,
            // Expression trees can't contain ++, so compute the new value
            factor = val1.factor + 1
        });

// Create a DataParallelArray, providing a
// memory-layout implementation and a memory pool to use
var d1 = new DataParallelArray<Coefficients>(
    _provider, null, length,
    new LinearMemoryLayout(length), Pool.Shared);

// Create a command that takes the array and
// a GetValues lambda, but don't run anything just yet
var transferd1 = Command.Transfer(d1,
    x => new Coefficients
    {
        a = 0f,
        b = 1f,
        factor = 0
    });

// Ditto for d2: create the array ...
var d2 = new DataParallelArray<Coefficients>(
    _provider, null, length,
    new LinearMemoryLayout(length), Pool.Shared);

// ... and create its transfer command
var transferd2 = Command.Transfer(d2,
    x => new Coefficients
    {
        a = 0f,
        b = 1f,
        factor = 0
    });

// Queue 0: add the transfer command for d1
_provider.CommandQueues[0] += transferd1;

// Queue 1: add the transfer for d2,
// then wait (fence) for transferd1
_provider.CommandQueues[1] +=
    new[]
    {
        transferd2,
        Command.Wait(transferd1)
    };

// Begin running commands on queues 0 and 1
ResultSet results = _provider.Run(0, 1);

// Access results from the result set
foreach (Coefficients coeff in results["q1"])
    ...
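The queue setup above, where queue 1 runs its own transfer and then waits on the transfer in queue 0, is a fork-join pattern. As a plain .NET analogy only (this is not Brahma’s command-queue implementation), the same dependency can be expressed with Tasks, where the fence becomes an await on the other queue’s task. The ForkJoinSketch class and the log strings are hypothetical stand-ins for the transfer commands.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ForkJoinSketch
{
    // Runs two "queues" concurrently. Queue 1 does its own work, then waits
    // (the fence) until queue 0's transfer has completed before continuing.
    public static async Task<string[]> RunAsync()
    {
        var log = new ConcurrentQueue<string>();

        // Queue 0: stand-in for the transfer of d1.
        Task transferD1 = Task.Run(() => log.Enqueue("transfer d1"));

        // Queue 1: stand-in for the transfer of d2, followed by a fence on queue 0.
        Task queue1 = Task.Run(async () =>
        {
            log.Enqueue("transfer d2");
            await transferD1;           // fence: join with queue 0
            log.Enqueue("after fence");
        });

        // Join both queues before reading the results.
        await Task.WhenAll(transferD1, queue1);
        return log.ToArray();
    }
}
```

The two transfers may complete in either order, but anything after the fence is guaranteed to run only once both have finished.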
Hope that got you excited. I would love to hear comments and suggestions from everyone!
For those of you who have written to me and not received a reply, please write to me again; these past few months have been terribly hectic as I settle into a new job in a new country.
Awesome. Would this be able to target ANY of the existing or upcoming OpenCL drivers from the usual suspects? CPU implementations included?
Yes, it does. So far, I have been able to test OpenCL context creation with the ATI Stream SDK (beta only; the current version no longer installs on my Intel-based machine) on a CPU device (Intel Core 2 Duo), and on an nVidia 8600M.
Just a thought: maybe you could partner with a basic math library and hide away most of the GPU initialization work for basic math operations? Perhaps use static compiled queries for the common requests (e.g., vector operations, variance).
Finding the standard deviation of a large dataset can be computationally intensive; I’m sure a lot of math libraries could be enhanced with this work =)
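To make the standard-deviation point concrete: variance decomposes naturally into a data-parallel reduction. The sketch below uses PLINQ on the CPU rather than Brahma, so it shows the shape of the computation, not a GPU kernel. Each partition keeps a running count, mean and sum of squared deviations (Welford’s update), and partitions are merged with the pairwise combine formula of Chan et al. The Moments and ParallelStats names are illustrative.

```csharp
using System;
using System.Linq;

// Running statistics for one partition: count, mean and sum of squared deviations (M2).
public readonly struct Moments
{
    public readonly long Count;
    public readonly double Mean;
    public readonly double M2;

    public Moments(long count, double mean, double m2)
    {
        Count = count; Mean = mean; M2 = m2;
    }

    // Welford update: fold one value into a partition's running statistics.
    public Moments Add(double x)
    {
        long n = Count + 1;
        double delta = x - Mean;
        double mean = Mean + delta / n;
        return new Moments(n, mean, M2 + delta * (x - mean));
    }

    // Chan et al. pairwise combine: merge the statistics of two partitions.
    public static Moments Combine(Moments a, Moments b)
    {
        if (a.Count == 0) return b;
        if (b.Count == 0) return a;
        long n = a.Count + b.Count;
        double delta = b.Mean - a.Mean;
        double mean = a.Mean + delta * b.Count / n;
        double m2 = a.M2 + b.M2 + delta * delta * a.Count * b.Count / n;
        return new Moments(n, mean, m2);
    }
}

public static class ParallelStats
{
    // Population standard deviation computed as a parallel reduction.
    public static double StdDev(double[] data)
    {
        if (data.Length == 0) throw new ArgumentException("data must not be empty");
        Moments total = data.AsParallel().Aggregate(
            () => new Moments(0, 0.0, 0.0),     // seed for each partition
            (acc, x) => acc.Add(x),             // fold within a partition
            (a, b) => Moments.Combine(a, b),    // merge partitions
            m => m);                            // final result
        return Math.Sqrt(total.M2 / total.Count);
    }
}
```

The same update/combine split maps onto a GPU reduction: each work-group folds its slice, and the per-group partial results are merged in a second pass.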
Have you considered an already existing OpenCL backend?
This one looks quite capable: http://sourceforge.net/projects/cloo/
When I said backend, I meant the part of Brahma that generates OpenCL kernels from an expression tree (and the associated classes), not the bindings/wrapper classes.
I am, however, also writing my own bindings at this time, because I find that most of the other wrappers do a lot under the hood per call, which may affect performance. My bindings are all single-line calls (while still being friendly to managed types).
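A "single-line" binding typically means a direct P/Invoke declaration with no per-call wrapper logic. As a minimal sketch (not Brahma’s actual bindings), here is clGetPlatformIDs from the OpenCL C API declared that way. The library name "OpenCL" is an assumption: it usually resolves to OpenCL.dll on Windows and libOpenCL.so on Linux, and may need adjusting on Mac OS X or through a Mono dllmap.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ClNative
{
    public const int CL_SUCCESS = 0;

    // C signature: cl_int clGetPlatformIDs(cl_uint num_entries,
    //     cl_platform_id* platforms, cl_uint* num_platforms)
    // cl_int maps to int, cl_uint to uint, the opaque cl_platform_id to IntPtr.
    // Library name "OpenCL" is an assumption; adjust per platform.
    [DllImport("OpenCL")]
    public static extern int clGetPlatformIDs(uint numEntries, IntPtr[] platforms, out uint numPlatforms);

    // Convenience wrapper: asks the OpenCL runtime how many platforms are installed.
    public static uint PlatformCount()
    {
        int err = clGetPlatformIDs(0, null, out uint count);
        if (err != CL_SUCCESS)
            throw new InvalidOperationException("clGetPlatformIDs failed with error " + err);
        return count;
    }
}
```

The extern call itself does no extra work per call beyond marshalling; any checking lives in thin helpers like PlatformCount that callers can bypass.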
I am really impressed by what I have seen with Brahma: type safety, LINQ, etc. Your methodology is elegant and very flexible. I am also excited about your OpenCL provider, as it will add some amazing functionality to Brahma.
I can’t wait until this is ready! Your OpenGL implementation wasn’t that far from what’s needed for OpenCL... you can do it!
Will this work on the Mono framework too, and on Linux and Mac OS X? I am very interested in using OpenCL for heavy calculations.
Yes, it should. I will be unable to test on Mac OS X, though. What I have so far works fine on Mono under Windows.
More Examples
I hope to see more examples of how CUDA or OpenCL code can be ported to Brahma, illustrating the process involved and offering tips to avoid common mistakes.
For example, what would be the challenges in porting the OpenCL volume rendering sample from the NVIDIA SDK to Brahma? I’m very interested to see Brahma’s potential.