Is it feasible to use a GPU to speed up (dynamic) LINQ queries?

c# dynamic-linq gpu linq

Question

I have spent several days searching for solid information on whether LINQ queries can be accelerated using a GPU.

Technologies I have "investigated" so far:

  • Microsoft Accelerator
  • Cudafy
  • Brahma

In short, is it possible at all to filter objects in memory on the GPU?

Let's say we have a list of objects and we want to filter it with something like:

var result = myList.Where(x => x.SomeProperty == SomeValue);

Any pointers on this one?

Thanks in advance!

UPDATE

I'll try to be more specific about what I am trying to achieve :)

The goal is to use whatever technology can filter a list of objects (ranging from ~50,000 to ~2,000,000) as fast as possible.

The operations I perform on the data once the filtering is done (sum, min, max, etc.) use the built-in LINQ methods and are already fast enough for our application, so that's not a problem.

The bottleneck is "simply" the filtering of data.

UPDATE

I just wanted to add that I have tested about 15 databases, including MySQL (looking at a possible cluster approach / memcached solution), H2, HSQLDB, VelocityDB (currently investigating further), SQLite, MongoDB, etc., and NONE is good enough when it comes to the speed of filtering data and/or returning the actual data. (Of course, the NoSQL solutions do not offer filtering the way the SQL ones do, but you get the idea.)

Just to summarize what I/we need:

A database that can sort data with 200 columns and about 250,000 rows in less than 100 ms.

I currently have a solution using parallelized LINQ that (on a specific machine) spends only nanoseconds on each row when filtering AND processing the result!

So we need something like sub-nanosecond filtering per row.
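For readers who want to picture the parallelized LINQ approach described above, here is a minimal sketch. It is not the asker's actual code: the `Row` type, its `Value` column, and the `FilterAndAggregate` helper are hypothetical names introduced for illustration, and only a few of the aggregates are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the real objects reportedly have about 200 columns.
public sealed class Row
{
    public int SomeProperty { get; set; }
    public double? Value { get; set; }
}

public static class ParallelFilterSketch
{
    // Filters rows in parallel with PLINQ, then aggregates the matching rows.
    public static (int Total, int NonNull, double Sum, double Min, double Max)
        FilterAndAggregate(IReadOnlyList<Row> rows, int someValue)
    {
        // AsParallel() lets PLINQ partition the source across worker threads.
        Row[] matches = rows.AsParallel()
                            .Where(r => r.SomeProperty == someValue)
                            .ToArray();

        // Keep only non-null values for the numeric aggregates.
        double[] values = matches.Where(r => r.Value.HasValue)
                                 .Select(r => r.Value.Value)
                                 .ToArray();

        return (matches.Length,
                values.Length,
                values.Sum(),
                values.Length > 0 ? values.Min() : double.NaN,
                values.Length > 0 ? values.Max() : double.NaN);
    }

    public static void Main()
    {
        // Synthetic data, sized like the row count quoted in the question.
        var rows = Enumerable.Range(0, 225_639)
                             .Select(i => new Row
                             {
                                 SomeProperty = i % 10,
                                 Value = i % 3 == 0 ? (double?)null : i
                             })
                             .ToList();

        var result = FilterAndAggregate(rows, 3);
        Console.WriteLine($"total={result.Total}, nonNull={result.NonNull}, sum={result.Sum}");
    }
}
```

Whether `AsParallel()` actually helps depends on the machine and on how cheap the predicate is, so it is worth timing against plain `Where` on real data.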

  1. Why does it seem that only in-memory LINQ is able to provide this?
  2. Why would this be impossible?

Some figures from the logfile:

Total tid för 1164 frågor: 2579

This is Swedish and translates:

Total time for 1164 queries: 2579

Where the queries in this case are queries like:

WHERE SomeProperty = SomeValue

And those queries are all being run in parallel on 225639 rows.

So, 225639 rows are being filtered in memory 1164 times in about 2.5 seconds.

That's about 9.5e-9 seconds (roughly 9.5 ns) per row, BUT that also includes the actual processing of the numbers! We compute Count (not null), total count, Sum, Min, Max, Avg, and Median, so that is 7 operations on the filtered rows.
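The per-row figure can be reproduced from the log numbers. This is a short worked calculation, assuming the logged 2579 is in milliseconds (the log line itself gives no unit) and that each of the 1164 queries scans all 225639 rows:

```latex
\begin{align*}
N_{\text{row visits}} &= 1164 \times 225639 = 262{,}643{,}796 \\
t_{\text{per row}} &\approx \frac{2.5\ \text{s}}{262{,}643{,}796} \approx 9.52 \times 10^{-9}\ \text{s}
  \quad \text{(using the rounded 2.5 s)} \\
t_{\text{per row}} &\approx \frac{2.579\ \text{s}}{262{,}643{,}796} \approx 9.82 \times 10^{-9}\ \text{s}
  \quad \text{(using the logged 2579 ms)}
\end{align*}
```

Either way the result is on the order of 10 ns per row per query. Because the queries run in parallel, this is wall-clock time per row visit rather than the cost on a single core.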

So we could say it's actually 7 times faster than the databases we've tried, since we do NOT do any aggregation in those cases!

So, in conclusion, why are the databases so poor at filtering data compared to in-memory LINQ filtering? Has Microsoft really done such a good job that it is impossible to compete with it? :)

It makes sense that in-memory filtering should be faster, but I don't want just a sense that it is faster. I want to know what is faster and, if possible, why.

2/17/2012 11:00:16 AM

Popular Answer

I will answer definitively about Brahma since it's my library, but the same probably applies to the other approaches as well. The GPU has no knowledge of objects. Its memory is also almost entirely separate from CPU memory.

If you do have a LARGE set of objects and want to operate on them, you can only pack the data you want to operate on into a buffer suitable for the GPU/API you're using and send it off to be processed.

Note that this will make two round trips over the CPU-GPU memory interface, so if you aren't doing enough work on the GPU to make it worthwhile, you'll be slower than if you simply used the CPU in the first place (like the sample above).
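To make the pack, process, and copy-back steps concrete, here is a minimal CPU-only sketch. It does not call Brahma or any other GPU API: `Item`, `Pack`, `ComputeMask`, and `Unpack` are hypothetical names, and `ComputeMask` is only a stand-in for the work a GPU kernel would do on the uploaded buffer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical object type standing in for the question's list elements.
public sealed class Item
{
    public int SomeProperty { get; set; }
}

public static class PackingSketch
{
    // Step 1: copy only the field being filtered on into a flat, contiguous array.
    // A buffer like this is what would be sent to the GPU (first transfer).
    public static int[] Pack(IReadOnlyList<Item> items)
    {
        var buffer = new int[items.Count];
        for (int i = 0; i < items.Count; i++)
            buffer[i] = items[i].SomeProperty;
        return buffer;
    }

    // Step 2: CPU stand-in for the GPU kernel, producing one match flag per element.
    public static byte[] ComputeMask(int[] buffer, int someValue)
    {
        var mask = new byte[buffer.Length];
        for (int i = 0; i < buffer.Length; i++)
            mask[i] = buffer[i] == someValue ? (byte)1 : (byte)0;
        return mask;
    }

    // Step 3: once the mask is back in CPU memory (second transfer),
    // map the set flags back to the original objects.
    public static List<Item> Unpack(IReadOnlyList<Item> items, byte[] mask)
    {
        var result = new List<Item>();
        for (int i = 0; i < mask.Length; i++)
            if (mask[i] != 0) result.Add(items[i]);
        return result;
    }

    public static void Main()
    {
        var items = Enumerable.Range(0, 1000)
                              .Select(i => new Item { SomeProperty = i % 7 })
                              .ToList();

        int[] buffer = Pack(items);
        byte[] mask = ComputeMask(buffer, 3);
        List<Item> matches = Unpack(items, mask);
        Console.WriteLine(matches.Count);
    }
}
```

The per-element work in step 2 is a single comparison, while steps 1 and 3 each touch every element as well. This is why a simple equality filter is unlikely to repay the two transfers.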

Hope this helps.

2/15/2012 5:49:58 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow