Thread: Parallel Execution of Query

Parallel Execution of Query

From

Shmagi Kavtaradze

Date:

30 November 2015, 20:00:54

I am doing similarity matching (cosine, euclidean). If I have 4000 entries in a table, the number of comparisons will be 16M. I am running postgres on a virtual machine, so it takes 20-25 minutes to run the query or the system crashes. Can I run the query in parallel? I heard there are tools like PL/Proxy and pgpool, can I use them to create several databases on the same machine and run query in parallel?

Re: Parallel Execution of Query

From

James Keener

Date:

30 November 2015, 23:56:49

While parallel queries is an area of ongoing work, I do not believe that
a single connection can currently use more than a single core.  You
could experiment with turning your 4000x4000 comparisons into 4 queries
doing 2000x2000 comparison (assuming you have 4 cores available to the
VM, the more cores, the more cores you could split it up) (just remember
to use async for each of those so you don't execute them in serial!).

Jim

On 11/30/2015 12:00 PM, Shmagi Kavtaradze wrote:
> I am doing similarity matching (cosine, euclidean). If I have 4000
> entries in a table, the number of comparisons will be 16M. I am running
> postgres on a virtual machine, so it takes 20-25 minutes to run the
> query or the system crashes. Can I run the query in parallel? I heard
> there are tools like PL/Proxy and pgpool, can I use them to create
> several databases on the same machine and run query in parallel?

Attachment

signature.asc

Re: Parallel Execution of Query

From

Shmagi Kavtaradze

Date:

01 December 2015, 00:10:10

Thanks for answer. What about async? can you tell in details?

On Mon, Nov 30, 2015 at 9:56 PM, James Keener <jim@jimkeener.com> wrote:

While parallel queries is an area of ongoing work, I do not believe that
a single connection can currently use more than a single core. You
could experiment with turning your 4000x4000 comparisons into 4 queries
doing 2000x2000 comparison (assuming you have 4 cores available to the
VM, the more cores, the more cores you could split it up) (just remember
to use async for each of those so you don't execute them in serial!).

Jim

On 11/30/2015 12:00 PM, Shmagi Kavtaradze wrote:
> I am doing similarity matching (cosine, euclidean). If I have 4000
> entries in a table, the number of comparisons will be 16M. I am running
> postgres on a virtual machine, so it takes 20-25 minutes to run the
> query or the system crashes. Can I run the query in parallel? I heard
> there are tools like PL/Proxy and pgpool, can I use them to create
> several databases on the same machine and run query in parallel?

Re: Parallel Execution of Query

From

James Keener

Date:

01 December 2015, 00:19:55

async is just an ability some client libraries offer/expose which allow
you to resume execution of your application before results are returned
from the database. This is similar to async HTTP requests, "promises"
(common in modern JavaScript), or async/await (as offered in some
languages) if you're familiar with those.

It doesn't change the 1-core per query limitation on the server.

Jim

Attachment

signature.asc

Re: Parallel Execution of Query

From

tim.child@comcast.net

Date:

01 December 2015, 00:39:20

Shmagi,

First, I would explore creating multiple VM's and trying to run the query in parallel VM's. If you can easily clone your VMs, try creating two VM and running 1/2 the query on each VM.

Then try 4 VM's, then 8 and so on.

An complex approach for a single VM is to write C UDF (User Defined Function). The UDF should do the following

1) Take a select query a the input argument

2) Run the query and store the results in a C collection (a list or array of C structs)

3) Loop over the C collection N by N times computing the similarity matching (cosine, euclidean)

4) Output the result as a set of rows

This is a non-trivial approach, as it requires deep knowledge of PostgreSQL C functions. But it could

speed up calculations like this by orders of magnitude

Regards

Tim

From: "Shmagi Kavtaradze" <kavtaradze.s@gmail.com>
To: pgsql-novice@postgresql.org
Sent: Monday, November 30, 2015 9:00:40 AM
Subject: [NOVICE] Parallel Execution of Query