Thread: Parallel Execution of Query

Parallel Execution of Query

From
Shmagi Kavtaradze
Date:
I am doing similarity matching (cosine, euclidean). If I have 4000 entries in a table, the number of comparisons will be 16M. I am running postgres on a virtual machine, so it takes 20-25 minutes to run the query or the system crashes. Can I run the query in parallel? I heard there are tools like PL/Proxy and pgpool, can I use them to create several databases on the same machine and run query in parallel?

Re: Parallel Execution of Query

From
James Keener
Date:
While parallel queries is an area of ongoing work, I do not believe that
a single connection can currently use more than a single core.  You
could experiment with turning your 4000x4000 comparisons into 4 queries
doing 2000x2000 comparison (assuming you have 4 cores available to the
VM, the more cores, the more cores you could split it up) (just remember
to use async for each of those so you don't execute them in serial!).

Jim

On 11/30/2015 12:00 PM, Shmagi Kavtaradze wrote:
> I am doing similarity matching (cosine, euclidean). If I have 4000
> entries in a table, the number of comparisons will be 16M. I am running
> postgres on a virtual machine, so it takes 20-25 minutes to run the
> query or the system crashes. Can I run the query in parallel? I heard
> there are tools like PL/Proxy and pgpool, can I use them to create
> several databases on the same machine and run query in parallel?


Attachment

Re: Parallel Execution of Query

From
Shmagi Kavtaradze
Date:
Thanks for answer. What about async? can you tell in details?

On Mon, Nov 30, 2015 at 9:56 PM, James Keener <jim@jimkeener.com> wrote:
While parallel queries is an area of ongoing work, I do not believe that
a single connection can currently use more than a single core.  You
could experiment with turning your 4000x4000 comparisons into 4 queries
doing 2000x2000 comparison (assuming you have 4 cores available to the
VM, the more cores, the more cores you could split it up) (just remember
to use async for each of those so you don't execute them in serial!).

Jim

On 11/30/2015 12:00 PM, Shmagi Kavtaradze wrote:
> I am doing similarity matching (cosine, euclidean). If I have 4000
> entries in a table, the number of comparisons will be 16M. I am running
> postgres on a virtual machine, so it takes 20-25 minutes to run the
> query or the system crashes. Can I run the query in parallel? I heard
> there are tools like PL/Proxy and pgpool, can I use them to create
> several databases on the same machine and run query in parallel?


Re: Parallel Execution of Query

From
James Keener
Date:
async is just an ability some client libraries offer/expose which allow
you to resume execution of your application before results are returned
from the database. This is similar to async HTTP requests, "promises"
(common in modern JavaScript), or async/await (as offered in some
languages) if you're familiar with those.

It doesn't change the 1-core per query limitation on the server.

Jim


Attachment

Re: Parallel Execution of Query

From
tim.child@comcast.net
Date:
Shmagi,

First, I would explore creating multiple VM's and trying to run the query in parallel VM's. If you can easily clone your VMs,  try creating two VM and running 1/2  the query on each VM. 
Then try 4 VM's, then 8 and so on.

An complex approach for a single VM is to write C UDF (User Defined Function). The UDF  should do the following

1) Take a select  query a the input argument
2) Run the query and store the results in a C collection (a list or array  of C structs)
3) Loop over the C  collection N  by  N times   computing the  similarity matching (cosine, euclidean)
4) Output the result as a set of rows

This is a non-trivial approach, as it requires deep knowledge of  PostgreSQL C functions. But it  could
speed up calculations like this by orders of magnitude

Regards

Tim





From: "Shmagi Kavtaradze" <kavtaradze.s@gmail.com>
To: pgsql-novice@postgresql.org
Sent: Monday, November 30, 2015 9:00:40 AM
Subject: [NOVICE] Parallel Execution of Query

I am doing similarity matching (cosine, euclidean). If I have 4000 entries in a table, the number of comparisons will be 16M. I am running postgres on a virtual machine, so it takes 20-25 minutes to run the query or the system crashes. Can I run the query in parallel? I heard there are tools like PL/Proxy and pgpool, can I use them to create several databases on the same machine and run query in parallel?