Re: Parallel Execution of Query - Mailing list pgsql-novice

From	tim.child@comcast.net
Subject	Re: Parallel Execution of Query
Date	December 1, 2015 00:39:20
Msg-id	88192495.4184600.1448919555445.JavaMail.zimbra@comcast.net
In response to	Parallel Execution of Query (Shmagi Kavtaradze <kavtaradze.s@gmail.com>)
List	pgsql-novice

Shmagi,

First, I would explore creating multiple VMs and running the query in parallel across them. If you can easily clone your VMs, try creating two VMs and running half of the query on each one.

Then try four VMs, then eight, and so on.

A more complex approach for a single VM is to write a C UDF (User Defined Function). The UDF should do the following:

1) Take a SELECT query as the input argument

2) Run the query and store the results in a C collection (a list or array of C structs)

3) Loop over the C collection in an N-by-N nested loop, computing the similarity (cosine, Euclidean) for each pair

4) Output the result as a set of rows

This is a non-trivial approach, as it requires deep knowledge of PostgreSQL C functions, but it could speed up calculations like this by orders of magnitude.

Regards

Tim

From: "Shmagi Kavtaradze" <kavtaradze.s@gmail.com>
To: pgsql-novice@postgresql.org
Sent: Monday, November 30, 2015 9:00:40 AM
Subject: [NOVICE] Parallel Execution of Query

I am doing similarity matching (cosine, Euclidean). If I have 4000 entries in a table, the number of comparisons will be 16M. I am running Postgres on a virtual machine, and the query takes 20-25 minutes to run or the system crashes. Can I run the query in parallel? I have heard there are tools like PL/Proxy and pgpool; can I use them to create several databases on the same machine and run the query in parallel?
