Thread: getting the most out of multi-core systems for repeated complex SELECT statements

Subject: getting the most out of multi-core systems for repeated complex SELECT statements
From: Mark Stosberg
Date: Thu, 3 Feb 2011
Each night we run over 100,000 "saved searches" against PostgreSQL
9.0.x. These are all complex SELECTs using "cube" functions to perform a
geo-spatial search to help people find adoptable pets at shelters.
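
For context, here's a simplified sketch of the shape of one of these
searches. The real queries are more complex, and the pets/lat/lon names
below are stand-ins, not our real schema; this is just the common
cube + earthdistance pattern of filtering with an indexable earth_box()
bounding volume and ordering by earth_distance():

    # Simplified, illustrative sketch of one nightly "saved search".
    # earth_box() builds a bounding cube that a GiST index can filter
    # with @>; earth_distance() gives the true distance in meters.
    # (The box is slightly larger than the circle, so a strict radius
    # search would re-check earth_distance in the WHERE clause.)
    import psycopg2

    conn = psycopg2.connect("dbname=pets")
    cur = conn.cursor()
    cur.execute("""
        SELECT p.pet_id,
               earth_distance(ll_to_earth(%(lat)s, %(lon)s),
                              ll_to_earth(p.lat, p.lon)) AS meters
        FROM   pets p
        WHERE  earth_box(ll_to_earth(%(lat)s, %(lon)s), %(radius_m)s)
               @> ll_to_earth(p.lat, p.lon)
        ORDER  BY meters
    """, {"lat": 39.77, "lon": -86.15, "radius_m": 80000})
    for pet_id, meters in cur.fetchall():
        print(pet_id, meters)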

All of our machines, in development and in production, have at least 2 cores
in them, and I'm wondering about the best way to maximally engage all
the processors.

Right now we simply run the searches serially. I realize PostgreSQL may be
taking some advantage of the multiple cores in this arrangement, but I'm
seeking advice about the possibility and methods for running the
searches in parallel.

One naive approach I considered was to use parallel cron scripts. One
would run the "odd" searches and the other would run the "even"
searches. This would be easy to implement, but perhaps there is a better
way.  To those who have covered this area already, what's the best way
to put multiple cores to use when running repeated SELECTs with PostgreSQL?

Thanks!

    Mark

On 2/3/2011 9:08 AM, Mark Stosberg wrote:
>
> Each night we run over 100,000 "saved searches" against PostgreSQL
> 9.0.x. These are all complex SELECTs using "cube" functions to perform a
> geo-spatial search to help people find adoptable pets at shelters.
> [...]
> To those who have covered this area already, what's the best way
> to put multiple cores to use when running repeated SELECTs with PostgreSQL?
>

1) I'm assuming this is all server-side processing.
2) One database connection will use one core.  To use multiple cores you
need multiple database connections.
3) If your jobs are I/O bound, then running multiple jobs may hurt
performance.

Your naive approach is the best.  Just spawn off two jobs (or three, or
whatever).  I think it's also the only method.  (If there is another
method, I don't know what it would be.)
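
Something like this minimal sketch, for example (untested; it assumes
the saved searches live in a table like saved_searches with an integer
id, and run_one_search() stands in for your real per-search SELECT).
Each worker opens its own connection, so each gets its own backend
process and core; the id % N split is your odd/even idea generalized
to N jobs:

    # Run saved searches across N connections (roughly one per core).
    # saved_searches and run_one_search() are illustrative names only.
    import multiprocessing

    import psycopg2

    NWORKERS = 2

    def run_one_search(conn, search_id):
        # Stand-in for the real complex SELECT for one saved search.
        cur = conn.cursor()
        cur.execute("SELECT %s", (search_id,))
        cur.fetchall()

    def run_slice(slot):
        # One connection per worker: one backend process, one core.
        conn = psycopg2.connect("dbname=pets")
        cur = conn.cursor()
        # %% is a literal % (modulo) under psycopg2's parameter style.
        cur.execute(
            "SELECT id FROM saved_searches WHERE id %% %s = %s",
            (NWORKERS, slot))
        for (search_id,) in cur.fetchall():
            run_one_search(conn, search_id)
        conn.close()

    if __name__ == "__main__":
        jobs = [multiprocessing.Process(target=run_slice, args=(slot,))
                for slot in range(NWORKERS)]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()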

-Andy

Mark,

you could try the gevel module to get the structure of the GiST index and
check whether items are distributed more or less homogeneously (see the
different levels). You can visualize the index like
http://www.sai.msu.su/~megera/wiki/Rtree_Index
Also, if your searches are neighbourhood searches, then you could try KNN,
available in the 9.1 development version.
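
For example (this assumes gevel is installed; the index name is made up):

    # Inspect the structure of a GiST index with the gevel module.
    # gist_stat() reports levels, tuple counts, and page occupancy,
    # which shows whether the tree is more or less balanced.
    import psycopg2

    conn = psycopg2.connect("dbname=pets")
    cur = conn.cursor()
    cur.execute("SELECT gist_stat('pets_coords_gist_idx')")
    print(cur.fetchone()[0])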


Oleg

On Thu, 3 Feb 2011, Mark Stosberg wrote:

>
> Each night we run over 100,000 "saved searches" against PostgreSQL
> 9.0.x. These are all complex SELECTs using "cube" functions to perform a
> geo-spatial search to help people find adoptable pets at shelters.
> [...]
> To those who have covered this area already, what's the best way
> to put multiple cores to use when running repeated SELECTs with PostgreSQL?
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

On 02/03/2011 10:54 AM, Oleg Bartunov wrote:
> Mark,
>
> you could try the gevel module to get the structure of the GiST index
> and check whether items are distributed more or less homogeneously (see
> the different levels). You can visualize the index like
> http://www.sai.msu.su/~megera/wiki/Rtree_Index
> Also, if your searches are neighbourhood searches, then you could try
> KNN, available in the 9.1 development version.

Oleg,

Those are interesting details to consider. I read more about KNN here:

http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

Will I be able to use it to improve the performance of finding nearby
zipcodes? It sounds like KNN has great potential for performance
improvements!
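
For example, would a query shaped like this use the index to find the
nearest matches? (The zipcodes table and its point column here are
hypothetical.)

    # Hypothetical 9.1 KNN query: 10 zipcodes nearest to a point,
    # ordered by the point <-> point distance operator, which a GiST
    # index on the point column can satisfy. Names are made up.
    import psycopg2

    conn = psycopg2.connect("dbname=pets")
    cur = conn.cursor()
    cur.execute("""
        SELECT zip
        FROM   zipcodes
        ORDER  BY center <-> point(%s, %s)
        LIMIT  10
    """, (-86.15, 39.77))
    print([row[0] for row in cur.fetchall()])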

   Mark