Thread: Get Postgres to use multiple proc cores?
With my hard drive issues from an earlier email solved (thanks all) I am now running queries against my 65M rows. I noticedin atop that postmaster is using 100% of just one processor core. Is there any way to tell it to use both (or at leastsome of the other one). System is RHEL5, PG is version 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)4.1.2 20080704 (Red Hat 4.1.2-46). Intel Core 2 Duo, 4 GB RAM. The query in question has been running for 15 minutes, and the only thing atop is reporting in the "red" is the processor.Although, just now the proc has died down and the drive is apparently now "red" (the main delay point). Here isthe query (any help on improving it?) ------------------------------ | a | b | --------------|--------------- |guid vc(32) | guid vc(32) | |unitid vc(15)| unitid vc(15)| |ftime ts | | |source vc(10)| | ------------------------------ |65,000,000 rw| 500,000 rows | ------------------------------ SELECT DISTINCT guid FROM b WHERE guid NOT IN (SELECT DISTINCT guid FROM a) AND unitid IN (SELECT DISTINCT unitid FROM a) Eventually, I want to project b.unitid and group by it, with a count(b.guid).
On Fri, Nov 5, 2010 at 8:59 AM, Jonathan Hoover <jhoover@yahoo-inc.com> wrote: > With my hard drive issues from an earlier email solved (thanks all) I am now running queries against my 65M rows. I noticedin atop that postmaster is using 100% of just one processor core. Is there any way to tell it to use both (or at leastsome of the other one). System is RHEL5, PG is version 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)4.1.2 20080704 (Red Hat 4.1.2-46). Intel Core 2 Duo, 4 GB RAM. > > The query in question has been running for 15 minutes, and the only thing atop is reporting in the "red" is the processor.Although, just now the proc has died down and the drive is apparently now "red" (the main delay point). Here isthe query (any help on improving it?) A group by is almost always faster than a distinct. And a single query uses a single core, sorry.
Scott, I'd like to take this chance to ask another (related though) question. What is faster? A JOIN or a WHERE a IN (SELECT ... ) I've heard that the nested subquery has a tendency to be slower, but I'd like to check it with people from the list. Is one faster than the other? Or, all in all, they run at the same speed ? Best, Oliveiros ----- Original Message ----- From: "Scott Marlowe" <scott.marlowe@gmail.com> To: "Jonathan Hoover" <jhoover@yahoo-inc.com> Cc: <pgsql-admin@postgresql.org> Sent: Friday, November 05, 2010 3:17 PM Subject: Re: [ADMIN] Get Postgres to use multiple proc cores? On Fri, Nov 5, 2010 at 8:59 AM, Jonathan Hoover <jhoover@yahoo-inc.com> wrote: > With my hard drive issues from an earlier email solved (thanks all) I am > now running queries against my 65M rows. I noticed in atop that postmaster > is using 100% of just one processor core. Is there any way to tell it to > use both (or at least some of the other one). System is RHEL5, PG is > version 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 > 20080704 (Red Hat 4.1.2-46). Intel Core 2 Duo, 4 GB RAM. > > The query in question has been running for 15 minutes, and the only thing > atop is reporting in the "red" is the processor. Although, just now the > proc has died down and the drive is apparently now "red" (the main delay > point). Here is the query (any help on improving it?) A group by is almost always faster than a distinct. And a single query uses a single core, sorry. -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
On Fri, Nov 5, 2010 at 9:29 AM, Oliveiros d'Azevedo Cristina <oliveiros.cristina@marktest.pt> wrote: > Scott, > > I'd like to take this chance to ask another (related though) question. > > What is faster? A JOIN or a WHERE a IN (SELECT ... ) > > I've heard that the nested subquery has a tendency to be slower, but I'd > like to check it with people from the list. > Is one faster than the other? Or, all in all, they run at the same speed ? Usually the join is faster. But there's history to consider. Back in the olden days, it was almost always faster to use a join than a where in. Then where in got a lot of work, and it's a lot faster. So, it depends a bit on which version your running and size of the data set in the in. Most of the time it's pretty close, but for larger numbers of rows in the subselect, the join is usually faster. Of course the only way to know for sure is to test your query and see. Make sure to run it multiple times to overcome differences in caching.
"Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> wrote: > What is faster? A JOIN or a WHERE a IN (SELECT ... ) On 8.4 and later an EXISTS or NOT EXISTS will use semi-join or anti-join (respectively). These should usually be much faster than the IN (SELECT DISTINCT ... ) technique. Be careful of the difference in semantics between NOT EXISTS and NOT IN if any of the columns involved in the matching can be null -- they are *not* the same thing. -Kevin