
From Tim Uckun
Subject Trying to figure out why these queries are so slow
Date
Msg-id AANLkTim-BRRttUjWBDQ1_3gXztp_8RhYkhK7HBbmL6wp@mail.gmail.com
List pgsql-general
I have two tables. Table C has about 300K records in it. Table E has
about a million records in it. Today I tried to run this query:

update C
   set result_count = X.result_count
  from C
 inner join (select c_id, count(c_id) as result_count
               from E
              where c_id is not null
              group by c_id) as X
    on C.id = X.c_id;

All the fields mentioned are indexed. In the case of table C it's the
primary key; in the case of table E it's just a non-unique index.
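
For reference, the relevant parts of the schema look roughly like this
(a sketch; the column types and the index name are my shorthand, not
the actual DDL):

    create table C (
        id           integer primary key,
        result_count integer
        -- other columns omitted
    );

    create table E (
        c_id integer   -- refers to C.id, nullable
        -- other columns omitted
    );

    create index e_c_id_idx on E (c_id);   -- non-unique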

I let this query run for about three hours before I cancelled it.

Next, thinking the subquery was the problem, I did a CREATE TABLE of
the subquery: I extracted the results into a separate table and made
the c_id field the primary key of that table. That table only had
about 80 thousand records, and creating it took just a couple of
minutes. Perfectly reasonable.
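
Roughly this, with a made-up name for the new table:

    create table c_counts as
      select c_id, count(c_id) as result_count
        from E
       where c_id is not null
       group by c_id;

    alter table c_counts add primary key (c_id);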

I tried to run the same thing again, but this time joining the newly
created table instead of the subquery. I killed that query after an
hour.

Then I tried to be clever about it, so I added

    where id in (select id from C where result_count = 0 limit 10)

to the where clause. I figured this would run fast because it's only
updating ten records. The result_count field is indexed, so that part
should run very quickly, and the WHERE id IN should be quick because
it's looking up ten items in the primary key, right? Well, that didn't
go very well either. I killed that query after about 15 minutes.

In the end (once again) I wrote a Ruby script to pull the records from
table E in small chunks, pull the corresponding records from C, and
update them myself. In other words, I wrote a Ruby script to do the
join and the update in chunks.
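
Conceptually, each step of the script boiled down to statements of
this shape (a sketch; the counts were computed ahead of time, and the
values here are illustrative):

    prepare set_count(int, int) as
      update C set result_count = $1 where id = $2;

    execute set_count(42, 1001);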

I am simply baffled. What am I doing wrong? Is the database not tuned?
Do I not have enough RAM? Is there some parameter I need to tweak
someplace?

There was nothing else going on in the database when this query was
running. I shut off all applications touching the database.

The database machine is a Linode host. It's got two gigs of RAM. I
realize that this is not the ideal host for a database, but it's what
I have to work with.

kernel.shmmax = 536870912
kernel.shmall = 2097152
shared_buffers = 128MB
effective_cache_size = 256MB
log_min_duration_statement = 1000
max_connections = 100

This was a case of doing an update on about 80,000 records using an
inner join between two tables on the primary keys of both. Why would
this query take more than a few seconds to run?
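
In my head the work should be equivalent to something like this sketch
(the target table referenced only once, joined to the aggregate on its
primary key):

    update C
       set result_count = X.result_count
      from (select c_id, count(c_id) as result_count
              from E
             where c_id is not null
             group by c_id) as X
     where C.id = X.c_id;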
