Thread: Trying to figure out why these queries are so slow

Trying to figure out why these queries are so slow

From
Tim Uckun
Date:
I have two tables. Table C has about 300K records in it. Table E has
about a million records in it.  Today I tried to run this query.

update C
             set result_count = X.result_count
            from C
            inner join (select  c_id, count(c_id) as result_count
                          from E
                          where c_id is not null
                          group by c_id) as  X
            on C.id = X.c_id

All the fields mentioned are indexed. In the case of Table C it's the
primary key. In the case table E it's just an index (non unique).

I let this query run for about three hours before I cancelled it.

Next I did a CREATE TABLE of the sub query thinking the subquery was
the problem I extracted the results into a separate table and in that
table made the c_id field the primary key. That table only had about
80 thousand records. The creation of the table took just a couple of
minutes. Perfectly reasonable.

I tried to run the same thing again but this time joining the newly
created table instead of the subquery. I killed that query after an
hour.

Then I tried to be clever about it so I added a

Where id in (select id from C  where result_count =0 limit 10) into
the where clause. I figure this would run fast because it's only
updating ten records. The result_count field is indexed so that part
should run very quick and the where ID IN should be quick because it's
looking up ten items in the PKEY right?  Well that didn't go very well
either. I killed that query after about 15 minutes.

In the end (once again) I wrote a ruby script to pull up the records
from table E in small chunks and pull the corresponding records
records from C in small chunks and update them myself. In other words
I wrote a ruby script to do the join and the update in chunks.

I am simply baffled.  What am I doing wrong.   Is the database not
tuned? Do I not have enough RAM? Is there some parameter I need to
tweak someplace?

There was nothing else going on in the database when this query was
running. I shut off all applications touching the database.

The database machine is a linode host. It's got two gigs of RAM  I
realize that this is not the ideal host for a database but it's what I
have to work with.

kernel.shmmax=536870912 , kernel.shmall=2097152,  shared_buffers =
128MB effective_cache_size = 256MB log_min_duration_statement = 1000
max_connections = 100

This was a case of doing an update on about 80,000 records using an
inner join on two tables on using the primary keys of both records.
Why would this query take more than a few seconds to run?

Re: Trying to figure out why these queries are so slow

From
Tim Uckun
Date:
On Tue, Oct 5, 2010 at 10:33 PM, Thomas Guettler <hv@tbz-pariv.de> wrote:
> Hi,
>
> just a guess: Counting is slow, since it needs to check all rows. Explained here:
>

Actually counting was pretty fast. As I said when I created a table
using the subquery that went pretty fast. The update was the slow
part.

Re: Trying to figure out why these queries are so slow

From
Thomas Guettler
Date:
Hi,

just a guess: Counting is slow, since it needs to check all rows. Explained here:

http://wiki.postgresql.org/wiki/Slow_Counting

  Thomas Güttler

Tim Uckun wrote:
> I have two tables. Table C has about 300K records in it. Table E has
> about a million records in it.  Today I tried to run this query.
>
> update C
>              set result_count = X.result_count
>             from C
>             inner join (select  c_id, count(c_id) as result_count
>                           from E
>                           where c_id is not null
>                           group by c_id) as  X
>             on C.id = X.c_id
>
> All the fields mentioned are indexed. In the case of Table C it's the
> primary key. In the case table E it's just an index (non unique).
>
> I let this query run for about three hours before I cancelled it.
> ...

--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de

Re: Trying to figure out why these queries are so slow

From
Thom Brown
Date:
On 5 October 2010 10:20, Tim Uckun <timuckun@gmail.com> wrote:
> The database machine is a linode host.

Good choice ;)

Obvious question, but I'll ask it: Have you run ANALYZE or VACUUM
ANALZYE on the database?  Also being able to see the query plan would
help.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: Trying to figure out why these queries are so slow

From
Tim Uckun
Date:
On Tue, Oct 5, 2010 at 11:38 PM, Thom Brown <thom@linux.com> wrote:
> On 5 October 2010 10:20, Tim Uckun <timuckun@gmail.com> wrote:
>> The database machine is a linode host.
>
> Good choice ;)
>
> Obvious question, but I'll ask it: Have you run ANALYZE or VACUUM
> ANALZYE on the database?  Also being able to see the query plan would
> help.
>


I did run vacuum analzye on the table I created. The other table I
presume is auto vacuumed since that is on.

I can get you the plan but I don't think it would help because the
result table is now blank.

Re: Trying to figure out why these queries are so slow

From
Thom Brown
Date:
On 5 October 2010 12:06, Tim Uckun <timuckun@gmail.com> wrote:
> On Tue, Oct 5, 2010 at 11:38 PM, Thom Brown <thom@linux.com> wrote:
>> On 5 October 2010 10:20, Tim Uckun <timuckun@gmail.com> wrote:
>>> The database machine is a linode host.
>>
>> Good choice ;)
>>
>> Obvious question, but I'll ask it: Have you run ANALYZE or VACUUM
>> ANALZYE on the database?  Also being able to see the query plan would
>> help.
>>
>
> I did run vacuum analzye on the table I created. The other table I
> presume is auto vacuumed since that is on.

I'd still recommend manually running ANALYZE on any involved tables,
just in case.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: Trying to figure out why these queries are so slow

From
Tim Uckun
Date:
>
> I'd still recommend manually running ANALYZE on any involved tables,
> just in case.
>


Just out of curiosity....

Say I did run it. Would it make the query run a few orders of magnitude faster?

If you read my original post you'd see that I let the query run for
over an  hour before I killed it.  It seems to me that a query joining
two tables on primary keys, updating one field on one table with
another field on another table on less than 100K records ought to take
less than three minutes.

Do the database statistics go that far out of whack especially if
autovacuum is on?

Re: Trying to figure out why these queries are so slow

From
John R Pierce
Date:
  On 10/05/10 2:20 AM, Tim Uckun wrote:
> The database machine is a linode host. It's got two gigs of RAM  I
> realize that this is not the ideal host for a database but it's what I
> have to work with.

whats the storage like on linode?  Is that some sort of virtualized
cloudy thing?   wild guess says committed random writes are really slow.



Re: Trying to figure out why these queries are so slow

From
Tim Uckun
Date:
> whats the storage like on linode?  Is that some sort of virtualized cloudy
> thing?   wild guess says committed random writes are really slow.


Yes it's some sort of a virtual cloudy thing.  I can't tell you more
than that though. They sell virtual machines and this project runs on
them.

Does anybody else have any suggestions for how I can make this
database behave a little better? Is there some tweak I am missing?