Thread: long-running query - needs tuning

long-running query - needs tuning

From: Kevin Kempter
Hi List,

I've just started working with a new client, and among other issues with
their databases they have a particular update that essentially locks out
users.

The query below had been running for over 6 hours this morning, and the CPU
load had climbed to the point where new connections simply hung waiting to
connect. We had to kill the query so that business users could get back into
the applications that use the database, which is why I could not post an
EXPLAIN ANALYZE.
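
In case it's useful, here is a minimal sketch of the kind of statement used
to find and cancel such a backend. It assumes a pre-9.2 server, where
pg_stat_activity exposes procpid and current_query (newer releases use pid
and query instead), and the LIKE pattern is only illustrative:

  -- Find the backend running the offending statement.
  select procpid, current_query
    from pg_stat_activity
   where current_query like 'update dat_customer_mailbox_counts%';

  -- Ask that backend to cancel its current statement.
  select pg_cancel_backend(procpid)
    from pg_stat_activity
   where current_query like 'update dat_customer_mailbox_counts%';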

In any case the query looks like this:

update dat_customer_mailbox_counts
   set total_contacts      = contacts.ct,
       total_contact_users = contacts.dct
  from ( select customer_id,
                count(*) as ct,
                count(distinct con.user_id) as dct
           from dat_user_contacts con
          group by customer_id ) contacts
 where contacts.customer_id = dat_customer_mailbox_counts.customer_id;

Here are the latest row counts from the system catalogs:

dat_customer_mailbox_counts:           423
dat_user_contacts:              59,469,476
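
(These figures are planner estimates rather than exact counts; a minimal
sketch of pulling them from pg_class, where reltuples is refreshed by
VACUUM/ANALYZE:)

  -- Approximate row counts; reltuples is an estimate, not an exact count.
  select relname, reltuples::bigint as approx_rows
    from pg_class
   where relname in ('dat_customer_mailbox_counts', 'dat_user_contacts');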



And here's an explain plan:

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Merge Join  (cost=17118858.51..17727442.30 rows=155 width=90)
   Merge Cond: ("outer".customer_id = "inner".customer_id)
   ->  GroupAggregate  (cost=17118772.93..17727347.34 rows=155 width=8)
         ->  Sort  (cost=17118772.93..17270915.95 rows=60857208 width=8)
               Sort Key: con.customer_id
               ->  Seq Scan on dat_user_contacts con  (cost=0.00..7332483.08 rows=60857208 width=8)
   ->  Sort  (cost=85.57..88.14 rows=1026 width=74)
         Sort Key: dat_customer_mailbox_counts.customer_id
         ->  Seq Scan on dat_customer_mailbox_counts  (cost=0.00..34.26 rows=1026 width=74)
(9 rows)
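
If it helps narrow things down, here is a sketch of looking at just the
aggregate subquery on its own (an estimate-only EXPLAIN; actually running
the bare SELECT off-hours would give real timings):

  -- Plan for just the aggregation, independent of the UPDATE.
  explain
  select customer_id, count(*) as ct, count(distinct user_id) as dct
    from dat_user_contacts
   group by customer_id;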


Any thoughts, comments, ideas for debugging, etc. would be very helpful...

Thanks in advance.

/Kevin





Re: long-running query - needs tuning

From: Tom Lane
Kevin Kempter <kevin@kevinkempterllc.com> writes:
>  Merge Join  (cost=17118858.51..17727442.30 rows=155 width=90)
>    Merge Cond: ("outer".customer_id = "inner".customer_id)
>    ->  GroupAggregate  (cost=17118772.93..17727347.34 rows=155 width=8)
>          ->  Sort  (cost=17118772.93..17270915.95 rows=60857208 width=8)
>                Sort Key: con.customer_id
>                ->  Seq Scan on dat_user_contacts con  (cost=0.00..7332483.08 rows=60857208 width=8)
>    ->  Sort  (cost=85.57..88.14 rows=1026 width=74)
>          Sort Key: dat_customer_mailbox_counts.customer_id
>    ->  Sort  (cost=85.57..88.14 rows=1026 width=74)
>          Sort Key: dat_customer_mailbox_counts.customer_id
>          ->  Seq Scan on dat_customer_mailbox_counts  (cost=0.00..34.26 rows=1026 width=74)

The planner, at least, thinks that all the time will go into the sort
step.  Sorting 60M rows is going to take a while :-(.  What PG version is
this?  (8.2 has noticeably faster sort code than prior releases...)
What have you got work_mem set to?
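
A minimal sketch of checking both; the setting below is purely
illustrative, and the unit suffix in SET needs 8.2 or later (older
releases expect the value in kilobytes):

  select version();        -- server version string
  show work_mem;           -- current per-sort memory budget

  -- Raise it for the current session only; 256MB is an illustrative value.
  set work_mem = '256MB';    -- 8.2 and later
  -- set work_mem = 262144;  -- pre-8.2: value in kilobytes (also 256MB)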

Bad as the sort is, I suspect that the real problem is the
count(distinct) operator, which is going to require *another*
sort-and-uniq step for each customer_id group --- and judging by
the rowcount estimates, at least some of those groups must be
pretty large.  (AFAIR this time is not counted in the planner
estimates.)  Again, work_mem would have an effect on how fast
that goes.
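
A sketch of one rewrite that can sidestep the per-group distinct sort,
assuming the intended semantics are exactly those of the original query:
pre-aggregate per (customer_id, user_id), so the distinct count becomes a
plain count over the inner result.

  update dat_customer_mailbox_counts
     set total_contacts      = contacts.ct,
         total_contact_users = contacts.dct
    from ( select customer_id,
                  sum(per_user.ct) as ct,   -- total contacts per customer
                  count(*)         as dct   -- one inner row per distinct user
             from ( select customer_id, user_id, count(*) as ct
                      from dat_user_contacts
                     group by customer_id, user_id ) per_user
            group by customer_id ) contacts
   where contacts.customer_id = dat_customer_mailbox_counts.customer_id;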

            regards, tom lane