Thread: 50K record DELETE Begins, 100% CPU, Never Completes 1 hour later

50K record DELETE Begins, 100% CPU, Never Completes 1 hour later

From
"Clay Luther"
Date:
Again, we have an odd performance problem with PGSQL, 7.4b2.

Here is the query:

delete from numplan where pkid in
    (select numplan.pkid from numplan
        left outer join pilothuntgroup on numplan.pkid=pilothuntgroup.fknumplan
        left outer join devicenumplanmap on numplan.pkid = devicenumplanmap.fknumplan
            where numplan.tkpatternusage=2
            and pilothuntgroup.fknumplan is null
            and devicenumplanmap.fknumplan is null);

The query starts, PGSQL shoots to 134MB(!) of memory and 100% CPU and never completes.

The query works fine on smaller datasets.  This occurs when 50K+ records exist in the numplan table.

Here is the query plan:

ccm=# explain delete from numplan where pkid in (select numplan.pkid from numplan left outer join pilothuntgroup on
numplan.pkid=pilothuntgroup.fknumplanleft outer join devicenumplanmap on numplan.pkid = devicenumplanmap.fknumplan
wherenumplan.tkpatternusage=2 and pilothuntgroup.fknumplan is null and devicenumplanmap.fknumplan is null); 
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Merge IN Join  (cost=37947.25..40851.71 rows=82225 width=6)
   Merge Cond: ("outer"."?column3?" = ("inner".pkid)::text)
   ->  Sort  (cost=11481.65..11687.35 rows=82279 width=46)
         Sort Key: (public.numplan.pkid)::text
         ->  Seq Scan on numplan  (cost=0.00..2936.79 rows=82279 width=46)
   ->  Materialize  (cost=26465.60..27930.85 rows=82225 width=40)
         ->  Merge Left Join  (cost=23917.22..25822.60 rows=82225 width=40)
               Merge Cond: (("outer".pkid)::text = "inner"."?column2?")
               Filter: ("inner".fknumplan IS NULL)
               ->  Merge Left Join  (cost=11407.97..11819.13 rows=82225 width=40)
                     Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
                     Filter: ("inner".fknumplan IS NULL)
                     ->  Sort  (cost=11406.89..11612.45 rows=82225 width=40)
                           Sort Key: (public.numplan.pkid)::text
                           ->  Seq Scan on numplan  (cost=0.00..3142.49 rows=82225 width=40)
                                 Filter: (tkpatternusage = 2)
                     ->  Sort  (cost=1.08..1.09 rows=4 width=42)
                           Sort Key: (pilothuntgroup.fknumplan)::text
                           ->  Seq Scan on pilothuntgroup  (cost=0.00..1.04 rows=4 width=42)
               ->  Sort  (cost=12509.25..12734.70 rows=90180 width=40)
                     Sort Key: (devicenumplanmap.fknumplan)::text
                     ->  Seq Scan on devicenumplanmap  (cost=0.00..3326.80 rows=90180 width=40)
(22 rows)

---
Clay
Cisco Systems, Inc.
claycle@cisco.com
(972) 813-5004


I've stopped 19,658 spam messages. You can too!
One month FREE spam protection at http://www.cloudmark.com/spamnetsig/}

Re: 50K record DELETE Begins, 100% CPU, Never Completes 1 hour later

From
Tom Lane
Date:
"Clay Luther" <claycle@cisco.com> writes:
> ccm=# explain delete from numplan where pkid in (select numplan.pkid from numplan left outer join pilothuntgroup on
numplan.pkid=pilothuntgroup.fknumplanleft outer join devicenumplanmap on numplan.pkid = devicenumplanmap.fknumplan
wherenumplan.tkpatternusage=2 and pilothuntgroup.fknumplan is null and devicenumplanmap.fknumplan is null); 

The left join/is null thingies look like a workaround for our pre-7.4
lack of performance with NOT IN queries.  Have you tried expressing
this more straightforwardly with NOT IN?

Also, what sort_mem setting are you using?

            regards, tom lane