Thread: DELETE performance problem

DELETE performance problem

From

Luca Tettamanti

Date:

24 November 2009, 09:39:18

Hello,
I've run in a severe performance problem with the following statement:

DELETE FROM t1 WHERE t1.annotation_id IN (
    SELECT t2.annotation_id FROM t2)

t1 contains about 48M record (table size is 5.8GB), while t2 contains about 60M
record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's
not even unique, in fact there are duplicates - there are about 20M distinct
annotation_id in this table). There are no FKs on either tables.
I've killed the query after 14h(!) of runtime...

I've reproduced the problem using a only the ids (extracted from the full
tables) with the following schemas:

test2=# \d t1
         Table "public.t1"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (annotation_id)

test2=# \d t2
         Table "public.t2"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint |
Indexes:
    "t2_idx" btree (annotation_id)

The query above takes about 30 minutes to complete. The slowdown is not as
severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using
procexp I see the process churning the disk and using more memory until it hits
some limit (at about 1.8GB) then the IO slows down considerably. See this
screenshot[1].
This is exactly what happens with the full dataset.

This is the output of the explain:

test2=> explain analyze delete from t1 where annotation_id in (select annotation
_id from t2);
                                                               QUERY PLAN

--------------------------------------------------------------------------------
---------------------------------------------------------
 Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=64339
5.565..1832056.588 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual tim
e=0.291..179119.487 rows=45874812 loops=1)
   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=6433
93.742..643393.742 rows=26185953 loops=1)
         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (a
ctual time=571807.575..610178.552 rows=26185953 loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8)
(actual time=2460.595..480446.581 rows=60956812 loops=1)
 Total runtime: 2271122.474 ms
(7 rows)

Time: 2274723,284 ms


An identital linux machine (with 8.4.1) shows the same issue; with strace I see
a lots of seeks:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.37    0.155484          15     10601           read
  9.10    0.015649        5216         3           fadvise64
  0.39    0.000668           0      5499           write
  0.15    0.000253           0     10733           lseek
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           semop
------ ----------- ----------- --------- --------- ----------------
100.00    0.172054                 26845           total

(30s sample)

Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the trace
is the following:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862           0    321597           read
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        76           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862                321676           total


The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data directory
is on hardware (Dell PERC5) raid mirror, with the log on a separate array.
One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit).

shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off # Just in case... usually it's enabled
effective_cache_size = 4096MB

(the machine with win2k8 is running with a smaller shared_buffers - 16MB)

Any idea on what's going wrong here?

thanks,
Luca
[1] http://img10.imageshack.us/i/psql2.png/

Re: DELETE performance problem

From

Luca Tettamanti

Date:

24 November 2009, 11:14:59

On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin
<jchamplin@absolute-performance.com> wrote:
> You may want to consider using partitioning.  That way you can drop the
> appropriate partition and never have the overhead of a delete.

Hum, I don't think it's doable in my case; the partitioning is not
know a priori. First t1 is fully populated, then the data is loaded
and manipulated by my application, the result is stored in t2; only
then I want to remove (part of) the data from t1.

thanks,
Luca

Re: DELETE performance problem

From

Thom Brown

Date:

24 November 2009, 11:19:57

2009/11/24 Luca Tettamanti <kronos.it@gmail.com>

On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin
<jchamplin@absolute-performance.com> wrote:
> You may want to consider using partitioning. That way you can drop the
> appropriate partition and never have the overhead of a delete.

Hum, I don't think it's doable in my case; the partitioning is not
know a priori. First t1 is fully populated, then the data is loaded
and manipulated by my application, the result is stored in t2; only
then I want to remove (part of) the data from t1.

thanks,
Luca

It's a shame there isn't a LIMIT option on DELETE so this can be done in small batches.

Thom

Re: DELETE performance problem

From

"Jerry Champlin"

Date:

24 November 2009, 11:20:21

You may want to consider using partitioning.  That way you can drop the
appropriate partition and never have the overhead of a delete.

Jerry Champlin|Absolute Performance Inc.|Mobile:  303-588-2547

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luca Tettamanti
Sent: Tuesday, November 24, 2009 6:37 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] DELETE performance problem

Hello,
I've run in a severe performance problem with the following statement:

DELETE FROM t1 WHERE t1.annotation_id IN (
    SELECT t2.annotation_id FROM t2)

t1 contains about 48M record (table size is 5.8GB), while t2 contains about
60M
record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's
not even unique, in fact there are duplicates - there are about 20M distinct
annotation_id in this table). There are no FKs on either tables.
I've killed the query after 14h(!) of runtime...

I've reproduced the problem using a only the ids (extracted from the full
tables) with the following schemas:

test2=# \d t1
         Table "public.t1"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (annotation_id)

test2=# \d t2
         Table "public.t2"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint |
Indexes:
    "t2_idx" btree (annotation_id)

The query above takes about 30 minutes to complete. The slowdown is not as
severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using
procexp I see the process churning the disk and using more memory until it
hits
some limit (at about 1.8GB) then the IO slows down considerably. See this
screenshot[1].
This is exactly what happens with the full dataset.

This is the output of the explain:

test2=> explain analyze delete from t1 where annotation_id in (select
annotation
_id from t2);
                                                               QUERY PLAN

----------------------------------------------------------------------------
----
---------------------------------------------------------
 Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual
time=64339
5.565..1832056.588 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual
tim
e=0.291..179119.487 rows=45874812 loops=1)
   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual
time=6433
93.742..643393.742 rows=26185953 loops=1)
         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605
width=8) (a
ctual time=571807.575..610178.552 rows=26185953 loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812
width=8)
(actual time=2460.595..480446.581 rows=60956812 loops=1)
 Total runtime: 2271122.474 ms
(7 rows)

Time: 2274723,284 ms


An identital linux machine (with 8.4.1) shows the same issue; with strace I
see
a lots of seeks:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.37    0.155484          15     10601           read
  9.10    0.015649        5216         3           fadvise64
  0.39    0.000668           0      5499           write
  0.15    0.000253           0     10733           lseek
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           semop
------ ----------- ----------- --------- --------- ----------------
100.00    0.172054                 26845           total

(30s sample)

Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the
trace
is the following:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862           0    321597           read
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        76           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862                321676           total


The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data
directory
is on hardware (Dell PERC5) raid mirror, with the log on a separate array.
One machine is running linux 64bit (Debian/stable), the other win2k8 (32
bit).

shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off # Just in case... usually it's enabled
effective_cache_size = 4096MB

(the machine with win2k8 is running with a smaller shared_buffers - 16MB)

Any idea on what's going wrong here?

thanks,
Luca
[1] http://img10.imageshack.us/i/psql2.png/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: DELETE performance problem

From

Grzegorz Jaśkiewicz

Date:

24 November 2009, 11:36:54

On Tue, Nov 24, 2009 at 3:19 PM, Thom Brown <thombrown@gmail.com> wrote:

2009/11/24 Luca Tettamanti <kronos.it@gmail.com>

On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin
<jchamplin@absolute-performance.com> wrote:
> You may want to consider using partitioning. That way you can drop the
> appropriate partition and never have the overhead of a delete.

Hum, I don't think it's doable in my case; the partitioning is not
know a priori. First t1 is fully populated, then the data is loaded
and manipulated by my application, the result is stored in t2; only
then I want to remove (part of) the data from t1.

thanks,
Luca

It's a shame there isn't a LIMIT option on DELETE so this can be done in small batches.

you sort of can do it, using PK on table as pointer. DELETE FROM foo USING ... etc.
with subquery in using that will limit number of rows ;)

Thom

--
GJ

Re: DELETE performance problem

From

Alan Hodgson

Date:

24 November 2009, 12:08:10

On Tuesday 24 November 2009, Thom Brown <thombrown@gmail.com> wrote:
>
> It's a shame there isn't a LIMIT option on DELETE so this can be done in
> small batches.

delete from table where pk in (select pk from table where delete_condition
limit X);


--
"No animals were harmed in the recording of this episode. We tried but that
damn monkey was just too fast."

Re: DELETE performance problem

From

Kris Kewley

Date:

24 November 2009, 20:48:17

  Even though the column in question is not unique on t2 could you not
index it? That should improve the performance of the inline query.

Are dates applicable in any way? In some cases adding a date field,
partitioning or indexing on that and adding where date>x days. That
can be an effective way to limit records searched.

Kris

On 24-Nov-09, at 9:59, "Jerry Champlin" <jchamplin@absolute-performance.com
 > wrote:

> You may want to consider using partitioning.  That way you can drop
> the
> appropriate partition and never have the overhead of a delete.
>
> Jerry Champlin|Absolute Performance Inc.|Mobile:  303-588-2547
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luca
> Tettamanti
> Sent: Tuesday, November 24, 2009 6:37 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] DELETE performance problem
>
> Hello,
> I've run in a severe performance problem with the following statement:
>
> DELETE FROM t1 WHERE t1.annotation_id IN (
>    SELECT t2.annotation_id FROM t2)
>
> t1 contains about 48M record (table size is 5.8GB), while t2
> contains about
> 60M
> record (total size 8.6GB). annotation_id is the PK in t1 but not in
> t2 (it's
> not even unique, in fact there are duplicates - there are about 20M
> distinct
> annotation_id in this table). There are no FKs on either tables.
> I've killed the query after 14h(!) of runtime...
>
> I've reproduced the problem using a only the ids (extracted from the
> full
> tables) with the following schemas:
>
> test2=# \d t1
>         Table "public.t1"
>    Column     |  Type  | Modifiers
> ---------------+--------+-----------
> annotation_id | bigint | not null
> Indexes:
>    "t1_pkey" PRIMARY KEY, btree (annotation_id)
>
> test2=# \d t2
>         Table "public.t2"
>    Column     |  Type  | Modifiers
> ---------------+--------+-----------
> annotation_id | bigint |
> Indexes:
>    "t2_idx" btree (annotation_id)
>
> The query above takes about 30 minutes to complete. The slowdown is
> not as
> severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8
> using
> procexp I see the process churning the disk and using more memory
> until it
> hits
> some limit (at about 1.8GB) then the IO slows down considerably. See
> this
> screenshot[1].
> This is exactly what happens with the full dataset.
>
> This is the output of the explain:
>
> test2=> explain analyze delete from t1 where annotation_id in (select
> annotation
> _id from t2);
>                                                               QUERY
> PLAN
>
> ---
> ---
> ----------------------------------------------------------------------
> ----
> ---------------------------------------------------------
> Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual
> time=64339
> 5.565..1832056.588 rows=26185953 loops=1)
>   Hash Cond: (t1.annotation_id = t2.annotation_id)
>   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14)
> (actual
> tim
> e=0.291..179119.487 rows=45874812 loops=1)
>   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual
> time=6433
> 93.742..643393.742 rows=26185953 loops=1)
>         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605
> width=8) (a
> ctual time=571807.575..610178.552 rows=26185953 loops=1)
>               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812
> width=8)
> (actual time=2460.595..480446.581 rows=60956812 loops=1)
> Total runtime: 2271122.474 ms
> (7 rows)
>
> Time: 2274723,284 ms
>
>
> An identital linux machine (with 8.4.1) shows the same issue; with
> strace I
> see
> a lots of seeks:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 90.37    0.155484          15     10601           read
>  9.10    0.015649        5216         3           fadvise64
>  0.39    0.000668           0      5499           write
>  0.15    0.000253           0     10733           lseek
>  0.00    0.000000           0         3           open
>  0.00    0.000000           0         3           close
>  0.00    0.000000           0         3           semop
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.172054                 26845           total
>
> (30s sample)
>
> Before hitting the memory "limit" (AS on win2k8, unsure about Linux)
> the
> trace
> is the following:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.063862           0    321597           read
>  0.00    0.000000           0         3           lseek
>  0.00    0.000000           0        76           mmap
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.063862                321676           total
>
>
> The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data
> directory
> is on hardware (Dell PERC5) raid mirror, with the log on a separate
> array.
> One machine is running linux 64bit (Debian/stable), the other win2k8
> (32
> bit).
>
> shared_buffers = 512MB
> work_mem = 512MB
> maintenance_work_mem = 1GB
> checkpoint_segments = 16
> wal_buffers = 8MB
> fsync = off # Just in case... usually it's enabled
> effective_cache_size = 4096MB
>
> (the machine with win2k8 is running with a smaller shared_buffers -
> 16MB)
>
> Any idea on what's going wrong here?
>
> thanks,
> Luca
> [1] http://img10.imageshack.us/i/psql2.png/
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: DELETE performance problem

From

marcin mank

Date:

25 November 2009, 11:23:19

On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti <kronos.it@gmail.com> wrote:
>         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (a
> ctual time=571807.575..610178.552 rows=26185953 loops=1)

This is Your problem. The system`s estimate for the number of distinct
annotation_ids in t2 is wildly off.

The disk activity is almost certainly swapping (You can check it
iostat on the linux machine).

Can You try "analyze t2" just before the delete quety? maybe try
raising statistics target for the annotation_id column.

If all else fails, You may try "set enable_hashagg to false" just
before the query.

Greetings
Marcin Mańk

Greetings
Marcin Mańk

Re: DELETE performance problem

From

Luca Tettamanti

Date:

25 November 2009, 12:13:25

On Wed, Nov 25, 2009 at 04:22:47PM +0100, marcin mank wrote:
> On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti <kronos.it@gmail.com> wrote:
> >         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (a
> > ctual time=571807.575..610178.552 rows=26185953 loops=1)
>
>
> This is Your problem. The system`s estimate for the number of distinct
> annotation_ids in t2 is wildly off.

Ah, I see.

> The disk activity is almost certainly swapping (You can check it
> iostat on the linux machine).

Nope, zero swap activity. Under Linux postgres tops up at about 4.4GB, leaving
3.6GB of page cache (nothing else is running right now).

> Can You try "analyze t2" just before the delete quety? maybe try
> raising statistics target for the annotation_id column.

I already tried, the estimation is still way off.

> If all else fails, You may try "set enable_hashagg to false" just
> before the query.

 Hash IN Join  (cost=1879362.27..11080576.17 rows=202376 width=6) (actual time=250281.607..608638.141 rows=26185953
loops=1)
    Hash Cond: (t1.annotation_id = t2.annotation_id)
       ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.017..193661.353 rows=45874812
loops=1)
          ->  Hash  (cost=879289.12..879289.12 rows=60956812 width=8) (actual time=250271.012..250271.012 rows=60956812
loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=0.023..178297.862
rows=60956812loops=1) 
 Total runtime: 900019.033 ms
(6 rows)

This is after an analyze.

The alternative query suggested by Shrirang Chitnis:

DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id = t2.annotation_id)

performs event better:

 Seq Scan on t1  (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953
loops=1)
    Filter: (subplan)
       SubPlan
            ->  Index Scan using t2_idx on t2  (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1
loops=45874812)
                   Index Cond: ($0 = annotation_id)
 Total runtime: 629426.014 ms
(6 rows)

Will try on the full data set.

thanks,
Luca

Re: DELETE performance problem

From

Grzegorz Jaśkiewicz

Date:

25 November 2009, 12:16:11

On Wed, Nov 25, 2009 at 4:13 PM, Luca Tettamanti <kronos.it@gmail.com> wrote:

DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id = t2.annotation_id)

performs event better:

Seq Scan on t1 (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using t2_idx on t2 (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1 loops=45874812)
Index Cond: ($0 = annotation_id)
Total runtime: 629426.014 ms
(6 rows)

Have you tried:
DELETE FROM t1 USING t2 WHERE t1.annotation_id = t2.annotation_id;

?

--
GJ