Thread: Massive delete performance

Massive delete performance

From
"Andy"
Date:
Hi to all,
 
I have the following problem: every night we send a client a "dump" of the database containing only their data. It is a crude solution, but I chose it because I couldn't find anything better. The target machine runs Windows 2003.
 
So I have a replication with only the tables that I need to send; then I make a copy of this replication, and from this copy I delete all the data that is not needed.
 
How can I speed up this DELETE procedure? It is really slow, and there is of course a lot of data to be deleted.
 
 
 
Or is there any other solution for this?
DB -> (replication) RE_DB -> (copy) COPY_DB -> (delete unnecessary data) CLIENT_DB -> (ISDN connection) -> data to the client.
 
Regards,
Andy.
 
 
 

Re: Massive delete performance

From
Sean Davis
Date:
On 10/11/05 3:47 AM, "Andy" <frum@ar-sd.net> wrote:

> Hi to all,
>
> I have the following problem: every night we send a client a "dump" of the
> database containing only their data. It is a crude solution, but I chose it
> because I couldn't find anything better. The target machine runs Windows 2003.
>
> So I have a replication with only the tables that I need to send; then I
> make a copy of this replication, and from this copy I delete all the data
> that is not needed.
>
> How can I speed up this DELETE procedure? It is really slow, and there is
> of course a lot of data to be deleted.

Do you have foreign key relationships that must be followed for cascade
delete?  If so, make sure that you have indices on them.  Are you running
any type of vacuum after the whole process?  What kind?
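For instance, using the report/orders tables that appear later in this thread (the exact foreign-key definition is an assumption), the referencing side of each foreign key should carry its own index:

```sql
-- If report.id_order references orders(id), every delete from orders has
-- to scan report for matching rows unless the referencing column is
-- indexed (the FK definition here is an assumption):
CREATE INDEX report_id_order_idx ON report (id_order);
ANALYZE report;   -- refresh planner statistics so the index gets used
```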

Sean


Re: Massive delete performance

From
"Andy"
Date:
> Do you have foreign key relationships that must be followed for cascade
> delete?  If so, make sure that you have indices on them.
Yes, I have such things, and indexes are on these fields. To be honest, this
delete is taking the longest time, but it involves about 10 tables.

> Are you running
> any type of vacuum after the whole process?  What kind?
Full vacuum. (cmd: vacuumdb -f)

Is there any configuration parameter to speed up deletes?
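As an aside (a sketch; the database name is hypothetical): `vacuumdb -f` runs a VACUUM FULL, which rewrites every table under an exclusive lock. After a nightly mass delete, a plain vacuum with fresh statistics is often enough:

```shell
vacuumdb --analyze copy_db   # plain vacuum + updated planner statistics
# vacuumdb -f copy_db        # VACUUM FULL: reclaims disk space, but much slower
```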



Re: Massive delete performance

From
Sean Davis
Date:
On 10/11/05 8:05 AM, "Andy" <frum@ar-sd.net> wrote:

>> Do you have foreign key relationships that must be followed for cascade
>> delete?  If so, make sure that you have indices on them.
> Yes, I have such things, and indexes are on these fields. To be honest, this
> delete is taking the longest time, but it involves about 10 tables.

Can you post the EXPLAIN ANALYZE output of the next delete?

Sean


Re: Massive delete performance

From
"Steinar H. Gunderson"
Date:
On Tue, Oct 11, 2005 at 10:47:03AM +0300, Andy wrote:
> So I have a replication with only the tables that I need to send; then I
> make a copy of this replication, and from this copy I delete all the data
> that is not needed.
>
> How can I speed up this DELETE procedure? It is really slow, and there is
> of course a lot of data to be deleted.

Instead of copying and then deleting, could you try just selecting out what
you wanted in the first place?
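Using the table names that appear later in this thread, that could look something like this (a sketch; the exact schema is an assumption):

```sql
-- Build the client's copy from a filtered SELECT instead of copying
-- everything and deleting the rest afterwards:
CREATE TABLE client_orders AS
SELECT o.*
FROM orders o
WHERE o.id_ag IN (SELECT cp.id_ag
                  FROM users u
                  JOIN contactpartner cp ON cp.id_user = u.id
                  WHERE u.name = 'dc');
```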

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Massive delete performance

From
"Andy"
Date:
We run the DB on a Linux system. The client has a Windows system. The
application is almost the same (so the database structure is 80% the same).
The difference is that the client does not need all the tables.

So, in the remaining tables there is a lot of extra data that doesn't belong
to this client. We have to send an updated "dump" to the client database
every night. Our (I have to admit) "fast but not the best" solution was to
replicate the needed tables and delete from them the info that is not needed.

So, I send this client a "dump" of the database.

I also find the idea "not the best", but I couldn't find another fast
solution in two days. And it has worked this way for 4 months.

Our database is not THAT big (500 MB), the replication about 300 MB...
everything works fast enough except this delete.


How can I see the cascade deletes in the EXPLAIN ANALYZE output as well?

The answer for Sean Davis <sdavis2@mail.nih.gov>:

EXPLAIN ANALYZE
DELETE FROM report WHERE id_order IN
  (SELECT o.id FROM orders o
   WHERE o.id_ag NOT IN (SELECT cp.id_ag
                         FROM users u
                         INNER JOIN contactpartner cp ON cp.id_user = u.id
                         WHERE u.name IN ('dc')
                         ORDER BY cp.id_ag));

Hash IN Join  (cost=3532.83..8182.33 rows=32042 width=6) (actual time=923.456..2457.323 rows=59557 loops=1)
  Hash Cond: ("outer".id_order = "inner".id)
  ->  Seq Scan on report  (cost=0.00..2613.83 rows=64083 width=10) (actual time=33.269..1159.024 rows=64083 loops=1)
  ->  Hash  (cost=3323.31..3323.31 rows=32608 width=4) (actual time=890.021..890.021 rows=0 loops=1)
        ->  Seq Scan on orders o  (cost=21.12..3323.31 rows=32608 width=4) (actual time=58.428..825.306 rows=60596 loops=1)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Sort  (cost=21.11..21.12 rows=3 width=4) (actual time=47.612..47.612 rows=1 loops=1)
                      Sort Key: cp.id_ag
                      ->  Nested Loop  (cost=0.00..21.08 rows=3 width=4) (actual time=47.506..47.516 rows=1 loops=1)
                            ->  Index Scan using users_name_idx on users u  (cost=0.00..5.65 rows=1 width=4) (actual time=20.145..20.148 rows=1 loops=1)
                                  Index Cond: ((name)::text = 'dc'::text)
                            ->  Index Scan using contactpartner_id_user_idx on contactpartner cp  (cost=0.00..15.38 rows=4 width=8) (actual time=27.348..27.352 rows=1 loops=1)
                                  Index Cond: (cp.id_user = "outer".id)
Total runtime: 456718.658 ms





Re: Massive delete performance

From
Tom Lane
Date:
"Andy" <frum@ar-sd.net> writes:
> EXPLAIN ANALYZE
> DELETE FROM report WHERE id_order IN
> ...

> Hash IN Join  (cost=3532.83..8182.33 rows=32042 width=6) (actual
> time=923.456..2457.323 rows=59557 loops=1)
> ...
> Total runtime: 456718.658 ms

So the runtime is all in the delete triggers.  The usual conclusion from
this is that there is a foreign key column pointing at this table that
does not have an index, or is not the same datatype as the column it
references.  Either condition will force a fairly inefficient way of
handling the FK deletion check.
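One way to hunt for such columns is a catalog query along these lines (a sketch: it only handles single-column foreign keys and only checks the leading index column):

```sql
-- Foreign-key columns with no index whose leading column matches them;
-- each hit is a candidate cause of slow cascade deletes.
-- (A datatype mismatch between the FK column and the referenced column
-- causes the same symptom and is not detected here.)
SELECT c.conrelid::regclass AS referencing_table,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]
WHERE c.contype = 'f'
  AND NOT EXISTS (SELECT 1
                  FROM pg_index i
                  WHERE i.indrelid  = c.conrelid
                    AND i.indkey[0] = c.conkey[1]);
```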

            regards, tom lane

Re: Massive delete performance

From
"Andy"
Date:
Oops folks,

Indeed, there were 2 important indexes missing. Now it runs about 10 times
faster. Sorry for the trouble caused :) and thanks for the help.


Hash IN Join  (cost=3307.49..7689.47 rows=30250 width=6) (actual time=227.666..813.786 rows=56374 loops=1)
  Hash Cond: ("outer".id_order = "inner".id)
  ->  Seq Scan on report  (cost=0.00..2458.99 rows=60499 width=10) (actual time=0.035..269.422 rows=60499 loops=1)
  ->  Hash  (cost=3109.24..3109.24 rows=30901 width=4) (actual time=227.459..227.459 rows=0 loops=1)
        ->  Seq Scan on orders o  (cost=9.73..3109.24 rows=30901 width=4) (actual time=0.429..154.219 rows=57543 loops=1)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Sort  (cost=9.71..9.72 rows=3 width=4) (actual time=0.329..0.330 rows=1 loops=1)
                      Sort Key: cp.id_ag
                      ->  Nested Loop  (cost=0.00..9.69 rows=3 width=4) (actual time=0.218..0.224 rows=1 loops=1)
                            ->  Index Scan using users_name_idx on users u  (cost=0.00..5.61 rows=1 width=4) (actual time=0.082..0.084 rows=1 loops=1)
                                  Index Cond: ((name)::text = 'dc'::text)
                            ->  Index Scan using contactpartner_id_user_idx on contactpartner cp  (cost=0.00..4.03 rows=3 width=8) (actual time=0.125..0.127 rows=1 loops=1)
                                  Index Cond: (cp.id_user = "outer".id)
Total runtime: 31952.811 ms




Re: Massive delete performance

From
Enrico Weigelt
Date:
* Andy <frum@ar-sd.net> wrote:

<snip>
>    I have the following problem: every night we send a client a "dump"
>    of the database containing only their data. It is a crude solution,
>    but I chose it because I couldn't find anything better. The target
>    machine runs Windows 2003.
>
>    So I have a replication with only the tables that I need to send;
>    then I make a copy of this replication, and from this copy I delete
>    all the data that is not needed.

Why not filter out as much unnecessary stuff as possible when copying?

<snip>

>    How can I speed up this DELETE procedure? It is really slow, and
>    there is of course a lot of data to be deleted.

Have you set up the right indices?


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
---------------------------------------------------------------------
  Realtime Forex/Stock Exchange trading powered by postgreSQL :))
                                            http://www.fxignal.net/
---------------------------------------------------------------------