Thread: Massive delete performance
Hi to all,
I have the following problem: every night we send a client a "dump" of a database that contains only their data. It is a crude solution, but I chose it because I couldn't find anything better. The target machine runs Windows 2003.
So, I have a replica with only the tables that I need to send; then I make a copy of this replica, and from this copy I delete all the data that is not needed.
How can I speed up this DELETE step? It is really slow, and there is of course a lot of data to be deleted.
Or is there any other solution for this?
DB -> (replication) -> RE_DB -> (copy) -> COPY_DB -> (delete unnecessary data) -> CLIENT_DB -> (ISDN connection) -> data to the client.
Regards,
Andy.
On 10/11/05 3:47 AM, "Andy" <frum@ar-sd.net> wrote:

> So, I have a replica with only the tables that I need to send; then I
> make a copy of this replica, and from this copy I delete all the data
> that is not needed.
>
> How can I speed up this DELETE step? It is really slow, and there is of
> course a lot of data to be deleted.

Do you have foreign key relationships that must be followed for cascade
delete? If so, make sure that you have indices on them. Are you running
any type of vacuum after the whole process? What kind?

Sean
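For illustration, a minimal sketch of the kind of index Sean means, with hypothetical table and column names: PostgreSQL indexes only the referenced (parent) side of a foreign key automatically, so a cascade delete on the parent must scan the child table once per deleted row unless the referencing column has its own index.

-- Hypothetical schema: child.parent_id REFERENCES parent(id) ON DELETE CASCADE.
-- The FK constraint does not create this index for you; without it, each
-- row deleted from parent forces a sequential scan of child.
CREATE INDEX child_parent_id_idx ON child (parent_id);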
On Tuesday, October 11, 2005 2:54 PM, "Sean Davis" <sdavis2@mail.nih.gov> wrote:

> Do you have foreign key relationships that must be followed for cascade
> delete? If so, make sure that you have indices on them.

Yes, I have such things, and indexes are on these fields. To be honest,
this delete is taking the longest time, but it involves about 10 tables.

> Are you running any type of vacuum after the whole process? What kind?

A full vacuum (cmd: vacuumdb -f).

Is there any configuration parameter to speed up deletes?
On 10/11/05 8:05 AM, "Andy" <frum@ar-sd.net> wrote:

>> Do you have foreign key relationships that must be followed for cascade
>> delete? If so, make sure that you have indices on them.
>
> Yes, I have such things, and indexes are on these fields. To be honest,
> this delete is taking the longest time, but it involves about 10 tables.

Can you post the EXPLAIN ANALYZE output of the next delete?

Sean
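One caveat when collecting this: EXPLAIN ANALYZE actually executes the statement it measures. A minimal sketch, using the report table from Andy's query later in the thread and a hypothetical predicate, of how to capture the plan without losing the rows:

BEGIN;
-- EXPLAIN ANALYZE runs the DELETE for real, so wrap it in a transaction
-- and roll back afterwards to keep the data (hypothetical predicate below).
EXPLAIN ANALYZE DELETE FROM report WHERE id_order = 0;
ROLLBACK;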
On Tue, Oct 11, 2005 at 10:47:03AM +0300, Andy wrote:

> So, I have a replica with only the tables that I need to send; then I
> make a copy of this replica, and from this copy I delete all the data
> that is not needed.
>
> How can I speed up this DELETE step? It is really slow, and there is of
> course a lot of data to be deleted.

Instead of copying and then deleting, could you try just selecting out
what you wanted in the first place?

/* Steinar */
--
Homepage: http://www.sesse.net/
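A minimal sketch of Steinar's idea, assuming the report/orders/users/contactpartner schema that appears in Andy's next message and a hypothetical destination table client_report: build the client's copy by selecting only the rows that belong to them, rather than copying everything and deleting the rest.

-- Keep only the rows for the client's agents (the inverse of the NOT IN
-- predicate Andy deletes with); client_report is a hypothetical name.
CREATE TABLE client_report AS
SELECT r.*
  FROM report r
  JOIN orders o ON o.id = r.id_order
 WHERE o.id_ag IN (SELECT cp.id_ag
                     FROM users u
                     JOIN contactpartner cp ON cp.id_user = u.id
                    WHERE u.name = 'dc');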
On Tuesday, October 11, 2005 3:19 PM, "Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:

> Instead of copying and then deleting, could you try just selecting out
> what you wanted in the first place?

We run the DB on a Linux system; the client has a Windows system. The
application is almost the same (so the database structure is 80% the
same). The difference is that the client does not need all the tables,
so the remaining tables hold a lot of extra data that does not belong to
this client. We have to send the client updated "info" every night. Our
(I have to admit) "fast but not the best" solution was to replicate the
needed tables and delete from them the info that is not needed. So I
send this client a "dump" of the database. I also find the idea "not the
best", but I couldn't find another fast solution in two days, and it has
worked this way for 4 months. Our database is not THAT big (500MB), the
replica about 300MB... everything works fast enough except this delete.

How can I make the cascade deletes show up in EXPLAIN ANALYZE, too?

The answer for Sean Davis <sdavis2@mail.nih.gov>:

EXPLAIN ANALYZE
DELETE FROM report WHERE id_order IN
  (SELECT o.id FROM orders o
    WHERE o.id_ag NOT IN (SELECT cp.id_ag
                            FROM users u
                           INNER JOIN contactpartner cp ON cp.id_user = u.id
                           WHERE u.name IN ('dc')
                           ORDER BY cp.id_ag))

Hash IN Join  (cost=3532.83..8182.33 rows=32042 width=6) (actual time=923.456..2457.323 rows=59557 loops=1)
  Hash Cond: ("outer".id_order = "inner".id)
  ->  Seq Scan on report  (cost=0.00..2613.83 rows=64083 width=10) (actual time=33.269..1159.024 rows=64083 loops=1)
  ->  Hash  (cost=3323.31..3323.31 rows=32608 width=4) (actual time=890.021..890.021 rows=0 loops=1)
        ->  Seq Scan on orders o  (cost=21.12..3323.31 rows=32608 width=4) (actual time=58.428..825.306 rows=60596 loops=1)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Sort  (cost=21.11..21.12 rows=3 width=4) (actual time=47.612..47.612 rows=1 loops=1)
                      Sort Key: cp.id_ag
                      ->  Nested Loop  (cost=0.00..21.08 rows=3 width=4) (actual time=47.506..47.516 rows=1 loops=1)
                            ->  Index Scan using users_name_idx on users u  (cost=0.00..5.65 rows=1 width=4) (actual time=20.145..20.148 rows=1 loops=1)
                                  Index Cond: ((name)::text = 'dc'::text)
                            ->  Index Scan using contactpartner_id_user_idx on contactpartner cp  (cost=0.00..15.38 rows=4 width=8) (actual time=27.348..27.352 rows=1 loops=1)
                                  Index Cond: (cp.id_user = "outer".id)
Total runtime: 456718.658 ms
"Andy" <frum@ar-sd.net> writes: > EXPLAIN ANALYZE > DELETE FROM report WHERE id_order IN > ... > Hash IN Join (cost=3532.83..8182.33 rows=32042 width=6) (actual > time=923.456..2457.323 rows=59557 loops=1) > ... > Total runtime: 456718.658 ms So the runtime is all in the delete triggers. The usual conclusion from this is that there is a foreign key column pointing at this table that does not have an index, or is not the same datatype as the column it references. Either condition will force a fairly inefficient way of handling the FK deletion check. regards, tom lane
Oops, folks, indeed there were 2 important indexes missing. Now it runs
more than ten times faster. Sorry for the trouble :) and thanks for the
help.

Hash IN Join  (cost=3307.49..7689.47 rows=30250 width=6) (actual time=227.666..813.786 rows=56374 loops=1)
  Hash Cond: ("outer".id_order = "inner".id)
  ->  Seq Scan on report  (cost=0.00..2458.99 rows=60499 width=10) (actual time=0.035..269.422 rows=60499 loops=1)
  ->  Hash  (cost=3109.24..3109.24 rows=30901 width=4) (actual time=227.459..227.459 rows=0 loops=1)
        ->  Seq Scan on orders o  (cost=9.73..3109.24 rows=30901 width=4) (actual time=0.429..154.219 rows=57543 loops=1)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Sort  (cost=9.71..9.72 rows=3 width=4) (actual time=0.329..0.330 rows=1 loops=1)
                      Sort Key: cp.id_ag
                      ->  Nested Loop  (cost=0.00..9.69 rows=3 width=4) (actual time=0.218..0.224 rows=1 loops=1)
                            ->  Index Scan using users_name_idx on users u  (cost=0.00..5.61 rows=1 width=4) (actual time=0.082..0.084 rows=1 loops=1)
                                  Index Cond: ((name)::text = 'dc'::text)
                            ->  Index Scan using contactpartner_id_user_idx on contactpartner cp  (cost=0.00..4.03 rows=3 width=8) (actual time=0.125..0.127 rows=1 loops=1)
                                  Index Cond: (cp.id_user = "outer".id)
Total runtime: 31952.811 ms
* Andy <frum@ar-sd.net> wrote:

<snip>
> I have the following problem: every night we send a client a "dump" of
> a database that contains only their data. It is a crude solution, but I
> chose it because I couldn't find anything better. The target machine
> runs Windows 2003.
>
> So, I have a replica with only the tables that I need to send; then I
> make a copy of this replica, and from this copy I delete all the data
> that is not needed.

Why not filter out as much of the unnecessary stuff as possible when
copying?

<snip>
> How can I speed up this DELETE step? It is really slow, and there is of
> course a lot of data to be deleted.

Have you set up the right indices?

cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone: +49 36207 519931      www:   http://www.metux.de/
  fax:   +49 36207 519932      email: contact@metux.de
---------------------------------------------------------------------
 Realtime Forex/Stock Exchange trading powered by postgreSQL :))
 http://www.fxignal.net/
---------------------------------------------------------------------