Re: Are JOINs allowed with DELETE FROM - Mailing list pgsql-performance
From | Steven Rosenstein |
---|---|
Subject | Re: Are JOINs allowed with DELETE FROM |
Date | |
Msg-id | OF8B3991BA.5AD3582E-ON85256FA0.0065EFA7-85256FA0.006B6A96@us.ibm.com Whole thread Raw |
In response to | Re: Are JOINs allowed with DELETE FROM (Michael Fuhr <mike@fuhr.org>) |
Responses |
Re: Are JOINs allowed with DELETE FROM
Re: Are JOINs allowed with DELETE FROM |
List | pgsql-performance |
Hi Michael, Thank you for the link to the documentation page. I forgot to mention that we're still using version 7.3. When I checked the 7.3 documentation for DELETE, there was no mention of being able to use fields from different tables in a WHERE clause. This feature must have been added in a subsequent release of PostgreSQL. Gaetano & John: I *did* try your suggestion. However, there were so many summary ID's returned (9810 to be exact) that the DELETE seemed to be taking forever. Here's an actual SELECT query that I ran as a test: vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan; [This is the "summary" table from my abstracted example] INFO: --Relation vsa.tbl_win_patch_scan-- INFO: Pages 374: Changed 0, Empty 0; Tup 10485: Vac 0, Keep 0, UnUsed 0. Total CPU 0.01s/0.00u sec elapsed 0.00 sec. INFO: --Relation pg_toast.pg_toast_39384-- INFO: Pages 62679: Changed 0, Empty 0; Tup 254116: Vac 0, Keep 0, UnUsed 0. Total CPU 0.86s/0.21u sec elapsed 13.79 sec. INFO: Analyzing vsa.tbl_win_patch_scan VACUUM Time: 18451.32 ms vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan_item; [This is the "detail" table from my abstracted example] INFO: --Relation vsa.tbl_win_patch_scan_item-- INFO: Pages 110455: Changed 0, Empty 0; Tup 752066: Vac 0, Keep 0, UnUsed 0. Total CPU 2.23s/0.45u sec elapsed 42.07 sec. INFO: --Relation pg_toast.pg_toast_39393-- INFO: Pages 2464: Changed 0, Empty 0; Tup 14780: Vac 0, Keep 0, UnUsed 0. Total CPU 0.02s/0.02u sec elapsed 2.31 sec. INFO: Analyzing vsa.tbl_win_patch_scan_item VACUUM Time: 62075.52 ms vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-01 00:00:00'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on tbl_win_patch_scan_item (cost=0.00..379976970.68 rows=376033 width=1150) (actual time=11.50..27373.29 rows=62 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..505.11 rows=4 width=4) (actual time=0.00..0.00 rows=2 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=4 width=4) (actual time=0.03..11.16 rows=2 loops=1) Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp without time zone) Total runtime: 27373.65 msec (7 rows) Time: 27384.12 ms I put in a very early date (2004-09-01) because I knew there would be very few rows to look at (2 rows in vsa.tbl_win_patch_scan meet the date criteria, and a total of 62 rows in vsa.tbl_win_patch_scan_item match either of the two tbl_win_patch_scan ID's returned in the WHERE subquery). Can anyone see a way of optimizing this so that it runs faster? The real date I should be using is 2004-12-06 (~60 days retention), and when I do use it, the query seems to take forever. I ran a number explan analyzes with different scan_datetimes, and it seems that the execution time increases exponentially with the number of rows (ID's) returned by the subquery. Running top shows that the EXPLAIN is entirely CPU-bound. There is no disk I/O during any query execution: DATE=2004-09-01; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=27.37 sec (Included initial query cache loading effect) vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-01 00:00:00'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on tbl_win_patch_scan_item (cost=0.00..379976970.68 rows=376033 width=1150) (actual time=11.50..27373.29 rows=62 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..505.11 rows=4 width=4) (actual time=0.00..0.00 rows=2 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=4 width=4) (actual time=0.03..11.16 rows=2 loops=1) Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp without time zone) Total runtime: 27373.65 msec (7 rows) Time: 27384.12 ms DATE=2004-09-02; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=8.26 sec vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-02 00:00:00'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on tbl_win_patch_scan_item (cost=0.00..380115740.80 rows=376033 width=1142) (actual time=10.42..8259.79 rows=62 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..505.48 rows=41 width=4) (actual time=0.00..0.00 rows=2 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=41 width=4) (actual time=0.02..10.08 rows=2 loops=1) Filter: (scan_datetime < '2004-09-02 00:00:00'::timestamp without time zone) Total runtime: 8259.91 msec (7 rows) Time: 8263.52 ms DATE=2004-09-05; SUMMARY ROWS=3; DETAIL ROWS=93; TIME=5.61 sec vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-05 00:00:00'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on tbl_win_patch_scan_item (cost=0.00..380531977.65 rows=376033 width=1142) (actual time=10.11..5616.68 rows=93 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..506.58 rows=152 width=4) (actual time=0.00..0.00 rows=3 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=152 width=4) (actual time=0.02..10.05 rows=3 loops=1) Filter: (scan_datetime < '2004-09-05 00:00:00'::timestamp without time zone) Total runtime: 5616.81 msec (7 rows) Time: 5617.87 ms DATE=2004-09-15; SUMMARY ROWS=16; DETAIL ROWS=674; TIME=18.03 sec vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-15 00:00:00'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on tbl_win_patch_scan_item (cost=0.00..381919433.78 rows=376033 width=1142) (actual time=10.18..18032.25 rows=674 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..510.27 rows=521 width=4) (actual time=0.00..0.01 rows=16 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=521 width=4) (actual time=0.02..10.11 rows=16 loops=1) Filter: (scan_datetime < '2004-09-15 00:00:00'::timestamp without time zone) Total runtime: 18032.72 msec (7 rows) Time: 18033.78 ms DATE=2004-09-16; SUMMARY ROWS=25; DETAIL ROWS=1131; TIME=26.22 sec vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-16 00:00:00'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on tbl_win_patch_scan_item (cost=0.00..382058179.39 rows=376033 width=1142) (actual time=6.14..26218.56 rows=1131 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..510.64 rows=558 width=4) (actual time=0.00..0.01 rows=25 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=558 width=4) (actual time=0.01..6.09 rows=25 loops=1) Filter: (scan_datetime < '2004-09-16 00:00:00'::timestamp without time zone) Total runtime: 26219.24 msec (7 rows) Time: 26220.44 ms DATE=2004-09-17; SUMMARY ROWS=34; DETAIL ROWS=1588; TIME=34.97 sec vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-17 00:00:00'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on tbl_win_patch_scan_item (cost=0.00..382196925.01 rows=376033 width=1142) (actual time=10.25..34965.95 rows=1588 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=505.06..511.01 rows=595 width=4) (actual time=0.00..0.02 rows=34 loops=752066) -> Seq Scan on tbl_win_patch_scan (cost=0.00..505.06 rows=595 width=4) (actual time=0.02..10.16 rows=34 loops=1) Filter: (scan_datetime < '2004-09-17 00:00:00'::timestamp without time zone) Total runtime: 34966.90 msec (7 rows) Time: 34967.95 ms What I may end up doing is using the scripting language PHP to solve the issue by running one query just to return the summary table ID's, and then DELETE all the rows matching each ID individually by looping through the ID's. I was looking for something more elegant, but this will work if its the only solution. Thank you all for your help with this. --- Steve ___________________________________________________________________________________ Steven Rosenstein IT Architect/Developer | IBM Virtual Server Administration Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001 Text Messaging: 6463456978 @ mobile.mycingular.com Email: srosenst @ us.ibm.com "Learn from the mistakes of others because you can't live long enough to make them all yourself." -- Eleanor Roosevelt Michael Fuhr <mike@fuhr.org> To 02/06/2005 12:50 Steven Rosenstein/New PM York/IBM@IBMUS cc pgsql-performance@postgresql.org Subject Re: [PERFORM] Are JOINs allowed with DELETE FROM On Sun, Feb 06, 2005 at 12:16:13PM -0500, Steven Rosenstein wrote: > > DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE > collect_date='2005-02-05'; > > But I keep getting a parser error. Am I not allowed to use JOINs in a > DELETE statement, or am I just fat-fingering the SQL text somewhere. See the documentation for DELETE: http://www.postgresql.org/docs/8.0/static/sql-delete.html If you intend to delete the date's record from the summary table, then the detail table could use a foreign key constraint defined with ON DELETE CASCADE. Deleting a record from summary would then automatically delete all associated records in detail. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
pgsql-performance by date: