Thread: Are JOINs allowed with DELETE FROM

Are JOINs allowed with DELETE FROM

From
Steven Rosenstein
Date:



This is probably a very trivial question and I feel foolish in even posting
it, but I cannot seem to get it to work.

SCENARIO (abstracted):

Two tables, "summary" and "detail".  The schema of summary looks like:

id          int   serial sequential record id
collect_date      date  date the detail events were collected

The schema of detail looks like:

id          int   serial sequential record id
sum_id            int   the id of the parent record in the summary table
details           text  a particular event's details

The relationship is obvious.  If I want to extract all the detail records
for a particular date (2/5/05), I construct a query as follows:

SELECT * FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

Now... I want to *delete* all the detail records for a particular date, I
tried:

DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

But I keep getting a parser error.  Am I not allowed to use JOINs in a
DELETE statement, or am I just fat-fingering the SQL text somewhere.  If
I'm *not* allowed to use a JOIN with a DELETE, what is the best workaround?
I want to delete just the records in the detail table, and not its parent
summary record.

Thanks in advance for your help,
--- Steve
___________________________________________________________________________________

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt


Re: Are JOINs allowed with DELETE FROM

From
Gaetano Mendola
Date:
Steven Rosenstein wrote:
> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
> collect_date='2005-02-05';

DELETE FROM detail WHERE detail.sum_id in ( select id from summary  )
AND collect_date='2005-02-05';


Regards
Gaetano Mendola








Re: Are JOINs allowed with DELETE FROM

From
Michael Fuhr
Date:
On Sun, Feb 06, 2005 at 12:16:13PM -0500, Steven Rosenstein wrote:
>
> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
> collect_date='2005-02-05';
>
> But I keep getting a parser error.  Am I not allowed to use JOINs in a
> DELETE statement, or am I just fat-fingering the SQL text somewhere.

See the documentation for DELETE:

http://www.postgresql.org/docs/8.0/static/sql-delete.html

If you intend to delete the date's record from the summary table,
then the detail table could use a foreign key constraint defined
with ON DELETE CASCADE.  Deleting a record from summary would then
automatically delete all associated records in detail.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Are JOINs allowed with DELETE FROM

From
John Arbash Meinel
Date:
Gaetano Mendola wrote:

> Steven Rosenstein wrote:
>
>> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
>> collect_date='2005-02-05';
>
>
You have to tell it what table you are deleting from. Select * from A
join B is both tables. What you want to do is fix the where clause.

> DELETE FROM detail WHERE detail.sum_id in ( select id from summary  )
> AND collect_date='2005-02-05';
>
I'm guessing this should actually be
DELETE FROM detail WHERE detail.sum_id in ( SELECT id FROM summary WHERE
collect_date='2005-02-05' );
Otherwise you wouldn't really need the join.

You have to come up with a plan that yields rows that are in the table
you want to delete. The rows that result from
select * from detail join summary, contain values from both tables.

If you want to delete from both tables, I think this has to be 2
deletes. Probably best to be in a transaction.

BEGIN;
DELETE FROM detail WHERE ...
DELETE FROM summary WHERE collect_date = '2005-02-05';
COMMIT;

>
> Regards
> Gaetano Mendola
>
John
=:->


Attachment

Re: Are JOINs allowed with DELETE FROM

From
Steven Rosenstein
Date:



Hi Michael,

Thank you for the link to the documentation page.  I forgot to mention that
we're still using version 7.3.  When I checked the 7.3 documentation for
DELETE, there was no mention of being able to use fields from different
tables in a WHERE clause.  This feature must have been added in a
subsequent release of PostgreSQL.

Gaetano & John:  I *did* try your suggestion.  However, there were so many
summary ID's returned (9810 to be exact) that the DELETE seemed to be
taking forever.  Here's an actual SELECT query that I ran as a test:

vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan;   [This is the
"summary" table from my abstracted example]
INFO:  --Relation vsa.tbl_win_patch_scan--
INFO:  Pages 374: Changed 0, Empty 0; Tup 10485: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_39384--
INFO:  Pages 62679: Changed 0, Empty 0; Tup 254116: Vac 0, Keep 0, UnUsed
0.
        Total CPU 0.86s/0.21u sec elapsed 13.79 sec.
INFO:  Analyzing vsa.tbl_win_patch_scan
VACUUM
Time: 18451.32 ms

vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan_item;   [This is the
"detail" table from my abstracted example]
INFO:  --Relation vsa.tbl_win_patch_scan_item--
INFO:  Pages 110455: Changed 0, Empty 0; Tup 752066: Vac 0, Keep 0, UnUsed
0.
        Total CPU 2.23s/0.45u sec elapsed 42.07 sec.
INFO:  --Relation pg_toast.pg_toast_39393--
INFO:  Pages 2464: Changed 0, Empty 0; Tup 14780: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.02s/0.02u sec elapsed 2.31 sec.
INFO:  Analyzing vsa.tbl_win_patch_scan_item
VACUUM
Time: 62075.52 ms

vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-01 00:00:00');
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..379976970.68 rows=376033
width=1150) (actual time=11.50..27373.29 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.11 rows=4 width=4) (actual
time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=4
width=4) (actual time=0.03..11.16 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp
without time zone)
 Total runtime: 27373.65 msec
(7 rows)

Time: 27384.12 ms

I put in a very early date (2004-09-01) because I knew there would be very
few rows to look at (2 rows in vsa.tbl_win_patch_scan meet the date
criteria, and a total of 62 rows in vsa.tbl_win_patch_scan_item match
either of the two tbl_win_patch_scan ID's returned in the WHERE subquery).
Can anyone see a way of optimizing this so that it runs faster?  The real
date I should be using is 2004-12-06 (~60 days retention), and when I do
use it, the query seems to take forever.  I ran a number explan analyzes
with different scan_datetimes, and it seems that the execution time
increases exponentially with the number of rows (ID's) returned by the
subquery.  Running top shows that the EXPLAIN is entirely CPU-bound.  There
is no disk I/O during any query execution:

DATE=2004-09-01; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=27.37 sec (Included
initial query cache loading effect)
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-01 00:00:00');
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..379976970.68 rows=376033
width=1150) (actual time=11.50..27373.29 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.11 rows=4 width=4) (actual
time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=4
width=4) (actual time=0.03..11.16 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp
without time zone)
 Total runtime: 27373.65 msec
(7 rows)

Time: 27384.12 ms

DATE=2004-09-02; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=8.26 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-02 00:00:00');
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..380115740.80 rows=376033
width=1142) (actual time=10.42..8259.79 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.48 rows=41 width=4) (actual
time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=41
width=4) (actual time=0.02..10.08 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-02 00:00:00'::timestamp
without time zone)
 Total runtime: 8259.91 msec
(7 rows)

Time: 8263.52 ms

DATE=2004-09-05; SUMMARY ROWS=3; DETAIL ROWS=93; TIME=5.61 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-05 00:00:00');
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..380531977.65 rows=376033
width=1142) (actual time=10.11..5616.68 rows=93 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..506.58 rows=152 width=4) (actual
time=0.00..0.00 rows=3 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=152
width=4) (actual time=0.02..10.05 rows=3 loops=1)
                 Filter: (scan_datetime < '2004-09-05 00:00:00'::timestamp
without time zone)
 Total runtime: 5616.81 msec
(7 rows)

Time: 5617.87 ms

DATE=2004-09-15; SUMMARY ROWS=16; DETAIL ROWS=674; TIME=18.03 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-15 00:00:00');
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..381919433.78 rows=376033
width=1142) (actual time=10.18..18032.25 rows=674 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..510.27 rows=521 width=4) (actual
time=0.00..0.01 rows=16 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=521
width=4) (actual time=0.02..10.11 rows=16 loops=1)
                 Filter: (scan_datetime < '2004-09-15 00:00:00'::timestamp
without time zone)
 Total runtime: 18032.72 msec
(7 rows)

Time: 18033.78 ms

DATE=2004-09-16; SUMMARY ROWS=25; DETAIL ROWS=1131; TIME=26.22 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-16 00:00:00');
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382058179.39 rows=376033
width=1142) (actual time=6.14..26218.56 rows=1131 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..510.64 rows=558 width=4) (actual
time=0.00..0.01 rows=25 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=558
width=4) (actual time=0.01..6.09 rows=25 loops=1)
                 Filter: (scan_datetime < '2004-09-16 00:00:00'::timestamp
without time zone)
 Total runtime: 26219.24 msec
(7 rows)

Time: 26220.44 ms

DATE=2004-09-17; SUMMARY ROWS=34; DETAIL ROWS=1588; TIME=34.97 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-17 00:00:00');
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382196925.01 rows=376033
width=1142) (actual time=10.25..34965.95 rows=1588 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.01 rows=595 width=4) (actual
time=0.00..0.02 rows=34 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=595
width=4) (actual time=0.02..10.16 rows=34 loops=1)
                 Filter: (scan_datetime < '2004-09-17 00:00:00'::timestamp
without time zone)
 Total runtime: 34966.90 msec
(7 rows)

Time: 34967.95 ms


What I may end up doing is using the scripting language PHP to solve the
issue by running one query just to return the summary table ID's, and then
DELETE all the rows matching each ID individually by looping through the
ID's.  I was looking for something more elegant, but this will work if its
the only solution.

Thank you all for your help with this.
--- Steve

___________________________________________________________________________________

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt



             Michael Fuhr
             <mike@fuhr.org>
                                                                        To
             02/06/2005 12:50          Steven Rosenstein/New
             PM                        York/IBM@IBMUS
                                                                        cc
                                       pgsql-performance@postgresql.org
                                                                   Subject
                                       Re: [PERFORM] Are JOINs allowed
                                       with DELETE FROM










On Sun, Feb 06, 2005 at 12:16:13PM -0500, Steven Rosenstein wrote:
>
> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
> collect_date='2005-02-05';
>
> But I keep getting a parser error.  Am I not allowed to use JOINs in a
> DELETE statement, or am I just fat-fingering the SQL text somewhere.

See the documentation for DELETE:

http://www.postgresql.org/docs/8.0/static/sql-delete.html

If you intend to delete the date's record from the summary table,
then the detail table could use a foreign key constraint defined
with ON DELETE CASCADE.  Deleting a record from summary would then
automatically delete all associated records in detail.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: Are JOINs allowed with DELETE FROM

From
Tom Lane
Date:
Steven Rosenstein <srosenst@us.ibm.com> writes:
> Thank you for the link to the documentation page.  I forgot to mention that
> we're still using version 7.3.  When I checked the 7.3 documentation for
> DELETE, there was no mention of being able to use fields from different
> tables in a WHERE clause.  This feature must have been added in a
> subsequent release of PostgreSQL.

No, it's been there all along, if perhaps not well documented.

            regards, tom lane

Re: Are JOINs allowed with DELETE FROM

From
Steven Rosenstein
Date:



Many thanks to Gaetano Mendola and Tom Lane for the hints about using
fields from other tables in a DELETE's WHERE clause.  That was the magic
bullet I needed, and my application is working as expected.

--- Steve
___________________________________________________________________________________

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt



             Tom Lane
             <tgl@sss.pgh.pa.u
             s>                                                         To
             Sent by:                  Steven Rosenstein/New
             pgsql-performance         York/IBM@IBMUS
             -owner@postgresql                                          cc
             .org                      pgsql-performance@postgresql.org
                                                                   Subject
                                       Re: [PERFORM] Are JOINs allowed
             02/06/2005 02:49          with DELETE FROM
             PM









Steven Rosenstein <srosenst@us.ibm.com> writes:
> Thank you for the link to the documentation page.  I forgot to mention
that
> we're still using version 7.3.  When I checked the 7.3 documentation for
> DELETE, there was no mention of being able to use fields from different
> tables in a WHERE clause.  This feature must have been added in a
> subsequent release of PostgreSQL.

No, it's been there all along, if perhaps not well documented.

                                     regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster



Re: Are JOINs allowed with DELETE FROM

From
Gaetano Mendola
Date:
Steven Rosenstein wrote:
 >
 >
 >
 > Hi Michael,
 >
 > Thank you for the link to the documentation page.  I forgot to mention that
 > we're still using version 7.3.  When I checked the 7.3 documentation for
 > DELETE, there was no mention of being able to use fields from different
 > tables in a WHERE clause.  This feature must have been added in a
 > subsequent release of PostgreSQL.
 >
 > Gaetano & John:  I *did* try your suggestion.  However, there were so many
 > summary ID's returned (9810 to be exact) that the DELETE seemed to be
 > taking forever.

7.3 is affected by bad performances if you use IN.
Transform the IN in an  EXIST construct.

If it'is an option for you upgrade you DB engine.



Regards
Gaetano Mendola