Thread: DELETE syntax on JOINS

DELETE syntax on JOINS

From
Jean-Michel Pouré
Date:
Dear Friends,

First, thank you very much for considering a fix on the GROUP BY issue.
I am starting a new thread about another issue:

It seems that DELETE cannot understand INNER JOINS and needs HAVING.

Read:
http://drupal.org/node/555562 (main message)
http://drupal.org/node/555648

I don't see why PostgreSQL would not be able to run queries like:

DELETE h
FROM history AS h
INNER JOIN term_node AS tn ON (h.nid = tn.nid)
INNER JOIN term_data AS td ON (td.tid = tn.tid)
WHERE h.uid = 2067 AND td.vid = 2

Ultimately, why not allow:

DELETE h, tn
FROM history AS h
INNER JOIN term_node AS tn ON (h.nid = tn.nid)
INNER JOIN term_data AS td ON (td.tid = tn.tid)
WHERE h.uid = 2067 AND td.vid = 2

IMHO this would improve compliance towards other database systems. To me
this seems to be in the reasonable scope of compatibility.

Kind regards,
Jean-Michel
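
For comparison, the first query can be approximated in PostgreSQL with DELETE ... USING. This is only a sketch: it reuses the table and column names from the query above and assumes nothing else about the Drupal schema. Joins between the non-target tables may use JOIN syntax inside USING, while the link to the target table goes in WHERE.

```sql
-- Delete history rows for user 2067 whose node carries a term
-- from vocabulary 2. Only "history" is modified.
DELETE FROM history AS h
USING term_node AS tn
INNER JOIN term_data AS td ON (td.tid = tn.tid)
WHERE h.nid = tn.nid      -- join condition to the target table
  AND h.uid = 2067
  AND td.vid = 2;
```

The two-table form (DELETE h, tn) has no direct counterpart in a plain DELETE ... USING; it would need a separate DELETE per table.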

Re: DELETE syntax on JOINS

From
Bruce Momjian
Date:
Jean-Michel Pouré wrote:
> Dear Friends,
> 
> First, thank you very much for considering a fix on the GROUP BY issue.
> I am starting a new thread about another issue:
> 
> It seems that DELETE cannot understand INNER JOINS and needs HAVING. 
> 
> Read: 
> http://drupal.org/node/555562 (main message)
> http://drupal.org/node/555648
> 
> I don't see why PostgreSQL would not be able to run queries like:
> 
> DELETE h
> FROM history AS h 
> INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> WHERE h.uid = 2067 AND td.vid = 2
> 
> Ultimately, why not allow:
> 
> DELETE h, tn
> FROM history AS h 
> INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> WHERE h.uid = 2067 AND td.vid = 2
> 
> IMHO this would improve compliance towards other database systems. To me
> this seems to be in the reasonable scope of compatibility.

Which "other database systems"?  Only MySQL?  If it is MySQL-only, we
are unlikely to add it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: DELETE syntax on JOINS

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> > Ultimately, why not allow:
> > 
> > DELETE h, tn
> > FROM history AS h 
> > INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> > INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> > WHERE h.uid = 2067 AND td.vid = 2
> > 
> > IMHO this would improve compliance towards other database systems. To me
> > this seems to be in the reasonable scope of compatibility.
> 
> Which "other database systems"?  Only MySQL?  If it is MySQL-only, we
> are unlikely to add it.

The SQL standard does not support this syntax.  They would have you put
the joins in a subselect (which is often not enough because then you
can't use outer joins).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
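
A sketch of the subselect form described above, assuming the same tables as the original query. The joins move into the subquery and the target table is matched with IN:

```sql
-- Standard-style DELETE: the join lives entirely inside the subquery.
DELETE FROM history
WHERE uid = 2067
  AND nid IN (
      SELECT tn.nid
      FROM term_node AS tn
      INNER JOIN term_data AS td ON (td.tid = tn.tid)
      WHERE td.vid = 2
  );
```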


Re: DELETE syntax on JOINS

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > > Ultimately, why not allow:
> > > 
> > > DELETE h, tn
> > > FROM history AS h 
> > > INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> > > INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> > > WHERE h.uid = 2067 AND td.vid = 2
> > > 
> > > IMHO this would improve compliance towards other database systems. To me
> > > this seems to be in the reasonable scope of compatibility.
> > 
> > Which "other database systems"?  Only MySQL?  If it is MySQL-only, we
> > are unlikely to add it.
> 
> The SQL standard does not support this syntax.  They would have you put
> the joins in a subselect (which is often not enough because then you
> can't use outer joins).

So the problem is that our DELETE ... USING does not allow ANSI join
syntax?  Can that be added?




Re: DELETE syntax on JOINS

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > > > Ultimately, why not allow:
> > > > 
> > > > DELETE h, tn
> > > > FROM history AS h 
> > > > INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> > > > INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> > > > WHERE h.uid = 2067 AND td.vid = 2
> > > > 
> > > > IMHO this would improve compliance towards other database systems. To me
> > > > this seems to be in the reasonable scope of compatibility.
> > > 
> > > Which "other database systems"?  Only MySQL?  If it is MySQL-only, we
> > > are unlikely to add it.
> > 
> > The SQL standard does not support this syntax.  They would have you put
> > the joins in a subselect (which is often not enough because then you
> > can't use outer joins).
> 
> So the problem is that our DELETE ... USING does not allow ANSI join
> syntax?  Can that be added?

Not sure about that.  USING is already an extension to the standard, so
if we extend it a bit more, it can't be a problem, can it?  But this
doesn't solve Jean Michel's problem, because MySQL does not support
DELETE USING (or does it?).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: DELETE syntax on JOINS

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > 
> > > > > Ultimately, why not allow:
> > > > > 
> > > > > DELETE h, tn
> > > > > FROM history AS h 
> > > > > INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> > > > > INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> > > > > WHERE h.uid = 2067 AND td.vid = 2
> > > > > 
> > > > > IMHO this would improve compliance towards other database systems. To me
> > > > > this seems to be in the reasonable scope of compatibility.
> > > > 
> > > > Which "other database systems"?  Only MySQL?  If it is MySQL-only, we
> > > > are unlikely to add it.
> > > 
> > > The SQL standard does not support this syntax.  They would have you put
> > > the joins in a subselect (which is often not enough because then you
> > > can't use outer joins).
> > 
> > So the problem is that our DELETE ... USING does not allow ANSI join
> > syntax?  Can that be added?
> 
> Not sure about that.  USING is already an extension to the standard, so
> if we extend it a bit more, it can't be a problem, can it?  But this
> doesn't solve Jean Michel's problem, because MySQL does not support
> DELETE USING (or does it?).

Right, but if we support ANSI joins in the USING clause, at least we
would have a _functional_ equivalent, which we don't have now because of
missing outer join support.



Re: DELETE syntax on JOINS

From
Bill Moran
Date:
In response to Bruce Momjian <bruce@momjian.us>:

> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > > > Ultimately, why not allow:
> > > > 
> > > > DELETE h, tn
> > > > FROM history AS h 
> > > > INNER JOIN term_node AS tn ON (h.nid = tn.nid) 
> > > > INNER JOIN term_data AS td ON (td.tid = tn.tid) 
> > > > WHERE h.uid = 2067 AND td.vid = 2
> > > > 
> > > > IMHO this would improve compliance towards other database systems. To me
> > > > this seems to be in the reasonable scope of compatibility.
> > > 
> > > Which "other database systems"?  Only MySQL?  If it is MySQL-only, we
> > > are unlikely to add it.
> > 
> > The SQL standard does not support this syntax.  They would have you put
> > the joins in a subselect (which is often not enough because then you
> > can't use outer joins).
> 
> So the problem is that our DELETE ... USING does not allow ANSI join
> syntax?  Can that be added?

I suspect that the reason MySQL has this syntax is because for a long time
they didn't have proper foreign keys and referential integrity.

With proper foreign keys and ON DELETE CASCADE, why would supporting
such syntax even be necessary?

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
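
A minimal illustration of the foreign-key approach. The node table and the constraint are assumptions made for the example; they are not taken from the Drupal schema discussed in this thread.

```sql
-- Hypothetical parent table.
CREATE TABLE node (
    nid integer PRIMARY KEY
);

-- Child rows are removed automatically when their parent row is deleted.
CREATE TABLE term_node (
    nid integer NOT NULL REFERENCES node (nid) ON DELETE CASCADE,
    tid integer NOT NULL
);

-- Deleting the parent also deletes the matching term_node rows.
DELETE FROM node WHERE nid = 42;
```

This covers parent/child cleanup only; it does not express a delete driven by an arbitrary join condition such as td.vid = 2.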


Re: DELETE syntax on JOINS

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Bruce Momjian wrote:
>> So the problem is that our DELETE ... USING does not allow ANSI join
>> syntax?  Can that be added?

> Not sure about that.  USING is already an extension to the standard, so
> if we extend it a bit more, it can't be a problem, can it?

I don't see any very good way to extend the USING syntax to allow the
target table to be outer-joined to something else.  Some other systems
allow it by letting you re-specify the target in the other clause,
equivalently to

DELETE FROM target t USING t LEFT JOIN other_table ot ON ...

but we have always considered that the target is *not* to be identified
with any member of the FROM/USING clause, so it would be a serious
compatibility break to change that now.
        regards, tom lane


Re: DELETE syntax on JOINS

From
Robert Haas
Date:
2009/8/24 Tom Lane <tgl@sss.pgh.pa.us>:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Bruce Momjian wrote:
>>> So the problem is that our DELETE ... USING does not allow ANSI join
>>> syntax?  Can that be added?
>
>> Not sure about that.  USING is already an extension to the standard, so
>> if we extend it a bit more, it can't be a problem, can it?
>
> I don't see any very good way to extend the USING syntax to allow the
> target table to be outer-joined to something else.  Some other systems
> allow it by letting you re-specify the target in the other clause,
> equivalently to
>
> DELETE FROM target t USING t LEFT JOIN other_table ot ON ...
>
> but we have always considered that the target is *not* to be identified
> with any member of the FROM/USING clause, so it would be a serious
> compatibility break to change that now.

I'm all in favor of compatibility, but if there is any way to make
this work without massive collateral damage, I am also all in favor of
that.  I am forever writing queries that contain a needless self-join
to work around the impossibility of directly outer-joining against the
target.

...Robert


Re: DELETE syntax on JOINS

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> 2009/8/24 Tom Lane <tgl@sss.pgh.pa.us>:
>> ... Some other systems
>> allow it by letting you re-specify the target in the other clause,
>> equivalently to
>> 
>> DELETE FROM target t USING t LEFT JOIN other_table ot ON ...
>> 
>> but we have always considered that the target is *not* to be identified
>> with any member of the FROM/USING clause, so it would be a serious
>> compatibility break to change that now.

> I'm all in favor of compatibility, but if there is any way to make
> this work without massive collateral damage, I am also all in favor of
> that.  I am forever writing queries that contain a needless self-join
> to work around the impossibility of directly outer-joining against the
> target.

It'd be pretty easy to do if we were willing to introduce a new reserved
word; for example

DELETE FROM target t USING SELF LEFT JOIN other_table ot ON ...

(or maybe TARGET instead of SELF, or some other word).  Wouldn't do
anything for exact compatibility with MySQL or anybody else using the
respecify-the-target-table-name approach.  But it would be unambiguous
and backwards-compatible.  The real problem with this is that all the
good candidates for the reserved word are things people are probably
already using as aliases, so we'd have a large risk of breaking existing
queries.  We could avoid that with a sufficiently ugly choice like

DELETE FROM target t USING DELETE_TARGET LEFT JOIN other_table ot ON ...

but yech ...
        regards, tom lane


Re: DELETE syntax on JOINS

From
Sam Mason
Date:
On Mon, Aug 24, 2009 at 01:41:28PM -0400, Tom Lane wrote:
> The real problem with this is that all the
> good candidates for the reserved word are things people are probably
> already using as aliases, so we'd have a large risk of breaking existing
> queries.  We could avoid that with a sufficiently ugly choice like
> 
> DELETE FROM target t USING DELETE_TARGET LEFT JOIN other_table ot ON ...
> 
> but yech ...

PRIMARY or TABLE?

Both are pretty grim, but I think they're reserved at the moment.

--  Sam  http://samason.me.uk/


Re: DELETE syntax on JOINS

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Bruce Momjian wrote:
> >> So the problem is that our DELETE ... USING does not allow ANSI join
> >> syntax?  Can that be added?
> 
> > Not sure about that.  USING is already an extension to the standard, so
> > if we extend it a bit more, it can't be a problem, can it?
> 
> I don't see any very good way to extend the USING syntax to allow the
> target table to be outer-joined to something else.  Some other systems
> allow it by letting you re-specify the target in the other clause,
> equivalently to
> 
> DELETE FROM target t USING t LEFT JOIN other_table ot ON ...
> 
> but we have always considered that the target is *not* to be identified
> with any member of the FROM/USING clause, so it would be a serious
> compatibility break to change that now.

Let's look at this a little closer.  We can use an alias in the DELETE
FROM clause:

    test=> DELETE FROM test t;
    test=> DELETE FROM test t USING test;

What we cannot currently do is reference test twice:

    test=> DELETE FROM test USING test;
    ERROR:  table name "test" specified more than once
    test=> DELETE FROM test t USING test t;
    ERROR:  table name "t" specified more than once

As far as I understand it, allowing ANSI joins in USING would simply
mean removing that error message and linking the two table aliases.



Re: DELETE syntax on JOINS

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> What we cannot currently do is reference test twice:

>     test=> DELETE FROM test USING test;
>     ERROR:  table name "test" specified more than once
>     test=> DELETE FROM test t USING test t;
>     ERROR:  table name "t" specified more than once

Hmm, I had forgotten that we throw errors in these cases now.
Maybe that *would* give us an escape-hatch for the other interpretation.

> As far as I understand it, allowing ANSI joins in USING would simply
> mean removing that error message and linking the two table aliases.

Well, you'd still need to complain about

DELETE FROM test USING test JOIN test ON ...

Also, it's not nearly as easy as just removing the error check.
There's stuff in the planner (and perhaps executor) that's dependent on
the assumption that the target table isn't on the inside of an outer
join, for example.  Still, getting agreement on a syntax would in itself
be a huge step forward.
        regards, tom lane


Re: DELETE syntax on JOINS

From
Josh Berkus
Date:
All,

>> DELETE FROM target t USING t LEFT JOIN other_table ot ON ...
>>
>> but we have always considered that the target is *not* to be identified
>> with any member of the FROM/USING clause, so it would be a serious
>> compatibility break to change that now.

What I don't get is why this is such a usability issue.  Subqueries in
DELETE FROM work perfectly well, and provide more flexibility than most
users know what to do with.

Personally, I'd be happy just to stop with the SQL extension we have.  I
think extending USING any further is going to cause more problems than
it solves.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: DELETE syntax on JOINS

From
Jean-Michel Pouré
Date:
> With proper foreign keys and ON DELETE CASCADE, why would supporting
> such syntax even be necessary?

Porting existing abstraction layers from ANSI JOINs to ON DELETE CASCADE
is complicated.

> What I don't get is why this is such a usability issue.  Subqueries in
> DELETE FROM work perfectly well, and provide more flexibility than most
> users know what to do with.

The ANSI syntax allows deleting from one or several tables at once.
Subqueries are not supported by MySQL on DELETE.

Again, this is a usability issue to gain market share and happy users
against MySQL.

Kind regards,
Jean-Michel

Re: DELETE syntax on JOINS

From
Bruce Momjian
Date:
Josh Berkus wrote:
> All,
> 
> >> DELETE FROM target t USING t LEFT JOIN other_table ot ON ...
> >>
> >> but we have always considered that the target is *not* to be identified
> >> with any member of the FROM/USING clause, so it would be a serious
> >> compatibility break to change that now.
> 
> What I don't get is why this is such a usability issue.  Subqueries in
> DELETE FROM work perfectly well, and provide more flexibility than most
> users know what to do with.

OK, so you are saying that every OUTER join can be efficiently
represented as a subquery?  If that is true we don't need to add ANSI
join support to USING.



Re: DELETE syntax on JOINS

From
Robert Haas
Date:
On Mon, Aug 24, 2009 at 9:31 PM, Josh Berkus<josh@agliodbs.com> wrote:
> All,
>
>>> DELETE FROM target t USING t LEFT JOIN other_table ot ON ...
>>>
>>> but we have always considered that the target is *not* to be identified
>>> with any member of the FROM/USING clause, so it would be a serious
>>> compatibility break to change that now.
>
> What I don't get is why this is such a usability issue.  Subqueries in
> DELETE FROM work perfectly well, and provide more flexibility than most
> users know what to do with.
>
> Personally, I'd be happy just to stop with the SQL extension we have.  I
> think extending USING any further is going to cause more problems than
> it solves.

It's both a usability issue and a performance issue.  Suppose you want
to select all the rows in foo whose id field does not appear in
bar.foo_id.  The most efficient way to do this in PostgreSQL is
typically:

SELECT foo.* FROM foo LEFT JOIN bar ON foo.id = bar.foo_id WHERE
bar.foo_id IS NULL;

Now, if you want to delete those rows, you can't do it without an
extra join somewhere.  You can do it like this:

DELETE FROM foo AS foo1
   USING foo AS foo2 LEFT JOIN bar ON foo2.id = bar.foo_id
   WHERE foo1.id = foo2.id AND bar.foo_id IS NULL;

Or like this:

DELETE FROM foo WHERE id IN (SELECT foo.id FROM foo LEFT JOIN bar ON
foo.id = bar.foo_id WHERE bar.foo_id IS NULL);

...but either way you now have foo in there twice when it really
shouldn't need to be, and you're doing a useless self-join to work
around a syntax limitation.

[ thinks ]

Actually, I guess in this case you can get around it like this:

DELETE FROM foo WHERE NOT EXISTS (SELECT 1 FROM bar WHERE bar.foo_id = foo.id);

...but I'm not sure it can be rewritten that way in every case - in
particular, that won't work if you have a RETURNING clause that
includes a value taken from bar.

...Robert
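
The RETURNING case might look like the sketch below. The column bar.note is hypothetical, as is the filter on foo1.id; the point is only that a value from bar can be returned when bar is part of the USING clause, which the NOT EXISTS form cannot do. The self-join on foo is still needed to get the outer join.

```sql
DELETE FROM foo AS foo1
   USING foo AS foo2 LEFT JOIN bar ON foo2.id = bar.foo_id
   WHERE foo1.id = foo2.id
     AND foo1.id < 100            -- hypothetical filter
   RETURNING foo1.id, bar.note;   -- bar.note is NULL where no bar row matched
```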


Re: DELETE syntax on JOINS

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 24, 2009 at 9:31 PM, Josh Berkus<josh@agliodbs.com> wrote:
>> What I don't get is why this is such a usability issue.  Subqueries in
>> DELETE FROM work perfectly well, and provide more flexibility than most
>> users know what to do with.

> It's both a usability issue and a performance issue.

On the usability front: if we were to take the position Josh advocates,
we should never have added FROM/USING to UPDATE/DELETE at all ... but
since we did, I think we should try to make it as flexible as the
corresponding feature in other DBMSes.

On the performance front: yeah, you can recast most joins as subqueries,
but you tend to end up with the equivalent of a nestloop plan.  Works
okay for small numbers of rows, scales horribly.
        regards, tom lane
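
One way to check this on a given installation is to compare the plans directly. The statements below reuse Robert's foo/bar example; EXPLAIN without ANALYZE only plans the statement and does not delete anything. The plans actually chosen depend on the server version, table sizes, and statistics, so no output is shown here.

```sql
-- Subquery form.
EXPLAIN
DELETE FROM foo
WHERE id IN (SELECT foo.id
             FROM foo LEFT JOIN bar ON foo.id = bar.foo_id
             WHERE bar.foo_id IS NULL);

-- Join form with the extra self-join.
EXPLAIN
DELETE FROM foo AS foo1
   USING foo AS foo2 LEFT JOIN bar ON foo2.id = bar.foo_id
   WHERE foo1.id = foo2.id AND bar.foo_id IS NULL;
```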


Re: DELETE syntax on JOINS

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>   
>> On Mon, Aug 24, 2009 at 9:31 PM, Josh Berkus<josh@agliodbs.com> wrote:
>>     
>>> What I don't get is why this is such a usability issue.  Subqueries in
>>> DELETE FROM work perfectly well, and provide more flexibility than most
>>> users know what to do with.
>>>       
>
>   
>> It's both a usability issue and a performance issue.
>>     
>
> On the usability front: if we were to take the position Josh advocates,
> we should never have added FROM/USING to UPDATE/DELETE at all 


FWIW, I use update ... from  a lot - it make many update queries easier 
and less error prone to write -  but I rarely find I need delete ... using.


> ... but
> since we did, I think we should try to make it as flexible as the
> corresponding feature in other DBMSes.
>
>
>   

+1

cheers

andrew


Re: DELETE syntax on JOINS

From
Greg Stark
Date:
On Tue, Aug 25, 2009 at 2:50 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> On the performance front: yeah, you can recast most joins as subqueries,
> but you tend to end up with the equivalent of a nestloop plan.  Works
> okay for small numbers of rows, scales horribly.

Well that's our problem isn't it? I thought we were capable of
generating semijoins for subqueries these days though?

The problem I thought was if you wanted to pull multiple values out of
the subquery.

So something like

UPDATE foo set a=aa, b=bb FROM bar WHERE ...

If you wanted to do an outer join from foo to bar then how would you
write it as a subquery even if our optimizer could notice the
semijoin and optimize it properly?

You would have to write something like

UPDATE foo set a = (select aa from bar where...),
               b = (select bb from bar where...)

and then the optimizer would have to notice the duplicates and
consolidate them? That seems inconvenient (and fragile).



--
greg
http://mit.edu/~gsstark/resume.pdf
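
To make the comparison concrete, here is a sketch of both join shapes with UPDATE ... FROM. The join key bar.foo_id = foo.id is borrowed from Robert's example and is an assumption here, since the WHERE clause above is elided.

```sql
-- Inner join: only foo rows with a matching bar row are updated.
UPDATE foo SET a = bar.aa, b = bar.bb
FROM bar
WHERE bar.foo_id = foo.id;

-- Outer join via a self-join: foo rows without a matching bar row
-- are updated too, with a and b set to NULL.
UPDATE foo SET a = bar.aa, b = bar.bb
FROM foo AS foo2 LEFT JOIN bar ON bar.foo_id = foo2.id
WHERE foo.id = foo2.id;
```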


Re: DELETE syntax on JOINS

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> You would have to write something like

> UPDATE foo set a = (select aa from bar where...)
>                          b = (select bb from bar where...)

> and then the optimizer would have to notice the duplicates and
> consolidate them? That seems inconvenient (and fragile).

Well, that's why the spec nowadays allows you to write

UPDATE foo SET (a,b) = (select aa,bb from bar where ...)

But we haven't got that, and if we did it would generate a nestloop
plan.  Getting to the point of absolute performance equivalence between
subqueries and joins would take a *lot* of work; I'm not even sure it's
possible at all.  And once we'd done all that work there would still
remain the fact that people are accustomed to using join syntax instead.
There's a lot of existing code out there that would be a lot easier
to port to PG if we supported that style (which was exactly the point
made by the OP).
        regards, tom lane
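
For reference, the spec-style form with an assumed join key would read as below. To my knowledge PostgreSQL accepts this form from release 9.5 onward, so it would not have run on the versions discussed in this thread. The subquery must return at most one row per foo row; if it returns none, a and b are set to NULL, which is what makes it behave like an outer join.

```sql
UPDATE foo
SET (a, b) = (SELECT bar.aa, bar.bb
              FROM bar
              WHERE bar.foo_id = foo.id);   -- join key is an assumption
```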