Thread: DELETE with JOIN

DELETE with JOIN

From
felix@crowfix.com
Date:
I want to delete with a join condition.  Google shows this is a common
problem, but the only solutions are either for MySQL or they don't
work in my situation because there are too many rows selected.  I also
have to make this work on several databases, includeing, grrr, Oracle,
so non-standard MySQL "solutions" are doubly aggravating.
   DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ?

I have tried to do this before and always found a way, usually
   DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?)

but I have too many rows, millions, in the IN crowd, ha ha, and it
barfs.  EXISTS is no better.  At least Oracle barfs, and I haven't got
to the others yet.  I figured I would go with the worst offender
first, and let me tell you, it is offensive.  Dang I wish it were
postgresql only!

I could write a Dumb Little Test Program (tm) to read in all those IN
ids and execute a zillion individual DELETE statements, but it would
be slow as puke and this little delete is going to come up quite often
now that I have a test program which needs to generate the junky data
and play with it for several days before deleting it and starting over
again.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
Ragnar
Date:
On fim, 2008-08-07 at 09:14 -0700, felix@crowfix.com wrote:
> I want to delete with a join condition.  Google shows this is a common
> problem, but the only solutions are either for MySQL or they don't
> work in my situation because there are too many rows selected.  I also
> have to make this work on several databases, includeing, grrr, Oracle,
> so non-standard MySQL "solutions" are doubly aggravating.
> 
>     DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ?
> 
> I have tried to do this before and always found a way, usually
> 
>     DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?)

did you look at DELETE FROM table1 USING table2 WHERE ... ?

gnari




Re: DELETE with JOIN

From
Steve Midgley
Date:
At 10:05 AM 8/7/2008, pgsql-sql-owner@postgresql.org wrote:
>Date: Thu, 7 Aug 2008 09:14:49 -0700
>From: felix@crowfix.com
>To: pgsql-sql@postgresql.org
>Subject: DELETE with JOIN
>Message-ID: <20080807161449.GA19337@crowfix.com>
>
>I want to delete with a join condition.  Google shows this is a common
>problem, but the only solutions are either for MySQL or they don't
>work in my situation because there are too many rows selected.  I also
>have to make this work on several databases, includeing, grrr, Oracle,
>so non-standard MySQL "solutions" are doubly aggravating.
>
>     DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ?
>
>I have tried to do this before and always found a way, usually
>
>     DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = 
> ?)
>
>but I have too many rows, millions, in the IN crowd, ha ha, and it
>barfs.  EXISTS is no better.  At least Oracle barfs, and I haven't got
>to the others yet.  I figured I would go with the worst offender
>first, and let me tell you, it is offensive.  Dang I wish it were
>postgresql only!
>
>I could write a Dumb Little Test Program (tm) to read in all those IN
>ids and execute a zillion individual DELETE statements, but it would
>be slow as puke and this little delete is going to come up quite often
>now that I have a test program which needs to generate the junky data
>and play with it for several days before deleting it and starting over
>again.

Hi,

Have you tried something where you read in all those "IN id's" and then 
group them into blocks (of say 1,000 or 10,000 or whatever number works 
best)? Then execute:

DELETE FROM a WHERE a.b_id in ([static_list_of_ids])

Replacing in a loop "[static_list_of_ids]" with each block of 1000 id's 
in a comma delimited string? I use this technique sometimes in 
middleware and it works pretty well. There's probably a pure-sql 
solution in Pg as well but this method should work across any SQL 
platform, which seems like one of your requirements.

Steve



Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 05:05:38PM +0000, Ragnar wrote:

> did you look at DELETE FROM table1 USING table2 WHERE ... ?

No, I hadn't known about that.  It looks handy to know about, at
least, but I don't see it for Oracle.  I am going to play with that,
but I don't think it will help here.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 10:40:22AM -0700, Steve Midgley wrote:

> Have you tried something where you read in all those "IN id's" and then 
> group them into blocks (of say 1,000 or 10,000 or whatever number works 
> best)? Then execute:
>
> DELETE FROM a WHERE a.b_id in ([static_list_of_ids])

It may come to something like that, but I figure handing over hubdreds
of static IDs is probably worse for the planner than an expression,
and it's ugly as sin :-)

I tried using "%" for a mod function, but that seems to not be a
universally recognized operator.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
Frank Bax
Date:
felix@crowfix.com wrote:
> On Thu, Aug 07, 2008 at 10:40:22AM -0700, Steve Midgley wrote:
> 
>> Have you tried something where you read in all those "IN id's" and then 
>> group them into blocks (of say 1,000 or 10,000 or whatever number works 
>> best)? Then execute:
>>
>> DELETE FROM a WHERE a.b_id in ([static_list_of_ids])
> 
> It may come to something like that, but I figure handing over hubdreds
> of static IDs is probably worse for the planner than an expression,
> and it's ugly as sin :-)
> 
> I tried using "%" for a mod function, but that seems to not be a
> universally recognized operator.



Could you not achieve the same result with a LIMIT on subSELECT and 
reissue the command until there is nothing to delete?

Is b_id already unique; or should you be using DISTINCT or GROUP BY on 
the subSELECT?

DELETE FROM a WHERE a.b_id IN (SELECT DISTINCT id FROM b WHERE second_id 
= ? LIMIT 1000)

DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ? 
GROUP BY id LIMIT 1000)


If you're really desperate; is it possible to alter table 'a' to add 
column b_id; populate it; delete your rows without a join; then drop the 
column?


Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 03:00:35PM -0400, Frank Bax wrote:

> Could you not achieve the same result with a LIMIT on subSELECT and reissue 
> the command until there is nothing to delete?

Oracle has some barbarous alternative to LIMIT.  I find myself
retching over Oracle almost as much as MySQL.

> If you're really desperate; is it possible to alter table 'a' to add column 
> b_id; populate it; delete your rows without a join; then drop the column?

I thought of something similar, but UPDATE has the same limitation as DELETE.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
Tom Lane
Date:
felix@crowfix.com writes:
> I have tried to do this before and always found a way, usually

>     DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?)

> but I have too many rows, millions, in the IN crowd, ha ha, and it
> barfs.

Define "barfs".  That seems like the standard way to do it, and it
should work.
        regards, tom lane


Re: DELETE with JOIN

From
Frank Bax
Date:
felix@crowfix.com wrote:
> On Thu, Aug 07, 2008 at 03:00:35PM -0400, Frank Bax wrote:
>> If you're really desperate; is it possible to alter table 'a' to add column 
>> b_id; populate it; delete your rows without a join; then drop the column?
> 
> I thought of something similar, but UPDATE has the same limitation as DELETE.


Instead of DELETE; what if you CREATE a new table of the rows you wish 
to keep (using JOIN instead of IN).  Then either drop original table and 
rename new table OR delete all rows and re-populate from new table.

You mentioned that the process of insert/delete is to be repeated.  Are 
all the rows that were inserted; the same ones that will be deleted when 
the cycle is complete?  If yes; then after you delete this batch of 
rows; add a 'junky' column and populate with any value different from 
the default.  Your mass-insert should populate with the default value; 
and you can delete based on this value.


Re: DELETE with JOIN

From
Mark Roberts
Date:
On Thu, 2008-08-07 at 09:14 -0700, felix@crowfix.com wrote:
>    DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ?
> 

This should work for your needs:
delete from a
using b
where a.id = b.id -- join criteria
and b.second_id = ?

> I have tried to do this before and always found a way, usually
>     DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id
> = ?)

This is akin to:
delete from a
where (a.key1, a.key2, a.key3) in (select key1, key2, key3 from b)

I use this every day for millions of rows per delete and it works just
fine and in a very reasonable time period.

-Mark



Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 04:01:29PM -0400, Frank Bax wrote:

> You mentioned that the process of insert/delete is to be repeated.  Are all 
> the rows that were inserted; the same ones that will be deleted when the 
> cycle is complete?  If yes; then after you delete this batch of rows; add a 
> 'junky' column and populate with any value different from the default.  
> Your mass-insert should populate with the default value; and you can delete 
> based on this value.

That would work, and might even be possible, but adding a column just
to mark test data doesn' quite sit right.  This data is, yucch, tax
data, and I have been generating test cases with bogus country names,
like Fredonia -- the taxes themselves are in a dozen tables with
foreign key references holding them together, one of them being the
country name.  I was sort of doing what you suggest by using bogus
country names, but that's only indirect.  To add a column to every
concerned table would be a pain for other reasons.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 03:58:51PM -0400, Tom Lane wrote:
> felix@crowfix.com writes:
> > I have tried to do this before and always found a way, usually
> 
> >     DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?)
> 
> > but I have too many rows, millions, in the IN crowd, ha ha, and it
> > barfs.
> 
> Define "barfs".  That seems like the standard way to do it, and it
> should work.

In this case, the first database I tried was Oracle, and it complained
of too much transactional data; I forget the exact wording now.  It
worked on some cases, but others with "too much" data died with the
complaint after thinking about it for a minute or so.  Since the test
data will only grow in size, I was hoping for some other way.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
Christophe
Date:
On Aug 7, 2008, at 2:39 PM, felix@crowfix.com wrote:
> In this case, the first database I tried was Oracle, and it complained
> of too much transactional data; I forget the exact wording now.

You might try it on PostgreSQL.  While it might have to spill the  
result of the subquery to disk, it shouldn't actually fail unless  
disk is very constrained.


Re: DELETE with JOIN

From
Alvaro Herrera
Date:
felix@crowfix.com wrote:
> On Thu, Aug 07, 2008 at 03:58:51PM -0400, Tom Lane wrote:
> > felix@crowfix.com writes:
> > > I have tried to do this before and always found a way, usually
> > 
> > >     DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?)
> > 
> > > but I have too many rows, millions, in the IN crowd, ha ha, and it
> > > barfs.
> > 
> > Define "barfs".  That seems like the standard way to do it, and it
> > should work.
> 
> In this case, the first database I tried was Oracle, and it complained
> of too much transactional data; I forget the exact wording now.

I suggest you do not assume that Oracle implementation details apply to
Postgres, because they do not, most of the time.  They certainly don't
in this case.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 06:12:32PM -0400, Alvaro Herrera wrote:

> I suggest you do not assume that Oracle implementation details apply to
> Postgres, because they do not, most of the time.  They certainly don't
> in this case.

And I suggest you go back and read where I said I had to do this on
several databases and am trying to avoid custom SQL for each one.  I
would much rather this were postgresql only, but it's not.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


Re: DELETE with JOIN

From
Christophe
Date:
On Aug 7, 2008, at 4:37 PM, felix@crowfix.com wrote:
> And I suggest you go back and read where I said I had to do this on
> several databases and am trying to avoid custom SQL for each one.  I
> would much rather this were postgresql only, but it's not.

Then it does appear you have an Oracle debugging problem, more than a  
Postgres SQL formulation problem.


Re: DELETE with JOIN

From
"Scott Marlowe"
Date:
On Thu, Aug 7, 2008 at 5:37 PM,  <felix@crowfix.com> wrote:
> On Thu, Aug 07, 2008 at 06:12:32PM -0400, Alvaro Herrera wrote:
>
>> I suggest you do not assume that Oracle implementation details apply to
>> Postgres, because they do not, most of the time.  They certainly don't
>> in this case.
>
> And I suggest you go back and read where I said I had to do this on
> several databases and am trying to avoid custom SQL for each one.  I
> would much rather this were postgresql only, but it's not.

My experience with Oracle and PostgreSQL together tell me that you're
better off researching a proper method for each db independent of the
other.  While it may be possible to come with a generic method that
works well in both dbs for a lot of things, this doesn't look like one
of them.


Re: DELETE with JOIN

From
felix@crowfix.com
Date:
On Thu, Aug 07, 2008 at 10:55:17PM -0400, Joe wrote:

> I recall a similar problem ages ago and IIRC it was due to Oracle's locking 
> configuration, i.e., some parameter had to be increased and the instance 
> restarted so it could handle the transaction (or it had to be done in 
> chunks).

I gather that the general consensus is that Oracle and Postgresql are
different enough that I will have to figure them out independently.  I
have found that Postgresql is more tolerant of operator ignorance, so
I had hoped that any Oracle solution would work elsewhere, but maybe
not.

Thanks for the helpful responses.

--            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.    Felix Finch: scarecrow repairman & rocket
surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
 
I've found a solution to Fermat's Last Theorem but I see I've run out of room o