Thread: DELETE with JOIN
I want to delete with a join condition. Google shows this is a common problem, but the only solutions are either for MySQL or they don't work in my situation because there are too many rows selected. I also have to make this work on several databases, includeing, grrr, Oracle, so non-standard MySQL "solutions" are doubly aggravating. DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ? I have tried to do this before and always found a way, usually DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?) but I have too many rows, millions, in the IN crowd, ha ha, and it barfs. EXISTS is no better. At least Oracle barfs, and I haven't got to the others yet. I figured I would go with the worst offender first, and let me tell you, it is offensive. Dang I wish it were postgresql only! I could write a Dumb Little Test Program (tm) to read in all those IN ids and execute a zillion individual DELETE statements, but it would be slow as puke and this little delete is going to come up quite often now that I have a test program which needs to generate the junky data and play with it for several days before deleting it and starting over again. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
On fim, 2008-08-07 at 09:14 -0700, felix@crowfix.com wrote: > I want to delete with a join condition. Google shows this is a common > problem, but the only solutions are either for MySQL or they don't > work in my situation because there are too many rows selected. I also > have to make this work on several databases, includeing, grrr, Oracle, > so non-standard MySQL "solutions" are doubly aggravating. > > DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ? > > I have tried to do this before and always found a way, usually > > DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?) did you look at DELETE FROM table1 USING table2 WHERE ... ? gnari
At 10:05 AM 8/7/2008, pgsql-sql-owner@postgresql.org wrote: >Date: Thu, 7 Aug 2008 09:14:49 -0700 >From: felix@crowfix.com >To: pgsql-sql@postgresql.org >Subject: DELETE with JOIN >Message-ID: <20080807161449.GA19337@crowfix.com> > >I want to delete with a join condition. Google shows this is a common >problem, but the only solutions are either for MySQL or they don't >work in my situation because there are too many rows selected. I also >have to make this work on several databases, includeing, grrr, Oracle, >so non-standard MySQL "solutions" are doubly aggravating. > > DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ? > >I have tried to do this before and always found a way, usually > > DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = > ?) > >but I have too many rows, millions, in the IN crowd, ha ha, and it >barfs. EXISTS is no better. At least Oracle barfs, and I haven't got >to the others yet. I figured I would go with the worst offender >first, and let me tell you, it is offensive. Dang I wish it were >postgresql only! > >I could write a Dumb Little Test Program (tm) to read in all those IN >ids and execute a zillion individual DELETE statements, but it would >be slow as puke and this little delete is going to come up quite often >now that I have a test program which needs to generate the junky data >and play with it for several days before deleting it and starting over >again. Hi, Have you tried something where you read in all those "IN id's" and then group them into blocks (of say 1,000 or 10,000 or whatever number works best)? Then execute: DELETE FROM a WHERE a.b_id in ([static_list_of_ids]) Replacing in a loop "[static_list_of_ids]" with each block of 1000 id's in a comma delimited string? I use this technique sometimes in middleware and it works pretty well. There's probably a pure-sql solution in Pg as well but this method should work across any SQL platform, which seems like one of your requirements. Steve
On Thu, Aug 07, 2008 at 05:05:38PM +0000, Ragnar wrote: > did you look at DELETE FROM table1 USING table2 WHERE ... ? No, I hadn't known about that. It looks handy to know about, at least, but I don't see it for Oracle. I am going to play with that, but I don't think it will help here. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
On Thu, Aug 07, 2008 at 10:40:22AM -0700, Steve Midgley wrote: > Have you tried something where you read in all those "IN id's" and then > group them into blocks (of say 1,000 or 10,000 or whatever number works > best)? Then execute: > > DELETE FROM a WHERE a.b_id in ([static_list_of_ids]) It may come to something like that, but I figure handing over hubdreds of static IDs is probably worse for the planner than an expression, and it's ugly as sin :-) I tried using "%" for a mod function, but that seems to not be a universally recognized operator. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
felix@crowfix.com wrote: > On Thu, Aug 07, 2008 at 10:40:22AM -0700, Steve Midgley wrote: > >> Have you tried something where you read in all those "IN id's" and then >> group them into blocks (of say 1,000 or 10,000 or whatever number works >> best)? Then execute: >> >> DELETE FROM a WHERE a.b_id in ([static_list_of_ids]) > > It may come to something like that, but I figure handing over hubdreds > of static IDs is probably worse for the planner than an expression, > and it's ugly as sin :-) > > I tried using "%" for a mod function, but that seems to not be a > universally recognized operator. Could you not achieve the same result with a LIMIT on subSELECT and reissue the command until there is nothing to delete? Is b_id already unique; or should you be using DISTINCT or GROUP BY on the subSELECT? DELETE FROM a WHERE a.b_id IN (SELECT DISTINCT id FROM b WHERE second_id = ? LIMIT 1000) DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ? GROUP BY id LIMIT 1000) If you're really desperate; is it possible to alter table 'a' to add column b_id; populate it; delete your rows without a join; then drop the column?
On Thu, Aug 07, 2008 at 03:00:35PM -0400, Frank Bax wrote: > Could you not achieve the same result with a LIMIT on subSELECT and reissue > the command until there is nothing to delete? Oracle has some barbarous alternative to LIMIT. I find myself retching over Oracle almost as much as MySQL. > If you're really desperate; is it possible to alter table 'a' to add column > b_id; populate it; delete your rows without a join; then drop the column? I thought of something similar, but UPDATE has the same limitation as DELETE. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
felix@crowfix.com writes: > I have tried to do this before and always found a way, usually > DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?) > but I have too many rows, millions, in the IN crowd, ha ha, and it > barfs. Define "barfs". That seems like the standard way to do it, and it should work. regards, tom lane
felix@crowfix.com wrote: > On Thu, Aug 07, 2008 at 03:00:35PM -0400, Frank Bax wrote: >> If you're really desperate; is it possible to alter table 'a' to add column >> b_id; populate it; delete your rows without a join; then drop the column? > > I thought of something similar, but UPDATE has the same limitation as DELETE. Instead of DELETE; what if you CREATE a new table of the rows you wish to keep (using JOIN instead of IN). Then either drop original table and rename new table OR delete all rows and re-populate from new table. You mentioned that the process of insert/delete is to be repeated. Are all the rows that were inserted; the same ones that will be deleted when the cycle is complete? If yes; then after you delete this batch of rows; add a 'junky' column and populate with any value different from the default. Your mass-insert should populate with the default value; and you can delete based on this value.
On Thu, 2008-08-07 at 09:14 -0700, felix@crowfix.com wrote: > DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ? > This should work for your needs: delete from a using b where a.id = b.id -- join criteria and b.second_id = ? > I have tried to do this before and always found a way, usually > DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id > = ?) This is akin to: delete from a where (a.key1, a.key2, a.key3) in (select key1, key2, key3 from b) I use this every day for millions of rows per delete and it works just fine and in a very reasonable time period. -Mark
On Thu, Aug 07, 2008 at 04:01:29PM -0400, Frank Bax wrote: > You mentioned that the process of insert/delete is to be repeated. Are all > the rows that were inserted; the same ones that will be deleted when the > cycle is complete? If yes; then after you delete this batch of rows; add a > 'junky' column and populate with any value different from the default. > Your mass-insert should populate with the default value; and you can delete > based on this value. That would work, and might even be possible, but adding a column just to mark test data doesn' quite sit right. This data is, yucch, tax data, and I have been generating test cases with bogus country names, like Fredonia -- the taxes themselves are in a dozen tables with foreign key references holding them together, one of them being the country name. I was sort of doing what you suggest by using bogus country names, but that's only indirect. To add a column to every concerned table would be a pain for other reasons. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
On Thu, Aug 07, 2008 at 03:58:51PM -0400, Tom Lane wrote: > felix@crowfix.com writes: > > I have tried to do this before and always found a way, usually > > > DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?) > > > but I have too many rows, millions, in the IN crowd, ha ha, and it > > barfs. > > Define "barfs". That seems like the standard way to do it, and it > should work. In this case, the first database I tried was Oracle, and it complained of too much transactional data; I forget the exact wording now. It worked on some cases, but others with "too much" data died with the complaint after thinking about it for a minute or so. Since the test data will only grow in size, I was hoping for some other way. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
On Aug 7, 2008, at 2:39 PM, felix@crowfix.com wrote: > In this case, the first database I tried was Oracle, and it complained > of too much transactional data; I forget the exact wording now. You might try it on PostgreSQL. While it might have to spill the result of the subquery to disk, it shouldn't actually fail unless disk is very constrained.
felix@crowfix.com wrote: > On Thu, Aug 07, 2008 at 03:58:51PM -0400, Tom Lane wrote: > > felix@crowfix.com writes: > > > I have tried to do this before and always found a way, usually > > > > > DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?) > > > > > but I have too many rows, millions, in the IN crowd, ha ha, and it > > > barfs. > > > > Define "barfs". That seems like the standard way to do it, and it > > should work. > > In this case, the first database I tried was Oracle, and it complained > of too much transactional data; I forget the exact wording now. I suggest you do not assume that Oracle implementation details apply to Postgres, because they do not, most of the time. They certainly don't in this case. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Aug 07, 2008 at 06:12:32PM -0400, Alvaro Herrera wrote: > I suggest you do not assume that Oracle implementation details apply to > Postgres, because they do not, most of the time. They certainly don't > in this case. And I suggest you go back and read where I said I had to do this on several databases and am trying to avoid custom SQL for each one. I would much rather this were postgresql only, but it's not. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
On Aug 7, 2008, at 4:37 PM, felix@crowfix.com wrote: > And I suggest you go back and read where I said I had to do this on > several databases and am trying to avoid custom SQL for each one. I > would much rather this were postgresql only, but it's not. Then it does appear you have an Oracle debugging problem, more than a Postgres SQL formulation problem.
On Thu, Aug 7, 2008 at 5:37 PM, <felix@crowfix.com> wrote: > On Thu, Aug 07, 2008 at 06:12:32PM -0400, Alvaro Herrera wrote: > >> I suggest you do not assume that Oracle implementation details apply to >> Postgres, because they do not, most of the time. They certainly don't >> in this case. > > And I suggest you go back and read where I said I had to do this on > several databases and am trying to avoid custom SQL for each one. I > would much rather this were postgresql only, but it's not. My experience with Oracle and PostgreSQL together tell me that you're better off researching a proper method for each db independent of the other. While it may be possible to come with a generic method that works well in both dbs for a lot of things, this doesn't look like one of them.
On Thu, Aug 07, 2008 at 10:55:17PM -0400, Joe wrote: > I recall a similar problem ages ago and IIRC it was due to Oracle's locking > configuration, i.e., some parameter had to be increased and the instance > restarted so it could handle the transaction (or it had to be done in > chunks). I gather that the general consensus is that Oracle and Postgresql are different enough that I will have to figure them out independently. I have found that Postgresql is more tolerant of operator ignorance, so I had hoped that any Oracle solution would work elsewhere, but maybe not. Thanks for the helpful responses. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon/ felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o