Thread: Delete with subquery deleting all records
I have two tables exports export_messages They were done without a foreign key and I am trying to clean the data to put a constraint. For every record in exports_messages there is supposed to be a matching record in exports with a export_id (ie export_id is the foreign key for export_messages) The following query identified all records that I need to delete: SELECT distinct export_messages.export_id as id FROM export_messages LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id); I checked a number of them.. and all the records returned from that select do not have a matching export_id in exports. When I try to run: delete from export_messages where export_id in (SELECT distinct export_messages.export_id as id FROM export_messages LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id) ); All records from export_messages get deleted. Any suggestions?
Hint: LEFT JOIN is your mistake... Thought: are you sure you are going to delete those rows? In there cases human verification is usually the way to go, though it takes a lot of time. Read on... >-----Original Message----- >From: pgsql-general-owner@postgresql.org >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of >Francisco Reyes >Sent: donderdag 24 mei 2007 1:12 >To: PostgreSQL general >Subject: [GENERAL] Delete with subquery deleting all records > >I have two tables >exports >export_messages > >They were done without a foreign key and I am trying to clean >the data to put a constraint. > >For every record in exports_messages there is supposed to be a >matching record in exports with a export_id (ie export_id is >the foreign key for >export_messages) > >The following query identified all records that I need to delete: >SELECT distinct export_messages.export_id as id FROM >export_messages LEFT OUTER JOIN exports ON >(export_messages.export_id = exports.export_id); > >I checked a number of them.. and all the records returned from >that select do not have a matching export_id in exports. > >When I try to run: >delete from export_messages where export_id in (SELECT >distinct export_messages.export_id as id FROM export_messages >LEFT OUTER JOIN exports ON (export_messages.export_id = >exports.export_id) ); > What seems more resonable: DELETE FORM export_messages WHERE NOT export_id IN (SELECT export_id FROM exports) Make sure you know what you are doing (backup)... [snip] - Joris
Joris Dobbelsteen writes: > Hint: LEFT JOIN is your mistake... The use of left join in general.. or my left join? When I do the left join by itself I verified manually and it had the data I wanted. > Thought: are you sure you are going to delete those rows? In there cases > human verification is usually the way to go, though it takes a lot of > time. If I can't not find a way to do this through SQL I will write a program. The time to try and clean this by hand would be countless hours. There are a lot of records in the child table that do not have a matching record in the parent table.
>-----Original Message----- >From: Francisco Reyes [mailto:lists@stringsutils.com] >Sent: donderdag 24 mei 2007 2:04 >To: Joris Dobbelsteen >Cc: PostgreSQL general >Subject: Re: [GENERAL] Delete with subquery deleting all records > >Joris Dobbelsteen writes: > >> Hint: LEFT JOIN is your mistake... > >The use of left join in general.. or my left join? >When I do the left join by itself I verified manually and it >had the data I wanted. Your usage in this context... Did you really check your list thoroughly. >>>SELECT distinct export_messages.export_id as id >>>FROM export_messages >>>>LEFT OUTER JOIN exports ON >>>(export_messages.export_id = exports.export_id); Take any value from "SELECT export_id FROM exports" Does it not exist in your list? Try this: SELECT distinct export_messages.export_id as id, exports.export_id as exports_export_id FROM export_messages LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id) WHERE exports.export_id IS NOT NULL; At this point you should know whats going wrong... >>>DELETE FROM export_messages >>>WHERE export_id IN >>> (SELECT distinct export_messages.export_id as id >>> FROM export_messages >>> LEFT OUTER JOIN exports The LEFT OUTER join will at all times return ALL rows from export_messages. In effect, you generate a list with ALL export_messages.export_id. Thus we must conclude that for every row you are trying to delete, the condition must evaluate to true. >>> ON (export_messages.export_id = exports.export_id) >>> ); >> Thought: are you sure you are going to delete those rows? In there >> cases human verification is usually the way to go, though it takes a >> lot of time. > >If I can't not find a way to do this through SQL I will write >a program. >The time to try and clean this by hand would be countless >hours. There are a lot of records in the child table that do >not have a matching record in the parent table. That's the trade-off: effects of a mistake * chance of a mistake against the cost to prevent these. Hope this helps... - Joris
Joris Dobbelsteen writes: > Did you really check your list thoroughly. >>>>SELECT distinct export_messages.export_id as id >>>>FROM export_messages >>>>>LEFT OUTER JOIN exports ON >>>>(export_messages.export_id = exports.export_id); > > Take any value from "SELECT export_id FROM exports" > Does it not exist in your list? Correct. I thought of that.. and the outerjoin as I showed.. only shows values that are in export_messages but are not in exports. I went over nearly 100 values and that select only had the right values. > Try this: > SELECT distinct export_messages.export_id as id, > exports.export_id as exports_export_id > FROM export_messages > LEFT OUTER JOIN exports ON > (export_messages.export_id = exports.export_id) > WHERE exports.export_id IS NOT NULL; Thanks will try it. > The LEFT OUTER join will at all times return ALL rows from > export_messages. What is very, very strange is that it didn't return all values. > In effect, you generate a list with ALL export_messages.expor > we must conclude that for every row you are trying to delete, the > condition must evaluate to true. This is what was driving me crazy.. when I did the select by itself the list was correct. > That's the trade-off: effects of a mistake * chance of a mistake against > the cost to prevent these. Thanks much. Will try your query. doing all this within a transaction so I can double check the results.. that is the primary reason i would rather get it done from within psql. If I do it in a program I will have no easy way to tell if I am doing the right thing... Small tests.. and print statements will helpfully help, but once I believe the program is working.. and run it.. the only solution is a restore (I do a backup before doing any changes of course).
Joris Dobbelsteen writes: > Try this: > SELECT distinct export_messages.export_id as id, > exports.export_id as exports_export_id > FROM export_messages > LEFT OUTER JOIN exports ON > (export_messages.export_id = exports.export_id) > WHERE exports.export_id IS NOT NULL; In my case I needed "IS NULL" Your query worked. Thanks!!! However.. I find it very strange that just the selects by themselves produced the same ouput up to limit 100. SELECT distinct export_messages.export_id as id FROM export_messages LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id) limit 100; and SELECT distinct export_messages.export_id as id FROM export_messages LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id) WHERE exports.export_id IS NULL limit 100; Produced the same output. > At this point you should know whats going wrong... > >>>>DELETE FROM export_messages >>>>WHERE export_id IN >>>> (SELECT distinct export_messages.export_id as id >>>> FROM export_messages >>>> LEFT OUTER JOIN exports > > The LEFT OUTER join will at all times return ALL rows from > export_messages. > In effect, you generate a list with ALL export_messages.export_id. Thus > we must conclude that for every row you are trying to delete, the > condition must evaluate to true. > >>>> ON (export_messages.export_id = exports.export_id) >>>> ); > >>> Thought: are you sure you are going to delete those rows? In there >>> cases human verification is usually the way to go, though it takes a >>> lot of time. >> >>If I can't not find a way to do this through SQL I will write >>a program. >>The time to try and clean this by hand would be countless >>hours. There are a lot of records in the child table that do >>not have a matching record in the parent table. > > That's the trade-off: effects of a mistake * chance of a mistake against > the cost to prevent these. > > Hope this helps... > > - Joris > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
Francisco Reyes wrote: > When I try to run: > delete from export_messages where export_id in > (SELECT distinct export_messages.export_id as id > FROM export_messages > LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id) > ); Why not use EXISTS? DELETE FROM export_messages WHERE NOT EXISTS ( SELECT 1 FROM exports WHERE exports.export_id = export_messages.export_id ) I suppose you run those queries in a transaction block, right? -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys writes: > Why not use EXISTS? > > DELETE FROM export_messages WHERE NOT EXISTS ( > SELECT 1 > FROM exports > WHERE exports.export_id = export_messages.export_id > ) Didn't think of it. Thanks for the code. > I suppose you run those queries in a transaction block, right? Correct. Also I do a backup before doing the deletions.
Francisco Reyes wrote: > Alban Hertroys writes: > >> I suppose you run those queries in a transaction block, right? > > Correct. > Also I do a backup before doing the deletions. Note that if you're in a transaction, you don't technically need the backup (doesn't hurt though) as if you get it wrong you can just roll it back.
Scott Marlowe writes: > Note that if you're in a transaction, you don't technically need the > backup (doesn't hurt though) as if you get it wrong you can just roll it > back. I know.. I do the backup in case I forget to do the transaction. :-)
Your top-posting was confusing to me, but I eventually figured out what went where. Francisco Reyes wrote: > However.. I find it very strange that just the selects by themselves > produced the same ouput up to limit 100. Strange? Why? Did you expect a particular statistical distribution? Perhaps you were surprised by the extent of the situation, not thinking there could be 100 records that didn't match? Apparently by coincidence the first 100 distinct values returned from export_messages just happened not to have corresponding rows in exports. There is really nothing strange about your result, just like there's really nothing strange about getting 100 heads in a row in a coin toss. > SELECT distinct export_messages.export_id as id > FROM export_messages > LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id) > limit 100; > > and > SELECT distinct export_messages.export_id as id > FROM export_messages > LEFT OUTER JOIN exports ON > (export_messages.export_id = exports.export_id) > WHERE exports.export_id IS NULL limit 100; > > Produced the same output. The unconstrained outer join is guaranteed to return every distinct value of export_messages.export_id, the LEFT table, by the definition of LEFT OUTER JOIN. As Alban said, a better query would be (SELECT column changed from his suggestion): DELETE FROM export_messages WHERE NOT EXISTS ( SELECT export_id FROM exports WHERE exports.export_id = export_messages.export_id ); or DELETE FROM export_messages WHERE export_id NOT IN ( SELECT export_id FROM exports ); -- Lew
Lew writes: > Strange? Why? Did you expect a particular statistical distribution? Perhaps The impression was that one query was returning everything.. and the other only the records that did not exist in the one table. > you were surprised by the extent of the situation, not thinking there could be > 100 records that didn't match? Surprised that the outer join actually did ONLY display records that did not exist in the second table, even though I did not have a where clause to not list the records with a NULL value. > The unconstrained outer join is guaranteed to return every distinct value of > export_messages.export_id, the LEFT table, by the definition of LEFT OUTER JOIN. That is what I expected, BUT it only returned records that did NOT exist in the second table. It did not, as far as I could check, return all records. > > DELETE FROM export_messages WHERE NOT EXISTS > ( SELECT export_id FROM exports > WHERE exports.export_id = export_messages.export_id > ); That is what I ended up using. It worked. > DELETE FROM export_messages WHERE export_id NOT IN > ( SELECT export_id FROM exports ); Will keep that one handy too for future reference. I think I may have not explained properly what I think I was seeing.. The left outter join without the where clause seemed to return the right data.. only rows that existed in one table, but not in the other. I looked at a few hundred records and check a good deal of them manually. The issue was that I used it as a subquery to delete it seemed to produce the entire list (ie all records from both tables).. It is possible, or even likely, that I did something wrong or had some form of oversight.. After I got the suggestion with the better query (with the where clause) I did not go back to test anymore. Thanks for your feedback.
Francisco Reyes wrote: > Lew writes: > >> Strange? Why? Did you expect a particular statistical distribution? >> Perhaps > > > The impression was that one query was returning everything.. and the > other only the records that did not exist in the one table. > >> you were surprised by the extent of the situation, not thinking there >> could be 100 records that didn't match? > > Surprised that the outer join actually did ONLY display records that did > not exist in the second table, even though I did not have a where clause > to not list the records with a NULL value. You only looked at some of the records, not all of them, correct? Ah, yes, you did say, > I checked a number of them. Your evaluation of a whole data set by manual examination of a small subset of the returned results cannot be certain. Did you try SELECT COUNT(*) to check if the queries differed in the size of their returned result sets? > That is what I expected, BUT it only returned records that did NOT exist in the second table. It did not, as far as I couldcheck, return all records. You mean "as far as you did check". You still do not know the truth of your assertion that the outer join returned only a subset of the records. SELECT COUNT( DISTINCT export_messages.export_id ) FROM export_messages LEFT OUTER JOIN exports ON (export_messages.export_id = exports.export_id); vs. SELECT COUNT( DISTINCT export_messages.export_id ) FROM export_messages; will reveal. -- Lew