Thread: FAQ addition: deleteing all but one unique row
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NotDashEscaped: You need GnuPG to verify this message Small inline patch to address a frequently asked question about how to delete all-but-one unique rows. -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200302101008 Index: FAQ.html =================================================================== RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v retrieving revision 1.164 diff -c -r1.164 FAQ.html *** FAQ.html 2002/12/05 05:47:44 1.164 --- FAQ.html 2003/02/10 15:08:00 *************** *** 139,144 **** --- 139,145 ---- temporary tables in PL/PgSQL functions?<BR> <A href="#4.27">4.27</A>) What replication options are available?<BR> <A href="#4.28">4.28</A>) What encryption options are available?<BR> + <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR> <H2 align="center">Extending PostgreSQL</H2> *************** *** 1381,1386 **** --- 1382,1406 ---- <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI> <LI>The server can run using an encrypted file system.</LI> </UL> + + <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR> + </H4> + <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot + distinguish them apart. Each row always has a unique system column named + <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use + <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows, + then remove all matching rows except the one with that particular ctid:</P> + <PRE> + DELETE FROM mytable WHERE ctid != + (SELECT ctid FROM mytable WHERE id=12 LIMIT 1); + </PRE> + + <P>In the above example, all rows in the table named 'mytable' having a value + of 12 in the 'id' column will be deleted except for one. Exactly + which row is kept should not matter, as they are all otherwise identical. + The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but + because tables can be created without an oid column, the use of ctid + is preferred.</P> <HR> -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+R8BsvJuQZxSWSsgRAtyCAKCyjwFyjJuy92ayfIpci7roRTgfOQCgsqJo zoVZu5Qh++1ryvZFclrl8Dc= =Xw0Z -----END PGP SIGNATURE-----
greg@turnstep.com writes: > + <PRE> > + DELETE FROM mytable WHERE ctid != > + (SELECT ctid FROM mytable WHERE id=12 LIMIT 1); > + </PRE> > + > + <P>In the above example, all rows in the table named 'mytable' having a value > + of 12 in the 'id' column will be deleted except for one. You haven't actually tested this advice, have you? The delete as given will delete all but one row, full stop. Or at least it would if it worked at all. There's no != operator defined for CTID. (You could get around that with "NOT ctid =".) regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Or at least it would if it worked at all. There's no != operator > defined for CTID. (You could get around that with "NOT ctid =".) My bad. Forgot my own rule to never send email to a list on Monday mornings. Thanks. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200302101031 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+R8X7vJuQZxSWSsgRAgiQAJwNDKdJ2T7QNg7JXqwCRTtsRO5egQCg4/uJ Ie+eLWtZDkVai1L1da+1pTA= =u9qp -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NotDashEscaped: You need GnuPG to verify this message Second shot at a small doc patch, this time with testing. :) -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200302121158 Index: FAQ.html =================================================================== RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v retrieving revision 1.164 diff -c -r1.164 FAQ.html *** FAQ.html 2002/12/05 05:47:44 1.164 --- FAQ.html 2003/02/10 15:08:00 *************** *** 139,144 **** --- 139,145 ---- temporary tables in PL/PgSQL functions?<BR> <A href="#4.27">4.27</A>) What replication options are available?<BR> <A href="#4.28">4.28</A>) What encryption options are available?<BR> + <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR> <H2 align="center">Extending PostgreSQL</H2> *************** *** 1381,1386 **** --- 1382,1406 ---- <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI> <LI>The server can run using an encrypted file system.</LI> </UL> + + <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR> + </H4> + <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot + distinguish them apart. Each row always has a unique system column named + <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use + <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows, + then remove all matching rows except the one with that particular ctid:</P> + <PRE> + DELETE FROM mytable WHERE NOT ctid = + (SELECT ctid FROM mytable WHERE id=12 LIMIT 1); + </PRE> + + <P>In the above example, all rows in the table named 'mytable' having a value + of 12 in the 'id' column will be deleted except for one. Exactly + which row is kept should not matter, as they are all otherwise identical. + The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but + because tables can be created without an oid column, the use of ctid + is preferred.</P> <HR> -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+Sn2LvJuQZxSWSsgRAlqDAJ930nb9V8hjAB1eh9Z7U6KU5mtSqwCeORKy ONNSW87tAIAzV/WveYSAiK8= =LOGw -----END PGP SIGNATURE-----
You have still not addressed Tom's initial complaint about the delete deleting all but one row in the table. Shouldn't there be a "WHERE id=12" on the delete as well? Kris Jurka On Wed, 12 Feb 2003 greg@turnstep.com wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > NotDashEscaped: You need GnuPG to verify this message > > > Second shot at a small doc patch, this time with testing. :) > > -- > Greg Sabino Mullane greg@turnstep.com > PGP Key: 0x14964AC8 200302121158 > > > Index: FAQ.html > =================================================================== > RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v > retrieving revision 1.164 > diff -c -r1.164 FAQ.html > *** FAQ.html 2002/12/05 05:47:44 1.164 > --- FAQ.html 2003/02/10 15:08:00 > *************** > *** 139,144 **** > --- 139,145 ---- > temporary tables in PL/PgSQL functions?<BR> > <A href="#4.27">4.27</A>) What replication options are available?<BR> > <A href="#4.28">4.28</A>) What encryption options are available?<BR> > + <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR> > > > <H2 align="center">Extending PostgreSQL</H2> > *************** > *** 1381,1386 **** > --- 1382,1406 ---- > <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI> > <LI>The server can run using an encrypted file system.</LI> > </UL> > + > + <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR> > + </H4> > + <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot > + distinguish them apart. Each row always has a unique system column named > + <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use > + <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows, > + then remove all matching rows except the one with that particular ctid:</P> > + <PRE> > + DELETE FROM mytable WHERE NOT ctid = > + (SELECT ctid FROM mytable WHERE id=12 LIMIT 1); > + </PRE> > + > + <P>In the above example, all rows in the table named 'mytable' having a value > + of 12 in the 'id' column will be deleted except for one. Exactly > + which row is kept should not matter, as they are all otherwise identical. > + The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but > + because tables can be created without an oid column, the use of ctid > + is preferred.</P> > > <HR> > > -----BEGIN PGP SIGNATURE----- > Comment: http://www.turnstep.com/pgp.html > > iD8DBQE+Sn2LvJuQZxSWSsgRAlqDAJ930nb9V8hjAB1eh9Z7U6KU5mtSqwCeORKy > ONNSW87tAIAzV/WveYSAiK8= > =LOGw > -----END PGP SIGNATURE----- > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NotDashEscaped: You need GnuPG to verify this message > You have still not addressed Tom's initial complaint about > the delete deleting all but one row in the table. Shouldn't > there be a "WHERE id=12" on the delete as well? Yes, thank you, there should! Third time is the charm: Index: FAQ.html =================================================================== RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v retrieving revision 1.164 diff -c -r1.164 FAQ.html *** FAQ.html 2002/12/05 05:47:44 1.164 --- FAQ.html 2003/02/10 15:08:00 *************** *** 139,144 **** --- 139,145 ---- temporary tables in PL/PgSQL functions?<BR> <A href="#4.27">4.27</A>) What replication options are available?<BR> <A href="#4.28">4.28</A>) What encryption options are available?<BR> + <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR> <H2 align="center">Extending PostgreSQL</H2> *************** *** 1381,1386 **** --- 1382,1406 ---- <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI> <LI>The server can run using an encrypted file system.</LI> </UL> + + <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR> + </H4> + <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot + distinguish them apart. Each row always has a unique system column named + <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use + <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows, + then remove all matching rows except the one with that particular ctid:</P> + <PRE> + DELETE FROM mytable WHERE id=12 AND NOT ctid = + (SELECT ctid FROM mytable WHERE id=12 LIMIT 1); + </PRE> + + <P>In the above example, all rows in the table named 'mytable' having a value + of 12 in the 'id' column will be deleted except for one. Exactly + which row is kept should not matter, as they are all otherwise identical. + The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but + because tables can be created without an oid column, the use of ctid + is preferred.</P> <HR> -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200302121640 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+SsAZvJuQZxSWSsgRAu4OAJ9SrvLn/Tv9z9/gTMqtn+EflL45BgCeLxO8 uzmdojhtoXVpush7kkoSMn4= =sPIw -----END PGP SIGNATURE-----
Is this asked frequently enough to be an FAQ? --------------------------------------------------------------------------- greg@turnstep.com wrote: [ There is text before PGP section. ] > [ PGP not available, raw data follows ] > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > NotDashEscaped: You need GnuPG to verify this message > > > > You have still not addressed Tom's initial complaint about > > the delete deleting all but one row in the table. Shouldn't > > there be a "WHERE id=12" on the delete as well? > > Yes, thank you, there should! Third time is the charm: > > Index: FAQ.html > =================================================================== > RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v > retrieving revision 1.164 > diff -c -r1.164 FAQ.html > *** FAQ.html 2002/12/05 05:47:44 1.164 > --- FAQ.html 2003/02/10 15:08:00 > *************** > *** 139,144 **** > --- 139,145 ---- > temporary tables in PL/PgSQL functions?<BR> > <A href="#4.27">4.27</A>) What replication options are available?<BR> > <A href="#4.28">4.28</A>) What encryption options are available?<BR> > + <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR> > > > <H2 align="center">Extending PostgreSQL</H2> > *************** > *** 1381,1386 **** > --- 1382,1406 ---- > <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI> > <LI>The server can run using an encrypted file system.</LI> > </UL> > + > + <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR> > + </H4> > + <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot > + distinguish them apart. Each row always has a unique system column named > + <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use > + <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows, > + then remove all matching rows except the one with that particular ctid:</P> > + <PRE> > + DELETE FROM mytable WHERE id=12 AND NOT ctid = > + (SELECT ctid FROM mytable WHERE id=12 LIMIT 1); > + </PRE> > + > + <P>In the above example, all rows in the table named 'mytable' having a value > + of 12 in the 'id' column will be deleted except for one. Exactly > + which row is kept should not matter, as they are all otherwise identical. > + The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but > + because tables can be created without an oid column, the use of ctid > + is preferred.</P> > > <HR> > > > > > -- > Greg Sabino Mullane greg@turnstep.com > PGP Key: 0x14964AC8 200302121640 > > -----BEGIN PGP SIGNATURE----- > Comment: http://www.turnstep.com/pgp.html > > iD8DBQE+SsAZvJuQZxSWSsgRAu4OAJ9SrvLn/Tv9z9/gTMqtn+EflL45BgCeLxO8 > uzmdojhtoXVpush7kkoSMn4= > =sPIw > -----END PGP SIGNATURE----- > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > [ Decrypting message... End of raw data. ] -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Is this asked frequently enough to be an FAQ? I believe it is. I've answered this three times myself in the last couple of months. However, I am not happy with the way I've worded it so I'd like to submit a new patch for it. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200302180941 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+UkY2vJuQZxSWSsgRAs+gAJ0eYXYzqmAoC34laxnM1zl7j6oZMACggwPE 3YfaIhkfUSL6+lPx/nsKH4I= =nhez -----END PGP SIGNATURE-----
greg@turnstep.com wrote: [ There is text before PGP section. ] > [ PGP not available, raw data follows ] > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > Is this asked frequently enough to be an FAQ? > > I believe it is. I've answered this three times myself in the > last couple of months. However, I am not happy with the way > I've worded it so I'd like to submit a new patch for it. The other idea would be to add it to the DELETE manual page. Seems that is where most folks are going to need it. I always try to push FAQ information into the docs if there is a logical place for it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, 2003-02-18 at 12:36, Bruce Momjian wrote: > The other idea would be to add it to the DELETE manual page. It definitely doesn't seem appropriate for the DELETE reference page, though. Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC