Thread: FAQ addition: deleteing all but one unique row

FAQ addition: deleteing all but one unique row

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message


Small inline patch to address a frequently asked question about how
to delete all-but-one unique rows.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302101008



Index: FAQ.html
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v
retrieving revision 1.164
diff -c -r1.164 FAQ.html
*** FAQ.html                2002/12/05 05:47:44                    1.164
--- FAQ.html                2003/02/10 15:08:00
***************
*** 139,144 ****
--- 139,145 ----
      temporary tables in PL/PgSQL functions?<BR>
       <A href="#4.27">4.27</A>) What replication options are available?<BR>
       <A href="#4.28">4.28</A>) What encryption options are available?<BR>
+      <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR>


      <H2 align="center">Extending PostgreSQL</H2>
***************
*** 1381,1386 ****
--- 1382,1406 ----
      <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI>
      <LI>The server can run using an encrypted file system.</LI>
      </UL>
+
+     <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR>
+     </H4>
+     <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot
+        distinguish them apart. Each row always has a unique system column named
+        <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use
+        <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows,
+        then remove all matching rows except the one with that particular ctid:</P>
+ <PRE>
+        DELETE FROM mytable WHERE ctid !=
+          (SELECT ctid FROM mytable WHERE id=12 LIMIT 1);
+ </PRE>
+
+     <P>In the above example, all rows in the table named 'mytable' having a value
+        of 12 in the 'id' column will be deleted except for one. Exactly
+        which row is kept should not matter, as they are all otherwise identical.
+        The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but
+        because tables can be created without an oid column, the use of ctid
+        is preferred.</P>

      <HR>

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+R8BsvJuQZxSWSsgRAtyCAKCyjwFyjJuy92ayfIpci7roRTgfOQCgsqJo
zoVZu5Qh++1ryvZFclrl8Dc=
=Xw0Z
-----END PGP SIGNATURE-----



Re: FAQ addition: deleteing all but one unique row

From
Tom Lane
Date:
greg@turnstep.com writes:
> + <PRE>
> +        DELETE FROM mytable WHERE ctid !=
> +          (SELECT ctid FROM mytable WHERE id=12 LIMIT 1);
> + </PRE>
> +
> +     <P>In the above example, all rows in the table named 'mytable' having a value
> +        of 12 in the 'id' column will be deleted except for one.

You haven't actually tested this advice, have you?

The delete as given will delete all but one row, full stop.

Or at least it would if it worked at all.  There's no != operator
defined for CTID.  (You could get around that with "NOT ctid =".)

            regards, tom lane

Re: FAQ addition: deleteing all but one unique row

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Or at least it would if it worked at all.  There's no != operator
> defined for CTID.  (You could get around that with "NOT ctid =".)

My bad. Forgot my own rule to never send email to a list on Monday
mornings. Thanks.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302101031

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+R8X7vJuQZxSWSsgRAgiQAJwNDKdJ2T7QNg7JXqwCRTtsRO5egQCg4/uJ
Ie+eLWtZDkVai1L1da+1pTA=
=u9qp
-----END PGP SIGNATURE-----



Re: FAQ addition: deleteing all but one unique row

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message


Second shot at a small doc patch, this time with testing. :)

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302121158


Index: FAQ.html
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v
retrieving revision 1.164
diff -c -r1.164 FAQ.html
*** FAQ.html                2002/12/05 05:47:44                    1.164
--- FAQ.html                2003/02/10 15:08:00
***************
*** 139,144 ****
--- 139,145 ----
      temporary tables in PL/PgSQL functions?<BR>
       <A href="#4.27">4.27</A>) What replication options are available?<BR>
       <A href="#4.28">4.28</A>) What encryption options are available?<BR>
+      <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR>


      <H2 align="center">Extending PostgreSQL</H2>
***************
*** 1381,1386 ****
--- 1382,1406 ----
      <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI>
      <LI>The server can run using an encrypted file system.</LI>
      </UL>
+
+     <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR>
+     </H4>
+     <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot
+        distinguish them apart. Each row always has a unique system column named
+        <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use
+        <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows,
+        then remove all matching rows except the one with that particular ctid:</P>
+ <PRE>
+        DELETE FROM mytable WHERE NOT ctid =
+          (SELECT ctid FROM mytable WHERE id=12 LIMIT 1);
+ </PRE>
+
+     <P>In the above example, all rows in the table named 'mytable' having a value
+        of 12 in the 'id' column will be deleted except for one. Exactly
+        which row is kept should not matter, as they are all otherwise identical.
+        The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but
+        because tables can be created without an oid column, the use of ctid
+        is preferred.</P>

      <HR>

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+Sn2LvJuQZxSWSsgRAlqDAJ930nb9V8hjAB1eh9Z7U6KU5mtSqwCeORKy
ONNSW87tAIAzV/WveYSAiK8=
=LOGw
-----END PGP SIGNATURE-----



Re: FAQ addition: deleteing all but one unique row

From
Kris Jurka
Date:
You have still not addressed Tom's initial complaint about the delete
deleting all but one row in the table.  Shouldn't there be a "WHERE id=12"
on the delete as well?

Kris Jurka

On Wed, 12 Feb 2003 greg@turnstep.com wrote:

>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> NotDashEscaped: You need GnuPG to verify this message
>
>
> Second shot at a small doc patch, this time with testing. :)
>
> --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200302121158
>
>
> Index: FAQ.html
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v
> retrieving revision 1.164
> diff -c -r1.164 FAQ.html
> *** FAQ.html                2002/12/05 05:47:44                    1.164
> --- FAQ.html                2003/02/10 15:08:00
> ***************
> *** 139,144 ****
> --- 139,145 ----
>       temporary tables in PL/PgSQL functions?<BR>
>        <A href="#4.27">4.27</A>) What replication options are available?<BR>
>        <A href="#4.28">4.28</A>) What encryption options are available?<BR>
> +      <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR>
>
>
>       <H2 align="center">Extending PostgreSQL</H2>
> ***************
> *** 1381,1386 ****
> --- 1382,1406 ----
>       <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI>
>       <LI>The server can run using an encrypted file system.</LI>
>       </UL>
> +
> +     <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR>
> +     </H4>
> +     <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot
> +        distinguish them apart. Each row always has a unique system column named
> +        <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use
> +        <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows,
> +        then remove all matching rows except the one with that particular ctid:</P>
> + <PRE>
> +        DELETE FROM mytable WHERE NOT ctid =
> +          (SELECT ctid FROM mytable WHERE id=12 LIMIT 1);
> + </PRE>
> +
> +     <P>In the above example, all rows in the table named 'mytable' having a value
> +        of 12 in the 'id' column will be deleted except for one. Exactly
> +        which row is kept should not matter, as they are all otherwise identical.
> +        The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but
> +        because tables can be created without an oid column, the use of ctid
> +        is preferred.</P>
>
>       <HR>
>
> -----BEGIN PGP SIGNATURE-----
> Comment: http://www.turnstep.com/pgp.html
>
> iD8DBQE+Sn2LvJuQZxSWSsgRAlqDAJ930nb9V8hjAB1eh9Z7U6KU5mtSqwCeORKy
> ONNSW87tAIAzV/WveYSAiK8=
> =LOGw
> -----END PGP SIGNATURE-----
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


Re: FAQ addition: deleteing all but one unique row

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message


> You have still not addressed Tom's initial complaint about
> the delete deleting all but one row in the table.  Shouldn't
> there be a "WHERE id=12" on the delete as well?

Yes, thank you, there should! Third time is the charm:

Index: FAQ.html
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v
retrieving revision 1.164
diff -c -r1.164 FAQ.html
*** FAQ.html                2002/12/05 05:47:44                    1.164
--- FAQ.html                2003/02/10 15:08:00
***************
*** 139,144 ****
--- 139,145 ----
      temporary tables in PL/PgSQL functions?<BR>
       <A href="#4.27">4.27</A>) What replication options are available?<BR>
       <A href="#4.28">4.28</A>) What encryption options are available?<BR>
+      <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR>


      <H2 align="center">Extending PostgreSQL</H2>
***************
*** 1381,1386 ****
--- 1382,1406 ----
      <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI>
      <LI>The server can run using an encrypted file system.</LI>
      </UL>
+
+     <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR>
+     </H4>
+     <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot
+        distinguish them apart. Each row always has a unique system column named
+        <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use
+        <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows,
+        then remove all matching rows except the one with that particular ctid:</P>
+ <PRE>
+        DELETE FROM mytable WHERE id=12 AND NOT ctid =
+          (SELECT ctid FROM mytable WHERE id=12 LIMIT 1);
+ </PRE>
+
+     <P>In the above example, all rows in the table named 'mytable' having a value
+        of 12 in the 'id' column will be deleted except for one. Exactly
+        which row is kept should not matter, as they are all otherwise identical.
+        The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but
+        because tables can be created without an oid column, the use of ctid
+        is preferred.</P>

      <HR>




--
Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200302121640

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+SsAZvJuQZxSWSsgRAu4OAJ9SrvLn/Tv9z9/gTMqtn+EflL45BgCeLxO8
uzmdojhtoXVpush7kkoSMn4=
=sPIw
-----END PGP SIGNATURE-----





Re: FAQ addition: deleteing all but one unique row

From
Bruce Momjian
Date:
Is this asked frequently enough to be an FAQ?

---------------------------------------------------------------------------

greg@turnstep.com wrote:
[ There is text before PGP section. ]
>
[ PGP not available, raw data follows ]
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> NotDashEscaped: You need GnuPG to verify this message
>
>
> > You have still not addressed Tom's initial complaint about
> > the delete deleting all but one row in the table.  Shouldn't
> > there be a "WHERE id=12" on the delete as well?
>
> Yes, thank you, there should! Third time is the charm:
>
> Index: FAQ.html
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/FAQ/FAQ.html,v
> retrieving revision 1.164
> diff -c -r1.164 FAQ.html
> *** FAQ.html                2002/12/05 05:47:44                    1.164
> --- FAQ.html                2003/02/10 15:08:00
> ***************
> *** 139,144 ****
> --- 139,145 ----
>       temporary tables in PL/PgSQL functions?<BR>
>        <A href="#4.27">4.27</A>) What replication options are available?<BR>
>        <A href="#4.28">4.28</A>) What encryption options are available?<BR>
> +      <A href="#4.29">4.29</A>) How can I delete all but one identical row?<BR>
>
>
>       <H2 align="center">Extending PostgreSQL</H2>
> ***************
> *** 1381,1386 ****
> --- 1382,1406 ----
>       <I>PASSWORD_ENCRYPTION</I> in <I>postgresql.conf</I>.</LI>
>       <LI>The server can run using an encrypted file system.</LI>
>       </UL>
> +
> +     <H4><A name="4.29">4.29</A>) How can I delete all but one identical row?<BR>
> +     </H4>
> +     <P>Sometimes you have rows that are so identical that a simple WHERE clause cannot
> +        distinguish them apart. Each row always has a unique system column named
> +        <CODE><SMALL>ctid</SMALL></CODE> that can be used to differentiate them. Use
> +        <SMALL>LIMIT 1</SMALL> to get back the ctid of one of the identical rows,
> +        then remove all matching rows except the one with that particular ctid:</P>
> + <PRE>
> +        DELETE FROM mytable WHERE id=12 AND NOT ctid =
> +          (SELECT ctid FROM mytable WHERE id=12 LIMIT 1);
> + </PRE>
> +
> +     <P>In the above example, all rows in the table named 'mytable' having a value
> +        of 12 in the 'id' column will be deleted except for one. Exactly
> +        which row is kept should not matter, as they are all otherwise identical.
> +        The system column <CODE><SMALL>oid</SMALL></CODE> can also be used, but
> +        because tables can be created without an oid column, the use of ctid
> +        is preferred.</P>
>
>       <HR>
>
>
>
>
> --
> Greg Sabino Mullane  greg@turnstep.com
> PGP Key: 0x14964AC8 200302121640
>
> -----BEGIN PGP SIGNATURE-----
> Comment: http://www.turnstep.com/pgp.html
>
> iD8DBQE+SsAZvJuQZxSWSsgRAu4OAJ9SrvLn/Tv9z9/gTMqtn+EflL45BgCeLxO8
> uzmdojhtoXVpush7kkoSMn4=
> =sPIw
> -----END PGP SIGNATURE-----
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
[ Decrypting message... End of raw data. ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FAQ addition: deleteing all but one unique row

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Is this asked frequently enough to be an FAQ?

I believe it is. I've answered this three times myself in the
last couple of months. However, I am not happy with the way
I've worded it so I'd like to submit a new patch for it.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302180941

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+UkY2vJuQZxSWSsgRAs+gAJ0eYXYzqmAoC34laxnM1zl7j6oZMACggwPE
3YfaIhkfUSL6+lPx/nsKH4I=
=nhez
-----END PGP SIGNATURE-----



Re: FAQ addition: deleteing all but one unique row

From
Bruce Momjian
Date:
greg@turnstep.com wrote:
[ There is text before PGP section. ]
>
[ PGP not available, raw data follows ]
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> > Is this asked frequently enough to be an FAQ?
>
> I believe it is. I've answered this three times myself in the
> last couple of months. However, I am not happy with the way
> I've worded it so I'd like to submit a new patch for it.

The other idea would be to add it to the DELETE manual page.  Seems that
is where most folks are going to need it.  I always try to push FAQ
information into the docs if there is a logical place for it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FAQ addition: deleteing all but one unique row

From
Neil Conway
Date:
On Tue, 2003-02-18 at 12:36, Bruce Momjian wrote:
> The other idea would be to add it to the DELETE manual page.

It definitely doesn't seem appropriate for the DELETE reference page,
though.

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC