Thread: DUPS in tables columns ERROR: column ". . . " does not exist
Hi, ~ I am trying to get dups from some data from files which md5sums I previously calculated ~ Here is my mere mortal SQL ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; ~ and this is what I get: ~ jpk=# SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; jpk-# jpk-# jpk-# jpk-# ERROR: column "md5cnt" does not exist LINE 3: WHERE (md5cnt > 1) ~ I think I know what that one means based on the clear error message, namely md5cntis not a table column itself, but I still think there should be a way to formulate a simple query like this because PG does take "ORDER BY md5cnt DESC" even if md5cnt is not a table column, why on earth then it does not swallow and digest the "WHERE (md5cnt > 1)" part? ~ You could go the monkey way running a query like: ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 ORDER BY md5cnt DESC; ~ and then use code to jump of the loop when md5cnt becomes 1 or you could use nested SQL statements ~ How can you find duplicate records in a table? ~ Thanks lbrtchx
Also I know there is a DISTINCT keyword, but I also need to know how many times the particular data in the column is repeated if it is, that is why I need to go: ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; ~ Thanks lbrtchx
Albretch Mueller wrote: > Hi, > ~ > I am trying to get dups from some data from files which md5sums I > previously calculated > ~ > Here is my mere mortal SQL > ~ > SELECT md5, COUNT(md5) AS md5cnt > FROM jdk1_6_0_07_txtfls_md5 > WHERE (md5cnt > 1) > GROUP BY md5 > ORDER BY md5cnt DESC; I think you are looking for HAVING as in: SELECT md5, COUNT(md5) FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING count(md5) > 1 Stefan
thank you Stefan your SQL worked, but still; I am just asking and my programming bias will certainly show, but aren't you effectivly "calling" count on the table three times if you go: ~ SELECT md5, COUNT(md5) FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING COUNT(md5) > 1 ORDER BY COUNT(md5) DESC; ~ Shouldn't ~ SELECT md5, COUNT(md5) AS CNT FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING CNT > 1 ORDER BY CNT DESC; ~ work? ~ jpk=# SELECT md5, COUNT(md5) AS CNT FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING CNT > 1 ORDER BY CNT DESC; jpk-# jpk-# jpk-# jpk-# ERROR: column "cnt" does not exist LINE 4: HAVING CNT > 1 ~ Thanks lbrtchx
"Albretch Mueller" <lbrtchx@gmail.com> writes: > thank you Stefan your SQL worked, but still; I am just asking and my > programming bias will certainly show, but aren't you effectivly > "calling" count on the table three times if you go: The system is smart enough to only do the count() once. regards, tom lane
> The system is smart enough to only do the count() once. ~ But not smart enough to make a variable you declare point to that internal variable so that things are clearer/ easier ;-) ~ Thanks lbrtchx
On Sat, Aug 30, 2008 at 01:36:25PM -0400, Albretch Mueller wrote: > > The system is smart enough to only do the count() once. > ~ > But not smart enough to make a variable you declare point to that > internal variable so that things are clearer/ easier ;-) The SQL standard has pretty clear rules about what variables can be referenced from where, and this is one of those places (the rationale is probably in there too). Have a nice day -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
Albretch Mueller wrote: > Also I know there is a DISTINCT keyword, but I also need to know how > many times the particular data in the column is repeated if it is, > that is why I need to go: > ~ > SELECT md5, COUNT(md5) AS md5cnt > FROM jdk1_6_0_07_txtfls_md5 > WHERE (md5cnt > 1) > GROUP BY md5 > ORDER BY md5cnt DESC; Use HAVING instead of WHERE. -- Lew