Hi,
~
I am trying to find duplicates among some files whose md5sums I
previously calculated.
~
Here is my mere mortal SQL:
~
SELECT md5, COUNT(md5) AS md5cnt
FROM jdk1_6_0_07_txtfls_md5
WHERE (md5cnt > 1)
GROUP BY md5
ORDER BY md5cnt DESC;
~
and this is what I get:
~
jpk=# SELECT md5, COUNT(md5) AS md5cnt
jpk-# FROM jdk1_6_0_07_txtfls_md5
jpk-# WHERE (md5cnt > 1)
jpk-# GROUP BY md5
jpk-# ORDER BY md5cnt DESC;
ERROR: column "md5cnt" does not exist
LINE 3: WHERE (md5cnt > 1)
~
I think I know what that one means based on the clear error message,
namely that md5cnt is not a table column itself. But I still think there
should be a way to formulate a simple query like this, because PG does
accept "ORDER BY md5cnt DESC" even though md5cnt is not a table column.
Why on earth does it then not swallow and digest the "WHERE (md5cnt > 1)"
part?
~
You could go the monkey way and run a query like:
~
SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 GROUP BY
md5 ORDER BY md5cnt DESC;
~
and then use code to jump out of the loop when md5cnt becomes 1, or you
could use nested SQL statements.
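
To make those two workarounds concrete, here is a rough sketch of both. It is only an illustration under assumptions: an in-memory SQLite database (Python's sqlite3 module) stands in for PG, the "path" column and the sample rows are made up, and only the table name and the md5 column come from the queries above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database.
# The table name is from the post; the "path" column and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jdk1_6_0_07_txtfls_md5 (md5 TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO jdk1_6_0_07_txtfls_md5 VALUES (?, ?)",
    [("aaa", "f1"), ("aaa", "f2"), ("aaa", "f3"),
     ("bbb", "f4"), ("bbb", "f5"), ("ccc", "f6"), ("ddd", "f7")],
)

# 1) The "monkey way": sort by count, stop reading once the count drops to 1.
loop_dups = []
for md5, md5cnt in conn.execute(
    "SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 "
    "GROUP BY md5 ORDER BY md5cnt DESC"
):
    if md5cnt <= 1:
        break  # every remaining row has a count of 1
    loop_dups.append((md5, md5cnt))

# 2) Nested statement: the inner query does the counting, so md5cnt is an
#    ordinary column by the time the outer WHERE looks at it.
nested_dups = conn.execute(
    "SELECT md5, md5cnt FROM ("
    "  SELECT md5, COUNT(md5) AS md5cnt"
    "  FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5"
    ") AS counts "
    "WHERE md5cnt > 1 ORDER BY md5cnt DESC"
).fetchall()

print(loop_dups)
print(nested_dups)
```

With this sample data both approaches return the same two md5 values (the ones with counts 3 and 2). The nested form keeps the filtering inside the database instead of in the client loop.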
~
How can you find duplicate records in a table?
~
Thanks
lbrtchx