Thread: problem with distinct not distincting...
I have a pretty basic query; select distinct on (name) name, length(name) from drugs where customer_id IS NOT NULL order by name; which I'd expect to only return me a single drug name if there are duplicates, yet I get name | length ========== Roaccutane | 10 Roaccutane | 10 table encoding is UTF8... I'm scratching my head! Thanks, John.
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of John Beynon > Sent: Wednesday, October 17, 2012 6:48 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] problem with distinct not distincting... > > I have a pretty basic query; > > select distinct on (name) name, length(name) from drugs where > customer_id IS NOT NULL order by name; > > which I'd expect to only return me a single drug name if there are duplicates, > yet I get > > name | length > ========== > Roaccutane | 10 > Roaccutane | 10 > > table encoding is UTF8... > > I'm scratching my head! > > Thanks, > > John. > So, the following returns one record as expected on 9.0.3: SELECT DISTINCT ON (f) f, length(l) FROM (VALUES ('David','Johnston'),('David','Smith')) x (f, l) ORDER BY f Try: SELECT name, count(*) FROM drugs where customer_id IS NOT NULL GROUP BY name To see whether the GROUP BY logic considers the names identical. Using "name" as a column name also sometimes has issues so maybe try giving it an alias: SELECT ... FROM (SELECT name AS customer_name FROM drugs WHERE ....) AS drug_aliased ... DISTINCT ON has uses but I try to avoid it myself. In this specific case the "ON" is redundant since a simple DISTINCT will give you the same results. You also need to provide the PostgreSQL version and possibly server platform. David J.
Are you sure that one of those entries doesn't have a trailing space? Susan -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John Beynon Sent: Wednesday, October 17, 2012 3:48 AM To: pgsql-general@postgresql.org Subject: [GENERAL] problem with distinct not distincting... I have a pretty basic query; select distinct on (name) name, length(name) from drugs where customer_id IS NOT NULL order by name; which I'd expect to only return me a single drug name if there are duplicates, yet I get name | length ========== Roaccutane | 10 Roaccutane | 10 table encoding is UTF8... I'm scratching my head! Thanks, John. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
I see you have the length included. Perhaps, UTF8 characters in one and ascii in the other? Susan -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Susan Cassidy Sent: Wednesday, October 17, 2012 8:27 AM To: John Beynon; pgsql-general@postgresql.org Subject: Re: [GENERAL] problem with distinct not distincting... Are you sure that one of those entries doesn't have a trailing space? Susan -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John Beynon Sent: Wednesday, October 17, 2012 3:48 AM To: pgsql-general@postgresql.org Subject: [GENERAL] problem with distinct not distincting... I have a pretty basic query; select distinct on (name) name, length(name) from drugs where customer_id IS NOT NULL order by name; which I'd expect to only return me a single drug name if there are duplicates, yet I get name | length ========== Roaccutane | 10 Roaccutane | 10 table encoding is UTF8... I'm scratching my head! Thanks, John. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
John Beynon <john@kyan.com> writes: > I have a pretty basic query; > select distinct on (name) name, length(name) from > drugs > where customer_id IS NOT NULL > order by name; > which I'd expect to only return me a single drug name if there are > duplicates, yet I get > name | length > ========== > Roaccutane | 10 > Roaccutane | 10 > table encoding is UTF8... > I'm scratching my head! Yeah, me too. What PG version is this exactly? What does EXPLAIN show for the query? Can you extract a self-contained test case? regards, tom lane
On Wed, Oct 17, 2012 at 5:48 AM, John Beynon <john@kyan.com> wrote: > I have a pretty basic query; > > select distinct on (name) name, length(name) from > drugs > where customer_id IS NOT NULL > order by name; > > which I'd expect to only return me a single drug name if there are > duplicates, yet I get > > name | length > ========== > Roaccutane | 10 > Roaccutane | 10 > > table encoding is UTF8... > > I'm scratching my head! as a sanity check, try: select distinct on (name) name, length(name), md5(name) from drugs where customer_id IS NOT NULL order by name; merlin
I just managed to solve the problem infact. The trailing 'e' character on the name was different for one row. All my tools, (pgadmin and the source data in openoffice) showed the same 'e' character but psql showed it as different character... Thanks for all taking the time to read / answer. It stumped me for a while! John. On Wed, Oct 17, 2012 at 4:01 PM, David Johnston <polobo@yahoo.com> wrote: >> -----Original Message----- >> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- >> owner@postgresql.org] On Behalf Of John Beynon >> Sent: Wednesday, October 17, 2012 6:48 AM >> To: pgsql-general@postgresql.org >> Subject: [GENERAL] problem with distinct not distincting... >> >> I have a pretty basic query; >> >> select distinct on (name) name, length(name) from drugs where >> customer_id IS NOT NULL order by name; >> >> which I'd expect to only return me a single drug name if there are > duplicates, >> yet I get >> >> name | length >> ========== >> Roaccutane | 10 >> Roaccutane | 10 >> >> table encoding is UTF8... >> >> I'm scratching my head! >> >> Thanks, >> >> John. >> > > So, the following returns one record as expected on 9.0.3: > > SELECT DISTINCT ON (f) f, length(l) > FROM (VALUES ('David','Johnston'),('David','Smith')) x (f, l) > ORDER BY f > > Try: > > SELECT name, count(*) FROM drugs where customer_id IS NOT NULL GROUP BY name > > To see whether the GROUP BY logic considers the names identical. > > Using "name" as a column name also sometimes has issues so maybe try giving > it an alias: > > SELECT ... FROM (SELECT name AS customer_name FROM drugs WHERE ....) AS > drug_aliased ... > > DISTINCT ON has uses but I try to avoid it myself. In this specific case > the "ON" is redundant since a simple DISTINCT will give you the same > results. > > You also need to provide the PostgreSQL version and possibly server > platform. > > David J. > > -- John Beynon Kyanmedia Ltd. Direct line: 01483 405210 http://kyan.com Registered in England. Company number: 4575679 Registered Office: 171 High Street, Guildford GU1 3AJ
On Thu, Oct 18, 2012 at 2:32 AM, John Beynon <john@kyan.com> wrote: > I just managed to solve the problem infact. > > The trailing 'e' character on the name was different for one row. All > my tools, (pgadmin and the source data in openoffice) showed the same > 'e' character but psql showed it as different character... > > Thanks for all taking the time to read / answer. It stumped me for a while! Cool! That's where the sanity check Merlin suggested comes in really handy. You can manually key in what you think it's showing, and compare with the two rows: sikorsky@sikorsky:~$ psql -c "select md5('Roaccutane')" md5 ---------------------------------- 70f818454267c719ed612f50fe563f64 (1 row) Anything that shows a different hash is clearly different... somehow. ChrisA