Thread: DISTINCT ON not working...?

DISTINCT ON not working...?

From
"Phillip Smith"
Date:
Hi all,

Strange one - I have a nightly export / import routine that exports from one
database and imports to another. Has been working fine for several months,
but last night it died on a unique constraint.

To cut out all the details, the code that is causing the problem:

        SELECT  DISTINCT ON (ean)
                code,
                CASE WHEN ean IS NULL OR valid_barcode(ean) = false
                     THEN null ELSE ean END AS ean
        FROM    TMPTABLE
        WHERE   code NOT IN (SELECT code FROM stock_deleted)
         AND    ean IS NOT NULL

That is the code that generates the error on the unique constraint against
the ean column.

If I play with that and run this:

        SELECT  DISTINCT ON (ean)
                CASE WHEN ean IS NULL OR valid_barcode(ean) = false
                     THEN null ELSE ean END AS ean,
                count(*)
        FROM    TMPTABLE
        WHERE   code NOT IN (SELECT code FROM stock_deleted)
         AND    ean IS NOT NULL
        GROUP BY ean

I get several thousand rows returned, all with a count(*) of 1, except one
row:

        3246576919422    2

DISTINCT ON should eliminate one of those rows that is making that 2 - as I
said, it's been working fine for several months, and it is still doing it
correctly for approximately 100 other rows that have duplicate ean codes.

Can anyone give me a hand to work out why this one is doubling up?!

Cheers,
~p


*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments


Re: DISTINCT ON not working...?

From
Tom Lane
Date:
"Phillip Smith" <phillip.smith@weatherbeeta.com.au> writes:
> To cut out all the details, the code that is causing the problem:
>         SELECT  DISTINCT ON (ean)
>                 code,
>                 CASE WHEN ean IS NULL OR valid_barcode(ean) = false THEN
> null ELSE ean END AS ean
>         FROM    TMPTABLE
>         WHERE   code NOT IN (SELECT code FROM stock_deleted)
>          AND    ean IS NOT NULL

Perhaps you've confused yourself by using "ean" as both an input and an
output column name?  I think that the "ean" in the DISTINCT ON clause
will effectively refer to that CASE-expression, whereas the one in the
WHERE clause is just referring to the underlying column (and thus making
the IS NULL test in the CASE rather pointless).
        regards, tom lane
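
To make the name-resolution point above concrete, here is a minimal sketch for PostgreSQL. The table demo, its rows, and the regex check standing in for valid_barcode() are all hypothetical; only the query shape follows the original. A bare name in DISTINCT ON is resolved by the same rules as ORDER BY, which prefer an output column name, so the first query should de-duplicate on the CASE result while the second de-duplicates on the table column.

```sql
-- Hypothetical demo table and data, not taken from the original export.
CREATE TEMP TABLE demo (code text, ean text);
INSERT INTO demo VALUES
    ('A', '111'),
    ('B', '222'),
    ('C', 'bad1'),
    ('D', 'bad2');

-- Ambiguous: the bare "ean" in DISTINCT ON and ORDER BY names the output
-- column (the CASE result); the "ean" in WHERE and inside the CASE names
-- the table column. The regex is a stand-in for valid_barcode().
SELECT  DISTINCT ON (ean)
        code,
        CASE WHEN ean ~ '^[0-9]+$' THEN ean ELSE NULL END AS ean
FROM    demo
WHERE   ean IS NOT NULL
ORDER BY ean, code;

-- Unambiguous: table-qualified input column plus a distinct output alias.
SELECT  DISTINCT ON (demo.ean)
        code,
        CASE WHEN demo.ean ~ '^[0-9]+$' THEN demo.ean ELSE NULL END AS clean_ean
FROM    demo
WHERE   demo.ean IS NOT NULL
ORDER BY demo.ean, code;
```

If that reading holds, the two invalid barcodes collapse into a single NULL row in the first query but stay as separate rows in the second. Giving the output column its own alias avoids having to reason about which "ean" is meant.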


Re: DISTINCT ON not working...?

From
"Phillip Smith"
Date:
Removing the CASE statement altogether:

SELECT  DISTINCT ON (ean)
        ean,
        count(*)
FROM    TMPTABLE
WHERE   code NOT IN (SELECT code FROM stock_deleted)
 AND    ean IS NOT NULL
GROUP BY ean

Still gives me:

    3246576919422    2
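
A note on reading this result: GROUP BY is evaluated before DISTINCT ON, so count(*) here counts the input rows in each ean group, and DISTINCT ON then sees only one row per ean. A count of 2 therefore says that two rows of TMPTABLE pass the WHERE clause with that ean; it does not by itself show what DISTINCT ON does with them. A minimal sketch for inspecting those rows, assuming the quoted literal can be compared against whatever type the ean column has:

```sql
-- List the underlying rows behind the count of 2.
-- Table and column names are from the thread; the quoted literal is an
-- assumption that works for both text and numeric ean columns in PostgreSQL.
SELECT  code,
        ean
FROM    TMPTABLE
WHERE   ean = '3246576919422'
 AND    code NOT IN (SELECT code FROM stock_deleted)
ORDER BY code;
```

Comparing the two code values returned should show which source records share the barcode.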







Re: DISTINCT ON not working...?

From
Marcin Stępnicki
Date:
On Tue, 20 Feb 2007 15:36:32 +1100, Phillip Smith wrote:

> Removing the CASE statement all together:
> SELECT  DISTINCT ON (ean)
>       ean,
>       count(*)
> FROM    TMPTABLE
> WHERE   code NOT IN (SELECT code FROM stock_deleted)
>  AND    ean IS NOT NULL
> GROUP BY ean
> 
> Still gives me:
>     3246576919422    2

Wild guess - have you tried reindexing this table? I haven't seen
corrupted indexes since 7.1, though - it usually means subtle hardware
problems.

-- 
| And Do What You Will be the challenge | http://apcoln.linuxpl.org
|    So be it in love that harms none   | http://biznes.linux.pl
|   For this is the only commandment.   | http://www.juanperon.info
`---*  JID: Aragorn_Vime@jabber.org *---' http://www.naszedzieci.org 




Re: DISTINCT ON not working...?

From
"Phillip Smith"
Date:
This is a temporary table (with no indexes) that gets created in the same
transaction block as the SELECT gets run, but I tried creating an index on
the ean column anyway with no luck:

CREATE INDEX ean_idx ON TMPTABLE USING btree (ean);

SELECT  DISTINCT ON (ean)
        ean,
        count(*)
FROM    TMPTABLE
WHERE   code NOT IN (SELECT code FROM stock_deleted)
 AND    ean IS NOT NULL
GROUP BY ean;

Still returns:

    3246576919422    2
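
To check whether DISTINCT ON itself lets a duplicate through, one option is to run the de-duplicating SELECT as a subquery and count ean values in its output, rather than combining DISTINCT ON and GROUP BY at the same query level. This is only a sketch: the alias deduped and the ORDER BY tie-breaker on code are assumptions, and the CASE expression is left out to keep the column names unambiguous.

```sql
-- Count ean values in the OUTPUT of the DISTINCT ON query.
-- Any row returned here would be a genuine duplicate after de-duplication.
SELECT  ean,
        count(*) AS copies
FROM   (SELECT  DISTINCT ON (ean)
                code,
                ean
        FROM    TMPTABLE
        WHERE   code NOT IN (SELECT code FROM stock_deleted)
         AND    ean IS NOT NULL
        ORDER BY ean, code) AS deduped   -- "deduped" is a hypothetical alias
GROUP BY ean
HAVING  count(*) > 1;
```

An empty result would suggest DISTINCT ON is de-duplicating the table column as expected, which would point back at the CASE expression or the import step as the source of the unique-constraint failure.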



