Re: distinct not working in a multiple join - Mailing list pgsql-general

From David Rio Deiros
Subject Re: distinct not working in a multiple join
Date
Msg-id 20060210022151.GD29347@milhouse.digitaria.com
Whole thread Raw
In response to Re: distinct not working in a multiple join  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: distinct not working in a multiple join  (Stephen Frost <sfrost@snowman.net>)
List pgsql-general
On Thu, Feb 09, 2006 at 07:20:19PM -0500, Tom Lane wrote:
> David Rio Deiros <driodeiros@gmail.com> writes:
> > I have some issues with the query attached at the end of this email.
> > If I run that query I got this output ( I have removed some of the
> > fields) despite the distinct clause:
>
> > QC Q&A  | www.xxx.com | 44281
> > QC Q&A  | www.xxx.com | 44281
> > WhyMAX? |             | 44285
>
> Since you removed some fields, no one can tell if this output
> is wrong or not.

Tom,

Thanks for the answer and apologizes for the lack of information.
I think you found the problem already (see bellow). In anycase, just
to clarify:

This is the information about the machine/OS/Psql version:

1. Psql 8.0.4 - Linux 2.6.13

Now, This is the first query I tried:

            SELECT
                distinct ar.title,
                ar.raw_data,
                ar.upload_dt,
                ar.artifact_id,
                g.group_id,
                acl.read
            FROM
                artifact_acl acl,
                artifacts a,
                artifact_revisions ar,
                revisions_to_types rt,
                artifact_types at,
                groups g
            WHERE
                a.expire_dt > NOW() and
                acl.artifact_id = a.artifact_id and
                a.published_revision = ar.revision_id and
                ar.revision_id = rt.revision_id and
                rt.type_id = at.type_id and
                acl.group_id = g.group_id and
                a.suppress = false and
                at.is_resource = true and
                (
                acl.group_id = (
                                select
                                    group_id
                                from
                                    groups
                                where
                                    group_nm = 'Marketing'
                               )
                               OR
                acl.group_id = (
                                select
                                    group_id
                                from
                                    groups
                                where
                                    group_nm = 'Communicators'
                               )
                ) and
                acl.group_id IN (4,17,54,2,1,123) and
                acl.read = true and
                ((g.back_or_front = 'front') or
                (g.group_nm = 'PR Admin'))
                ORDER BY
                    ar.upload_dt DESC
                LIMIT 3;

and here you have the output:

-[ RECORD 1 ]---------------------------
title       | QC Q&A
raw_data    | www.com
upload_dt   | 2006-02-09 11:15:04.724525
artifact_id | 44281
group_id    | 2
read        | t
-[ RECORD 2 ]---------------------------
title       | QC Q&A
raw_data    | www.com
upload_dt   | 2006-02-09 11:15:04.724525
artifact_id | 44281
group_id    | 54
read        | t
-[ RECORD 3 ]---------------------------
title       | WhyMAX?
raw_data    |
upload_dt   | 2006-02-09 09:25:27.717663
artifact_id | 44061
group_id    | 2
read        | t


Now, I tried this query, which is the first one but removing the g.group_id
from the select clause. This is the output:

-[ RECORD 1 ]-----------------------------
title       | QC Q&A
raw_data    | www.com
upload_dt   | 2006-02-09 11:15:04.724525
artifact_id | 44281
read        | t
-[ RECORD 2 ]----------------------------
title       | WhyMAX?
raw_data    |
upload_dt   | 2006-02-09 09:25:27.717663
artifact_id | 44061
read        | t
-[ RECORD 3 ]---------------------------
title       | Business Assets-test
raw_data    | corpcomm.com/
upload_dt   | 2006-02-08 15:58:06.81578
artifact_id | 44280
read        | t

Which is the desired output.

> >             SELECT
> >                 distinct ar.title,
> >                 ar.raw_data,
> >                 ar.upload_dt,
> >                 ar.artifact_id,
> >                 g.group_id,
> >                 acl.read
> >             FROM
>
> The way you formatted that makes me wonder if you think that the
> DISTINCT applies only to the first column.  It does not, it applies
> to all the columns together --- that is, it only removes rows that
> are identical in all columns to some other row.  So if there were
> some rows that were identical except for group_id, you'd get the
> behavior you described.

Yes, that was the reason, I thought that distinct was only applying
to the first column.

> BTW, I'm not sure I believe this is actually the same query you
> ran.  The presence of the "GROUP BY ar.title" clause should have
> provoked errors about ungrouped columns.  If this is an exact
> copy of what you did, what Postgres version is this?

It wasn't the same query. Forget about the first email, in this email
you have the queries I launched (sorry again about that).

Now I have to redefine my query because I want to get the second
output but keeping the group_id. Ideas and suggestions are welcome.

Thanks again your help,

David



pgsql-general by date:

Previous
From: David Fetter
Date:
Subject: Re: Insert more than one t-uple in a single sql
Next
From: "Karl O. Pinc"
Date:
Subject: Re: Request to have VACUUM ignore cost based limits