Re: [HACKERS] VIEWS, DISTINCT and COUNT - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] VIEWS, DISTINCT and COUNT
Date
Msg-id m11jE3n-0003kLC@orion.SAPserv.Hamburg.dsh.de
Whole thread Raw
In response to Re: [HACKERS] VIEWS, DISTINCT and COUNT  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] VIEWS, DISTINCT and COUNT
List pgsql-hackers
>
> Brian Hirt <bhirt@mobygames.com> writes:
> > "select count(distinct id) from table" is not supported.
>
> Yup.  It's on the TODO list:
>    * Allow COUNT(DISTINCT col)
>
> > For both count(distinct) and distinct in views, I have this question:  Is
> > this something that needs to be supported but just never got implemented?
>
> I'm not sure what Jan has in mind for views, but certainly
> aggregate(DISTINCT ...) is an SQL-standard feature that we ought to
> support.  I don't think it's a simple addition though :-(

    All  these  DISTINCT,  AGGREGATE  etc.  problems on views are
    based on the fact, that the planner still requires  that  the
    rewriters output is equivalent to a regular, allowed query.

    I  would  like to be able to place a complete querytree (just
    an entire SELECT's Query node)  into  a  range  table  entry.
    AFAIK,  from  the  callers  point  of  view there is not much
    difference between the join-, group-, sort-,  aggregate-  and
    scan-nodes.   They are all just nodes returing some amount of
    tuples. All of them could be the toplevel executor node of  a
    SELECT - just something returning tuples.

    Unfortunately my knowledge in the planner is very limited, so
    I would need help to go for it. Who has that knowledge?

    The basic idea is this:

    Let's have a view defined as

        CREATE VIEW v1 AS SELECT a, count(*) AS n FROM t1 GROUP BY a;

    The plan for such a query would be a

        Aggregate
          ->  Group
            ->  Sort
              ->  Seq Scan on t1

    Thus doing a

        SELECT t2.a, v1.n FROM t2, v1 WHERE t2.a = v1.a;

    could finally result in a

        Merge Join
          ->  Sort
            ->  Seq Scan on t2
          ->  Sort
            ->  Aggregate
              ->  Group
                ->  Sort
                  ->  Seq Scan on t1

    It's impossible to  cause  such  an  execution  plan  from  a
    standard  SQL  statement. But why should it be impossible for
    the rewriter too? If v1 where a regular table (not  a  view),
    the generated plan would have been a

        Merge Join
          ->  Sort
            ->  Seq Scan on t2
          ->  Sort
             ->  Seq Scan on v1

    so  oviously the only difference is that the scan over the v1
    relation has been replaced by the more complicated  plan  for
    the  plain  definition  of  the view. If the planner could do
    that, I think we would get rid of  all  the  limitations  for
    views very soon.

    Again,  who  knows  enough about the planner to be able to do
    this kind of stuff?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] PostgreSQL 6.5.3 built, but not released ...
Next
From: The Hermit Hacker
Date:
Subject: Re: [HACKERS] PostgreSQL 6.5.3 built, but not released ...