Thread: VIEWS, DISTINCT and COUNT

VIEWS, DISTINCT and COUNT

From
Brian Hirt
Date:
Okay,

I'm used to a feature that allows combining count and distinct in Sybase. 
I thought it was standard SQL and expected to see it in Postgres.  Whatever
the case might be, postgres does not seem to support this.  I keep running
into queries that I cannot write.  Here's the skinny:

"select count(distinct id) from table" is not supported.  Getting this
information without count(distinct id) support is a pain and always seems
to require creating temporary table and running queries later.  My first
solution was to create a view that just selected the distinct colmuns that
I was interested in and then do a count on that table.  This too seems
impossible.

For both count(distinct) and distinct in views, I have this question:  Is 
this something that needs to be supported but just never got implemented?
Or, is it something that was conciously excluded?  If nobody is working on 
these, I may take a look them and pick the easier to implement.  What little
I know about the way postgres works, I expect the first one would be esier.


-- 
The world's most ambitious and comprehensive PC game database project.
                     http://www.mobygames.com


Re: [HACKERS] VIEWS, DISTINCT and COUNT

From
"Oliver Elphick"
Date:
Brian Hirt wrote: >Okay, > >I'm used to a feature that allows combining count and distinct in Sybase.  >I thought it
wasstandard SQL and expected to see it in Postgres.  Whatever >the case might be, postgres does not seem to support
this. I keep running >into queries that I cannot write.  Here's the skinny: > >"select count(distinct id) from table"
isnot supported. 
 

I'm not convinced I understand what your query would do, but it sounds as
if you need to use GROUP BY.  For example:

lfix=> select custid, count(custid) from invoice group by custid;
custid  |count
--------+-----
ACECS   |    1
ADG     |    8
FALKIRK |    1
JEFSYS  |   25
SOLPORT |   15
(5 rows)

lfix=> select count(*) from invoice;
count
-----  50
(1 row)

Is that what you want to achieve?

--      Vote against SPAM: http://www.politik-digital.de/spam/                ========================================
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver              PGP key from public servers; key
ID32B8FAA1                ========================================    "Lo, children are an heritage of the LORD; and
the      fruit of the womb is his reward."        Psalms 127:3 
 




Re: [HACKERS] VIEWS, DISTINCT and COUNT

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> "select count(distinct id) from table" is not supported.

Yup.  It's on the TODO list:* Allow COUNT(DISTINCT col)

> For both count(distinct) and distinct in views, I have this question:  Is 
> this something that needs to be supported but just never got implemented?

I'm not sure what Jan has in mind for views, but certainly
aggregate(DISTINCT ...) is an SQL-standard feature that we ought to
support.  I don't think it's a simple addition though :-(
        regards, tom lane


Re: [HACKERS] VIEWS, DISTINCT and COUNT

From
wieck@debis.com (Jan Wieck)
Date:
>
> Brian Hirt <bhirt@mobygames.com> writes:
> > "select count(distinct id) from table" is not supported.
>
> Yup.  It's on the TODO list:
>    * Allow COUNT(DISTINCT col)
>
> > For both count(distinct) and distinct in views, I have this question:  Is
> > this something that needs to be supported but just never got implemented?
>
> I'm not sure what Jan has in mind for views, but certainly
> aggregate(DISTINCT ...) is an SQL-standard feature that we ought to
> support.  I don't think it's a simple addition though :-(

    All  these  DISTINCT,  AGGREGATE  etc.  problems on views are
    based on the fact, that the planner still requires  that  the
    rewriters output is equivalent to a regular, allowed query.

    I  would  like to be able to place a complete querytree (just
    an entire SELECT's Query node)  into  a  range  table  entry.
    AFAIK,  from  the  callers  point  of  view there is not much
    difference between the join-, group-, sort-,  aggregate-  and
    scan-nodes.   They are all just nodes returing some amount of
    tuples. All of them could be the toplevel executor node of  a
    SELECT - just something returning tuples.

    Unfortunately my knowledge in the planner is very limited, so
    I would need help to go for it. Who has that knowledge?

    The basic idea is this:

    Let's have a view defined as

        CREATE VIEW v1 AS SELECT a, count(*) AS n FROM t1 GROUP BY a;

    The plan for such a query would be a

        Aggregate
          ->  Group
            ->  Sort
              ->  Seq Scan on t1

    Thus doing a

        SELECT t2.a, v1.n FROM t2, v1 WHERE t2.a = v1.a;

    could finally result in a

        Merge Join
          ->  Sort
            ->  Seq Scan on t2
          ->  Sort
            ->  Aggregate
              ->  Group
                ->  Sort
                  ->  Seq Scan on t1

    It's impossible to  cause  such  an  execution  plan  from  a
    standard  SQL  statement. But why should it be impossible for
    the rewriter too? If v1 where a regular table (not  a  view),
    the generated plan would have been a

        Merge Join
          ->  Sort
            ->  Seq Scan on t2
          ->  Sort
             ->  Seq Scan on v1

    so  oviously the only difference is that the scan over the v1
    relation has been replaced by the more complicated  plan  for
    the  plain  definition  of  the view. If the planner could do
    that, I think we would get rid of  all  the  limitations  for
    views very soon.

    Again,  who  knows  enough about the planner to be able to do
    this kind of stuff?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] VIEWS, DISTINCT and COUNT

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     All  these  DISTINCT,  AGGREGATE  etc.  problems on views are
>     based on the fact, that the planner still requires  that  the
>     rewriters output is equivalent to a regular, allowed query.

Right, and there's no good reason for that.

>     I  would  like to be able to place a complete querytree (just
>     an entire SELECT's Query node)  into  a  range  table  entry.

I've been saying for some time that the parser ought to emit something
close to a plan-tree representation --- not committing to a particular
query implementation method, of course, but nonetheless a tree of
query nodes.  The planner wouldn't find that any harder to work on
than what it gets now.  The executor might need some work, but
probably not much.

>     Unfortunately my knowledge in the planner is very limited, so
>     I would need help to go for it. Who has that knowledge?

I know enough to be dangerous, and so does Bruce.  Do you think there
is time to attack this for 7.0, or should we leave well enough alone
for now?

>     Let's have a view defined as

>         CREATE VIEW v1 AS SELECT a, count(*) AS n FROM t1 GROUP BY a;

>     The plan for such a query would be a

>         Aggregate
>           -> Group
>             -> Sort
>               -> Seq Scan on t1

Not necessarily --- the aggregate and group nodes must be there, but
we don't want to commit to seqscan&sort vs. indexscan sooner than we
have to.  I think what's needed here is some notion of an abstract
plan tree.  The trick is to pick the right level of abstraction.
Maybe "Aggregate -> Group -> OrderedTupleSource" is the way to think
about it.

But your end point is valid: we want to be able to make a structure
like that be an input to a higher-level plan tree.  This is also
necessary for subselect in FROM clause, isn't it?

>     Again,  who  knows  enough about the planner to be able to do
>     this kind of stuff?

I could take it on, but I have a lot of other stuff I want to do for
7.0.  Is this more important than fixing fmgr or improving the
planner's selectivity estimates?  I dunno...
        regards, tom lane


Re: [HACKERS] VIEWS, DISTINCT and COUNT

From
Bruce Momjian
Date:
> Not necessarily --- the aggregate and group nodes must be there, but
> we don't want to commit to seqscan&sort vs. indexscan sooner than we
> have to.  I think what's needed here is some notion of an abstract
> plan tree.  The trick is to pick the right level of abstraction.
> Maybe "Aggregate -> Group -> OrderedTupleSource" is the way to think
> about it.
> 
> But your end point is valid: we want to be able to make a structure
> like that be an input to a higher-level plan tree.  This is also
> necessary for subselect in FROM clause, isn't it?
> 
> >     Again,  who  knows  enough about the planner to be able to do
> >     this kind of stuff?
> 
> I could take it on, but I have a lot of other stuff I want to do for
> 7.0.  Is this more important than fixing fmgr or improving the
> planner's selectivity estimates?  I dunno...

Let me make a comment.  Seems like a whole host of problems will be
fixed by this overhaul, but none of the problems is major.

Jan's foreign key support, Vadim's WAL, and Tom Lane's cleanups are of
major importance for 7.0, so it seems we better focus on those, and if
we have time before 7.0, and all people involved have time, we can take
on that work.  We will need to have most of us available to discuss and
merge the changes into all the affected areas.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] VIEWS, DISTINCT and COUNT

From
Thomas Lockhart
Date:
> > I could take it on, but I have a lot of other stuff I want to do for
> > 7.0.  Is this more important than fixing fmgr or improving the
> > planner's selectivity estimates?  I dunno...
> Jan's foreign key support, Vadim's WAL, and Tom Lane's cleanups are of
> major importance for 7.0, so it seems we better focus on those, and if
> we have time before 7.0, and all people involved have time, we can take
> on that work.  We will need to have most of us available to discuss and
> merge the changes into all the affected areas.

Outer joins will likely require this. So far, I'm just working on the
join *syntax*, and (although stalled for the last week or two) will be
touching the rte structure to support table and column aliases in the
join syntax. But to move to outer joins, I need to be able to tie two
rte's together, which will be easier to do if I'm allowed to include a
query tree (which would, in turn, include rte's for the join tables).

So, we get join syntax in 7.0 without major parser changes. Not sure
we can get outer joins without more, which is required for Jan to go
farther too...
                     - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California