Thread: VIEWS, DISTINCT and COUNT
Okay, I'm used to a feature that allows combining count and distinct in Sybase. I thought it was standard SQL and expected to see it in Postgres. Whatever the case might be, postgres does not seem to support this. I keep running into queries that I cannot write. Here's the skinny: "select count(distinct id) from table" is not supported. Getting this information without count(distinct id) support is a pain and always seems to require creating temporary table and running queries later. My first solution was to create a view that just selected the distinct colmuns that I was interested in and then do a count on that table. This too seems impossible. For both count(distinct) and distinct in views, I have this question: Is this something that needs to be supported but just never got implemented? Or, is it something that was conciously excluded? If nobody is working on these, I may take a look them and pick the easier to implement. What little I know about the way postgres works, I expect the first one would be esier. -- The world's most ambitious and comprehensive PC game database project. http://www.mobygames.com
Brian Hirt wrote: >Okay, > >I'm used to a feature that allows combining count and distinct in Sybase. >I thought it wasstandard SQL and expected to see it in Postgres. Whatever >the case might be, postgres does not seem to support this. I keep running >into queries that I cannot write. Here's the skinny: > >"select count(distinct id) from table" isnot supported. I'm not convinced I understand what your query would do, but it sounds as if you need to use GROUP BY. For example: lfix=> select custid, count(custid) from invoice group by custid; custid |count --------+----- ACECS | 1 ADG | 8 FALKIRK | 1 JEFSYS | 25 SOLPORT | 15 (5 rows) lfix=> select count(*) from invoice; count ----- 50 (1 row) Is that what you want to achieve? -- Vote against SPAM: http://www.politik-digital.de/spam/ ======================================== Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver PGP key from public servers; key ID32B8FAA1 ======================================== "Lo, children are an heritage of the LORD; and the fruit of the womb is his reward." Psalms 127:3
Brian Hirt <bhirt@mobygames.com> writes: > "select count(distinct id) from table" is not supported. Yup. It's on the TODO list:* Allow COUNT(DISTINCT col) > For both count(distinct) and distinct in views, I have this question: Is > this something that needs to be supported but just never got implemented? I'm not sure what Jan has in mind for views, but certainly aggregate(DISTINCT ...) is an SQL-standard feature that we ought to support. I don't think it's a simple addition though :-( regards, tom lane
> > Brian Hirt <bhirt@mobygames.com> writes: > > "select count(distinct id) from table" is not supported. > > Yup. It's on the TODO list: > * Allow COUNT(DISTINCT col) > > > For both count(distinct) and distinct in views, I have this question: Is > > this something that needs to be supported but just never got implemented? > > I'm not sure what Jan has in mind for views, but certainly > aggregate(DISTINCT ...) is an SQL-standard feature that we ought to > support. I don't think it's a simple addition though :-( All these DISTINCT, AGGREGATE etc. problems on views are based on the fact, that the planner still requires that the rewriters output is equivalent to a regular, allowed query. I would like to be able to place a complete querytree (just an entire SELECT's Query node) into a range table entry. AFAIK, from the callers point of view there is not much difference between the join-, group-, sort-, aggregate- and scan-nodes. They are all just nodes returing some amount of tuples. All of them could be the toplevel executor node of a SELECT - just something returning tuples. Unfortunately my knowledge in the planner is very limited, so I would need help to go for it. Who has that knowledge? The basic idea is this: Let's have a view defined as CREATE VIEW v1 AS SELECT a, count(*) AS n FROM t1 GROUP BY a; The plan for such a query would be a Aggregate -> Group -> Sort -> Seq Scan on t1 Thus doing a SELECT t2.a, v1.n FROM t2, v1 WHERE t2.a = v1.a; could finally result in a Merge Join -> Sort -> Seq Scan on t2 -> Sort -> Aggregate -> Group -> Sort -> Seq Scan on t1 It's impossible to cause such an execution plan from a standard SQL statement. But why should it be impossible for the rewriter too? If v1 where a regular table (not a view), the generated plan would have been a Merge Join -> Sort -> Seq Scan on t2 -> Sort -> Seq Scan on v1 so oviously the only difference is that the scan over the v1 relation has been replaced by the more complicated plan for the plain definition of the view. If the planner could do that, I think we would get rid of all the limitations for views very soon. Again, who knows enough about the planner to be able to do this kind of stuff? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: > All these DISTINCT, AGGREGATE etc. problems on views are > based on the fact, that the planner still requires that the > rewriters output is equivalent to a regular, allowed query. Right, and there's no good reason for that. > I would like to be able to place a complete querytree (just > an entire SELECT's Query node) into a range table entry. I've been saying for some time that the parser ought to emit something close to a plan-tree representation --- not committing to a particular query implementation method, of course, but nonetheless a tree of query nodes. The planner wouldn't find that any harder to work on than what it gets now. The executor might need some work, but probably not much. > Unfortunately my knowledge in the planner is very limited, so > I would need help to go for it. Who has that knowledge? I know enough to be dangerous, and so does Bruce. Do you think there is time to attack this for 7.0, or should we leave well enough alone for now? > Let's have a view defined as > CREATE VIEW v1 AS SELECT a, count(*) AS n FROM t1 GROUP BY a; > The plan for such a query would be a > Aggregate > -> Group > -> Sort > -> Seq Scan on t1 Not necessarily --- the aggregate and group nodes must be there, but we don't want to commit to seqscan&sort vs. indexscan sooner than we have to. I think what's needed here is some notion of an abstract plan tree. The trick is to pick the right level of abstraction. Maybe "Aggregate -> Group -> OrderedTupleSource" is the way to think about it. But your end point is valid: we want to be able to make a structure like that be an input to a higher-level plan tree. This is also necessary for subselect in FROM clause, isn't it? > Again, who knows enough about the planner to be able to do > this kind of stuff? I could take it on, but I have a lot of other stuff I want to do for 7.0. Is this more important than fixing fmgr or improving the planner's selectivity estimates? I dunno... regards, tom lane
> Not necessarily --- the aggregate and group nodes must be there, but > we don't want to commit to seqscan&sort vs. indexscan sooner than we > have to. I think what's needed here is some notion of an abstract > plan tree. The trick is to pick the right level of abstraction. > Maybe "Aggregate -> Group -> OrderedTupleSource" is the way to think > about it. > > But your end point is valid: we want to be able to make a structure > like that be an input to a higher-level plan tree. This is also > necessary for subselect in FROM clause, isn't it? > > > Again, who knows enough about the planner to be able to do > > this kind of stuff? > > I could take it on, but I have a lot of other stuff I want to do for > 7.0. Is this more important than fixing fmgr or improving the > planner's selectivity estimates? I dunno... Let me make a comment. Seems like a whole host of problems will be fixed by this overhaul, but none of the problems is major. Jan's foreign key support, Vadim's WAL, and Tom Lane's cleanups are of major importance for 7.0, so it seems we better focus on those, and if we have time before 7.0, and all people involved have time, we can take on that work. We will need to have most of us available to discuss and merge the changes into all the affected areas. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > I could take it on, but I have a lot of other stuff I want to do for > > 7.0. Is this more important than fixing fmgr or improving the > > planner's selectivity estimates? I dunno... > Jan's foreign key support, Vadim's WAL, and Tom Lane's cleanups are of > major importance for 7.0, so it seems we better focus on those, and if > we have time before 7.0, and all people involved have time, we can take > on that work. We will need to have most of us available to discuss and > merge the changes into all the affected areas. Outer joins will likely require this. So far, I'm just working on the join *syntax*, and (although stalled for the last week or two) will be touching the rte structure to support table and column aliases in the join syntax. But to move to outer joins, I need to be able to tie two rte's together, which will be easier to do if I'm allowed to include a query tree (which would, in turn, include rte's for the join tables). So, we get join syntax in 7.0 without major parser changes. Not sure we can get outer joins without more, which is required for Jan to go farther too... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California