Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id 874mszsy9g.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

With the high-priority questions out of the way, time to tackle the
rest:
Tom> My single biggest complaint is about the introduction of structTom> GroupedVar.  If we stick with that, we're
goingto have to teachTom> an extremely large number of places that know about Vars to alsoTom> know about GroupedVars.
Thiswill result in code bloat andTom> errors of omission.  If you think the latter concern isTom> hypothetical, note
thatyou can't get 40 lines into gsp1.patchTom> without finding such an omission, namely the patch fails toTom> teach
pg_stat_statements.cabout GroupedVars.  (That also pointsTom> up that some of the errors of omission will be in
third-partyTom>code that we can't fix easily.)
 

Except that GroupedVar is created only late in planning, and so only a
small proportion of places need to know about it (and certainly
pg_stat_statements does not). It also can't end up attached to any
foreign scan or otherwise potentially third-party plan node.
Tom> I think you should get rid of that concept and instead implementTom> the behavior by having nodeAgg.c set the
relevantfields of theTom> representative tuple slot to NULL, so that a regular Var doesTom> the right thing.
 

We did consider that. Messing with the null flags of the slot didn't
seem like an especially clean approach. But if that's how you want
it...
Tom> I don't really have any comments on the algorithms yet, havingTom> spent too much time trying to figure out
underdocumenteddataTom> structures to get to the algorithms.  However, noting theTom> addition of
list_intersection_int()made me wonder whether you'dTom> not be better off reducing the integer lists to bitmapsets a
lotTom>sooner, perhaps even at parse analysis.
 

list_intersection_int should not be time-critical; common queries do
not call it at all (simple cube or rollup clauses always have an empty
grouping set, causing the intersection test to bail immediately), and
in pathological worst-case constructions like putting a dozen
individually grouped columns in front of a 12-d cube (thus calling it
4096 times on lists at least 12 nodes long) it doesn't account for
more than a small percentage even with optimization off and debugging
and asserts on.

The code uses the list representation almost everywhere in parsing and
planning because in some places the order of elements matters, and I
didn't want to keep swapping between a bitmap and a list
representation.

(We _do_ use bitmapsets where we're potentially going to be doing an
O(N^2) number of subset comparisons to build the graph adjacency
list for computing the minimal set of sort operations, and at
execution time.)

I didn't even consider using bitmaps for the output of parse analysis
because at that stage we want to preserve most of the original query
substructure (otherwise view deparse won't look anything like the
original query did).
Tom> list_intersection_int() is going to be O(N^2) by nature.  MaybeTom> N can't get large enough to matter in this
context,but I do seeTom> places that seem to be concerned about performance.
 

My main feeling on performance is that simple cube and rollup clauses
or short lists of grouping sets should parse and plan very quickly;
more complex cases should parse and plan fast enough that execution
time on any nontrivial input will swamp the parse/plan time; and the
most complex cases that aren't outright rejected should plan in no
more than a few seconds extra. (We're limiting to 4096 grouping sets
in any query level, which is comparable to other databases and seems
quite excessively high compared to what people are actually likely to
need.)

(don't be fooled by the excessive EXPLAIN time on some queries. There
are performance issues in EXPLAIN output generation that have nothing
to do with this patch, and which I've not pinned down.)

-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Next
From: Michael Paquier
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes