Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers
From | Andrew Gierth |
---|---|
Subject | Re: Final Patch for GROUPING SETS |
Date | |
Msg-id | 874mszsy9g.fsf@news-spur.riddles.org.uk Whole thread Raw |
In response to | Re: Final Patch for GROUPING SETS (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: With the high-priority questions out of the way, time to tackle the rest: Tom> My single biggest complaint is about the introduction of structTom> GroupedVar. If we stick with that, we're goingto have to teachTom> an extremely large number of places that know about Vars to alsoTom> know about GroupedVars. Thiswill result in code bloat andTom> errors of omission. If you think the latter concern isTom> hypothetical, note thatyou can't get 40 lines into gsp1.patchTom> without finding such an omission, namely the patch fails toTom> teach pg_stat_statements.cabout GroupedVars. (That also pointsTom> up that some of the errors of omission will be in third-partyTom>code that we can't fix easily.) Except that GroupedVar is created only late in planning, and so only a small proportion of places need to know about it (and certainly pg_stat_statements does not). It also can't end up attached to any foreign scan or otherwise potentially third-party plan node. Tom> I think you should get rid of that concept and instead implementTom> the behavior by having nodeAgg.c set the relevantfields of theTom> representative tuple slot to NULL, so that a regular Var doesTom> the right thing. We did consider that. Messing with the null flags of the slot didn't seem like an especially clean approach. But if that's how you want it... Tom> I don't really have any comments on the algorithms yet, havingTom> spent too much time trying to figure out underdocumenteddataTom> structures to get to the algorithms. However, noting theTom> addition of list_intersection_int()made me wonder whether you'dTom> not be better off reducing the integer lists to bitmapsets a lotTom>sooner, perhaps even at parse analysis. list_intersection_int should not be time-critical; common queries do not call it at all (simple cube or rollup clauses always have an empty grouping set, causing the intersection test to bail immediately), and in pathological worst-case constructions like putting a dozen individually grouped columns in front of a 12-d cube (thus calling it 4096 times on lists at least 12 nodes long) it doesn't account for more than a small percentage even with optimization off and debugging and asserts on. The code uses the list representation almost everywhere in parsing and planning because in some places the order of elements matters, and I didn't want to keep swapping between a bitmap and a list representation. (We _do_ use bitmapsets where we're potentially going to be doing an O(N^2) number of subset comparisons to build the graph adjacency list for computing the minimal set of sort operations, and at execution time.) I didn't even consider using bitmaps for the output of parse analysis because at that stage we want to preserve most of the original query substructure (otherwise view deparse won't look anything like the original query did). Tom> list_intersection_int() is going to be O(N^2) by nature. MaybeTom> N can't get large enough to matter in this context,but I do seeTom> places that seem to be concerned about performance. My main feeling on performance is that simple cube and rollup clauses or short lists of grouping sets should parse and plan very quickly; more complex cases should parse and plan fast enough that execution time on any nontrivial input will swamp the parse/plan time; and the most complex cases that aren't outright rejected should plan in no more than a few seconds extra. (We're limiting to 4096 grouping sets in any query level, which is comparable to other databases and seems quite excessively high compared to what people are actually likely to need.) (don't be fooled by the excessive EXPLAIN time on some queries. There are performance issues in EXPLAIN output generation that have nothing to do with this patch, and which I've not pinned down.) -- Andrew (irc:RhodiumToad)
pgsql-hackers by date: