Re: Question: test "aggregates" failed in 32-bit machine - Mailing list pgsql-hackers
From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Question: test "aggregates" failed in 32-bit machine
Msg-id: 674544.1664566802@sss.pgh.pa.us
In response to: Re: Question: test "aggregates" failed in 32-bit machine (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
I wrote:
> Given the previous complaints about db0d67db2, I wonder if it's not
> most prudent to revert it.  I doubt we are going to get satisfactory
> behavior out of it until there's fairly substantial improvements in
> all these underlying estimates.

After spending some more time looking at the code, I think that that is
something we absolutely have to discuss.  I already complained at [1]
about how db0d67db2 made very significant changes in sort cost
estimation behavior, which seem likely to result in significant
user-visible plan changes that might or might not be for the better.
But I hadn't read any of the code at that point.  Now I have, and
frankly it's not ready for prime time.  Beyond the question of whether
we have sufficiently accurate input values, I see these issues in and
around compute_cpu_sort_cost():

1. The algorithm is said to be based on Sedgewick & Bentley 2002 [2].
I have the highest regard for those two gentlemen, so I'm quite
prepared to believe that their estimate of the number of comparisons
used by Quicksort is good.  However, the expression given in our
comments:

 * log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))

doesn't look much like anything they wrote.  More, what we're actually
doing is

 * We assume all Xi the same because now we don't have any estimation of
 * group sizes, we have only know the estimate of number of groups (distinct
 * values).  In that case, formula becomes:
 *
 *      N * log(NumberOfGroups)

That's a pretty drastic simplification.  No argument is given as to why
that's still reliable enough to be useful for the purposes to which
this code tries to put it --- especially when you consider that
real-world data is more likely to follow Zipf's law than have uniform
group sizes.
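How far apart the two formulas can get under skewed data is easy to check numerically.  The following is a standalone sketch, not PostgreSQL code; the group counts and the Zipf-ish size distribution are made-up illustrations.  (For uniform group sizes Xi = N/G, sum(Xi * log(N/Xi)) collapses exactly to N * log(G); for anything skewed, the simplified formula overestimates, since sum(Xi * log(N/Xi)) = N * H(p) and entropy is maximized by the uniform distribution.)

```python
import math

def exact_est(group_sizes):
    # sum(Xi * log2(N / Xi)), the approximation to log2(N! / prod(Xi!))
    n = sum(group_sizes)
    return sum(x * math.log2(n / x) for x in group_sizes)

def simplified_est(n, num_groups):
    # N * log2(G): what the code uses, assuming all groups the same size
    return n * math.log2(num_groups)

g = 1000                                   # number of distinct groups
uniform = [1000] * g                       # 1e6 rows, equal-size groups
zipf = [round(1000 * g / (r + 1)) for r in range(g)]  # size ~ 1/rank

n_u, n_z = sum(uniform), sum(zipf)
print(exact_est(uniform), simplified_est(n_u, g))  # identical for uniform groups
print(exact_est(zipf), simplified_est(n_z, g))     # simplified formula is noticeably larger
```

With these made-up numbers the simplified estimate comes out roughly a quarter higher than the per-group sum for the Zipf-ish distribution, which is the kind of systematic error the text above is worried about.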
If you're going to go as far as doing this:

 * For multi-column sorts we need to estimate the number of comparisons for
 * each individual column - for example with columns (c1, c2, ..., ck) we
 * can estimate that number of comparisons on ck is roughly
 * ncomparisons(c1, c2, ..., ck) / ncomparisons(c1, c2, ..., c(k-1))

you'd better pray that your number-of-comparisons estimates are pretty
darn good, or what you're going to get out is going to be mostly
fiction.

2. Sedgewick & Bentley analyzed a specific version of Quicksort, which
is ... um ... not the version we are using.  It doesn't look to me like
the choice of partitioning element is the same.  Maybe that doesn't
matter much in the end, but there's sure no discussion of the point in
this patch.

So at this point I've lost all faith in the estimates being meaningful
at all.  And that's assuming that the simplified algorithm is
implemented accurately, which it is not:

3. totalFuncCost is started off at 1.0.  Surely that should be zero?
If not, at least a comment to justify it would be nice.

4. The code around the add_function_cost call evidently wants to carry
the procost lookup result from one column to the next, because it
skips the lookup when prev_datatype == em->em_datatype.  However, the
value of funcCost isn't carried across columns, because it's local to
the loop.  The effect of this is that anyplace where adjacent GROUP BY
columns are of the same datatype, we'll use the fixed 1.0 value of
funcCost instead of looking up the real procost.  Admittedly, since
the real procost is probably also 1.0, this might not mean much in
practice.  Nonetheless it's broken code.  (Oh, btw: I doubt that using
add_function_cost rather than raw procost is of any value whatsoever
if you're just going to pass it a NULL node tree.)

5. I'm pretty dubious about the idea that we can use the rather-random
first element of the EquivalenceClass to determine the datatype that
will be compared, much less the average widths of the columns.
It's entirely possible for an EC to contain both int4 and int8 vars,
or text vars of substantially different average widths.  I think we
really need to be going back to the original GroupClauses and looking
at the variables named there.

6. Worse than that, we're also using the first element of the
EquivalenceClass to calculate the number of groups of this sort key.
This is FLAT OUT WRONG, as certainly different EC members can have
very different stats.

7. The code considers that presorted-key columns do not add to the
comparison costs, yet the comment about it claims the opposite:

	/*
	 * Presorted keys are not considered in the cost above, but we still
	 * do have to compare them in the qsort comparator. So make sure to
	 * factor in the cost in that case.
	 */
	if (i >= nPresortedKeys)
	{

I'm not entirely sure whether the code is broken or the comment is,
but at least one of them is.  I'm also pretty confused about why we
still add such columns' comparison functions to the running
totalFuncCost if we think they're not sorted on.

8. In the case complained of to start this thread, we're unable to
perceive any sort-cost difference between "p, d, c, v" and "p, c, d,
v", which is a little surprising because that test case sets up c with
twice as many distinct values as d.  Other things being equal (which
they are, because both columns are int4), surely the latter key
ordering should be favored in hopes of reducing the number of times we
have to compare the third column.  But it's not.  I think that this
can probably be blamed on the early-exit condition at the bottom of
the loop:

		/*
		 * Once we get single-row group, it means tuples in the group are
		 * unique and we can skip all remaining columns.
		 */
		if (tuplesPerPrevGroup <= 1.0)
			break;

Ordering on p already gets us down to 2 tuples per group, so pretty
much any of the other columns as second grouping column will compute a
next group size of 1, and then we don't consider columns beyond that.

9. The is_fake_var() hackery is pretty sad.
We should have found a better solution than that.  Maybe
estimate_num_groups() needs more work.

10. As I already mentioned, get_width_cost_multiplier() doesn't appear
to have any foundation in reality; or if it does, the comments sure
provide no justification for these particular equations rather than
some other ones.  The shakiness of the logic can be inferred
immediately from the fact that the header comment is fundamentally
confused about what it's doing:

 * Return value is in cpu_operator_cost units.

No it isn't, it's a pure ratio.

In short, I think the claim that this code provides better sort cost
estimates than we had before is quite unjustified.  Maybe it could get
there eventually, but I do not want to ship v15 with this.  I think we
ought to revert all the changes around cost_sort.  Perhaps we could
salvage the GROUP BY changes by just ordering the columns by
decreasing number of groups, which is the only component of the
current cost estimation that I think has any detectable connection to
reality.  But I suspect the RMT will favor just reverting the whole
thing for v15.

			regards, tom lane

[1] https://www.postgresql.org/message-id/3242058.1659563057%40sss.pgh.pa.us
[2] The URL given in the code doesn't work anymore, but this does:
https://sedgewick.io/wp-content/uploads/2022/03/2002QuicksortIsOptimal.pdf
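The early-exit effect described in point 8 can be reproduced with a toy model of the group-size recurrence.  This is a sketch, not the actual PostgreSQL code, and the row count and per-column distinct counts below are made up rather than taken from the failing test case; tuplesPerPrevGroup is modeled as simply dividing by each column's number of distinct values.

```python
# Toy model of the per-column loop in compute_cpu_sort_cost() (point 8):
# once the running tuples-per-group estimate drops to 1, the loop breaks,
# so any later columns never contribute to the cost estimate at all.

def columns_considered(ntuples, distinct_counts):
    considered = []
    tuples_per_group = float(ntuples)
    for name, ndistinct in distinct_counts:
        considered.append(name)
        tuples_per_group /= ndistinct
        if tuples_per_group <= 1.0:
            break          # early exit: remaining columns are ignored
    return considered

# p alone leaves ~2 tuples per group, so any second column drives the
# estimate to <= 1 and the loop stops there.
stats = {"p": 500, "d": 10, "c": 20, "v": 50}   # hypothetical distinct counts
order1 = [(k, stats[k]) for k in ("p", "d", "c", "v")]
order2 = [(k, stats[k]) for k in ("p", "c", "d", "v")]
print(columns_considered(1000, order1))  # ['p', 'd']
print(columns_considered(1000, order2))  # ['p', 'c']
```

Both orderings stop after the second column, so the difference in distinctness between c and d in the third position never enters the cost, and the two key orders come out looking identical.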