Re: Parallel append plan instability/randomness - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Parallel append plan instability/randomness |
Date | |
Msg-id | 12879.1515439091@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Parallel append plan instability/randomness (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Parallel append plan instability/randomness
Re: Parallel append plan instability/randomness Re: Parallel append plan instability/randomness |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jan 7, 2018 at 11:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> One theory that can explain above failure is that the costs of >> scanning some of the sub-paths is very close due to which sometimes >> the results can vary. If that is the case, then probably using >> fuzz_factor in costs comparison (as is done in attached patch) can >> improve the situation, may be we have to consider some other factors >> like number of rows in each subpath. > This isn't an acceptable solution because sorting requires that the > comparison operator satisfy the transitive property; that is, if a = b > and b = c then a = c. With your proposed comparator, you could have a > = b and b = c but a < c. That will break stuff. > It seems like the obvious fix here is to use a query where the > contents of the partitions are such that the sorting always produces > the same result. We could do that either by changing the query or by > changing the data in the partitions or, maybe, by inserting ANALYZE > someplace. The foo_star tables are made in create_table.sql, filled in create_misc.sql, and not modified thereafter. The fact that we have accurate rowcounts for them in select_parallel.sql is because of the database-wide VACUUM that happens at the start of sanity_check.sql. Given the lack of any WHERE condition, the costs in this particular query depend only on the rowcount and physical table size, so inserting an ANALYZE shouldn't (and doesn't, for me) change anything. I would be concerned about side-effects on other queries anyway if we were to ANALYZE tables that have never been ANALYZEd in the regression tests before. I remain pretty much at a loss for exactly what happened on silverfish, but that doesn't mean I have no ideas for improving things. Both append_total_cost_compare and append_startup_cost_compare are seriously deficient IMO, because they ignore the question of what to do when total costs (resp. startup costs) are exactly equal, leaving the outcome to the none too tender mercies of qsort(). Which you'll recall is not a stable algorithm. (For fewer than 7 items, our implementation of qsort falls back to an insertion sort, which *is* stable, perhaps explaining why we've not seen failures in this test case many times before. So this line of thought doesn't explain what happened on silverfish. But there's surely room for improvement here.) The normal rule in the planner, as embodied in places like compare_path_costs, is to break ties on total cost by comparing startup cost, and vice versa, which these two functions fail to do; so my first point is that they ought to be using compare_path_costs rather than inventing their own approach. This would especially affect append_startup_cost_compare, because of the frequency with which plans have identical startup costs (ie 0). Right now I'm afraid we're getting basically random ordering of the partial subpaths in most cases. In the regression test case at hand, the startup costs are all zero so this change wouldn't improve the test case's stability. So I'm thinking that in addition, it would be a good idea for these functions to break exact compare_path_costs ties in some arbitrary but deterministic way, rather than letting qsort() have the final say on what happens. If the subplans were all simple relation scans we could order them by planner relid, but I'm not sure what to do if they're not. BTW, I had not looked closely at list_qsort before, but now that I have it seems seriously bizarre. Why is it allocating a new List header rather than reusing the old one? Because it stomps on the next-links of the list cells, the old header is effectively corrupted and can't possibly be useful afterwards, so we might as well re-use it and save a palloc cycle. (By the same token, it's pretty ludicrous that it claims the input List is "const". Not stomping on the list header is a pretty poor definition of not modifying the list.) regards, tom lane
pgsql-hackers by date: