Plan stability versus near-exact ties in cost estimates - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Plan stability versus near-exact ties in cost estimates |
Date | |
Msg-id | 18065.1334875171@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Plan stability versus near-exact ties in cost estimates
Re: Plan stability versus near-exact ties in cost estimates Re: Plan stability versus near-exact ties in cost estimates Re: Plan stability versus near-exact ties in cost estimates Re: Plan stability versus near-exact ties in cost estimates |
List | pgsql-hackers |
So after committing the latest round of parameterized-plan hacking, I was dismayed to see the buildfarm breaking out in pink, thanks to some of the members producing a different plan than expected for one test query. I eventually managed to reproduce that (on the fourth machine I tried locally), and after some quality time with gdb I understand what is happening, to wit: the two alternative plans have exactly the same cost so far as our cost model is concerned. On my main development machine, the two plans look like this to add_path: $13 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x40193c08, param_info = 0x40194458, rows = 5, startup_cost =0, total_cost = 47.628499999999988, pathkeys = 0x0} $16 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x40193c08, param_info = 0x40194458, rows = 5, startup_cost =0, total_cost = 47.628499999999981, pathkeys = 0x0} so it picks the second one on the basis that its total_cost is better at the sixteenth decimal place. On the other machine, the same two paths look like this: $12 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x895b9e0, param_info = 0x895c198, rows = 5, startup_cost = 0, total_cost = 47.578499999999977, pathkeys = 0x0} Breakpoint 2, add_path (parent_rel=0x895b9e0, new_path=0x895c208) $15 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x895b9e0, param_info = 0x895c198, rows = 5, startup_cost = 0, total_cost = 47.578499999999977, pathkeys = 0x0} and add_path is coded to arbitrarily keep the first one when two paths are exactly the same on all its preference measures. Now, the fact that the two machines get different costs at the third decimal place isn't very interesting here; that's a pretty standard cross-platform difference arising from different MAXALIGN values. The important point is that the total_costs of the two paths are exactly the same on one machine, and on the other one different only by a microscopic amount that probably arises from a slightly different floating-point calculation sequence chosen by a different compiler. So, as our code stands, we're inevitably going to have very platform- and compiler-specific decisions as to which plan to prefer. I'm a bit surprised that it's taken us this long to trip over this type of situation, because it's surely not specific to parameterized paths. We could deal with this either by giving up on showing the selected plan in the regression test, or by creating multiple "expected" files, but neither of those alternatives is very appealing. The idea that I'm toying with is to try to make the choice a bit less platform-specific, by removing the exact cost test that add_path uses as its last-ditch comparison step, essentially this: /* * Same pathkeys and outer rels, and fuzzily * the same cost, so keep just one; to decide * which, first check rowsand then do an * exact cost comparison. */ if (new_path->rows < old_path->rows) remove_old = true; /* new dominatesold */ - else if (new_path->rows > old_path->rows) - accept_new = false; /* old dominates new */ - else if (compare_path_costs(new_path, old_path, - TOTAL_COST) < 0) - remove_old = true; /* new dominates old */ else accept_new = false; /* old equals or dominates new */ This change would mean that, when two paths have the same pathkeys, parameterization, and rowcount, and fuzzily the same cost, that we arbitrarily keep the first-submitted one rather than looking at low order digits of the costs. Since the order in which different paths are generated is a property of our code and not platform-specific, this should eliminate platform dependencies in cases where two paths are essentially identical to the cost model. A variant idea would be to replace the exact cost comparison with a second round of fuzzy cost comparison, but with a much tighter fuzz factor, maybe 1e-6 instead of 0.01. Now, neither of these fixes is perfect: what they would do is remove platform-specific instability from where the costs are basically equal and add some more in the range where the costs differ by almost exactly the fuzz factor. But the behavior near that point is platform-specific already, just not quite as much, and it's surely something we're unlikely to trip over in the regression tests. Thoughts, better ideas? regards, tom lane
pgsql-hackers by date: