Re: Assertion failure with LEFT JOINs among >500 relations - Mailing list pgsql-hackers

| From | David Rowley |
| --- | --- |
| Subject | Re: Assertion failure with LEFT JOINs among >500 relations |
| Date | |
| Msg-id | CAApHDvowKW=Q7PTTNvHEeACfqttCkaY0QQt_cK0jpm+UMLnM+A@mail.gmail.com |
| In response to | Re: Assertion failure with LEFT JOINs among >500 relations (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Assertion failure with LEFT JOINs among >500 relations |
| List | pgsql-hackers |
On Sat, 10 Oct 2020 at 02:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm fairly certain that every one of the existing NaN checks was put
> there on the basis of hard experience. Possibly digging in the git
> history would offer more info about exactly where the NaNs came from.

I had a look at this and found there have been quite a number of fixes which added either isnan checks or <= 0 checks. Namely:

-----------
commit 72826fb362c4aada6d2431df0b706df448806c02
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Apr 15 17:45:41 2011 -0400

    Guard against incoming rowcount estimate of NaN in cost_mergejoin().

    Although rowcount estimates really ought not be NaN, a bug elsewhere
    could perhaps result in that, and that would cause Assert failure in
    cost_mergejoin, which I believe to be the explanation for bug #5977
    from Anton Kuznetsov. Seems like a good idea to expend a couple more
    cycles to prevent that, even though the real bug is elsewhere. Not
    back-patching, though, because we don't encourage running production
    systems with Asserts on.

The discussion for that is in
https://www.postgresql.org/message-id/flat/4602.1302705756%40sss.pgh.pa.us#69dd8c334aa714cfac4e0d9b04c5201c

commit 76281aa9647e6a5dfc646514554d0f519e3b8a58
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Mar 26 12:03:12 2016 -0400

    Avoid a couple of zero-divide scenarios in the planner.

commit fd791e7b5a1bf53131ad15e68e4d4f8ca795fcb4
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Mar 24 21:53:04 2008 +0000

    When a relation has been proven empty by constraint exclusion,
    propagate that knowledge up through any joins it participates in. We
    were doing that already in some special cases but not in the general
    case. Also, defend against zero row estimates for the input relations
    in cost_mergejoin --- this fix may have eliminated the only scenario
    in which that can happen, but be safe. Per report from Alex Solovey.

That was reported in
https://www.postgresql.org/message-id/flat/BLU136-DAV79FF310AC13FFC96FA2FDAEFD0%40phx.gbl#4cde17b2369fc7e0da83cc7d4aeeaa48
The problem was that an Append with no subpaths could have a 0 row estimate.
-----------

Because there have been quite a few of these, and this report is yet another one, I wonder if it's time to try to stamp these out at the source rather than at the places where the row counts are being used.

I toyed around with the attached patch, but I'm still not that excited about the clamping of infinite values to DBL_MAX. The test case I showed above with generate_series(1,379) still ends up with NaN cost estimates due to costing a sort with DBL_MAX rows. When I was writing the patch, I had it in my head that the cost per row would always be lower than 1, and that therefore, even if the row count were dangerously close to DBL_MAX, the costs would never be higher than the row count. Turns out I was wrong about that: sorting a number of rows anywhere close to DBL_MAX would be beyond astronomically expensive, and the costs go infinite.

The fd791e7b5 fix was for a subpath-less Append node having a 0-row estimate and causing problems in the costing of a merge join. In the patch, I thought it would be better to fix this by insisting that an Append will always have at least 1 row. That means even a dummy path would have 1 row, which will become a const-false Result in the plan. I've had to add a special case to set the plan_rows back to 0 so that EXPLAIN shows 0 rows as it did before.
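As a concrete illustration of why these estimates are so poisonous, here is a standalone C snippet showing just the floating-point behaviour involved (illustrative only, not planner code): a 0-row estimate turns into inf or NaN as soon as something divides by it, NaN then defeats every later sanity comparison, and costing anywhere near DBL_MAX rows overflows straight back to infinity.

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double rows = 0.0;		/* e.g. a subpath-less Append's estimate */

	double per_row = 10.0 / rows;	/* division by a zero row count: inf */
	double fraction = rows / rows;	/* 0/0: NaN */

	/* NaN propagates through all subsequent cost arithmetic... */
	double cost = fraction * 100.0 + 25.0;

	/* ...and defeats sanity checks, since every comparison with NaN is false. */
	printf("per_row=%g fraction=%g cost=%g\n", per_row, fraction, cost);
	printf("cost <= 0: %d, cost > 0: %d\n", cost <= 0.0, cost > 0.0);

	/* Costing a sort of ~DBL_MAX rows (~ N * log(N)) overflows to inf. */
	printf("sort cost near DBL_MAX rows: %g\n", DBL_MAX * log(DBL_MAX));

	return 0;
}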
That's not exactly pretty, but I still feel there is merit in insisting we never have 0-row paths, to get away from these types of bugs once and for all. The patch does fix the failing Assert. However, something along these lines seems more suitable for master only. The back branches should perhaps just get the more localised isinf() check and clamp to DBL_MAX that I mentioned earlier in this thread (a rough sketch of that shape follows below).

I've searched through the code to see if there are other possible cases where paths may be generated with a 0-row count. I imagine anything that has a qual and performs a selectivity estimate will already have a clamp_row_est(), since we'd see fractional row counts if it didn't. That leaves me with Append / Merge Append and each join type, plus aggregates.

Currently, it seems we never generate a Merge Append without any sub-paths. I wondered if I should just Assert that's the case in create_merge_append_path(). I ended up just adding a clamp_row_est() call instead.

calc_joinrel_size_estimate() seems to handle all join path row estimates. That uses clamp_row_est().

Aggregate paths can reduce the number of rows, but I think all the row estimates from those will go through estimate_num_groups(), which appears never to be able to return 0.

David
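A rough sketch of that kind of centralised guard (illustrative only, not the attached patch; it assumes the checks are folded into clamp_row_est() in costsize.c, which today just forces estimates to be integral and at least 1):

#include <float.h>
#include <math.h>

double
clamp_row_est(double nrows)
{
	/*
	 * Sketch only: sanitize row estimates once, here, instead of adding
	 * isnan()/<= 0 guards at every place that consumes them.
	 */

	/* A NaN estimate indicates a bug upstream; fall back to 1 row. */
	if (isnan(nrows))
		return 1.0;

	/*
	 * Clamp infinities to DBL_MAX.  As noted above, this alone isn't
	 * enough: costing a sort of DBL_MAX rows overflows to infinity
	 * again, so the cost arithmetic needs care too.
	 */
	if (isinf(nrows))
		return DBL_MAX;

	/*
	 * Force the estimate to be at least one row and an integer, as
	 * clamp_row_est() already does today.
	 */
	if (nrows <= 1.0)
		return 1.0;

	return rint(nrows);
}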
Attachment