[Outcome] Queries running forever, because of wrong rowcount estimate - Mailing list pgsql-general

From Peter
Subject [Outcome] Queries running forever, because of wrong rowcount estimate
Date
Msg-id Y+vVM2JwUPZr3Csv@disp.intra.daemon.contact
Whole thread Raw
In response to Re: [Testcase] Queries running forever, because of wrong rowcount estimate  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Mon, Feb 13, 2023 at 12:38:12PM -0500, Tom Lane wrote:
! Peter <pmc@citylink.dinoex.sub.org> writes:
! > "rows=1" in the "Hash Anti Join" line is WRONG. It should be
! > 3000000. Or at least some thousands.
! 
! FWIW, this behaves better in v14 and up.  In older versions there's
! an ambiguity about what relpages=reltuples=0 means; it's the state
! when ANALYZE has never yet examined the table contents, but it's
! also the state when we have found the table to be empty.  Because
! of the former case, older planner versions don't like to believe
! that a table is empty even if reltuples says so.
! 
!             regards, tom lane

Okay, I got someway through with it.

Given, the destination table is empty, and VACUUMed. Then there is
relpages = reltuples = 0.

We do some
"source LEFT OUTER JOIN destination ON (matchfield) WHERE matchfield IS NULL"
The source is big, but carries only a few distinct matchfield values.

The so-called "hack" in heapam_handler.c:heapam_estimate_rel_size()
does then makes the planner believe that there are actually 10
curpages. This is not wrong, because the same might happen if one does
an INSERT amounting for 10 pages and not yet ANALYZE.

We then get a calculated rowcount of whatever, lets assume 2330 rows.

Then we go into eqjoinsel():

n_distinct for the left source table is known, it is (lets assume) 4.
n_distinct for the right destination table is not known because it is
not analyzed, but it is UNIQUE, so it becomes 2330.

We get into eqjoinsel_inner(): 

MCV (right) for the destination is not known because it is not analyzed.

selec = selec / nd2   => 0.000429

We get into eqjoinsel_semi():

                        if (nd1 <= nd2 || nd2 < 0)
                                selec = 1.0 - nullfrac1;

Voila, selec = 1.0

And so the ANTI JOIN will estimate to 0 (aka 1) result rows, instead
of whatever rowcount the source brings along (usually big).

----------------- cut -----------------------
Okay, so this should be fixed in R.14.

But, as mentioned above, the same can also happen in normal operations

Example:
I have a logfile from my webserver and a table to collect the (unique)
IP-addresses. I start with an empty table and feed in the log.

First round, the destination addresses. Lats assume there are 2330
different addresses.
Second round, the source addresses. There are only 4 different ones.

I don't do an ANALYZE between the two steps.

And voila, I get the same effect as above. (Been there, done
it: duration: 30216273.583 ms)

Agreed, this is an inefficient way of doing that. It would be better
to do a UNION on the source data, at that point filter it accordingly,
then DISTINCT, and only then the ANTI JOIN. 

But I'm a lazy guy and I may not care if a query that could run in
0.5 seconds takes 60 seconds instead.
However, it's not so nice when it takes 10 hours.

Anyway, I don't get a real clue on what this stance should actually
achieve (from selfuncs.c:eqjoinsel_semi()#2773):

            if (nd1 <= nd2 || nd2 < 0)
                selec = 1.0 - nullfrac1;
            else
                selec = (nd2 / nd1) * (1.0 - nullfrac1);

This seems to assume a 100% match whenever the left (source) table
brings fewer distinct(!) values than the right (destination) table
already carries.

For now, I have just disabled this behaviour, in the rough&dirty way:

--- src/backend/optimizer/path/costsize.c.orig  2022-11-07 22:47:13.000000000 +0100
+++ src/backend/optimizer/path/costsize.c       2023-02-13 00:04:54.156489000 +0100
@@ -4685,8 +4685,11 @@
                        /* pselec not used */
                        break;
                case JOIN_ANTI:
+                       if(fkselec * jselec > 0.9975) {
+                               jselec = 0.9975;
+                       }
                        nrows = outer_rows * (1.0 - fkselec * jselec);
                        nrows *= pselec;
                        break;
                default:
                        /* other values not expected here */


cheerio,
PMc



pgsql-general by date:

Previous
From: Sebastien Flaesch
Date:
Subject: Multi-column index: Which column order
Next
From: Erik Wienhold
Date:
Subject: Re: Multi-column index: Which column order