Thread: BUG #15184: Planner overestimates number of rows in empty table

BUG #15184: Planner overestimates number of rows in empty table

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15184
Logged by:          Alexey Ermakov
Email address:      alexey.ermakov@dataegret.com
PostgreSQL version: 10.3
Operating system:   Linux
Description:

Hello,

In optimizer/util/plancat.c we have the following lines
(https://github.com/postgres/postgres/blob/master/src/backend/optimizer/util/plancat.c#L957):

            if (curpages < 10 &&
                rel->rd_rel->relpages == 0 &&
                !rel->rd_rel->relhassubclass &&
                rel->rd_rel->relkind != RELKIND_INDEX)
                curpages = 10;

            /* report estimated # pages */
            *pages = curpages;
            /* quick exit if rel is clearly empty */
            if (curpages == 0)
            {
                *tuples = 0;
                *allvisfrac = 0;
                break;
            }

If the table is really empty, the first condition sets curpages = 10, so
the second condition doesn't apply. As a result we estimate that an empty
table has 10 pages and 2550 rows (for a table with one int column), which
doesn't look good. Is this intended behavior? Perhaps we should add a
(curpages > 0) condition?
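For illustration, here is a back-of-envelope model of where the 2550
figure comes from and what the proposed guard would change. This is a
sketch, not PostgreSQL's exact code: it assumes the default 8 kB block
size, the 24-byte page header, the 23-byte heap tuple header, a 4-byte
line pointer, and 8-byte MAXALIGN.

```python
# Sketch of the planner's size estimate for a never-vacuumed table with
# one int4 column. Constants are assumed defaults, not read from a server.
BLCKSZ = 8192         # default block size
PAGE_HEADER = 24      # SizeOfPageHeaderData
TUPLE_HEADER = 23     # heap tuple header size
ITEM_ID = 4           # line pointer per tuple
INT4_WIDTH = 4        # one int column

def maxalign(n, align=8):
    return (n + align - 1) // align * align

def estimate_rows(curpages, relpages=0, has_subclass=False, is_index=False,
                  bump_empty=True):
    # Mirrors the clamp in plancat.c: a table with relpages == 0 is
    # assumed to have at least 10 pages. With bump_empty=False, a
    # hypothetical (curpages > 0) guard (the reporter's suggestion)
    # skips the bump for a truly empty table.
    if curpages < 10 and relpages == 0 and not has_subclass and not is_index:
        if bump_empty or curpages > 0:
            curpages = 10
    if curpages == 0:
        return 0
    tuple_width = maxalign(TUPLE_HEADER) + INT4_WIDTH + ITEM_ID   # 32 bytes
    density = (BLCKSZ - PAGE_HEADER) // tuple_width               # 255 tuples/page
    return curpages * density

print(estimate_rows(0))                    # current behavior: 10 pages -> 2550
print(estimate_rows(0, bump_empty=False))  # with the proposed guard: 0
```

The 2550 estimate is simply 10 assumed pages times 255 tuples per page
for this row width.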

That overestimate could mess up plans, for example when we use temporary
tables, which might be empty.

Thanks,
Alexey Ermakov


Re: BUG #15184: Planner overestimates number of rows in empty table

From: Andrew Gierth
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> If the table is really empty, the first condition sets curpages =
 PG> 10, so the second condition doesn't apply. As a result we estimate
 PG> that an empty table has 10 pages and 2550 rows (for a table with
 PG> one int column), which doesn't look good. Is this intended behavior?

As the large comment immediately above explains, it is indeed intended
behavior. The reason is that over-estimating usually doesn't cause too
much harm, but under-estimating tends to blow things up in certain
critical cases (such as causing foreign-key checks to do sequential
scans on tables during a data-loading transaction).

It's actually still possible to trigger those kinds of pathological
cases, but in between the estimation hacks and the plan cache, you have
to work a lot harder at it. Consider for example:

create table tree (id integer primary key,
                   parent_id integer references tree);

insert into tree values (1, null);
vacuum analyze tree;  -- now relpages=1 reltuples=1
begin;
insert into tree select i, i-1 from generate_series(2,10) i;
insert into tree select i, i-1 from generate_series(11,100000) i;
commit;

That last insert could take maybe half an hour to run, because the FK
check has a query plan - established as a generic plan since the middle
insert ran it more than 5 times - with the small table size leading to a
sequential scan.

Without the vacuum analyze that I stuck in there, the code in plancat.c
avoids this problem by treating the table as large enough to require an
indexscan from the start.

As the comment says, this does mean we don't handle the case when the
table really is empty and stays empty. But this should be very rare
compared to the case where the table starts out empty but then has rows
added.

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15184: Planner overestimates number of rows in empty table

From: Alexey Ermakov
Thanks for the explanation and example.

On 5/3/18 21:40, Andrew Gierth wrote:
> As the large comment immediately above explains, it is indeed intended
> behavior. The reason is that over-estimating usually doesn't cause too
> much harm, but under-estimating tends to blow things up in certain
> critical cases (such as causing foreign-key checks to do sequential
> scans on tables during a data-loading transaction).
>
> It's actually still possible to trigger those kinds of pathological
> cases, but in between the estimation hacks and the plan cache, you have
> to work a lot harder at it. Consider for example:
>
> create table tree (id integer primary key,
>                     parent_id integer references tree);
>
> insert into tree values (1, null);
> vacuum analyze tree;  -- now relpages=1 reltuples=1
> begin;
> insert into tree select i, i-1 from generate_series(2,10) i;
> insert into tree select i, i-1 from generate_series(11,100000) i;
> commit;
>
> That last insert could take maybe half an hour to run, because the FK
> check has a query plan - established as a generic plan since the middle
> insert ran it more than 5 times - with the small table size leading to a
> sequential scan.
>
> Without the vacuum analyze that I stuck in there, the code in plancat.c
> avoids this problem by treating the table as large enough to require an
> indexscan from the start.
>
> As the comment says, this does mean we don't handle the case when the
> table really is empty and stays empty. But this should be very rare
> compared to the case where the table starts out empty but then has rows
> added.
>
--
Alexey Ermakov