Thread: BUG #15184: Planner overestimates number of rows in empty table
The following bug has been logged on the website:

Bug reference:      15184
Logged by:          Alexey Ermakov
Email address:      alexey.ermakov@dataegret.com
PostgreSQL version: 10.3
Operating system:   Linux
Description:

Hello, in optimizer/util/plancat.c we have the following lines
(https://github.com/postgres/postgres/blob/master/src/backend/optimizer/util/plancat.c#L957):

    if (curpages < 10 &&
        rel->rd_rel->relpages == 0 &&
        !rel->rd_rel->relhassubclass &&
        rel->rd_rel->relkind != RELKIND_INDEX)
        curpages = 10;

    /* report estimated # pages */
    *pages = curpages;
    /* quick exit if rel is clearly empty */
    if (curpages == 0)
    {
        *tuples = 0;
        *allvisfrac = 0;
        break;
    }

If the table is really empty, then the first condition sets curpages = 10 and
the second condition doesn't apply. So we estimate that an empty table has
10 pages and 2550 rows (for a table with one int column), which doesn't look
good. Is this intended behavior? Perhaps we should add a (curpages > 0)
condition?

That overestimate could mess up plans, for example when we use temporary
tables which might be empty.

Thanks,
Alexey Ermakov
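To see where the 2550 figure comes from, here is a minimal sketch of the row-estimate arithmetic from estimate_rel_size() in optimizer/util/plancat.c, assuming the usual constants for an 8 kB-page 64-bit build (BLCKSZ, page header, aligned heap tuple header, line pointer); the constants and helper name are assumptions for illustration, not the actual C code:

```python
# Assumed constants (default 8 kB pages, 64-bit alignment):
BLCKSZ = 8192               # default page size
PAGE_HEADER = 24            # SizeOfPageHeaderData
HEAP_TUPLE_HEADER = 24      # MAXALIGN(SizeofHeapTupleHeader): 23 -> 24
ITEM_ID = 4                 # sizeof(ItemIdData), the line pointer

def estimated_tuples(data_width: int, curpages: int) -> int:
    """Sketch of the planner's tuple estimate for a never-vacuumed table."""
    # per-tuple footprint: user data + aligned heap header + line pointer
    tuple_width = data_width + HEAP_TUPLE_HEADER + ITEM_ID
    # integer division, mirroring the all-int C expression
    density = (BLCKSZ - PAGE_HEADER) // tuple_width
    return density * curpages

# One int4 column (4 bytes), curpages clamped to 10 as in the report:
print(estimated_tuples(4, 10))  # -> 2550
```

With tuple_width = 4 + 24 + 4 = 32 bytes, density works out to 255 tuples per page, and 10 clamped pages give the 2550-row estimate the report complains about.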
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> if table is really empty then in first condition we set curpages =
 PG> 10 and second condition doesn't apply. so we estimate that empty
 PG> table has 10 pages and 2550 rows (for table with one int column)
 PG> which doesn't look good. is it intended behavior?

As the large comment immediately above explains, it is indeed intended
behavior. The reason is that over-estimating usually doesn't cause too
much harm, but under-estimating tends to blow things up in certain
critical cases (such as causing foreign-key checks to do sequential
scans on tables during a data-loading transaction).

It's actually still possible to trigger those kinds of pathological
cases, but in between the estimation hacks and the plan cache, you have
to work a lot harder at it. Consider for example:

    create table tree (id integer primary key,
                       parent_id integer references tree);

    insert into tree values (1, null);
    vacuum analyze tree;  -- now relpages=1 reltuples=1
    begin;
    insert into tree select i, i-1 from generate_series(2,10) i;
    insert into tree select i, i-1 from generate_series(11,100000) i;
    commit;

That last insert could take maybe half an hour to run, because the FK
check has a query plan - established as a generic plan since the middle
insert ran it more than 5 times - with the small table size leading to a
sequential scan.

Without the vacuum analyze that I stuck in there, the code in plancat.c
avoids this problem by treating the table as large enough to require an
indexscan from the start.

As the comment says, this does mean we don't handle the case when the
table really is empty and stays empty. But this should be very rare
compared to the case where the table starts out empty but then has rows
added.

-- 
Andrew (irc:RhodiumToad)
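The page-count clamp at the heart of both messages can be sketched as a small model, contrasting the never-vacuumed case with the vacuum-analyzed case from the example above. The function name and parameters are hypothetical; only the condition itself mirrors the plancat.c snippet quoted in the bug report:

```python
def planner_curpages(actual_pages: int, relpages: int,
                     has_subclass: bool = False,
                     is_index: bool = False) -> int:
    """Sketch of the clamp in plancat.c's estimate_rel_size()."""
    curpages = actual_pages
    # Never-vacuumed small tables (relpages == 0) are assumed to hold
    # 10 pages, so the planner errs toward index scans rather than
    # risking a catastrophic under-estimate.
    if curpages < 10 and relpages == 0 and not has_subclass and not is_index:
        curpages = 10
    return curpages

# Empty table, never vacuumed: clamped to 10 pages (hence ~2550 rows).
print(planner_curpages(actual_pages=0, relpages=0))  # -> 10

# After "vacuum analyze" with one row (relpages=1): clamp no longer
# applies, so the tiny size sticks and a seqscan plan can get cached.
print(planner_curpages(actual_pages=1, relpages=1))  # -> 1
```

This is why the vacuum analyze in Andrew's example is what makes the pathological plan reachable: it sets relpages to a nonzero value, disabling the protective clamp.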
Thanks for the explanation and example.

On 5/3/18 21:40, Andrew Gierth wrote:
> As the large comment immediately above explains, it is indeed intended
> behavior. The reason is that over-estimating usually doesn't cause too
> much harm, but under-estimating tends to blow things up in certain
> critical cases (such as causing foreign-key checks to do sequential
> scans on tables during a data-loading transaction).
>
> It's actually still possible to trigger those kinds of pathological
> cases, but in between the estimation hacks and the plan cache, you have
> to work a lot harder at it. Consider for example:
>
>     create table tree (id integer primary key,
>                        parent_id integer references tree);
>
>     insert into tree values (1, null);
>     vacuum analyze tree;  -- now relpages=1 reltuples=1
>     begin;
>     insert into tree select i, i-1 from generate_series(2,10) i;
>     insert into tree select i, i-1 from generate_series(11,100000) i;
>     commit;
>
> That last insert could take maybe half an hour to run, because the FK
> check has a query plan - established as a generic plan since the middle
> insert ran it more than 5 times - with the small table size leading to a
> sequential scan.
>
> Without the vacuum analyze that I stuck in there, the code in plancat.c
> avoids this problem by treating the table as large enough to require an
> indexscan from the start.
>
> As the comment says, this does mean we don't handle the case when the
> table really is empty and stays empty. But this should be very rare
> compared to the case where the table starts out empty but then has rows
> added.

-- 
Alexey Ermakov