Thread: Wrong stats for empty tables

Wrong stats for empty tables

From
"Emmanuel Cecchet"
Date:
Hi,

Here is an example showing the problem:

Welcome to psql 8.3.6, the PostgreSQL interactive terminal.
manu=# create table foo (x int);
CREATE TABLE
manu=# explain select * from foo;                     QUERY PLAN
-------------------------------------------------------Seq Scan on foo  (cost=0.00..34.00 rows=2400 width=4)
(1 row)

manu=# analyze foo;
ANALYZE
manu=# explain select * from foo;                     QUERY PLAN
-------------------------------------------------------Seq Scan on foo  (cost=0.00..34.00 rows=2400 width=4)
(1 row)

manu=# insert into foo values (1);
INSERT 0 1
manu=# analyze foo;
ANALYZE
manu=# explain select * from foo;                   QUERY PLAN
---------------------------------------------------Seq Scan on foo  (cost=0.00..1.01 rows=1 width=4)
(1 row)


Now a possible cause for this might be the relpages attribute in pg_class (the default value 0 does not seem to be
interpretedcorrectly): 

manu=# create table bar(x int);
CREATE TABLE
manu=# explain select * from bar;                     QUERY PLAN
-------------------------------------------------------Seq Scan on bar  (cost=0.00..34.00 rows=2400 width=4)
(1 row)

manu=# select relpages from pg_class where relname='bar';relpages
----------       0
(1 row)
manu=# update pg_class set relpages=1 where relname='bar';
UPDATE 1
manu=# explain select * from bar;                   QUERY PLAN
---------------------------------------------------Seq Scan on bar  (cost=0.00..0.00 rows=1 width=4)
(1 row)


This is a real problem if you have a lot of empty child tables. Postgres will not optimize correctly queries in the
presenceof empty child tables. 

Is this a bug?

Thanks for your help,
Emmanuel
--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com


Re: Wrong stats for empty tables

From
Tom Lane
Date:
"Emmanuel  Cecchet" <Emmanuel.Cecchet@asterdata.com> writes:
> Is this a bug?

No, it's intentional.
        regards, tom lane


Re: Wrong stats for empty tables

From
"Emmanuel Cecchet"
Date:
From: Tom Lane [tgl@sss.pgh.pa.us]
Subject: Re: [HACKERS] Wrong stats for empty tables

"Emmanuel  Cecchet" <Emmanuel.Cecchet@asterdata.com> writes:
> Is this a bug?

No, it's intentional.


So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example:

manu=# create table father (id int, val int, tex varchar(100), primary key(id));
manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2));
insert some data
manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
                   QUERY PLAN 
------------------------------------------------------------------------Sort  (cost=37.81..37.82 rows=5 width=230)
SortKey: father.id  ->  Hash Join  (cost=23.44..37.75 rows=5 width=230)        Hash Cond: (father.id = other.id1)
->  Seq Scan on father  (cost=0.00..13.10 rows=310 width=226)        ->  Hash  (cost=23.38..23.38 rows=5 width=8)
      ->  Seq Scan on other  (cost=0.00..23.38 rows=5 width=8)                    Filter: (id2 = 2) 
(8 rows)

manu=# create table child1() inherits(father);
manu=# create table child2() inherits(father);
manu=# create table child3() inherits(father);
manu=# create table child4() inherits(father);
manu=# create table child5() inherits(father);
manu=# create table child6() inherits(father);
manu=# create table child7() inherits(father);
manu=# create index i1 on child1(id);
manu=# create index i2 on child2(id);
manu=# create index i3 on child3(id);
manu=# create index i4 on child4(id);
manu=# create index i5 on child5(id);
manu=# create index i6 on child6(id);
manu=# create index i7 on child7(id);
manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
                         QUERY PLAN 
------------------------------------------------------------------------------------Sort  (cost=140.00..140.16 rows=62
width=230) Sort Key: public.father.id  ->  Hash Join  (cost=23.44..138.16 rows=62 width=230)        Hash Cond:
(public.father.id= other.id1)        ->  Append  (cost=0.00..104.80 rows=2480 width=226)              ->  Seq Scan on
father (cost=0.00..13.10 rows=310 width=226)              ->  Seq Scan on child1 father  (cost=0.00..13.10 rows=310
width=226)             ->  Seq Scan on child2 father  (cost=0.00..13.10 rows=310 width=226)              ->  Seq Scan
onchild3 father  (cost=0.00..13.10 rows=310 width=226)              ->  Seq Scan on child4 father  (cost=0.00..13.10
rows=310width=226)              ->  Seq Scan on child5 father  (cost=0.00..13.10 rows=310 width=226)              ->
SeqScan on child6 father  (cost=0.00..13.10 rows=310 width=226)              ->  Seq Scan on child7 father
(cost=0.00..13.10rows=310 width=226)        ->  Hash  (cost=23.38..23.38 rows=5 width=8)              ->  Seq Scan on
other (cost=0.00..23.38 rows=5 width=8)                    Filter: (id2 = 2) 
(16 rows)



I must admit that I did not see what the original intention was to get this behavior.
Emmanuel

Re: Wrong stats for empty tables

From
Josh Berkus
Date:
On 5/5/09 9:52 AM, Tom Lane wrote:
> "Emmanuel  Cecchet"<Emmanuel.Cecchet@asterdata.com>  writes:
>> Is this a bug?
>
> No, it's intentional.

Huh?  Why would we want wrong stats?

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: Wrong stats for empty tables

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 5/5/09 9:52 AM, Tom Lane wrote:
>> No, it's intentional.

> Huh?  Why would we want wrong stats?

Tables rarely stay empty; and a plan generated on the assumption that a
table is empty is likely to suck much more when the table stops being
empty than a plan generated on the assumption that the table contains
some data will suck when it really doesn't.  Neither case is really
attractive, but the downside of a size underestimate tends to be a
lot worse than that of an overestimate.

This decision was made before we had autovacuum/autoanalyze support
or the ability to replan automatically after a stats update, but I think
it's still good even now that we do.  You can add a hundred or so tuples
to an empty table before autovac will deign to pay attention, and that's
more than enough to blow a nestloop plan out of the water.  Also, the
most common case for this type of issue is a temp table, which autovac
can't help with at all.
        regards, tom lane


Re: Wrong stats for empty tables

From
Robert Haas
Date:
On Tue, May 5, 2009 at 2:03 PM, Emmanuel  Cecchet
<Emmanuel.Cecchet@asterdata.com> wrote:
> So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following
example:
>
> manu=# create table father (id int, val int, tex varchar(100), primary key(id));
> manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2));
> insert some data
> manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;

Just because the table was empty at the time statistics were most
recently gathered doesn't mean it's still empty at the time the query
is executed.

ANALYZE;
PREPARE foo AS SELECT ...;
INSERT INTO ...some previously empty child table...
EXECUTE foo;

In order to rely on this for query planning, you'd need some way to
invalidate any cached plans when inserting into an empty table.

...Robert