Thread: Wrong stats for empty tables
Hi, Here is an example showing the problem: Welcome to psql 8.3.6, the PostgreSQL interactive terminal. manu=# create table foo (x int); CREATE TABLE manu=# explain select * from foo; QUERY PLAN -------------------------------------------------------Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# analyze foo; ANALYZE manu=# explain select * from foo; QUERY PLAN -------------------------------------------------------Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# insert into foo values (1); INSERT 0 1 manu=# analyze foo; ANALYZE manu=# explain select * from foo; QUERY PLAN ---------------------------------------------------Seq Scan on foo (cost=0.00..1.01 rows=1 width=4) (1 row) Now a possible cause for this might be the relpages attribute in pg_class (the default value 0 does not seem to be interpretedcorrectly): manu=# create table bar(x int); CREATE TABLE manu=# explain select * from bar; QUERY PLAN -------------------------------------------------------Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4) (1 row) manu=# select relpages from pg_class where relname='bar';relpages ---------- 0 (1 row) manu=# update pg_class set relpages=1 where relname='bar'; UPDATE 1 manu=# explain select * from bar; QUERY PLAN ---------------------------------------------------Seq Scan on bar (cost=0.00..0.00 rows=1 width=4) (1 row) This is a real problem if you have a lot of empty child tables. Postgres will not optimize correctly queries in the presenceof empty child tables. Is this a bug? Thanks for your help, Emmanuel -- Emmanuel Cecchet Aster Data Systems Web: http://www.asterdata.com
"Emmanuel Cecchet" <Emmanuel.Cecchet@asterdata.com> writes: > Is this a bug? No, it's intentional. regards, tom lane
From: Tom Lane [tgl@sss.pgh.pa.us] Subject: Re: [HACKERS] Wrong stats for empty tables "Emmanuel Cecchet" <Emmanuel.Cecchet@asterdata.com> writes: > Is this a bug? No, it's intentional. So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example: manu=# create table father (id int, val int, tex varchar(100), primary key(id)); manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2)); insert some data manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; QUERY PLAN ------------------------------------------------------------------------Sort (cost=37.81..37.82 rows=5 width=230) SortKey: father.id -> Hash Join (cost=23.44..37.75 rows=5 width=230) Hash Cond: (father.id = other.id1) -> Seq Scan on father (cost=0.00..13.10 rows=310 width=226) -> Hash (cost=23.38..23.38 rows=5 width=8) -> Seq Scan on other (cost=0.00..23.38 rows=5 width=8) Filter: (id2 = 2) (8 rows) manu=# create table child1() inherits(father); manu=# create table child2() inherits(father); manu=# create table child3() inherits(father); manu=# create table child4() inherits(father); manu=# create table child5() inherits(father); manu=# create table child6() inherits(father); manu=# create table child7() inherits(father); manu=# create index i1 on child1(id); manu=# create index i2 on child2(id); manu=# create index i3 on child3(id); manu=# create index i4 on child4(id); manu=# create index i5 on child5(id); manu=# create index i6 on child6(id); manu=# create index i7 on child7(id); manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; QUERY PLAN ------------------------------------------------------------------------------------Sort (cost=140.00..140.16 rows=62 width=230) Sort Key: public.father.id -> Hash Join (cost=23.44..138.16 rows=62 width=230) Hash Cond: (public.father.id= other.id1) -> Append (cost=0.00..104.80 rows=2480 width=226) -> Seq Scan on father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child1 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child2 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan onchild3 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child4 father (cost=0.00..13.10 rows=310width=226) -> Seq Scan on child5 father (cost=0.00..13.10 rows=310 width=226) -> SeqScan on child6 father (cost=0.00..13.10 rows=310 width=226) -> Seq Scan on child7 father (cost=0.00..13.10rows=310 width=226) -> Hash (cost=23.38..23.38 rows=5 width=8) -> Seq Scan on other (cost=0.00..23.38 rows=5 width=8) Filter: (id2 = 2) (16 rows) I must admit that I did not see what the original intention was to get this behavior. Emmanuel
On 5/5/09 9:52 AM, Tom Lane wrote: > "Emmanuel Cecchet"<Emmanuel.Cecchet@asterdata.com> writes: >> Is this a bug? > > No, it's intentional. Huh? Why would we want wrong stats? -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > On 5/5/09 9:52 AM, Tom Lane wrote: >> No, it's intentional. > Huh? Why would we want wrong stats? Tables rarely stay empty; and a plan generated on the assumption that a table is empty is likely to suck much more when the table stops being empty than a plan generated on the assumption that the table contains some data will suck when it really doesn't. Neither case is really attractive, but the downside of a size underestimate tends to be a lot worse than that of an overestimate. This decision was made before we had autovacuum/autoanalyze support or the ability to replan automatically after a stats update, but I think it's still good even now that we do. You can add a hundred or so tuples to an empty table before autovac will deign to pay attention, and that's more than enough to blow a nestloop plan out of the water. Also, the most common case for this type of issue is a temp table, which autovac can't help with at all. regards, tom lane
On Tue, May 5, 2009 at 2:03 PM, Emmanuel Cecchet <Emmanuel.Cecchet@asterdata.com> wrote: > So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example: > > manu=# create table father (id int, val int, tex varchar(100), primary key(id)); > manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2)); > insert some data > manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id; Just because the table was empty at the time statistics were most recently gathered doesn't mean it's still empty at the time the query is executed. ANALYZE; PREPARE foo AS SELECT ...; INSERT INTO ...some previously empty child table... EXECUTE foo; In order to rely on this for query planning, you'd need some way to invalidate any cached plans when inserting into an empty table. ...Robert