Thread: planner or statistical bug on 8.5
Hello I checked query and I was surprised with very strange plan: postgres=# create table a(a int primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a" CREATE TABLE postgres=# create table b(b int primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "b_pkey" for table "b" CREATE TABLE postgres=# create table c(c int primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "c_pkey" for table "c" CREATE TABLE postgres=# ANALYZE ; ANALYZE postgres=# explain select a, b from a,b,c; QUERY PLAN -----------------------------------------------------------------------Nested Loop (cost=0.00..276595350.00 rows=13824000000width=8) -> Nested Loop (cost=0.00..115292.00 rows=5760000 width=8) -> Seq Scan on a (cost=0.00..34.00rows=2400 width=4) -> Materialize (cost=0.00..82.00 rows=2400 width=4) -> Seq Scanon b (cost=0.00..34.00 rows=2400 width=4) -> Materialize (cost=0.00..82.00 rows=2400 width=0) -> Seq Scanon c (cost=0.00..34.00 rows=2400 width=0) (7 rows) Regards Pavel Stehule
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Pavel Stehule > Sent: Monday, January 11, 2010 11:55 PM > To: PostgreSQL Hackers > Cc: Tom Lane > Subject: [HACKERS] planner or statistical bug on 8.5 > > Hello > > I checked query and I was surprised with very strange plan: > > postgres=# create table a(a int primary key); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "a_pkey" for table "a" > CREATE TABLE > postgres=# create table b(b int primary key); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "b_pkey" for table "b" > CREATE TABLE > postgres=# create table c(c int primary key); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "c_pkey" for table "c" > CREATE TABLE > postgres=# ANALYZE ; > ANALYZE > postgres=# explain select a, b from a,b,c; > QUERY PLAN > ----------------------------------------------------------------------- > Nested Loop (cost=0.00..276595350.00 rows=13824000000 width=8) > -> Nested Loop (cost=0.00..115292.00 rows=5760000 width=8) > -> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4) > -> Materialize (cost=0.00..82.00 rows=2400 width=4) > -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) > -> Materialize (cost=0.00..82.00 rows=2400 width=0) > -> Seq Scan on c (cost=0.00..34.00 rows=2400 width=0) > (7 rows) You have no join columns, so it is a simple product. Perhaps you meant something like this: EXPLAIN SELECT a.a, b.b, c.c FROM a a, b b, c c WHERE a.a = b.b AND a.a = c.c
Il 12/01/2010 08:55, Pavel Stehule ha scritto: > I checked query and I was surprised with very strange plan: > > postgres=# explain select a, b from a,b,c; > QUERY PLAN > ----------------------------------------------------------------------- > Nested Loop (cost=0.00..276595350.00 rows=13824000000 width=8) > -> Nested Loop (cost=0.00..115292.00 rows=5760000 width=8) > -> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4) > -> Materialize (cost=0.00..82.00 rows=2400 width=4) > -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) > -> Materialize (cost=0.00..82.00 rows=2400 width=0) > -> Seq Scan on c (cost=0.00..34.00 rows=2400 width=0) > (7 rows) It doesn't surprise me. Tables are empty, thus get a default non-0 row estimate, which happens to be 2400: test=# create table a (a int); CREATE TABLE test=# ANALYZE a; ANALYZE test=# EXPLAIN SELECT * from a; QUERY PLAN ----------------------------------------------------- Seq Scan on a (cost=0.00..14.80 rows=2400 width=4) (1 row) That said, 2400^3 (cross join of 3 tables) == 13824000000 Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
2010/1/12 Matteo Beccati <php@beccati.com>: > Il 12/01/2010 08:55, Pavel Stehule ha scritto: >> >> I checked query and I was surprised with very strange plan: >> >> postgres=# explain select a, b from a,b,c; >> QUERY PLAN >> ----------------------------------------------------------------------- >> Nested Loop (cost=0.00..276595350.00 rows=13824000000 width=8) >> -> Nested Loop (cost=0.00..115292.00 rows=5760000 width=8) >> -> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4) >> -> Materialize (cost=0.00..82.00 rows=2400 width=4) >> -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) >> -> Materialize (cost=0.00..82.00 rows=2400 width=0) >> -> Seq Scan on c (cost=0.00..34.00 rows=2400 width=0) >> (7 rows) > > It doesn't surprise me. Tables are empty, thus get a default non-0 row > estimate, which happens to be 2400: > I though so default estimate is used only when table wasn't analysed. But you have a true. I am verifying it on 8.3 and the behave is same. so all is ok. Regards Pavel Stehule > test=# create table a (a int); > CREATE TABLE > test=# ANALYZE a; > ANALYZE > test=# EXPLAIN SELECT * from a; > QUERY PLAN > ----------------------------------------------------- > Seq Scan on a (cost=0.00..14.80 rows=2400 width=4) > (1 row) > > > That said, 2400^3 (cross join of 3 tables) == 13824000000 > > > Cheers > -- > Matteo Beccati > > Development & Consulting - http://www.beccati.com/ >
On Tue, Jan 12, 2010 at 3:08 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/1/12 Matteo Beccati <php@beccati.com>: >> Il 12/01/2010 08:55, Pavel Stehule ha scritto: >>> >>> I checked query and I was surprised with very strange plan: >>> >>> postgres=# explain select a, b from a,b,c; >>> QUERY PLAN >>> ----------------------------------------------------------------------- >>> Nested Loop (cost=0.00..276595350.00 rows=13824000000 width=8) >>> -> Nested Loop (cost=0.00..115292.00 rows=5760000 width=8) >>> -> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4) >>> -> Materialize (cost=0.00..82.00 rows=2400 width=4) >>> -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) >>> -> Materialize (cost=0.00..82.00 rows=2400 width=0) >>> -> Seq Scan on c (cost=0.00..34.00 rows=2400 width=0) >>> (7 rows) >> >> It doesn't surprise me. Tables are empty, thus get a default non-0 row >> estimate, which happens to be 2400: >> > > I though so default estimate is used only when table wasn't analysed. > But you have a true. I am verifying it on 8.3 and the behave is same. Not quite totally the same. I got: QUERY PLAN ------------------------------------------------------------------Nested Loop (cost=36.40..276619270.40 rows=13824000000width=8) -> Nested Loop (cost=0.00..139234.00 rows=5760000 width=4) -> Seq Scan on a (cost=0.00..34.00rows=2400 width=4) -> Seq Scan on c (cost=0.00..34.00 rows=2400 width=0) -> Materialize (cost=36.40..60.40rows=2400 width=4) -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) Tom made some changes for 8.5 that will result in materialization being used in more places, and I think we're seeing that here. The planner thinks that materializing the inner side of the nestloop will save it from going to disk for every iteration, but that's not really true. b will be fully cached anyway, but the planner doesn't know that. I think we need to think about this a little more before we let this code out into the wild, or we'll get complaints about materialize nodes being inserted in places where they only slow things down... Mind you, it's not totally obvious to me what the solution is. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Tom made some changes for 8.5 that will result in materialization > being used in more places, and I think we're seeing that here. The > planner thinks that materializing the inner side of the nestloop will > save it from going to disk for every iteration, but that's not really > true. b will be fully cached anyway, but the planner doesn't know > that. I think we need to think about this a little more before we let > this code out into the wild, or we'll get complaints about materialize > nodes being inserted in places where they only slow things down... I don't think it's a big deal. The materialize node won't create much of any slowdown unless its tuplestore gets big enough to spill to disk. And at that point you're probably talking enough savings from avoided visibility checks to make it worthwhile. It's possible that the cost parameters in there need some fine-tuning, but I don't think the costing model per se is wrong. It's certainly far less bogus than it was before. regards, tom lane