From: Artur Zając
Subject: Bad estimates
Date: ,
Msg-id: 03b401d3639e$5d7493a0$185dbae0$@ang.com.pl
(view: Whole thread, Raw)
Responses: Re: Bad estimates  (Don Seiler)
Re: Bad estimates (DEFAULT_UNK_SEL)  (Justin Pryzby)
Re: Bad estimates  (Tom Lane)
Re: Bad estimates  (Laurenz Albe)
List: pgsql-performance

Tree view

Bad estimates  (Artur Zając, )
 Re: Bad estimates  (Don Seiler, )
  RE: Bad estimates  ("Alex Ignatov", )
 Re: Bad estimates (DEFAULT_UNK_SEL)  (Justin Pryzby, )
 Re: Bad estimates  (Tom Lane, )
  RE: Bad estimates  (Artur Zając, )
 Re: Bad estimates  (Laurenz Albe, )

Hi,

We have table created like this:

CREATE TABLE xyz AS SELECT generate_series(1,10000000,1) AS gs;

Now:

db=# explain analyze select * from xyz where gs&1=1;
                                                 QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------Seq Scan on xyz  (cost=0.00..260815.38 rows=68920 width=4) (actual
time=0.044..2959.728 rows=5000000 loops=1)  Filter: ((gs & 1) = 1)  Rows Removed by Filter: 5000000Planning time: 0.133
msExecutiontime: 3340.886 ms
 
(5 rows)

And after adding additional clause to WHERE:

db=# explain analyze select * from xyz where gs&1=1 and gs&2=2;
                                                QUERY PLAN
----------------------------------------------------------------------------
---------------------------------Seq Scan on xyz  (cost=0.00..329735.50 rows=345 width=4) (actual
time=0.045..3010.430 rows=2500000 loops=1)  Filter: (((gs & 1) = 1) AND ((gs & 2) = 2))  Rows Removed by Filter:
7500000Planningtime: 0.106 msExecution time: 3176.355 ms
 
(5 rows)

And one more clause:

newrr=# explain analyze select * from xyz where gs&1=1 and gs&2=2 and
gs&4=4;                                               QUERY PLAN
----------------------------------------------------------------------------
-------------------------------Seq Scan on xyz  (cost=0.00..398655.62 rows=2 width=4) (actual
time=0.052..3329.422 rows=1250000 loops=1)  Filter: (((gs & 1) = 1) AND ((gs & 2) = 2) AND ((gs & 4) = 4))  Rows
Removedby Filter: 8750000Planning time: 0.119 msExecution time: 3415.839 ms
 
(5 rows)

As we can see estimates differs significally from the actual records count -
only three clauses are reducing estimated number of records from 10000000 to
2.

I noticed that each additional clause reduces the number about 200 times and
define DEFAULT_NUM_DISTINCT is responsible for this behaviur.

I think that this variable should be lower or maybe estimation using
DEFAULT_NUM_DISTTINCT should be done once per table.

Artur Zajac




pgsql-performance by date:

From: "Henrik Cednert (Filmlance)"
Date:
Subject: Re: pg_dump 3 times as slow after 8.4 -> 9.5 upgrade
From: Patrick KUI-LI
Date:
Subject: Re: pg_dump 3 times as slow after 8.4 -> 9.5 upgrade