Re: Query planner / Analyse statistics bad estimate rows=1 withmaximum statistics 10000 on PostgreSQL 10.2 - Mailing list pgsql-general

From Mark
Subject Re: Query planner / Analyse statistics bad estimate rows=1 withmaximum statistics 10000 on PostgreSQL 10.2
Date
Msg-id CAFh58O8CSuK52JgpDPHWuFZxqCkQSEmH4c2-66yXc+FALJmaOg@mail.gmail.com
Whole thread Raw
In response to Re: Query planner / Analyse statistics bad estimate rows=1 withmaximum statistics 10000 on PostgreSQL 10.2  (Maxim Boguk <maxim.boguk@gmail.com>)
List pgsql-general
Hi Maxim, 

Thanks for your help. 

Jeff has pointed me in the right direction here, it seems that the rows=1 is due to a long running transaction which deletes all of common_student for school_id 36 and then repopulates it. 

I was unaware that the delete inside the transaction would affect the VACUUM ANALYSE. As I said to Jeff I will have to rewrite the application to update the rows if this is the normal behaviour.

Here are the queries: 


db=> explain analyze select * from common_student where school_id = 36;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533 width=384) (actual time=4.852..7.065 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91 rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
         Index Cond: (school_id = 36)
 Planning time: 0.097 ms
 Execution time: 8.084 ms
(7 rows)


db=> /* At this point I have started a long running transaction that deletes all of common_student for school_id 36  */ ;

db=> analyse verbose common_student(school_id); 
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431 live rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_common_student_sid on common_student  (cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388 loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.098 ms
 Execution time: 2.583 ms
(4 rows)

db=> create table test_table AS SELECT * from common_student;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..3848.89 rows=1381 width=384) (actual time=10.105..22.680 rows=1388 loops=1)
   Filter: (school_id = 36)
   Rows Removed by Filter: 64043
 Planning time: 0.390 ms
 Execution time: 23.767 ms
(5 rows)

db=> drop table test_table;
DROP TABLE
db=> create table test_table AS SELECT * from common_student ORDER BY school_id;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..3850.89 rows=1341 width=382) (actual time=5.674..27.585 rows=1388 loops=1)
   Filter: (school_id = 36)
   Rows Removed by Filter: 64043
 Planning time: 0.264 ms
 Execution time: 28.643 ms
(5 rows)

db=> explain analyze select * from test_table where school_id = 36;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..3850.89 rows=1341 width=382) (actual time=20.848..43.272 rows=1388 loops=1)
   Filter: (school_id = 36)
   Rows Removed by Filter: 64043
 Planning time: 0.068 ms
 Execution time: 44.423 ms
(5 rows)

db=> /* At this point I have killed the long running transaction that deletes all of common_student for school_id 36  */ ;
db=> vacuum analyze common_student; 
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on common_student  (cost=79.17..3357.79 rows=1388 width=383) (actual time=0.088..1.302 rows=1388 loops=1)
   Recheck Cond: (school_id = 36)
   Heap Blocks: exact=67
   ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..78.83 rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
         Index Cond: (school_id = 36)
 Planning time: 0.327 ms
 Execution time: 2.311 ms
(7 rows)

On Sun, 23 Dec 2018 at 15:28 Maxim Boguk <maxim.boguk@gmail.com> wrote:
Hi Mark,

It's look very weird.
Can you try something like this (check that you have enough disk space for second copy of common_student before):

create table test_table AS SELECT * from common_student;
Vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;
create table test_table AS SELECT * from common_student ORDER BY school_id; 
Vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;

And provide results of both explain analyze queries.



On Sat, Dec 22, 2018 at 3:39 AM Mark <mwchambers@gmail.com> wrote:
Hi All, 

I'm seeing some very slow queries and it looks like the query planner is deciding to do a 'Nested Loop Left Join' 
which is slow.  When I SET enable_nestloop=OFF for testing it does a 'Hash Left Join' which is much faster.

I think  I can see the cause of the problem in the examples below. I can't make sense of the statistics being written
by ANALYSE or the planners row estimates. 

Can anyone help me understand....

- Why the row estimate I get in the query below for school_id = 36 is 1 ?     ( I expect it to be higher)
- Why does '1' appear in the most_common_vals when it is actually the least common value.
- Why doesn't 36 appear in the most_common_vals (it is more common than 1)
- Does the analyse output below mean that it only scanned 51538 of 65463 rows in the table? Is school_id 36 just being missed in the sample? (This happens when the analyse is repeated )

Any help with understanding what's happening here would be much appreciated. 

I hope I've provided enough information below.

Thanks,

Mark

db=> explain analyse select * from common_student  where school_id = 36 ;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_common_student_sid on common_student  (cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388 loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.087 ms
 Execution time: 2.706 ms
(4 rows)

db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct from pg_stats where attname='school_id'  and tablename='common_stude
nt';
   tablename    |  attname  |                                                            most_common_vals                                                            | histogram_bounds | n_distinct 
----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
 common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  |         45
(1 row)

db=> select count(distinct(school_id)) from common_student ;
 count 
-------
    55
(1 row)

db=> alter table common_student alter column school_id set statistics 100000;
WARNING:  lowering statistics target to 10000
ALTER TABLE
db=> analyse verbose common_student(school_id);
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65463 live rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total rows
ANALYZE
db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct from pg_stats where attname='school_id'  and tablename='common_stude
nt';
   tablename    |  attname  |                                                            most_common_vals                                                            | histogram_bounds | n_distinct 
----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
 common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  |         45
(1 row)

db=> explain analyse select * from common_student  where school_id = 36 ;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_common_student_sid on common_student  (cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388 loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.334 ms
 Execution time: 6.542 ms
(4 rows)

db=> select school_id, count(*) from common_student  group by school_id order by count(*)  limit 6 ;
 school_id | count 
-----------+-------
         1 |    50
        88 |   161
        53 |   252
        94 |   422
        31 |   434
        68 |   454
(6 rows)

dvpjxbzc=> select school_id, count(*) from common_student where school_id = 36  group by school_id ;                    
 school_id | count 
-----------+-------
        36 |  1388
(1 row)

db=> explain analyse select * from common_student  where school_id = 1 ;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on common_student  (cost=4.91..243.76 rows=64 width=385) (actual time=0.128..0.267 rows=50 loops=1)
   Recheck Cond: (school_id = 1)
   Heap Blocks: exact=16
   ->  Bitmap Index Scan on "unique common_student"  (cost=0.00..4.90 rows=64 width=0) (actual time=0.110..0.110 rows=50 loops=1)
         Index Cond: (school_id = 1)
 Planning time: 0.177 ms
 Execution time: 0.414 ms
(7 rows)

db=> select VERSION();
                                   version                                   
-----------------------------------------------------------------------------
 PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2, 64-bit
(1 row)



--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7  985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

pgsql-general by date:

Previous
From: Mark
Date:
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 withmaximum statistics 10000 on PostgreSQL 10.2
Next
From: Ron
Date:
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 withmaximum statistics 10000 on PostgreSQL 10.2