Re: Bug? 8.0 does not use partial index - Mailing list pgsql-hackers

From Palle Girgensohn
Subject Re: Bug? 8.0 does not use partial index
Date
Msg-id C475FCAB18F868ED1CE84261@palle.girgensohn.se
Whole thread Raw
In response to Re: Bug? 8.0 does not use partial index  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bug? 8.0 does not use partial index  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
--On torsdag, januari 13, 2005 18.55.11 -0500 Tom Lane <tgl@sss.pgh.pa.us> 
wrote:

> Palle Girgensohn <girgen@pingpong.net> writes:
>> --On torsdag, januari 13, 2005 18.18.37 -0500 Tom Lane
>> <tgl@sss.pgh.pa.us>  wrote:
>>> So there's something nuts about the statistics in this case.
>
> On looking into it, it's the same old issue of not having column
> correlation statistics.  pg_stats shows that ANALYZE estimated the
> fraction of rows with null group_id as 0.137667 (versus exact value
> of 0.147, not too bad considering I used the default statistics target)
> and it estimated the fraction with this_group_id = 46 as 0.358
> (vs actual 0.369, ditto).  The problem is that it then estimates the
> total selectivity as 0.137667 * 0.358 or a bit under 5%, much too high
> to make an indexscan sensible.  In reality there are only 4 rows with
> this combination of values, but the planner has no way to know that.
>
>> Anything I can do about it?
>
> I thought of a fairly miserable hack, which relies on the fact that 8.0
> does know how to accumulate statistics on functional indexes:
>
> group=# create index fooi on group_data (abs(this_group_id)) WHERE
> group_id IS NULL; CREATE INDEX
> group=# analyze group_data;
> ANALYZE
> group=# explain select * from group_data where group_id is null and
> abs(this_group_id) = 46;                                   QUERY PLAN
> -------------------------------------------------------------------------
> -----  Index Scan using fooi on group_data  (cost=0.00..5302.60 rows=1802
> width=42)    Index Cond: (abs(this_group_id) = 46)
>    Filter: (group_id IS NULL)
> (3 rows)
>
> (The choice of abs() is arbitrary, it just has to be something other
> than the unadorned column.)  In this situation the planner will look at
> the stats for the functional index and discover that in that index there
> aren't many 46's, so it comes out with a more reasonable rowcount
> estimate.

OK, I think I understand. And this is changed between 7.4.x and 8.0?

> We should probably make it accumulate stats on partial indexes even when
> the index columns aren't expressions.  This example shows that useful
> stats can be derived that way.  Too late for 8.0 though...

True, but for next version, perhaps? :)

Trying all this out, I realize that on 7.4.5, I can sometimes get different 
results after `vacuum analyze' vs. a plain `analyze' (again, not exactly 
the same data, and I cannot reproduce this on the other machine with the 
data I sent you). It does not really relate to the question above, but 
perhaps you can explain how come I get different results?

I join with a table person, group_data.item_text has person.userid as 
foreign key constraint:

7.4.5:

pp=# vacuum analyze group_data;
VACUUM
Time: 256353,802 ms
pp=# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;last_name |  userid
-----------+----------Lastname  | u1wmd5nn
(1 row)

Time: 6223,123 ms
pp=# explain analyze
pp-# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;                                                              QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------Unique
(cost=76016.57..77215.19 rows=92632 width=23) (actual 
 
time=7649.496..7649.512 rows=1 loops=1)  ->  Sort  (cost=76016.57..76416.11 rows=159816 width=23) (actual 
time=7649.481..7649.487 rows=1 loops=1)        Sort Key: p.last_name, p.userid        ->  Hash Join
(cost=3003.90..62203.64rows=159816 width=23) 
 
(actual time=7649.254..7649.435 rows=1 loops=1)              Hash Cond: ("outer".item_text = "inner".userid)
 ->  Seq Scan on group_data gd  (cost=0.00..53238.10 
 
rows=160565 width=12) (actual time=431.078..5927.410 rows=5 loops=1)                    Filter: ((this_group_id = 46)
AND(group_id IS NULL))              ->  Hash  (cost=2229.32..2229.32 rows=92632 width=23) 
 
(actual time=1555.797..1555.797 rows=0 loops=1)                    ->  Seq Scan on person p  (cost=0.00..2229.32 
rows=92632 width=23) (actual time=0.093..856.728 rows=92632 loops=1)Total runtime: 7652.771 ms
(10 rows)

Time: 7656,909 ms
pp=# select * from group_data where this_group_id=46 and group_id is null;this_group_id | group_id | item_text |
item_int| link_path
 
---------------+----------+-----------+----------+-----------           46 |          |           |     1223 | :46:
     46 |          |           |     1228 | :46:           46 |          |           |     1328 | :46:           46 |
      |           |     1391 | :46:           46 |          | u1wmd5nn  |          | :46:
 
(5 rows)

Time: 5891,716 ms
pp=# analyze group_data;
ANALYZE
Time: 3210,096 ms
pp=# explain select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;                                                        QUERY PLAN 


-----------------------------------------------------------------------------------------------------------------------------Unique
(cost=111417.47..113761.30 rows=92632 width=23)  ->  Sort  (cost=111417.47..112198.75 rows=312510 width=23)        Sort
Key:p.last_name, p.userid        ->  Hash Join  (cost=3003.90..79231.40 rows=312510 width=23)              Hash Cond:
("outer".item_text= "inner".userid)              ->  Index Scan using group_data_tgid_gidnull_idx on 
 
group_data gd  (cost=0.00..65091.35 rows=275225 width=11)                    Index Cond: (this_group_id = 46)
        Filter: (group_id IS NULL)              ->  Hash  (cost=2229.32..2229.32 rows=92632 width=23)
->  Seq Scan on person p  (cost=0.00..2229.32 
 
rows=92632 width=23)
(10 rows)

Time: 6,647 ms
pp=# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;last_name |  userid
-----------+----------Lastname  | u1wmd5nn
(1 row)

Time: 772,969 ms
pp=# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   person p, group_data gd
pp-#  where
pp-#   p.userid = gd.item_text
pp-#   and gd.this_group_id = 46
pp-#   and gd.group_id is null
pp-# ;last_name |  userid
-----------+----------Lastname  | u1wmd5nn
(1 row)

Time: 720,345 ms


/Palle




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Bug? 8.0 does not use partial index
Next
From: Tom Lane
Date:
Subject: Re: Bug? 8.0 does not use partial index