Re: Bug? 8.0 does not use partial index - Mailing list pgsql-hackers
From | Palle Girgensohn |
---|---|
Subject | Re: Bug? 8.0 does not use partial index |
Date | |
Msg-id | C475FCAB18F868ED1CE84261@palle.girgensohn.se Whole thread Raw |
In response to | Re: Bug? 8.0 does not use partial index (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Bug? 8.0 does not use partial index
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-hackers |
--On torsdag, januari 13, 2005 18.55.11 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Palle Girgensohn <girgen@pingpong.net> writes: >> --On torsdag, januari 13, 2005 18.18.37 -0500 Tom Lane >> <tgl@sss.pgh.pa.us> wrote: >>> So there's something nuts about the statistics in this case. > > On looking into it, it's the same old issue of not having column > correlation statistics. pg_stats shows that ANALYZE estimated the > fraction of rows with null group_id as 0.137667 (versus exact value > of 0.147, not too bad considering I used the default statistics target) > and it estimated the fraction with this_group_id = 46 as 0.358 > (vs actual 0.369, ditto). The problem is that it then estimates the > total selectivity as 0.137667 * 0.358 or a bit under 5%, much too high > to make an indexscan sensible. In reality there are only 4 rows with > this combination of values, but the planner has no way to know that. > >> Anything I can do about it? > > I thought of a fairly miserable hack, which relies on the fact that 8.0 > does know how to accumulate statistics on functional indexes: > > group=# create index fooi on group_data (abs(this_group_id)) WHERE > group_id IS NULL; CREATE INDEX > group=# analyze group_data; > ANALYZE > group=# explain select * from group_data where group_id is null and > abs(this_group_id) = 46; QUERY PLAN > ------------------------------------------------------------------------- > ----- Index Scan using fooi on group_data (cost=0.00..5302.60 rows=1802 > width=42) Index Cond: (abs(this_group_id) = 46) > Filter: (group_id IS NULL) > (3 rows) > > (The choice of abs() is arbitrary, it just has to be something other > than the unadorned column.) In this situation the planner will look at > the stats for the functional index and discover that in that index there > aren't many 46's, so it comes out with a more reasonable rowcount > estimate. OK, I think I understand. And this is changed between 7.4.x and 8.0? > We should probably make it accumulate stats on partial indexes even when > the index columns aren't expressions. This example shows that useful > stats can be derived that way. Too late for 8.0 though... True, but for next version, perhaps? :) Trying all this out, I realize that on 7.4.5, I can sometimes get different results after `vacuum analyze' vs. a plain `analyze' (again, not exactly the same data, and I cannot reproduce this on the other machine with the data I sent you). It does not really relate to the question above, but perhaps you can explain how come I get different results? I join with a table person, group_data.item_text has person.userid as foreign key constraint: 7.4.5: pp=# vacuum analyze group_data; VACUUM Time: 256353,802 ms pp=# select pp-# distinct p.last_name, pp-# p.userid pp-# from pp-# group_data gd join person p on (p.userid = gd.item_text) pp-# where pp-# gd.this_group_id = 46 pp-# and gd.group_id is null;last_name | userid -----------+----------Lastname | u1wmd5nn (1 row) Time: 6223,123 ms pp=# explain analyze pp-# select pp-# distinct p.last_name, pp-# p.userid pp-# from pp-# group_data gd join person p on (p.userid = gd.item_text) pp-# where pp-# gd.this_group_id = 46 pp-# and gd.group_id is null; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------Unique (cost=76016.57..77215.19 rows=92632 width=23) (actual time=7649.496..7649.512 rows=1 loops=1) -> Sort (cost=76016.57..76416.11 rows=159816 width=23) (actual time=7649.481..7649.487 rows=1 loops=1) Sort Key: p.last_name, p.userid -> Hash Join (cost=3003.90..62203.64rows=159816 width=23) (actual time=7649.254..7649.435 rows=1 loops=1) Hash Cond: ("outer".item_text = "inner".userid) -> Seq Scan on group_data gd (cost=0.00..53238.10 rows=160565 width=12) (actual time=431.078..5927.410 rows=5 loops=1) Filter: ((this_group_id = 46) AND(group_id IS NULL)) -> Hash (cost=2229.32..2229.32 rows=92632 width=23) (actual time=1555.797..1555.797 rows=0 loops=1) -> Seq Scan on person p (cost=0.00..2229.32 rows=92632 width=23) (actual time=0.093..856.728 rows=92632 loops=1)Total runtime: 7652.771 ms (10 rows) Time: 7656,909 ms pp=# select * from group_data where this_group_id=46 and group_id is null;this_group_id | group_id | item_text | item_int| link_path ---------------+----------+-----------+----------+----------- 46 | | | 1223 | :46: 46 | | | 1228 | :46: 46 | | | 1328 | :46: 46 | | | 1391 | :46: 46 | | u1wmd5nn | | :46: (5 rows) Time: 5891,716 ms pp=# analyze group_data; ANALYZE Time: 3210,096 ms pp=# explain select pp-# distinct p.last_name, pp-# p.userid pp-# from pp-# group_data gd join person p on (p.userid = gd.item_text) pp-# where pp-# gd.this_group_id = 46 pp-# and gd.group_id is null; QUERY PLAN -----------------------------------------------------------------------------------------------------------------------------Unique (cost=111417.47..113761.30 rows=92632 width=23) -> Sort (cost=111417.47..112198.75 rows=312510 width=23) Sort Key:p.last_name, p.userid -> Hash Join (cost=3003.90..79231.40 rows=312510 width=23) Hash Cond: ("outer".item_text= "inner".userid) -> Index Scan using group_data_tgid_gidnull_idx on group_data gd (cost=0.00..65091.35 rows=275225 width=11) Index Cond: (this_group_id = 46) Filter: (group_id IS NULL) -> Hash (cost=2229.32..2229.32 rows=92632 width=23) -> Seq Scan on person p (cost=0.00..2229.32 rows=92632 width=23) (10 rows) Time: 6,647 ms pp=# select pp-# distinct p.last_name, pp-# p.userid pp-# from pp-# group_data gd join person p on (p.userid = gd.item_text) pp-# where pp-# gd.this_group_id = 46 pp-# and gd.group_id is null;last_name | userid -----------+----------Lastname | u1wmd5nn (1 row) Time: 772,969 ms pp=# select pp-# distinct p.last_name, pp-# p.userid pp-# from pp-# person p, group_data gd pp-# where pp-# p.userid = gd.item_text pp-# and gd.this_group_id = 46 pp-# and gd.group_id is null pp-# ;last_name | userid -----------+----------Lastname | u1wmd5nn (1 row) Time: 720,345 ms /Palle
pgsql-hackers by date: