Re: Why won't it index scan? - Mailing list pgsql-general
From | Ed L. |
---|---|
Subject | Re: Why won't it index scan? |
Date | |
Msg-id | 200605171250.14207.pgsql@bluepolka.net Whole thread Raw |
In response to | Re: Why won't it index scan? ("Ed L." <pgsql@bluepolka.net>) |
Responses |
Re: Why won't it index scan?
|
List | pgsql-general |
On Wednesday May 17 2006 11:44 am, Ed L. wrote: > On Wednesday May 17 2006 10:37 am, Ed L. wrote: > > Can someone help me understand why the 8.1.2 query below is > > using a seq scan instead of an index scan? All relevant > > columns appear to be indexed and all tables vacuum analyzed. > > > > > > $ psql -c "explain analyze select * from visit inner join > > patient on patient.key = visit.patient_key where > > nursestation_key = '40';" QUERY PLAN > > ------------------------------------------------------------ > >-- > > ------------------------------------------------------------ > >--- ---------------------- Merge Join > > (cost=27724.37..28457.01 rows=4956 width=421) (actual > > time=1819.993..2004.802 rows=6 loops=1) Merge Cond: > > ("outer".patient_key = "inner"."key") -> Sort > > (cost=11859.31..11871.70 rows=4956 width=209) (actual > > time=0.416..0.426 rows=6 loops=1) Sort Key: > > visit.patient_key > > -> Bitmap Heap Scan on visit > > (cost=69.35..11555.14 rows=4956 width=209) (actual > > time=0.187..0.245 rows=6 loops=1) Recheck Cond: > > (nursestation_key = 40) > > -> Bitmap Index Scan on > > idx_visit_nursestation_key (cost=0.00..69.35 rows=4956 > > width=0) (actual time=0.158..0.158 rows=6 loops=1) Index > > Cond: (nursestation_key = 40) -> Sort > > (cost=15865.05..16194.21 rows=131661 width=212) (actual > > time=1768.501..1856.334 rows=61954 loops=1) Sort Key: > > patient."key" > > -> Seq Scan on patient (cost=0.00..4669.61 > > rows=131661 width=212) (actual time=0.010..355.299 > > rows=131661 loops=1) Total runtime: 2046.323 ms > > (12 rows) > > Increasing statistics target yielded index scan. > > How can I best find optimal statistics target to ensure 100% > index scan? I'm trying to understand what happened here, and I have a theory. There are 389K rows total, and 262K rows with a null indexed value. Their are 15164 non-null rows newer than those null rows. When stats target is set to 50 or less, analyze scans 15,000 rows or less. If it scans the newest rows/pages first, then is it possible it never sees any hint of the 262K null rows, and thus ends up with skewed stats that yield seq scans when idx scan is in order? If stat target is > 50, analyze begins to include non-null rows in stat sample, yielding idx scans. Also, I see the most_common_vals array is not growing linearly with the stats target as the docs seem to suggest. I have 34 unique values, so with stats target >= 34, I'd expect most_common_vals array to have 34 values, but it has 8. Ed
pgsql-general by date: