Re: Why won't it index scan? - Mailing list pgsql-general

From Ed L.
Subject Re: Why won't it index scan?
Date
Msg-id 200605171250.14207.pgsql@bluepolka.net
Whole thread Raw
In response to Re: Why won't it index scan?  ("Ed L." <pgsql@bluepolka.net>)
Responses Re: Why won't it index scan?
List pgsql-general
On Wednesday May 17 2006 11:44 am, Ed L. wrote:
> On Wednesday May 17 2006 10:37 am, Ed L. wrote:
> > Can someone help me understand why the 8.1.2 query below is
> > using a seq scan instead of an index scan?  All relevant
> > columns appear to be indexed and all tables vacuum analyzed.
> >
> >
> > $ psql -c "explain analyze select * from visit inner join
> > patient on patient.key = visit.patient_key where
> > nursestation_key = '40';" QUERY PLAN
> > ------------------------------------------------------------
> >--
> > ------------------------------------------------------------
> >--- ---------------------- Merge Join
> > (cost=27724.37..28457.01 rows=4956 width=421) (actual
> > time=1819.993..2004.802 rows=6 loops=1) Merge Cond:
> > ("outer".patient_key = "inner"."key") ->  Sort
> > (cost=11859.31..11871.70 rows=4956 width=209) (actual
> > time=0.416..0.426 rows=6 loops=1) Sort Key:
> > visit.patient_key
> >          ->  Bitmap Heap Scan on visit
> > (cost=69.35..11555.14 rows=4956 width=209) (actual
> > time=0.187..0.245 rows=6 loops=1) Recheck Cond:
> > (nursestation_key = 40)
> >                ->  Bitmap Index Scan on
> > idx_visit_nursestation_key  (cost=0.00..69.35 rows=4956
> > width=0) (actual time=0.158..0.158 rows=6 loops=1) Index
> > Cond: (nursestation_key = 40) ->  Sort
> > (cost=15865.05..16194.21 rows=131661 width=212) (actual
> > time=1768.501..1856.334 rows=61954 loops=1) Sort Key:
> > patient."key"
> >          ->  Seq Scan on patient  (cost=0.00..4669.61
> > rows=131661 width=212) (actual time=0.010..355.299
> > rows=131661 loops=1) Total runtime: 2046.323 ms
> > (12 rows)
>
> Increasing statistics target yielded index scan.
>
> How can I best find optimal statistics target to ensure 100%
> index scan?

I'm trying to understand what happened here, and I have a theory.
There are 389K rows total, and 262K rows with a null indexed
value.  Their are 15164 non-null rows newer than those null
rows.  When stats target is set to 50 or less, analyze scans
15,000 rows or less.  If it scans the newest rows/pages first,
then is it possible it never sees any hint of the 262K null
rows, and thus ends up with skewed stats that yield seq scans
when idx scan is in order?  If stat target is > 50, analyze
begins to include non-null rows in stat sample, yielding idx
scans.

Also, I see the most_common_vals array is not growing linearly
with the stats target as the docs seem to suggest.  I have 34
unique values, so with stats target >= 34, I'd expect
most_common_vals array to have 34 values, but it has 8.

Ed

pgsql-general by date:

Previous
From: "Ed L."
Date:
Subject: Re: Why won't it index scan?
Next
From: Tom Lane
Date:
Subject: Re: Why won't it index scan?