Re: Inconsistant use of index. - Mailing list pgsql-bugs

From Ron Mayer
Subject Re: Inconsistant use of index.
Date
Msg-id Pine.LNX.4.33.0204031108250.24107-100000@ron
Whole thread Raw
In response to Re: Inconsistant use of index.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Inconsistant use of index.
List pgsql-bugs
On Tue, 26 Mar 2002, Tom Lane wrote:
> Ron Mayer <ron@intervideo.com> writes:
> >> I'm particularly interested in the correlation estimate for the dat
> >> column.  (Would you happen to have an idea whether the data has been
> >> inserted more-or-less in dat order?)
>
> > I beleve much of February was loaded first, then we back-filled January,
> > and daily I've been adding March's results.  I don't believe the index-usage
> > stopped when we did the january fill... something happend a few days ago after
> > a pretty routine daily load.
>
> The correlation estimate for dat is pretty low (0.086088), which I think
> reflects the fact that on a macro level your data is not very orderly
> (Feb/Jan/Mar).  However, if it's been loaded on a daily basis then the
> records for any single day will be together --- which is why the indexed
> probe for a single day is so fast.  I don't see any way that we can
> expect the system to model this effect with only one ordering-correlation
> number :-( ... so a proper fix will have to wait for some future release
> when we can think about having more extensive stats about ordering.
>
> In the meantime, it would be interesting to see if re-establishing the
> big-picture order correlation would persuade the planner to do the right
> thing.  Could you do something like this:
>
>     CREATE TABLE foo AS SELECT * FROM fact ORDER BY dat;
>     TRUNCATE TABLE fact;
>     INSERT INTO fact SELECT * FROM foo;
>     DROP TABLE foo;
>     VACUUM ANALYZE fact;
>
> (this should leave you in a state where pg_stats shows correlation 1.0
> for fact.dat) and then see what you get from EXPLAIN?
>
>             regards, tom lane



I did quite a bit more playing with this, and no matter what the
correlation was (1, -0.001), it never seemed to have any effect
at all on the execution plan.

Should it?  With a high correlation the index scan is a much better choice.

   Ron

---
--- create the table with a correlation of "1".
---
logs2=# CREATE TABLE fact_by_dat AS SELECT * FROM fact ORDER BY dat;
SELECT
logs2=# CREATE INDEX fact_by_dat__dat ON fact_by_dat(dat);
CREATE
logs2=# vacuum analyze fact_by_dat;
VACUUM
logs2=# select correlation from pg_stats where tablename='fact_by_dat' and attname='dat';
 correlation
-------------
           1
(1 row)

---
--- Still does the "Seq Scan"
---
logs2=#  explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE:  QUERY PLAN:
Aggregate  (cost=380347.31..380347.31 rows=1 width=0) (actual time=77785.14..77785.14 rows=1 loops=1)
  ->  Seq Scan on fact  (cost=0.00..379816.25 rows=212423 width=0) (actual time=20486.16..77420.05 rows=180295 loops=1)
Total runtime: 77785.28 msec
EXPLAIN


---
--- Disable Seq Scan...  30 times faster.
---
logs2=# set enable_seqscan to off;
SET VARIABLE
logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE:  QUERY PLAN:
Aggregate  (cost=5502.57..5502.57 rows=1 width=0) (actual time=2671.20..2671.20 rows=1 loops=1)
  ->  Index Scan using fact_by_dat__dat on fact_by_dat  (cost=0.00..4974.99 rows=211036 width=0)
(actual time=90.24..2339.64 rows=180295 loops=1)
Total runtime: 2693.87 msec

pgsql-bugs by date:

Previous
From: pgsql-bugs@postgresql.org
Date:
Subject: Bug #628: move works incorrectly on cursors using GiST indexes
Next
From: Thomas Lockhart
Date:
Subject: Re: date function 'age' problem