Optimizer confusion? - Mailing list pgsql-general

From Philip Warner
Subject Optimizer confusion?
Date
Msg-id 3.0.5.32.20000812152726.01f51210@mail.rhyme.com.au
Whole thread Raw
Responses Re: [HACKERS] Optimizer confusion?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I have a table for which the SEQSCAN and INDEXSCAN estimates are the same
up to a point, after which the SEQSCAN estimates remain fixed, and the
indexscan estimates continue to grow. However, the actual speed of the
index scan is superior for a much greater period than the optimizer predicts.

The database has a table 'ping' with various fields including a 'pingtime
timestamp'; it also has a btree indexe on the date and has been 'vacuum
analyze'-ed. There are about 200000 rows and the data is evenly distributed
in 5 minute intervals.

These are the results from 'explain':

------ 1 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'1-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..4.28 rows=1 width=52)
------

This seems fine, even if the query is bogus.


------ 2 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'2-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..1679.29 rows=561 width=52)
------

Also looks OK.


------ 3 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'3-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..3091.18 rows=1123 width=52)
------

This seems OK; the estimate is roughly double the previous, which is to be
expected, I think.


------- 5 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'5-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..5386.70 rows=2245 width=52)
------

Again. this is OK, although I am a little surprised at the continuing
non-linearity of the estimates.

Now it starts getting very strange:

------- 5+a bit ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'5-aug-1999 20:25';
NOTICE:  QUERY PLAN:

Seq Scan on ping  (cost=0.00..6208.68 rows=2723 width=52)
-------

OK so far, but look at the following (the costs are the same):

------- 3 Months ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'1-nov-1999';
NOTICE:  QUERY PLAN:

Seq Scan on ping  (cost=0.00..6208.68 rows=51623 width=52)
-------

and

------- 5 + a YEAR ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'5-aug-2000 20:25';
NOTICE:  QUERY PLAN:

Seq Scan on ping  (cost=0.00..6208.68 rows=208184 width=52)
------


Now what is also strange, is if I set ENABLE_SEQSCAN=OFF, then the
estimates up to '5+a bit' are the *same*, but the running time is
substantially better for index scan. In fact the running time is better for
index scans up to an interval of about three months. I presume there is
something wrong with the selectivify estimates for the index.

I really don't want to have the code call 'SET ENABLE_SEQSCAN=OFF/ON'
around this statement, since for a longer period, I do want a sequential
scan. And building my own 'query optimizer' which says 'if time diff > 3
months, then enable seqscan' seems like a very bad idea.

I would be interested to know (a) if there is any way I can influence the
optimizer choice when it considers using the index in question, and (b) if
the fixed seqscan cost estimate is a bug.


FWIW, the output of a 3 month period with ENABLE_SEQSCAN=OFF is:

-----
uptime=# set enable_seqscan=off;
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'1-nov-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..27661.01 rows=51623 width=52)
-----

Any help, explanation, etc would be appreciated.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: problem with float8 input format
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Optimizer confusion?