Thread: index scan backward plan question

From: Kevin Murphy
I have a table for which PG 8.3 is guessing wrong about a plan when the
result set gets large.  For these large result sets, it uses an Index
Scan Backward/Filter (slower) instead of Sort/Bitmap Heap Scan/Bitmap
Index Scan (faster).  See below.

I fooled around with various planner variables, but the only thing I
found that worked was:

set enable_indexscan = off;

BTW, without turning index scans off, the largest queries take many
minutes to return (in fact, I've never had the patience to wait for the
result).

Is there some other way I can persuade PG to not ever use the Index Scan
Backward approach?

Thanks,
Kevin Murphy

 Limit  (cost=104804.79..110320.84 rows=25 width=229) (actual
time=1653.686..10381.264 rows=25 loops=1)
   ->  Index Scan Backward using merged_weight_date_idx on merged
(cost=0.00..31295593.98 rows=141839 width=229) (actual
time=3.888..10380.783 rows=500 loops=1)
         Filter: (symbol = 'ERVK6'::text)
 Total runtime: 10381.552 ms


Limit  (cost=278766.87..278766.93 rows=25 width=229) (actual
time=5140.604..5140.639 rows=25 loops=1)
   ->  Sort  (cost=278765.81..279120.41 rows=141839 width=229) (actual
time=5140.098..5140.571 rows=450 loops=1)
         Sort Key: weight, date
         ->  Bitmap Heap Scan on merged  (cost=920.44..245618.72
rows=141839 width=229) (actual time=61.265..354.795 rows=142814 loops=1)
               Recheck Cond: (symbol = 'ERVK6'::text)
               ->  Bitmap Index Scan on merged_symbol_idx
(cost=0.00..920.44 rows=141839 width=0) (actual time=58.846..58.846
rows=142814 loops=1)
                     Index Cond: (symbol = 'ERVK6'::text)
 Total runtime: 5903.179 ms



Re: index scan backward plan question

From: Neil Conway
On Tue, 2006-03-21 at 16:58 -0500, Kevin Murphy wrote:
> I have a table for which PG 8.3 is guessing wrong about a plan when the
> result set gets large.

There is no such thing as "PG 8.3".

>    ->  Index Scan Backward using merged_weight_date_idx on merged
> (cost=0.00..31295593.98 rows=141839 width=229) (actual
> time=3.888..10380.783 rows=500 loops=1)

The mis-estimation of the result set of the index scan is clearly a
problem -- have you run ANALYZE recently? Try re-running ANALYZE and
retrying EXPLAIN ANALYZE.

Otherwise, please provide the queries that trigger the problem and the
relevant schema definitions.

-Neil



Re: index scan backward plan question

From: Tom Lane
Neil Conway <neilc@samurai.com> writes:
>> ->  Index Scan Backward using merged_weight_date_idx on merged
>> (cost=0.00..31295593.98 rows=141839 width=229) (actual
>> time=3.888..10380.783 rows=500 loops=1)

> The mis-estimation of the result set of the index scan is clearly a
> problem -- have you run ANALYZE recently?

I'm not sure the above is a misestimation --- the actual rowcount is
going to be restricted by the LIMIT node (which I'm betting is actually
LIMIT 50 OFFSET 450, though that's strictly a guess from the rowcounts).
It would be useful to know how many rows match the query without any
LIMIT though.

> Otherwise, please provide the queries that trigger the problem and the
> relevant schema definitions.

Yes, giving people only the EXPLAIN output without the queries or table
definitions (in particular, the available indexes) is not a recipe for
getting quality advice.

            regards, tom lane

Re: index scan backward plan question

From: Kevin Murphy
Neil Conway wrote:
> There is no such thing as "PG 8.3".

I meant 8.1.3

> please provide the queries that trigger the problem and the
> relevant schema definitions.

Sorry about not posting more details initially.  I was running out the
door and was hasty.

> Try re-running ANALYZE and retrying EXPLAIN ANALYZE.

It was analyzed (and unmodified since analysis).  I also rebuilt the
table from scratch - same results.

I'm beginning to think my performance problem is mostly related to tight
memory.  The same query that was insanely slow on one machine (1GB RAM)
was reasonably fast on another with more memory (8GB RAM).  Also, when I
freed a bit more memory on the original machine, the performance wall
went away.  PostgreSQL seems to be making OK decisions about when to use
the 'index scan backward' plan, but under tight memory conditions, the
index scan would take forever while the alternative would be fine.

My curiosity has sunk below my need to get real work done, so I don't
require any further response.

If anybody has too much time on his or her hands, you can read what I
wrote (below) before I decided it wasn't worth pursuing this any further.

Thanks,
Kevin Murphy

\d merged
      Table "public.merged"
   Column    |  Type   | Modifiers
--------------+---------+-----------
pmid         | integer |
first_author | text    |
authors      | text    |
title        | text    |
journal      | text    |
date         | date    |
display_date | text    |
volume       | text    |
issue        | text    |
pages        | text    |
symbol       | text    |
weight       | real    |
Indexes:
   "merged_data_idx" btree (date, first_author)
   "merged_first_author_idx" btree (first_author)
   "merged_pmid_idx" btree (pmid)
   "merged_symbol_idx" btree (symbol)
   "merged_weight_date_idx" btree (weight, date)

The query in question searches on 'symbol', orders by the combination of
'weight' and 'date', and constrains by a limit and offset.  The table
has 7,839,274 rows; 16,247 distinct symbol values; 17,279 distinct
weight values; 831,007 distinct weight-date combinations.

Depending on the query search term and offset value, one of two
different plans is chosen.  For search terms matching large numbers of
rows, and with a low enough offset, an 'index scan backward' plan is
chosen, which is sometimes extremely fast (faster than the other plan).
However, for search terms matching very large numbers of rows, the
'index scan backward' plan can be molasses-slow (as much as two to
three orders of magnitude slower).  During such slow queries, the CPU
is nearly idle: postgres consumes <= 3% of CPU, and nothing else is
happening on the box ('top' itself is using the most CPU).  Disk I/O is
consistent and low at around 1 MB/sec while this is going on.  Top
doesn't show any swapping; free physical memory is very low, but there
is lots of inactive memory (720 MB).  During these very long queries, I
see a lock held on merged_weight_date_idx (which is 168 MB in size,
BTW) the whole time.  There are no messages in the system or postgresql
logs.  Can it really be taking PG 490 seconds (see below) to read some
fraction of 168 MB into memory?


* * * * * * * *

postgresql.conf settings:

shared_buffers = 15000
work_mem = 32768
maintenance_work_mem = 65536
default_statistics_target = 1000

I just noticed that default_statistics_target is set at one extreme.
Since many of the columns have large numbers of values, highly skewed
distributions, and the oddballs are often important, it was my
understanding that this was not unreasonable (and early tests seemed to
bear this out).  I stand ready to be corrected.

* * * * * * * *

OS:

OS X 10.4, 1 GB RAM, dual 1.8 Ghz PPC

* * * * * * * *

The following plan is produced for almost all values of 'symbol' except
the values with the most records.  It is also produced for the
high-frequency symbols if the OFFSET is large enough.  The following is
a search for a less common symbol than the problematic cases ('CFTR'
matches 8920 rows):

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date,
display_date, volume, issue, pages
FROM merged
WHERE symbol = 'CFTR'
ORDER BY weight DESC, date DESC
OFFSET 0 LIMIT 25;
                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=29364.77..29364.83 rows=25 width=229) (actual
time=1351.026..1351.040 rows=25 loops=1)
  ->  Sort  (cost=29364.77..29385.54 rows=8310 width=229) (actual
time=1351.023..1351.028 rows=25 loops=1)
        Sort Key: weight, date
        ->  Bitmap Heap Scan on merged  (cost=55.08..28823.76 rows=8310
width=229) (actual time=60.076..1198.231 rows=8920 loops=1)
              Recheck Cond: (symbol = 'CFTR'::text)
              ->  Bitmap Index Scan on merged_symbol_idx
(cost=0.00..55.08 rows=8310 width=0) (actual time=39.128..39.128
rows=8920 loops=1)
                    Index Cond: (symbol = 'CFTR'::text)
Total runtime: 1357.684 ms

* * * * * * * *

In contrast, low-offset searches for the symbol 'ERVK6' (which matches
142814 rows) produce a different plan, based on 'index scan backward'.
For symbol 'ERVK6', I found that offset values of 1234 and less produce
this plan (with a limit of 25).  Offset 1235 and above produce the
non-"index scan backward" plan.

Here's the plan for symbol = 'ERVK6', OFFSET = 1234:

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date,
display_date, volume, issue, pages
FROM merged
WHERE symbol = 'ERVK6'
ORDER BY weight DESC, date DESC
OFFSET 1234 LIMIT 25;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=272879.06..278407.40 rows=25 width=229) (actual
time=489694.717..489893.358 rows=25 loops=1)
  ->  Index Scan Backward using merged_weight_date_idx on merged
(cost=0.00..31295954.88 rows=141525 width=229) (actual
time=4.974..489891.348 rows=1259 loops=1)
        Filter: (symbol = 'ERVK6'::text)
Total runtime: 489893.584 ms

* * * * * * * *

Here's the plan for the same symbol with OFFSET = 1235:

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date,
display_date, volume, issue, pages
FROM merged
WHERE symbol = 'ERVK6'
ORDER BY weight DESC, date DESC
OFFSET 1235 LIMIT 25;


                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=278441.55..278441.61 rows=25 width=229) (actual
time=5940.268..5940.299 rows=25 loops=1)
  ->  Sort  (cost=278438.46..278792.27 rows=141525 width=229) (actual
time=5938.818..5940.130 rows=1260 loops=1)
        Sort Key: weight, date
        ->  Bitmap Heap Scan on merged  (cost=919.34..245370.50
rows=141525 width=229) (actual time=71.680..1169.222 rows=142814 loops=1)
              Recheck Cond: (symbol = 'ERVK6'::text)
              ->  Bitmap Index Scan on merged_symbol_idx
(cost=0.00..919.34 rows=141525 width=0) (actual time=69.068..69.068
rows=142814 loops=1)
                    Index Cond: (symbol = 'ERVK6'::text)
Total runtime: 6692.934 ms

The difference doesn't seem to be due to disk caching.  I can bounce
back and forth between the two queries getting consistent relative
timings (slow/fast/slow/fast).