Thread: Choice of bitmap scan over index scan

Choice of bitmap scan over index scan

From
Mathieu De Zutter
Date:
Hi,

Part of a larger problem, I'm trying to optimize a rather simple query which is basically:
SELECT * FROM table WHERE indexed_column > ... ORDER BY indexed_column DESC;

(see attachment for all details: table definition, query, query plans)

For small ranges it will choose an index scan which is very good. For somewhat larger ranges (not very large yet) it will switch to a bitmap scan + sorting. Pgsql probably thinks that the larger the range, the better a bitmap scan is because it reads more effectively. However, in my case, the larger the query, the worse bitmap+sort performs compared to index scan:

Small range (5K rows): 5.4 msec (b+s) vs 3.3 msec (i) -- performance penalty of ~60%
Large range (1.5M rows): 6400 msec (b+s) vs 2100 msec (i) -- performance penalty of ~200%

How can I make pgsql realize that it should always pick the index scan?

Thanks!

Kind regards,
Mathieu
Attachment

Re: Choice of bitmap scan over index scan

From
Jeremy Harris
Date:
On 01/10/2010 12:28 PM, Mathieu De Zutter wrote:
> Sort  (cost=481763.31..485634.61 rows=1548520 width=338) (actual time=5423.628..6286.148 rows=1551923 loops=1)
>   Sort Key: event_timestamp
>   Sort Method:  external merge  Disk: 90488kB
>   ->  Seq Scan on log_event  (cost=0.00..79085.92 rows=1548520 width=338) (actual time=0.022..2195.527 rows=1551923 loops=1)
>         Filter: (event_timestamp > (now() - '1 year'::interval))
> Total runtime: 6407.377 ms

Needing to use an external (on-disk) sort method, when taking
only 90MB, looks odd.

- Jeremy

Re: Choice of bitmap scan over index scan

From
"Kevin Grittner"
Date:
Mathieu De Zutter  wrote:

You didn't include any information on your hardware and OS, which can
be very important.  Also, what version of PostgreSQL is this?
SELECT version(); output would be good.

> How can I make pgsql realize that it should always pick the index
> scan?

That would probably be a very bad thing to do, in a general sense.
I'm not even convinced yet it's really what you want in this case.

> shared_buffers = 24MB
> work_mem = 8MB
> #effective_cache_size = 128MB

Those are probably not optimal; however, without information on your
hardware and runtime environment, I can't make any concrete
suggestion.

> #seq_page_cost = 1.0
> #random_page_cost = 4.0

It's entirely possible that you will get plans more appropriate to
your hardware and runtime environment by adjusting these.  Again, I
lack data to propose anything specific yet.

-Kevin



Re: Choice of bitmap scan over index scan

From
Mathieu De Zutter
Date:
On Sun, Jan 10, 2010 at 4:18 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Mathieu De Zutter  wrote:

> You didn't include any information on your hardware and OS, which can
> be very important.  Also, what version of PostgreSQL is this?
> SELECT version(); output would be good.

Intel(R) Core(TM)2 Duo CPU     E7200  @ 2.53GHz
2GB RAM
2x500GB RAID-1

Running Debian/Etch AMD64
PG version: PostgreSQL 8.3.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2

Server also runs DNS/Mail/Web/VCS/... for budget reasons.
Database size is 1-2 GB. Also running copies of it for testing/dev.


Kind regards,
Mathieu

Re: Choice of bitmap scan over index scan

From
"Kevin Grittner"
Date:
Mathieu De Zutter  wrote:

> Intel(R) Core(TM)2 Duo CPU E7200 @ 2.53GHz
> 2GB RAM
> 2x500GB RAID-1

> Running Debian/Etch AMD64
> PG version: PostgreSQL 8.3.8 on x86_64

> Server also runs DNS/Mail/Web/VCS/... for budget reasons.
> Database size is 1-2 GB. Also running copies of it for testing/dev.

I would try something like this and see how it goes:

shared_buffers = 200MB
work_mem = 32MB
effective_cache_size = 1.2GB
seq_page_cost = 0.1
random_page_cost = 0.1

Some of these settings require a PostgreSQL restart.
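For what it's worth, the page-cost and cache settings can be tried per-session with SET before committing them to postgresql.conf (shared_buffers is an exception: it only takes effect after a restart).  A sketch, with the table and column names taken from the posted plan:

```sql
-- Try the planner settings for this session only, then re-run the
-- problem query and compare plans.
SET effective_cache_size = '1200MB';
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;

EXPLAIN ANALYZE
SELECT * FROM log_event
WHERE event_timestamp > now() - interval '1 year'
ORDER BY event_timestamp DESC;
```

If the plan flips to the index scan and timings improve, the same values can then be made permanent in postgresql.conf.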

I may have gone too aggressively low on the page costs, but it seems
likely with a "1-2 GB" database and 2 GB RAM, the active portions of
your database will be fully cached in spite of the other
applications.  Besides looking at the impact on this one query, you
should keep an eye on all queries after such changes, and post for
any which become unacceptably slow.  Properly tuning something like
this can take a few iterations.

-Kevin



Re: Choice of bitmap scan over index scan

From
"Kevin Grittner"
Date:
I wrote:

> work_mem = 32MB

Hmmm...  With 100 connections and 2 GB RAM, that is probably on the
high side, at least if you sometimes use a lot of those connections
at the same time to run queries which might use sorts or hashes. It's
probably safer to go down to 16MB or even back to where you had it.

Sorry I missed that before.

-Kevin



Re: Choice of bitmap scan over index scan

From
Robert Haas
Date:
On Sun, Jan 10, 2010 at 10:53 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> seq_page_cost = 0.1
> random_page_cost = 0.1

These might not even be low enough.  The reason why bitmap index scans
win over plain index scans, in general, is because you make one pass
through the heap to get all the rows you need instead of bouncing
around doing lots of random access.  But of course if all the data is
in RAM then this argument falls down.

If these aren't enough to get the query planner to DTRT, then the OP
might want to try lowering them further and seeing how low he has to
go to flip the plan...
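A quick way to see what the plain index scan costs and runs at, without guessing cost values, is to disable bitmap scans for the session.  This is a diagnostic trick only, not something to leave on in production:

```sql
-- For testing only: steer the planner away from bitmap scans in this
-- session, then compare against the default plan's EXPLAIN ANALYZE.
SET enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT * FROM log_event
WHERE event_timestamp > now() - interval '1 year'
ORDER BY event_timestamp DESC;

RESET enable_bitmapscan;
```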

...Robert

Re: Choice of bitmap scan over index scan

From
Robert Haas
Date:
On Sun, Jan 10, 2010 at 9:04 AM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 01/10/2010 12:28 PM, Mathieu De Zutter wrote:
>>
>> Sort  (cost=481763.31..485634.61 rows=1548520 width=338) (actual
>> time=5423.628..6286.148 rows=1551923 loops=1)
>>  Sort Key: event_timestamp
>
>  >  Sort Method:  external merge  Disk: 90488kB
>>
>>  ->  Seq Scan on log_event  (cost=0.00..79085.92 rows=1548520 width=338)
>> (actual time=0.022..2195.527 rows=1551923 loops=1)
>>        Filter: (event_timestamp > (now() - '1 year'::interval))
>> Total runtime: 6407.377 ms
>
> Needing to use an external (on-disk) sort method, when taking
> only 90MB, looks odd.
>
> - Jeremy

Well, you'd need to have work_mem > 90 MB for that not to happen, and
very few people can afford to set it that high.  If you have
a query with three or four sorts using 90 MB a piece, and five or ten
users running them, you can quickly kill the box...
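A middle ground, if one particular query needs a big sort, is to raise work_mem only for that query rather than globally.  Something like:

```sql
-- Raise work_mem for a single query instead of server-wide;
-- SET LOCAL reverts automatically at transaction end.
BEGIN;
SET LOCAL work_mem = '128MB';

SELECT * FROM log_event
WHERE event_timestamp > now() - interval '1 year'
ORDER BY event_timestamp DESC;

COMMIT;
```

That way the in-memory sort is available where it pays off without exposing the box to a hundred sessions each grabbing 90+ MB.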

...Robert

Re: Choice of bitmap scan over index scan

From
Mathieu De Zutter
Date:
On Mon, Jan 11, 2010 at 3:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Jan 10, 2010 at 10:53 AM, Kevin Grittner wrote:
>> seq_page_cost = 0.1
>> random_page_cost = 0.1
>
> These might not even be low enough.  The reason why bitmap index scans
> win over plain index scans, in general, is because you make one pass
> through the heap to get all the rows you need instead of bouncing
> around doing lots of random access.  But of course if all the data is
> in RAM then this argument falls down.
>
> If these aren't enough to get the query planner to DTRT, then the OP
> might want to try lowering them further and seeing how low he has to
> go to flip the plan...

So if this query usually does *not* hit the cache, it will probably be faster if I leave it like that? While testing a query I execute it so often that it always ends up in the cache. However, since other applications run on the same server, I suspect that infrequently used data gets flushed after a while, even though the DB could fit in RAM.

--
Mathieu

Re: Choice of bitmap scan over index scan

From
Matthew Wakeling
Date:
On Mon, 11 Jan 2010, Mathieu De Zutter wrote:
> > seq_page_cost = 0.1
> > random_page_cost = 0.1

> So if this query usually does *not* hit the cache, it will be probably faster if I leave
> it like that? While testing a query I execute it that much that it's always getting into
> the cache. However, since other applications run on the same server, I think that
> infrequently used data gets flushed after a while, even if the DB could fit in the RAM.

Postgres is being conservative. The plan it uses (bitmap index scan) will
perform much better than an index scan when the data is not in the cache,
by maybe an order of magnitude, depending on your hardware setup.

The index scan may perform better at the moment, but the bitmap index scan
is safer.

Matthew

--
 Change is inevitable, except from vending machines.

Re: Choice of bitmap scan over index scan

From
Pierre Frédéric Caillaud
Date:
> Postgres is being conservative. The plan it uses (bitmap index scan)
> will perform much better than an index scan when the data is not in the
> cache, by maybe an order of magnitude, depending on your hardware setup.
>
> The index scan may perform better at the moment, but the bitmap index
> scan is safer.

    Suppose you make a query that will need to retrieve 5% of the rows in a
table...

    If the table is nicely clustered (e.g. you want the latest rows in a
table where they are always appended at the end with no holes),
bitmap index scan will mark 5% of the pages for reading, and read them
sequentially (fast). Plain index scan will also scan the rows more or less
sequentially, so it's going to be quite fast too.

    Now if your table is not clustered at all, or clustered on something
which has no correlation to your current query, you may hit the worst
case: reading a random sampling of 5% of the pages. Bitmap index scan
will sort these prior to reading, so the HDD/OS will do smart things.
Plain index scan won't.

    - worst case for bitmap index scan is a seq scan... slow, but if you have
no other choice, it's OK.
    - worst case for plain index scan is a lot worse since it's a random
seekfest.

    If everything is cached in RAM, there is not much difference (plain index
scan can be faster if the bitmap "recheck cond" is slow).

Re: Choice of bitmap scan over index scan

From
"Kevin Grittner"
Date:
Mathieu De Zutter <mathieu@dezutter.org> wrote:

> So if this query usually does *not* hit the cache, it will be
> probably faster if I leave it like that? While testing a query I
> execute it that much that it's always getting into the cache.
> However, since other applications run on the same server, I think
> that infrequently used data gets flushed after a while, even if
> the DB could fit in the RAM.

You definitely were hitting the cache almost exclusively in the
EXPLAIN ANALYZE results you sent.  If that's not typically what
happens, we'd be able to better analyze the situation with an
EXPLAIN ANALYZE of a more typical run.  That said, if you are doing
physical reads, reading backwards on the index is going to degrade
pretty quickly if you're using a normal rotating magnetic medium,
because the blocks are arranged on the disk in a format to support
fast reads in a forward direction.  Given that and other factors,
the bitmap scan will probably be much faster if you do wind up going
to the disk most of the time.

On the other hand, there's no reason to lie to the optimizer about
how much memory is on the machine.  You can't expect it to make sane
choices on the basis of misleading assumptions.  For starters, try
setting effective_cache_size to at least 1GB.  That doesn't reserve
any space, it just tells the optimizer what it can assume about how
much data can be cached, and a large setting will tend to encourage
more indexed access.

Given that when you focus on one piece of the database, the caching
effects are pretty dramatic, you really should reduce
random_page_cost to 2, even with the in-and-out-of-cache scenario
you describe.  These aren't "magic bullets" that solve all
performance problems, but you would be giving the optimizer a
fighting chance at costing plans in a way that the one with the
lowest calculated cost is actually the one which will run the
fastest.

Also, if the pressure on RAM is that high on this machine, it would
probably be cost-effective to add another 2GB of RAM, so that you
could be a bit more generous in your allocation of RAM to the
database.  It might make your problem queries an order of magnitude
or more faster with very little effort.  If a quick google search is
any indication, you can get that much RAM for less than $50 these
days, if you've got a slot open.

-Kevin

Re: Choice of bitmap scan over index scan

From
Jeremy Harris
Date:
On 01/11/2010 02:53 AM, Robert Haas wrote:
> On Sun, Jan 10, 2010 at 9:04 AM, Jeremy Harris<jgh@wizmail.org>  wrote:
>> Needing to use an external (on-disk) sort method, when taking
>> only 90MB, looks odd.
[...]
> Well, you'd need to have work_mem>  90 MB for that not to happen, and
> very few people can afford to set that setting that high.  If you have
> a query with three or four sorts using 90 MB a piece, and five or ten
> users running them, you can quickly kill the box...

Oh.  That's, um, a real pity given the cost of going external.  Any hope
of a more dynamic allocation of memory resource in the future?
Within a single query plan, the number of sorts is known; how about
a sort-mem limit per query rather than per sort (as a first step)?

Cheers,
     Jeremy



Re: Choice of bitmap scan over index scan

From
Robert Haas
Date:
On Mon, Jan 11, 2010 at 2:41 PM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 01/11/2010 02:53 AM, Robert Haas wrote:
>>
>> On Sun, Jan 10, 2010 at 9:04 AM, Jeremy Harris<jgh@wizmail.org>  wrote:
>>>
>>> Needing to use an external (on-disk) sort method, when taking
>>> only 90MB, looks odd.
>
> [...]
>>
>> Well, you'd need to have work_mem>  90 MB for that not to happen, and
>> very few people can afford to set that setting that high.  If you have
>> a query with three or four sorts using 90 MB a piece, and five or ten
>> users running them, you can quickly kill the box...
>
> Oh.  That's, um, a real pity given the cost of going external.  Any hope
> of a more dynamic allocation of memory resource in the future?
> Within a single query plan, the number of sorts is known; how about
> a sort-mem limit per query rather than per sort (as a first step)?

Unfortunately, it's not the case that the number of sorts is known -
the amount of memory available to the sort affects its cost, so if we
knew we were going to have more or less memory it would affect whether
we chose the plan involving the sort in the first place.

My previous musings on this topic are here:

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php

...Robert