Thread: Choice of bitmap scan over index scan
Hi,
As part of a larger problem, I'm trying to optimize a rather simple query, which boils down to:
SELECT * FROM table WHERE indexed_column > ... ORDER BY indexed_column DESC;
(see attachment for all details: table definition, query, query plans)
For small ranges it chooses an index scan, which is very fast. For somewhat larger ranges (still not very large) it switches to a bitmap scan plus a sort. Pgsql presumably assumes that the larger the range, the better a bitmap scan does, because it reads the heap more efficiently. In my case, however, the larger the range, the worse bitmap+sort performs compared to the index scan:
Small range (5K rows): 5.4 msec (bitmap+sort) vs 3.3 msec (index) -- performance penalty of ~60%
Large range (1.5M rows): 6400 msec (bitmap+sort) vs 2100 msec (index) -- performance penalty of ~200%
How can I make pgsql realize that it should always pick the index scan?
Thanks!
Kind regards,
Mathieu
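
For anyone reproducing this: a quick way to see both plans side by side, purely as a testing aid and not a fix, is to disable the bitmap plan for one session. A sketch, assuming a table and index shaped like the log_event/event_timestamp case in the attached plans:

  EXPLAIN ANALYZE
    SELECT * FROM log_event
     WHERE event_timestamp > now() - interval '1 year'
     ORDER BY event_timestamp DESC;

  SET enable_bitmapscan = off;   -- session-local; steers the planner away from the bitmap plan
  EXPLAIN ANALYZE
    SELECT * FROM log_event
     WHERE event_timestamp > now() - interval '1 year'
     ORDER BY event_timestamp DESC;
  RESET enable_bitmapscan;

Disabling enable_bitmapscan globally is generally discouraged; it is only useful here to compare the actual runtimes of the two plans.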
On 01/10/2010 12:28 PM, Mathieu De Zutter wrote:
> Sort (cost=481763.31..485634.61 rows=1548520 width=338) (actual time=5423.628..6286.148 rows=1551923 loops=1)
>   Sort Key: event_timestamp
>   Sort Method: external merge Disk: 90488kB
>   -> Seq Scan on log_event (cost=0.00..79085.92 rows=1548520 width=338) (actual time=0.022..2195.527 rows=1551923 loops=1)
>         Filter: (event_timestamp > (now() - '1 year'::interval))
> Total runtime: 6407.377 ms

Needing to use an external (on-disk) sort method, when taking only 90MB, looks odd.

- Jeremy
Mathieu De Zutter wrote:

You didn't include any information on your hardware and OS, which can be very important. Also, what version of PostgreSQL is this? SELECT version(); output would be good.

> How can I make pgsql realize that it should always pick the index
> scan?

That would probably be a very bad thing to do, in a general sense. I'm not even convinced yet it's really what you want in this case.

> shared_buffers = 24MB
> work_mem = 8MB
> #effective_cache_size = 128MB

Those are probably not optimal; however, without information on your hardware and runtime environment, I can't make any concrete suggestion.

> #seq_page_cost = 1.0
> #random_page_cost = 4.0

It's entirely possible that you will get plans more appropriate to your hardware and runtime environment by adjusting these. Again, I lack data to propose anything specific yet.

-Kevin
On Sun, Jan 10, 2010 at 4:18 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Intel(R) Core(TM)2 Duo CPU E7200 @ 2.53GHz
2GB RAM
2x500GB RAID-1
Running Debian/Etch AMD64
PG version: PostgreSQL 8.3.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
Server also runs DNS/Mail/Web/VCS/... for budget reasons.
Database size is 1-2 GB. Also running copies of it for testing/dev.
Kind regards,
Mathieu
Mathieu De Zutter wrote:
> Intel(R) Core(TM)2 Duo CPU E7200 @ 2.53GHz
> 2GB RAM
> 2x500GB RAID-1
> Running Debian/Etch AMD64
> PG version: PostgreSQL 8.3.8 on x86_64
> Server also runs DNS/Mail/Web/VCS/... for budget reasons.
> Database size is 1-2 GB. Also running copies of it for testing/dev.

I would try something like this and see how it goes:

shared_buffers = 200MB
work_mem = 32MB
effective_cache_size = 1.2GB
seq_page_cost = 0.1
random_page_cost = 0.1

Some of these settings require a PostgreSQL restart. I may have gone too aggressively low on the page costs, but it seems likely with a "1-2 GB" database and 2 GB RAM, the active portions of your database will be fully cached in spite of the other applications.

Besides looking at the impact on this one query, you should keep an eye on all queries after such changes, and post for any which become unacceptably slow. Properly tuning something like this can take a few iterations.

-Kevin
I wrote:
> work_mem = 32MB

Hmmm... With 100 connections and 2 GB RAM, that is probably on the high side, at least if you sometimes use a lot of those connections at the same time to run queries which might use sorts or hashes. It's probably safer to go down to 16MB or even back to where you had it. Sorry I missed that before.

-Kevin
On Sun, Jan 10, 2010 at 10:53 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> seq_page_cost = 0.1
> random_page_cost = 0.1

These might not even be low enough. The reason why bitmap index scans win over plain index scans, in general, is because you make one pass through the heap to get all the rows you need instead of bouncing around doing lots of random access. But of course if all the data is in RAM then this argument falls down.

If these aren't enough to get the query planner to DTRT, then the OP might want to try lowering them further and seeing how low he has to go to flip the plan...

...Robert
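
Both of those page costs can be changed per session, so the "see how low he has to go" experiment does not require editing postgresql.conf. A sketch, reusing the query shape from the attached plans:

  SET seq_page_cost = 0.1;
  SET random_page_cost = 0.1;
  EXPLAIN
    SELECT * FROM log_event
     WHERE event_timestamp > now() - interval '1 year'
     ORDER BY event_timestamp DESC;
  -- repeat with smaller values until the plan flips, then revert:
  RESET seq_page_cost;
  RESET random_page_cost;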
On Sun, Jan 10, 2010 at 9:04 AM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 01/10/2010 12:28 PM, Mathieu De Zutter wrote:
>> Sort (cost=481763.31..485634.61 rows=1548520 width=338) (actual time=5423.628..6286.148 rows=1551923 loops=1)
>>   Sort Key: event_timestamp
>>   Sort Method: external merge Disk: 90488kB
>>   -> Seq Scan on log_event (cost=0.00..79085.92 rows=1548520 width=338) (actual time=0.022..2195.527 rows=1551923 loops=1)
>>         Filter: (event_timestamp > (now() - '1 year'::interval))
>> Total runtime: 6407.377 ms
>
> Needing to use an external (on-disk) sort method, when taking
> only 90MB, looks odd.
>
> - Jeremy

Well, you'd need to have work_mem > 90 MB for that not to happen, and very few people can afford to set that setting that high. If you have a query with three or four sorts using 90 MB a piece, and five or ten users running them, you can quickly kill the box...

...Robert
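
When a single known-large query is the problem, work_mem can also be raised just for that query rather than globally; SET LOCAL keeps the change confined to one transaction. A sketch, with the 128MB figure purely illustrative and assuming it is safe for this machine:

  BEGIN;
  SET LOCAL work_mem = '128MB';  -- reverts automatically at COMMIT or ROLLBACK
  SELECT * FROM log_event
   WHERE event_timestamp > now() - interval '1 year'
   ORDER BY event_timestamp DESC;
  COMMIT;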
On Mon, Jan 11, 2010 at 3:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
So if this query usually does *not* hit the cache, it will be probably faster if I leave it like that? While testing a query I execute it that much that it's always getting into the cache. However, since other applications run on the same server, I think that infrequently used data gets flushed after a while, even if the DB could fit in the RAM.
--
Mathieu
On Mon, 11 Jan 2010, Mathieu De Zutter wrote:
>> seq_page_cost = 0.1
>> random_page_cost = 0.1
>
> So if this query usually does *not* hit the cache, it will be probably faster if I leave
> it like that? While testing a query I execute it that much that it's always getting into
> the cache. However, since other applications run on the same server, I think that
> infrequently used data gets flushed after a while, even if the DB could fit in the RAM.

Postgres is being conservative. The plan it uses (bitmap index scan) will perform much better than an index scan when the data is not in the cache, by maybe an order of magnitude, depending on your hardware setup.

The index scan may perform better at the moment, but the bitmap index scan is safer.

Matthew

--
Change is inevitable, except from vending machines.
> Postgres is being conservative. The plan it uses (bitmap index scan)
> will perform much better than an index scan when the data is not in the
> cache, by maybe an order of magnitude, depending on your hardware setup.
>
> The index scan may perform better at the moment, but the bitmap index
> scan is safer.

Suppose you make a query that will need to retrieve 5% of the rows in a table...

If the table is nicely clustered (i.e. you want the latest rows in a table where they are always appended at the end with no holes, for instance), bitmap index scan will mark 5% of the pages for reading and read them sequentially (fast). Plain index scan will also scan the rows more or less sequentially, so it's going to be quite fast too.

Now if your table is not clustered at all, or clustered on something which has no correlation to your current query, you may hit the worst case: reading a random sampling of 5% of the pages. Bitmap index scan will sort these prior to reading, so the HDD/OS will do smart things. Plain index scan won't.

- worst case for bitmap index scan is a seq scan... slow, but if you have no other choice, it's OK.
- worst case for plain index scan is a lot worse since it's a random seekfest.

If everything is cached in RAM, there is not much difference (plain index scan can be faster if the bitmap "recheck cond" is slow).
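
Whether a table is "nicely clustered" on the indexed column can be read from the planner statistics: a correlation near 1.0 or -1.0 means the heap order closely follows the column, while a value near 0 means it does not. A sketch, assuming the table and column names from the attached plans:

  SELECT tablename, attname, correlation
    FROM pg_stats
   WHERE tablename = 'log_event'
     AND attname = 'event_timestamp';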
Mathieu De Zutter <mathieu@dezutter.org> wrote:
> So if this query usually does *not* hit the cache, it will be
> probably faster if I leave it like that? While testing a query I
> execute it that much that it's always getting into the cache.
> However, since other applications run on the same server, I think
> that infrequently used data gets flushed after a while, even if
> the DB could fit in the RAM.

You definitely were hitting the cache almost exclusively in the EXPLAIN ANALYZE results you sent. If that's not typically what happens, we'd be able to better analyze the situation with an EXPLAIN ANALYZE of a more typical run.

That said, if you are doing physical reads, reading backwards on the index is going to degrade pretty quickly if you're using a normal rotating magnetic medium, because the blocks are arranged on the disk in a format to support fast reads in a forward direction. Given that and other factors, the bitmap scan will probably be much faster if you do wind up going to the disk most of the time.

On the other hand, there's no reason to lie to the optimizer about how much memory is on the machine. You can't expect it to make sane choices on the basis of misleading assumptions. For starters, try setting effective_cache_size to at least 1GB. That doesn't reserve any space, it just tells the optimizer what it can assume about how much data can be cached, and a large setting will tend to encourage more indexed access.

Given that when you focus on one piece of the database, the caching effects are pretty dramatic, you really should reduce random_page_cost to 2, even with the in-and-out-of-cache scenario you describe. These aren't "magic bullets" that solve all performance problems, but you would be giving the optimizer a fighting chance at costing plans in a way that the one with the lowest calculated cost is actually the one which will run the fastest.

Also, if the pressure on RAM is that high on this machine, it would probably be cost-effective to add another 2GB of RAM, so that you could be a bit more generous in your allocation of RAM to the database. It might make your problem queries an order of magnitude or more faster with very little effort. If a quick google search is any indication, you can get that much RAM for less than $50 these days, if you've got a slot open.

-Kevin
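
Neither of those two settings requires a server restart; they can be tried per session first, or put in postgresql.conf and picked up with a reload. A sketch using the values suggested above:

  # postgresql.conf
  effective_cache_size = 1GB   # planner estimate only; reserves no memory
  random_page_cost = 2         # reflects mostly-cached random reads

  -- then, from psql, reload without restarting:
  SELECT pg_reload_conf();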
On 01/11/2010 02:53 AM, Robert Haas wrote:
> On Sun, Jan 10, 2010 at 9:04 AM, Jeremy Harris<jgh@wizmail.org> wrote:
>> Needing to use an external (on-disk) sort method, when taking
>> only 90MB, looks odd.
[...]
> Well, you'd need to have work_mem > 90 MB for that not to happen, and
> very few people can afford to set that setting that high. If you have
> a query with three or four sorts using 90 MB a piece, and five or ten
> users running them, you can quickly kill the box...

Oh. That's, um, a real pity given the cost of going external. Any hope of a more dynamic allocation of memory resource in the future?

Within a single query plan, the number of sorts is known; how about a sort-mem limit per query rather than per sort (as a first step)?

Cheers,
Jeremy
On Mon, Jan 11, 2010 at 2:41 PM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 01/11/2010 02:53 AM, Robert Haas wrote:
>> On Sun, Jan 10, 2010 at 9:04 AM, Jeremy Harris<jgh@wizmail.org> wrote:
>>> Needing to use an external (on-disk) sort method, when taking
>>> only 90MB, looks odd.
> [...]
>> Well, you'd need to have work_mem > 90 MB for that not to happen, and
>> very few people can afford to set that setting that high. If you have
>> a query with three or four sorts using 90 MB a piece, and five or ten
>> users running them, you can quickly kill the box...
>
> Oh. That's, um, a real pity given the cost of going external. Any hope
> of a more dynamic allocation of memory resource in the future?
> Within a single query plan, the number of sorts is known; how about
> a sort-mem limit per query rather than per sort (as a first step)?

Unfortunately, it's not the case that the number of sorts is known - the amount of memory available to the sort affects its cost, so if we knew we were going to have more or less memory it would affect whether we chose the plan involving the sort in the first place. My previous musings on this topic are here:

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php

...Robert