Thread: why is bitmap index chosen for this query?
Could someone explain the results of the following? This is with postgres 8.1.2 on a database that was just vacuum-verbose-analyzed. I have packets_i4 index which I am expecting to be used with this query but as you can see, I have have to convince its usage by turning off other scans. The total runtime is pretty drastic when the index is not chosen. When using a cursor, the query using the index is the only one that provides immediate results. Also, what is Recheck Cond?
adbs_db=# \d packets
Table "public.packets"
Column | Type | Modifiers
-------------------------+------------------------+--------------------
system_time_secs | integer | not null
system_time_subsecs | integer | not null
spacecraft_time_secs | integer | not null
spacecraft_time_subsecs | integer | not null
mnemonic | character varying(64) |
mnemonic_id | integer | not null
data_length | integer | not null
data | bytea | not null
volume_label | character varying(128) | not null
tlm_version_name | character varying(32) | not null
environment_name | character varying(32) | not null
quality | integer | not null default 0
Indexes:
"packets_i1" btree (volume_label)
"packets_i4" btree (environment_name, system_time_secs, system_time_subsecs, mnemonic)
"packets_i5" btree (environment_name, spacecraft_time_secs, spacecraft_time_subsecs, mnemonic)
Table "public.packets"
Column | Type | Modifiers
-------------------------+------------------------+--------------------
system_time_secs | integer | not null
system_time_subsecs | integer | not null
spacecraft_time_secs | integer | not null
spacecraft_time_subsecs | integer | not null
mnemonic | character varying(64) |
mnemonic_id | integer | not null
data_length | integer | not null
data | bytea | not null
volume_label | character varying(128) | not null
tlm_version_name | character varying(32) | not null
environment_name | character varying(32) | not null
quality | integer | not null default 0
Indexes:
"packets_i1" btree (volume_label)
"packets_i4" btree (environment_name, system_time_secs, system_time_subsecs, mnemonic)
"packets_i5" btree (environment_name, spacecraft_time_secs, spacecraft_time_subsecs, mnemonic)
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label
from packets where environment_name='PASITCTX01'
and system_time_secs>=1132272000 and system_time_secs<=1143244800;
and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on packets (cost=247201.41..2838497.72 rows=12472989 width=47) (actual time=573856.344..771866.516 rows=13365371 loops=1)
Recheck Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
-> Bitmap Index Scan on packets_i4 (cost=0.00..247201.41 rows=12472989 width=0) (actual time=573484.199..573484.199 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 777208.041 ms
(5 rows)
adbs_db=# set enable_bitmapscan to off;
SET
SET
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label
from packets where environment_name='PASITCTX01'
and system_time_secs>=1132272000 and system_time_secs<=1143244800;
and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on packets (cost=0.00..3045957.30 rows=12472989 width=47) (actual time=58539.693..493056.015 rows=13365371 loops=1)
Filter: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 498620.963 ms
(3 rows)
adbs_db=# set enable_seqscan to off;
SET
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label
from packets where environment_name='PASITCTX01'
and system_time_secs>=1132272000 and system_time_secs<=1143244800;
and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using packets_i4 on packets (cost=0.00..19908567.85 rows=12472989 width=47) (actual time=47.691..206028.754 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 211644.843 ms
(3 rows)
------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using packets_i4 on packets (cost=0.00..19908567.85 rows=12472989 width=47) (actual time=47.691..206028.754 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 211644.843 ms
(3 rows)
Blab-away for as little as 1¢/min. Make PC-to-Phone Calls using Yahoo! Messenger with Voice.
On Thu, May 18, 2006 at 08:52:04AM -0700, Stephen Byers wrote: > Could someone explain the results of the following? It sounds like PostgreSQL badly overestimates the cost of the index scan. Does the table perchance fit completely into memory, without effective_cache_size indicating that? > Also, what is Recheck Cond? The bitmap index scan will by default allocate one bit per tuple. If it can't hold a complete bitmap of every tuple in memory, it will fall back to allocating one bit per (8 kB) page, since it will have to read the entire page anyhow, and the most dramatic cost is the seek. However, in the latter case, it will also get a few extra records that don't match the original clause, so it will have to recheck the condition ("Recheck Cond") before outputting the tuples to the parent node. /* Steinar */ -- Homepage: http://www.sesse.net/
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:
Love cheap thrills? Enjoy PC-to-Phone calls to 30+ countries for just 2¢/min with Yahoo! Messenger with Voice.
It sounds like PostgreSQL badly overestimates the cost of the index scan.
Does the table perchance fit completely into memory, without
effective_cache_size indicating that?
Don't know the exact way to answer your question, but my initial instinct is "no way."
select pg_relation_size('packets');
pg_relation_size
------------------
19440115712
------------------
19440115712
19GB. So it's a big table. The query I submitted earlier returns about 13M rows and the table currently has about 38M rows.
Love cheap thrills? Enjoy PC-to-Phone calls to 30+ countries for just 2¢/min with Yahoo! Messenger with Voice.
On Thu, May 18, 2006 at 09:41:23AM -0700, Stephen Byers wrote: > Does the table perchance fit completely into memory, without > effective_cache_size indicating that? > > Don't know the exact way to answer your question, but my initial instinct is "no way." What about the working set? Have you tried running the queries multiple times in a row to see if the results change? It might be that your initial bitmap scan puts all the relevant bits into cache, which will skew the results. /* Steinar */ -- Homepage: http://www.sesse.net/
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > What about the working set? Have you tried running the queries multiple times > in a row to see if the results change? It might be that your initial bitmap > scan puts all the relevant bits into cache, which will skew the results. If the examples were done in the order shown, the seqscan ought to have pretty well blown out the cache ... but I concur that it'd be interesting to check whether repeated executions of the same plan show markedly different times. Also, is the index order closely correlated to the actual physical table order? What is work_mem set to, and does increasing it substantially make the bitmap scan work any better? Considering that the query is fetching about half of the table, I'd have thought that the planner was correct to estimate that bitmap or seqscan ought to win. For the plain indexscan to win, the order correlation must be quite strong, and I'm also guessing that the bitmap scan must have run into some substantial trouble (like discarding a lot of info because of lack of work_mem). IIRC, the planner doesn't currently try to model the effects of a bitmap scan going into lossy mode, which is something it probably should try to account for. regards, tom lane
I repeated explain analyze on the query 5 times and it came up with the same plan.
You asked about index order and physical table order. In general the index order is indeed close to the same order as the physical table order. However, this query is likely an exception. The data is actually from a backup server that has filled a hole for some of the time range that I'm specifying in my query.
Work_mem was set to 10240. After your suggestion, I bumped it to 102400 and it looks like it did significantly impact performance.
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label from packets
adbs_db-# where environment_name='PASITCTX01'
adbs_db-# and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on packets (cost=247205.64..2838556.55 rows=12473252 width=47) (actual time=32118.943..187075.742 rows=13365371 loops=1)
Recheck Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
-> Bitmap Index Scan on packets_i4 (cost=0.00..247205.64 rows=12473252 width=0) (actual time=30370.789..30370.789 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 191995.431 ms
(5 rows)
adbs_db-# where environment_name='PASITCTX01'
adbs_db-# and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on packets (cost=247205.64..2838556.55 rows=12473252 width=47) (actual time=32118.943..187075.742 rows=13365371 loops=1)
Recheck Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
-> Bitmap Index Scan on packets_i4 (cost=0.00..247205.64 rows=12473252 width=0) (actual time=30370.789..30370.789 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 191995.431 ms
(5 rows)
adbs_db=#
adbs_db=#
adbs_db=# adbs_db=# set enable_bitmapscan to off;
SET
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label from packets
adbs_db-# where environment_name='PASITCTX01'
adbs_db-# and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on packets (cost=0.00..3046021.47 rows=12473252 width=47) (actual time=56616.457..475839.789 rows=13365371 loops=1)
Filter: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 481228.409 ms
(3 rows)
adbs_db=#
adbs_db=# adbs_db=# set enable_bitmapscan to off;
SET
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label from packets
adbs_db-# where environment_name='PASITCTX01'
adbs_db-# and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on packets (cost=0.00..3046021.47 rows=12473252 width=47) (actual time=56616.457..475839.789 rows=13365371 loops=1)
Filter: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 481228.409 ms
(3 rows)
adbs_db=#
adbs_db=#
adbs_db=# adbs_db=# set enable_seqscan to off;
SET
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label from packets
adbs_db-# where environment_name='PASITCTX01'
adbs_db-# and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using packets_i4 on packets (cost=0.00..19909080.77 rows=12473273 width=47) (actual time=3.511..188273.177 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 194061.497 ms
(3 rows)
adbs_db=#
adbs_db=# adbs_db=# set enable_seqscan to off;
SET
adbs_db=# explain analyze select spacecraft_time_secs,mnemonic,volume_label from packets
adbs_db-# where environment_name='PASITCTX01'
adbs_db-# and system_time_secs>=1132272000 and system_time_secs<=1143244800;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using packets_i4 on packets (cost=0.00..19909080.77 rows=12473273 width=47) (actual time=3.511..188273.177 rows=13365371 loops=1)
Index Cond: (((environment_name)::text = 'PASITCTX01'::text) AND (system_time_secs >= 1132272000) AND (system_time_secs <= 1143244800))
Total runtime: 194061.497 ms
(3 rows)
Wow -- so what does that mean? Do I need to leave my work_mem at 100MB?? I mentioned that my application actually uses a cursor to walk through this data. Even though the bitmap scan technically had the fastest time with explain analyze, it takes a long while (20 seconds) before the results start to come back through the cursor. Conversely, with the index scan, results immediately come back through the cursor method (which is more desirable). Thoughts?
Example:
begin;
declare myCursor cursor for
select spacecraft_time_secs,mnemonic,volume_label from packets
where environment_name='PASITCTX01'
and system_time_secs>=1132272000
and system_time_secs<=1143244800;
fetch 10 from myCursor;
end;
declare myCursor cursor for
select spacecraft_time_secs,mnemonic,volume_label from packets
where environment_name='PASITCTX01'
and system_time_secs>=1132272000
and system_time_secs<=1143244800;
fetch 10 from myCursor;
end;
PS, this is on a Sun Fire V240 with 4GB RAM, Solaris 8.
Thanks,
Steve
Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Steinar H. Gunderson" writes:
> What about the working set? Have you tried running the queries multiple times
> in a row to see if the results change? It might be that your initial bitmap
> scan puts all the relevant bits into cache, which will skew the results.
If the examples were done in the order shown, the seqscan ought to have
pretty well blown out the cache ... but I concur that it'd be
interesting to check whether repeated executions of the same plan show
markedly different times.
Also, is the index order closely correlated to the actual physical
table order?
What is work_mem set to, and does increasing it substantially make the
bitmap scan work any better?
Considering that the query is fetching about half of the table, I'd have
thought that the planner was correct to estimate that bitmap or seqscan
ought to win. For the plain indexscan to win, the order correlation
must be quite strong, and I'm also guessing that the bitmap scan must
have run into some substantial trouble (like discarding a lot of info
because of lack of work_mem).
IIRC, the planner doesn't currently try to model the effects of a bitmap
scan going into lossy mode, which is something it probably should try to
account for.
regards, tom lane
Sneak preview the all-new Yahoo.com. It's not radically different. Just radically better.
On Thu, May 18, 2006 at 12:38:18PM -0700, Stephen Byers wrote: > I repeated explain analyze on the query 5 times and it came up with the same plan. Yes, but did it end up with the same runtime? That's the interesting part -- the plan will almost always be identical between explain analyze runs given that you haven't done anything in between them. > You asked about index order and physical table order. In general the index > order is indeed close to the same order as the physical table order. > However, this query is likely an exception. The data is actually from a > backup server that has filled a hole for some of the time range that I'm > specifying in my query. Well, it still isn't all that far-fetched to believe that the data has lots of correlation (which helps the index scan quite a lot) that the planner isn't able to pick up. I don't know the details here, so I can't tell you how the correlation for such a query (WHERE a=foo and b between bar and baz) is estimated. Something tells me someone else might, though. :-) /* Steinar */ -- Homepage: http://www.sesse.net/
Yes, here are the runtimes for the repeated query.
Total runtime: 748716.750 ms
Total runtime: 749170.934 ms
Total runtime: 744113.594 ms
Total runtime: 746314.251 ms
Total runtime: 742083.732 ms
Total runtime: 749170.934 ms
Total runtime: 744113.594 ms
Total runtime: 746314.251 ms
Total runtime: 742083.732 ms
Thanks,
Steve
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:
On Thu, May 18, 2006 at 12:38:18PM -0700, Stephen Byers wrote:
> I repeated explain analyze on the query 5 times and it came up with the same plan.
Yes, but did it end up with the same runtime? That's the interesting part --
the plan will almost always be identical between explain analyze runs given
that you haven't done anything in between them.
> You asked about index order and physical table order. In general the index
> order is indeed close to the same order as the physical table order.
> However, this query is likely an exception. The data is actually from a
> backup server that has filled a hole for some of the time range that I'm
> specifying in my query.
Well, it still isn't all that far-fetched to believe that the data has lots
of correlation (which helps the index scan quite a lot) that the planner
isn't able to pick up. I don't know the details here, so I can't tell you how
the correlation for such a query (WHERE a=foo and b between bar and baz) is
estimated. Something tells me someone else might, though. :-)
/* Steinar */
--
Homepage: http://www.sesse.net/
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Sneak preview the all-new Yahoo.com. It's not radically different. Just radically better.
On Thu, May 18, 2006 at 12:53:16PM -0700, Stephen Byers wrote: > Yes, here are the runtimes for the repeated query. > Total runtime: 748716.750 ms > Total runtime: 749170.934 ms > Total runtime: 744113.594 ms > Total runtime: 746314.251 ms > Total runtime: 742083.732 ms With which options enabled? This isn't even close to any of the three times you already posted. /* Steinar */ -- Homepage: http://www.sesse.net/
You may be comparing the values to Tom's suggestion to bump up work_mem. Take a look at the original posting (Total runtime: 777208.041 ms for the bitmap scan)
-Steve
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:
On Thu, May 18, 2006 at 12:53:16PM -0700, Stephen Byers wrote:
> Yes, here are the runtimes for the repeated query.
> Total runtime: 748716.750 ms
> Total runtime: 749170.934 ms
> Total runtime: 744113.594 ms
> Total runtime: 746314.251 ms
> Total runtime: 742083.732 ms
With which options enabled? This isn't even close to any of the three times
you already posted.
/* Steinar */
--
Homepage: http://www.sesse.net/
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Yahoo! Messenger with Voice. Make PC-to-Phone Calls to the US (and 30+ countries) for 2¢/min or less.
On Thu, May 18, 2006 at 12:38:18PM -0700, Stephen Byers wrote: > I repeated explain analyze on the query 5 times and it came up with the same plan. > > You asked about index order and physical table order. In general the index order is indeed close to the same order asthe physical table order. However, this query is likely an exception. The data is actually from a backup server thathas filled a hole for some of the time range that I'm specifying in my query. What's SELECT correlation FROM pg_stats WHERE tablename='packets' AND attname='environment_name' show? What's effective_cache_size and random_page_cost set to? Also, out of curiosity, why not just use a timestamp instead of two int's for storing time? > Wow -- so what does that mean? Do I need to leave my work_mem at 100MB?? I mentioned that my application actually usesa cursor to walk through this data. Even though the bitmap scan technically had the fastest time with explain analyze,it takes a long while (20 seconds) before the results start to come back through the cursor. Conversely, with theindex scan, results immediately come back through the cursor method (which is more desirable). Thoughts? Do you really need to use a cursor? It's generally less efficient than doing things with a single SQL statement, depending on what exactly you're doing. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461