two index bitmap scan of a big table & hash_seq_search - Mailing list pgsql-hackers

From Sergey E. Koposov
Subject two index bitmap scan of a big table & hash_seq_search
Date
Msg-id alpine.LRH.2.00.1108191956360.23450@lnfm1.sai.msu.ru
Whole thread Raw
Responses Re: two index bitmap scan of a big table & hash_seq_search
List pgsql-hackers
Hello Hackers,

I've recently noticed a particularly strange behaviour of one of my simple 
queries:
SELECT ra,dec FROM sdssdr7.photoobjall WHERE ra BETWEEN 175 and 190 AND   dec BETWEEN 17 And 23 ;
The table is quite large (1.1Tb, 10^9 rows, and has Btree indexes on 
"ra","dec" columns).
The plan is a combination of two bitmap scans:
wsdb=# explain SELECT ra,dec FROM sdssdr7.photoobjall WHERE ra BETWEEN 175 and 190 AND dec BETWEEN 17 And 23 ;
                                                             QUERY PLAN
 

-----------------------------------------------------------------------------------------------------------------------------------------------------------
BitmapHeap Scan on photoobjall  (cost=1854918.62..13401835.50 rows=3266290 width=16)   Recheck Cond: ((ra >=
175::doubleprecision) AND (ra <= 190::double precision) AND ("dec" >= 17::double precision) AND ("dec" <= 23::double
precision))  ->  BitmapAnd  (cost=1854918.62..1854918.62 rows=3266290 width=0)         ->  Bitmap Index Scan on
sdssdr7_photoobjall_ra_idx (cost=0.00..607217.21 rows=30530306 width=0)               Index Cond: ((ra >= 175::double
precision)AND (ra <= 190::double precision))         ->  Bitmap Index Scan on sdssdr7_photoobjall_dec_idx
(cost=0.00..1246068.01rows=62654186 width=0)               Index Cond: (("dec" >= 17::double precision) AND ("dec" <=
23::doubleprecision))
 

But the funny thing I noticed is that the query after running a certain 
amount of time doing I/O, starts to use 100%CPU and spend 99% the time in 
hash_seq_search. Here is the oprofile of PG during that period:
--------
CPU: Intel Core/i7, speed 2.268e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
303404   99.3562  hash_seq_search
1163      0.3808  tbm_lossify
639       0.2093  hash_search_with_hash_value
....
--------

During that it very rarely tries to actually do any I/O. Every second or 
so it does issue an read of the main relation(not the index). Here is 
how strace looks like during that time:
read(455, "\0\0\0\0\0\0\0\0\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\200\206 \0\340\237 \0\320"..., 8192) = 8192
... repeated 100 or more  times(issued every second or so) ..
lseek(455, 603635712, SEEK_SET)         = 603635712
read(455, "\0\0\0\0\0\0\0\0\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\200\206 \0\340\237 \0\320"..., 8192) = 8192
... repeated a lot of times...
...etc...
So it seems that it actually reads the same thing all over again and 
again.

Another funny thing is that I have the exactly the same table, but with 
significantly smaller number of columns (so it occupies 180 gb 
instead) but has the same indexes. Also all the rows in the table are 
exactly in the same order (both tables were clustered using the same 
thing). For that table the query succeeds without problems in 20 seconds 
or so:
wsdb=> explain analyze SELECT ra,dec FROM sdssdr7.phototag WHERE ra BETWEEN 175 and 190 AND dec BETWEEN 17 And 23 ;
                                                                       QUERY PLAN
 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BitmapHeap Scan on phototag  (cost=1824731.39..10750473.21 rows=3204629 width=16) (actual time=21146.520..23136.512
rows=2973205loops=1)   Recheck Cond: ((ra >= 175::double precision) AND (ra <= 190::double precision) AND ("dec" >=
17::doubleprecision) AND ("dec" <= 23::double precision))   ->  BitmapAnd  (cost=1824731.39..1824731.39 rows=3204629
width=0)(actual time=21093.906..21093.906 rows=0 loops=1)         ->  Bitmap Index Scan on sdssdr7_phototag2_ra_idx
(cost=0.00..613910.25rows=30866810 width=0) (actual time=6502.589..6502.589 rows=30418322 loops=1)               Index
Cond:((ra >= 175::double precision) AND (ra <= 190::double precision))         ->  Bitmap Index Scan on
sdssdr7_phototag2_dec_idx (cost=0.00..1209218.57 rows=60801242 width=0) (actual time=14201.455..14201.455 rows=63031352
loops=1)              Index Cond: (("dec" >= 17::double precision) AND ("dec" <= 23::double precision)) Total runtime:
23295.384ms
 
(8 rows)

While for "wider" table it is still running the query for more than an 
hour  with 100%CPU an almost no I/O

Additional info:
PG version 8.4.8,
OS: debian 5.0
hardware: 2xXeon E5520, 12GB RAM, RAID50 with BBU
The tables are completely static(e.g. no write activity on them), and have 
been vacuum analyzed.
The system is not busy at all (e.g. not much if any concurrent queries).
The tables have large number of columns of real/int/bigint/double 
precision time.

Some config parameters: effective_cache_size            | 6GB work_mem                        | 1GB
effective_io_concurrency       | 0 shared_buffers                  | 4GB
 

Any ideas what can be wrong? Any info I can provide ?

Thanks in advance,    Sergey

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge UK /Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Rethinking sinval callback hook API
Next
From: Heikki Linnakangas
Date:
Subject: Re: the big picture for index-only scans