two index bitmap scan of a big table & hash_seq_search - Mailing list pgsql-hackers
From | Sergey E. Koposov |
---|---|
Subject | two index bitmap scan of a big table & hash_seq_search |
Date | |
Msg-id | alpine.LRH.2.00.1108191956360.23450@lnfm1.sai.msu.ru Whole thread Raw |
Responses |
Re: two index bitmap scan of a big table & hash_seq_search
|
List | pgsql-hackers |
Hello Hackers, I've recently noticed a particularly strange behaviour of one of my simple queries: SELECT ra,dec FROM sdssdr7.photoobjall WHERE ra BETWEEN 175 and 190 AND dec BETWEEN 17 And 23 ; The table is quite large (1.1Tb, 10^9 rows, and has Btree indexes on "ra","dec" columns). The plan is a combination of two bitmap scans: wsdb=# explain SELECT ra,dec FROM sdssdr7.photoobjall WHERE ra BETWEEN 175 and 190 AND dec BETWEEN 17 And 23 ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- BitmapHeap Scan on photoobjall (cost=1854918.62..13401835.50 rows=3266290 width=16) Recheck Cond: ((ra >= 175::doubleprecision) AND (ra <= 190::double precision) AND ("dec" >= 17::double precision) AND ("dec" <= 23::double precision)) -> BitmapAnd (cost=1854918.62..1854918.62 rows=3266290 width=0) -> Bitmap Index Scan on sdssdr7_photoobjall_ra_idx (cost=0.00..607217.21 rows=30530306 width=0) Index Cond: ((ra >= 175::double precision)AND (ra <= 190::double precision)) -> Bitmap Index Scan on sdssdr7_photoobjall_dec_idx (cost=0.00..1246068.01rows=62654186 width=0) Index Cond: (("dec" >= 17::double precision) AND ("dec" <= 23::doubleprecision)) But the funny thing I noticed is that the query after running a certain amount of time doing I/O, starts to use 100%CPU and spend 99% the time in hash_seq_search. Here is the oprofile of PG during that period: -------- CPU: Intel Core/i7, speed 2.268e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % symbol name 303404 99.3562 hash_seq_search 1163 0.3808 tbm_lossify 639 0.2093 hash_search_with_hash_value .... -------- During that it very rarely tries to actually do any I/O. Every second or so it does issue an read of the main relation(not the index). Here is how strace looks like during that time: read(455, "\0\0\0\0\0\0\0\0\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\200\206 \0\340\237 \0\320"..., 8192) = 8192 ... repeated 100 or more times(issued every second or so) .. lseek(455, 603635712, SEEK_SET) = 603635712 read(455, "\0\0\0\0\0\0\0\0\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\200\206 \0\340\237 \0\320"..., 8192) = 8192 ... repeated a lot of times... ...etc... So it seems that it actually reads the same thing all over again and again. Another funny thing is that I have the exactly the same table, but with significantly smaller number of columns (so it occupies 180 gb instead) but has the same indexes. Also all the rows in the table are exactly in the same order (both tables were clustered using the same thing). For that table the query succeeds without problems in 20 seconds or so: wsdb=> explain analyze SELECT ra,dec FROM sdssdr7.phototag WHERE ra BETWEEN 175 and 190 AND dec BETWEEN 17 And 23 ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- BitmapHeap Scan on phototag (cost=1824731.39..10750473.21 rows=3204629 width=16) (actual time=21146.520..23136.512 rows=2973205loops=1) Recheck Cond: ((ra >= 175::double precision) AND (ra <= 190::double precision) AND ("dec" >= 17::doubleprecision) AND ("dec" <= 23::double precision)) -> BitmapAnd (cost=1824731.39..1824731.39 rows=3204629 width=0)(actual time=21093.906..21093.906 rows=0 loops=1) -> Bitmap Index Scan on sdssdr7_phototag2_ra_idx (cost=0.00..613910.25rows=30866810 width=0) (actual time=6502.589..6502.589 rows=30418322 loops=1) Index Cond:((ra >= 175::double precision) AND (ra <= 190::double precision)) -> Bitmap Index Scan on sdssdr7_phototag2_dec_idx (cost=0.00..1209218.57 rows=60801242 width=0) (actual time=14201.455..14201.455 rows=63031352 loops=1) Index Cond: (("dec" >= 17::double precision) AND ("dec" <= 23::double precision)) Total runtime: 23295.384ms (8 rows) While for "wider" table it is still running the query for more than an hour with 100%CPU an almost no I/O Additional info: PG version 8.4.8, OS: debian 5.0 hardware: 2xXeon E5520, 12GB RAM, RAID50 with BBU The tables are completely static(e.g. no write activity on them), and have been vacuum analyzed. The system is not busy at all (e.g. not much if any concurrent queries). The tables have large number of columns of real/int/bigint/double precision time. Some config parameters: effective_cache_size | 6GB work_mem | 1GB effective_io_concurrency | 0 shared_buffers | 4GB Any ideas what can be wrong? Any info I can provide ? Thanks in advance, Sergey ******************************************************************* Sergey E. Koposov, PhD Institute for Astronomy, Cambridge UK /Sternberg Astronomical Institute Web: http://lnfm1.sai.msu.ru/~math E-mail: math@sai.msu.ru
pgsql-hackers by date: