Re: performance of bitmap scans in nested loop joins - Mailing list pgsql-hackers

From Sergey E. Koposov
Subject Re: performance of bitmap scans in nested loop joins
Date
Msg-id Pine.LNX.4.44.0505052256180.12581-100000@lnfm1.sai.msu.ru
Whole thread Raw
In response to Re: performance of bitmap scans in nested loop joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: performance of bitmap scans in nested loop joins
List pgsql-hackers
On Wed, 4 May 2005, Tom Lane wrote:

> 
> I figured out part of the problem: I had made nodeBitmapIndexscan
> re-open the index on each call, thinking that that would save amrescan
> calls.  But an amrescan is a whole lot cheaper than index open/close,
> so that was a bad tradeoff.
> 
> This seems to account for about half of the slowdown you exhibited.
> I'm not sure where the other half went ... gprof isn't turning up
> any obvious candidates.
> 

Great!!!
Thanks. 

Now I have done again the profiling. Really, now the bitmap index scan is 
faster and take 90 seconds instead of 120 seconds. But it still more than the
sum of two simple index scans (26+38 secs). 

I also tried to help you, Tom, to find the exact location of the remaining 
problem. For that purpose I did the profiling of those queries:

test=# explain analyze select * from q3c,q3c as q3cs where
q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3;

test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);

test=# explain analyze select * from q3c,q3c as q3cs where
(q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR
(q3c.ipix>=q3cs.ipix-1000 AND  q3c.ipix<=q3cs.ipix-993);

And I coadded the "flat profiles" of first two (index scan) queries and 
compared it with the flat profile of bitmap scan:         self seconds     | self seconds 
Function              of TWO index scans | of bitmap scan
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LWLockAcquire                       51.29      46.29
LWLockRelease                       40.29      38.65
_bt_compare                         38.35      36.20
hash_seq_search                     0.00      30.08
hash_search                         26.90      29.32
AllocSetAlloc                       10.73      27.23
btint8cmp                           30.95      24.05
PinBuffer                           23.25      22.56
hash_any                            18.58      20.18
_bt_binsrch                         19.73      19.09
UnpinBuffer                         18.27      17.44
_bt_first                           13.92      16.20
FunctionCall2                       18.23      14.62
_bt_preprocess_keys                 12.01      13.67
_bt_moveright                       13.42      11.87
InstrStopNode                       7.62      11.24
seg_alloc                           0.00      11.13
ReadBuffer                          11.42      10.68
InstrEndLoop                        4.11      10.38
ExecProcNode                        14.60      10.17
fmgr_info_cxt_security              13.87      9.96
_bt_search                          10.41      9.80
AllocSetReset                       9.70      9.47
AllocSetFree                        4.00      7.75
_bt_getroot                         7.54      7.35
LockBuffer                          9.32      7.07
MemoryContextAlloc                  3.22      6.93
ExecMakeFunctionResultNoSets        5.95      6.85
element_alloc                       0.12      6.73
_bt_checkkeys                       8.68      6.07
slot_deform_tuple                   7.71      5.77
ExecReScan                          2.89      5.67
MemoryContextAllocZero              0.00      5.43
ReleaseAndReadBuffer                6.20      5.00
ExecEvalVar                         4.02      4.80
ReleaseBuffer                       5.39      4.28
ExecScan                            4.39      4.11
ExecutorRun                         1.64      4.08
slot_getattr                        4.80      4.00
_bt_relandgetbuf                    4.31      3.70
MemoryContextCreate                 0.00      3.54


The strongest difference is in hash_seq_search (30 seconds) and 
AllocSetAlloc -- 17 seconds. (which contribute a significant part into 
slowness of bitmap scan) (I remind that  in my case bitmap scan is slower 
than 2 index scans by ~ 20-30seconds (34 seconds when postgres was compiled
with profiling support, 25 seconds when postgres was compiled without
profiling support)). 

Since I don't know exactly the normal sequence of function calls in postgres,
I cannot judge whether those functions are true reason of the problem or not,
but probably that information can help.

In any case, again I have put all the full profiling results on the web

http://lnfm1.sai.msu.ru/~math/public_misc/idxscan1.gprof
profiling of
test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);


http://lnfm1.sai.msu.ru/~math/public_misc/_idxscan1.gprof
profiling of
test=# explain analyze select * from q3c,q3c as q3cs where
q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3;


http://lnfm1.sai.msu.ru/~math/public_misc/bitmap1.gprof
profiling of
test=# explain analyze select * from q3c,q3c as q3cs where
(q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR
(q3c.ipix>=q3cs.ipix-1000 AND  q3c.ipix<=q3cs.ipix-993);


Here, this is just the new output of explain analyze (for the postgres
compiled without profiling support)

test=# explain analyze select * from q3c,q3c as q3cs where (q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR
(q3c.ipix>=q3cs.ipix-1000AND q3c.ipix<=q3cs.ipix-993);
               QUERY PLAN                                                                                    
 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Nested
Loop (cost=5832.03..190281569757.46 rows=1888909037091 width=96) (actual time=0.173..88500.469 rows=3000000 loops=1)
-> Seq Scan on q3c q3cs  (cost=0.00..60928.16 rows=3000016 width=48) (actual time=0.072..3750.724 rows=3000000 loops=1)
->  Bitmap Heap Scan on q3c  (cost=5832.03..43426.73 rows=666670 width=48) (actual time=0.021..0.022 rows=1
loops=3000000)       Recheck Cond: (((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3))) OR
((q3c.ipix>= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993))))        ->  BitmapOr
(cost=5832.03..5832.03rows=666670 width=0) (actual time=0.017..0.017 rows=0 loops=3000000)              ->  Bitmap
IndexScan on ipix_idx  (cost=0.00..2916.02 rows=333335 width=0) (actual time=0.008..0.008 rows=1 loops=3000000)
          Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3)))              ->  Bitmap
IndexScan on ipix_idx  (cost=0.00..2916.02 rows=333335 width=0) (actual time=0.006..0.006 rows=0 loops=3000000)
          Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))Total runtime:
90241.180ms
 



test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE 
test-# (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);
QUERY PLAN                                                             
 

-----------------------------------------------------------------------------------------------------------------------------------Nested
Loop (cost=0.01..49059519577.21 rows=1000010666695 width=96) (actual time=26391.542..26391.542 rows=0 loops=1)  ->  Seq
Scanon q3c q3cs  (cost=0.00..60928.16 rows=3000016 width=48) (actual time=0.059..3393.616 rows=3000000 loops=1)  ->
IndexScan using ipix_idx on q3c  (cost=0.01..9686.37 rows=333335 width=48) (actual time=0.006..0.006 rows=0
loops=3000000)       Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))Total
runtime:26391.623 ms
 
(5 rows)



test=# explain analyze select * from q3c,q3c as q3cs where q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3;
                                             QUERY PLAN                                                             
 

-----------------------------------------------------------------------------------------------------------------------------------Nested
Loop (cost=0.01..49059519577.21 rows=1000010666695 width=96) (actual time=0.108..37355.388 rows=3000000 loops=1)  ->
SeqScan on q3c q3cs  (cost=0.00..60928.16 rows=3000016 width=48) (actual time=0.039..3403.276 rows=3000000 loops=1)  ->
Index Scan using ipix_idx on q3c  (cost=0.01..9686.37 rows=333335 width=48) (actual time=0.007..0.008 rows=1
loops=3000000)       Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3)))Total runtime:
38588.672ms
 
(5 rows)


With best regards, 
Sergey Koposov




pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: 'kitchen sink' downloads (Was: Re: [pgsql-advocacy] Increased company involvement)
Next
From: Tom Lane
Date:
Subject: Re: 'infinity' in GiST index