Re: speedup tidbitmap patch: cache page - Mailing list pgsql-hackers

From David Rowley
Subject Re: speedup tidbitmap patch: cache page
Date
Msg-id CAApHDvp_+c1hPRiVfKcT1ty__6q5wnz2cMc_tNyVx6mhbbzSAg@mail.gmail.com
Whole thread Raw
In response to speedup tidbitmap patch: cache page  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: speedup tidbitmap patch: cache page  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On 23 October 2014 at 00:52, Teodor Sigaev <teodor@sigaev.ru> wrote:
In specific workload postgres could spend a lot of time in tbm_add_tuples,  up to 50% of query time. hash_search call is expensive and called twice for each ItemPointer to insert. Suggested patch tries to cache last PagetableEntry pointer in hope that next ItemPointer will be on the same relation page.

Patch is rather trivial and I don't believe that it could cause performance degradation. But it could increase performance on some workload.

An example:
# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
# create index idx on t using gin (i);
# set enable_seqscan  = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch:    Execution time: 2058.718 ms

# explain analyze select * from t where i >= 100 and i<= 100;
without patch: Execution time: 524.441 ms
with patch:    Execution time: 195.381 ms


This seems like quite a promising performance boost.

I've been having a look at this and I'm wondering about a certain scenario:

In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block, and on the next iteration of the loop we have the same block again, have you benchmarked any caching code to store if tbm_page_is_lossy() returned true for that block on the previous iteration of the loop? This would save from having to call tbm_page_is_lossy() again for the same block. Or are you just expecting that tbm_page_is_lossy() returns true so rarely that you'll end up caching the page most of the time, and gain on skipping both hash lookups on the next loop, since page will be set in this case?

It would be nice to see a comment to explain why it might be a good idea to cache the page lookup. Perhaps something like:

+                       /*
+                        * We'll cache this page as it's quite likely that on the next loop
+                        * we'll be seeing the same page again. This will save from having
+                        * to lookup the page in the hashtable again.
+                        */
+ page = tbm_get_pageentry(tbm, blk);


I also wondered if there might be a small slowdown in the case where the index only finds 1 matching tuple. So I tried the following:

select v::int4 as i into t1 from generate_series(1, 5000000) as v;
create index t1_idx on t1 using gin (i);

select count(*) from t1 where i >= 0;

In this case tbm_add_tuples is called with ntids == 1, so there's only 1 loop performed per call. I wondered if the page == NULL check would add much overhead the function.

Times are in milliseconds:

Patch Master
2413.183 2339.979
2352.564 2428.025
2369.118 2359.498
2338.442 2350.974
2373.330 2400.942
2383.883 2342.461
2393.804 2359.49
2334.998 2359.884
2428.156 2520.842
2336.978 2356.995
avg. 2372.4456 2381.909 99.6%
med. 2371.224 2359.494 100.5%

It appears that if it does, then it's not very much.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Confusing comment in tidbitmap.c
Next
From: Mark Cave-Ayland
Date:
Subject: Re: Commitfest problems