Home > mailing lists

Re: speedup tidbitmap patch: cache page - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: speedup tidbitmap patch: cache page
Date	December 14, 2014 14:58:16
Msg-id	CAApHDvp_+c1hPRiVfKcT1ty__6q5wnz2cMc_tNyVx6mhbbzSAg@mail.gmail.com Whole thread Raw
In response to	speedup tidbitmap patch: cache page (Teodor Sigaev <teodor@sigaev.ru>)
Responses	Re: speedup tidbitmap patch: cache page (Teodor Sigaev <teodor@sigaev.ru>)
List	pgsql-hackers

Tree view

On 23 October 2014 at 00:52, Teodor Sigaev <teodor@sigaev.ru> wrote:

In specific workload postgres could spend a lot of time in tbm_add_tuples, up to 50% of query time. hash_search call is expensive and called twice for each ItemPointer to insert. Suggested patch tries to cache last PagetableEntry pointer in hope that next ItemPointer will be on the same relation page.

Patch is rather trivial and I don't believe that it could cause performance degradation. But it could increase performance on some workload.

An example:
# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch: Execution time: 2058.718 ms

# explain analyze select * from t where i >= 100 and i<= 100;
without patch: Execution time: 524.441 ms
with patch: Execution time: 195.381 ms

This seems like quite a promising performance boost.

I've been having a look at this and I'm wondering about a certain scenario:

In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block, and on the next iteration of the loop we have the same block again, have you benchmarked any caching code to store if tbm_page_is_lossy() returned true for that block on the previous iteration of the loop? This would save from having to call tbm_page_is_lossy() again for the same block. Or are you just expecting that tbm_page_is_lossy() returns true so rarely that you'll end up caching the page most of the time, and gain on skipping both hash lookups on the next loop, since page will be set in this case?

It would be nice to see a comment to explain why it might be a good idea to cache the page lookup. Perhaps something like:

+ /*

+ * We'll cache this page as it's quite likely that on the next loop

+ * we'll be seeing the same page again. This will save from having

+ * to lookup the page in the hashtable again.

+ */

+ page = tbm_get_pageentry(tbm, blk);

I also wondered if there might be a small slowdown in the case where the index only finds 1 matching tuple. So I tried the following:

select v::int4 as i into t1 from generate_series(1, 5000000) as v;

create index t1_idx on t1 using gin (i);

select count(*) from t1 where i >= 0;

In this case tbm_add_tuples is called with ntids == 1, so there's only 1 loop performed per call. I wondered if the page == NULL check would add much overhead the function.

Times are in milliseconds:

Patch Master

2413.183 2339.979

2352.564 2428.025

2369.118 2359.498

2338.442 2350.974

2373.330 2400.942

2383.883 2342.461

2393.804 2359.49

2334.998 2359.884

2428.156 2520.842

2336.978 2356.995

avg. 2372.4456 2381.909 99.6%

med. 2371.224 2359.494 100.5%

It appears that if it does, then it's not very much.

Regards

David Rowley

pgsql-hackers by date:

From: David Rowley
Date: 14 December 2014, 11:03:37
Subject: Confusing comment in tidbitmap.c

From: Mark Cave-Ayland
Date: 14 December 2014, 17:36:04
Subject: Re: Commitfest problems

Re: speedup tidbitmap patch: cache page - Mailing list pgsql-hackers

Previous

Next