Re: tbm_lossify causes "unbalanced" hashtables / bitmaps - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date: 2016-09-23 22:14:49
Msg-id: 20160923221449.4fvtxhu654k6n3uh@alap3.anarazel.de
In response to: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 2016-09-23 17:40:13 -0400, Tom Lane wrote:
> My idea of an appropriate fix would be to resume the scan at the same
> point where the last scan stopped, and work circularly around the table
> when necessary.

I've played with that idea, and it does help a great deal. Not too
surprisingly, it's better than starting at a random point (which in turn
is better than always starting at one end).
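
For concreteness, a minimal sketch of what resuming circularly could
look like in tbm_lossify() - assuming a simplehash-style iterator that
can start at a given bucket, plus a saved lossify_start position in the
TIDBitmap; both of those are assumptions of the sketch, not existing
code:

static void
tbm_lossify(TIDBitmap *tbm)
{
    pagetable_iterator i;
    PagetableEntry *page;

    /* resume where the previous pass stopped, wrapping around */
    pagetable_start_iterate_at(tbm->pagetable, &i, tbm->lossify_start);
    while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
    {
        if (page->ischunk)
            continue;           /* already lossy, nothing to free here */

        /*
         * Fold this page into a lossy chunk, freeing its exact entry.
         * This deletes from the table mid-iteration, which the
         * backwards-iterating simplehash iterator tolerates.
         */
        tbm_mark_page_lossy(tbm, page->blockno);

        if (tbm->nentries <= tbm->maxentries / 2)
        {
            /* made enough room; remember where to resume next time */
            tbm->lossify_start = i.cur;
            break;
        }
    }
}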


> But I'm not sure there is any really good way to do that
> in the dynahash infrastructure.  Maybe it'd work to keep the iteration
> state around, but I don't remember how well that interacts with other
> insertions/deletions.  What about in your implementation?

It's easy enough to specify a start point. It requires exposing some
internals that I don't necessarily want to expose - that's why I played
around with a random start point - but other than that it's easy to
implement. Growing the hashtable internally would be something of an
issue, since that invalidates the saved point, but a) most of the time
we're not growing anymore by the time we lossify, and b) it'd be quite
harmless to start at the wrong point.
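
To illustrate, exposing that start point in the simplehash template
could be as small as the following sketch (it follows the template's
naming conventions, but isn't committed code). The masking against the
current size is why a position saved before the table grew is merely a
"wrong", not a dangerous, starting point:

SH_SCOPE void
SH_START_ITERATE_AT(SH_TYPE *tb, SH_ITERATOR *iter, uint32 at)
{
    /* clamp a possibly-stale position into the current table */
    iter->cur = at & tb->sizemask;
    /* iterate until we wrap around to the same bucket again */
    iter->end = iter->cur;
    iter->done = false;
}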


> There's also the point mentioned in the existing comment, that it'd be
> better to go after pages with more bits set first.  Not sure of an
> inexpensive way to do that (ie, one that doesn't involve multiple
> scans of the hashtable).  But your results suggest that maybe it'd
> be worth making tbm_lossify slower in order to get better results.

It's not easy, I agree.
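
Just to illustrate the difficulty, not as a proposal: preferring dense
pages in a single pass pretty much forces a cutoff on the number of set
bits, and when a pass doesn't free enough, you have to rescan with a
lower cutoff - i.e. multiple scans after all. Roughly, with
tbm_entry_popcount() being a hypothetical helper counting the set bits
in an entry:

static bool
tbm_lossify_dense_first(TIDBitmap *tbm, int cutoff)
{
    pagetable_iterator i;
    PagetableEntry *page;

    pagetable_start_iterate(tbm->pagetable, &i);
    while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
    {
        if (page->ischunk)
            continue;           /* already lossy */
        if (tbm_entry_popcount(page) < cutoff)
            continue;           /* too sparse, skip for now */

        tbm_mark_page_lossy(tbm, page->blockno);

        if (tbm->nentries <= tbm->maxentries / 2)
            return true;        /* made enough room */
    }

    /* not enough freed - caller has to retry with a lower cutoff */
    return false;
}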


Greetings,

Andres Freund


