Home > mailing lists

Re: tbm_lossify causes "unbalanced" hashtables / bitmaps - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date	September 23, 2016 21:40:20
Msg-id	30437.1474666813@sss.pgh.pa.us Whole thread Raw
In response to	tbm_lossify causes "unbalanced" hashtables / bitmaps (Andres Freund <andres@anarazel.de>)
Responses	Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
List	pgsql-hackers

Tree view

Andres Freund <andres@anarazel.de> writes:
> Because iteration (both in my implementation and dynahash) is
> essentially in bucket order, the front of the hashtable will be mostly
> empty, whereas later parts of the hashtable will be full (i.e. a lot of
> collisions). The reason for that is that we'll find all parts of the
> hashtable that are uncompressed and "early" in the hashspace, and
> they'll possibly hash to something later in the table.

Hm, yeah, I had supposed that this would hit a random part of the key
space, but you're right that over successive calls it's not good.

My idea of an appropriate fix would be to resume the scan at the same
point where the last scan stopped, and work circularly around the table
when necessary.  But I'm not sure there is any really good way to do that
in the dynahash infrastructure.  Maybe it'd work to keep the iteration
state around, but I don't remember how well that interacts with other
insertions/deletions.  What about in your implementation?

There's also the point mentioned in the existing comment, that it'd be
better to go after pages with more bits set first.  Not sure of an
inexpensive way to do that (ie, one that doesn't involve multiple
scans of the hashtable).  But your results suggest that maybe it'd
be worth making tbm_lossify slower in order to get better results.
        regards, tom lane

pgsql-hackers by date:

From: Andres Freund
Date: 23 September 2016, 21:26:02
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps

From: Andres Freund
Date: 23 September 2016, 22:15:01
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps

Re: tbm_lossify causes "unbalanced" hashtables / bitmaps - Mailing list pgsql-hackers

Previous

Next