Re: tbm_lossify causes "unbalanced" hashtables / bitmaps - Mailing list pgsql-hackers

From Tom Lane
Subject Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Date
Msg-id 30437.1474666813@sss.pgh.pa.us
Whole thread Raw
In response to tbm_lossify causes "unbalanced" hashtables / bitmaps  (Andres Freund <andres@anarazel.de>)
Responses Re: tbm_lossify causes "unbalanced" hashtables / bitmaps  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> Because iteration (both in my implementation and dynahash) is
> essentially in bucket order, the front of the hashtable will be mostly
> empty, whereas later parts of the hashtable will be full (i.e. a lot of
> collisions). The reason for that is that we'll find all parts of the
> hashtable that are uncompressed and "early" in the hashspace, and
> they'll possibly hash to something later in the table.

Hm, yeah, I had supposed that this would hit a random part of the key
space, but you're right that over successive calls it's not good.

My idea of an appropriate fix would be to resume the scan at the same
point where the last scan stopped, and work circularly around the table
when necessary.  But I'm not sure there is any really good way to do that
in the dynahash infrastructure.  Maybe it'd work to keep the iteration
state around, but I don't remember how well that interacts with other
insertions/deletions.  What about in your implementation?

There's also the point mentioned in the existing comment, that it'd be
better to go after pages with more bits set first.  Not sure of an
inexpensive way to do that (ie, one that doesn't involve multiple
scans of the hashtable).  But your results suggest that maybe it'd
be worth making tbm_lossify slower in order to get better results.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Next
From: Andres Freund
Date:
Subject: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps