Re: performance of bitmap scans in nested loop joins - Mailing list pgsql-hackers

From Tom Lane
Subject Re: performance of bitmap scans in nested loop joins
Date
Msg-id 21246.1116278184@sss.pgh.pa.us
Whole thread Raw
In response to Re: performance of bitmap scans in nested loop joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> ... Where we are losing is mostly on the actual manipulation
> of the bitmaps (particularly hash_seq_search which is done in
> tbm_begin_iterate; and it looks like memory allocation for the bitmap
> hashtables is nontrivial too).  I had already had a TODO item to look
> into speeding up hash_seq_search ... will see what I can find.

I got around to looking at this some more.  It seems that the basic
problem is that dynahash.c isn't optimized for very small hashtables.
The test case I'm looking at (which may be an oversimplification of
Sergey's problem) never has more than one entry in the hashtables it
creates for in-memory bitmaps, and so the hashtable overhead is pretty
significant.  Particularly the overhead of creating and deleting 'em.

The other uses of dynahash.c that are at all performance-critical seem
to deal with hashtables that are (a) fairly large and (b) long-lived.
So I'm thinking that hacking dynahash.c itself to improve this situation
would probably be counterproductive overall.

An idea that comes to mind is to allow tidbitmap.c to handle the cases
of zero or one page entry directly, storing the page entry in a simple
field of the TIDBitmap struct.  Only when the bitmap needs entries for
more than one heap page will we create a subsidiary hash table.  (Note
that we could store bits for more than one tuple, if they happen to be
on the same heap page.)

This would uglify the logic in tidbitmap.c a bit, but it doesn't seem
completely preposterous.  The main objection I can see is that it would
only optimize the zero-or-one-page cases, which might be insufficient.

Anyone have a better idea for speeding up operations on small bitmaps?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: SQL Request Size
Next
From: Tom Lane
Date:
Subject: Re: SQL Request Size