I wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> We could have two kinds of seq scans, with and without support for
>> concurrent inserts.
> Yeah, I considered that too, but it just seems too error-prone. We
> could maybe make it trustworthy by having hash_seq_search complain if
> it noticed there had been any concurrent insertions --- but then you're
> putting new overhead into hash_seq_search, which kind of defeats the
> argument for it (and hash_seq_search is a bit of a bottleneck, so extra
> cycles there matter).

I just finished looking through the uses of hash_seq_search, and
realized that there is one place where it would be a bit painful to
convert to the insertion-safe approach I'm proposing; namely nodeAgg.c.
The places where the hashtable iteration is started and used are
scattered, and we don't really track whether the iteration is done or
not, so it's hard to be sure where to cancel the iteration. It could
probably be made to work but it seems like it'd be fragile.

I still don't want to introduce more checking overhead into
hash_seq_search, though, so what I'm now thinking about is a new
dynahash primitive named something like "hash_freeze", which'd mark a
hashtable as disallowing insertions. If the hashtable is frozen before
hash_seq_init then we don't add it to the central list of scans, and
therefore there is no cleanup to do at the end. nodeAgg can use this
mode since it doesn't modify its hashtable anymore after beginning its
readout scan.
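
To make the frozen-table idea concrete, here is a toy sketch in C. It is
not dynahash code: ToyTable, ToyScan, and every toy_* function are
hypothetical stand-ins, and returning false on a refused insert is an
assumption (real code would more likely raise an error). It only shows
that a scan started on a frozen table is never registered, so there is
nothing to clean up afterwards.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_KEYS  16
#define TOY_MAX_SCANS 8

/* Hypothetical stand-in for a dynahash table; not the real HTAB. */
typedef struct ToyTable
{
    int  keys[TOY_MAX_KEYS];
    int  nkeys;
    bool frozen;            /* set once by toy_hash_freeze() */
} ToyTable;

typedef struct ToyScan
{
    ToyTable *tab;
    int       pos;
} ToyScan;

/* Stand-in for the "central list of scans" on tables that may still grow. */
static ToyTable *active_scans[TOY_MAX_SCANS];
static int       num_active_scans = 0;

int
toy_num_active_scans(void)
{
    return num_active_scans;
}

/* Mark the table as disallowing any further insertions. */
void
toy_hash_freeze(ToyTable *tab)
{
    tab->frozen = true;
}

/* Returns false if the insert is refused (frozen or full table). */
bool
toy_hash_insert(ToyTable *tab, int key)
{
    if (tab->frozen || tab->nkeys >= TOY_MAX_KEYS)
        return false;
    tab->keys[tab->nkeys++] = key;
    return true;
}

/* Start a scan; only scans on unfrozen tables are registered. */
void
toy_hash_seq_init(ToyScan *scan, ToyTable *tab)
{
    scan->tab = tab;
    scan->pos = 0;
    if (!tab->frozen)
    {
        assert(num_active_scans < TOY_MAX_SCANS);
        active_scans[num_active_scans++] = tab;
    }
}
```

A caller in the nodeAgg situation would finish building the table, call
toy_hash_freeze once, and could then start and abandon scans freely,
because toy_num_active_scans() stays at zero for that table.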

BTW, we didn't really get into details, but for the insertion-safe case
I'm envisioning adding a routine "hash_seq_term", which you would need
to call if and only if you abandon a hash_seq_search scan without
running it to completion (if you do the complete scan, hash_seq_search
will automatically call hash_seq_term before returning NULL). All but
a very small number of places run their searches to completion and
therefore won't require any source code changes with this API.
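
A toy sketch of that calling convention, again in C and again not
dynahash code: MiniTable, MiniScan, and the mini_* functions are
hypothetical, and the per-table counter is just an assumed way of
modeling "this scan is still registered". The first caller runs its scan
to completion and needs no cleanup call; the second one bails out early
and therefore has to terminate the scan itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table with a count of open scans. */
typedef struct MiniTable
{
    int keys[8];
    int nkeys;
    int active_scans;       /* scans started but not yet terminated */
} MiniTable;

typedef struct MiniScan
{
    MiniTable *tab;
    int        pos;
} MiniScan;

void
mini_hash_seq_init(MiniScan *scan, MiniTable *tab)
{
    scan->tab = tab;
    scan->pos = 0;
    tab->active_scans++;
}

/* Deregister a scan; needed explicitly only when abandoning it early. */
void
mini_hash_seq_term(MiniScan *scan)
{
    assert(scan->tab->active_scans > 0);
    scan->tab->active_scans--;
}

/* Return the next key, or terminate the scan and return NULL at the end. */
int *
mini_hash_seq_search(MiniScan *scan)
{
    if (scan->pos < scan->tab->nkeys)
        return &scan->tab->keys[scan->pos++];
    mini_hash_seq_term(scan);
    return NULL;
}

/* Complete scan: no explicit mini_hash_seq_term call is required. */
int
mini_sum_all(MiniTable *tab)
{
    MiniScan scan;
    int     *key;
    int      sum = 0;

    mini_hash_seq_init(&scan, tab);
    while ((key = mini_hash_seq_search(&scan)) != NULL)
        sum += *key;
    return sum;
}

/* Abandoned scan: must call mini_hash_seq_term before leaving early. */
bool
mini_contains(MiniTable *tab, int wanted)
{
    MiniScan scan;
    int     *key;

    mini_hash_seq_init(&scan, tab);
    while ((key = mini_hash_seq_search(&scan)) != NULL)
    {
        if (*key == wanted)
        {
            mini_hash_seq_term(&scan);
            return true;
        }
    }
    return false;           /* scan ran to completion and cleaned up itself */
}
```

In both functions active_scans is back to its starting value on return,
which is the property the proposed API is meant to guarantee.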

Thoughts?

regards, tom lane