Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Also, we have a generic issue that making fresh entries in a hashtable
>> might result in a concurrent hash_seq_search scan visiting existing
>> entries more than once; that's definitely not something any of the
>> existing callers are thinking about.
>
> Ouch. Note that we can also miss some entries altogether, which is
> probably even worse.
In case someone is wondering how that can happen, here's an example.
We're scanning a bucket that contains four entries, and the bucket is
split just after the scan has returned entry 1:
1 -> 2* -> 3 -> 4
* denotes the next entry the seq scan has stored.
If the split happens to go like this:
1 -> 3
2* -> 4
The seq scan will continue scanning from 2, then 4, and miss 3 altogether.
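To make the failure mode concrete, here's a small standalone C program
that simulates it with a plain linked-list bucket. This is not dynahash
code, just the same relinking idea:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Entry
    {
        int           key;
        struct Entry *next;
    } Entry;

    int
    main(void)
    {
        /* Build the original bucket: 1 -> 2 -> 3 -> 4 */
        Entry  *bucket = NULL;
        Entry **tail = &bucket;

        for (int k = 1; k <= 4; k++)
        {
            Entry *e = malloc(sizeof(Entry));

            e->key = k;
            e->next = NULL;
            *tail = e;
            tail = &e->next;
        }

        /* The scan has returned entry 1 and stored entry 2 as its next. */
        Entry *scan_next = bucket->next;

        /*
         * Split the bucket: odd keys stay in the old bucket, even keys
         * move to the new one.  The entries are relinked in place, as in
         * a real bucket split.
         */
        Entry  *old_bucket = NULL, **old_tail = &old_bucket;
        Entry  *new_bucket = NULL, **new_tail = &new_bucket;
        Entry  *next;

        for (Entry *e = bucket; e != NULL; e = next)
        {
            next = e->next;
            e->next = NULL;
            if (e->key % 2 == 1)
            {
                *old_tail = e;      /* old bucket becomes 1 -> 3 */
                old_tail = &e->next;
            }
            else
            {
                *new_tail = e;      /* new bucket becomes 2 -> 4 */
                new_tail = &e->next;
            }
        }

        /* Resume from the stored pointer: we see 2 and 4, never 3. */
        printf("scan resumes and visits:");
        for (Entry *e = scan_next; e != NULL; e = e->next)
            printf(" %d", e->key);
        printf("\n");               /* prints: scan resumes and visits: 2 4 */
        return 0;
    }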
I briefly went through all callers of hash_seq_init. The only place
where we explicitly rely on being able to add entries to a hash table
while scanning it is in tbm_lossify. There are more complex loops in
portalmem.c and relcache.c, which I think are safe, but I would need to
look closer to be sure. There's also the pg_prepared_statement
set-returning function, which keeps a scan open across calls; that seems
error-prone.
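For the record, the hazardous pattern looks roughly like this (a sketch
against the dynahash API; MyEntry, MyKey, needs_new_entry() and
derive_key() are made-up names for illustration):

    /*
     * Unsafe: inserting while a seq scan of the same table is open.  If
     * HASH_ENTER triggers table expansion, a bucket can be split under
     * the scan, so entries may be visited twice or skipped entirely.
     */
    HASH_SEQ_STATUS status;
    MyEntry    *entry;

    hash_seq_init(&status, htab);
    while ((entry = (MyEntry *) hash_seq_search(&status)) != NULL)
    {
        if (needs_new_entry(entry))     /* hypothetical condition */
        {
            MyKey   newkey = derive_key(entry); /* hypothetical */
            bool    found;

            /* This insertion can split a bucket mid-scan. */
            (void) hash_search(htab, &newkey, HASH_ENTER, &found);
        }
    }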
Should we document the fact that it's not safe to insert new entries into
a hash table while scanning it, and fix the few call sites that do that,
or does anyone see a better solution? One alternative would be to
inhibit bucket splits while a scan is in progress, but then we'd need to
take care to clean up after each scan.
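To illustrate the second option, the change could look something like
this (purely a sketch; the active_scans field and maybe_expand_table are
hypothetical names, not existing dynahash code):

    struct HTAB
    {
        /* ... existing fields ... */
        int     active_scans;   /* hypothetical: # of open seq scans */
    };

    void
    hash_seq_init(HASH_SEQ_STATUS *status, HTAB *hashp)
    {
        /* ... existing initialization ... */
        hashp->active_scans++;  /* must be undone when the scan ends */
    }

    static void
    maybe_expand_table(HTAB *hashp)
    {
        /* Defer bucket splits while any seq scan is in progress. */
        if (hashp->active_scans > 0)
            return;
        expand_table(hashp);
    }

The cleanup problem is visible in the sketch: if a scan is abandoned
partway through, e.g. because of an elog(ERROR), nothing decrements the
counter, so splits would stay disabled until something like
end-of-transaction cleanup resets it.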
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com