Tom Lane wrote:
> The pending-fsync stuff in md.c is also expecting to be able to add
> entries during a scan.
No, mdsync starts the scan from scratch after calling AbsorbFsyncRequests.
> I don't think we can go in the direction of forbidding insertions during
> a scan --- as the case at hand shows, it's just not always obvious that
> that could happen, and finding/fixing such a problem is nigh impossible.
> (We were darn fortunate to be able to reproduce this one.) Plus we have
> a couple of places where it's really necessary to be able to do it,
> anyway.
>
> The only answer I can see that seems reasonably robust is to change
> dynahash.c so that it tracks whether any seq_search scans are open on a
> hashtable, and doesn't carry out any splits while one is. This wouldn't
> cost anything noticeable in performance, assuming that not very many
> splits are postponed. The PITA aspect of it is that we'd need to add
> bookkeeping mechanisms to ensure that the count of active scans gets
> cleaned up on error exit. It's not like we've not got lots of those,
> though.
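To make the proposal concrete, here is a minimal sketch in C. It uses a toy chained hash table that counts open seq scans and postpones growth while any are open, then catches up when the last scan closes. This is a hypothetical illustration, not dynahash.c: dynahash splits one bucket at a time, while this toy doubles the whole bucket array. The names `Table`, `scan_begin` and `scan_end`, and the "grow when entries exceed buckets" rule, are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy chained hash table, NOT dynahash.c. */
typedef struct Entry {
    int key;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;
    int nbuckets;
    int nentries;
    int active_scans;   /* number of currently open seq scans */
    int split_deferred; /* a split was skipped because scans were open */
} Table;

static Table *table_create(int nbuckets)
{
    Table *t = calloc(1, sizeof(Table));
    t->buckets = calloc((size_t) nbuckets, sizeof(Entry *));
    t->nbuckets = nbuckets;
    return t;
}

/* Double the bucket array and rehash. Moving entries between buckets is
 * exactly what can make an open scan miss entries or return them twice. */
static void table_split(Table *t)
{
    int newn = t->nbuckets * 2;
    Entry **nb = calloc((size_t) newn, sizeof(Entry *));

    for (int i = 0; i < t->nbuckets; i++) {
        Entry *e = t->buckets[i];
        while (e) {
            Entry *next = e->next;
            unsigned b = (unsigned) e->key % (unsigned) newn;
            e->next = nb[b];
            nb[b] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = newn;
    t->split_deferred = 0;
}

static void table_insert(Table *t, int key)
{
    Entry *e = malloc(sizeof(Entry));
    unsigned b = (unsigned) key % (unsigned) t->nbuckets;

    e->key = key;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->nentries++;
    /* Grow when full, but never while a seq scan is open. */
    if (t->nentries > t->nbuckets) {
        if (t->active_scans > 0)
            t->split_deferred = 1;
        else
            table_split(t);
    }
}

/* Count reachable entries, to check that rehashing loses nothing. */
static int table_count(const Table *t)
{
    int n = 0;
    for (int i = 0; i < t->nbuckets; i++)
        for (const Entry *e = t->buckets[i]; e; e = e->next)
            n++;
    return n;
}

static void scan_begin(Table *t)
{
    t->active_scans++;
}

static void scan_end(Table *t)
{
    assert(t->active_scans > 0);
    t->active_scans--;
    /* Last scan closed: catch up on any splits that were postponed. */
    if (t->active_scans == 0 && t->split_deferred) {
        while (t->nentries > t->nbuckets)
            table_split(t);
    }
}
```

Inserting during a scan only sets `split_deferred`, so bucket membership stays stable until `scan_end()` runs. A leaked `active_scans` count would postpone growth indefinitely, which is why the cleanup question matters.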
We could have two kinds of seq scans, with and without support for
concurrent inserts. If you open a scan without that support, it behaves
just as it does today, and the caller needn't do any extra bookkeeping or
cleanup. If you need concurrent inserts, we inhibit bucket splits, but
it's up to the caller to explicitly close the scan, possibly with
PG_TRY/PG_CATCH. I'm not sure that's simpler in the end, but we could
get away without adding a generic bookkeeping mechanism.
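A rough sketch of the two scan kinds follows. It assumes a hypothetical `SeqScan` handle with an `allow_inserts` flag and a hypothetical `seq_end()` function; neither exists in dynahash.c. Only insert-tolerant scans bump the table's inhibit count, so only they need an explicit close, which in PostgreSQL the caller might issue from a PG_CATCH block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: only the field needed to show the two scan kinds. */
typedef struct {
    int inhibiting_scans; /* open scans that allow concurrent inserts */
} HashTable;

typedef struct {
    HashTable *table;
    bool allow_inserts;   /* true: splits are inhibited until seq_end() */
} SeqScan;

static void seq_begin(SeqScan *scan, HashTable *t, bool allow_inserts)
{
    scan->table = t;
    scan->allow_inserts = allow_inserts;
    if (allow_inserts)
        t->inhibiting_scans++;
}

/* Required for insert-tolerant scans; a no-op for plain scans, which a
 * caller may simply abandon, as today. */
static void seq_end(SeqScan *scan)
{
    if (scan->allow_inserts) {
        assert(scan->table->inhibiting_scans > 0);
        scan->table->inhibiting_scans--;
        scan->allow_inserts = false; /* a second seq_end() is harmless */
    }
}

/* The insert path would consult this before splitting a bucket. */
static bool splits_allowed(const HashTable *t)
{
    return t->inhibiting_scans == 0;
}
```

With this split, plain scans leave no state behind, so an error exit costs nothing. The burden of cleanup falls only on the few callers that actually insert during a scan.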
> Possibly we could simplify matters a bit by not worrying about cleaning
> up leaked counts at subtransaction abort, ie, the list of open scans
> would only get forced to empty at top transaction end. This carries a
> slightly higher risk of meaningful performance degradation, but in
> practice I doubt it's a big problem. If we agreed that then we'd not
> need ResourceOwner support --- it could be handled like LWLock counts.
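Tom's simplified scheme could look roughly like the sketch below. It keeps a backend-global, fixed-size array of open scans, analogous to the held_lwlocks array, and simply empties it at top-level transaction end instead of cleaning it up per subtransaction. `MAX_OPEN_SCANS`, `register_scan`, `deregister_scan`, `has_open_scans` and `at_eoxact_scans` are hypothetical names. Under this scheme a bucket split would be skipped whenever `has_open_scans()` returns true for the table.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPEN_SCANS 16 /* hypothetical fixed limit */

/* Hypothetical table type; the registry only stores and compares
 * pointers to it. */
typedef struct HashTable {
    const char *name;
} HashTable;

static const HashTable *open_scans[MAX_OPEN_SCANS];
static int num_open_scans = 0;

/* One slot per open scan, so two simultaneous scans of the same table
 * simply occupy two slots. */
static void register_scan(const HashTable *t)
{
    assert(num_open_scans < MAX_OPEN_SCANS);
    open_scans[num_open_scans++] = t;
}

static void deregister_scan(const HashTable *t)
{
    /* Search from the end: recently opened scans tend to close first. */
    for (int i = num_open_scans - 1; i >= 0; i--) {
        if (open_scans[i] == t) {
            /* Order is irrelevant, so fill the hole with the last slot. */
            open_scans[i] = open_scans[--num_open_scans];
            return;
        }
    }
    assert(!"scan was not registered");
}

static bool has_open_scans(const HashTable *t)
{
    for (int i = 0; i < num_open_scans; i++)
        if (open_scans[i] == t)
            return true;
    return false;
}

/* Top-level commit or abort: forget any scans leaked by error exits,
 * much as the lwlock count is reset. Subtransaction abort does nothing. */
static void at_eoxact_scans(void)
{
    num_open_scans = 0;
}
```

A scan leaked by a subtransaction abort stays registered until top-level transaction end, which only postpones splits on that table. That is the performance risk the simplification accepts.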
Hmm. Unlike lwlocks, hash tables can live in different memory contexts,
so we can't just keep a list of open scans similar to the held_lwlocks
array. Do we need to support multiple simultaneous seq scans of a hash
table? I suppose we do...
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com