On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > The cause is that the worker had received pending-sync entry correctly
> > but not never created a relcache entry for the relation using
> > RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
> > set.
> >
> > I'm investigating it.
>
> Relcaches are loaded from a file with old content at parallel worker
> startup. The relcache entry is corrected by invalidation at taking a
> lock but pending syncs are not considered.
>
> Since parallel workers don't access the files so we can just ignore
> the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
> invalidation, as attached PoC patch.
> [patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
> rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
> RelationBuildDesc() would do.]
As a PoC, this looks promising. Thanks. Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check TEMP_CONFIG=/tmp/minimal.conf
Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation. I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.