Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From: Kyotaro Horiguchi
Subject: Re: archive status ".ready" files may be created too early
Msg-id: 20210127.113500.1720002622159938202.horikyota.ntt@gmail.com
In response to: Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>)
Responses: Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>)
           Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>)
           Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>)
List: pgsql-hackers

At Tue, 26 Jan 2021 19:13:57 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> On 12/17/20, 9:15 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> > At Thu, 17 Dec 2020 22:20:35 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
> >> On 12/15/20, 2:33 AM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> >> > You're right in that regard. There's a window where a partial record
> >> > is written when the write location passes F0 after the insertion
> >> > location passes F1. However, remembering all spanning records seems
> >> > like overkill to me.
> >>
> >> I'm curious why you feel that recording all cross-segment records is
> >> overkill.  IMO it seems far simpler to just do that rather than try to
> >
> > Sorry, my words were not enough. Remembering all spanning records in
> > *shared memory* seems to be overkill, and even more so if they are stored
> > in a shared hash table.  Even though it is rarely the case, it can fail
> > in a hard way when reaching the limit. If we could do well by remembering
> > just two locations, we wouldn't need to worry about such a limitation.
> 
> I don't think it will fail if we reach max_size for the hash table.
> The comment above ShmemInitHash() has this note:
> 
>     * max_size is the estimated maximum number of hashtable entries.  This is
>     * not a hard limit, but the access efficiency will degrade if it is
>     * exceeded substantially (since it's used to compute directory size and
>     * the hash table buckets will get overfull).

That description means that a shared hash table has a fixed-size
directory, so there can be synonyms (collisions), which degrades access
efficiency. Even though buckets are preallocated for the specified number
of entries, the minimum directory size is 256, so in the long run at least
256 buckets end up being allocated. The minimum on-the-fly allocation is
32 entries. I haven't calculated it precisely, but I'm worried about the
amount of spare shared memory the hash table can end up allocating.
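
To make that concern concrete, here is a minimal sketch of how such a
shared hash would typically be created with ShmemInitHash().  (The struct,
variable, and table names below are made up for illustration and are not
taken from the actual patch.)  The point is that max_size is only an
estimate used for directory sizing; dynahash can still allocate further
entries from shared memory on the fly.

#include "postgres.h"

#include "access/xlogdefs.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

/* Hypothetical entry: last cross-segment record boundary for a segment */
typedef struct RecordBoundaryEntry
{
	XLogSegNo	seg;		/* hash key */
	XLogRecPtr	boundary;	/* end LSN of the record crossing into seg */
} RecordBoundaryEntry;

static HTAB *RecordBoundaryTable;

static void
RecordBoundaryShmemInit(void)
{
	HASHCTL		info;

	info.keysize = sizeof(XLogSegNo);
	info.entrysize = sizeof(RecordBoundaryEntry);

	/*
	 * The second "16" (max_size) is not a hard cap; exceeding it only
	 * degrades access efficiency, but every overflow still consumes
	 * shared memory that dynahash allocates in batches.
	 */
	RecordBoundaryTable = ShmemInitHash("record boundary table (sketch)",
										16, 16,
										&info,
										HASH_ELEM | HASH_BLOBS);
}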

> > Another concern about the concrete patch:
> >
> > NotifySegmentsReadyForArchive() searches the shared hash, acquiring an
> > LWLock, every time XLogWrite is called while segment archiving is being
> > held off.  I don't think that is acceptable, and I think it could be a
> > problem when many backends are competing on WAL.
> 
> This is a fair point.  I did some benchmarking with a few hundred
> connections all doing writes, and I was not able to discern any
> noticeable performance impact.  My guess is that contention on this
> new lock is unlikely because callers of XLogWrite() must already hold
> WALWriteLock.  Otherwise, I believe we only acquire ArchNotifyLock no
> more than once per segment to record new record boundaries.

Thanks. I agree that the reader-reader contention is not a problem due
to the existing serialization by WALWriteLock. Adding an entry happens
only at a segment boundary, so ArchNotifyLock doesn't seem to be a
problem either.
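
To be specific about what I mean by "at a segment boundary", the
insertion side would look roughly like this.  This is only a sketch that
reuses the made-up RecordBoundaryTable from above; the function name is
invented, and the patch's ArchNotifyLock and actual code will differ in
detail.

/*
 * Sketch: record that a WAL record crosses into segment 'seg' and ends at
 * 'endOfRecord'.  This happens at most once per segment crossing, so
 * taking the (assumed) ArchNotifyLock here is not a contention concern.
 */
static void
RegisterRecordBoundary(XLogSegNo seg, XLogRecPtr endOfRecord)
{
	RecordBoundaryEntry *entry;
	bool		found;

	LWLockAcquire(ArchNotifyLock, LW_EXCLUSIVE);

	entry = (RecordBoundaryEntry *)
		hash_search(RecordBoundaryTable, &seg, HASH_ENTER, &found);
	if (!found || entry->boundary < endOfRecord)
		entry->boundary = endOfRecord;

	LWLockRelease(ArchNotifyLock);
}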

However, the function lengthens the section during which WALWriteLock is
held. Couldn't we somehow move the call to NotifySegmentsReadyForArchive()
in XLogWrite() out of the WALWriteLock section?
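
For example, something like the following shape is what I have in mind.
This is purely illustrative: the function below does not exist, and the
actual signature of NotifySegmentsReadyForArchive() in the patch may
differ.

/*
 * Capture the flush position while WALWriteLock is held, but do the
 * shared-hash lookup and .ready-file work only after releasing it, so
 * that it no longer extends the WALWriteLock hold time.
 */
static void
FlushThenNotify(XLogwrtRqst WriteRqst, bool flexible)
{
	XLogRecPtr	flushedUpTo;

	LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

	XLogWrite(WriteRqst, flexible);		/* write and fsync WAL as today */
	flushedUpTo = LogwrtResult.Flush;	/* how far we actually flushed */

	LWLockRelease(WALWriteLock);

	NotifySegmentsReadyForArchive(flushedUpTo);
}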

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


