Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id 68120830-3A34-4C4F-942F-6739DAA664CF@amazon.com
In response to Re: archive status ".ready" files may be created too early  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: archive status ".ready" files may be created too early  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On 12/17/20, 9:15 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> At Thu, 17 Dec 2020 22:20:35 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
>> On 12/15/20, 2:33 AM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
>> > You're right in that regard.  There's a window where a partial record
>> > is written when the write location passes F0 after the insertion
>> > location passes F1.  However, remembering all spanning records seems
>> > like overkill to me.
>>
>> I'm curious why you feel that recording all cross-segment records is
>> overkill.  IMO it seems far simpler to just do that rather than try to
>
> Sorry, my words were not enough.  Remembering all spanning records in
> *shared memory* seems like overkill, even more so if it is stored in a
> shared hash table.  Even though it is rarely the case, it can fail in a
> hard way when reaching the limit.  If we could do well by remembering
> just two locations, we wouldn't need to worry about such a limitation.
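The "just two locations" idea above could be sketched roughly as follows.  This is a toy illustration, not the actual patch: the names (`remember_spanning_record`, `segment_ready_for_archive`, `spanStart`, `spanEnd`) are hypothetical, and it assumes 16MB segments.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

#define SEG_SIZE (16 * 1024 * 1024)    /* assumed 16MB WAL segments */

/* Only the latest cross-segment record is remembered, not all of them. */
static XLogRecPtr spanStart;    /* start LSN of that record */
static XLogRecPtr spanEnd;      /* end LSN of that record */

/* Called at insert time when a record crosses a segment boundary. */
static void
remember_spanning_record(XLogRecPtr start, XLogRecPtr end)
{
    spanStart = start;
    spanEnd = end;
}

/*
 * May segment 'seg' be marked .ready once WAL is flushed up to 'flushed'?
 * Hold off if the remembered record begins in (or before) this segment,
 * continues past its end, and has not been fully flushed yet.
 */
static bool
segment_ready_for_archive(uint64_t seg, XLogRecPtr flushed)
{
    XLogRecPtr segEnd = (seg + 1) * SEG_SIZE;

    if (spanStart < segEnd && spanEnd > segEnd && flushed < spanEnd)
        return false;           /* partial record still spans this boundary */
    return true;
}
```

Note that this sketch tracks only the most recent spanning record, which is exactly the trade-off under discussion: it avoids a shared hash table and any size limit, at the cost of forgetting earlier spanning records.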

I don't think it will fail if we reach max_size for the hash table.
The comment above ShmemInitHash() has this note:

    * max_size is the estimated maximum number of hashtable entries.  This is
    * not a hard limit, but the access efficiency will degrade if it is
    * exceeded substantially (since it's used to compute directory size and
    * the hash table buckets will get overfull).
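To illustrate why exceeding max_size degrades rather than fails, here is a toy chained hash table (not PostgreSQL's dynahash; all names are illustrative): with a fixed bucket count standing in for the directory sized from max_size, inserts past the estimate still succeed, the chains just grow longer and lookups slow down.

```c
#include <stdlib.h>

#define NBUCKETS 8              /* stands in for the directory sized from max_size */

typedef struct Entry
{
    int key;
    struct Entry *next;
} Entry;

static Entry *buckets[NBUCKETS];

/* Insert always succeeds; chains simply grow past the estimate. */
static void
toy_insert(int key)
{
    Entry *e = malloc(sizeof(Entry));

    e->key = key;
    e->next = buckets[key % NBUCKETS];
    buckets[key % NBUCKETS] = e;
}

/* Lookup walks the (possibly overfull) chain for the key's bucket. */
static int
toy_lookup(int key)
{
    for (Entry *e = buckets[key % NBUCKETS]; e != NULL; e = e->next)
        if (e->key == key)
            return 1;
    return 0;
}
```

Inserting far more entries than NBUCKETS never fails; it only makes each lookup scan a longer chain, which mirrors the "access efficiency will degrade" behavior described in the comment.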

> Another concern about the concrete patch:
>
> NotifySegmentsReadyForArchive() searches the shared hash, acquiring an
> LWLock, every time XLogWrite is called while segment archiving is being
> held off.  I don't think that is acceptable, and I think it could be a
> problem when many backends are competing on WAL.

This is a fair point.  I did some benchmarking with a few hundred
connections all doing writes, and I was not able to discern any
noticeable performance impact.  My guess is that contention on this
new lock is unlikely because callers of XLogWrite() must already hold
WALWriteLock.  Otherwise, I believe ArchNotifyLock is acquired no more
than once per segment, to record new record boundaries.
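The once-per-segment claim could look roughly like this sketch (names and structure are hypothetical, not the patch itself): the boundary-recording path short-circuits when a boundary for the current segment has already been registered, so the lock is taken at most once per segment.

```c
#include <stdint.h>

#define SEG_SIZE (16 * 1024 * 1024)     /* assumed 16MB WAL segments */

static uint64_t lastRegisteredSeg = UINT64_MAX;
static int lockAcquisitions;            /* stands in for LWLockAcquire counts */

/*
 * Register a record boundary ending at 'end'.  Only the first boundary
 * seen in a given segment takes the (simulated) ArchNotifyLock.
 */
static void
register_record_boundary(uint64_t end)
{
    uint64_t seg = end / SEG_SIZE;

    if (seg == lastRegisteredSeg)
        return;                 /* already recorded a boundary in this segment */

    lockAcquisitions++;         /* LWLockAcquire(ArchNotifyLock, LW_EXCLUSIVE) */
    lastRegisteredSeg = seg;
    /* ... store the boundary, then LWLockRelease(ArchNotifyLock) ... */
}
```

With this shape, even a write-heavy workload inserting many records per segment would take the lock once per 16MB of WAL, which is consistent with the benchmark showing no noticeable impact.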

Nathan

