Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id 20200706.141340.695077532820840879.horikyota.ntt@gmail.com
Whole thread Raw
In response to RE: archive status ".ready" files may be created too early  ("matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com>)
Responses RE: archive status ".ready" files may be created too early
List pgsql-hackers
Hello, Matsumura-san.

At Mon, 6 Jul 2020 04:02:23 +0000, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com> wrote in 
> Hello, Horiguchi-san
> 
> Thank you for your comment and patch.
> 
> At Thursday, June 25, 2020 3:36 PM(JST),  "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in
> > I think we don't need most of that shmem stuff.  XLogWrite is called
> 
> I wanted no more shmem stuff too, but other ideas need more lock
> that excludes inserter and writer each other.
> 
> > after WAL buffer is filled up to the requested position. So when it
> > crosses segment boundary we know the all past corss segment-boundary
> > records are stable. That means all we need to remember is only the
> > position of the latest corss-boundary record.
> 
> I could not agree. In the following case, it may not work well.
> - record-A and record-B (record-B is a newer one) is copied, and
> - lastSegContRecStart/End points to record-B's, and
> - FlushPtr is proceeded to in the middle of record-A.

IIUC, that means record-B is a cross segment-border record and we hav e
flushed beyond the recrod-B. In that case crash recovery afterwards
can read the complete record-B and will finish recovery *after* the
record-B. That's what we need here.

> In the above case, the writer should notify segments before record-A,
> but it notifies ones before record-B. If the writer notifies

If you mean that NotifyStableSegments notifies up-to the previous
segment of the segment where record-A is placed, that's wrong. The
issue here is crash recovery sees an incomplete record at a
segment-border. So it is sufficient that crash recoery can read the
last record by looking pg_wal.

> only when it flushes the latest record completely, it works well.

It confirms that "lastSegContRecEnd < LogwrtResult.Flush", that means
the last record(B) is completely flushed-out, isn't that? So it works
well.

> But the writer may not be enable to notify any segment forever when
> WAL records crossing-segment-boundary are inserted contiunuously.

No. As I mentioned in the preivous main, if we see a
cross-segment-boundary record, the previous cross-segment-boundary
record is flushed completely, and the segment containing the
first-half of the previous cross-segment-boundary record has already
been flushed.  I didin't that but we can put an assertion in
XLogInsertRecord like this:

 +      /* Remember the range of the record if it spans over segments */
 +      XLByteToSeg(StartPos, startseg, wal_segment_size);
 +      XLByteToPrevSeg(EndPos, endseg, wal_segment_size);
 +
 +      if (startseg != endseg)
 +      {
++          /* we shouldn't have a record spanning over three or more segments */
++          Assert(endseg = startseg + 1);
 +          SpinLockAcquire(&XLogCtl->info_lck);
 +          if (XLogCtl->lastSegContRecEnd < StartPos)
 +          {
 +              XLogCtl->lastSegContRecStart = StartPos;
 +              XLogCtl->lastSegContRecEnd = EndPos;

> So I think that we must remeber all such cross-segement-boundary records's EndRecPtr in buffer.
> 
> 
> > If we call XLogMarkEndRecPtrIfNeeded() there, the function is called
> > every time a record is written, most of which are wasteful.
> > XLogInsertRecord already has a code block executed only at every page
> > boundary.
> 
> I agree.
> XLogMarkEndRecPtrIfNeeded() is moved into the code block before updating
> LogwrtRqst.Write for avoiding passing-each-other with writer.
> 
> 
> > Now we can identify stable portion of WAL stream. It's enough to
> > prevent walsender from sending data that can be overwritten
> > afterwards. GetReplicationTargetRecPtr() in the attached does that.
> 
> I didn't notice it.
> I agree basically, but it is based on lastSegContRecStart/End.
> 
> So, first of all, we have to agree what should be remebered.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal - function string_to_table
Next
From: Fujii Masao
Date:
Subject: Re: track_planning causing performance regression