Re: archive status ".ready" files may be created too early - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: archive status ".ready" files may be created too early |
Date | |
Msg-id | 20200706.141340.695077532820840879.horikyota.ntt@gmail.com Whole thread Raw |
In response to | RE: archive status ".ready" files may be created too early ("matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com>) |
Responses |
RE: archive status ".ready" files may be created too early
|
List | pgsql-hackers |
Hello, Matsumura-san. At Mon, 6 Jul 2020 04:02:23 +0000, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com> wrote in > Hello, Horiguchi-san > > Thank you for your comment and patch. > > At Thursday, June 25, 2020 3:36 PM(JST), "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in > > I think we don't need most of that shmem stuff. XLogWrite is called > > I wanted no more shmem stuff too, but other ideas need more lock > that excludes inserter and writer each other. > > > after WAL buffer is filled up to the requested position. So when it > > crosses segment boundary we know the all past corss segment-boundary > > records are stable. That means all we need to remember is only the > > position of the latest corss-boundary record. > > I could not agree. In the following case, it may not work well. > - record-A and record-B (record-B is a newer one) is copied, and > - lastSegContRecStart/End points to record-B's, and > - FlushPtr is proceeded to in the middle of record-A. IIUC, that means record-B is a cross segment-border record and we hav e flushed beyond the recrod-B. In that case crash recovery afterwards can read the complete record-B and will finish recovery *after* the record-B. That's what we need here. > In the above case, the writer should notify segments before record-A, > but it notifies ones before record-B. If the writer notifies If you mean that NotifyStableSegments notifies up-to the previous segment of the segment where record-A is placed, that's wrong. The issue here is crash recovery sees an incomplete record at a segment-border. So it is sufficient that crash recoery can read the last record by looking pg_wal. > only when it flushes the latest record completely, it works well. It confirms that "lastSegContRecEnd < LogwrtResult.Flush", that means the last record(B) is completely flushed-out, isn't that? So it works well. > But the writer may not be enable to notify any segment forever when > WAL records crossing-segment-boundary are inserted contiunuously. No. As I mentioned in the preivous main, if we see a cross-segment-boundary record, the previous cross-segment-boundary record is flushed completely, and the segment containing the first-half of the previous cross-segment-boundary record has already been flushed. I didin't that but we can put an assertion in XLogInsertRecord like this: + /* Remember the range of the record if it spans over segments */ + XLByteToSeg(StartPos, startseg, wal_segment_size); + XLByteToPrevSeg(EndPos, endseg, wal_segment_size); + + if (startseg != endseg) + { ++ /* we shouldn't have a record spanning over three or more segments */ ++ Assert(endseg = startseg + 1); + SpinLockAcquire(&XLogCtl->info_lck); + if (XLogCtl->lastSegContRecEnd < StartPos) + { + XLogCtl->lastSegContRecStart = StartPos; + XLogCtl->lastSegContRecEnd = EndPos; > So I think that we must remeber all such cross-segement-boundary records's EndRecPtr in buffer. > > > > If we call XLogMarkEndRecPtrIfNeeded() there, the function is called > > every time a record is written, most of which are wasteful. > > XLogInsertRecord already has a code block executed only at every page > > boundary. > > I agree. > XLogMarkEndRecPtrIfNeeded() is moved into the code block before updating > LogwrtRqst.Write for avoiding passing-each-other with writer. > > > > Now we can identify stable portion of WAL stream. It's enough to > > prevent walsender from sending data that can be overwritten > > afterwards. GetReplicationTargetRecPtr() in the attached does that. > > I didn't notice it. > I agree basically, but it is based on lastSegContRecStart/End. > > So, first of all, we have to agree what should be remebered. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: