Re: sequences vs. synchronous replication - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: sequences vs. synchronous replication |
Date | |
Msg-id | 20211224.170451.116141220257426861.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: sequences vs. synchronous replication (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: sequences vs. synchronous replication
Re: sequences vs. synchronous replication |
List | pgsql-hackers |
At Fri, 24 Dec 2021 08:23:13 +0100, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in > > > On 12/24/21 06:37, Kyotaro Horiguchi wrote: > > At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote in > >> On 12/23/21 15:42, Fujii Masao wrote: > >>> On 2021/12/23 3:49, Tomas Vondra wrote: > >>>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do > >>>> the same thing as now, in higher levels we log every sequence fetch. > >>> Thanks for the patch! > >>> With the patch, I found that the regression test for sequences failed. > >>> + fetch = log = fetch; > >>> This should be "log = fetch"? > >>> On second thought, originally a sequence doesn't guarantee that the > >>> value already returned by nextval() will never be returned by > >>> subsequent nextval() after the server crash recovery. That is, > >>> nextval() may return the same value across crash recovery. Is this > >>> understanding right? For example, this case can happen if the server > >>> crashes after nextval() returned the value but before WAL for the > >>> sequence was flushed to the permanent storage. > >> > >> I think the important step is commit. We don't guarantee anything for > >> changes in uncommitted transactions. If you do nextval in a > >> transaction and the server crashes before the WAL gets flushed before > >> COMMIT, then yes, nextval may generate the same nextval again. But > >> after commit that is not OK - it must not happen. > > I don't mean to stand on Fujii-san's side particularly, but it seems > > to me sequences of RDBSs are not rolled back generally. Some googling > > told me that at least Oracle (documented), MySQL, DB2 and MS-SQL > > server doesn't rewind sequences at rollback, that is, sequences are > > incremented independtly from transaction control. It seems common to > > think that two nextval() calls for the same sequence must not return > > the same value in any context. > > > > Yes, sequences are not rolled back on abort generally. That would > require much stricter locking, and that'd go against using sequences > in concurrent sessions. I thinks so. > But we're not talking about sequence rollback - we're talking about > data loss, caused by failure to flush WAL for a sequence. But that > affects the *current* code too, and to much greater extent. Ah, yes, I don't object to that aspect. > Consider this: > > BEGIN; > SELECT nextval('s') FROM generate_series(1,1000) s(i); > ROLLBACK; -- or crash of a different backend > > BEGIN; > SELECT nextval('s'); > COMMIT; > > With the current code, this may easily lose the WAL, and we'll > generate duplicate values from the sequence. We pretty much ignore the > COMMIT. > > With the proposed change to WAL logging, that is not possible. The > COMMIT flushes enough WAL to prevent this issue. > > So this actually makes this issue less severe. > > Maybe I'm missing some important detail, though. Can you show an > example where the proposed changes make the issue worse? No. It seems to me improvoment at least from the current state, for the reason you mentioned. > >>> So it's not a bug that sync standby may return the same value as > >>> already returned in the primary because the corresponding WAL has not > >>> been replicated yet, isn't it? > >>> > >> > >> No, I don't think so. Once the COMMIT happens (and gets confirmed by > >> the sync standby), it should be possible to failover to the sync > >> replica without losing any data in committed transaction. Generating > >> duplicate values is a clear violation of that. > > So, strictly speaking, that is a violation of the constraint I > > mentioned regardless whether the transaction is committed or > > not. However we have technical limitations as below. > > > > I don't follow. What violates what? > > If the transaction commits (and gets a confirmation from sync > replica), the modified WAL logging prevents duplicate values. It does > nothing for uncommitted transactions. Seems like an improvement to me. Sorry for the noise. I misunderstand that ROLLBACK is being changed to rollback sequences. > No idea. IMHO from the correctness / behavior point of view, the > modified logging is an improvement. The only issue is the additional > overhead, and I think the cache addresses that quite well. Now I understand the story here. I agree that the patch is improvment from the current behavior. I agree that the overhead is eventually-nothing for WAL-emitting workloads. Still, as Fujii-san concerns, I'm afraid that some people may suffer the degradation the patch causes. I wonder it is acceptable to get back the previous behavior by exposing SEQ_LOG_VALS itself or a boolean to do that, as a 'not-recommended-to-use' variable. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: