Re: sequences vs. synchronous replication - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: sequences vs. synchronous replication
Date
Msg-id 20211224.170451.116141220257426861.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: sequences vs. synchronous replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: sequences vs. synchronous replication
Re: sequences vs. synchronous replication
List pgsql-hackers
At Fri, 24 Dec 2021 08:23:13 +0100, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in
>
>
> On 12/24/21 06:37, Kyotaro Horiguchi wrote:
> > At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote in
> >> On 12/23/21 15:42, Fujii Masao wrote:
> >>> On 2021/12/23 3:49, Tomas Vondra wrote:
> >>>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
> >>>> the same thing as now, in higher levels we log every sequence fetch.
> >>> Thanks for the patch!
> >>> With the patch, I found that the regression test for sequences failed.
> >>> +            fetch = log = fetch;
> >>> This should be "log = fetch"?
> >>> On second thought, originally a sequence doesn't guarantee that the
> >>> value already returned by nextval() will never be returned by
> >>> subsequent nextval() after the server crash recovery. That is,
> >>> nextval() may return the same value across crash recovery. Is this
> >>> understanding right? For example, this case can happen if the server
> >>> crashes after nextval() returned the value but before WAL for the
> >>> sequence was flushed to the permanent storage.
> >>
> >> I think the important step is commit. We don't guarantee anything for
> >> changes in uncommitted transactions. If you do nextval in a
> >> transaction and the server crashes before the WAL gets flushed before
> >> COMMIT, then yes, nextval may generate the same nextval again. But
> >> after commit that is not OK - it must not happen.
> > I don't mean to stand on Fujii-san's side particularly, but it seems
> > to me sequences of RDBSs are not rolled back generally.  Some googling
> > told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
> > server doesn't rewind sequences at rollback, that is, sequences are
> > incremented independtly from transaction control.  It seems common to
> > think that two nextval() calls for the same sequence must not return
> > the same value in any context.
> >
>
> Yes, sequences are not rolled back on abort generally. That would
> require much stricter locking, and that'd go against using sequences
> in concurrent sessions.

I thinks so.

> But we're not talking about sequence rollback - we're talking about
> data loss, caused by failure to flush WAL for a sequence. But that
> affects the *current* code too, and to much greater extent.

Ah, yes, I don't object to that aspect.

> Consider this:
>
> BEGIN;
> SELECT nextval('s') FROM generate_series(1,1000) s(i);
> ROLLBACK; -- or crash of a different backend
>
> BEGIN;
> SELECT nextval('s');
> COMMIT;
>
> With the current code, this may easily lose the WAL, and we'll
> generate duplicate values from the sequence. We pretty much ignore the
> COMMIT.
>
> With the proposed change to WAL logging, that is not possible. The
> COMMIT flushes enough WAL to prevent this issue.
>
> So this actually makes this issue less severe.
>
> Maybe I'm missing some important detail, though. Can you show an
> example where the proposed changes make the issue worse?

No. It seems to me improvoment at least from the current state, for
the reason you mentioned.

> >>> So it's not a bug that sync standby may return the same value as
> >>> already returned in the primary because the corresponding WAL has not
> >>> been replicated yet, isn't it?
> >>>
> >>
> >> No, I don't think so. Once the COMMIT happens (and gets confirmed by
> >> the sync standby), it should be possible to failover to the sync
> >> replica without losing any data in committed transaction. Generating
> >> duplicate values is a clear violation of that.
> > So, strictly speaking, that is a violation of the constraint I
> > mentioned regardless whether the transaction is committed or
> > not. However we have technical limitations as below.
> >
>
> I don't follow. What violates what?
>
> If the transaction commits (and gets a confirmation from sync
> replica), the modified WAL logging prevents duplicate values. It does
> nothing for uncommitted transactions. Seems like an improvement to me.

Sorry for the noise. I misunderstand that ROLLBACK is being changed to
rollback sequences.

> No idea. IMHO from the correctness / behavior point of view, the
> modified logging is an improvement. The only issue is the additional
> overhead, and I think the cache addresses that quite well.

Now I understand the story here.

I agree that the patch is improvment from the current behavior.
I agree that the overhead is eventually-nothing for WAL-emitting workloads.

Still, as Fujii-san concerns, I'm afraid that some people may suffer
the degradation the patch causes.  I wonder it is acceptable to get
back the previous behavior by exposing SEQ_LOG_VALS itself or a
boolean to do that, as a 'not-recommended-to-use' variable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Allow escape in application_name
Next
From: Masahiko Sawada
Date:
Subject: Re: more descriptive message for process termination due to max_slot_wal_keep_size