Home > mailing lists

Re: sequences vs. synchronous replication - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: sequences vs. synchronous replication
Date	December 23, 2021 18:50:22
Msg-id	a58f1c38-10eb-7d7f-372c-cfb95007d54e@enterprisedb.com Whole thread Raw
In response to	Re: sequences vs. synchronous replication (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses	Re: sequences vs. synchronous replication
List	pgsql-hackers

Tree view

On 12/23/21 15:42, Fujii Masao wrote:
> 
> 
> On 2021/12/23 3:49, Tomas Vondra wrote:
>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do 
>> the same thing as now, in higher levels we log every sequence fetch.
> 
> Thanks for the patch!
> 
> With the patch, I found that the regression test for sequences failed.
> 
> +            fetch = log = fetch;
> 
> This should be "log = fetch"?
> 
> On second thought, originally a sequence doesn't guarantee that the 
> value already returned by nextval() will never be returned by subsequent 
> nextval() after the server crash recovery. That is, nextval() may return 
> the same value across crash recovery. Is this understanding right? For 
> example, this case can happen if the server crashes after nextval() 
> returned the value but before WAL for the sequence was flushed to the 
> permanent storage.

I think the important step is commit. We don't guarantee anything for 
changes in uncommitted transactions. If you do nextval in a transaction 
and the server crashes before the WAL gets flushed before COMMIT, then 
yes, nextval may generate the same nextval again. But after commit that 
is not OK - it must not happen.

> So it's not a bug that sync standby may return the same value as
> already returned in the primary because the corresponding WAL has not
> been replicated yet, isn't it?
> 

No, I don't think so. Once the COMMIT happens (and gets confirmed by the 
sync standby), it should be possible to failover to the sync replica 
without losing any data in committed transaction. Generating duplicate 
values is a clear violation of that.

IMHO the fact that we allow a transaction to commit (even just locally) 
without flushing all the WAL it depends on is clearly a data loss bug.

> BTW, if the returned value is stored in database, the same value is 
> guaranteed not to be returned again after the server crash or by sync 
> standby. Because in that case the WAL of the transaction storing that 
> value is flushed and replicated.
> 

True, assuming the table is WAL-logged etc. I agree the issue may be 
affecting a fairly small fraction of workloads, because most people use 
sequences to generate data for inserts etc.

>> So I think this makes it acceptable / manageable. Of course, this 
>> means the values are much less monotonous (across backends), but I 
>> don't think we really promised that. And I doubt anyone is really 
>> using sequences like this (just nextval) in performance critical use 
>> cases.
> 
> I think that this approach is not acceptable to some users. So, if we 
> actually adopt WAL-logging every sequence fetch, also how about exposing 
> SEQ_LOG_VALS as reloption for a sequence? If so, those who want to log 
> every sequence fetch can set this SEQ_LOG_VALS reloption to 0. OTOH, 
> those who prefer the current behavior in spite of the risk we're 
> discussing at this thread can set the reloption to 32 like it is for 
> now, for example.
> 

I think it'd be worth explaining why you think it's not acceptable?

I've demonstrated the impact on regular workloads (with other changes 
that write stuff to WAL) is not measurable, and enabling sequence 
caching eliminates most of the overhead for the rare corner case 
workloads if needed. It does generate a bit more WAL, but the sequence 
WAL records are pretty tiny.

I'm opposed to adding relooptions that affect correctness - it just 
seems like a bad idea to me. Moreover setting the CACHE for a sequence 
does almost the same thing - if you set CACHE 32, we only generate WAL 
once every 32 increments. The only difference is that this cache is not 
shared between backends, so one backend will generate 1,2,3,... and 
another backend will generate 33,34,35,... etc. I don't think that's a 
problem, because if you want strictly monotonous / gap-less sequences 
you can't use our sequences anyway. Yes, with short-lived backends this 
may consume the sequences faster, but well - short-lived backends are 
expensive anyway and overflowing bigserial is still unlikely.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Максим Орлов
Date: 23 December 2021, 18:05:47
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Joshua Brindle
Date: 23 December 2021, 21:06:45
Subject: Re: CREATEROLE and role ownership hierarchies

Re: sequences vs. synchronous replication - Mailing list pgsql-hackers

Previous

Next