Thread: Efficiently advancing a sequence without risking it going backwards.

Efficiently advancing a sequence without risking it going backwards.

From
Paul McGarry
Date:
I have two sequences in different dbs which I want to keep roughly in sync (they don't have to be exactly in sync, I am just keeping them in the same ballpark).

Currently I have a process which periodically checks the sequences and does:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) while (nextval('DB2sequence')<=1234);

which works fine, but is pretty inefficient if the discrepancy is large (ie calling nextval a hundred thousand times).

I don't think I can use setval(), because it risks making sequences go backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on another process,  (2) would take the sequence back from 1235 to 1234 and I would end up trying to create a duplicate key ID from the sequence.

So what I really want is something equivalent to the setval, but with "where DB2sequence <1234" logic so it doesn't overwrite the value if it is already large.

Is there such a mechanism?

Thanks for any help.

Paul

Re: Efficiently advancing a sequence without risking it going backwards.

From
Adrian Klaver
Date:
On 7/6/20 7:06 PM, Paul McGarry wrote:
> I have two sequences in different dbs which I want to keep roughly in 
> sync (they don't have to be exactly in sync, I am just keeping them in 
> the same ballpark).
> 
> Currently I have a process which periodically checks the sequences and does:
> 
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) while (nextval('DB2sequence')<=1234);
> 
> which works fine, but is pretty inefficient if the discrepancy is large 
> (ie calling nextval a hundred thousand times).
> 
> I don't think I can use setval(), because it risks making sequences go 
> backwards, eg:
> 
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) setval('DB2sequence',1234);
> 
> but if between (1) and (2) there are 2 nextval(DB2sequence) calls on 
> another process,  (2) would take the sequence back from 1235 to 1234 and 
> I would end up trying to create a duplicate key ID from the sequence.
> 
> So what I really want is something equivalent to the setval, but with 
> "where DB2sequence <1234" logic so it doesn't overwrite the value if it 
> is already large.
> 
> Is there such a mechanism?

Well sequences are designed to be operated on independently from each 
session, so there is not much you can do about locking on a number. The 
best you can do is use setval() to increment the number by enough to get 
past any potential sequence advances in other sessions. Say advance by 
10, 50 or 100 depending on what you think is a reasonable number of 
other sessions also hitting the sequence.


> 
> Thanks for any help.
> 
> Paul


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Efficiently advancing a sequence without risking it going backwards.

From
Jeremy Schneider
Date:
> On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:
>
> I don't think I can use setval(), because it risks making sequences go backwards, eg:
>
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) setval('DB2sequence',1234);
>
> but if between (1) and (2) there are 2 nextval(DB2sequence) calls on another process,  (2) would take the sequence
backfrom 1235 to 1234 and I would end up trying to create a duplicate key ID from the sequence. 

An ability to “lock” the sequence momentarily would give you the tool you need, but I don’t think it’s there.

Total hack, but if your application or users can retry when the rare error is encountered then one idea is to rename
thesequence momentarily while you do the setval() then rename it back. Do an initial check without renaming, then
re-checkafter renaming and before the setval() call. 

If you put retry logic into your application then make sure to include back-off logic so you don’t get an outage
inducedby thundering herd. 

-Jeremy

Sent from my TI-83





Re: Efficiently advancing a sequence without risking it going backwards.

From
Christopher Browne
Date:
On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider <schneider@ardentperf.com> wrote:

> On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:
>
> I don't think I can use setval(), because it risks making sequences go backwards, eg:
>
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) setval('DB2sequence',1234);
>
> but if between (1) and (2) there are 2 nextval(DB2sequence) calls on another process,  (2) would take the sequence back from 1235 to 1234 and I would end up trying to create a duplicate key ID from the sequence.

An ability to “lock” the sequence momentarily would give you the tool you need, but I don’t think it’s there.

Total hack, but if your application or users can retry when the rare error is encountered then one idea is to rename the sequence momentarily while you do the setval() then rename it back. Do an initial check without renaming, then re-check after renaming and before the setval() call.

If you put retry logic into your application then make sure to include back-off logic so you don’t get an outage induced by thundering herd.

This is increasingly looking like a set of attempts to intentionally abuse what sequences were designed for.

The use-case where you need a lock on the value so that there can't possibly be a hole in the sequence points at the notion of having some other kind of a function that takes out a lock on a table, and serially gives out "MAX+1" as the next value.

That isn't a very difficult function to write; the problem with it is that that sort of function will forcibly serialize all inserts through the function+table lock that is giving out "MAX+1" values.  That's going to be WAY slower than using a sequence object, and about 98% of the time, people will prefer the sequence object, particularly because it's about 98% faster.

I'm not quite sure if anyone has put out there a standard-ish idiom for this; that seems like a not TOO difficult "exercise for the user."

There will definitely be more failure cases, and *wildly* more fighting, in a concurrent environment, over tuple locks.

- An obvious failure is that if one connection asks for the new MAX+1, gets it, and then the transaction fails, for some later, out-of-relevant-scope, reason, you'll still potentially get some "holes" in the series of values.

- If there are 10 connections trying to get MAX+1 concurrently, only one can get it at a time, and that connection can't relinquish the lock until its transaction has completed, and the 9 must wait, regardless of how much work the "winner" still has to do.

These are amongst the reasons why people conclude they *don't* want that kind of functionality.

It makes me think that the problem needs to be taken back to that initial point of "I think I need some somewhat coordinated sequences", and poke at what the *real* requirement is there, and why someone thinks that the values should be "somewhat coordinated."  Something seems off there.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Christopher Browne <cbbrowne@gmail.com> writes:

> On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider <schneider@ardentperf.com>
> wrote:
>
>>
>> > On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:
>> >
>> > I don't think I can use setval(), because it risks making sequences go
>> backwards, eg:
>> >
>> > 1) Check values
>> > DB1sequence: 1234
>> > DB2sequence: 1233 (1 behind)
>> > 2) setval('DB2sequence',1234);
>> >
>> > but if between (1) and (2) there are 2 nextval(DB2sequence) calls on
>> another process,  (2) would take the sequence back from 1235 to 1234 and I
>> would end up trying to create a duplicate key ID from the sequence.
>>
>> An ability to “lock” the sequence momentarily would give you the tool you
>> need, but I don’t think it’s there.
>>
>> Total hack, but if your application or users can retry when the rare error
>> is encountered then one idea is to rename the sequence momentarily while
>> you do the setval() then rename it back. Do an initial check without
>> renaming, then re-check after renaming and before the setval() call.
>>
>> If you put retry logic into your application then make sure to include
>> back-off logic so you don’t get an outage induced by thundering herd.
>>
>
> This is increasingly looking like a set of attempts to intentionally abuse
> what sequences were designed for.
>
> The use-case where you need a lock on the value so that there can't
> possibly be a hole in the sequence points at the notion of having some
> other kind of a function that takes out a lock on a table, and serially
> gives out "MAX+1" as the next value.
>
> That isn't a very difficult function to write; the problem with it is that
> that sort of function will forcibly serialize all inserts through the
> function+table lock that is giving out "MAX+1" values.  That's going to be
> WAY slower than using a sequence object, and about 98% of the time, people
> will prefer the sequence object, particularly because it's about 98% faster.
>
> I'm not quite sure if anyone has put out there a standard-ish idiom for
> this; that seems like a not TOO difficult "exercise for the user."
>
> There will definitely be more failure cases, and *wildly* more fighting, in
> a concurrent environment, over tuple locks.
>
> - An obvious failure is that if one connection asks for the new MAX+1, gets
> it, and then the transaction fails, for some later, out-of-relevant-scope,
> reason, you'll still potentially get some "holes" in the series of values.
>
> - If there are 10 connections trying to get MAX+1 concurrently, only one
> can get it at a time, and that connection can't relinquish the lock until
> its transaction has completed, and the 9 must wait, regardless of how much
> work the "winner" still has to do.
>
> These are amongst the reasons why people conclude they *don't* want that
> kind of functionality.
>
> It makes me think that the problem needs to be taken back to that initial
> point of "I think I need some somewhat coordinated sequences", and poke at
> what the *real* requirement is there, and why someone thinks that the
> values should be "somewhat coordinated."  Something seems off there.

I agree and was going to write something similar. All the 'solutions'
are problematic in one way or the other and seem to be due to a
misconception about the role for sequences or some requirement which
needs to be re-examined.
--
Tim Cross



Re: Efficiently advancing a sequence without risking it going backwards.

From
Jeremy Schneider
Date:


On Jul 9, 2020, at 14:08, Christopher Browne <cbbrowne@gmail.com> wrote:


On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider <schneider@ardentperf.com> wrote:

> On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:
>
> I don't think I can use setval(), because it risks making sequences go backwards, eg:
>
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) setval('DB2sequence',1234);
>
> but if between (1) and (2) there are 2 nextval(DB2sequence) calls on another process,  (2) would take the sequence back from 1235 to 1234 and I would end up trying to create a duplicate key ID from the sequence.

An ability to “lock” the sequence momentarily would give you the tool you need, but I don’t think it’s there.

The use-case where you need a lock on the value so that there can't possibly be a hole in the sequence

OP asked for a way to call setval() with a guarantee the sequence will never go backwards IIUC. His code can check that the new value he wants to set is higher than the current value, but there’s a race condition where a second connection could quickly advance the sequence between the check and the setval() call and then cause duplicates from the next call which is bad.

The ideal solution is a setval_forward_only() or setval_no_duplicates() function that does it atomically or something. If it were possible to “lock” the entire sequence to prevent any other sessions from using it at all, that would work too. Not locking a value, locking the whole thing. Very bad hack solution is renaming the sequence then renaming it back as a blunt form of locking... and to be clear I don’t think is a good idea I just was saying that technically it might work.  :)

-Jeremy

Sent from my TI-83

Re: Efficiently advancing a sequence without risking it going backwards.

From
Paul McGarry
Date:


On Fri, 10 Jul 2020 at 10:27, Jeremy Schneider <schneider@ardentperf.com> wrote:

OP asked for a way to call setval() with a guarantee the sequence will never go backwards IIUC. His code can check that the new value he wants to set is higher than the current value, but there’s a race condition where a second connection could quickly advance the sequence between the check and the setval() call and then cause duplicates from the next call which is bad.

The ideal solution is a setval_forward_only() or setval_no_duplicates() function that does it atomically or something. If it were possible to “lock” the entire sequence to prevent any other sessions from using it at all, that would work too. Not locking a value, locking the whole thing. Very bad hack solution is renaming the sequence then renaming it back as a blunt form of locking... and to be clear I don’t think is a good idea I just was saying that technically it might work.  :)

-Jeremy

Yes, that first paragraph is a good summary. A "setval_forward_only()" is the sort of thing I was after.

Maybe something analogous to:
UPDATE the_seq SET last_value = number WHERE last_value < number;
with some sort of global (but short) lock as required.

Relating to some of the other replies there isn't a "requirement" (from an application perspective) that the sequences always generate ids in ascending order or that they don't skip numbers etc. To the application they are just ids, as long as they are unique that is enough. However it is an application that is used by people, so there is some external value in having the ids going up in a way that roughly correlates to time as people tend to expect numbers to do that sort of thing.

For a bit more background, we have our own application and homegrown loosely coupled multi-primary DB cluster and replication system.
Each backend DB in the cluster has its own node id (0-9) and when our app asks for a sequence value it calls a custom function which gets a normal sequence value suffixed with the DB node ID.

So if there were two backend dbs (1 and 2) and both backend dbs had a sequence with last_value of 1234 then our application would get a "sequence" value of 12351 or 12352 depending on which db backend served the request.
The resulting ids are unique across our cluster, but certainly not gapless nor issued in strict ascending order which is fine from an application perspective.

But as mentioned, from a human perspective there is some value in keeping the ids issued by the cluster roughly in time order, so we have a secondary process which liaises with all the backend nodes and pulls forwards any sequences that fall behind other nodes. So if DB 1 happened to serve 1000 requests using the sequence while DB2 served none, the process pulls the sequence in DB2 forward until it catches up, currently by calling nextval in a loop.

Which all works fine. However sometimes (eg taking a node offline for maintenance or upgrade) a sequence might get quite a long way out, and calling nextval() 100k times seems a rather inefficient way to catch up (but it is better to be inefficient than risk going backwards and causing a duplicate id).

We have been using essentially this system for our cluster since Postgres 7 days, periodically we have touched base with Postgres replication advancements (which have come a long way) but haven't yet found a compelling reason to switch from what is working.

Paul