Thread: Support for grabbing multiple consecutive values with nextval()

Support for grabbing multiple consecutive values with nextval()

From
Jille Timmermans
Date:
Hi,

First time PostgreSQL contributor here :)

I wanted to be able to allocate a bunch of numbers from a sequence at 
once. Multiple people seem to be struggling with this 
(https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).

I propose to add an extra argument to nextval() that specifies how many 
numbers you want to allocate (default 1).

The attached patch (based on master) passes `./configure 
--enable-cassert --enable-debug && make && make check`, including the 
newly added regression tests.

It does change the signature of nextval_internal(), not sure if that's 
considered backwards compatibility breaking (for extensions?).

-- Jille
Attachment

Re: Support for grabbing multiple consecutive values with nextval()

From
Julien Rouhaud
Date:
Hi,

On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote:
>
> First time PostgreSQL contributor here :)

Welcome!

> I wanted to be able to allocate a bunch of numbers from a sequence at once.
> Multiple people seem to be struggling with this
(https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence,
> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>
> I propose to add an extra argument to nextval() that specifies how many
> numbers you want to allocate (default 1).
>
> The attached patch (based on master) passes `./configure --enable-cassert
> --enable-debug && make && make check`, including the newly added regression
> tests.
>
> It does change the signature of nextval_internal(), not sure if that's
> considered backwards compatibility breaking (for extensions?).

Please register this patch to the next commit fest (and last for pg15
inclusion) at https://commitfest.postgresql.org/37/ if not done already.



Re: Support for grabbing multiple consecutive values with nextval()

From
Jille Timmermans
Date:
On 2022-02-27 14:22, Julien Rouhaud wrote:
> Hi,
> 
> On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote:
>> 
>> First time PostgreSQL contributor here :)
> 
> Welcome!
Thanks!

> 
>> I wanted to be able to allocate a bunch of numbers from a sequence at 
>> once.
>> Multiple people seem to be struggling with this 
>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence,
>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>> 
>> I propose to add an extra argument to nextval() that specifies how 
>> many
>> numbers you want to allocate (default 1).
>> 
>> The attached patch (based on master) passes `./configure 
>> --enable-cassert
>> --enable-debug && make && make check`, including the newly added 
>> regression
>> tests.
>> 
>> It does change the signature of nextval_internal(), not sure if that's
>> considered backwards compatibility breaking (for extensions?).
> 
> Please register this patch to the next commit fest (and last for pg15
> inclusion) at https://commitfest.postgresql.org/37/ if not done 
> already.
Done: https://commitfest.postgresql.org/37/3577/ (I was waiting for 
mailman approval before I got the thread id.)



Re: Support for grabbing multiple consecutive values with nextval()

From
Peter Eisentraut
Date:
On 27.02.22 10:42, Jille Timmermans wrote:
> I wanted to be able to allocate a bunch of numbers from a sequence at 
> once. Multiple people seem to be struggling with this 
> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
> 
> I propose to add an extra argument to nextval() that specifies how many 
> numbers you want to allocate (default 1).

What is the use of this?

I note that the stackoverflow question wanted to return multiple 
sequence values as a result set, whereas your implementation just skips 
a number of values and returns the last one.  At least we should be 
clear about what we are trying to achieve.




Re: Support for grabbing multiple consecutive values with nextval()

From
Jille Timmermans
Date:
On 2022-02-28 11:13, Peter Eisentraut wrote:
> On 27.02.22 10:42, Jille Timmermans wrote:
>> I wanted to be able to allocate a bunch of numbers from a sequence at 
>> once. Multiple people seem to be struggling with this 
>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>> 
>> I propose to add an extra argument to nextval() that specifies how 
>> many numbers you want to allocate (default 1).
> 
> What is the use of this?
> 
> I note that the stackoverflow question wanted to return multiple
> sequence values as a result set, whereas your implementation just
> skips a number of values and returns the last one.  At least we should
> be clear about what we are trying to achieve.
Both would work for me actually. I'm using COPY FROM to insert many rows 
and need to know their ids and COPY FROM doesn't support RETURNING.

I implemented this approach because:
- smaller diff
- maybe someone benefits from them being consecutive
- less data to send between client/server

The obvious downside is that people can make mistakes in whether the 
returned number is the first or last number of the series.



Re: Support for grabbing multiple consecutive values with nextval()

From
Peter Eisentraut
Date:
On 02.03.22 20:12, Jille Timmermans wrote:
> On 2022-02-28 11:13, Peter Eisentraut wrote:
>> On 27.02.22 10:42, Jille Timmermans wrote:
>>> I wanted to be able to allocate a bunch of numbers from a sequence at 
>>> once. Multiple people seem to be struggling with this 
>>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
>>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). 
>>>
>>>
>>> I propose to add an extra argument to nextval() that specifies how 
>>> many numbers you want to allocate (default 1).
>>
>> What is the use of this?
>>
>> I note that the stackoverflow question wanted to return multiple
>> sequence values as a result set, whereas your implementation just
>> skips a number of values and returns the last one.  At least we should
>> be clear about what we are trying to achieve.
> Both would work for me actually. I'm using COPY FROM to insert many rows 
> and need to know their ids and COPY FROM doesn't support RETURNING.

I don't understand this use case.  COPY FROM copies from a file.  So you 
want to preallocate the sequence numbers before you copy the new data 
in?  How do you know how many rows are in the file?



Re: Support for grabbing multiple consecutive values with nextval()

From
Jille Timmermans
Date:
On 2022-03-03 10:10, Peter Eisentraut wrote:
> On 02.03.22 20:12, Jille Timmermans wrote:
>> On 2022-02-28 11:13, Peter Eisentraut wrote:
>>> On 27.02.22 10:42, Jille Timmermans wrote:
>>>> I wanted to be able to allocate a bunch of numbers from a sequence 
>>>> at once. Multiple people seem to be struggling with this 
>>>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
>>>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). 
>>>> I propose to add an extra argument to nextval() that specifies how 
>>>> many numbers you want to allocate (default 1).
>>> 
>>> What is the use of this?
>>> 
>>> I note that the stackoverflow question wanted to return multiple
>>> sequence values as a result set, whereas your implementation just
>>> skips a number of values and returns the last one.  At least we 
>>> should
>>> be clear about what we are trying to achieve.
>> Both would work for me actually. I'm using COPY FROM to insert many 
>> rows and need to know their ids and COPY FROM doesn't support 
>> RETURNING.
> 
> I don't understand this use case.  COPY FROM copies from a file.  So
> you want to preallocate the sequence numbers before you copy the new
> data in?
Yes

> How do you know how many rows are in the file?
I'm using https://pkg.go.dev/github.com/jackc/pgx/v4#Conn.CopyFrom, 
which uses the COPY FROM protocol but doesn't actually have to originate 
from a file.



Re: Support for grabbing multiple consecutive values with nextval()

From
Greg Stark
Date:
On Sun, 27 Feb 2022 at 07:09, Jille Timmermans <jille@quis.cx> wrote:
>
> Hi,
>
> First time PostgreSQL contributor here :)

I wish I had noticed this patch during the CF. It seems like a nice
self-contained feature that could have been easily reviewed and
committed and it's always good to see first-time contributions.
Hopefully it'll get committed early in the next cycle.


-- 
greg



Re: Support for grabbing multiple consecutive values with nextval()

From
Jille Timmermans
Date:
On 2022-04-08 15:33, Greg Stark wrote:
> On Sun, 27 Feb 2022 at 07:09, Jille Timmermans <jille@quis.cx> wrote:
>> First time PostgreSQL contributor here :)
> 
> I wish I had noticed this patch during the CF. It seems like a nice
> self-contained feature that could have been easily reviewed and
> committed and it's always good to see first-time contributions.
> Hopefully it'll get committed early in the next cycle.

If anyone is looking for a small patch to review, here's one for you :)

(https://commitfest.postgresql.org/38/3577/)



Re: Support for grabbing multiple consecutive values with nextval()

From
Ronan Dunklau
Date:
Hello,

Reading the thread, I think the feature has value: it would basically transfer 
control of the sequence cache to the client application.

However, I don't think that returning only the last value is a sensible thing 
to do. The client will need to know the details of the sequence to do anything 
useful about this, especially it's increment, minvalue, maxvalue and cycle 
options. 

As suggested upthread, returning a resultset would probably be better. If the 
client application is concerned about the volume of data exchanged with the 
server, and is willing to deal with handling the knowledge of the sequence 
details themselves, they can always wrap it in an aggregate:

SELECT min(newvals), max(newvals)   FROM nextvals(<num_nextvals>) as newvals

Regards,

--
Ronan Dunklau





Re: Support for grabbing multiple consecutive values with nextval()

From
Michael Paquier
Date:
On Thu, Mar 03, 2022 at 10:21:05AM +0100, Jille Timmermans wrote:
> I'm using https://pkg.go.dev/github.com/jackc/pgx/v4#Conn.CopyFrom, which
> uses the COPY FROM protocol but doesn't actually have to originate from a
> file.

It is Friday here, so I would easily miss something..  It is possible
to use COPY FROM with a list of columns, so assuming that you could
use a default expression with nextval() or just a SERIAL column not
listed in the COPY FROM query to do the job, what do we gain with this
feature?  In which aspect does the preallocation of a range handled
on the client side after being allocated in the backend make things
better?
--
Michael

Attachment

Re: Support for grabbing multiple consecutive values with nextval()

From
Ronan Dunklau
Date:
> It is Friday here, so I would easily miss something..  It is possible
> to use COPY FROM with a list of columns, so assuming that you could
> use a default expression with nextval() or just a SERIAL column not
> listed in the COPY FROM query to do the job, what do we gain with this
> feature?  In which aspect does the preallocation of a range handled
> on the client side after being allocated in the backend make things
> better?

The problem the author wants to solve is the fact they don't have a way of 
returning the ids when using COPY FROM. Pre-allocating them and assigning them 
to the individual records before sending them via COPY FROM would solve that 
for them.

-- 
Ronan Dunklau





Re: Support for grabbing multiple consecutive values with nextval()

From
Tom Lane
Date:
Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> The problem the author wants to solve is the fact they don't have a way of
> returning the ids when using COPY FROM. Pre-allocating them and assigning them
> to the individual records before sending them via COPY FROM would solve that
> for them.

True.

I took a quick look at this patch and am not pleased at all with the
implementation.  That loop in nextval_internal is okay performance-wise
today, since there's a small upper bound on the number of times it will
iterate.  But with this patch, a user can trivially lock up a backend for
up to 2^63 iterations; let's just say that's longer than you want to wait.
There's not even a CHECK_FOR_INTERRUPTS() in the loop :-(.  Even without
mistakes or deliberate DoS attempts, looping means holding the sequence
lock for longer than we really want to.

I think to seriously consider a feature like this, nextval_internal
would have to be rewritten so that it can advance the counter the
correct number of steps without using a loop.  That would be quite a
headache, once you've dealt with integer overflow, cyclic sequences,
and suchlike complications, but it's probably do-able.

I don't think I agree with Ronan's upthread comment that

>> However, I don't think that returning only the last value is a sensible thing
>> to do. The client will need to know the details of the sequence to do anything
>> useful about this, especially it's increment, minvalue, maxvalue and cycle
>> options.

Most applications are probably quite happy to assume that they know the
sequence's static parameters, and those that aren't can easily fetch them
using existing facilities.  So I don't think that returning them in this
function's result is really necessary.

I've got no strong opinion about this bit:

> As suggested upthread, returning a resultset would probably be better.

There are use-cases for that, sure, but there are also use-cases for
returning just the first or just the last value --- I'd think "just the
first" is the more common need of those two.  Aggregating over a resultset
is a remarkably inefficient answer when that's what you want.

In any case, "nextval()" is an increasingly poor name for these different
definitions, so I counsel picking some other name instead of overloading
nextval().  "nextvals()" would be a pretty good choice for the resultset
case, I think.

            regards, tom lane



Re: Support for grabbing multiple consecutive values with nextval()

From
Tom Lane
Date:
I wrote:
> I've got no strong opinion about this bit:
>> As suggested upthread, returning a resultset would probably be better.

Actually, on further thought, I do like the resultset idea, because
it'd remove the need for a complex rewrite of nextval_internal.
Assuming the SRF is written in ValuePerCall style, each iteration
can just call nextval_internal with no modifications needed in that
function.  There'll be a CHECK_FOR_INTERRUPTS somewhere in the
query-level loop, or at least it's not nextval's fault if there's not.
The situation is then no different from generate_series with a large
loop count, or any other query that can generate lots of data.

Of course, this does imply a lot more cycles expended per generated value
--- but most of that is inherent in the larger amount of data being
handed back.

            regards, tom lane



Re: Support for grabbing multiple consecutive values with nextval()

From
Michael Paquier
Date:
On Thu, Jul 28, 2022 at 12:47:10PM -0400, Tom Lane wrote:
> Actually, on further thought, I do like the resultset idea, because
> it'd remove the need for a complex rewrite of nextval_internal.
> Assuming the SRF is written in ValuePerCall style, each iteration
> can just call nextval_internal with no modifications needed in that
> function.  There'll be a CHECK_FOR_INTERRUPTS somewhere in the
> query-level loop, or at least it's not nextval's fault if there's not.
> The situation is then no different from generate_series with a large
> loop count, or any other query that can generate lots of data.
>
> Of course, this does imply a lot more cycles expended per generated value
> --- but most of that is inherent in the larger amount of data being
> handed back.

FWIW, I find the result set approach more intuitive and robust,
particularly in the case of a sequence has non-default values
INCREMENT and min/max values.  This reduces the dependency of what an
application needs to know about the details of a given sequence.  With
only the last value reported, the application would need to compile
things by itself.
--
Michael

Attachment

Re: Support for grabbing multiple consecutive values with nextval()

From
Michael Paquier
Date:
On Sat, Jul 30, 2022 at 04:21:07PM +0900, Michael Paquier wrote:
> FWIW, I find the result set approach more intuitive and robust,
> particularly in the case of a sequence has non-default values
> INCREMENT and min/max values.  This reduces the dependency of what an
> application needs to know about the details of a given sequence.  With
> only the last value reported, the application would need to compile
> things by itself.

It seems like there is a consensus here, but the thread has no
activity for the past two months, so I have marked the patch as
returned with feedback for now.
--
Michael

Attachment