Thread: Support for grabbing multiple consecutive values with nextval()
Hi, First time PostgreSQL contributor here :) I wanted to be able to allocate a bunch of numbers from a sequence at once. Multiple people seem to be struggling with this (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). I propose to add an extra argument to nextval() that specifies how many numbers you want to allocate (default 1). The attached patch (based on master) passes `./configure --enable-cassert --enable-debug && make && make check`, including the newly added regression tests. It does change the signature of nextval_internal(), not sure if that's considered backwards compatibility breaking (for extensions?). -- Jille
Attachment
Hi, On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote: > > First time PostgreSQL contributor here :) Welcome! > I wanted to be able to allocate a bunch of numbers from a sequence at once. > Multiple people seem to be struggling with this (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, > https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). > > I propose to add an extra argument to nextval() that specifies how many > numbers you want to allocate (default 1). > > The attached patch (based on master) passes `./configure --enable-cassert > --enable-debug && make && make check`, including the newly added regression > tests. > > It does change the signature of nextval_internal(), not sure if that's > considered backwards compatibility breaking (for extensions?). Please register this patch to the next commit fest (and last for pg15 inclusion) at https://commitfest.postgresql.org/37/ if not done already.
On 2022-02-27 14:22, Julien Rouhaud wrote: > Hi, > > On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote: >> >> First time PostgreSQL contributor here :) > > Welcome! Thanks! > >> I wanted to be able to allocate a bunch of numbers from a sequence at >> once. >> Multiple people seem to be struggling with this >> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, >> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). >> >> I propose to add an extra argument to nextval() that specifies how >> many >> numbers you want to allocate (default 1). >> >> The attached patch (based on master) passes `./configure >> --enable-cassert >> --enable-debug && make && make check`, including the newly added >> regression >> tests. >> >> It does change the signature of nextval_internal(), not sure if that's >> considered backwards compatibility breaking (for extensions?). > > Please register this patch to the next commit fest (and last for pg15 > inclusion) at https://commitfest.postgresql.org/37/ if not done > already. Done: https://commitfest.postgresql.org/37/3577/ (I was waiting for mailman approval before I got the thread id.)
On 27.02.22 10:42, Jille Timmermans wrote: > I wanted to be able to allocate a bunch of numbers from a sequence at > once. Multiple people seem to be struggling with this > (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, > https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). > > I propose to add an extra argument to nextval() that specifies how many > numbers you want to allocate (default 1). What is the use of this? I note that the stackoverflow question wanted to return multiple sequence values as a result set, whereas your implementation just skips a number of values and returns the last one. At least we should be clear about what we are trying to achieve.
On 2022-02-28 11:13, Peter Eisentraut wrote: > On 27.02.22 10:42, Jille Timmermans wrote: >> I wanted to be able to allocate a bunch of numbers from a sequence at >> once. Multiple people seem to be struggling with this >> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, >> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). >> >> I propose to add an extra argument to nextval() that specifies how >> many numbers you want to allocate (default 1). > > What is the use of this? > > I note that the stackoverflow question wanted to return multiple > sequence values as a result set, whereas your implementation just > skips a number of values and returns the last one. At least we should > be clear about what we are trying to achieve. Both would work for me actually. I'm using COPY FROM to insert many rows and need to know their ids and COPY FROM doesn't support RETURNING. I implemented this approach because: - smaller diff - maybe someone benefits from them being consecutive - less data to send between client/server The obvious downside is that people can make mistakes in whether the returned number is the first or last number of the series.
On 02.03.22 20:12, Jille Timmermans wrote: > On 2022-02-28 11:13, Peter Eisentraut wrote: >> On 27.02.22 10:42, Jille Timmermans wrote: >>> I wanted to be able to allocate a bunch of numbers from a sequence at >>> once. Multiple people seem to be struggling with this >>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, >>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). >>> >>> >>> I propose to add an extra argument to nextval() that specifies how >>> many numbers you want to allocate (default 1). >> >> What is the use of this? >> >> I note that the stackoverflow question wanted to return multiple >> sequence values as a result set, whereas your implementation just >> skips a number of values and returns the last one. At least we should >> be clear about what we are trying to achieve. > Both would work for me actually. I'm using COPY FROM to insert many rows > and need to know their ids and COPY FROM doesn't support RETURNING. I don't understand this use case. COPY FROM copies from a file. So you want to preallocate the sequence numbers before you copy the new data in? How do you know how many rows are in the file?
On 2022-03-03 10:10, Peter Eisentraut wrote: > On 02.03.22 20:12, Jille Timmermans wrote: >> On 2022-02-28 11:13, Peter Eisentraut wrote: >>> On 27.02.22 10:42, Jille Timmermans wrote: >>>> I wanted to be able to allocate a bunch of numbers from a sequence >>>> at once. Multiple people seem to be struggling with this >>>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, >>>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/). >>>> I propose to add an extra argument to nextval() that specifies how >>>> many numbers you want to allocate (default 1). >>> >>> What is the use of this? >>> >>> I note that the stackoverflow question wanted to return multiple >>> sequence values as a result set, whereas your implementation just >>> skips a number of values and returns the last one. At least we >>> should >>> be clear about what we are trying to achieve. >> Both would work for me actually. I'm using COPY FROM to insert many >> rows and need to know their ids and COPY FROM doesn't support >> RETURNING. > > I don't understand this use case. COPY FROM copies from a file. So > you want to preallocate the sequence numbers before you copy the new > data in? Yes > How do you know how many rows are in the file? I'm using https://pkg.go.dev/github.com/jackc/pgx/v4#Conn.CopyFrom, which uses the COPY FROM protocol but doesn't actually have to originate from a file.
On Sun, 27 Feb 2022 at 07:09, Jille Timmermans <jille@quis.cx> wrote: > > Hi, > > First time PostgreSQL contributor here :) I wish I had noticed this patch during the CF. It seems like a nice self-contained feature that could have been easily reviewed and committed and it's always good to see first-time contributions. Hopefully it'll get committed early in the next cycle. -- greg
On 2022-04-08 15:33, Greg Stark wrote: > On Sun, 27 Feb 2022 at 07:09, Jille Timmermans <jille@quis.cx> wrote: >> First time PostgreSQL contributor here :) > > I wish I had noticed this patch during the CF. It seems like a nice > self-contained feature that could have been easily reviewed and > committed and it's always good to see first-time contributions. > Hopefully it'll get committed early in the next cycle. If anyone is looking for a small patch to review, here's one for you :) (https://commitfest.postgresql.org/38/3577/)
Hello, Reading the thread, I think the feature has value: it would basically transfer control of the sequence cache to the client application. However, I don't think that returning only the last value is a sensible thing to do. The client will need to know the details of the sequence to do anything useful about this, especially it's increment, minvalue, maxvalue and cycle options. As suggested upthread, returning a resultset would probably be better. If the client application is concerned about the volume of data exchanged with the server, and is willing to deal with handling the knowledge of the sequence details themselves, they can always wrap it in an aggregate: SELECT min(newvals), max(newvals) FROM nextvals(<num_nextvals>) as newvals Regards, -- Ronan Dunklau
On Thu, Mar 03, 2022 at 10:21:05AM +0100, Jille Timmermans wrote: > I'm using https://pkg.go.dev/github.com/jackc/pgx/v4#Conn.CopyFrom, which > uses the COPY FROM protocol but doesn't actually have to originate from a > file. It is Friday here, so I would easily miss something.. It is possible to use COPY FROM with a list of columns, so assuming that you could use a default expression with nextval() or just a SERIAL column not listed in the COPY FROM query to do the job, what do we gain with this feature? In which aspect does the preallocation of a range handled on the client side after being allocated in the backend make things better? -- Michael
Attachment
> It is Friday here, so I would easily miss something.. It is possible > to use COPY FROM with a list of columns, so assuming that you could > use a default expression with nextval() or just a SERIAL column not > listed in the COPY FROM query to do the job, what do we gain with this > feature? In which aspect does the preallocation of a range handled > on the client side after being allocated in the backend make things > better? The problem the author wants to solve is the fact they don't have a way of returning the ids when using COPY FROM. Pre-allocating them and assigning them to the individual records before sending them via COPY FROM would solve that for them. -- Ronan Dunklau
Ronan Dunklau <ronan.dunklau@aiven.io> writes: > The problem the author wants to solve is the fact they don't have a way of > returning the ids when using COPY FROM. Pre-allocating them and assigning them > to the individual records before sending them via COPY FROM would solve that > for them. True. I took a quick look at this patch and am not pleased at all with the implementation. That loop in nextval_internal is okay performance-wise today, since there's a small upper bound on the number of times it will iterate. But with this patch, a user can trivially lock up a backend for up to 2^63 iterations; let's just say that's longer than you want to wait. There's not even a CHECK_FOR_INTERRUPTS() in the loop :-(. Even without mistakes or deliberate DoS attempts, looping means holding the sequence lock for longer than we really want to. I think to seriously consider a feature like this, nextval_internal would have to be rewritten so that it can advance the counter the correct number of steps without using a loop. That would be quite a headache, once you've dealt with integer overflow, cyclic sequences, and suchlike complications, but it's probably do-able. I don't think I agree with Ronan's upthread comment that >> However, I don't think that returning only the last value is a sensible thing >> to do. The client will need to know the details of the sequence to do anything >> useful about this, especially it's increment, minvalue, maxvalue and cycle >> options. Most applications are probably quite happy to assume that they know the sequence's static parameters, and those that aren't can easily fetch them using existing facilities. So I don't think that returning them in this function's result is really necessary. I've got no strong opinion about this bit: > As suggested upthread, returning a resultset would probably be better. There are use-cases for that, sure, but there are also use-cases for returning just the first or just the last value --- I'd think "just the first" is the more common need of those two. Aggregating over a resultset is a remarkably inefficient answer when that's what you want. In any case, "nextval()" is an increasingly poor name for these different definitions, so I counsel picking some other name instead of overloading nextval(). "nextvals()" would be a pretty good choice for the resultset case, I think. regards, tom lane
I wrote: > I've got no strong opinion about this bit: >> As suggested upthread, returning a resultset would probably be better. Actually, on further thought, I do like the resultset idea, because it'd remove the need for a complex rewrite of nextval_internal. Assuming the SRF is written in ValuePerCall style, each iteration can just call nextval_internal with no modifications needed in that function. There'll be a CHECK_FOR_INTERRUPTS somewhere in the query-level loop, or at least it's not nextval's fault if there's not. The situation is then no different from generate_series with a large loop count, or any other query that can generate lots of data. Of course, this does imply a lot more cycles expended per generated value --- but most of that is inherent in the larger amount of data being handed back. regards, tom lane
On Thu, Jul 28, 2022 at 12:47:10PM -0400, Tom Lane wrote: > Actually, on further thought, I do like the resultset idea, because > it'd remove the need for a complex rewrite of nextval_internal. > Assuming the SRF is written in ValuePerCall style, each iteration > can just call nextval_internal with no modifications needed in that > function. There'll be a CHECK_FOR_INTERRUPTS somewhere in the > query-level loop, or at least it's not nextval's fault if there's not. > The situation is then no different from generate_series with a large > loop count, or any other query that can generate lots of data. > > Of course, this does imply a lot more cycles expended per generated value > --- but most of that is inherent in the larger amount of data being > handed back. FWIW, I find the result set approach more intuitive and robust, particularly in the case of a sequence has non-default values INCREMENT and min/max values. This reduces the dependency of what an application needs to know about the details of a given sequence. With only the last value reported, the application would need to compile things by itself. -- Michael
Attachment
On Sat, Jul 30, 2022 at 04:21:07PM +0900, Michael Paquier wrote: > FWIW, I find the result set approach more intuitive and robust, > particularly in the case of a sequence has non-default values > INCREMENT and min/max values. This reduces the dependency of what an > application needs to know about the details of a given sequence. With > only the last value reported, the application would need to compile > things by itself. It seems like there is a consensus here, but the thread has no activity for the past two months, so I have marked the patch as returned with feedback for now. -- Michael