Thread: Sequence Access Method WIP
SeqAm allows you to specify a plugin that alters the behaviour of sequence allocation and resetting, aimed specifically at clustering systems.

New command options at the end of the statement allow the syntax

  CREATE SEQUENCE foo_seq
    USING globalseq
    WITH (custom_option=setting);

which lets you specify an alternate Sequence Access Method for a sequence object. Alternatively, you can set a default via the USERSET parameter default_sequence_mgr:

  SET default_sequence_mgr = globalseq;

Existing sequences can be modified to use a different SeqAM by calling

  ALTER SEQUENCE foo_seq USING globalseq;

SeqAM is similar to IndexAM: there is a separate catalog table for SeqAMs, but no specific API to create them. Initdb creates one sequence AM, called "local", which is the initial default. If default_sequence_mgr is set to '' or 'local' then we use the local seqam. The local seqam's functions are included in core.

Status is still "Work In Progress". Having said that, most of the grunt work is done and, if we agree the shape of this is right, it's relatively easy-going code.
postgres=# select oid, * from pg_seqam;
-[ RECORD 1 ]+--------------------
oid          | 3839
seqamname    | local
seqamalloc   | seqam_local_alloc
seqamsetval  | seqam_local_setval
seqamoptions | seqam_local_options

postgres=# select relname, relam from pg_class where relname = 'foo2';
 relname | relam
---------+-------
 foo2    | 3839

postgres=# create sequence foo5 using global;
ERROR:  access method "global" does not exist

Footprint:

 backend/access/Makefile            |    2
 backend/access/common/reloptions.c |   26 +++
 backend/access/sequence/Makefile   |   17 ++
 backend/access/sequence/seqam.c    |  278 +++++++++++++++++++++++++++++++++++++
 backend/catalog/Makefile           |    2
 backend/commands/sequence.c        |  132 +++++++++++++++--
 backend/commands/tablecmds.c       |    3
 backend/nodes/copyfuncs.c          |    4
 backend/nodes/equalfuncs.c         |    4
 backend/parser/gram.y              |   84 ++++++++++-
 backend/parser/parse_utilcmd.c     |    4
 backend/utils/cache/catcache.c     |    6
 backend/utils/cache/syscache.c     |   23 +++
 backend/utils/misc/guc.c           |   12 +
 include/access/reloptions.h        |    6
 include/access/seqam.h             |   27 +++
 include/catalog/indexing.h         |    5
 include/catalog/pg_proc.h          |    6
 include/catalog/pg_seqam.h         |   70 +++++++++
 include/nodes/parsenodes.h         |    8 -
 include/utils/guc.h                |    1
 include/utils/rel.h                |   22 +-
 include/utils/syscache.h           |    2
 23 files changed, 706 insertions(+), 38 deletions(-)

Tasks to complete:
* contrib module for example/testing
* Docs

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
> SeqAm allows you to specify a plugin that alters the behaviour for
> sequence allocation and resetting, aimed specifically at clustering
> systems.
>
> New command options on end of statement allow syntax
> CREATE SEQUENCE foo_seq
> USING globalseq

Production version of this, ready for commit to PG 9.4.

Includes a test extension which allows sequences without gaps - "gapless".

Test using seq_test.psql after creating the extension.

No dependencies on other patches.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 14.11.2013 22:10, Simon Riggs wrote:
> On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> SeqAm allows you to specify a plugin that alters the behaviour for
>> sequence allocation and resetting, aimed specifically at clustering
>> systems.
>>
>> New command options on end of statement allow syntax
>> CREATE SEQUENCE foo_seq
>> USING globalseq
>
> Production version of this, ready for commit to PG 9.4
>
> Includes test extension which allows sequences without gaps - "gapless".
>
> Test using seq_test.psql after creating extension.
>
> No dependencies on other patches.

It's pretty hard to review this without seeing the "other" implementation you're envisioning to use this API. But I'll try:

I wonder if the level of abstraction is right. The patch assumes that the on-disk storage of all sequences is the same, so the access method can't change that. But then it leaves the details of actually updating the on-disk block, WAL-logging and all, as the responsibility of the access method. Surely that's going to look identical in all the seqams, if they all use the same on-disk format. That also means that the sequence access methods can't be implemented as plugins, as plugins can't do WAL-logging.

The comment in seqam.c says that there's a private column reserved for access method-specific data, called am_data, but that seems to be the only mention of am_data in the patch.

- Heikki
On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
> It's pretty hard to review this without seeing the "other"
> implementation you're envisioning to use this API. But I'll try:

We've written a distributed sequence implementation against it, so it works at least for that use case.

While certainly not release worthy, it still might be interesting to look at:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr

bdr_sequencer_fill_sequence pre-allocates ranges of values; bdr_sequence_alloc returns them upon nextval().

> I wonder if the level of abstraction is right. The patch assumes that the
> on-disk storage of all sequences is the same, so the access method can't
> change that. But then it leaves the details of actually updating the on-disk
> block, WAL-logging and all, as the responsibility of the access method.
> Surely that's going to look identical in all the seqams, if they all use the
> same on-disk format. That also means that the sequence access methods can't
> be implemented as plugins, as plugins can't do WAL-logging.

Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence, that allows doing quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.

> The comment in seqam.c says that there's a private column reserved for
> access method-specific data, called am_data, but that seems to be the only
> mention of am_data in the patch.

It's amdata, not am_data :/. Directly at the end of pg_sequence.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
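[Editorial aside: the fill/alloc split described above can be illustrated with a toy model in plain C. Everything here is invented for the sketch - `PreallocRange`, `fill_range`, `range_nextval` and the local counter standing in for the cluster-wide agreement step are not the actual bdr_seq.c code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a "sequencer" pre-allocates a disjoint chunk of values
 * (stand-in for bdr_sequencer_fill_sequence), and nextval() is served
 * from the current chunk until it is exhausted (stand-in for
 * bdr_sequence_alloc). */

typedef struct PreallocRange
{
    int64_t next;   /* next value to hand out */
    int64_t last;   /* last value in the pre-allocated chunk */
    bool    valid;  /* do we hold an unexhausted chunk? */
} PreallocRange;

static int64_t global_counter = 1;  /* fake stand-in for cluster-wide state */

/* claim the next disjoint chunk of values */
static void
fill_range(PreallocRange *r, int chunk_size)
{
    r->next = global_counter;
    r->last = global_counter + chunk_size - 1;
    r->valid = true;
    global_counter += chunk_size;
}

/* serve one nextval() call, refilling when the chunk runs out */
static int64_t
range_nextval(PreallocRange *r, int chunk_size)
{
    if (!r->valid || r->next > r->last)
        fill_range(r, chunk_size);  /* in bdr this happens in the background */
    return r->next++;
}
```

The property the real implementation has to provide is simply that chunks claimed by different nodes never overlap; within a node, values come out of the local chunk with no further coordination.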
On 15 November 2013 15:08, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> I wonder if the level of abstraction is right.

That is the big question, and not something to shy away from. What is presented is not the first thought, by a long way. Andres' contribution to the patch is mainly around this point, so the seq am is designed with the needs of the main use case in mind.

I'm open to suggested changes, but I would say that practical usage beats changes suggested for purity.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/14/13, 3:10 PM, Simon Riggs wrote:
> On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> SeqAm allows you to specify a plugin that alters the behaviour for
>> sequence allocation and resetting, aimed specifically at clustering
>> systems.
>>
>> New command options on end of statement allow syntax
>> CREATE SEQUENCE foo_seq
>> USING globalseq
>
> Production version of this, ready for commit to PG 9.4

This patch doesn't apply anymore.

Also, you set this to "returned with feedback" in the CF app. Please verify whether that was intentional.
On 15 November 2013 15:48, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 11/14/13, 3:10 PM, Simon Riggs wrote:
>> On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>>> SeqAm allows you to specify a plugin that alters the behaviour for
>>> sequence allocation and resetting, aimed specifically at clustering
>>> systems.
>>>
>>> New command options on end of statement allow syntax
>>> CREATE SEQUENCE foo_seq
>>> USING globalseq
>>
>> Production version of this, ready for commit to PG 9.4
>
> This patch doesn't apply anymore.

Yes, a patch conflict was just committed; will repost.

> Also, you set this to "returned with feedback" in the CF app. Please
> verify whether that was intentional.

Not sure that was me; if so, corrected.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 15.11.2013 21:00, Simon Riggs wrote:
> On 15 November 2013 15:48, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Also, you set this to "returned with feedback" in the CF app. Please
>> verify whether that was intentional.
>
> Not sure that was me, if so, corrected.

It was me, sorry. I figured this needs such a large restructuring that it's not going to be committable in this commitfest. But I'm happy to keep the discussion going if you think otherwise.

- Heikki
On 15.11.2013 20:21, Andres Freund wrote:
> On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
>> It's pretty hard to review this without seeing the "other"
>> implementation you're envisioning to use this API. But I'll try:
>
> We've written a distributed sequence implementation against it, so it
> works at least for that use case.
>
> While certainly not release worthy, it still might be interesting to
> look at.
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr
>
> bdr_sequencer_fill_sequence pre-allocates ranges of values,
> bdr_sequence_alloc returns them upon nextval().

Thanks. That pokes into the low-level details of sequence tuples, just like I was afraid it would.

>> I wonder if the level of abstraction is right. The patch assumes that the
>> on-disk storage of all sequences is the same, so the access method can't
>> change that. But then it leaves the details of actually updating the on-disk
>> block, WAL-logging and all, as the responsibility of the access method.
>> Surely that's going to look identical in all the seqams, if they all use the
>> same on-disk format. That also means that the sequence access methods can't
>> be implemented as plugins, as plugins can't do WAL-logging.
>
> Well, it exposes log_sequence_tuple() - together with the added "am
> private" column of pg_sequence that allows doing quite a bit of different
> things. I think unless we really implement pluggable RMGRs - which I
> don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.

Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.

>> The comment in seqam.c says that there's a private column reserved for
>> access method-specific data, called am_data, but that seems to be the only
>> mention of am_data in the patch.
>
> It's amdata, not am_data :/. Directly at the end of pg_sequence.

Ah, got it.

Stepping back a bit and looking at this problem from a higher level, why do you need to hack this stuff into the sequences? Couldn't you just define a new set of functions, say bdr_currval() and bdr_nextval(), to operate on these new kinds of sequences? You're not making much use of the existing sequence infrastructure anyway, so it might be best to keep the implementation completely separate. If you need it for compatibility with applications, you could create facade currval()/nextval() functions that call the built-in version or the bdr version depending on the argument.

- Heikki
On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 20:21, Andres Freund wrote:
>> Well, it exposes log_sequence_tuple() - together with the added "am
>> private" column of pg_sequence that allows doing quite a bit of different
>> things. I think unless we really implement pluggable RMGRs - which I
>> don't really see coming - we cannot be radically different.
>
> The proposed abstraction leaks like a sieve. The plugin should not need to
> touch anything but the private amdata field. log_sequence_tuple() is way too
> low-level.

Why? It's *less* low-level than a good part of other crash recovery code we have. I have quite some doubts about the layering, but it's imo pretty sensible to centralize the WAL-logging code that plugins couldn't write themselves.

If you look at what e.g. the _alloc callback currently gets passed:
a) Relation: important for metadata like TupleDesc, name, etc.
b) SeqTable entry: important to set up state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuple's changed state

And if you then look at what gets modified in that callback:
  Form_pg_sequence ->amdata
                   ->is_called
                   ->last_value
                   ->log_cnt
  SeqTable ->last
  SeqTable ->cached
  SeqTable ->last_valid

You need is_called and last_valid because of our current representation of sequence state (which we could change, to remove that need). The rest contains values that need to be set depending on how you want your sequence to behave:
* amdata is obvious.
* You need log_cnt to influence/remember how big the chunks you WAL-log at once are, which you need to control if, e.g., you want a sequence that doesn't leak values in crashes.
* cached is needed to control the per-backend caching.

> Just as a thought-experiment, imagine that we decided to re-implement
> sequences so that all the sequence values are stored in one big table, or
> flat-file in the data directory, instead of the current implementation where
> we have a one-block relation file for each sequence. If the sequence access
> method API is any good, it should stay unchanged. That's clearly not the
> case with the proposed API.

I don't think we can entirely abstract away the reality that, for now, they are based on relations and might not be at some later point. Otherwise we'd have to invent an API that copied all the data that's stored in struct RelationData: name, owner, ...
Which non-SQL-accessible API we provide has a chance of providing that level of consistency in the face of fundamental refactoring? I'd say none.

> Stepping back a bit and looking at this problem from a higher level, why do
> you need to hack this stuff into the sequences? Couldn't you just define a
> new set of functions, say bdr_currval() and bdr_nextval(), to operate on
> these new kinds of sequences?

Basically two things:
a) User interface. For one, everyone would need to reimplement the entire sequence DDL from scratch. For another, it means it's hard to write (client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in "userspace". How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code?

> You're not making much use of the existing sequence infrastructure, anyway,
> so it might be best to just keep the implementation completely separate. If
> you need it for compatibility with applications, you could create facade
> currval/nextval() functions that call the built-in version or the bdr
> version depending on the argument.

That doesn't get you very far:
a) The default functions created by sequences are pg_catalog-prefixed, so you'd need to hack up the catalog to get your own functions used if you want the application to work transparently. In which case you need to remember the former function, because you now cannot call it normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state across calls, the locking, the WAL logging, the DDL support. Pretty much the only things *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 18.11.2013 11:48, Andres Freund wrote:
> On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
>> On 15.11.2013 20:21, Andres Freund wrote:
>>> Well, it exposes log_sequence_tuple() - together with the added "am
>>> private" column of pg_sequence that allows doing quite a bit of different
>>> things. I think unless we really implement pluggable RMGRs - which I
>>> don't really see coming - we cannot be radically different.
>>
>> The proposed abstraction leaks like a sieve. The plugin should not need to
>> touch anything but the private amdata field. log_sequence_tuple() is way too
>> low-level.
>
> Why? It's *less* low level than a good part of other crash recovery code
> we have. I have quite some doubts about the layering, but it's imo
> pretty sensible to centralize the wal logging code that plugins couldn't
> write.

It doesn't go far enough; it's still too *low*-level. The sequence AM implementation shouldn't need to have direct access to the buffer page at all.

> If you look at what e.g the _alloc callback currently gets passed.
> a) Relation: Important for metadata like TupleDesc, name etc.
> b) SeqTable entry: Important to setup state for cur/lastval
> c) Buffer of the tuple: LSN etc.
> d) HeapTuple itself: a place to store the tuples changed state
>
> And if you then look at what gets modified in that callback:
> Form_pg_sequence->amdata
>                 ->is_called
>                 ->last_value
>                 ->log_cnt
> SeqTable ->last
> SeqTable ->cached
> SeqTable ->last_valid
>
> You need is_called, last_valid because of our current representation of
> sequences state (which we could change, to remove that need). The rest
> contains values that need to be set depending on how you want your
> sequence to behave:
> * Amdata is obvious.
> * You need log_cnt to influence/remember how big the chunks are you WAL
>   log at once are. Which e.g. you need to control if you want a sequence
>   that doesn't leak values in crashes
> * cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, i.e. the server code asks the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

  seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

  int64 value  - the first value allocated
  int nvalues  - the number of values allocated
  am_private   - updated private data

The backend code handles the caching and logging of values. When it has exhausted all the cached values (or doesn't have any yet), it calls the AM's alloc function to get a new batch. The AM returns the new batch and updates its private state as necessary. Then the backend code updates the relation file with the new values and the AM's private data. WAL-logging and checkpointing are the backend's responsibility.

>> Just as a thought-experiment, imagine that we decided to re-implement
>> sequences so that all the sequence values are stored in one big table, or
>> flat-file in the data directory, instead of the current implementation where
>> we have a one-block relation file for each sequence. If the sequence access
>> method API is any good, it should stay unchanged. That's clearly not the
>> case with the proposed API.
>
> I don't think we can entirely abstract away the reality that now they are
> based on relations and might not be at some later point. Otherwise we'd
> have to invent an API that copied all the data that's stored in struct
> RelationData. Like name, owner, ...
> Which non SQL accessible API we provide has a chance of providing that
> level of consistency in the face of fundamental refactoring? I'd say
> none.

I'm not thinking that we'd change sequences to not be relations. I'm thinking that we might not want to store the state as a one-page file anymore. In fact, that was just discussed in the other thread titled "init_sequence spill to hash table".

> b) It's not actually easy to get similar semantics in "userspace". How
> would you emulate the crash-safe but non-transactional semantics of
> sequences without copying most of sequence.c? Without writing
> XLogInsert()s which we cannot do from a out-of-tree code?

heap_inplace_update()

- Heikki
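[Editorial aside: the contract Heikki proposes above can be sketched in plain C. The stand-in types and the `sketch_local_alloc` implementation are invented for illustration - `int64_t` replaces Datum, and this is not the patch's actual seqam_local_alloc.]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the proposed contract: the backend asks for up to nrequested
 * values; the AM reports how many consecutive values it actually granted
 * and returns its updated private state. */

typedef struct SeqAmAllocResult
{
    int64_t first;      /* first value allocated */
    int     nvalues;    /* how many consecutive values were allocated */
    int64_t am_private; /* updated AM-private state (stand-in for Datum) */
} SeqAmAllocResult;

/* Invented local-style implementation: ascend by 1, never exceed maxv;
 * am_private holds the last value previously handed out. */
static SeqAmAllocResult
sketch_local_alloc(int nrequested, int64_t maxv, int64_t am_private)
{
    SeqAmAllocResult res;
    int64_t available = maxv - am_private;

    if (available < 0)
        available = 0;
    if ((int64_t) nrequested <= available)
        res.nvalues = nrequested;
    else
        res.nvalues = (int) available;
    res.first = am_private + 1;
    res.am_private = am_private + res.nvalues;
    return res;
}
```

Under this division of labour the backend would cache the returned batch, WAL-log it, and persist the new am_private itself; the AM never touches the buffer page, which is the point of the proposal.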
On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote:
> On 18.11.2013 11:48, Andres Freund wrote:
> I don't think the sequence AM should be in control of 'cached'. The caching
> is done outside the AM. And log_cnt probably should be passed to the _alloc
> function directly as an argument, ie. the server code asks the AM to
> allocate N new values in one call.

Sounds sane.

> I'm thinking that the alloc function should look something like this:
>
> seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

I don't think we can avoid giving access to the other columns of pg_sequence; stuff like increment and the limits needs to be available for reading, so that'd also have to get passed in. And we need to signal that am_private can be NULL, otherwise we'd end up with ugly ways to signal that. So I'd say pass in the entire tuple, and return a copy? Alternatively we can return am_private as a 'struct varlena *', so we can properly signal empty values.

We also need a way to set am_private from outside seqam_alloc/setval/... Many of the fancier sequences that people talked about will need preallocation somewhere in the background. As proposed that's easy enough to write using log_sequence_tuple(); this way we'd need something that calls a callback with the appropriate buffer lock held. So maybe a seqam_update(Relation seqrel, ...) callback that gets called when somebody executes pg_sequence_update(oid)?

It'd probably be a good idea to provide a generic function for checking whether a new value falls within the sequence's min and max boundaries, plus the error handling for when it doesn't.

> I'm not thinking that we'd change sequences to not be relations. I'm
> thinking that we might not want to store the state as a one-page file
> anymore. In fact that was just discussed in the other thread titled
> "init_sequence spill to hash table".

Yes, I read and even commented in that thread... But nothing in the currently proposed API would prevent you from going in that direction; you'd just get passed a different tuple/buffer.

>> b) It's not actually easy to get similar semantics in "userspace". How
>> would you emulate the crash-safe but non-transactional semantics of
>> sequences without copying most of sequence.c? Without writing
>> XLogInsert()s which we cannot do from a out-of-tree code?
>
> heap_inplace_update()

That gets you the crash-safe part partially (it doesn't allow making the tuple wider than before), but not the caching/stateful part et al. The point is that you need most of sequence.c again.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
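[Editorial aside: the generic boundary-check helper suggested above could take a shape like the following. The name and signature are invented; in the backend the failure path would ereport() the usual "reached maximum/minimum value" error rather than return false.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check a candidate value against the sequence bounds.  On overflow of a
 * CYCLE sequence, wrap to the opposite bound the way nextval_internal()
 * does for the built-in sequences; otherwise report failure so the caller
 * can raise the appropriate error. */
static bool
seq_check_bounds(int64_t value, int64_t minv, int64_t maxv,
                 bool is_cycled, bool ascending, int64_t *result)
{
    if (value >= minv && value <= maxv)
    {
        *result = value;        /* in range, use as-is */
        return true;
    }
    if (!is_cycled)
        return false;           /* caller raises the error */
    /* wrap around to the opposite bound */
    *result = ascending ? minv : maxv;
    return true;
}
```

Centralizing this would spare each seqam from re-implementing the min/max/CYCLE corner cases, which is easy to get subtly wrong.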
On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> It doesn't go far enough, it's still too *low*-level. The sequence AM
> implementation shouldn't need to have direct access to the buffer page at
> all.

> I don't think the sequence AM should be in control of 'cached'. The caching
> is done outside the AM. And log_cnt probably should be passed to the _alloc
> function directly as an argument, ie. the server code asks the AM to
> allocate N new values in one call.

I can't see what the rationale of your arguments is. All index AMs write WAL and control buffer locking etc. Do you have a new use case that shows why changes should happen? We can't just redesign things based upon arbitrary decisions about what things should or should not be possible via the API.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 18.11.2013 13:48, Simon Riggs wrote:
> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>> implementation shouldn't need to have direct access to the buffer page at
>> all.
>
>> I don't think the sequence AM should be in control of 'cached'. The caching
>> is done outside the AM. And log_cnt probably should be passed to the _alloc
>> function directly as an argument, ie. the server code asks the AM to
>> allocate N new values in one call.
>
> I can't see what the rationale of your arguments is. All index Ams
> write WAL and control buffer locking etc..

Index AMs are completely responsible for the on-disk structure, while with the proposed API, both the AM and the backend are intimately aware of the on-disk representation. Such a shared responsibility is not a good thing in an API. I would also be fine with going 100% in the index AM direction and removing all knowledge of the on-disk layout from the backend code, moving it into the AMs. Then you could actually implement the discussed "store all sequences in a single file" change by writing a new sequence AM for it.

- Heikki
On 14.11.2013 22:10, Simon Riggs wrote:
> Includes test extension which allows sequences without gaps - "gapless".

I realize this is just for demonstration purposes, but it's worth noting that it doesn't actually guarantee that when you use the sequence to populate a column in a table, the column will have no gaps. Sequences are not transactional, so rollbacks will still produce gaps. The documentation is misleading on that point. Without a strong guarantee, it's a pretty useless extension.

- Heikki
On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 14.11.2013 22:10, Simon Riggs wrote:
>>
>> Includes test extension which allows sequences without gaps - "gapless".
>
> I realize this is just for demonstration purposes, but it's worth noting
> that it doesn't actually guarantee that when you use the sequence to
> populate a column in the table, the column would not have gaps. Sequences
> are not transactional, so rollbacks will still produce gaps. The
> documentation is misleading on that point. Without a strong guarantee, it's
> a pretty useless extension.

True.

If I fix that problem, I should change the name to "lockup" sequences, since only one transaction at a time could use nextval.

Should I change the documentation, or just bin the idea?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 18.11.2013 13:48, Simon Riggs wrote:
>>
>> I can't see what the rationale of your arguments is. All index Ams
>> write WAL and control buffer locking etc..
>
> Index AM's are completely responsible for the on-disk structure, while with
> the proposed API, both the AM and the backend are intimately aware of the
> on-disk representation. Such a shared responsibility is not a good thing in
> an API. I would also be fine with going 100% to the index AM direction, and
> remove all knowledge of the on-disk layout from the backend code and move it
> into the AMs. Then you could actually implement the discussed "store all
> sequences in a single file" change by writing a new sequence AM for it.

I think the way to resolve this is to do both of these things, i.e. a two-level API:

1. Implement the SeqAM API at the most generic level. Add a nextval() call as well as alloc().

2. Also implement the proposed changes to alloc().

So a SeqAM would implement either nextval() or alloc(), but not both. Global sequences as envisaged for BDR would use a special alloc() call.

I don't think that is too much work, but I want to do this just once...

Thoughts on exact next steps for implementation, please?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24.11.2013 19:23, Simon Riggs wrote:
> On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> Index AM's are completely responsible for the on-disk structure, while with
>> the proposed API, both the AM and the backend are intimately aware of the
>> on-disk representation. Such a shared responsibility is not a good thing in
>> an API. I would also be fine with going 100% to the index AM direction, and
>> remove all knowledge of the on-disk layout from the backend code and move it
>> into the AMs. Then you could actually implement the discussed "store all
>> sequences in a single file" change by writing a new sequence AM for it.
>
> I think the way to resolve this is to do both of these things, i.e. a
> two level API
>
> 1. Implement SeqAM API at the most generic level. Add a nextval() call
> as well as alloc()
>
> 2. Also implement the proposed changes to alloc()

The proposed changes to alloc() would still suffer from all the problems that I complained about. Adding a new API alongside doesn't help with that.

- Heikki
On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> The proposed changes to alloc() would still suffer from all the problems
> that I complained about. Adding a new API alongside doesn't help with that.

You made two proposals. I suggested implementing both.

What would you have me do?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/24/13 19:15, Simon Riggs wrote:
> On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> I realize this is just for demonstration purposes, but it's worth noting
>> that it doesn't actually guarantee that when you use the sequence to
>> populate a column in the table, the column would not have gaps. Sequences
>> are not transactional, so rollbacks will still produce gaps. The
>> documentation is misleading on that point. Without a strong guarantee, it's
>> a pretty useless extension.
>
> True.
>
> If I fix that problem, I should change the name to "lockup" sequences,
> since only one transaction at a time could use the nextval.
>
> Should I change the documentation, or just bin the idea?

Just bin it. It would be useful if it could guarantee gaplessness, but I don't see how to do that.

- Heikki
On 11/25/13 12:00, Simon Riggs wrote:
> On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> The proposed changes to alloc() would still suffer from all the problems
>> that I complained about. Adding a new API alongside doesn't help with that.
>
> You made two proposals. I suggested implementing both.
>
> What would you have me do?

Dunno. I do know that the proposed changes to alloc() are not a good API. You could implement the other API; I think that has a chance of being a cleaner API, but looking at the BDR extension that would use that facility, I'm not sure how useful it would be for you. (And I'd really like to see an actual implementation of whatever API we come up with, before we commit to maintaining it.)

- Heikki
On 18/11/13 11:50, Heikki Linnakangas wrote:
>
> I don't think the sequence AM should be in control of 'cached'. The
> caching is done outside the AM. And log_cnt probably should be passed to
> the _alloc function directly as an argument, ie. the server code asks
> the AM to allocate N new values in one call.
>
> I'm thinking that the alloc function should look something like this:
>
> seqam_alloc(Relation seqrel, int nrequested, Datum am_private)
>

I was looking at this a bit today, and what I see is that it's not that simple. The minimum input seqam_alloc needs is:

- Relation seqrel
- int64 minv, maxv, incby, bool is_cycled - these are basically options giving info about how the new numbers are allocated (I guess some implementations are not going to support all of those)
- bool is_called - the current built-in sequence generator behaves differently based on it and I am not sure we can get over it (it could perhaps be done in the back-end independently of the AM?)
- int64 nrequested - number of requested values
- Datum am_private - current private data

In this light I agree with what Andres wrote - let's just send the whole Form_pg_sequence object.

This also makes me think that the seqam options interface should be passed the minv/maxv/incby/is_cycled etc. options for validation, not just the amoptions.

> And it should return:
>
> int64 value - the first value allocated.
> int nvalues - the number of values allocated.
> am_private - updated private data.
There is also more needed than this; you need:

- int64 value - first value allocated (the value to be returned)
- int64 nvalues - number of values allocated
- int64 last - last cached value (used for cached/last_value)
- int64 next - last logged value (used for WAL logging)
- am_private - updated private data, must be possible to return as null

I personally don't like that we need "nvalues", "next" and "last", as it makes the seqam a little too aware of the sequence logging internals in my opinion, but I haven't found a way around it - it's impossible for the backend to know how the AM will act around incby/maxv/minv/cycling, so it can't really calculate these values by itself, unless of course we fix the behavior and require seqams to behave predictably, but that somewhat breaks the whole idea of leaving the allocation to the seqam.

Obviously it would also work to return a list of allocated values and let the backend calculate "value", "nvalues", "last" and "next" from that list by itself, but I am worried about the performance of that approach.

> The backend code handles the caching and logging of values. When it has
> exhausted all the cached values (or doesn't have any yet), it calls the
> AM's alloc function to get a new batch. The AM returns the new batch,
> and updates its private state as necessary. Then the backend code
> updates the relation file with the new values and the AM's private data.
> WAL-logging and checkpointing is the backend's responsibility.

Agreed here.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
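To make the four return values concrete, here is a self-contained C sketch (illustrative only - all names and types are stand-ins, not the patch's actual code) of how a local-style AM might compute value/nvalues/last/next for one batch of an ascending, non-cycling sequence. It also shows why the backend cannot derive "last" and "next" itself: only the AM knows how incby and maxv constrain the range.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative result struct -- mirrors the four values discussed above. */
typedef struct {
    int64_t value;   /* first value handed to the caller        */
    int64_t nvalues; /* how many values were actually allocated */
    int64_t last;    /* last cached value (for last_value)      */
    int64_t next;    /* last WAL-logged value                   */
} SeqAllocResult;

/*
 * Toy "local" allocation: ascending, non-cycling sequence.
 * 'log_batch' plays the role of the built-in behaviour of
 * pre-logging a block of values (32 in the local seqam).
 */
static SeqAllocResult
toy_seqam_alloc(int64_t current, int64_t incby, int64_t maxv,
                int64_t nrequested, int64_t log_batch)
{
    SeqAllocResult r;
    int64_t n = 0;
    int64_t v = current;

    /* Allocate up to nrequested values without exceeding maxv. */
    while (n < nrequested && v + incby <= maxv) {
        v += incby;
        n++;
    }
    r.value = current + incby;  /* first allocated value */
    r.nvalues = n;
    r.last = v;                 /* end of the cached range */
    /* Pre-log a larger range so the next log_batch calls need no WAL. */
    r.next = v + incby * log_batch;
    if (r.next > maxv)
        r.next = maxv;
    return r;
}
```

Note how near maxv the AM silently shortens the batch; a cycling AM would instead wrap to minv, which is exactly the kind of AM-specific behaviour the backend cannot predict.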
On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote: > - int64 minv, maxv, incby, bool is_cycled - these are basically options > giving info about how the new numbers are allocated (I guess some > implementations are not going to support all of those) > - bool is_called - the current built-in sequence generator behaves > differently based on it and I am not sure we can get over it (it could > perhaps be done in back-end independently of AM?) I think we might be able to get rid of is_called entirely. Or at least get rid of it from the view of the AMs. > Also makes me think that the seqam options interface should also be passed > the minv/maxv/incby/is_cycled etc options for validation, not just the > amoptions. Sup. BTW: Is 'is_cycled' a horrible name, or is that just me? Horribly easy to confuse with the fact that a sequence has already wrapped around... > >And it should return: > > > >int64 value - the first value allocated. > >int nvalues - the number of values allocated. > >am_private - updated private data. > > > > There is also more needed than this, you need: > - int64 value - first value allocated (value to be returned) > - int64 nvalues - number of values allocated > - int64 last - last cached value (used for cached/last_value) > - int64 next - last logged value (used for wal logging) > - am_private - updated private data, must be possible to return as null > I personally don't like that we need all the "nvalues", "next" and "last" as > it makes the seqam a little bit too aware of the sequence logging internals > in my opinion but I haven't found a way around it - it's impossible for > backend to know how the AM will act around incby/maxv/minv/cycling so it > can't really calculate these values by itself, unless ofcourse we fix the > behavior and require seqams to behave predictably, but that somewhat breaks > the whole idea of leaving the allocation to the seqam. 
Obviously it would > also work to return list of allocated values and then backend could > calculate the "value", "nvalues", "last", "next" from that list by itself, > but I am worried about performance of that approach. Yea, it's far from pretty. I'm not convinced that the AM ever needs to/should care about caching. To me that's more like a generic behaviour. So it probably should be abstracted away from the individual AMs. I think the allocation routine might also need to be able to indicate whether WAL logging is needed or not. One thing I want attention to be paid to is that the interface should be able to support 'gapless' sequences. I.e. where nextval() (and thus alloc) needs to wait until the last caller to it finished. That very well can be relevant from the locking *and* WAL logging perspective. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
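The "gapless" requirement discussed above essentially means nextval() must block until the previous caller's transaction outcome is known, and a rolled-back value must be reissued. A minimal, hypothetical sketch of that locking behaviour (plain C with pthreads, not backend code - in the real server this would be a heavyweight lock held to transaction end):

```c
#include <stdint.h>
#include <pthread.h>

/*
 * Hypothetical gapless allocator: nextval() holds a lock until the
 * caller reports commit or rollback, so at most one transaction at a
 * time can consume a value, and rolled-back values are reused.
 */
typedef struct {
    pthread_mutex_t lock;
    int64_t last_committed;
} GaplessSeq;

static int64_t
gapless_nextval(GaplessSeq *seq)
{
    /* blocks here until the previous caller commits or rolls back */
    pthread_mutex_lock(&seq->lock);
    return seq->last_committed + 1;
}

static void
gapless_commit(GaplessSeq *seq, int64_t value)
{
    seq->last_committed = value;   /* the value is now used for good */
    pthread_mutex_unlock(&seq->lock);
}

static void
gapless_rollback(GaplessSeq *seq)
{
    /* last_committed unchanged: the value will be handed out again */
    pthread_mutex_unlock(&seq->lock);
}
```

This is why gapless sequences serialize all concurrent users ("lockup" sequences, as Simon put it earlier in the thread), and why the alloc interface needs to allow waiting inside the AM.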
On 16/09/14 14:17, Andres Freund wrote:
> On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:
>>
>> There is also more needed than this, you need:
>> - int64 value - first value allocated (value to be returned)
>> - int64 nvalues - number of values allocated
>> - int64 last - last cached value (used for cached/last_value)
>> - int64 next - last logged value (used for wal logging)
>> - am_private - updated private data, must be possible to return as null
>
>> I personally don't like that we need all the "nvalues", "next" and "last" as
>> it makes the seqam a little bit too aware of the sequence logging internals
>> in my opinion but I haven't found a way around it - it's impossible for
>> backend to know how the AM will act around incby/maxv/minv/cycling so it
>> can't really calculate these values by itself, unless ofcourse we fix the
>> behavior and require seqams to behave predictably, but that somewhat breaks
>> the whole idea of leaving the allocation to the seqam. Obviously it would
>> also work to return list of allocated values and then backend could
>> calculate the "value", "nvalues", "last", "next" from that list by itself,
>> but I am worried about performance of that approach.
>
> Yea, it's far from pretty.
>
> I'm not convinced that the AM ever needs to/should care about
> caching. To me that's more like a generic behaviour. So it probably
> should be abstracted away from the individual AMs.
>
> I think the allocation routine might also need to be able to indicate
> whether WAL logging is needed or not.

Well, that means we probably want to return the first allocated value, the last allocated value, and then a boolean that tells the backend whether to WAL-log the sequence or not (the number of values allocated does not really seem to be important, unless I am missing something).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi,

I rewrote the patch with a different API along the lines of what was discussed. The API now consists of the following functions:

sequence_alloc - allocates a range of new values
The function receives the sequence relation, current value, number of requested values, amdata and relevant sequence options like min/max, and returns new amdata, the new current value, the number of values allocated, and also whether it needs a WAL write (that should be returned if amdata has changed, plus whatever other reasons the AM might have to force the WAL update).

sequence_setval - notification that setval is happening
This function gets the sequence relation, previous value and new value plus the amdata, and returns amdata (I can imagine some complex sequence AMs will want to throw an error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing
Basically, the AM can call sequence_request_update and the backend will then call the sequence_update method of the AM with the current amdata and will write the updated amdata to disk.

sequence_seqparams - function to process/validate the standard sequence options like start position, min/max, increment by etc. by the AM; it's called in addition to the standard processing.

sequence_reloptions - this is the only thing that remained unchanged from the previous patch; it's meant to pass custom options to the AM.

Only the alloc and reloptions methods are required (and implemented by the local AM).

The caching, xlog writing, updating the page, etc. is handled by the backend; the AM does not see the tuple at all. I decided not to pass even the struct around and just pass the relevant options, because I think if we want to abstract the storage properly then the AM should not care about what pg_sequence looks like at all, even if it means that the sequence_alloc parameter list is a bit long.
For the amdata handling (which is the AM's private data variable) the API assumes that (Datum) 0 is NULL; this seems to work well for reloptions so it should work here also, and it simplifies things a little compared to passing pointers to pointers around and making sure everything is allocated, etc.

Sadly, the fact that amdata is not fixed size and can be NULL made the page updates of the sequence relation quite a bit more complex than they used to be. There are probably some optimizations possible there, but I think the patch is good enough for review now, so I am adding it to the October commitfest.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
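For reference, the callback set described above can be pictured as a function-pointer table, similar in spirit to how index AMs are wired up. The following is a self-contained sketch with stand-in types (Relation, Datum, etc. replaced by toy placeholders) - it shows the shape of the interface and a trivial "local"-style implementation, not the patch's actual declarations:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for backend types; the real API uses Relation, Datum, etc. */
typedef void *ToyRelation;
typedef void *ToyAmData;   /* AM-private state; NULL ((Datum) 0) allowed */

typedef struct ToySeqAmRoutine {
    /* required: allocate a range, report whether a WAL write is needed */
    int64_t (*sequence_alloc)(ToyRelation rel, int64_t current,
                              int64_t nrequested, ToyAmData *amdata,
                              bool *needs_wal);
    /* optional: observe/veto setval() */
    void (*sequence_setval)(ToyRelation rel, int64_t oldval,
                            int64_t newval, ToyAmData *amdata);
    /* optional: asynchronous amdata refresh */
    void (*sequence_request_update)(ToyRelation rel);
    void (*sequence_update)(ToyRelation rel, ToyAmData *amdata);
    /* optional: validate start/min/max/increment for this AM */
    void (*sequence_seqparams)(int64_t start, int64_t minv, int64_t maxv,
                               int64_t incby);
    /* required: parse AM-specific WITH (...) options */
    void (*sequence_reloptions)(const char *options);
} ToySeqAmRoutine;

/* A trivial "local"-style alloc: hand out one value, always WAL-logged. */
static int64_t
toy_local_alloc(ToyRelation rel, int64_t current, int64_t nrequested,
                ToyAmData *amdata, bool *needs_wal)
{
    (void) rel; (void) nrequested; (void) amdata;
    *needs_wal = true;
    return current + 1;
}

static void
toy_local_reloptions(const char *options)
{
    (void) options;  /* this toy AM accepts no custom options */
}

/* Only the two required callbacks are set; the rest stay NULL. */
static const ToySeqAmRoutine toy_local_seqam = {
    .sequence_alloc = toy_local_alloc,
    .sequence_reloptions = toy_local_reloptions,
};
```

Leaving the optional slots NULL mirrors the text above: only alloc and reloptions are mandatory, so the backend would check the other pointers before calling them.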
Attachment
On 10/13/2014 01:01 PM, Petr Jelinek wrote: > Hi, > > I rewrote the patch with different API along the lines of what was > discussed. Thanks, that's better. It would be good to see an alternative seqam to implement this API, to see how it really works. The "local" one is too dummy to expose any possible issues. > The API now consists of following functions: > sequence_alloc - allocating range of new values > The function receives the sequence relation, current value, number of > requested values amdata and relevant sequence options like min/max and > returns new amdata, new current value, number of values allocated and > also if it needs wal write (that should be returned if amdata has > changed plus other reasons the AM might have to force the wal update). > > sequence_setval - notification that setval is happening > This function gets sequence relation, previous value and new value plus > the amdata and returns amdata (I can imagine some complex sequence AMs > will want to throw error that setval can't be done on them). > > sequence_request_update/sequence_update - used for background processing > Basically AM can call the sequence_request_update and backend will then > call the sequence_update method of an AM with current amdata and will > write the updated amdata to disk > > sequence_seqparams - function to process/validate the standard sequence > options like start position, min/max, increment by etc by the AM, it's > called in addition to the standard processing > > sequence_reloptions - this is the only thing that remained unchanged > from previous patch, it's meant to pass custom options to the AM > > Only the alloc and reloptions methods are required (and implemented by > the local AM). > > The caching, xlog writing, updating the page, etc is handled by backend, > the AM does not see the tuple at all. 
I decided to not pass even the > struct around and just pass the relevant options because I think if we > want to abstract the storage properly then the AM should not care about > how the pg_sequence looks like at all, even if it means that the > sequence_alloc parameter list is bit long. Hmm. The division of labour between the seqam and commands/sequence.c still feels a bit funny. sequence.c keeps track of how many values have been WAL-logged, and thus usable immediately, but we still call sequence_alloc even when using up those already WAL-logged values. If you think of using this for something like a centralized sequence server in a replication cluster, you certainly don't want to make a call to the remote server for every value - you'll want to cache them. With the "local" seqam, there are two levels of caching. Each backend caches some values (per the CACHE <value> option in CREATE SEQUENCE). In addition to that, the server WAL-logs 32 values at a time. If you have a remote seqam, it would most likely add a third cache, but it would interact in strange ways with the second cache. Considering a non-local seqam, the locking is also a bit strange. The server keeps the sequence page locked throughout nextval(). But if the actual state of the sequence is maintained elsewhere, there's no need to serialize the calls to the remote allocator, i.e. the sequence_alloc() calls. I'm not exactly sure what to do about that. One option is to completely move the maintenance of the "current" value, i.e. sequence.last_value, to the seqam. That makes sense from an abstraction point of view. For example with a remote server managing the sequence, storing the "current" value in the local catalog table makes no sense as it's always going to be out-of-date. The local seqam would store it as part of the am-private data. However, you would need to move the responsibility of locking and WAL-logging to the seqam. 
Maybe that's OK, but we'll need to provide an API that the seqam can call to do that. Perhaps just let the seqam call heap_inplace_update on the sequence relation. > For the amdata handling (which is the AM's private data variable) the > API assumes that (Datum) 0 is NULL, this seems to work well for > reloptions so should work here also and it simplifies things a little > compared to passing pointers to pointers around and making sure > everything is allocated, etc. > > Sadly the fact that amdata is not fixed size and can be NULL made the > page updates of the sequence relation quite more complex that it used to > be. It would be nice if the seqam could define exactly the columns it needs, with any datatypes. There would be a set of common attributes: sequence_name, start_value, cache_value, increment_by, max_value, min_value, is_cycled. The local seqam would add "last_value", "log_cnt" and "is_called" to that. A remote seqam that calls out to some other server might store the remote server's hostname etc. There could be a seqam function that returns a TupleDesc with the required columns, for example. - Heikki
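The "common columns plus AM-specific columns" proposal can be illustrated with a small self-contained sketch. The real implementation would build a TupleDesc from pg_attribute entries; here a toy descriptor array stands in for that, and the attribute lists for a hypothetical "remote" AM are purely invented for illustration:

```c
#include <string.h>
#include <stddef.h>

/* Stand-in for a column definition; the real API would build a TupleDesc. */
typedef struct {
    const char *attname;
    const char *atttype;
} ToyAttr;

/* Attributes every sequence shares, per the proposal above. */
static const ToyAttr common_attrs[] = {
    {"sequence_name", "name"}, {"start_value", "int8"},
    {"cache_value", "int8"},   {"increment_by", "int8"},
    {"max_value", "int8"},     {"min_value", "int8"},
    {"is_cycled", "bool"},
};

/* What the "local" seqam would add. */
static const ToyAttr local_attrs[] = {
    {"last_value", "int8"}, {"log_cnt", "int8"}, {"is_called", "bool"},
};

/* What a hypothetical "remote" seqam might add instead. */
static const ToyAttr remote_attrs[] = {
    {"server_host", "text"}, {"server_port", "int4"},
};

/* Build the full column list for a sequence: common + AM-specific. */
static size_t
build_seq_tupledesc(const ToyAttr *am_attrs, size_t n_am,
                    ToyAttr *out, size_t outcap)
{
    size_t n_common = sizeof(common_attrs) / sizeof(common_attrs[0]);
    size_t total = n_common + n_am;

    if (total > outcap)
        return 0;               /* caller's buffer too small */
    memcpy(out, common_attrs, n_common * sizeof(ToyAttr));
    memcpy(out + n_common, am_attrs, n_am * sizeof(ToyAttr));
    return total;
}
```

Each AM would then read and write only its own trailing columns, which is what makes the single opaque amdata bytea unnecessary under this scheme.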
On 04/11/14 13:11, Heikki Linnakangas wrote:
> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>> Hi,
>>
>> I rewrote the patch with different API along the lines of what was
>> discussed.
>
> Thanks, that's better.
>
> It would be good to see an alternative seqam to implement this API, to
> see how it really works. The "local" one is too dummy to expose any
> possible issues.
>

Yeah, I don't know what that would be. It's hard for me to find something that would serve purely demonstration purposes. I guess I could port BDR sequences to this if it would help (once we have a bit more solid agreement that the proposed API at least theoretically seems OK, so that I don't have to rewrite it 10 times, if at all possible).

>> ...
>> Only the alloc and reloptions methods are required (and implemented by
>> the local AM).
>>
>> The caching, xlog writing, updating the page, etc is handled by backend,
>> the AM does not see the tuple at all. I decided to not pass even the
>> struct around and just pass the relevant options because I think if we
>> want to abstract the storage properly then the AM should not care about
>> how the pg_sequence looks like at all, even if it means that the
>> sequence_alloc parameter list is bit long.
>
> Hmm. The division of labour between the seqam and commands/sequence.c
> still feels a bit funny. sequence.c keeps track of how many values have
> been WAL-logged, and thus usable immediately, but we still call
> sequence_alloc even when using up those already WAL-logged values. If
> you think of using this for something like a centralized sequence server
> in a replication cluster, you certainly don't want to make a call to the
> remote server for every value - you'll want to cache them.
>
> With the "local" seqam, there are two levels of caching. Each backend
> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
> addition to that, the server WAL-logs 32 values at a time.
> If you have a
> remote seqam, it would most likely add a third cache, but it would
> interact in strange ways with the second cache.
>
> Considering a non-local seqam, the locking is also a bit strange. The
> server keeps the sequence page locked throughout nextval(). But if the
> actual state of the sequence is maintained elsewhere, there's no need to
> serialize the calls to the remote allocator, i.e. the sequence_alloc()
> calls.
>
> I'm not exactly sure what to do about that. One option is to completely
> move the maintenance of the "current" value, i.e. sequence.last_value,
> to the seqam. That makes sense from an abstraction point of view. For
> example with a remote server managing the sequence, storing the
> "current" value in the local catalog table makes no sense as it's always
> going to be out-of-date. The local seqam would store it as part of the
> am-private data. However, you would need to move the responsibility of
> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
> provide an API that the seqam can call to do that. Perhaps just let the
> seqam call heap_inplace_update on the sequence relation.
>

My idea of how this works is: sequence_next handles the allocation and the level-2 caching (the WAL-logged cache) via amdata if it supports it, or returns a single value if it doesn't - then the WAL will always just write the one value and there will basically be no level-2 cache, since it is sequence_next that controls how much will be WAL-logged; what the backend asks for is just a "suggestion".

The third-level caching, as you say, should be solved by the sequence_request_update and sequence_update callbacks - that will enable the sequence AM to handle this kind of caching asynchronously, without blocking sequence_next unnecessarily.

That way it's possible to implement many different strategies, from no cache at all to three levels of caching, with and without blocking of calls to sequence_next.
>> For the amdata handling (which is the AM's private data variable) the
>> API assumes that (Datum) 0 is NULL, this seems to work well for
>> reloptions so should work here also and it simplifies things a little
>> compared to passing pointers to pointers around and making sure
>> everything is allocated, etc.
>>
>> Sadly the fact that amdata is not fixed size and can be NULL made the
>> page updates of the sequence relation quite more complex that it used to
>> be.
>
> It would be nice if the seqam could define exactly the columns it needs,
> with any datatypes. There would be a set of common attributes:
> sequence_name, start_value, cache_value, increment_by, max_value,
> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
> and "is_called" to that. A remote seqam that calls out to some other
> server might store the remote server's hostname etc.
>
> There could be a seqam function that returns a TupleDesc with the
> required columns, for example.
>

Wouldn't that somewhat bloat the catalog if we had a new catalog table for each sequence AM? It also does not really solve the amdata being dynamic size "issue". I think just a dynamic field where the AM stores whatever it wants is enough (i.e. the current solution); it's just that the hackery the sequence storage implementation does is more visible once there are non-fixed fields.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/04/2014 11:01 PM, Petr Jelinek wrote: > On 04/11/14 13:11, Heikki Linnakangas wrote: >> On 10/13/2014 01:01 PM, Petr Jelinek wrote: >>> Only the alloc and reloptions methods are required (and implemented by >>> the local AM). >>> >>> The caching, xlog writing, updating the page, etc is handled by backend, >>> the AM does not see the tuple at all. I decided to not pass even the >>> struct around and just pass the relevant options because I think if we >>> want to abstract the storage properly then the AM should not care about >>> how the pg_sequence looks like at all, even if it means that the >>> sequence_alloc parameter list is bit long. >> >> Hmm. The division of labour between the seqam and commands/sequence.c >> still feels a bit funny. sequence.c keeps track of how many values have >> been WAL-logged, and thus usable immediately, but we still call >> sequence_alloc even when using up those already WAL-logged values. If >> you think of using this for something like a centralized sequence server >> in a replication cluster, you certainly don't want to make a call to the >> remote server for every value - you'll want to cache them. >> >> With the "local" seqam, there are two levels of caching. Each backend >> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In >> addition to that, the server WAL-logs 32 values at a time. If you have a >> remote seqam, it would most likely add a third cache, but it would >> interact in strange ways with the second cache. >> >> Considering a non-local seqam, the locking is also a bit strange. The >> server keeps the sequence page locked throughout nextval(). But if the >> actual state of the sequence is maintained elsewhere, there's no need to >> serialize the calls to the remote allocator, i.e. the sequence_alloc() >> calls. >> >> I'm not exactly sure what to do about that. One option is to completely >> move the maintenance of the "current" value, i.e. sequence.last_value, >> to the seqam. 
That makes sense from an abstraction point of view. For >> example with a remote server managing the sequence, storing the >> "current" value in the local catalog table makes no sense as it's always >> going to be out-of-date. The local seqam would store it as part of the >> am-private data. However, you would need to move the responsibility of >> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to >> provide an API that the seqam can call to do that. Perhaps just let the >> seqam call heap_inplace_update on the sequence relation. > > My idea of how this works is - sequence_next handles the allocation and > the level2 caching (WAL logged cache) via amdata if it supports it, or > returns single value if it doesn't - then the WAL will always just write > the 1 value and there will basically be no level2 cache, since it is the > sequence_next who controls how much will be WAL-logged, what backend > asks for is just "suggestion". Hmm, so the AM might return an "nallocated" value less than the "fetch" value that sequence.c requested? As the patch stands, wouldn't that make sequence.c write a WAL record more often? In fact, if the seqam manages the current value outside the database (e.g. a "remote" seqam that gets the value from another server), nextval() never needs to write a WAL record. > The third level caching as you say should be solved by the > sequence_request_update and sequence_update callback - that will enable > the sequence AM to handle this kind of caching asynchronously without > blocking the sequence_next unnecessarily. > > That way it's possible to implement many different strategies, from no > cache at all, to three levels of caching, with and without blocking of > calls to sequence_next. It's nice that asynchronous operation is possible, but that seems like a more advanced feature. Surely an approach where you fetch a bunch of values when needed is going to be more common. 
Let's focus on the simple synchronous case first, and make sure the API works well for that. >>> For the amdata handling (which is the AM's private data variable) the >>> API assumes that (Datum) 0 is NULL, this seems to work well for >>> reloptions so should work here also and it simplifies things a little >>> compared to passing pointers to pointers around and making sure >>> everything is allocated, etc. >>> >>> Sadly the fact that amdata is not fixed size and can be NULL made the >>> page updates of the sequence relation quite more complex that it used to >>> be. >> >> It would be nice if the seqam could define exactly the columns it needs, >> with any datatypes. There would be a set of common attributes: >> sequence_name, start_value, cache_value, increment_by, max_value, >> min_value, is_cycled. The local seqam would add "last_value", "log_cnt" >> and "is_called" to that. A remote seqam that calls out to some other >> server might store the remote server's hostname etc. >> >> There could be a seqam function that returns a TupleDesc with the >> required columns, for example. > > Wouldn't that somewhat bloat catalog if we had new catalog table for > each sequence AM? No, that's not what I meant. The number of catalog tables would be the same as today. Sequences look much like any other relation, with entries in pg_attribute catalog table for all the attributes for each sequence. Currently, all sequences have the same set of attributes, sequence_name, last_value and so forth. What I'm proposing is that there would a set of attributes that are common to all sequences, but in addition to that there could be any number of AM-specific attributes. > It also does not really solve the amdata being dynamic > size "issue". Yes it would. There would not be a single amdata attribute, but the AM could specify any number of custom attributes, which could be fixed size or varlen. It would be solely the AM's responsibility to set the values of those attributes. 
> I think just dynamic field where AM stores whatever it > wants is enough (ie the current solution), it's just that the hackery > that sequences storage implementation does is more visible once there > are non-fixed fields. Using a single amdata field is like creating a table with a single text column, and putting all your data in the single column separated by tabs. I'm sure you can make it work, but normalized data is much nicer to work with. - Heikki
On 05/11/14 13:45, Heikki Linnakangas wrote: > On 11/04/2014 11:01 PM, Petr Jelinek wrote: >> On 04/11/14 13:11, Heikki Linnakangas wrote: >>> On 10/13/2014 01:01 PM, Petr Jelinek wrote: >>>> Only the alloc and reloptions methods are required (and implemented by >>>> the local AM). >>>> >>>> The caching, xlog writing, updating the page, etc is handled by >>>> backend, >>>> the AM does not see the tuple at all. I decided to not pass even the >>>> struct around and just pass the relevant options because I think if we >>>> want to abstract the storage properly then the AM should not care about >>>> how the pg_sequence looks like at all, even if it means that the >>>> sequence_alloc parameter list is bit long. >>> >>> Hmm. The division of labour between the seqam and commands/sequence.c >>> still feels a bit funny. sequence.c keeps track of how many values have >>> been WAL-logged, and thus usable immediately, but we still call >>> sequence_alloc even when using up those already WAL-logged values. If >>> you think of using this for something like a centralized sequence server >>> in a replication cluster, you certainly don't want to make a call to the >>> remote server for every value - you'll want to cache them. >>> >>> With the "local" seqam, there are two levels of caching. Each backend >>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In >>> addition to that, the server WAL-logs 32 values at a time. If you have a >>> remote seqam, it would most likely add a third cache, but it would >>> interact in strange ways with the second cache. >>> >>> Considering a non-local seqam, the locking is also a bit strange. The >>> server keeps the sequence page locked throughout nextval(). But if the >>> actual state of the sequence is maintained elsewhere, there's no need to >>> serialize the calls to the remote allocator, i.e. the sequence_alloc() >>> calls. >>> >>> I'm not exactly sure what to do about that. 
>>> One option is to completely move the maintenance of the "current" value, i.e. sequence.last_value, to the seqam. That makes sense from an abstraction point of view. For example with a remote server managing the sequence, storing the "current" value in the local catalog table makes no sense as it's always going to be out-of-date. The local seqam would store it as part of the am-private data. However, you would need to move the responsibility of locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to provide an API that the seqam can call to do that. Perhaps just let the seqam call heap_inplace_update on the sequence relation.
>>
>> My idea of how this works is: sequence_next handles the allocation and the level2 caching (WAL-logged cache) via amdata if it supports it, or returns a single value if it doesn't. Then the WAL will always just write the one value and there will basically be no level2 cache, since it is sequence_next that controls how much will be WAL-logged; what the backend asks for is just a "suggestion".
>
> Hmm, so the AM might return an "nallocated" value less than the "fetch" value that sequence.c requested? As the patch stands, wouldn't that make sequence.c write a WAL record more often?

That's correct; that's also why you usually want to have some form of local caching if possible.

> In fact, if the seqam manages the current value outside the database (e.g. a "remote" seqam that gets the value from another server), nextval() never needs to write a WAL record.

Sure it does: you need to keep the current state in Postgres as well, at least the current value, so that you can pass correct input to sequence_alloc(). And you need to do this in a crash-safe way, so WAL is necessary.

I think sequences will cache in amdata if possible for that type of sequence, and in the case where that's not possible and the caching happens on some external server, you'll most likely get bigger overhead from the network than from WAL anyway...

>>>> For the amdata handling (which is the AM's private data variable) the API assumes that (Datum) 0 is NULL; this seems to work well for reloptions so it should work here also, and it simplifies things a little compared to passing pointers to pointers around and making sure everything is allocated, etc.
>>>>
>>>> Sadly, the fact that amdata is not fixed size and can be NULL made the page updates of the sequence relation quite a bit more complex than they used to be.
>>>
>>> It would be nice if the seqam could define exactly the columns it needs, with any datatypes. There would be a set of common attributes: sequence_name, start_value, cache_value, increment_by, max_value, min_value, is_cycled. The local seqam would add "last_value", "log_cnt" and "is_called" to that. A remote seqam that calls out to some other server might store the remote server's hostname etc.
>>>
>>> There could be a seqam function that returns a TupleDesc with the required columns, for example.
>>
>> Wouldn't that somewhat bloat the catalog if we had a new catalog table for each sequence AM?
>
> No, that's not what I meant. The number of catalog tables would be the same as today. Sequences look much like any other relation, with entries in the pg_attribute catalog table for all the attributes of each sequence. Currently, all sequences have the same set of attributes: sequence_name, last_value and so forth. What I'm proposing is that there would be a set of attributes that are common to all sequences, but in addition to that there could be any number of AM-specific attributes.

Oh, that's an interesting idea: the AM interfaces would basically return an updated tuple, and there would be some description function that returns a tupledesc. I am a bit worried that this would kill any possibility of ALTER SEQUENCE USING access_method. Plus, I don't think it actually solves any real problem - serializing the internal C structs into bytea is not any harder than serializing them into a tuple, IMHO.

>> It also does not really solve the amdata being dynamic size "issue".
>
> Yes it would. There would not be a single amdata attribute, but the AM could specify any number of custom attributes, which could be fixed size or varlen. It would be solely the AM's responsibility to set the values of those attributes.

That's not the issue I was referring to; I was talking about the page replacement code, which is not as simple now that we have a potentially dynamic-size tuple, and if tuples were different for different AMs the code would still have to handle that case. Setting the values in the tuple itself is not too complicated.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
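[As a rough illustration of the kind of bytea serialization discussed above - Python stands in for the C code here, and the three state fields are made-up examples, not the patch's actual amdata layout:]

```python
import struct

# Hypothetical AM-private state, mirroring a C struct that would be
# packed into the sequence's opaque amdata ("bytea") column.
FMT = "<qqi"  # last_value (int64), cached_upto (int64), node_id (int32)

def amdata_serialize(last_value, cached_upto, node_id):
    """Pack the AM state into an opaque byte string."""
    return struct.pack(FMT, last_value, cached_upto, node_id)

def amdata_deserialize(data):
    """Unpack the opaque byte string back into the AM state."""
    return struct.unpack(FMT, data)

blob = amdata_serialize(41, 100, 7)
state = amdata_deserialize(blob)  # (41, 100, 7)
```

The tuple-per-AM proposal would instead expose each of these fields as its own typed attribute; the point made above is that the packing itself is trivial either way.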
On 11/05/2014 05:07 PM, Petr Jelinek wrote:
> On 05/11/14 13:45, Heikki Linnakangas wrote:
>> In fact, if the seqam manages the current value outside the database (e.g. a "remote" seqam that gets the value from another server), nextval() never needs to write a WAL record.
>
> Sure it does, you need to keep the current state in Postgres also, at least the current value so that you can pass correct input to sequence_alloc(). And you need to do this in a crash-safe way so WAL is necessary.

Why does sequence_alloc need the current value? If it's a "remote" seqam, the current value is kept in the remote server, and the last value that was given to this PostgreSQL server is irrelevant.

That irks me with this API. The method for acquiring a new value isn't fully abstracted behind the AM interface, as sequence.c still needs to track it itself. That's useful for the local AM, of course, and maybe some others, but for others it's totally useless.

>>>>> For the amdata handling (which is the AM's private data variable) the API assumes that (Datum) 0 is NULL; this seems to work well for reloptions so it should work here also, and it simplifies things a little compared to passing pointers to pointers around and making sure everything is allocated, etc.
>>>>>
>>>>> Sadly, the fact that amdata is not fixed size and can be NULL made the page updates of the sequence relation quite a bit more complex than they used to be.
>>>>
>>>> It would be nice if the seqam could define exactly the columns it needs, with any datatypes. There would be a set of common attributes: sequence_name, start_value, cache_value, increment_by, max_value, min_value, is_cycled. The local seqam would add "last_value", "log_cnt" and "is_called" to that. A remote seqam that calls out to some other server might store the remote server's hostname etc.
>>>>
>>>> There could be a seqam function that returns a TupleDesc with the required columns, for example.
>>>
>>> Wouldn't that somewhat bloat the catalog if we had a new catalog table for each sequence AM?
>>
>> No, that's not what I meant. The number of catalog tables would be the same as today. Sequences look much like any other relation, with entries in the pg_attribute catalog table for all the attributes of each sequence. Currently, all sequences have the same set of attributes: sequence_name, last_value and so forth. What I'm proposing is that there would be a set of attributes that are common to all sequences, but in addition to that there could be any number of AM-specific attributes.
>
> Oh, that's an interesting idea, so the AM interfaces would basically return an updated tuple and there would be some description function that returns a tupledesc.

Yeah, something like that.

> I am a bit worried that this would kill any possibility of ALTER SEQUENCE USING access_method. Plus I don't think it actually solves any real problem - serializing the internal C structs into bytea is not any harder than serializing them into a tuple IMHO.

I agree that serialization to bytea isn't that difficult, but it's still nicer to work directly with the correct data types. And it makes the internal state easily accessible for monitoring and debugging purposes.

>>> It also does not really solve the amdata being dynamic size "issue".
>>
>> Yes it would. There would not be a single amdata attribute, but the AM could specify any number of custom attributes, which could be fixed size or varlen. It would be solely the AM's responsibility to set the values of those attributes.
>
> That's not the issue I was referring to; I was talking about the page replacement code, which is not as simple now that we have a potentially dynamic-size tuple, and if tuples were different for different AMs the code would still have to handle that case. Setting the values in the tuple itself is not too complicated.

I don't see the problem with that. We deal with variable-sized tuples in heap pages all the time. The max size of amdata (or the extra AM-specific columns) is going to be determined by the block size, though.

- Heikki
On 05/11/14 18:32, Heikki Linnakangas wrote:
> On 11/05/2014 05:07 PM, Petr Jelinek wrote:
>> On 05/11/14 13:45, Heikki Linnakangas wrote:
>>> In fact, if the seqam manages the current value outside the database (e.g. a "remote" seqam that gets the value from another server), nextval() never needs to write a WAL record.
>>
>> Sure it does, you need to keep the current state in Postgres also, at least the current value so that you can pass correct input to sequence_alloc(). And you need to do this in a crash-safe way so WAL is necessary.
>
> Why does sequence_alloc need the current value? If it's a "remote" seqam, the current value is kept in the remote server, and the last value that was given to this PostgreSQL server is irrelevant.

Hmm, I am not sure I see this use-case as practical, TBH, but I also don't see a fundamental problem with it.

> That irks me with this API. The method for acquiring a new value isn't fully abstracted behind the AM interface, as sequence.c still needs to track it itself. That's useful for the local AM, of course, and maybe some others, but for others it's totally useless.

Hmm, I think that kind of abstraction can only be done by passing in the current tuple and returning an updated tuple (yes, I realize that's basically what you have been saying). In general it sounds like the level of abstraction you'd want is one where the AM cares about everything except the code that does the actual writes to the page and to WAL (though when to do those would still be controlled completely by the AM?) and the SQL interface.

I don't see how to make that work with ALTER SEQUENCE USING, to be honest, and I do care quite a lot about that use-case (I think the ability to convert the "local" sequences to 3rd-party ones and back is very important).

>> That's not the issue I was referring to; I was talking about the page replacement code, which is not as simple now that we have a potentially dynamic-size tuple, and if tuples were different for different AMs the code would still have to handle that case. Setting the values in the tuple itself is not too complicated.
>
> I don't see the problem with that. We deal with variable-sized tuples in heap pages all the time. The max size of amdata (or the extra AM-specific columns) is going to be determined by the block size, though.

Glad to hear that. Yes, the limit is block size; I think we can live with that, at least for the moment...

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
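[The allocation contract under discussion - the backend requests up to "fetch" values and the AM may grant fewer ("nallocated") - can be sketched minimally like this. Python is used for brevity; the function and field names are illustrative only, not the patch's actual API:]

```python
def seqam_local_alloc(state, fetch):
    """Grant up to `fetch` values, bounded by the AM's cached range.

    `state` stands in for the sequence tuple / amdata the backend
    passes in; the AM hands back the updated state, the first value
    granted, and how many values it actually granted.
    """
    available = state["cached_upto"] - state["last_value"]
    nallocated = min(fetch, available)
    first = state["last_value"] + 1
    new_state = dict(state, last_value=state["last_value"] + nallocated)
    return new_state, first, nallocated

# The backend asks for 5 values, but the AM's cached range only
# covers 2 more, so it grants fewer than requested.
state = {"last_value": 10, "cached_upto": 12}
state, first, n = seqam_local_alloc(state, fetch=5)  # first=11, n=2
```

In this shape the backend's "fetch" really is just a suggestion: only what the AM grants (and any new cached range it sets up) would need to be WAL-logged.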
On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> I guess I could port BDR sequences to this if it would help (once we have a bit more solid agreement that the proposed API at least theoretically seems OK, so that I don't have to rewrite it 10 times, if at all possible).

Because the BDR sequences rely on all the other BDR machinery, I suspect that'd be a pretty big thing to review and follow for someone who doesn't know the BDR code.

Do you think it'd be simple to provide a blocking, transactional sequence allocator via this API? i.e. gapless sequences, much the same as typically implemented with SELECT ... FOR UPDATE on a counter table.

It might be more digestible standalone, and would be a handy contrib/ example extension demonstrating use of the API.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
>> I guess I could port BDR sequences to this if it would help (once we have a bit more solid agreement that the proposed API at least theoretically seems OK, so that I don't have to rewrite it 10 times, if at all possible).
>
> Because the BDR sequences rely on all the other BDR machinery I suspect that'd be a pretty big thing to review and follow for someone who doesn't know the BDR code.
>
> Do you think it'd be simple to provide a blocking, transactional sequence allocator via this API? i.e. gapless sequences, much the same as typically implemented with SELECT ... FOR UPDATE on a counter table.

I think that would be a useful contrib module too; gapless allocation is a frequent user need. If we can build it in some reasonable way with the seqam API, it'd show the API is good.

But on the other hand, since BDR sequences are the ultimate goal of this patch, we need to make sure that they can be made to work with whatever the seqam API ends up being. It's not necessary for reviewers here to review that code, but the BDR developers need to say "yes, this API works for us" -- just like the slony developers (well, ssinger) did for logical decoding.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/11/14 11:22, Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
>> I guess I could port BDR sequences to this if it would help (once we have a bit more solid agreement that the proposed API at least theoretically seems OK, so that I don't have to rewrite it 10 times, if at all possible).
>
> Because the BDR sequences rely on all the other BDR machinery I suspect that'd be a pretty big thing to review and follow for someone who doesn't know the BDR code.
>
> Do you think it'd be simple to provide a blocking, transactional sequence allocator via this API? i.e. gapless sequences, much the same as typically implemented with SELECT ... FOR UPDATE on a counter table.
>
> It might be more digestible standalone, and would be a handy contrib/ example extension demonstrating use of the API.

Yes, I think that's doable (once we iron out the API we can agree on).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> I don't see how to make that work with ALTER SEQUENCE USING, to be honest, and I do care quite a lot about that use-case (I think the ability to convert the "local" sequences to 3rd-party ones and back is very important).

What specific problems do you foresee? There's an issue if something depends on one of the added sequence columns, but if that is the case then you had *better* fail.

I think that the debuggability value of making extra sequence columns human-readable is quite high.

...Robert
On 08/11/14 00:45, Robert Haas wrote:
> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> I don't see how to make that work with ALTER SEQUENCE USING, to be honest, and I do care quite a lot about that use-case (I think the ability to convert the "local" sequences to 3rd-party ones and back is very important).
>
> What specific problems do you foresee? There's an issue if something depends on one of the added sequence columns, but if that is the case then you had *better* fail.
>
> I think that the debuggability value of making extra sequence columns human-readable is quite high.

My main problem is actually not with having a tuple per seqAM, but more with the fact that Heikki does not want to have last_value as a compulsory column/parameter. How is the new AM then supposed to know where to pick up, and whether it even can pick up?

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 08/11/14 00:57, Petr Jelinek wrote:
> On 08/11/14 00:45, Robert Haas wrote:
>> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that use-case (I think the ability to convert the "local" sequences to 3rd party ones and back is very important).
>>
>> What specific problems do you foresee? There's an issue if something depends on one of the added sequence columns, but if that is the case then you had *better* fail.
>>
>> I think that the debuggability value of making extra sequence columns human-readable is quite high.
>
> My main problem is actually not with having a tuple per seqAM, but more with the fact that Heikki does not want to have last_value as a compulsory column/parameter. How is the new AM then supposed to know where to pick up, and whether it even can pick up?

And obviously, once last_value is part of the compulsory columns, we again have to WAL-log all the time for the use-case which Heikki is using as a model, so it does not help there (just to make clear what my point was about).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> My main problem is actually not with having a tuple per seqAM, but more with the fact that Heikki does not want to have last_value as a compulsory column/parameter. How is the new AM then supposed to know where to pick up, and whether it even can pick up?

That seems pretty well impossible to know anyway. If the pluggable AM was handing out values at random or in some unpredictable fashion, there may be no well-defined point where it's safe for the default AM to resume. Granted, in the case of replication, it probably is possible, and maybe that's important enough to be worth catering to.

> And obviously, once last_value is part of the compulsory columns, we again have to WAL-log all the time for the use-case which Heikki is using as a model, so it does not help there (just to make clear what my point was about).

But I don't know what to do about that. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/08/2014 12:35 AM, Petr Jelinek wrote:
>> Do you think it'd be simple to provide a blocking, transactional sequence allocator via this API? i.e. gapless sequences, much the same as typically implemented with SELECT ... FOR UPDATE on a counter table.
>>
>> It might be more digestible standalone, and would be a handy contrib/ example extension demonstrating use of the API.
>
> Yes I think that's doable (once we iron out the API we can agree on).

Cool; I think that'd be an interesting self-contained example that'd also have real-world uses ... and provide a nice succinct way to answer those semi-regular "how do I create a sequence that doesn't have holes" questions.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
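[The counter-table idiom such a gapless allocator emulates can be sketched like this - Python for brevity, with a plain lock standing in for the row lock that SELECT ... FOR UPDATE would take; the class name is made up for illustration:]

```python
import threading

class GaplessSequence:
    """Blocking, gap-free allocator: values are handed out strictly
    one at a time, so there are never holes in the series."""

    def __init__(self, start=1):
        # Stands in for the row lock on the counter table's single row.
        self._lock = threading.Lock()
        self._next = start

    def nextval(self):
        # Blocks until the previous caller releases the lock; that
        # serialization is the price of guaranteeing no gaps.
        with self._lock:
            value = self._next
            self._next += 1
            return value

seq = GaplessSequence()
values = [seq.nextval() for _ in range(3)]  # [1, 2, 3]
```

A real gapless AM would additionally have to hold the lock until transaction commit (and undo the increment on rollback), which is exactly why gapless sequences serialize concurrent writers.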
On 08/11/14 03:10, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> My main problem is actually not with having a tuple per seqAM, but more with the fact that Heikki does not want to have last_value as a compulsory column/parameter. How is the new AM then supposed to know where to pick up, and whether it even can pick up?
>
> That seems pretty well impossible to know anyway. If the pluggable AM was handing out values at random or in some unpredictable fashion, there may be no well-defined point where it's safe for the default AM to resume. Granted, in the case of replication, it probably is possible, and maybe that's important enough to be worth catering to.

While this is true, I think 99% of this use-case in practice is about converting an existing schema with "local" sequences to some other sequence AM, and perhaps about fixing the schema after people create a sequence using the wrong AM because their default is not what they thought it was.

>> And obviously, once last_value is part of the compulsory columns, we again have to WAL-log all the time for the use-case which Heikki is using as a model, so it does not help there (just to make clear what my point was about).
>
> But I don't know what to do about that. :-(

Me neither, and I don't think this is actually solvable: either we have last_value and log_cnt as mandatory columns, and the central sequence server example Heikki is talking about is impacted by this, or we leave those columns to the AM implementations, and we lose the ability to change AMs for the majority of cases - and IMHO most AMs will then have to implement their own last_value and log_cnt logic anyway.

Honestly, I am still not convinced that a centralized sequence server with no local caching is something we should be optimizing for, as that one will suffer from performance problems anyway. And it can ignore the last_value input from Postgres if it chooses to, so it's not as if the currently proposed patch forbids implementation of such AMs.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Why does sequence_alloc need the current value? If it's a "remote" seqam, the current value is kept in the remote server, and the last value that was given to this PostgreSQL server is irrelevant.
>
> That irks me with this API. The method for acquiring a new value isn't fully abstracted behind the AM interface, as sequence.c still needs to track it itself. That's useful for the local AM, of course, and maybe some others, but for others it's totally useless.

Please bear in mind what seqam is for...

At present its only use is to provide Global Sequences. There are a few ways of doing that, but they all look sort of similar. The only other use we thought of was shot down, so producing the perfect API isn't likely to help anyone. It's really not worth the effort to produce a better API.

BDR doesn't require Global Sequences, nor are Global Sequences restricted in their use to just BDR - lots of cluster configurations would want something like this.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 8, 2014 at 6:17 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Honestly, I am still not convinced that a centralized sequence server with no local caching is something we should be optimizing for, as that one will suffer from performance problems anyway. And it can ignore the last_value input from Postgres if it chooses to, so it's not as if the currently proposed patch forbids implementation of such AMs.

I can buy that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/08/2014 01:57 AM, Petr Jelinek wrote:
> My main problem is actually not with having a tuple per seqAM, but more with the fact that Heikki does not want to have last_value as a compulsory column/parameter. How is the new AM then supposed to know where to pick up, and whether it even can pick up?

Call nextval(), and use the value it returns as the starting point for the new AM.

- Heikki
On 09/11/14 20:47, Heikki Linnakangas wrote:
> On 11/08/2014 01:57 AM, Petr Jelinek wrote:
>> My main problem is actually not with having a tuple per seqAM, but more with the fact that Heikki does not want to have last_value as a compulsory column/parameter. How is the new AM then supposed to know where to pick up, and whether it even can pick up?
>
> Call nextval(), and use the value it returns as the starting point for the new AM.

Hah, so simple - works for me.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2014 04:21 PM, Simon Riggs wrote:
> On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> Why does sequence_alloc need the current value? If it's a "remote" seqam, the current value is kept in the remote server, and the last value that was given to this PostgreSQL server is irrelevant.
>>
>> That irks me with this API. The method for acquiring a new value isn't fully abstracted behind the AM interface, as sequence.c still needs to track it itself. That's useful for the local AM, of course, and maybe some others, but for others it's totally useless.
>
> Please bear in mind what seqam is for...
>
> At present its only use is to provide Global Sequences. There are a few ways of doing that, but they all look sort of similar.

Ok.

> The only other use we thought of was shot down, so producing the perfect API isn't likely to help anyone. It's really not worth the effort to produce a better API.

Your argument seems to be that because this API is only used for creating global sequences, it's not worth it to make it better for that purpose. That doesn't make much sense, so that's probably not what you meant. I'm confused.

To be clear: I don't think this API is very good for its stated purpose, for implementing global sequences for use in a cluster, for the reasons I've mentioned before. I'd like to see two changes to this proposal:

1. Make the AM implementation solely responsible for remembering the "last value". (If it's a global or remote sequence, the current value might not be stored in the local server at all.)

2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier.

> BDR doesn't require Global Sequences, nor are Global Sequences restricted in their use to just BDR - lots of cluster configurations would want something like this.

Sure. (I don't see how that's relevant to this discussion, however.)

(Marking this as "Returned with Feedback" in the commitfest.)

- Heikki
On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
> To be clear: I don't think this API is very good for its stated purpose, for implementing global sequences for use in a cluster. For the reasons I've mentioned before. I'd like to see two changes to this proposal:
>
> 1. Make the AM implementation solely responsible for remembering the "last value". (if it's a global or remote sequence, the current value might not be stored in the local server at all)

I think that reason isn't particularly good. The practical applicability of such an implementation doesn't seem to be particularly large.

> 2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence. Currently, with or without seqam, it'd not be all that hard to force it into a catalog, taking care to force each tuple onto a separate page...

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/02/2014 08:21 PM, Andres Freund wrote:
> [snip]
>> 2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier.
>
> My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence.

Hmm... having done my fair bit of testing, I can say that this isn't actually that bad (though it depends heavily on the underlying filesystem and workload, of course). With this approach, I fear extreme I/O contention with an update-heavy workload... unless all sequence activity is finally WAL-logged, and hence writes to the actual files become mostly sequential and asynchronous.

May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here. Moreover, the "one file per schema for sequences" solution would fit a quite common model of grouping tables (in schemas) for physical [tablespace] location purposes....

> Currently, with or without seqam, it'd not be all that hard to force it into a catalog, taking care to force each tuple onto a separate page...

IMHO, this is just as wasteful as the current approach (one-page file per sequence) in terms of disk usage, and it complicates the code a bit .... but I really don't see how we can have more than one sequence state per page without severe (page) locking problems. However, someone with deeper knowledge of page pinning and buffer manager internals could certainly devise a better solution...

Just my 2c

Thanks,

    / J.L.
On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
> On 12/02/2014 08:21 PM, Andres Freund wrote:
>> [snip]
>
> Hmm... having done my fair bit of testing, I can say that this isn't actually that bad (though it depends heavily on the underlying filesystem and workload, of course). With this approach, I fear extreme I/O contention with an update-heavy workload... unless all sequence activity is finally WAL-logged, and hence writes to the actual files become mostly sequential and asynchronous.

I don't think the WAL logging would need to change much in comparison to the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks, because the pg_sequence one could be a single fastpath lock. Currently we have to take the sequence's relation lock (heavyweight) and then the page-level lock (lwlock) for every single sequence used.

> May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here.

I don't think that really has any advantages.

> IMHO, this is just as wasteful as the current approach (one-page file per sequence) in terms of disk usage, and it complicates the code a bit .... but I really don't see how we can have more than one sequence state per page without severe (page) locking problems.

The overhead of a file is much more than wasting the remainder of a page. Alone the requirement of separate fsyncs and everything is pretty bothersome. The generated IO patterns are also much worse...

> However, someone with deeper knowledge of page pinning and buffer manager internals could certainly devise a better solution...

I think there's pretty much no chance of accepting more than one page per sequence.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02/12/14 20:21, Andres Freund wrote:
> On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
>> To be clear: I don't think this API is very good for its stated purpose, for implementing global sequences for use in a cluster. For the reasons I've mentioned before. I'd like to see two changes to this proposal:
>> ...
>> 2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier.
>
> My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence. Currently, with or without seqam, it'd not be all that hard to force it into a catalog, taking care to force each tuple onto a separate page...

I don't know; I think that if we decide to change the storage format, we can do the serialization/conversion in the seqam layer - it does not seem to matter too much whether the serialization into some opaque type is done in the AM itself or by the API layer. Or we can have one relation for all sequences in a single AM, etc. In general I don't think that the custom-columns-per-AM approach prohibits future storage changes.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/03/2014 11:24 AM, Andres Freund wrote:
> On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
>> [snip]
>
> I don't think the WAL logging would need to change much in comparison to the current solution. We'd just add the page number to the WAL record.
>
> The biggest advantage would be to require fewer heavyweight locks, because the pg_sequence one could be a single fastpath lock. Currently we have to take the sequence's relation lock (heavyweight) and then the page-level lock (lwlock) for every single sequence used.

Got it, thank you for the explanation.

>> May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here.
>
> I don't think that really has any advantages.

Just spreading the I/O load, nothing more, it seems.

Just to elaborate a bit on the reasoning, for completeness' sake: given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k sequences per relation segment. Arguably, not many real use cases will have that many sequences.... save for *massively* multi-tenant databases.

The downside being that all that random I/O --- in general, it can't really be sequential unless there are very, very few sequences --- can't be spread to other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe? (I could take a shot at the patch, if deemed worthwhile.)

>> [snip]
>
> The overhead of a file is much more than wasting the remainder of a page. Alone the requirement of separate fsyncs and everything is pretty bothersome. The generated IO patterns are also much worse...

Yes, you are right. I stand corrected.

> [snip]
> I think there's pretty much no chance of accepting more than one page per sequence

Definitively.

Thanks,

    J.L.
On 12/3/14, 8:50 AM, José Luis Tallón wrote:
>>> May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here.
>>
>> I don't think that really has any advantages.
>
> Just spreading the I/O load, nothing more, it seems:
>
> Just to elaborate a bit on the reasoning, for completeness' sake: given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k sequences per relation segment. Arguably, not many real use cases will have that many sequences.... save for *massively* multi-tenant databases.
>
> The downside being that all that random I/O --- in general, it can't really be sequential unless there are very, very few sequences --- can't be spread to other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe?

Why not? RAID arrays typically use stripe sizes in the 128-256k range, which means only 16 or 32 sequences per stripe.

It still might make sense to allow controlling what tablespace a sequence is in, but IMHO the default should just be pg_default.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
>>> May I possibly suggest a file-per-schema model instead? This approach would
>>> certainly solve the excessive i-node consumption problem that --I guess--
>>> Andres is trying to address here.
>> I don't think that really has any advantages.
>
> Just spreading the I/O load, nothing more, it seems:
>
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have
> (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for
> *massively* multi-tenant databases.
>
> The downside being that all that random I/O --- in general, it can't really
> be sequential unless there are very very few sequences --- can't be spread to
> other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE
> SET TABLESPACE, to use an SSD for this purpose maybe?
> (I could take a shot at the patch, if deemed worthwhile)

I think that's just so far outside the sane use cases that I really don't see us adding complexity to rein it in. If your frequently used sequences get flushed out to disk by anything but the checkpointer, the schema is just badly designed...

Greetings,

Andres Freund

-- 
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24/11/14 12:16, Heikki Linnakangas wrote:
> I'd like to see two changes to this proposal:
>
> 1. Make the AM implementation solely responsible for remembering the
> "last value". (if it's a global or remote sequence, the current value
> might not be stored in the local server at all)
>
> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.
>

Hi, here is the new version for the next CF; there are several rough edges (more about that later) but I think it's ready for more feedback.

I did change the API quite considerably (rewrite is the word, I believe). Instead of trying to put smarts on one or the other side of the API, I basically created 2 APIs: one is the seqam, which is responsible for maintaining the sequence data, and the other is the sequence storage API provided by the postgres backend. This means that the backend cares purely about storage and local caching, while the AM cares about the data inside the sequence and decides when to WAL.
I'll start with the storage API description:

- sequence_open(Oid) - open sequence with given Oid and returns SequenceHandle (opens relation in share lock only)
- sequence_read_tuple(SequenceHandle) - returns heap tuple of the sequence, this will also do the buffer locking
- sequence_save_tuple(SequenceHandle, newtuple, do_wal) - updates the buffer with new sequence, optionally does WAL logging; newtuple can be NULL in which case the tuple is not replaced (this is a performance optimization for sequences that don't need varlena/nullable columns and can update everything inplace)
- sequence_release_tuple(SequenceHandle) - unlocks previously locked buffer - usable for synchronization inside the seqam implementation; the gapless sequence uses this
- sequence_close(SequenceHandle) - closes the sequence handle, also unlocks buffer if needed
- sequence_needs_wal(SequenceHandle) - returns true if sequence needs wal, that is if the last wal write happened before the last checkpoint - this is a little bit low level but I don't see a way around it if the local sequence is supposed to maintain last_value and logcnt itself; also we have a similar function for relations (that checks for different things but naming and usage is similar) so it's probably not too bad

The SequenceHandle is an opaque struct so no gory details are seen by the AM.

Now for the actual sequence AM API:

- seqam_extra_columns(Oid) - returns list of custom columns needed by specified sequenceAM
- seqam_init - Initializes the columns for the custom sequenceAM, used when creating, altering or resetting a sequence. It gets the sequenceam Oid, sequence options and reloptions on input and returns updated values and nulls arrays.
- seqam_alloc - Allocates new sequence values. It gets the sequence relation, handle and how many new values the backend wants, and returns the first and last allocated value.
- seqam_setval - implements setval() support
- seqam_dump_state - pg_dump support
- seqam_restore_state - pg_dump support

The API is slightly bigger, but it also includes proper support for dumping and restoring via pg_dump and seems somewhat cleaner; plus the synchronous and asynchronous APIs are now the same, with no callbacks anywhere (if an AM needs to do something asynchronously it just calls the storage API).

There is also a gapless sequence contrib module included as a separate patch. It's missing proper locking during dump/restore atm but otherwise should be fully working (see tests). I am also using it for testing ALTER SEQUENCE ... USING.

About the rough edges:
- The AlterSequence is not the prettiest code around as we now have to create a new relation when the sequence AM is changed and I don't know how to do that nicely
- I am not sure if I did the locking, command order and dependency handling in AlterSequence correctly
- I removed setval3 since there is no guarantee that a sequence will have the is_called parameter, but there is now proper pg_dump support so it's not needed for the pg_dump use case; it's still an incompatible change though
- ResetSequence basically does ALTER SEQUENCE RESET internally instead of calling setval now, because of the above
- There are no docs for the C APIs yet - I don't even know where those should be placed btw, would appreciate suggestions (I have the same problem with the committs API docs)

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/12/14 03:33, Petr Jelinek wrote: > On 24/11/14 12:16, Heikki Linnakangas wrote: > > About the rough edges: > - The AlterSequence is not prettiest code around as we now have to > create new relation when sequence AM is changed and I don't know how to > do that nicely > - I am not sure if I did the locking, command order and dependency > handling in AlterSequence correcly This version does AlterSequence differently and better. Didn't attach the gapless sequence again as that one is unchanged. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 15/12/14 11:36, Petr Jelinek wrote:
> On 10/12/14 03:33, Petr Jelinek wrote:
>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>
>> About the rough edges:
>> - The AlterSequence is not prettiest code around as we now have to
>> create new relation when sequence AM is changed and I don't know how to
>> do that nicely
>> - I am not sure if I did the locking, command order and dependency
>> handling in AlterSequence correcly
>
> This version does AlterSequence differently and better. Didn't attach
> the gapless sequence again as that one is unchanged.
>

And another version, separated into a patch-set of 3 patches.

First patch contains the seqam patch itself; not many changes there, mainly docs/comments related. What I wrote in the previous email for version 3 still applies.

Second patch adds DDL support. I originally wanted to make it CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS a reserved keyword, so I went for CREATE ACCESS METHOD FOR SEQUENCES, which does not need to change anything (besides adding METHOD to unreserved keywords). The DDL support uses the DefineStmt infra with some very small change as the sequence AMs are not schema qualified, but I think it's acceptable and saves a considerable amount of boilerplate.

And third patch is the gapless sequence implementation updated to work with the new DDL support, with some tests added for checking that dependencies work correctly. It also acts as an example of how to make custom AMs.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12.1.2015 22:33, Petr Jelinek wrote:
> On 15/12/14 11:36, Petr Jelinek wrote:
>> On 10/12/14 03:33, Petr Jelinek wrote:
>>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>>
>>> About the rough edges:
>>> - The AlterSequence is not prettiest code around as we now have to
>>> create new relation when sequence AM is changed and I don't know how to
>>> do that nicely
>>> - I am not sure if I did the locking, command order and dependency
>>> handling in AlterSequence correcly
>>
>> This version does AlterSequence differently and better. Didn't attach
>> the gapless sequence again as that one is unchanged.
>>
>
> And another version, separated into patch-set of 3 patches.
> First patch contains the seqam patch itself, not many changes there,
> mainly docs/comments related. What I wrote in the previous email for
> version 3 still applies.

I did a review of the first part today - mostly by reading through the diff. I plan to do a bit more thorough testing in a day or two. I'll also look at the two (much smaller) patches.

comments/questions/general nitpicking:

(1) Why treat the empty string as equal to 'local'? Isn't enforcing the actual value a better approach?

(2) NITPICK: Maybe we could use access_method in the docs (instead of sequence_access_method), as the 'sequence' part is clear from the context?

(3) Why index_reloptions / sequence_reloptions when both do exactly the same thing (call to common_am_reloptions)? I'd understand this if the patch(es) then changed sequence_reloptions(), but that's not happening. Maybe that's expected to happen?

(4) DOCS: Each sequence can only use one access method at a time.

    Does that mean a sequence can change the access method during its lifetime? My understanding is that the AM is fixed after creating the sequence?

(5) DOCS/NITPICK: SeqAM / SeqAm ... (probably should be the same?)

(6) cache lookup failed for relation %u

    I believe this should reference 'sequence access method' instead of a relation.
(7) seqam_init

I believe this is a bug (not setting nulls[4], but setting nulls[3] twice):

+    fcinfo.argnull[0] = seqparams == NULL;
+    fcinfo.argnull[1] = reloptions_parsed == NULL;
+    fcinfo.argnull[2] = false;
+    fcinfo.argnull[3] = false;
+    fcinfo.argnull[3] = false;

(8) check_default_seqam without a transaction

    * If we aren't inside a transaction, we cannot do database access so
    * cannot verify the name. Must accept the value on faith.

In which situation does this happen? Wouldn't it be better to simply enforce the transaction and fail otherwise?

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 13/01/15 13:24, Tomas Vondra wrote: > On 12.1.2015 22:33, Petr Jelinek wrote: >> On 15/12/14 11:36, Petr Jelinek wrote: >>> On 10/12/14 03:33, Petr Jelinek wrote: >>>> On 24/11/14 12:16, Heikki Linnakangas wrote: >>>> >>>> About the rough edges: >>>> - The AlterSequence is not prettiest code around as we now have to >>>> create new relation when sequence AM is changed and I don't know how to >>>> do that nicely >>>> - I am not sure if I did the locking, command order and dependency >>>> handling in AlterSequence correcly >>> >>> This version does AlterSequence differently and better. Didn't attach >>> the gapless sequence again as that one is unchanged. >>> >>> >> >> And another version, separated into patch-set of 3 patches. >> First patch contains the seqam patch itself, not many changes there, >> mainly docs/comments related. What I wrote in the previous email for >> version 3 still applies. > > I did a review of the first part today - mostly by reading through the > diff. I plan to do a bit more thorough testing in a day or two. I'll > also look at the two (much smaller) patches. > Thanks! > comments/questions/general nitpicking: > > (1) Why treating empty string as equal to 'local'? Isn't enforcing the > actual value a better approach? > Álvaro mentioned on IM also, I already changed it to saner normal GUC with 'local' as default value in my working copy > (2) NITPICK: Maybe we could use access_method in the docs (instead of > sequence_access_method), as the 'sequence' part is clear from the > context? Yes. > (3) Why index_reloptions / sequence_reloptions when both do exactly the > same thing (call to common_am_reloptions)? I'd understand this if > the patch(es) then change the sequence_reloptions() but that's not > happening. Maybe that's expected to happen? That's leftover from the original design where AM was supposed to call this, since this is not exposed to AM anymore I think it can be single function now. 
> > (4) DOCS: Each sequence can only use one access method at a time. > > Does that mean a sequence can change the access method during it's > lifetime? My understanding is that the AM is fixed after creating > the sequence? > Oh, I forgot to add ALTER SEQUENCE USING into docs, you can change AM even though you probably don't want to do it often, but for migrations it's useful. > (8) check_default_seqam without a transaction > > * If we aren't inside a transaction, we cannot do database access so > * cannot verify the name. Must accept the value on faith. > > In which situation this happens? Wouldn't it be better to simply > enforce the transaction and fail otherwise? Reading postgresql.conf during startup, I don't think we want to fail in that scenario ;) -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 01/12/2015 11:33 PM, Petr Jelinek wrote: > Second patch adds DDL support. I originally wanted to make it > CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS > a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES > which does not need to change anything (besides adding METHOD to > unreserved keywords). > The DDL support uses the DefineStmt infra with some very small change as > the sequence ams are not schema qualified, but I think it's acceptable > and saves considerable amount of boilerplate. Do we need DDL commands for this at all? I could go either way on that question. We recently had a discussion on that wrt. index access methods [1], and Tom opined that providing DDL for creating index access methods is not worth it. The extension can just insert the rows into pg_seqam with INSERT. Do we expect sequence access methods as extensions to be more popular than index access methods? Maybe, because the WAL-logging problem doesn't exist. But OTOH, if you're writing something like a replication system that needs global sequences as part of it, there aren't that many of those, and the installation scripts will need to deal with more complicated stuff than inserting a row in pg_seqam. [1] http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us > And third patch is gapless sequence implementation updated to work with > the new DDL support with some tests added for checking if dependencies > work correctly. It also acts as example on how to make custom AMs. I'll take a look at that to see how the API works, but we're not going to include it in the source tree, because it doesn't actually guarantee gaplessness. That makes it a pretty dangerous example. I'm sure we can come up with a better example that might even be useful. How about a Lamport's clock sequence, which advances once per second, in addition to when anyone calls nextval() ? Or a remote sequence that uses an FDW to call nextval() in a foreign server? - Heikki
On Thu, Jan 22, 2015 at 10:50 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 01/12/2015 11:33 PM, Petr Jelinek wrote: >> Second patch adds DDL support. I originally wanted to make it >> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS >> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES >> which does not need to change anything (besides adding METHOD to >> unreserved keywords). >> The DDL support uses the DefineStmt infra with some very small change as >> the sequence ams are not schema qualified, but I think it's acceptable >> and saves considerable amount of boilerplate. > > Do we need DDL commands for this at all? I could go either way on that > question. We recently had a discussion on that wrt. index access methods > [1], and Tom opined that providing DDL for creating index access methods is > not worth it. The extension can just insert the rows into pg_seqam with > INSERT. Do we expect sequence access methods as extensions to be more > popular than index access methods? Maybe, because the WAL-logging problem > doesn't exist. But OTOH, if you're writing something like a replication > system that needs global sequences as part of it, there aren't that many of > those, and the installation scripts will need to deal with more complicated > stuff than inserting a row in pg_seqam. I think the main reason we don't need DDL for pg_am is because there's no real hope of anybody actually inserting anything in there whether we have a command for it or not. If we actually expect this to be used, I think it should probably have some kind of DDL, because otherwise what will pg_dump emit? "Nothing" is bad because then dumps won't restore, and "direct inserts to pg_seqam" doesn't seem great either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 22/01/15 16:50, Heikki Linnakangas wrote: > On 01/12/2015 11:33 PM, Petr Jelinek wrote: >> Second patch adds DDL support. I originally wanted to make it >> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS >> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES >> which does not need to change anything (besides adding METHOD to >> unreserved keywords). >> The DDL support uses the DefineStmt infra with some very small change as >> the sequence ams are not schema qualified, but I think it's acceptable >> and saves considerable amount of boilerplate. > > Do we need DDL commands for this at all? I could go either way on that > question. We recently had a discussion on that wrt. index access methods > [1], and Tom opined that providing DDL for creating index access methods > is not worth it. The extension can just insert the rows into pg_seqam > with INSERT. Do we expect sequence access methods as extensions to be > more popular than index access methods? Maybe, because the WAL-logging > problem doesn't exist. But OTOH, if you're writing something like a > replication system that needs global sequences as part of it, there > aren't that many of those, and the installation scripts will need to > deal with more complicated stuff than inserting a row in pg_seqam. I don't know about popularity, and I've seen the discussion about indexes. The motivation for DDL for me was handling dependencies correctly, that's all. If we say we don't care about that (and allow DROP EXTENSION even though user has sequences that are using the AM) then we don't need DDL. > > [1] http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us > >> And third patch is gapless sequence implementation updated to work with >> the new DDL support with some tests added for checking if dependencies >> work correctly. It also acts as example on how to make custom AMs. 
>
> I'll take a look at that to see how the API works, but we're not going
> to include it in the source tree, because it doesn't actually guarantee
> gaplessness. That makes it a pretty dangerous example. I'm sure we can
> come up with a better example that might even be useful. How about a
> Lamport's clock sequence, which advances once per second, in addition to
> when anyone calls nextval() ? Or a remote sequence that uses an FDW to
> call nextval() in a foreign server?
>

I have the updated patch ready, just didn't submit it because I am otherwise busy this week. I hope to get to it today evening or tomorrow morning, so I'd wait until then before looking at the patch.

The new version (the one that is not submitted yet) of the gapless sequence is way more ugly and probably not the best example either, but it does guarantee gaplessness (it stores the last value in its own value table). So I am not sure if it should be included either...

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 22/01/15 17:02, Petr Jelinek wrote:
>
> The new version (the one that is not submitted yet) of gapless sequence
> is way more ugly and probably not best example either but does guarantee
> gaplessness (it stores the last value in it's own value table). So I am
> not sure if it should be included either...
>

Here it is as promised.

Notable changes:
- The gapless sequence rewritten to use the transactional storage as that's the only way to guarantee gaplessness between dump and restore.
- Custom columns are now stored in the catalog instead of generated by some special interface. This makes it possible to record dependencies on types and also seems in line with how other similar things are done.
- Removed a couple of pallocs from the nextval codepath, which improved performance considerably.
- Changed how ALTER SEQUENCE ... USING behaves - when RESET WITH value is specified the new sequenceAM will use that one for starting; if it's not specified the nextval from the original sequenceAM will be used as the RESET value. In general, handling of the RESET parameter is left to the AMs.
- Unified the reloptions for indexes and sequences into a single function.
- Re-added setval3 for compatibility reasons - it only works on local sequences, otherwise it throws an error.
- Made 'local' the default value for the GUC instead of special handling of the empty string.

Notable missing things:
- Docs for pg_seqam.
- pg_dump support for pg_seqam. It's embarrassing but I didn't think about this until Robert mentioned it; I am apparently too used to handling these things using extensions.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/23/2015 02:34 AM, Petr Jelinek wrote:
> On 22/01/15 17:02, Petr Jelinek wrote:
>>
>> The new version (the one that is not submitted yet) of gapless sequence
>> is way more ugly and probably not best example either but does guarantee
>> gaplessness (it stores the last value in it's own value table). So I am
>> not sure if it should be included either...
>
> Here it is as promised.

I generally like the division of labour between the seqam implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create confusion. And it's not really enough for something like a remote sequence AM anyway, which definitely needs some extra reloptions, like the hostname of the remote server. The default should be 'local', unless you specify something else with CREATE SEQUENCE USING - no GUCs.

Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number instead of checking whether the pg_sequence_dump_state() function exists. That's what we usually do when new features are introduced. And there's actually a bug there: you have the check backwards. (try dumping a database with any sequences in it; it fails)

* pg_dump should not output a USING clause when a sequence uses the 'local' sequence AM. No point in adding such noise - making the SQL command not standard-compatible - for the 99% of databases that don't use other sequence AMs.

* Be careful to escape all strings correctly in pg_dump. I think you're missing escaping for the 'state' variable at least.

In sequence_save_tuple:
> else
> {
>     /*
>      * New tuple was not sent, so the original tuple was probably just
>      * changed inline, all we need to do is mark the buffer dirty and
>      * optionally log the update tuple.
>      */
>     START_CRIT_SECTION();
>
>     MarkBufferDirty(seqh->buf);
>
>     if (do_wal)
>         log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>
>     END_CRIT_SECTION();
> }

This is wrong when checksums are enabled and !do_wal.
I believe this should use MarkBufferDirtyHint().

> Notable changes:
> - The gapless sequence rewritten to use the transactional storage as
> that's the only way to guarantee gaplessness between dump and restore.

Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of all, the event trigger on DROP SEQUENCE doesn't fire if you drop the whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, updating a row in a table for every nextval() call is pretty darn expensive. But I don't actually see a problem with dump and restore.

Can you rewrite it without using the values table? AFAICS, there are two problems to solve:

1. If the top-level transaction aborts, you need to restore the old value. I suggest keeping two values in the sequence tuple: the old, definitely committed one, and the last value. The last value is only considered current if the associated XID committed; otherwise the old value is current. When a transaction is about to commit, it stores its top-level XID and the new value in the "last" field, and copies the previously current value to the "old" field.

2. You need to track the last value on a per-subtransaction basis, until the transaction commits/rolls back, in order for "rollback to savepoint" to retreat the sequence's value. That can be done in backend-private memory, maintaining a stack of subtransactions and the last value of each. Use RegisterSubXactCallback to hook into subxact commit/abort. Just before top-level commit (in the pre-commit callback), update the sequence tuple with the latest value, so that it gets WAL-logged before the commit record.

- Heikki
On 28/01/15 18:09, Heikki Linnakangas wrote: > On 01/23/2015 02:34 AM, Petr Jelinek wrote: >> On 22/01/15 17:02, Petr Jelinek wrote: >>> >>> The new version (the one that is not submitted yet) of gapless sequence >>> is way more ugly and probably not best example either but does guarantee >>> gaplessness (it stores the last value in it's own value table). So I am >>> not sure if it should be included either... >> >> Here it is as promised. > > I generally like the division of labour between the seqam > implementations and the API now. > > I don't like the default_sequenceam GUC. That seems likely to create > confusion. And it's not really enough for something like a remote > sequence AM anyway that definitely needs some extra reloptions, like the > hostname of the remote server. The default should be 'local', unless you > specify something else with CREATE SEQUENCE USING - no GUCs. > Hmm, I am not too happy about this, I want SERIAL to work with custom sequences (as long as it's possible for the sequence to work that way at least). That's actually important feature for me. Your argument about this being potential problem for some sequenceAMs is valid though. > Some comments on pg_dump: > > * In pg_dump's dumpSequence function, check the server version number > instead of checking whether pg_sequence_dump_state() function exists. > That's what we usually do when new features are introduced. And there's > actually a bug there: you have the check backwards. (try dumping a > database with any sequences in it; it fails) Right. > * pg_dump should not output a USING clause when a sequence uses the > 'local' sequence. No point in adding such noise - making the SQL command > not standard-compatible - for the 99% of databases that don't use other > sequence AMs. Well this only works if we remove the GUC. Because otherwise if GUC is there then you always need to either add USING clause or set the GUC in advance (like we do with search_path for example). 
> * Be careful to escape all strings correctly in pg_dump. I think you're > missing escaping for the 'state' variable at least. Ouch. > In sequence_save_tuple: >> else >> { >> /* >> * New tuple was not sent, so the original tuple was probably >> just >> * changed inline, all we need to do is mark the buffer dirty and >> * optionally log the update tuple. >> */ >> START_CRIT_SECTION(); >> >> MarkBufferDirty(seqh->buf); >> >> if (do_wal) >> log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page); >> >> END_CRIT_SECTION(); >> } > > This is wrong when checksums are enabled and !do_wal. I believe this > should use MarkBufferDirtyHint(). > Oh ok, didn't realize. >> Notable changes: >> - The gapless sequence rewritten to use the transactional storage as >> that's the only way to guarantee gaplessness between dump and restore. > > Can you elaborate? > > Using the auxiliary seqam_gapless_values is a bit problematic. First of > all, the event trigger on DROP SEQUENCE doesn't fire if you drop the > whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, Yeah but at worst there are some unused rows there, it's not too bad. I could also create relation per sequence so that dependency code would handle everything correctly, but seems bit too expensive to create not one but two relations per sequence... > updating a row in a table for every nextval() call is pretty darn > expensive. Yes it's expensive, but the gapless sequence also serializes access so it will never be speed daemon. > But I don't actually see a problem with dump and restore. The problem is that the tuple stored in the sequence relation is always the one with latest state (and with frozen xid), so pg_dump has no way of getting the value as it was at the time it took the snapshot. This means that after dump/restore cycle, the sequence can be further away than the table it's attached to and you end up with a gap there. > Can you rewrite it without using the values table? 
> AFAICS, there are two problems to solve:
>
> 1. If the top-level transaction aborts, you need to restore the old
> value. I suggest keeping two values in the sequence tuple: the old,
> definitely committed one, and the last value. The last value is only
> considered current if the associated XID committed; otherwise the old
> value is current. When a transaction is about to commit, it stores its
> top-level XID and the new value in the "last" field, and copies the
> previously current value to the "old" field.

Yes, this is how the previous implementation worked.

> 2. You need to track the last value on a per-subtransaction basis, until
> the transaction commits/rolls back, in order to have "rollback to
> savepoint" to retreat the sequence's value. That can be done in
> backend-private memory, maintaining a stack of subtransactions and the
> last value of each. Use RegisterSubXactCallback to hook into subxact
> commit/abort. Just before top-level commit (in pre-commit callback),
> update the sequence tuple with the latest value, so that it gets
> WAL-logged before the commit record.

I started writing something like what you described here, but then I realized the dump/restore problem and went with the values table, which solves both of these issues at the same time.

BTW I am happy to discuss this at FOSDEM if you are interested and will have time.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

sending new version that is updated along the lines of what we discussed at FOSDEM, which means:

- back to single bytea amdata column (no custom columns)
- the dump/restore interfaces were changed to produce/accept array of key/value pairs
- psql shows the output of the dump interface in verbose mode (this makes it easier to inspect the state)
- last_value and is_called are always present columns even if the sequence am does not care for them (this helps with backwards compatibility with various admin tools)
- pg_dump uses the new interface for dumping only for non-local sequences, so that dumps of schemas which are not using custom seqams are restorable to older postgres
- the default_sequenceam was changed to serial_sequenceam and only applies to serial/bigserial columns - I would personally prefer to have the old default, but a default for serial is better than nothing
- added description of pg_seqam catalog to docs
- gapless_seq still uses a table internally to support snapshots for pg_dump
- pg_dump support for dumping the sequence ams

It would be nice to also have something along the lines of chapter 55.2 (Index Access Method Functions); I plan to send that as an additional patch in a few days.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > sending new version that is updated along the lines of what we discussed at > FOSDEM, which means: > > - back to single bytea amdata column (no custom columns) Why? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 17/02/15 23:11, Robert Haas wrote:
> On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> sending new version that is updated along the lines of what we discussed at FOSDEM, which means:
>>
>> - back to single bytea amdata column (no custom columns)
>
> Why?

Well, the main argument is still the future possibility of moving to a single table for storage. And when we discussed it in person we agreed that there is not too big an advantage in having separate columns, since there need to be dump/restore state interfaces anyway, so you can see the relevant state via those; we made the output more human readable (and the psql support reflects that).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 18/02/15 02:59, Petr Jelinek wrote: > On 17/02/15 23:11, Robert Haas wrote: >> On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr@2ndquadrant.com> >> wrote: >>> sending new version that is updated along the lines of what we >>> discussed at >>> FOSDEM, which means: >>> >>> - back to single bytea amdata column (no custom columns) >> > > Well, the main argument is still future possibility of moving into > single table for storage. And when we discussed about it in person we > agreed that there is not too big advantage in having separate columns > since there need to be dump/restore state interfaces anyway so you can > see the relevant state via those as we made the output more human > readable (and the psql support reflects that). > Also makes the patch a little simpler obviously. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Slightly updated version of the patch. Mainly rebased against current master (there were several conflicts) and fixed some typos, no real functional change. I also attached initial version of the API sgml doc. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

Thanks! With the patch, the pg_class.relam column references the pg_seqam table for sequences, but pg_indexam for indexes. I believe it's the first instance where we reuse a "foreign key" column like that. It's not a real foreign key, of course - that wouldn't work with a real foreign key at all - but it's a bit strange. That makes me a bit uncomfortable. How do others feel about that?

- Heikki
On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:
> With the patch, pg_class.relam column references to the pg_seqam table for sequences, but pg_indexam for indexes. I believe it's the first instance where we reuse a "foreign key" column like that. It's not a real foreign key, of course - that wouldn't work with a real foreign key at all - but it's a bit strange. That makes me a bit uncomfortable. How do others feel about that?

Hm. I'd mentally modeled it more as an extension of the 'relkind' column, i.e. it further specifies how exactly the relation behaves. The fact that the field was added to pg_class and not pg_index, combined with it not having "index" in its name, makes me think that it actually was intended to be used the way it's done in the patch. It's not the first column that behaves that way btw; at least pg_depend comes to mind.

Greetings, Andres Freund
On 20/04/15 12:05, Andres Freund wrote: > On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote: >> With the patch, pg_class.relam column references to the pg_seqam table for >> sequences, but pg_indexam for indexes. I believe it's the first instance >> where we reuse a "foreign key" column like that. It's not a real foreign >> key, of course - that wouldn't work with a real foreign key at all - but >> it's a bit strange. That makes me a bit uncomfortable. How do others feel >> about that? > > Hm. I'd modeled it more as an extension of the 'relkind' column > mentally. I.e. it further specifies how exactly the relation is > behaving. Given that the field has been added to pg_class and not > pg_index, combined with it not having index in its name, makes me think > that it actually was intended to be used the way it's done in the patch. > That's how I think about it too. It's also short for access method, nothing really suggests to me that it should be index specific by design. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

I finally got around to having another round of review on this. I fixed a couple of little bugs and did some minor editing of comments etc. See attached. It is also available in my git repository at git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam", if you want to look at individual changes. It combines your patches 1 and 4; I think those need to be applied together. I haven't looked at the DDL changes yet.

I'm fairly happy with the alloc API now. I'm not sure it's a good idea for the AM to access the sequence tuple directly, though. It would seem cleaner if it only manipulated the amdata Datum. But it's not too bad as it is. We went back and forth on whether 'amdata' should be a single column or multiple columns, but I guess the single bytea column was the consensus in the end.

The division of labour between sequence.c and the AM, in the init and the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method, when the only caller of amoptions() is just before the init() method? The changes in extractRelOptions suggest that it would call am_reloptions for sequences too, but that doesn't actually seem to be happening.

postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity checking, like min < max. It also has to extract the RESTART value from the list of options. I think it would make more sense to move that to sequence.c. The AM should only be responsible for initializing the 'amdata' portion, and for handling any AM-specific options.
If the AM doesn't support some options, like MIN and MAX value, it can check them and throw an error, but it shouldn't be responsible for just passing them through from the arguments to the sequence tuple.

* It might be better to form the sequence tuple before calling the init function, and pass the ready-made tuple to it. All the other API functions deal with the tuple (after calling sequence_read_tuple), so that would be more consistent. The init function would need to deconstruct it to modify it, but we're not concerned about performance here.

* The transformations of the arrays in the get_state() and set_state() functions are a bit complicated. The seqam_get_state() function returns two C arrays, and pg_sequence_get_state() turns them into a text[] array. Why not construct the text[] array directly in the AM? I guess it's a bit more convenient to the AM if pg_sequence_get_state() does that, but if that's an issue, you could create a generic helper function to turn two C string arrays into text[], and call that from the AM.

> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>
> for (i = 0; i < count; i++)
> {
>     if (pg_strcasecmp(keys[i], "last_value") == 0)
>         seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>                                         CStringGetDatum(values[i])));
>     else if (pg_strcasecmp(keys[i], "is_called") == 0)
>         seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>                                       CStringGetDatum(values[i])));
>     else
>         ereport(ERROR,
>                 (errcode(ERRCODE_SYNTAX_ERROR),
>                  errmsg("invalid state key \"%s\" for local sequence",
>                         keys[i])));
> }
>
> sequence_save_tuple(seqh, NULL, true);

If that error happens after having already processed a "last_value" or "is_called" entry, you have already modified the on-disk tuple. AFAICS that's the only instance of that bug, but the sequence_read_tuple - modify tuple in place - sequence_save_tuple pattern is quite unsafe in general. If you modify a tuple directly in a Buffer, you should have a critical section around it.
It would make sense to start a critical section in sequence_read_tuple(), except that not all callers want to modify the tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple functions should be split into two:

sequence_read_tuple_for_update()
sequence_save_tuple()

* seqam_local_get_state() calls sequence_read_tuple() but not sequence_release_tuple(). It looks like sequence_close() releases the tuple and the buffer, but that seems sloppy. sequence_close() probably shouldn't be calling sequence_release_tuple at all, so that you'd get a warning at the end of the transaction about the buffer leak.

- Heikki
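One way to avoid the partial-update hazard described above is to parse and validate the whole key/value state into locals first, and copy it into the tuple only once everything has been accepted. Sketched here as a standalone model: SeqState and seq_set_state are illustrative stand-ins, not actual PostgreSQL functions, and "reject" takes the place of ereport().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the sequence tuple's state. */
typedef struct
{
    long last_value;
    bool is_called;
} SeqState;

/*
 * Apply a key/value state dump.  Returns false (leaving *seq untouched)
 * if any key is unknown, instead of failing after having already
 * scribbled on the on-disk tuple.
 */
bool
seq_set_state(SeqState *seq, const char **keys, const char **values, int count)
{
    SeqState    pending = *seq;     /* stage all changes locally */
    int         i;

    for (i = 0; i < count; i++)
    {
        if (strcmp(keys[i], "last_value") == 0)
            pending.last_value = strtol(values[i], NULL, 10);
        else if (strcmp(keys[i], "is_called") == 0)
            pending.is_called = (strcmp(values[i], "t") == 0);
        else
            return false;           /* reject before touching the tuple */
    }

    *seq = pending;                 /* single, all-or-nothing store */
    return true;
}
```

In the backend the equivalent would be to do all parsing and erroring before modifying the buffer, and to wrap the actual tuple change in a critical section.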
Attachment
Heikki Linnakangas wrote: > * The transformations of the arrays in get_state() and set_state() functions > are a bit complicated. The seqam_get_state() function returns two C arrays, > and pg_sequence_get_state() turns them into a text[] array. Why not > construct the text[] array directly in the AM? I guess it's a bit more > convenient to the AM, if the pg_sequence_get_state() do that, but if that's > an issue, you could create generic helper function to turn two C string > arrays into text[], and call that from the AM. Um, see strlist_to_textarray() in objectaddress.c if you do that. Maybe we need some generic place to store that kind of helper function. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20/04/15 17:50, Heikki Linnakangas wrote: > On 03/15/2015 09:07 PM, Petr Jelinek wrote: >> Slightly updated version of the patch. >> >> Mainly rebased against current master (there were several conflicts) and >> fixed some typos, no real functional change. >> >> I also attached initial version of the API sgml doc. > > I finally got around to have another round of review on this. I fixed a > couple of little bugs, did some minor edition on comments etc. See > attached. It is also available in my git repository at > git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam", > if you want to look at individual changes. It combines your patches 1 > and 4, I think those need to be applied together. I haven't looked at > the DDL changes yet. Thanks! > > I'm fairly happy with the alloc API now. I'm not sure it's a good idea > for the AM to access the sequence tuple directly, though. I would seem > cleaner if it only manipulated the amdata Datum. But it's not too bad as > it is. Yeah, I was thinking about this myself I just liked sending 10 parameters to the function less than this. > > The division of labour between sequence.c and the AM, in the init and > the get/set_state functions, is a bit more foggy: > > * Do we really need a separate amoptions() method and an init() method, > when the only caller to amoptions() is just before the init() method? > The changes in extractRelOptions suggest that it would call > am_reloptions for sequences too, but that doesn't actually seem to be > happening. Hmm yes it should and I am sure it did at some point, must have messed it during one of the rebases :( And it's the reason why we need separate API function. > > postgres=# create sequence fooseq using local with (garbage=option); > CREATE SEQUENCE > > Invalid options probably should throw an error. > > * Currently, the AM's init function is responsible for basic sanity > checking, like min < max. It also has to extract the RESTART value from > the list of options. 
> I think it would make more sense to move that to sequence.c. The AM should only be responsible for initializing the 'amdata' portion, and for handling any AM-specific options. If the AM doesn't support some options, like MIN and MAX value, it can check them and throw an error, but it shouldn't be responsible for just passing them through from the arguments to the sequence tuple.

Well then we need to send restart as an additional parameter to the init function, as restart is normally not stored anywhere. The checking is actually whether the new value is within min/max, but yes, that can be done inside sequence.c I guess.

> * It might be better to form the sequence tuple before calling the init function, and pass the ready-made tuple to it. All the other API functions deal with the tuple (after calling sequence_read_tuple), so that would be more consistent. The init function would need to deconstruct it to modify it, but we're not concerned about performance here.

Right, this is actually more of a relic of the custom columns implementation, where I didn't want to build the tuple with NULLs for columns that might have been specified as NOT NULL, but with the amdata approach it can create the tuple safely.

> * The transformations of the arrays in get_state() and set_state() functions are a bit complicated. The seqam_get_state() function returns two C arrays, and pg_sequence_get_state() turns them into a text[] array. Why not construct the text[] array directly in the AM? I guess it's a bit more convenient to the AM, if the pg_sequence_get_state() do that, but if that's an issue, you could create generic helper function to turn two C string arrays into text[], and call that from the AM.

Yeah, that was exactly the reasoning. A helper function works for me (I will check what Álvaro suggested, maybe it can be moved somewhere and reused).
> >> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh)); >> >> for (i = 0; i < count; i++) >> { >> if (pg_strcasecmp(keys[i], "last_value") == 0) >> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in, >> >> CStringGetDatum(values[i]))); >> else if (pg_strcasecmp(keys[i], "is_called") == 0) >> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin, >> >> CStringGetDatum(values[i]))); >> else >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >> errmsg("invalid state key \"%s\" for local >> sequence", >> keys[i]))); >> } >> >> sequence_save_tuple(seqh, NULL, true); > > If that error happens after having already processed a "last_value" or > "is_called" entry, you have already modified the on-disk tuple. AFAICS > that's the only instance of that bug, but sequence_read_tuple - modify > tuple in place - sequence_save_tuple pattern is quite unsafe in general. > If you modify a tuple directly in a Buffer, you should have a critical > section around it. It would make sense to start a critical section in > sequence_read_tuple(), except that not all callers want to modify the > tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple > functions should be split into two: > > sequence_read_tuple_for_update() > sequence_save_tuple() Agreed, however I am much more concerned about the way seqam_local_alloc() works, see the XXX comment there. I am thinking it's not really safe that way, but problem is that we can't put critical section around it currently as the sequence_save_tuple() can potentially call GetTopTransactionId(). Separating it into more functions might be solution. Except the split into two does not really help there, it would have to be something like this: sequence_tuple_update_start() - does the GetTopTransactionId bit and starts critical section sequence_tuple_save() - saves the tuple/does wal sequence_tuple_update_finish() - ends the critical section That looks slightly cumbersome but I don't really have better idea. 
> * seqam_local_get_state() calls sequence_read_tuple() but not sequence_release_tuple(). It looks like sequence_close() releases the tuple and the buffer, but that seems sloppy. sequence_close() probably shouldn't be calling sequence_release_tuple at all, so that you'd get a warning at the end of the transaction about the buffer leak.

Do you mean that you want to make the call to sequence_release_tuple() mandatory whenever sequence_read_tuple() was called?

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 22/04/15 22:01, Petr Jelinek wrote:
> On 20/04/15 17:50, Heikki Linnakangas wrote:
>> I finally got around to have another round of review on this. I fixed a couple of little bugs, did some minor edition on comments etc. See attached. It is also available in my git repository at git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam", if you want to look at individual changes. It combines your patches 1 and 4, I think those need to be applied together. I haven't looked at the DDL changes yet.
>
> Thanks!

I merged all your changes and merged patch#1 with patch#4. Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam and gave you (Heikki) commit rights there in case you want to change anything.

>> The division of labour between sequence.c and the AM, in the init and the get/set_state functions, is a bit more foggy:
>>
>> * Do we really need a separate amoptions() method and an init() method, when the only caller to amoptions() is just before the init() method? The changes in extractRelOptions suggest that it would call am_reloptions for sequences too, but that doesn't actually seem to be happening.
>
> Hmm yes it should and I am sure it did at some point, must have messed it up during one of the rebases :(
>
> And it's the reason why we need a separate API function.

Actually the reloption handling was broken in more than one place; I fixed it all around. I had to teach heap_create_with_catalog about relam, but the change does not seem too intrusive to me, so I think it's fine. We no longer update the pg_class tuple after the sequence is created, and the relcache works correctly now. Also the whole reloption handling was moved to more proper places than seqam_init, so it's more in line with how it works for other relation kinds.
>> postgres=# create sequence fooseq using local with (garbage=option);
>> CREATE SEQUENCE
>>
>> Invalid options probably should throw an error.

Done; well, actually the local sequence now throws an error on any options. I also updated the CREATE SEQUENCE and ALTER SEQUENCE docs with the reloptions syntax.

>> * Currently, the AM's init function is responsible for basic sanity checking, like min < max. It also has to extract the RESTART value from the list of options. I think it would make more sense to move that to sequence.c. The AM should only be responsible for initializing the 'amdata' portion, and for handling any AM-specific options. If the AM doesn't support some options, like MIN and MAX value, it can check them and throw an error, but it shouldn't be responsible for just passing them through from the arguments to the sequence tuple.

I now do the checking in sequence.c where possible; the init got generally redone to accept the newly created tuple and restart value parameters. One thing that I am not sure how much I like is that if the AM wants to change the amdata in the init, it now has to modify the tuple and free the old one.
The only slightly ugly part is the use of TextDatumGetCString/CStringGetDatum but I think that's survivable (maybe the sequence.c could convert the TEXT[] to CSTRING[] or just char *[] as this is not performance critical code path). >> >>> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh)); >>> >>> for (i = 0; i < count; i++) >>> { >>> if (pg_strcasecmp(keys[i], "last_value") == 0) >>> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in, >>> >>> CStringGetDatum(values[i]))); >>> else if (pg_strcasecmp(keys[i], "is_called") == 0) >>> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin, >>> >>> CStringGetDatum(values[i]))); >>> else >>> ereport(ERROR, >>> (errcode(ERRCODE_SYNTAX_ERROR), >>> errmsg("invalid state key \"%s\" for local >>> sequence", >>> keys[i]))); >>> } >>> >>> sequence_save_tuple(seqh, NULL, true); >> >> If that error happens after having already processed a "last_value" or >> "is_called" entry, you have already modified the on-disk tuple. AFAICS >> that's the only instance of that bug, but sequence_read_tuple - modify >> tuple in place - sequence_save_tuple pattern is quite unsafe in general. >> If you modify a tuple directly in a Buffer, you should have a critical >> section around it. It would make sense to start a critical section in >> sequence_read_tuple(), except that not all callers want to modify the >> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple >> functions should be split into two: >> >> sequence_read_tuple_for_update() >> sequence_save_tuple() > > Agreed, however I am much more concerned about the way > seqam_local_alloc() works, see the XXX comment there. I am thinking it's > not really safe that way, but problem is that we can't put critical > section around it currently as the sequence_save_tuple() can potentially > call GetTopTransactionId(). Separating it into more functions might be > solution. 
> Except the split into two does not really help there, it would have to be something like this:
> sequence_tuple_update_start() - does the GetTopTransactionId bit and starts critical section
> sequence_tuple_save() - saves the tuple/does wal
> sequence_tuple_update_finish() - ends the critical section
>
> That looks slightly cumbersome but I don't really have better idea.

So I went with something like what was proposed:

- sequence_start_update(seqh, do_wal) - prepares the state and starts the critical section
- sequence_apply_update(seqh, do_wal) - does the buffer dirtying and WAL logging
- sequence_finish_update(seqh) - closes the critical section and cleans up the state

The API might look slightly hairy now, but I don't see much difference between having to do:

START_CRIT_SECTION()
sequence_update_tuple()
END_CRIT_SECTION()

and the above, except that the above also solves the xid acquiring and exposes fewer internals. I also added sequence_swap_tuple() for the use-case where the AM does not want to do inline updates, because having this separate makes things easier and less ugly. And once you are changing the whole tuple you are already in a slow/complex code path, so one more function call does not matter.

>> * seqam_local_get_state() calls sequence_read_tuple() but not sequence_release_tuple(). It looks like sequence_close() releases the tuple and the buffer, but that seems sloppy. sequence_close() probably shouldn't be calling sequence_release_tuple at all, so that you'd get a warning at the end of transaction about the buffer leak.

I did this (requiring sequence_release_tuple() always) and documented it, but personally I don't like it much, as it just adds one more call to an API which already has enough of them and I really don't see the advantage of it.

Other than the things mentioned above, I rebased it on current master (the transforms commit produced several issues).
And fixed several bugs like checking min/max value in set_state, proper initialization of session cached increment_by, etc. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
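The three-call update protocol described above can be sketched as a standalone state machine. This is an assumed shape only: the real functions take the sequence handle, and start/apply would additionally do GetTopTransactionId(), MarkBufferDirty(), and XLogInsert(); here those effects are reduced to plain fields so the ordering contract is visible.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEQ_IDLE, SEQ_UPDATING, SEQ_APPLIED } SeqPhase;

typedef struct
{
    SeqPhase    phase;
    bool        in_crit_section;
    long        tuple_value;    /* stands in for the on-disk tuple */
    long        wal_value;      /* stands in for the WAL record */
} SeqUpdate;

/* Prepare the update: in the real code, acquire the top-level XID if
 * WAL-logging, then START_CRIT_SECTION(). */
void
sequence_start_update(SeqUpdate *u, bool do_wal)
{
    assert(u->phase == SEQ_IDLE);
    (void) do_wal;
    u->in_crit_section = true;
    u->phase = SEQ_UPDATING;
}

/* Dirty the buffer and emit WAL, inside the critical section. */
void
sequence_apply_update(SeqUpdate *u, bool do_wal, long new_value)
{
    assert(u->phase == SEQ_UPDATING && u->in_crit_section);
    u->tuple_value = new_value;
    if (do_wal)
        u->wal_value = new_value;
    u->phase = SEQ_APPLIED;
}

/* END_CRIT_SECTION() and clean up the state. */
void
sequence_finish_update(SeqUpdate *u)
{
    assert(u->phase == SEQ_APPLIED);
    u->in_crit_section = false;
    u->phase = SEQ_IDLE;
}
```

The point of the three-phase split is that the xid acquisition happens before the critical section opens, while the tuple change and WAL record are guaranteed to fall inside it.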
Attachment
Heikki, do you have time to go through this at this point?

Petr Jelinek wrote:
> I merged all your changes and merged the patch#1 with patch#4.
>
> Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam and gave you (Heikki) commit rights there in case you want to change anything.

I rebased your #1 to current master; attached. It builds and passes the regression tests, but I didn't check any further. The doc build fails with:

$ make check
onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s /pgsql/source/master/doc/src/sgml/postgres.sgml
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
Makefile:320: recipe for target 'check' failed
make: *** [check] Error 1

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 05/13/2015 07:10 AM, Alvaro Herrera wrote: > Heikki, do you have time to go through this at this point? I'm afraid I won't :-(. I did intend to, but looking at the calendar, I won't have the time to review this thoroughly enough to commit. Sorry. I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax patch at all yet. We discussed using a single amdata column vs. any number of am-specific columns. We settled on amdata, but I'm still not 100% convinced that's the best approach. Just as a data point, this removes the log_cnt field and moves it into amdata in a non-human-readable format. So for someone who only uses the local seqam, this just makes things slightly worse. For more complicated seqam's, it would be even more important to display the state in a human-readable format. Perhaps it's OK that each seqam provides its own functions or similar to do that, but I'd like to revisit that decision. I still don't like the serial_sequenceam GUC. Not sure what to do instead. Needs some thought. - Heikki
On 13 May 2015 at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
--
On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
> Heikki, do you have time to go through this at this point?
I'm afraid I won't :-(. I did intend to, but looking at the calendar, I won't have the time to review this thoroughly enough to commit. Sorry.
I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax patch at all yet.
We discussed using a single amdata column vs. any number of am-specific columns. We settled on amdata, but I'm still not 100% convinced that's the best approach. Just as a data point, this removes the log_cnt field and moves it into amdata in a non-human-readable format. So for someone who only uses the local seqam, this just makes things slightly worse. For more complicated seqam's, it would be even more important to display the state in a human-readable format. Perhaps it's OK that each seqam provides its own functions or similar to do that, but I'd like to revisit that decision.
I still don't like the serial_sequenceam GUC. Not sure what to do instead. Needs some thought.
This has had around 2 years of thought at this point. I don't agree it needs more thought.
There is one clear use case for this and it is of benefit to many distributed architectures.
I don't see what calamity will occur if we commit this. If you don't want a sequence AM, don't ever use this.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 13/05/15 12:56, Heikki Linnakangas wrote:
> On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
>> Heikki, do you have time to go through this at this point?
>
> I'm afraid I won't :-(. I did intend to, but looking at the calendar, I won't have the time to review this thoroughly enough to commit. Sorry.
>
> We discussed using a single amdata column vs. any number of am-specific columns. We settled on amdata, but I'm still not 100% convinced that's the best approach. Just as a data point, this removes the log_cnt field and moves it into amdata in a non-human-readable format. So for someone who only uses the local seqam, this just makes things slightly worse. For more complicated seqam's, it would be even more important to display the state in a human-readable format. Perhaps it's OK that each seqam provides its own functions or similar to do that, but I'd like to revisit that decision.

I do think it's ok for a seqam to provide functions that can give you the data in human readable form. I think the main argument against custom human readable columns is that it would kill any possibility of having common storage for sequences. There is also additional complexity in the API required for that. The performance implications are probably small, as one could still define an opaque bytea column and store the data the same way as now.

> I still don't like the serial_sequenceam GUC. Not sure what to do instead. Needs some thought.

I think it would be ok if this issue was solved by a follow-up patch at a later time.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 05/13/2015 02:12 PM, Simon Riggs wrote:
> This has had around 2 years of thought at this point. I don't agree it needs more thought.

Noted.

> There is one clear use case for this and it is of benefit to many distributed architectures.

Right. What's your point?

> I don't see what calamity will occur if we commit this. If you don't want a sequence AM, don't ever use this.

I'd like the API to be good for its purpose. Also, I did mention that the current patch makes the situation slightly worse for people who never use this: it makes the log_cnt field non-human-readable. That's a really minor thing, but it shows that it *does* matter how this is implemented, even if you only ever use the local seq AM.

- Heikki
On 13/05/15 14:15, Heikki Linnakangas wrote:
>> I don't see what calamity will occur if we commit this. If you don't want a sequence AM, don't ever use this.
>
> I'd like the API to be good for its purpose. Also, I did mention that the current patch makes the situation slightly worse for people who never use this: it makes the log_cnt field non-human-readable. That's a really minor thing, but it shows that it *does* matter how this is implemented, even if you only ever use the local seq AM.

It definitely does matter. I don't think we'll find a perfect compromise here though; you can either do it one way or the other. Trust me, it does not make me happy either. I like perfect solutions too, but in the absence of a perfect solution I prefer the simpler one. Both of the solutions have drawbacks:
- amdata is an opaque blob which does not store data in a user-visible way, but that can be worked around by providing a function that shows it in human-readable form (and the dump function for many sequence types actually does that).
- multiple columns basically kill any future ability to unify the storage for sequences and also add complexity, especially around ALTER TABLE (since it means drop/add column and such).

But I already wrote both versions anyway, so from that perspective it does not matter much which one we merge.

(As a side-note, I would have preferred to have this discussion earlier than 2 days before feature freeze, because the current implementation is something that we agreed on several months ago, so there was plenty of time to revisit that decision.)

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 13/05/15 06:10, Alvaro Herrera wrote:
>
> I rebased your #1 to current master; attached. It builds and passes

Thanks

> regression test, but I didn't check any further. Doc build fails with:
>
> $ make check
> onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s /pgsql/source/master/doc/src/sgml/postgres.sgml
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
> Makefile:320: recipe for target 'check' failed
> make: *** [check] Error 1

Missing <para>; attached fixed version with your rebase and one of the commits from Heikki that I missed last time.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
I've been looking at these patches a bit and here are some comments:

Patch 1: seqam

I would like to see an example in the docs for CREATE SEQUENCE. That's perhaps not possible (or desirable) with only the "local" seqam? Not sure.

In the docs for pg_class, there is no mention that relam refers to pg_seqam for sequences but pg_am for all other types.

There are errant half-sentences in the documentation, for example "to the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in Sequence Access Method Functions.

I'd prefer a README instead of the long comment at the start of seqam.c. The other ams do that.

As mentioned upthread, this patch isn't a seamless replacement for what's already there because of the amdata field. I wasn't part of the conversation at FOSDEM unfortunately, and there's not enough information in this thread to know why this solution is preferred over each seqam having its own table type with all the columns it needs. I see that Heikki is waffling a bit between the two, and I have a fairly strong opinion that amdata should be split into separate columns. The patch already destroys and recreates what it needs when changing access method via ALTER SEQUENCE, so I don't really see what the problem is.

There is no psql tab completion for the new USING clause in ALTER SEQUENCE, and in looking at that I noticed we don't have tab completion for CREATE SEQUENCE at all. I know we don't complete everything, but if we're going to do ALTER, I think we should do CREATE. I'll submit a patch for that on its own thread, but then this patch will need to modify it to include USING.

There is no \d command for sequence access methods. Without querying pg_seqam directly, how does one discover what's available?

There are some unfinished comments, such as "When relam is specified, record dependency on the" in heap.c.

Comments and code in pg_dump.c check that the server is at least 9.5. Those'll need to be changed to at least 9.6.
Patch 2: seqam ddl

When defining a new access method for sequences, it is possible to list the arguments multiple times (last one wins). Other defel loops raise an error if the argument is specified more than once. I haven't looked at all of such loops to see if this is the only odd man out or not, but I prefer the error behavior.

It doesn't seem possible to create a comment for a seqam, is that intentional? Ah, after more review I see it is possible, just not documented. I think that needs to be corrected.

Same comment as above about testing for 9.5.

Patch 3: gapless_seq

I really like the idea of having a gapless sequence in contrib. Although it has big potential to be abused, doing it manually when it's needed (like for invoices, at least in France) is a major pain. So big +1 for including this. However, the patch doesn't update the main contrib Makefile, so you have to compile it explicitly. It also doesn't have the .sgml file in the right place, so that's not installed either.

There is a FIXME in get_last_value_tup() which should probably be removed in light of the code comment in catcache.c stating that pg_seqam is too small to benefit from an index scan.

It would be nice to be able to specify an access method when declaring a serial or bigserial column. This could be a separate patch later on, though.

On the whole, I think this is a pretty good patchset. Aside from the design decision of whether amdata is a single opaque column or a set of columns, there are only a few things that need to be changed before it's ready for committer, and those things are mostly documentation.

-- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
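For reference, the kind of CREATE SEQUENCE documentation example discussed above could be built directly from the syntax shown at the start of this thread. This is only a sketch: "local" is the built-in seqam, while "globalseq" and its option name are placeholder names, not part of the patch.

```sql
-- Using the built-in AM explicitly (equivalent to omitting USING):
CREATE SEQUENCE invoice_seq USING local;

-- With a third-party seqam, AM-specific options would go in WITH (...)
-- ("globalseq" and "node_id" are hypothetical names):
CREATE SEQUENCE order_seq USING globalseq WITH (node_id = 2);

-- An existing sequence can be switched to another AM:
ALTER SEQUENCE invoice_seq USING globalseq;
```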
On 2015-06-15 11:32, Vik Fearing wrote:
> I've been looking at these patches a bit and here are some comments:

Thanks for looking at this.

> Patch 1: seqam
>
> I would like to see an example in the docs for CREATE SEQUENCE. That's perhaps not possible (or desirable) with only the "local" seqam? Not sure.

It is possible to have an example with the local seqam; it might be confusing though, given it produces the same results as not putting USING in the query.

> In the docs for pg_class, there is no mention that relam refers to pg_seqam for sequences but pg_am for all other types.
>
> There are errant half-sentences in the documentation, for example "to the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in Sequence Access Method Functions.

I think that's the side effect of all the rebases and rewrites over the 2y(!) that this has been going forward and back. It can easily be fixed by proofreading before final submission. I didn't pay too much attention yet because it's not clear what the docs should look like if there is no real agreement on the API. (This applies to the other comments about docs as well.)

> I'd prefer a README instead of the long comment at the start of seqam.c. The other ams do that.

OK, since things have been moved to a separate directory, a README is doable. I personally prefer the docs in the main .c file usually, but I know the project uses READMEs sometimes for this.

> As mentioned upthread, this patch isn't a seamless replacement for what's already there because of the amdata field. I wasn't part of the conversation at FOSDEM unfortunately, and there's not enough information in this thread to know why this solution is preferred over each seqam having its own table type with all the columns it needs. I see that Heikki is waffling a bit between the two, and I have a fairly strong opinion that amdata should be split into separate columns. The patch already destroys and recreates what it needs when changing access method via ALTER SEQUENCE, so I don't really see what the problem is.

FOSDEM was just about agreeing that amdata is simpler after we discussed it with Heikki. Nothing too important you missed there I guess.

I can try to summarize the differences:
- amdata is somewhat simpler in terms of code for init, alter and DDL, since with custom columns you have to specify them somehow and deal with them in the catalog; also ALTER SEQUENCE USING means that there are going to be columns marked as deleted, which produces needless waste, etc.
- amdata makes it easier to change the storage model, as the tuple descriptor is the same for all sequences
- the separate columns are much nicer from the user's point of view
- my opinion is that separate columns also more nicely separate state from options, and I think that if we move to a separate storage model, there can be a state table per AM, which solves the tuple descriptor issue
- there is probably some slight performance benefit to amdata, but I don't think it's big enough to be important

I personally have a slight preference for the separate-columns design, but I am honestly ok with both ways.

> There is no \d command for sequence access methods. Without querying pg_seqam directly, how does one discover what's available?

Good point.

> Patch 2: seqam ddl
>
> When defining a new access method for sequences, it is possible to list the arguments multiple times (last one wins). Other defel loops raise an error if the argument is specified more than once. I haven't looked at all of such loops to see if this is the only odd man out or not, but I prefer the error behavior.

Hmm yeah, there should be an error. I think only tsearch doesn't enforce errors among the existing stuff; that should probably be fixed as well (separately, of course).

> Patch 3: gapless_seq
>
> I really like the idea of having a gapless sequence in contrib. Although it has big potential to be abused, doing it manually when it's needed (like for invoices, at least in France) is a major pain. So big +1 for including this.

Yeah, people make gapless sequences regardless; it's better to provide them one that behaves correctly. It's also quite a good test for the API.

> It would be nice to be able to specify an access method when declaring a serial or bigserial column. This could be a separate patch later on, though.

The patch originally had a GUC for this, but Heikki didn't like it, so it's left for future development.

> On the whole, I think this is a pretty good patchset. Aside from the design decision of whether amdata is a single opaque column or a set of columns, there are only a few things that need to be changed before it's ready for committer, and those things are mostly documentation.

Unfortunately, amdata being opaque vs a set of columns is the main issue here.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
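The gapless-sequence usage under discussion might look something like the following. This is a hypothetical sketch: the extension and AM names are illustrative, and the exact interface of the contrib module is not fixed in this thread.

```sql
-- Hypothetical usage of the gapless_seq contrib module discussed above;
-- the names "gapless_seq" and "gapless" are assumptions, not confirmed.
CREATE EXTENSION gapless_seq;
CREATE SEQUENCE invoice_number USING gapless;

-- nextval() would hand out strictly consecutive values, at the cost of
-- serializing concurrent callers until their transactions finish.
SELECT nextval('invoice_number');
```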
So, we have this patch in the commitfest again. Let's see where we are, and try to find a consensus on what needs to be done before this can be committed.

On 06/17/2015 06:51 PM, Petr Jelinek wrote:
> On 2015-06-15 11:32, Vik Fearing wrote:
>> I've been looking at these patches a bit and here are some comments:
>
> Thanks for looking at this.

+1, thanks Vik.

>> As mentioned upthread, this patch isn't a seamless replacement for what's already there because of the amdata field. I wasn't part of the conversation at FOSDEM unfortunately, and there's not enough information in this thread to know why this solution is preferred over each seqam having its own table type with all the columns it needs. I see that Heikki is waffling a bit between the two, and I have a fairly strong opinion that amdata should be split into separate columns. The patch already destroys and recreates what it needs when changing access method via ALTER SEQUENCE, so I don't really see what the problem is.
>
> FOSDEM was just about agreeing that amdata is simpler after we discussed it with Heikki. Nothing too important you missed there I guess.
>
> I can try to summarize the differences:
> - amdata is somewhat simpler in terms of code for init, alter and DDL, since with custom columns you have to specify them somehow and deal with them in the catalog; also ALTER SEQUENCE USING means that there are going to be columns marked as deleted, which produces needless waste, etc.
> - amdata makes it easier to change the storage model, as the tuple descriptor is the same for all sequences
> - the separate columns are much nicer from the user's point of view
> - my opinion is that separate columns also more nicely separate state from options, and I think that if we move to a separate storage model, there can be a state table per AM, which solves the tuple descriptor issue
> - there is probably some slight performance benefit to amdata, but I don't think it's big enough to be important
>
> I personally have a slight preference for the separate-columns design, but I am honestly ok with both ways.

Regarding the amdata issue, I'm also leaning towards a set of columns. I've felt that way all along, but not very strongly, so I relented at some point when Andres felt strongly that a single column would be better. But the more I think about it, the more I feel that separate columns really would be better.

As evidence, I offer this recent thread. Tom said (http://www.postgresql.org/message-id/8739.1436893588@sss.pgh.pa.us):
> I really don't see what's wrong with "SELECT last_value FROM sequence", especially since that has worked in every Postgres version since 6.x. Anyone slightly worried about backwards compatibility wouldn't use an equivalent function even if we did add one.

If we went with the single amdata column, that would break. Or we'd need to leave last_value as a separate column anyway, and leave it unused for sequence AMs where it's not applicable. But that's a bit ugly too.

Jim Nasby said in the same thread:
> FWIW, I think it'd be better to have a pg_sequences view that's the equivalent of SELECT * FROM <sequence> for every sequence in the database. That would let you get whatever info you needed.

Creating such a view would be difficult if all the sequences have a different set of columns. But when you think about it, it's not really any better with a single amdata column. You can't easily access the data in the amdata column that way either.

Anyway, that's my opinion. Several others have weighed in to support separate columns, too, so I think that is the consensus. Separate columns it is.

>> There is no \d command for sequence access methods. Without querying pg_seqam directly, how does one discover what's available?
>
> Good point.

Well, you can query pg_seqam. I don't think this deserves a \d command.

>> On the whole, I think this is a pretty good patchset. Aside from the design decision of whether amdata is a single opaque column or a set of columns, there are only a few things that need to be changed before it's ready for committer, and those things are mostly documentation.
>
> Unfortunately the amdata being opaque vs set of columns is the main issue here.

There was discussion on another thread on how the current sequence AM API is modeled after the indexam API, at http://www.postgresql.org/message-id/3896.1437059303@sss.pgh.pa.us. Will need to do something about that too.

Petr, is this enough feedback on this patch for this commitfest, or are there some other issues you want to discuss before I mark this as returned?

- Heikki
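To make the backward-compatibility point concrete: a sequence is a one-row relation, so its state can be read with a plain SELECT, and that is exactly what the single-amdata design would break. A minimal sketch (sequence name hypothetical):

```sql
-- Works in every Postgres version since 6.x: the sequence's state is
-- visible as ordinary columns of its one backing row.
CREATE SEQUENCE my_seq;
SELECT last_value FROM my_seq;

-- With the separate-columns design, AM-specific state would appear as
-- additional ordinary columns of this row; with a single opaque amdata
-- column, last_value would disappear into the blob and the query above
-- would stop working.
SELECT * FROM my_seq;
```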
On 2015-07-28 20:11, Heikki Linnakangas wrote: > > Petr, is this enough feedback on this patch for this commitfest, or are > there some other issues you want to discuss before I mark this as returned? > You can mark it as returned, I didn't have much time to actually do much useful work on this in the current CF. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 28 July 2015 at 19:51, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 2015-07-28 20:11, Heikki Linnakangas wrote:
Petr, is this enough feedback on this patch for this commitfest, or are
there some other issues you want to discuss before I mark this as returned?
You can mark it as returned, I didn't have much time to actually do much useful work on this in the current CF.
Is this now dependent on the work Alexander Korotkov is doing on the AM interface?
Thom
On 2015-09-16 13:21, Thom Brown wrote:
> On 28 July 2015 at 19:51, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> On 2015-07-28 20:11, Heikki Linnakangas wrote:
>>> Petr, is this enough feedback on this patch for this commitfest, or are there some other issues you want to discuss before I mark this as returned?
>>
>> You can mark it as returned, I didn't have much time to actually do much useful work on this in the current CF.
>
> Is this now dependent on the work Alexander Korotkov is doing on the AM interface?

Yes.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, here is another try at this. I had some more time to think about various corner cases (like the sequence not having TOAST).

I decided for this version to try a bit different approach, and that is to use a custom datatype for the sequence access method state. So there is one extra amstate column, and the access method defines the custom type. The idea is that the custom type should be fixed width so that we can do proper checks during the creation of the access method for size and such. It also makes the interface a bit cleaner, as we can provide methods for reading and saving the state and abstract the internals more nicely. It also solves the issues with critical sections vs memory allocations when the access method wants to save a completely new state. And it means that the get_state and set_state interfaces just work with the text representation of the custom type; there is no need for complex conversions to arrays and back. One downside is that access methods can't use composite types, as those are not fixed width (nor plain), so the names of the individual items in the output are somewhat hidden (see the modified create_view test for an example of how this affects things).

Other than that, this is based on the new am api by Alexander Korotkov [1]. It extends it by adding another column called amkind to pg_am, which can have either the value "i" for index or "S" for sequence (same as relkind in pg_class for those).

I didn't attach the DDL and the gapless sequence yet, mainly because I don't want to waste time rewriting them again in case we can't agree that this approach is good (given the long discussion and resulting several rewrites wrt multiple columns vs bytea previously).

[1] https://commitfest.postgresql.org/8/336/

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
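The fixed-width custom state type described above could be declared with ordinary CREATE TYPE machinery. This is a hypothetical sketch only: the type name, function names, size, and the "gapless_seq" library are illustrative, not taken from the patch.

```sql
-- Shell type first, so the I/O functions can reference it.
CREATE TYPE gapless_seq_state;

CREATE FUNCTION gapless_seq_state_in(cstring) RETURNS gapless_seq_state
    AS 'gapless_seq' LANGUAGE C STRICT IMMUTABLE;
CREATE FUNCTION gapless_seq_state_out(gapless_seq_state) RETURNS cstring
    AS 'gapless_seq' LANGUAGE C STRICT IMMUTABLE;

-- Complete the type: fixed width and plain storage, so the state never
-- needs TOAST and its size can be validated when the AM is created.
CREATE TYPE gapless_seq_state (
    INPUT = gapless_seq_state_in,
    OUTPUT = gapless_seq_state_out,
    INTERNALLENGTH = 16,
    ALIGNMENT = double,
    STORAGE = plain
);
```

The get_state/set_state interfaces would then only ever see the text representation produced by the type's output function.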
On 1 January 2016 at 07:51, Petr Jelinek <petr@2ndquadrant.com> wrote:
Other than that, this is based on the new am api by Alexander Korotkov [1]. It extends it by adding another column called amkind to the pg_am which can have either value "i" for index or "S" for sequence (same as relkind in pg_class for those).
This patch will no longer apply after 65c5fcd353 (http://github.com/postgres/postgres/commit/65c5fcd353) as outlined in http://www.postgresql.org/message-id/10804.1453077804@sss.pgh.pa.us .
Setting waiting-on-author in the CF app.
The good news is that the commit of the pg_am rework greatly eases the path of this patch into core.
Needs rework after the commit of https://commitfest.postgresql.org/8/336/
On 18 January 2016 at 09:19, Craig Ringer <craig@2ndquadrant.com> wrote:
> Needs rework after the commit of https://commitfest.postgresql.org/8/336/

Here is a version that applies to current master. There is some work to do (mostly cleanup) and the DDL is missing, but that's because I want to know what people think about the principle of how it works now, and whether it makes sense to finish it this way (explained in the original submission for the Jan CF).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Petr Jelinek wrote:
> On 18 January 2016 at 09:19, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Needs rework after the commit of https://commitfest.postgresql.org/8/336/
>
> Here is version that applies to current master. There is some work to do (mostly cleanup) and the DDL is missing, but that's because I want to know what people think about the principle of how it works now and if it makes sense to finish it this way (explained in the original submission for Jan CF).

I would guess that the DDL boilerplate should come from Alexander Korotkov's patch, right? I think a first easy step may be to combine parts of both patches so that we get the "amkind" column from this patch and the DDL support from Alexander's patch (which means that his proposed command needs a new clause to specify the amkind); then the really tough stuff in both Alexander's and this patch can be rebased on top of that.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I would guess that the DDL boilerplate should come from Alexander Korotkov's patch, right? I think a first easy step may be to combine parts of both patches so that we get the "amkind" column from this patch and the DDL support from Alexander's patch (means that his proposed command needs a new clause to specify the amkind);

Uh, what? Surely we would provide a bespoke command for each possible sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check that the provided function has the right signature, and then it would put the correct amkind into the pg_am entry automatically.

If we skimp on this infrastructure then we're just relying on the user not to make a typo, which doesn't seem like a good idea.

(Agreed though that a reasonable first step would be to add amkind to pg_am and make the appropriate adjustments to existing code, ie check that amkind is correct when fetching an index handler. I considered putting that into the AM API refactor patch, but desisted.)

regards, tom lane
On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I would guess that the DDL boilerplate should come from Alexander Korotkov's patch, right? I think a first easy step may be to combine parts of both patches so that we get the "amkind" column from this patch and the DDL support from Alexander's patch (means that his proposed command needs a new clause to specify the amkind);
>
> Uh, what? Surely we would provide a bespoke command for each possible sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check that the provided function has the right signature, and then it would put the correct amkind into the pg_am entry automatically.
>
> If we skimp on this infrastructure then we're just relying on the user not to make a typo, which doesn't seem like a good idea.

Yeah, it has to be a completely separate command for both, doing completely different sanity checks. It can't even be called CREATE INDEX/SEQUENCE ACCESS METHOD unless we are willing to make ACCESS a keyword, due to the preexisting CREATE INDEX/SEQUENCE <name> commands. The previous version of the patch, which used a somewhat different underlying catalog structure, already had DDL, and honestly writing the DDL part is quite easy. I used CREATE ACCESS METHOD FOR SEQUENCE there.

> (Agreed though that a reasonable first step would be to add amkind to pg_am and make the appropriate adjustments to existing code, ie check that amkind is correct when fetching an index handler. I considered putting that into the AM API refactor patch, but desisted.)

Well, I was wondering how to handle this as well; the submitted patch currently just adds Asserts, because IMHO the actual ERROR should be thrown in the DDL, not in the code that just uses it.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek wrote: > On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Uh, what? Surely we would provide a bespoke command for each possible > > sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check > > that the provided function has the right signature, and then it would put > > the correct amkind into the pg_am entry automatically. I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something like that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh, what? Surely we would provide a bespoke command for each possible sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check that the provided function has the right signature, and then it would put the correct amkind into the pg_am entry automatically.
> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something like that.

That seems okay. I had the impression you were proposing CREATE ACCESS METHOD foobar TYPE 'i' USING functionname or something like that, where it would be totally up to the user that the amkind matched the function. That seems unnecessarily error-prone, not to mention that it would close the door forever on any hope that we might allow non-superusers to execute such commands.

regards, tom lane
On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Uh, what? Surely we would provide a bespoke command for each possible
> > sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.
I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.
Ok! Let's nail down the syntax and I can integrate it into my createam patch.
I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER handler;", but I don't insist.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: >> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something >> like that. > I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER > handler;", but I don't insist. I think that Alvaro's idea is less likely to risk future grammar conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] to make me really wary of variables in the statement-name part of the syntax. regards, tom lane
On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> >> wrote: >>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something >>> like that. > >> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER >> handler;", but I don't insist. > > I think that Alvaro's idea is less likely to risk future grammar > conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX > [ CONCURRENTLY ] to make me really wary of variables in the statement-name > part of the syntax. Strong +1. If we put the type of access method immediately after CREATE, I'm almost positive we'll regret it for exactly that reason. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 January 2016 at 23:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] to make me really wary of variables in the statement-name part of the syntax.
>
> Strong +1. If we put the type of access method immediately after CREATE, I'm almost positive we'll regret it for exactly that reason.

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar conflict now; that's why my proposal was different, I didn't want to add more keywords. I think Alvaro's proposal is fine as well. The other point is that we are creating an ACCESS METHOD object, so that's what should be after CREATE.

In any case this is slightly premature IMHO, as the DDL is somewhat useless until we have a sequence access methods implementation we can agree on, or the generic WAL logging so that custom indexes can be crash safe.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar conflict now; that's why my proposal was different, I didn't want to add more keywords. I think Alvaro's proposal is fine as well.

I missed your proposal, I guess, so please don't take this as having any position on whether it's better or worse than Alvaro's. I was only intending to vote for the proposition that the type of access method should follow the name of the access method.

> The other point is that we are creating an ACCESS METHOD object, so that's what should be after CREATE.

Agreed.

> In any case this is slightly premature IMHO, as the DDL is somewhat useless until we have a sequence access methods implementation we can agree on, or the generic WAL logging so that custom indexes can be crash safe.

Generic WAL logging seems like a *great* idea to me. But I think it's largely independent from the access method stuff. If we have generic WAL logging, people can create WAL-logged stuff that is not a new access method. If we have access methods, they can create new access methods that are not WAL-logged. If we have both, then they can create WAL-logged access methods, which of course is the payoff pitch, but I don't see it as necessary or desirable for the two systems to be tied together in any way.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 30 January 2016 at 13:48, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar
>> conflict now, which is why my proposal was different; I didn't want to
>> add more keywords. I think Alvaro's proposal is fine as well.
>
> I missed your proposal, I guess, so please don't take this as having any
> position on whether it's better or worse than Alvaro's. I was only
> intending to vote for the proposition that the type of access method
> should follow the name of the access method.
>

No worries, I didn't mean it that way.

>> In any case this is slightly premature IMHO, as the DDL is somewhat useless
>> until we have a sequence access methods implementation we can agree on,
>> or the generic WAL logging so that custom indexes can be crash safe.
>
> Generic WAL logging seems like a *great* idea to me. But I think it's
> largely independent from the access method stuff. If we have generic
> WAL logging, people can create WAL-logged stuff that is not a new
> access method. If we have access methods, they can create new access
> methods that are not WAL-logged. If we have both, then they can
> create WAL-logged access methods, which of course is the payoff pitch,
> but I don't see it as necessary or desirable for the two systems to be
> tied together in any way.

Okay, I am only debating the usefulness of DDL for access methods in the current situation, where we only have custom index access methods which can't create WAL records. In other words, trying to nudge people slightly back towards the actual patch(es).

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 29 January 2016 at 23:59, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>>>> wrote:
>>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>>> like that.
>>>
>>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>>> handler;", but I don't insist.
>>>
>>> I think that Alvaro's idea is less likely to risk future grammar
>>> conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX
>>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>>> part of the syntax.
>>
>> Strong +1. If we put the type of access method immediately after
>> CREATE, I'm almost positive we'll regret it for exactly that reason.
>>
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar
> conflict now, which is why my proposal was different; I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.
> The other point is that we are creating an ACCESS METHOD object, so that's
> what should come after CREATE.
> In any case this is slightly premature IMHO, as the DDL is somewhat useless
> until we have a sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.
I've changed the syntax of CREATE ACCESS METHOD in the thread about index access method extendability.
Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". A new column, amtype, of pg_am stores the access method type.
http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com
It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE HANDLER handler;".
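As a sketch, the proposed (not yet committed) SEQUENCE variant would mirror the committed INDEX form — note that the access method name `demo_seqam` and the handler function `demo_seqam_handler` here are hypothetical, not part of any patch in this thread:

```sql
-- Committed form, per this thread:
--   CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;

-- Proposed extension for sequences; names are purely illustrative:
CREATE ACCESS METHOD demo_seqam TYPE SEQUENCE HANDLER demo_seqam_handler;

-- The new pg_am.amtype column stores the access method type, so the
-- different kinds of access methods can be told apart in the catalog:
SELECT amname, amtype FROM pg_am;
```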
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 16/02/16 13:58, Alexander Korotkov wrote:
>
> I've changed the syntax of CREATE ACCESS METHOD in the thread about
> index access method extendability.
> Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". A new
> column amtype of pg_am stores the access method type.
> http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com
> It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
> HANDLER handler;".
>

Yes, I've seen it, and I will send a review within a couple of days.

But first, here is the updated patch for sequence access methods. I went with the previously discussed custom type, as this gives us proper control over the width of the state and makes sure that it's not going to be toastable, etc.; we need this because a sequence is limited to a single page.

Attached are 3 separate files. The first one (0001-create-am) is separate mainly because there is some overlap with Alexander's index access methods patch, so I extracted the common code into a separate patch to make it easier to merge in case we are lucky enough to get these patches in (but it's still based on Alexander's code). It provides the infrastructure for defining new access methods from SQL, although without the parser changes and without any actual access methods.

The 0002-seqam patch provides the SQL interface and support for sequence access methods on top of the infrastructure patch, and also provides all the sequence access method infrastructure, abstraction, and documentation.

And finally, 0003-gapless-seq is an example contrib module that implements a dependably and transactionally safe gapless sequence access method. It's obviously slow, as it has to do locking and basically serialize all changes to the sequence, so only one transaction may use it at a time, but it's truly gapless. It also serves as an example of use of the API and as a test.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 3/4/16 11:09 PM, Petr Jelinek wrote:
> But first here is the updated patch for sequence access methods. I went with
> the previously discussed custom type as this gives us proper control
> over the width of the state and making sure that it's not gonna be
> toastable, etc and we need this as a sequence is limited to a single page.
>
> Attached are 3 separate files. The first one (0001-create-am) is mainly
> separate for the reason that there is some overlap with Alexander's
> index access methods patch, so I extracted common code into a separate
> patch as it will make it easier to merge in case we are lucky enough to
> get these patches in (but it's still based on Alexander's code). It
> provides infra for defining new access methods from SQL, although
> without the parser and without any actual access methods.
>
> The 0002-seqam provides the SQL interface and support for sequence
> access methods on top of the infra patch, and also provides all the
> sequence access methods infrastructure and abstraction and documentation.
>
> And finally the 0003-gapless-seq is an example contrib module that
> implements a dependably and transactionally safe gapless sequence access
> method. It's obviously slow as it has to do locking and basically
> serialize all the changes to the sequence so only one transaction may use it
> at a time, but it's truly gapless. It also serves as an example of use of
> the API and as a test.

As far as I can see Petr has addressed all the outstanding issues in this patch and it's ready for review. The patch applies with some easily-fixed conflicts:

$ git apply -3 ../other/0002-seqam-2015-03-05.patch
error: patch failed: src/include/catalog/pg_proc.h:5225
Falling back to three-way merge...
Applied patch to 'src/include/catalog/pg_proc.h' with conflicts.

Simon, you are signed up to review. Do you have an idea of when you'll be doing that?

--
-David
david@pgmasters.net
Petr Jelinek wrote:

> And finally the 0003-gapless-seq is an example contrib module that implements
> a dependably and transactionally safe gapless sequence access method. It's
> obviously slow as it has to do locking and basically serialize all the
> changes to the sequence so only one transaction may use it at a time, but it's
> truly gapless. It also serves as an example of use of the API and as a test.

I'm trying to figure out this one, and I think I found something very surprising. This code contains this:

+#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
+#define VALUES_TABLE_NAME "seqam_gapless_values"

which as far as I understand is something like a side table where the values for all the sequences are stored. Is that right? If it is, I admit I didn't realize that these sequences worked in this way. This seems broken to me. I had been under the impression that the values were always stored in the sequence's relation. Maybe I'm misunderstanding; please explain how this works.

FWIW I find it annoying that this submission's 0001 patch and Alexander's 0001 have so much in common, yet they aren't similar enough to consider that either is the definitive form. Also, you have the SGML docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both versions isn't trivial. Anyway, I'm back at reviewing Alexander's 0001, which should be okay as a basis for this series regardless of any edits I suggest there.

--
Álvaro Herrera
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19/03/16 23:02, Alvaro Herrera wrote:
> Petr Jelinek wrote:
>
>> And finally the 0003-gapless-seq is an example contrib module that implements
>> a dependably and transactionally safe gapless sequence access method. It's
>> obviously slow as it has to do locking and basically serialize all the
>> changes to the sequence so only one transaction may use it at a time, but it's
>> truly gapless. It also serves as an example of use of the API and as a test.
>
> I'm trying to figure out this one, and I think I found something very
> surprising. This code contains this
>
> +#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
> +#define VALUES_TABLE_NAME "seqam_gapless_values"
>
> which as far as I understand is something like a side table where values
> for all the sequences are stored. Is that right? If it is, I admit I
> didn't realize that these sequences worked in this way. This seems
> broken to me. I had been under the impression that the values were
> always stored in the sequence's relation. Maybe I'm misunderstanding;
> please explain how this works.

Yes, I did explain this in the original submission that added the gapless sequence, which is about a million messages upthread, so it's understandable that it's hard to follow.

The sequence AM API is non-transactional (and does not use MVCC), as sequences themselves are non-transactional, and this provides a significant performance boost. But the gapless sequence needs to handle things like subtransactions and rollbacks, so it needs MVCC; for these reasons it uses an external table to handle that. The locking and so on is still handled using the normal sequence API, but the MVCC part is handled by the side table, yes.

Just as a note to you and anybody who wasn't following at the beginning of this quite long thread: one of the goals of the sequence AM API was to abstract the actual storage and WAL logging of sequences away from the extension, so there is no generic WAL or anything like what the index AM provides here. At this point, though, it might be worth reevaluating that approach if the generic WAL gets in.

> FWIW I find it annoying that this submission's 0001 patch and
> Alexander's 0001 have so much in common, yet they aren't similar enough
> to consider that either is the definitive form. Also, you have the SGML
> docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
> versions isn't trivial.

Well, it's relatively annoying for the patch author as well. I tried to write it so that it's as easy to merge as possible - the common code is in my 0001, except for the SQL (gram.y, docs), because the SQL does not have any meaning until either indexam or sequenceam gets in (which also makes it impossible to submit as a separate infrastructure patch, because there is no point in having the code in without the stuff that either indexam or sequenceam provides on top of it).

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
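To make the MVCC trade-off above concrete, here is roughly how the 0003 example module is exercised, following the contrib regression test excerpts that appear later in this thread (the extension and sequence names come from that patch):

```sql
-- From the 0003-gapless-seq example module: the AM takes a lock and
-- records allocation state transactionally (via the
-- seqam_gapless_values side table), so concurrent callers serialize.
CREATE EXTENSION gapless_seq;
CREATE SEQUENCE test_gapless USING gapless;

SELECT nextval('test_gapless'::regclass);

-- Because the state is kept under MVCC, a rolled-back nextval() does
-- not burn the value, so no gap appears in the sequence.
BEGIN;
SELECT nextval('test_gapless'::regclass);
ROLLBACK;
SELECT nextval('test_gapless'::regclass);
```

This is exactly why the module is slow: every allocation is serialized behind the lock until the owning transaction commits or rolls back.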
Hi, I rebased this on top of the recently committed CREATE ACCESS METHOD. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Mar 24, 2016 at 6:12 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> I rebased this on top of the recently committed CREATE ACCESS METHOD.
>

Hi,

I got the following error trying to apply it to the current master:

$ git apply /home/fabrizio/Downloads/0001-seqam-2016-03-24.patch
error: patch failed: src/backend/commands/amcmds.c:29
error: src/backend/commands/amcmds.c: patch does not apply

There is a wrong definition at the beginning of amcmds.c:

 34 <<<<<<< ours
 35 static Oid lookup_index_am_handler_func(List *handler_name, char amtype);
 36 static const char *get_am_type_string(char amtype);
 37 =======
 38 static Oid lookup_am_handler_func(List *handler_name, char amtype);
 39 static char *get_am_type_string(char amtype);
 40 >>>>>>> theirs

After this small fix I can build and run the regress tests without errors.

But running "check-world" I got this error:

make[1]: Leaving directory `/data/postgresql/src/test/regress'
make: Leaving directory `/data/postgresql'
+ pg_dumpall -f /data/postgresql/src/bin/pg_upgrade/tmp_check/dump1.sql
ok 9 - dropuser foobar1 exit code 0
ok 10 - SQL DROP ROLE run: SQL found in server log
ok 11 - fails with nonexistent user
ok
t/080_pg_isready.pl .......
1..10
ok 1 - pg_isready --help exit code 0
ok 2 - pg_isready --help goes to stdout
ok 3 - pg_isready --help nothing to stderr
ok 4 - pg_isready --version exit code 0
ok 5 - pg_isready --version goes to stdout
ok 6 - pg_isready --version nothing to stderr
ok 7 - pg_isready with invalid option nonzero exit code
ok 8 - pg_isready with invalid option prints error message
ok 9 - fails with no server running
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name" does not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled FROM check_seq
pg_dumpall: pg_dump failed on database "regression", exiting
+ pg_dumpall1_status=1
+ /data/postgresql/src/bin/pg_upgrade/tmp_check/install//home/fabrizio/pgsql/bin/pg_ctl -m fast stop
waiting for server to shut down.... done
server stopped
+ echo pg_dumpall of pre-upgrade database cluster failed
pg_dumpall of pre-upgrade database cluster failed
+ exit 1
+ rm -rf /tmp/pg_upgrade_check-3NUa0X
make[2]: *** [check] Error 1
make[2]: Leaving directory `/data/postgresql/src/bin/pg_upgrade'
make[1]: *** [check-pg_upgrade-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....

And testing pg_dump itself, I got the same error trying to dump a database that contains a sequence:

fabrizio=# create sequence x;
CREATE SEQUENCE
fabrizio=# \ds
        List of relations
 Schema | Name |   Type   |  Owner
--------+------+----------+----------
 public | x    | sequence | fabrizio
(1 row)

fabrizio=# \d x
              Sequence "public.x"
    Column    |       Type        |        Value
--------------+-------------------+---------------------
 start_value  | bigint            | 1
 increment_by | bigint            | 1
 max_value    | bigint            | 9223372036854775807
 min_value    | bigint            | 1
 cache_value  | bigint            | 1
 is_cycled    | boolean           | f
 amstate      | seqam_local_state | (1,f,0)
Access Method: local

fabrizio=# \q

fabrizio@bagual:~/pgsql
$ bin/pg_dump > /tmp/fabrizio.sql
pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name" does not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled FROM x

Tonight I'll review some parts of the code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Hi Petr,

On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:

> fabrizio@bagual:~/pgsql
> $ bin/pg_dump > /tmp/fabrizio.sql
> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
> does not exist
> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
> increment_by, CASE WHEN increment_by > 0 AND max_value =
> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value =
> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0
> AND min_value = -9223372036854775807 THEN NULL ELSE min_value END
> AS min_value, cache_value, is_cycled FROM x

It looks like a new patch is needed. I've marked this "waiting on author".

Thanks,
--
-David
david@pgmasters.net
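A plausible reading of the failure reported above, sketched in SQL: the patched sequence relation apparently no longer exposes the old hidden `sequence_name` column (the per-AM state now lives in `amstate`, per the `\d x` output earlier in the thread), while pg_dump still issues its legacy per-sequence query. This is an illustration of the mismatch, not a proposed fix:

```sql
-- pg_dump's legacy per-sequence query (abbreviated) fails against a
-- sequence created with the patch applied:
SELECT sequence_name, start_value, increment_by, cache_value, is_cycled
FROM x;
-- ERROR:  column "sequence_name" does not exist

-- Columns the patched sequence relation does expose, per the \d output
-- shown in this thread:
SELECT start_value, increment_by, max_value, min_value,
       cache_value, is_cycled, amstate
FROM x;
```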
On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net> wrote:
>
> Hi Petr,
>
> On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>
>> fabrizio@bagual:~/pgsql
>> $ bin/pg_dump > /tmp/fabrizio.sql
>> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
>> does not exist
>> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>> ^
>> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>> increment_by, CASE WHEN increment_by > 0 AND max_value =
>> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value =
>> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
>> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0
>> AND min_value = -9223372036854775807 THEN NULL ELSE min_value END
>> AS min_value, cache_value, is_cycled FROM x
>
>
> It looks like a new patch is needed. I've marked this "waiting on author".
>
But there is another issue in the gapless-seq extension when I ran "make check":
43 CREATE EXTENSION gapless_seq;
44 CREATE SEQUENCE test_gapless USING gapless;
45 SELECT nextval('test_gapless'::regclass);
46 ! ERROR: could not access status of transaction 1275068416
47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
48 BEGIN;
49 SELECT nextval('test_gapless'::regclass);
50 ! ERROR: could not access status of transaction 1275068416
51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
52 SELECT nextval('test_gapless'::regclass);
53 ! ERROR: current transaction is aborted, commands ignored until end of transaction block
54 SELECT nextval('test_gapless'::regclass);
55 ! ERROR: current transaction is aborted, commands ignored until end of transaction block
56 ROLLBACK;
57 SELECT nextval('test_gapless'::regclass);
58 ! ERROR: could not access status of transaction 1275068416
59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
And I see the same running manually:

fabrizio=# create extension gapless_seq;
CREATE EXTENSION
fabrizio=# create sequence x using gapless;
CREATE SEQUENCE
fabrizio=# select nextval('x');
ERROR: could not access status of transaction 1258291200
DETAIL: Could not open file "pg_subtrans/4B00": No such file or directory.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Attachment
On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>
> On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net> wrote:
> >
> > Hi Petr,
> >
> > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
> >
> >> fabrizio@bagual:~/pgsql
> >> $ bin/pg_dump > /tmp/fabrizio.sql
> >> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
> >> does not exist
> >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
> >
> > It looks like a new patch is needed. I've marked this "waiting on author".
> >

Yeah, there were some incompatible commits since my last rebase; fixed, along with the pg_dump bugs.

> But there is another issue in the gapless-seq extension when I ran "make check":
>
> 43 CREATE EXTENSION gapless_seq;
> 44 CREATE SEQUENCE test_gapless USING gapless;
> 45 SELECT nextval('test_gapless'::regclass);
> 46 ! ERROR: could not access status of transaction 1275068416
> 47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
> 48 BEGIN;
> 49 SELECT nextval('test_gapless'::regclass);
> 50 ! ERROR: could not access status of transaction 1275068416
> 51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
> 52 SELECT nextval('test_gapless'::regclass);
> 53 ! ERROR: current transaction is aborted, commands ignored until
> end of transaction block
> 54 SELECT nextval('test_gapless'::regclass);
> 55 ! ERROR: current transaction is aborted, commands ignored until
> end of transaction block
> 56 ROLLBACK;
> 57 SELECT nextval('test_gapless'::regclass);
> 58 ! ERROR: could not access status of transaction 1275068416
> 59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
>
> And I see the same running manually:
>
> fabrizio=# create extension gapless_seq;
> CREATE EXTENSION
> fabrizio=# create sequence x using gapless;
> CREATE SEQUENCE
> fabrizio=# select nextval('x');
> ERROR: could not access status of transaction 1258291200
> DETAIL: Could not open file "pg_subtrans/4B00": No such file or directory.

Hmm, I am unable to reproduce this. What OS? Any special configure flags you use?

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Mar 29, 2016 at 2:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>>
>> On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net> wrote:
>> >
>> > It looks like a new patch is needed. I've marked this "waiting on author".
>> >
>
> Yeah, there were some incompatible commits since my last rebase; fixed,
> along with the pg_dump bugs.
>

Now everything applies without errors, and build and "make check" pass too.

>> But there is another issue in the gapless-seq extension when I ran "make check":
>>
>> 43 CREATE EXTENSION gapless_seq;
>> 44 CREATE SEQUENCE test_gapless USING gapless;
>> 45 SELECT nextval('test_gapless'::regclass);
>> 46 ! ERROR: could not access status of transaction 1275068416
>> 47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
>> 48 BEGIN;
>> 49 SELECT nextval('test_gapless'::regclass);
>> 50 ! ERROR: could not access status of transaction 1275068416
>> 51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
>> 52 SELECT nextval('test_gapless'::regclass);
>> 53 ! ERROR: current transaction is aborted, commands ignored until end of transaction block
>> 54 SELECT nextval('test_gapless'::regclass);
>> 55 ! ERROR: current transaction is aborted, commands ignored until end of transaction block
>> 56 ROLLBACK;
>> 57 SELECT nextval('test_gapless'::regclass);
>> 58 ! ERROR: could not access status of transaction 1275068416
>> 59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
>>
>> And I see the same running manually:
>>
>> fabrizio=# create extension gapless_seq;
>> CREATE EXTENSION
>> fabrizio=# create sequence x using gapless;
>> CREATE SEQUENCE
>> fabrizio=# select nextval('x');
>> ERROR: could not access status of transaction 1258291200
>> DETAIL: Could not open file "pg_subtrans/4B00": No such file or directory.
>
> Hmm, I am unable to reproduce this. What OS? Any special configure flags you use?
>

In my environment the error remains with your latest patches.

I didn't use anything special:

./configure --prefix=/home/fabrizio/pgsql --enable-cassert --enable-coverage --enable-tap-tests --enable-depend
make -s -j8 install
make check-world

My environment:

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ uname -a
Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On 29/03/16 19:46, Fabrízio de Royes Mello wrote:
>
>> Hmm, I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>
> In my environment the error remains with your latest patches.
>
> I didn't use anything special:
>
> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
> --enable-coverage --enable-tap-tests --enable-depend
> make -s -j8 install
> make check-world
>
> My environment:
>
> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> $ uname -a
> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> $ gcc --version
> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Hmm, nothing special indeed; I still can't reproduce it. I did one blind try at a fix. Can you test with the attached?

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>
>>
>> >
>> > Hmm I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>> >
>>
>> In my environment the error remains with your last patches.
>>
>> I didn't use any special.
>>
>> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> --enable-coverage --enable-tap-tests --enable-depend
>> make -s -j8 install
>> make check-world
>>
>>
>> My environment:
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ uname -a
>> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ gcc --version
>> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>
> Hmm nothing special indeed, still can't reproduce, I did one blind try for a fix. Can you test with attached?
>
Same error... Now I'll leave on a trip but when I arrive I'll try to
figure out what's happening by debugging the code.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>
> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
> >
> > On 29/03/16 19:46, Fabrízio de Royes Mello wrote:
> >>
> >> > Hmm I am unable to reproduce this. What OS? Any special configure
> >> > flags you use?
> >>
> >> In my environment the error remains with your last patches.
> >>
> >> I didn't use any special.
> >>
> >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
> >> --enable-coverage --enable-tap-tests --enable-depend
> >> make -s -j8 install
> >> make check-world
> >>
> >> My environment:
> >>
> >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> >> $ uname -a
> >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
> >> 2016 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> >> $ gcc --version
> >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
> >> Copyright (C) 2013 Free Software Foundation, Inc.
> >> This is free software; see the source for copying conditions. There is NO
> >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > Hmm nothing special indeed, still can't reproduce. I did one blind
> > try for a fix. Can you test with attached?
>
> Same error... Now I'll leave on a trip but when I arrive I'll try to
> figure out what's happening by debugging the code.

Okay, thanks.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 29, 2016 at 5:58 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com
>> <mailto:petr@2ndquadrant.com>> wrote:
>> >
>> > On 29/03/16 19:46, Fabrízio de Royes Mello wrote:
>> >>
>> >>
>> >> >
>> >> > Hmm I am unable to reproduce this. What OS? Any special configure
>> >> flags you use?
>> >> >
>> >>
>> >> In my environment the error remains with your last patches.
>> >>
>> >> I didn't use any special.
>> >>
>> >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> >> --enable-coverage --enable-tap-tests --enable-depend
>> >> make -s -j8 install
>> >> make check-world
>> >>
>> >>
>> >> My environment:
>> >>
>> >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> >> $ uname -a
>> >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> >> $ gcc --version
>> >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> >> Copyright (C) 2013 Free Software Foundation, Inc.
>> >> This is free software; see the source for copying conditions. There
>> is NO
>> >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.
>> >>
>> >
>> > Hmm nothing special indeed, still can't reproduce, I did one blind
>> try for a fix. Can you test with attached?
>> >
>>
>> Same error... Now I'll leave in a trip but when I arrive I'll try to
>> figure out what happening debugging the code.
>>
>
> Okay, thanks.
>
After some debugging, it seems that "seqstate->xid" in "wait_for_sequence" isn't properly initialized, or is initialized with the wrong value (1258291200).
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
The following review has been posted through the commitfest application:

    make installcheck-world:  not tested
    Implements feature:       not tested
    Spec compliant:           not tested
    Documentation:            not tested

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch

Needs updating code copyright years ... or is this really from 2013?
[ contrib/gapless_seq/gapless_seq.c ]

Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b).
It does compile cleanly.

DESIGN

The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq")
and VALUES_TABLE_NAME ("seqam_gapless_values") strikes me a bit: while I
understand the creation of a private schema named after the extension, it
seems overkill for just a single table.

I would suggest to devote some reserved schema name for internal
implementation details and/or AM implementation details, if deemed
reasonable.

On the other hand, creating the schema/table upon extension installation
makes the values table use the default tablespace for the database, which
can be good for concurrency --- direct writes to less loaded storage.
(Note that users may want to move this table into an SSD-backed tablespace
or equivalently fast storage ... moreso when many --not just one-- gapless
sequences are being used.)

Yet, there is <20141203102425.GT2456@alap3.anarazel.de> where Andres argues
against anything different than one-page-per-sequence implementations.
I guess this patch changes everything in this respect.

CODE

seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
                   bool restart_requested, bool is_init)
-> is_init sounds confusing; suggest renaming to "initialize" or "initial"
   to avoid reading as "is_initIALIZED"

DEPENDS ON 0001-seqam-v10.patch, which isn't committed yet --- and it
doesn't apply cleanly to current git master. Please update/rebase the patch
and resubmit.
Hi,

Thanks for review.

On 30/03/16 15:22, Jose Luis Tallon wrote:
> [Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
> Needs updating code copyright years ... or is this really from 2013?
> [ contrib/gapless_seq/gapless_seq.c ]
> Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
>

Ouch good point, it does show how long the whole sequence am thing has
been around.

> DESIGN
> The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq")
> and VALUES_TABLE_NAME ("seqam_gapless_values") strikes me a bit: while I
> understand the creation of a private schema named after the extension, it
> seems overkill for just a single table.
> I would suggest to devote some reserved schema name for internal
> implementation details and/or AM implementation details, if deemed
> reasonable.
> On the other hand, creating the schema/table upon extension installation
> makes the values table use the default tablespace for the database, which
> can be good for concurrency --- direct writes to less loaded storage
> (Note that users may want to move this table into an SSD-backed tablespace
> or equivalently fast storage ... moreso when many --not just one-- gapless
> sequences are being used)
>

Schema is needed for the handler function as well. In general I don't want
to add another internal schema that will be empty when no sequence AM is
installed.

> Yet, there is <20141203102425.GT2456@alap3.anarazel.de> where Andres
> argues against anything different than one-page-per-sequence
> implementations.
> I guess this patch changes everything in this respect.

Not really, gapless just needs a table for transactionality and as such is
an exception. It does not change how the nontransactional sequence storage
works though. I agree with Andres on this one.
> CODE
> seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
>                    bool restart_requested, bool is_init)
> -> is_init sounds confusing; suggest renaming to "initialize" or "initial"
>    to avoid reading as "is_initIALIZED"

Sounds good.

> DEPENDS ON 0001-seqam-v10.patch, which isn't committed yet --- and it
> doesn't apply cleanly to current git master.
> Please update/rebase the patch and resubmit.
>

The current version of seqam is 0001-seqam-2016-03-29 which should apply
correctly.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

new version attached that should fix the issue. It was alignment - honestly
don't know what I was thinking using fixed alignment when the AMs can
define their own type.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> new version attached that should fix the issue. It was alignment - honestly don't know what I was thinking using fixed alignment when the AMs can define their own type.
>
Yeah... now all works fine... I had a minor issue when applying to the
current master but the attached fixes it, and I also added tab-complete
support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.

I ran all the regression tests and all passed.
I have just one question about the 0002 patch:
- Is there some reason not to use TransactionId instead of uint32 in the GaplessSequenceState struct?
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Attachment
On 04/04/16 15:53, Fabrízio de Royes Mello wrote:
>
> On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
> >
> > Hi,
> >
> > new version attached that should fix the issue. It was alignment -
> > honestly don't know what I was thinking using fixed alignment when the
> > AMs can define their own type.
>
> Yeah... now all works fine... I had a minor issue when applying to the
> current master but the attached fixes it, and I also added tab-complete
> support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.

Cool thanks.

> I ran all the regression tests and all passed.
>
> I have just one question about the 0002 patch:
> - Is there some reason not to use TransactionId instead of uint32 in
> the GaplessSequenceState struct?

I missed we have that :)

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 5, 2016 at 12:52 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> I have just one question about de 0002 patch:
>> - There are some reason to not use TransactionId instead of uint32 in
>> GaplessSequenceState struct?
>>
>
> I missed we have that :)
>
- if (seqstate->xid != local_xid)
+ if (!TransactionIdEquals(seqstate->xid, local_xid))
IMHO this patch is Ready for Committer. Changed status in the Commitfest app.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello