Thread: Sequence Access Method WIP

Sequence Access Method WIP

From
Simon Riggs
Date:
SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
  CREATE SEQUENCE foo_seq
      USING globalseq WITH (custom_option=setting);
which allows you to specify an alternate Sequence Access Method for a
sequence object,
or you can specify a default_sequence_mgr as a USERSET parameter
  SET default_sequence_mgr = globalseq;

Existing sequences can be modified to use a different SeqAM, by calling
  ALTER SEQUENCE foo_seq USING globalseq;

SeqAM is similar to IndexAM: There is a separate catalog table for
SeqAMs, but no specific API to create them. Initdb creates one
sequence am, called "local", which is the initial default. If
default_sequence_mgr is set to '' or 'local' then we use the local
seqam. The local seqam's functions are included in core.

Status is still "Work In Progress". Having said that most of the grunt
work is done and if we agree the shape of this is right, its
relatively easy going code.

postgres=# select oid, * from pg_seqam;
-[ RECORD 1 ]+--------------------
oid          | 3839
seqamname    | local
seqamalloc   | seqam_local_alloc
seqamsetval  | seqam_local_setval
seqamoptions | seqam_local_options

postgres=# select relname, relam from pg_class where relname = 'foo2';
 relname | relam
---------+-------
 foo2    |  3839

postgres=# create sequence foo5 using global;
ERROR:  access method "global" does not exist

Footprint
 backend/access/Makefile            |    2
 backend/access/common/reloptions.c |   26 +++
 backend/access/sequence/Makefile   |   17 ++
 backend/access/sequence/seqam.c    |  278 +++++++++++++++++++++++++++++++++++++
 backend/catalog/Makefile           |    2
 backend/commands/sequence.c        |  132 +++++++++++++++--
 backend/commands/tablecmds.c       |    3
 backend/nodes/copyfuncs.c          |    4
 backend/nodes/equalfuncs.c         |    4
 backend/parser/gram.y              |   84 ++++++++++-
 backend/parser/parse_utilcmd.c     |    4
 backend/utils/cache/catcache.c     |    6
 backend/utils/cache/syscache.c     |   23 +++
 backend/utils/misc/guc.c           |   12 +
 include/access/reloptions.h        |    6
 include/access/seqam.h             |   27 +++
 include/catalog/indexing.h         |    5
 include/catalog/pg_proc.h          |    6
 include/catalog/pg_seqam.h         |   70 +++++++++
 include/nodes/parsenodes.h         |    8 -
 include/utils/guc.h                |    1
 include/utils/rel.h                |   22 +-
 include/utils/syscache.h           |    2
 23 files changed, 706 insertions(+), 38 deletions(-)

Tasks to complete
* contrib module for example/testing
* Docs

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:

> SeqAm allows you to specify a plugin that alters the behaviour for
> sequence allocation and resetting, aimed specifically at clustering
> systems.
>
> New command options on end of statement allow syntax
>   CREATE SEQUENCE foo_seq
>       USING globalseq

Production version of this, ready for commit to PG 9.4

Includes test extension which allows sequences without gaps - "gapless".

Test using seq_test.psql after creating extension.

No dependencies on other patches.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 14.11.2013 22:10, Simon Riggs wrote:
> On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> SeqAm allows you to specify a plugin that alters the behaviour for
>> sequence allocation and resetting, aimed specifically at clustering
>> systems.
>>
>> New command options on end of statement allow syntax
>>    CREATE SEQUENCE foo_seq
>>        USING globalseq
>
> Production version of this, ready for commit to PG 9.4
>
> Includes test extension which allows sequences without gaps - "gapless".
>
> Test using seq_test.psql after creating extension.
>
> No dependencies on other patches.

It's pretty hard to review the this without seeing the "other" 
implementation you're envisioning to use this API. But I'll try:

I wonder if the level of abstraction is right. The patch assumes that 
the on-disk storage of all sequences is the same, so the access method 
can't change that. But then it leaves the details of actually updating 
the on-disk block, WAL-logging and all, as the responsibility of the 
access method. Surely that's going to look identical in all the seqams, 
if they all use the same on-disk format. That also means that the 
sequence access methods can't be implemented as plugins, as plugins 
can't do WAL-logging.

The comment in seqam.c says that there's a private column reserved for 
access method-specific data, called am_data, but that seems to be the 
only mention of am_data in the patch.

- Heikki



Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
> It's pretty hard to review the this without seeing the "other"
> implementation you're envisioning to use this API. But I'll try:

We've written a distributed sequence implementation against it, so it
works at least for that use case.

While certainly not release worthy, it still might be interesting to
look at.

http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr

bdr_sequencer_fill_sequence pre-allocates ranges of values,
bdr_sequence_alloc returns them upon nextval().

> I wonder if the level of abstraction is right. The patch assumes that the
> on-disk storage of all sequences is the same, so the access method can't
> change that.  But then it leaves the details of actually updating the on-disk
> block, WAL-logging and all, as the responsibility of the access method.
> Surely that's going to look identical in all the seqams, if they all use the
> same on-disk format. That also means that the sequence access methods can't
> be implemented as plugins, as plugins can't do WAL-logging.

Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_squence that allows to do quite a bit of different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

> The comment in seqam.c says that there's a private column reserved for
> access method-specific data, called am_data, but that seems to be the only
> mention of am_data in the patch.

It's amdata, not am_data :/. Directly at the end of pg_sequence.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 15 November 2013 15:08, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> I wonder if the level of abstraction is right.

That is the big question and not something to shy away from.

What is presented is not the first thought, by a long way. Andres'
contribution to the patch is mainly around this point, so the seq am
is designed with the needs of the main use case in mind.

I'm open to suggested changes but I would say that practical usage
beats changes suggested for purity.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Peter Eisentraut
Date:
On 11/14/13, 3:10 PM, Simon Riggs wrote:
> On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
> 
>> SeqAm allows you to specify a plugin that alters the behaviour for
>> sequence allocation and resetting, aimed specifically at clustering
>> systems.
>>
>> New command options on end of statement allow syntax
>>   CREATE SEQUENCE foo_seq
>>       USING globalseq
> 
> Production version of this, ready for commit to PG 9.4

This patch doesn't apply anymore.

Also, you set this to "returned with feedback" in the CF app.  Please
verify whether that was intentional.




Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 15 November 2013 15:48, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 11/14/13, 3:10 PM, Simon Riggs wrote:
>> On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>>> SeqAm allows you to specify a plugin that alters the behaviour for
>>> sequence allocation and resetting, aimed specifically at clustering
>>> systems.
>>>
>>> New command options on end of statement allow syntax
>>>   CREATE SEQUENCE foo_seq
>>>       USING globalseq
>>
>> Production version of this, ready for commit to PG 9.4
>
> This patch doesn't apply anymore.

Yes, a patch conflict was just committed, will repost.

> Also, you set this to "returned with feedback" in the CF app.  Please
> verify whether that was intentional.

Not sure that was me, if so, corrected.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 15.11.2013 21:00, Simon Riggs wrote:
> On 15 November 2013 15:48, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Also, you set this to "returned with feedback" in the CF app.  Please
>> verify whether that was intentional.
>
> Not sure that was me, if so, corrected.

It was me, sorry. I figured this needs such a large restructuring that 
it's not going to be committable in this commitfest. But I'm happy to 
keep the discussion going if you think otherwise.

- Heikki



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 15.11.2013 20:21, Andres Freund wrote:
> On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
>> It's pretty hard to review the this without seeing the "other"
>> implementation you're envisioning to use this API. But I'll try:
>
> We've written a distributed sequence implementation against it, so it
> works at least for that use case.
>
> While certainly not release worthy, it still might be interesting to
> look at.
>
https://urldefense.proofpoint.com/v1/url?u=http://git.postgresql.org/gitweb/?p%3Dusers/andresfreund/postgres.git%3Ba%3Dblob%3Bf%3Dcontrib/bdr/bdr_seq.c%3Bh%3Dc9480c8021882f888e35080f0e7a7494af37ae27%3Bhb%3Dbdr&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=xGch4oNJbpD%2BKPJECmgw4SLBhytSZLBX7UnkZhtNcpw%3D%0A&m=Rbmo%2BDar4PZrQmGH2adz7cCgqRl2%2FXH845YIA1ifS7A%3D%0A&s=8d287f35070fe7cb586f10b5bfe8664ad29e986b5f2d2286c4109e09f615668d
>
> bdr_sequencer_fill_sequence pre-allocates ranges of values,
> bdr_sequence_alloc returns them upon nextval().

Thanks. That pokes into the low-level details of sequence tuples, just 
like I was afraid it would.

>> I wonder if the level of abstraction is right. The patch assumes that the
>> on-disk storage of all sequences is the same, so the access method can't
>> change that.  But then it leaves the details of actually updating the on-disk
>> block, WAL-logging and all, as the responsibility of the access method.
>> Surely that's going to look identical in all the seqams, if they all use the
>> same on-disk format. That also means that the sequence access methods can't
>> be implemented as plugins, as plugins can't do WAL-logging.
>
> Well, it exposes log_sequence_tuple() - together with the added "am
> private" column of pg_squence that allows to do quite a bit of different
> things. I think unless we really implement pluggable RMGRs - which I
> don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need 
to touch anything but the private amdata field. log_sequence_tuple() is 
way too low-level.

Just as a thought-experiment, imagine that we decided to re-implement 
sequences so that all the sequence values are stored in one big table, 
or flat-file in the data directory, instead of the current 
implementation where we have a one-block relation file for each 
sequence. If the sequence access method API is any good, it should stay 
unchanged. That's clearly not the case with the proposed API.

>> The comment in seqam.c says that there's a private column reserved for
>> access method-specific data, called am_data, but that seems to be the only
>> mention of am_data in the patch.
>
> It's amdata, not am_data :/. Directly at the end of pg_sequence.

Ah, got it.


Stepping back a bit and looking at this problem from a higher level, why 
do you need to hack this stuff into the sequences? Couldn't you just 
define a new set of functions, say bdr_currval() and bdr_nextval(), to 
operate on these new kinds of sequences? You're not making much use of 
the existing sequence infrastructure, anyway, so it might be best to 
just keep the implementation completely separate. If you need it for 
compatibility with applications, you could create facade 
currval/nextval() functions that call the built-in version or the bdr 
version depending on the argument.

- Heikki



Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 20:21, Andres Freund wrote:
> >Well, it exposes log_sequence_tuple() - together with the added "am
> >private" column of pg_squence that allows to do quite a bit of different
> >things. I think unless we really implement pluggable RMGRs - which I
> >don't really see coming - we cannot be radically different.
> 
> The proposed abstraction leaks like a sieve. The plugin should not need to
> touch anything but the private amdata field. log_sequence_tuple() is way too
> low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

If you look at what e.g the _alloc callback currently gets passed.
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to setup state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuples changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence->amdata               ->is_called               ->last_value               ->log_cnt
SeqTable        ->last
SeqTable        ->cached
SeqTable        ->last_valid

You need is_called, last_valid because of our current representation of
sequences state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* Amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are you WAL log at once are.  Which e.g. you need to
controlif want a sequence that doesn't leak values in crashes
 
* cached is needed to control the per-backend caching.


> Just as a thought-experiment, imagine that we decided to re-implement
> sequences so that all the sequence values are stored in one big table, or
> flat-file in the data directory, instead of the current implementation where
> we have a one-block relation file for each sequence. If the sequence access
> method API is any good, it should stay unchanged. That's clearly not the
> case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non SQL accessible API we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

> Stepping back a bit and looking at this problem from a higher level, why do
> you need to hack this stuff into the sequences? Couldn't you just define a
> new set of functions, say bdr_currval() and bdr_nextval(), to operate on
> these new kinds of sequences?

Basically two things:
a) User interface. For one everyone would need to reimplement the entire
sequence DDL from start. For another it means it's hard to write
(client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s which we cannot do from a out-of-tree code?

> You're not making much use of the existing sequence infrastructure, anyway, so it might be best to just keep the
> implementation completely separate. If you need it for compatibility with
> applications, you could create facade currval/nextval() functions that call
> the built-in version or the bdr version depending on the argument.

That doesn't get you very far:
a) the default functions created by sequences are pg_catalog
prefixed. So you'd need to hack up the catalog to get your own functions
to be used if you want the application to work transparently. In which
you need to remember the former function because you now cannot call it
normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state
across calls, the locking, the WAL logging, DDL support.  Pretty much
the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 18.11.2013 11:48, Andres Freund wrote:
> On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
>> On 15.11.2013 20:21, Andres Freund wrote:
>>> Well, it exposes log_sequence_tuple() - together with the added "am
>>> private" column of pg_squence that allows to do quite a bit of different
>>> things. I think unless we really implement pluggable RMGRs - which I
>>> don't really see coming - we cannot be radically different.
>>
>> The proposed abstraction leaks like a sieve. The plugin should not need to
>> touch anything but the private amdata field. log_sequence_tuple() is way too
>> low-level.
>
> Why? It's *less* low level than a good part of other crash recovery code
> we have. I have quite some doubts about the layering, but it's imo
> pretty sensible to centralize the wal logging code that plugins couldn't
> write.

It doesn't go far enough, it's still too *low*-level. The sequence AM 
implementation shouldn't need to have direct access to the buffer page 
at all.

> If you look at what e.g the _alloc callback currently gets passed.
> a) Relation: Important for metadata like TupleDesc, name etc.
> b) SeqTable entry: Important to setup state for cur/lastval
> c) Buffer of the tuple: LSN etc.
> d) HeapTuple itself: a place to store the tuples changed state
>
> And if you then look at what gets modified in that callback:
> Form_pg_sequence->amdata
>                  ->is_called
>                  ->last_value
>                  ->log_cnt
> SeqTable        ->last
> SeqTable        ->cached
> SeqTable        ->last_valid
>
> You need is_called, last_valid because of our current representation of
> sequences state (which we could change, to remove that need). The rest
> contains values that need to be set depending on how you want your
> sequence to behave:
> * Amdata is obvious.
> * You need log_cnt to influence/remember how big the chunks are you WAL log at
>    once are.  Which e.g. you need to control if want a sequence that
>    doesn't leak values in crashes
> * cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The 
caching is done outside the AM. And log_cnt probably should be passed to 
the _alloc function directly as an argument, ie. the server code asks 
the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

int64 value - the first value allocated.
int nvalues - the number of values allocated.
am_private - updated private data.

The backend code handles the caching and logging of values. When it has 
exhausted all the cached values (or doesn't have any yet), it calls the 
AM's alloc function to get a new batch. The AM returns the new batch, 
and updates its private state as necessary. Then the backend code 
updates the relation file with the new values and the AM's private data. 
WAL-logging and checkpointing is the backend's responsibility.

>> Just as a thought-experiment, imagine that we decided to re-implement
>> sequences so that all the sequence values are stored in one big table, or
>> flat-file in the data directory, instead of the current implementation where
>> we have a one-block relation file for each sequence. If the sequence access
>> method API is any good, it should stay unchanged. That's clearly not the
>> case with the proposed API.
>
> I don't think we can entirely abstract away the reality that now they are
> based on relations and might not be at some later point. Otherwise we'd
> have to invent an API that copied all the data that's stored in struct
> RelationData. Like name, owner, ...
> Which non SQL accessible API we provide has a chance of providing that
> level of consistency in the face of fundamental refactoring? I'd say
> none.

I'm not thinking that we'd change sequences to not be relations. I'm 
thinking that we might not want to store the state as a one-page file 
anymore. In fact that was just discussed in the other thread titled 
"init_sequence spill to hash table".

> b) It's not actually easy to get similar semantics in "userspace". How
> would you emulate the crash-safe but non-transactional semantics of
> sequences without copying most of sequence.c? Without writing
> XLogInsert()s which we cannot do from a out-of-tree code?

heap_inplace_update()

- Heikki



Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote:
> On 18.11.2013 11:48, Andres Freund wrote:
> I don't think the sequence AM should be in control of 'cached'. The caching
> is done outside the AM. And log_cnt probably should be passed to the _alloc
> function directly as an argument, ie. the server code asks the AM to
> allocate N new values in one call.

Sounds sane.

> I'm thinking that the alloc function should look something like this:
> 
> seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

I don't think we can avoid giving access to the other columns of
pg_sequence, stuff like increment, limits et all need to be available
for reading, so that'd also need to get passed in. And we need to signal
that am_private can be NULL, otherwise we'd end up with ugly ways to
signal that.
So I'd say to pass in the entire tuple, and return a copy? Alternatively
we can return am_private as a 'struct varlena *', so we can properly
signal empty values.

We also need a way to set am_private from outside
seqam_alloc/setval/... Many of the fancier sequences that people talked
about will need preallocation somewhere in the background. As proposed
that's easy enough to write using log_sequence_tuple(), this way we'd
need something that calls a callback with the appropriate buffer lock
held. So maybe a seqam_update(Relation seqrel, ...) callback that get's
called when somebody executes pg_sequence_update(oid)?

It'd probably a good idea to provide a generic function for checking
whether a new value falls in the boundaries of the sequence's min, max +
error handling.

> I'm not thinking that we'd change sequences to not be relations. I'm
> thinking that we might not want to store the state as a one-page file
> anymore. In fact that was just discussed in the other thread titled
> "init_sequence spill to hash table".

Yes, I read and even commented in that thread... But nothing in the
current proposed API would prevent you from going in that direction,
you'd just get passed in a different tuple/buffer.

> >b) It's not actually easy to get similar semantics in "userspace". How
> >would you emulate the crash-safe but non-transactional semantics of
> >sequences without copying most of sequence.c? Without writing
> >XLogInsert()s which we cannot do from a out-of-tree code?
> 
> heap_inplace_update()

That gets the crashsafe part partially (doesn't allow making the tuple
wider than before), but not the caching/stateful part et al. The point
is that you need most of sequence.c again.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> It doesn't go far enough, it's still too *low*-level. The sequence AM
> implementation shouldn't need to have direct access to the buffer page at
> all.

> I don't think the sequence AM should be in control of 'cached'. The caching
> is done outside the AM. And log_cnt probably should be passed to the _alloc
> function directly as an argument, ie. the server code asks the AM to
> allocate N new values in one call.

I can't see what the rationale of your arguments is. All index Ams
write WAL and control buffer locking etc..

Do you have a new use case that shows why changes should happen? We
can't just redesign things based upon arbitrary decisions about what
things should or should not be possible via the API.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 18.11.2013 13:48, Simon Riggs wrote:
> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>> implementation shouldn't need to have direct access to the buffer page at
>> all.
>
>> I don't think the sequence AM should be in control of 'cached'. The caching
>> is done outside the AM. And log_cnt probably should be passed to the _alloc
>> function directly as an argument, ie. the server code asks the AM to
>> allocate N new values in one call.
>
> I can't see what the rationale of your arguments is. All index Ams
> write WAL and control buffer locking etc..

Index AM's are completely responsible for the on-disk structure, while 
with the proposed API, both the AM and the backend are intimately aware 
of the on-disk representation. Such a shared responsibility is not a 
good thing in an API. I would also be fine with going 100% to the index 
AM direction, and remove all knowledge of the on-disk layout from the 
backend code and move it into the AMs. Then you could actually implement 
the discussed "store all sequences in a single file" change by writing a 
new sequence AM for it.

- Heikki



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 14.11.2013 22:10, Simon Riggs wrote:
> Includes test extension which allows sequences without gaps - "gapless".

I realize this is just for demonstration purposes, but it's worth noting 
that it doesn't actually guarantee that when you use the sequence to 
populate a column in the table, the column would not have gaps. 
Sequences are not transactional, so rollbacks will still produce gaps. 
The documentation is misleading on that point. Without a strong 
guarantee, it's a pretty useless extension.

- Heikki



Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 14.11.2013 22:10, Simon Riggs wrote:
>>
>> Includes test extension which allows sequences without gaps - "gapless".
>
>
> I realize this is just for demonstration purposes, but it's worth noting
> that it doesn't actually guarantee that when you use the sequence to
> populate a column in the table, the column would not have gaps. Sequences
> are not transactional, so rollbacks will still produce gaps. The
> documentation is misleading on that point. Without a strong guarantee, it's
> a pretty useless extension.

True.

If I fix that problem, I should change the name to "lockup" sequences,
since only one transaction at a time could use the nextval.

Should I change the documentation, or just bin the idea?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 18.11.2013 13:48, Simon Riggs wrote:
>>
>> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com>
>> wrote:
>>
>>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>>> implementation shouldn't need to have direct access to the buffer page at
>>> all.
>>
>>
>>> I don't think the sequence AM should be in control of 'cached'. The
>>> caching
>>> is done outside the AM. And log_cnt probably should be passed to the
>>> _alloc
>>> function directly as an argument, ie. the server code asks the AM to
>>> allocate N new values in one call.
>>
>>
>> I can't see what the rationale of your arguments is. All index Ams
>> write WAL and control buffer locking etc..
>
>
> Index AM's are completely responsible for the on-disk structure, while with
> the proposed API, both the AM and the backend are intimately aware of the
> on-disk representation. Such a shared responsibility is not a good thing in
> an API. I would also be fine with going 100% to the index AM direction, and
> remove all knowledge of the on-disk layout from the backend code and move it
> into the AMs. Then you could actually implement the discussed "store all
> sequences in a single file" change by writing a new sequence AM for it.

I think the way to resolve this is to do both of these things, i.e. a
two level API

1. Implement SeqAM API at the most generic level. Add a nextval() call
as well as alloc()

2. Also implement the proposed changes to alloc()

So the SeqAM would implement either nextval() or alloc() but not both

global sequences as envisaged for BDR would use a special alloc() call.

I don't think that is too much work, but I want to do this just once...

Thoughts on exact next steps for implementation please?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 24.11.2013 19:23, Simon Riggs wrote:
> On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 18.11.2013 13:48, Simon Riggs wrote:
>>>
>>> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com>
>>> wrote:
>>>
>>>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>>>> implementation shouldn't need to have direct access to the buffer page at
>>>> all.
>>>
>>>> I don't think the sequence AM should be in control of 'cached'. The
>>>> caching
>>>> is done outside the AM. And log_cnt probably should be passed to the
>>>> _alloc
>>>> function directly as an argument, ie. the server code asks the AM to
>>>> allocate N new values in one call.
>>>
>>> I can't see what the rationale of your arguments is. All index Ams
>>> write WAL and control buffer locking etc..
>>
>> Index AM's are completely responsible for the on-disk structure, while with
>> the proposed API, both the AM and the backend are intimately aware of the
>> on-disk representation. Such a shared responsibility is not a good thing in
>> an API. I would also be fine with going 100% to the index AM direction, and
>> remove all knowledge of the on-disk layout from the backend code and move it
>> into the AMs. Then you could actually implement the discussed "store all
>> sequences in a single file" change by writing a new sequence AM for it.
>
> I think the way to resolve this is to do both of these things, i.e. a
> two level API
>
> 1. Implement SeqAM API at the most generic level. Add a nextval() call
> as well as alloc()
>
> 2. Also implement the proposed changes to alloc()

The proposed changes to alloc() would still suffer from all the problems 
that I complained about. Adding a new API alongside doesn't help with that.

- Heikki



Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> The proposed changes to alloc() would still suffer from all the problems
> that I complained about. Adding a new API alongside doesn't help with that.

You made two proposals. I suggested implementing both.

What would you have me do?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 11/24/13 19:15, Simon Riggs wrote:
> On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 14.11.2013 22:10, Simon Riggs wrote:
>>>
>>> Includes test extension which allows sequences without gaps - "gapless".
>>
>> I realize this is just for demonstration purposes, but it's worth noting
>> that it doesn't actually guarantee that when you use the sequence to
>> populate a column in the table, the column would not have gaps. Sequences
>> are not transactional, so rollbacks will still produce gaps. The
>> documentation is misleading on that point. Without a strong guarantee, it's
>> a pretty useless extension.
>
> True.
>
> If I fix that problem, I should change the name to "lockup" sequences,
> since only one transaction at a time could use the nextval.
>
> Should I change the documentation, or just bin the idea?

Just bin it. It would be useful if it could guarantee gaplessness, but I 
don't see how to do that.

- Heikki



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 11/25/13 12:00, Simon Riggs wrote:
> On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> The proposed changes to alloc() would still suffer from all the problems
>> that I complained about. Adding a new API alongside doesn't help with that.
>
> You made two proposals. I suggested implementing both.
>
> What would you have me do?

Dunno. I do know that the proposed changes to alloc() are not a good 
API. You could implement the other API, I think that has a chance of 
being a cleaner API, but looking at the BDR extension that would use 
that facility, I'm not sure how useful that would be for you. (and I'd 
really like to see an actual implementation of whatever API we come up 
with, before we commit to maintaining it).

- Heikki



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 18/11/13 11:50, Heikki Linnakangas wrote:
>
> I don't think the sequence AM should be in control of 'cached'. The
> caching is done outside the AM. And log_cnt probably should be passed to
> the _alloc function directly as an argument, ie. the server code asks
> the AM to allocate N new values in one call.
>
> I'm thinking that the alloc function should look something like this:
>
> seqam_alloc(Relation seqrel, int nrequested, Datum am_private)
>

I was looking at this a bit today and what I see is that it's not that 
simple.

Minimum input the seqam_alloc needs is:
- Relation seqrel
- int64 minv, maxv, incby, bool is_cycled - these are basically options 
giving info about how the new numbers are allocated (I guess some 
implementations are not going to support all of those)
- bool is_called - the current built-in sequence generator behaves 
differently based on it and I am not sure we can get over it (it could 
perhaps be done in back-end independently of AM?)
- int64 nrequested - number of requested values
- Datum am_private - current private data

In this light I agree with what Andres wrote - let's just send the whole 
Form_pg_sequence object.

Also makes me think that the seqam options interface should also be 
passed the minv/maxv/incby/is_cycled etc options for validation, not 
just the amoptions.


> And it should return:
>
> int64 value - the first value allocated.
> int nvalues - the number of values allocated.
> am_private - updated private data.
>

There is also more needed than this, you need:
- int64 value - first value allocated (value to be returned)
- int64 nvalues - number of values allocated
- int64 last - last cached value (used for cached/last_value)
- int64 next - last logged value (used for wal logging)
- am_private - updated private data, must be possible to return as null

I personally don't like that we need all the "nvalues", "next" and 
"last" as it makes the seqam a little bit too aware of the sequence 
logging internals in my opinion but I haven't found a way around it - 
it's impossible for backend to know how the AM will act around 
incby/maxv/minv/cycling so it can't really calculate these values by 
itself, unless ofcourse we fix the behavior and require seqams to behave 
predictably, but that somewhat breaks the whole idea of leaving the 
allocation to the seqam. Obviously it would also work to return list of 
allocated values and then backend could calculate the "value", 
"nvalues", "last", "next" from that list by itself, but I am worried 
about performance of that approach.

>
> The backend code handles the caching and logging of values. When it has
> exhausted all the cached values (or doesn't have any yet), it calls the
> AM's alloc function to get a new batch. The AM returns the new batch,
> and updates its private state as necessary. Then the backend code
> updates the relation file with the new values and the AM's private data.
> WAL-logging and checkpointing is the backend's responsibility.
>

Agreed here.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:
> - int64 minv, maxv, incby, bool is_cycled - these are basically options
> giving info about how the new numbers are allocated (I guess some
> implementations are not going to support all of those)
> - bool is_called - the current built-in sequence generator behaves
> differently based on it and I am not sure we can get over it (it could
> perhaps be done in back-end independently of AM?)

I think we might be able to get rid of is_called entirely. Or at least
get rid of it from the view of the AMs.

> Also makes me think that the seqam options interface should also be passed
> the minv/maxv/incby/is_cycled etc options for validation, not just the
> amoptions.

Sup.

BTW: Is 'is_cycled' a horrible name, or is that just me? Horribly easy
to confuse with the fact that a sequence has already wrapped around...

> >And it should return:
> >
> >int64 value - the first value allocated.
> >int nvalues - the number of values allocated.
> >am_private - updated private data.
> >
> 
> There is also more needed than this, you need:
> - int64 value - first value allocated (value to be returned)
> - int64 nvalues - number of values allocated
> - int64 last - last cached value (used for cached/last_value)
> - int64 next - last logged value (used for wal logging)
> - am_private - updated private data, must be possible to return as null

> I personally don't like that we need all the "nvalues", "next" and "last" as
> it makes the seqam a little bit too aware of the sequence logging internals
> in my opinion but I haven't found a way around it - it's impossible for
> backend to know how the AM will act around incby/maxv/minv/cycling so it
> can't really calculate these values by itself, unless ofcourse we fix the
> behavior and require seqams to behave predictably, but that somewhat breaks
> the whole idea of leaving the allocation to the seqam. Obviously it would
> also work to return list of allocated values and then backend could
> calculate the "value", "nvalues", "last", "next" from that list by itself,
> but I am worried about performance of that approach.

Yea, it's far from pretty.

I'm not convinced that the AM ever needs to/should care about
caching. To me that's more like a generic behaviour. So it probably
should be abstracted away from the individual AMs.

I think the allocation routine might also need to be able to indicate
whether WAL logging is needed or not.

One thing I want attention to be paid to is that the interface should be
able to support 'gapless' sequences. I.e. where nextval() (and thus
alloc) needs to wait until the last caller to it finished. That very
well can be relevant from the locking *and* WAL logging perspective.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 16/09/14 14:17, Andres Freund wrote:
> On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:
>>
>> There is also more needed than this, you need:
>> - int64 value - first value allocated (value to be returned)
>> - int64 nvalues - number of values allocated
>> - int64 last - last cached value (used for cached/last_value)
>> - int64 next - last logged value (used for wal logging)
>> - am_private - updated private data, must be possible to return as null
>
>> I personally don't like that we need all the "nvalues", "next" and "last" as
>> it makes the seqam a little bit too aware of the sequence logging internals
>> in my opinion but I haven't found a way around it - it's impossible for
>> backend to know how the AM will act around incby/maxv/minv/cycling so it
>> can't really calculate these values by itself, unless ofcourse we fix the
>> behavior and require seqams to behave predictably, but that somewhat breaks
>> the whole idea of leaving the allocation to the seqam. Obviously it would
>> also work to return list of allocated values and then backend could
>> calculate the "value", "nvalues", "last", "next" from that list by itself,
>> but I am worried about performance of that approach.
>
> Yea, it's far from pretty.
>
> I'm not convinced that the AM ever needs to/should care about
> caching. To me that's more like a generic behaviour. So it probably
> should be abstracted away from the individual AMs.
>
> I think the allocation routine might also need to be able to indicate
> whether WAL logging is needed or not.

Well that means we probably want to return first allocated value, last 
allocated value and then some boolean that tells backend if to wal log 
the sequence or not (number of values allocated does not really seem to 
be important unless I am missing something).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Hi,

I rewrote the patch with different API along the lines of what was
discussed.

The API now consists of following functions:
sequence_alloc - allocating range of new values
The function receives the sequence relation, current value, number of
requested values amdata and relevant sequence options like min/max and
returns new amdata, new current value, number of values allocated and
also if it needs wal write (that should be returned if amdata has
changed plus other reasons the AM might have to force the wal update).

sequence_setval - notification that setval is happening
This function gets sequence relation, previous value and new value plus
the amdata and returns amdata (I can imagine some complex sequence AMs
will want to throw error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing
Basically AM can call the sequence_request_update and backend will then
call the sequence_update method of an AM with current amdata and will
write the updated amdata to disk

sequence_seqparams - function to process/validate the standard sequence
options like start position, min/max, increment by etc by the AM, it's
called in addition to the standard processing

sequence_reloptions - this is the only thing that remained unchanged
from previous patch, it's meant to pass custom options to the AM

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.

For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it used to
be. There are probably some optimizations possible there but I think the
patch is good enough for the review now, so I am adding it to October
commitfest.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 10/13/2014 01:01 PM, Petr Jelinek wrote:
> Hi,
>
> I rewrote the patch with different API along the lines of what was
> discussed.

Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to 
see how it really works. The "local" one is too dummy to expose any 
possible issues.

> The API now consists of following functions:
> sequence_alloc - allocating range of new values
> The function receives the sequence relation, current value, number of
> requested values amdata and relevant sequence options like min/max and
> returns new amdata, new current value, number of values allocated and
> also if it needs wal write (that should be returned if amdata has
> changed plus other reasons the AM might have to force the wal update).
>
> sequence_setval - notification that setval is happening
> This function gets sequence relation, previous value and new value plus
> the amdata and returns amdata (I can imagine some complex sequence AMs
> will want to throw error that setval can't be done on them).
>
> sequence_request_update/sequence_update - used for background processing
> Basically AM can call the sequence_request_update and backend will then
> call the sequence_update method of an AM with current amdata and will
> write the updated amdata to disk
>
> sequence_seqparams - function to process/validate the standard sequence
> options like start position, min/max, increment by etc by the AM, it's
> called in addition to the standard processing
>
> sequence_reloptions - this is the only thing that remained unchanged
> from previous patch, it's meant to pass custom options to the AM
>
> Only the alloc and reloptions methods are required (and implemented by
> the local AM).
>
> The caching, xlog writing, updating the page, etc is handled by backend,
> the AM does not see the tuple at all. I decided to not pass even the
> struct around and just pass the relevant options because I think if we
> want to abstract the storage properly then the AM should not care about
> how the pg_sequence looks like at all, even if it means that the
> sequence_alloc parameter list is bit long.

Hmm. The division of labour between the seqam and commands/sequence.c 
still feels a bit funny. sequence.c keeps track of how many values have 
been WAL-logged, and thus usable immediately, but we still call 
sequence_alloc even when using up those already WAL-logged values. If 
you think of using this for something like a centralized sequence server 
in a replication cluster, you certainly don't want to make a call to the 
remote server for every value - you'll want to cache them.

With the "local" seqam, there are two levels of caching. Each backend 
caches some values (per the CACHE <value> option in CREATE SEQUENCE). In 
addition to that, the server WAL-logs 32 values at a time. If you have a 
remote seqam, it would most likely add a third cache, but it would 
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The 
server keeps the sequence page locked throughout nextval(). But if the 
actual state of the sequence is maintained elsewhere, there's no need to 
serialize the calls to the remote allocator, i.e. the sequence_alloc() 
calls.

I'm not exactly sure what to do about that. One option is to completely 
move the maintenance of the "current" value, i.e. sequence.last_value, 
to the seqam. That makes sense from an abstraction point of view. For 
example with a remote server managing the sequence, storing the 
"current" value in the local catalog table makes no sense as it's always 
going to be out-of-date. The local seqam would store it as part of the 
am-private data. However, you would need to move the responsibility of 
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to 
provide an API that the seqam can call to do that. Perhaps just let the 
seqam call heap_inplace_update on the sequence relation.

> For the amdata handling (which is the AM's private data variable) the
> API assumes that (Datum) 0 is NULL, this seems to work well for
> reloptions so should work here also and it simplifies things a little
> compared to passing pointers to pointers around and making sure
> everything is allocated, etc.
>
> Sadly the fact that amdata is not fixed size and can be NULL made the
> page updates of the sequence relation quite more complex that it used to
> be.

It would be nice if the seqam could define exactly the columns it needs, 
with any datatypes. There would be a set of common attributes: 
sequence_name, start_value, cache_value, increment_by, max_value, 
min_value, is_cycled. The local seqam would add "last_value", "log_cnt" 
and "is_called" to that. A remote seqam that calls out to some other 
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the 
required columns, for example.

- Heikki



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 04/11/14 13:11, Heikki Linnakangas wrote:
> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>> Hi,
>>
>> I rewrote the patch with different API along the lines of what was
>> discussed.
>
> Thanks, that's better.
>
> It would be good to see an alternative seqam to implement this API, to
> see how it really works. The "local" one is too dummy to expose any
> possible issues.
>

Yeah, I don't know what that would be, It's hard for me to find 
something that would sever purely demonstration purposes. I guess I 
could port BDR sequences to this if it would help (once we have bit more 
solid agreement that the proposed API at least theoretically seems ok so 
that I don't have to rewrite it 10 times if at all possible).

>> ...
>> Only the alloc and reloptions methods are required (and implemented by
>> the local AM).
>>
>> The caching, xlog writing, updating the page, etc is handled by backend,
>> the AM does not see the tuple at all. I decided to not pass even the
>> struct around and just pass the relevant options because I think if we
>> want to abstract the storage properly then the AM should not care about
>> how the pg_sequence looks like at all, even if it means that the
>> sequence_alloc parameter list is bit long.
>
> Hmm. The division of labour between the seqam and commands/sequence.c
> still feels a bit funny. sequence.c keeps track of how many values have
> been WAL-logged, and thus usable immediately, but we still call
> sequence_alloc even when using up those already WAL-logged values. If
> you think of using this for something like a centralized sequence server
> in a replication cluster, you certainly don't want to make a call to the
> remote server for every value - you'll want to cache them.
>
> With the "local" seqam, there are two levels of caching. Each backend
> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
> addition to that, the server WAL-logs 32 values at a time. If you have a
> remote seqam, it would most likely add a third cache, but it would
> interact in strange ways with the second cache.
>
> Considering a non-local seqam, the locking is also a bit strange. The
> server keeps the sequence page locked throughout nextval(). But if the
> actual state of the sequence is maintained elsewhere, there's no need to
> serialize the calls to the remote allocator, i.e. the sequence_alloc()
> calls.
>
> I'm not exactly sure what to do about that. One option is to completely
> move the maintenance of the "current" value, i.e. sequence.last_value,
> to the seqam. That makes sense from an abstraction point of view. For
> example with a remote server managing the sequence, storing the
> "current" value in the local catalog table makes no sense as it's always
> going to be out-of-date. The local seqam would store it as part of the
> am-private data. However, you would need to move the responsibility of
> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
> provide an API that the seqam can call to do that. Perhaps just let the
> seqam call heap_inplace_update on the sequence relation.
>

My idea of how this works is - sequence_next handles the allocation and 
the level2 caching (WAL logged cache) via amdata if it supports it, or 
returns single value if it doesn't - then the WAL will always just write 
the 1 value and there will basically be no level2 cache, since it is the 
sequence_next who controls how much will be WAL-logged, what backend 
asks for is just "suggestion".

The third level caching as you say should be solved by the 
sequence_request_update and sequence_update callback - that will enable 
the sequence AM to handle this kind of caching asynchronously without 
blocking the sequence_next unnecessarily.

That way it's possible to implement many different strategies, from no 
cache at all, to three levels of caching, with and without blocking of 
calls to sequence_next.

>> For the amdata handling (which is the AM's private data variable) the
>> API assumes that (Datum) 0 is NULL, this seems to work well for
>> reloptions so should work here also and it simplifies things a little
>> compared to passing pointers to pointers around and making sure
>> everything is allocated, etc.
>>
>> Sadly the fact that amdata is not fixed size and can be NULL made the
>> page updates of the sequence relation quite more complex that it used to
>> be.
>
> It would be nice if the seqam could define exactly the columns it needs,
> with any datatypes. There would be a set of common attributes:
> sequence_name, start_value, cache_value, increment_by, max_value,
> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
> and "is_called" to that. A remote seqam that calls out to some other
> server might store the remote server's hostname etc.
>
> There could be a seqam function that returns a TupleDesc with the
> required columns, for example.
>

Wouldn't that somewhat bloat catalog if we had new catalog table for 
each sequence AM? It also does not really solve the amdata being dynamic 
size "issue". I think just dynamic field where AM stores whatever it 
wants is enough (ie the current solution), it's just that the hackery 
that sequences storage implementation does is more visible once there 
are non-fixed fields.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 11/04/2014 11:01 PM, Petr Jelinek wrote:
> On 04/11/14 13:11, Heikki Linnakangas wrote:
>> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>>> Only the alloc and reloptions methods are required (and implemented by
>>> the local AM).
>>>
>>> The caching, xlog writing, updating the page, etc is handled by backend,
>>> the AM does not see the tuple at all. I decided to not pass even the
>>> struct around and just pass the relevant options because I think if we
>>> want to abstract the storage properly then the AM should not care about
>>> how the pg_sequence looks like at all, even if it means that the
>>> sequence_alloc parameter list is bit long.
>>
>> Hmm. The division of labour between the seqam and commands/sequence.c
>> still feels a bit funny. sequence.c keeps track of how many values have
>> been WAL-logged, and thus usable immediately, but we still call
>> sequence_alloc even when using up those already WAL-logged values. If
>> you think of using this for something like a centralized sequence server
>> in a replication cluster, you certainly don't want to make a call to the
>> remote server for every value - you'll want to cache them.
>>
>> With the "local" seqam, there are two levels of caching. Each backend
>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
>> addition to that, the server WAL-logs 32 values at a time. If you have a
>> remote seqam, it would most likely add a third cache, but it would
>> interact in strange ways with the second cache.
>>
>> Considering a non-local seqam, the locking is also a bit strange. The
>> server keeps the sequence page locked throughout nextval(). But if the
>> actual state of the sequence is maintained elsewhere, there's no need to
>> serialize the calls to the remote allocator, i.e. the sequence_alloc()
>> calls.
>>
>> I'm not exactly sure what to do about that. One option is to completely
>> move the maintenance of the "current" value, i.e. sequence.last_value,
>> to the seqam. That makes sense from an abstraction point of view. For
>> example with a remote server managing the sequence, storing the
>> "current" value in the local catalog table makes no sense as it's always
>> going to be out-of-date. The local seqam would store it as part of the
>> am-private data. However, you would need to move the responsibility of
>> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
>> provide an API that the seqam can call to do that. Perhaps just let the
>> seqam call heap_inplace_update on the sequence relation.
>
> My idea of how this works is - sequence_next handles the allocation and
> the level2 caching (WAL logged cache) via amdata if it supports it, or
> returns single value if it doesn't - then the WAL will always just write
> the 1 value and there will basically be no level2 cache, since it is the
> sequence_next who controls how much will be WAL-logged, what backend
> asks for is just "suggestion".

Hmm, so the AM might return an "nallocated" value less than the "fetch" 
value that sequence.c requested? As the patch stands, wouldn't that make 
sequence.c write a WAL record more often?

In fact, if the seqam manages the current value outside the database 
(e.g. a "remote" seqam that gets the value from another server), 
nextval() never needs to write a WAL record.

> The third level caching as you say should be solved by the
> sequence_request_update and sequence_update callback - that will enable
> the sequence AM to handle this kind of caching asynchronously without
> blocking the sequence_next unnecessarily.
>
> That way it's possible to implement many different strategies, from no
> cache at all, to three levels of caching, with and without blocking of
> calls to sequence_next.

It's nice that asynchronous operation is possible, but that seems like a 
more advanced feature. Surely an approach where you fetch a bunch of 
values when needed is going to be more common. Let's focus on the simple 
synchronous case first, and make sure the API works well for that.

>>> For the amdata handling (which is the AM's private data variable) the
>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>> reloptions so should work here also and it simplifies things a little
>>> compared to passing pointers to pointers around and making sure
>>> everything is allocated, etc.
>>>
>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>> page updates of the sequence relation quite more complex that it used to
>>> be.
>>
>> It would be nice if the seqam could define exactly the columns it needs,
>> with any datatypes. There would be a set of common attributes:
>> sequence_name, start_value, cache_value, increment_by, max_value,
>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>> and "is_called" to that. A remote seqam that calls out to some other
>> server might store the remote server's hostname etc.
>>
>> There could be a seqam function that returns a TupleDesc with the
>> required columns, for example.
>
> Wouldn't that somewhat bloat catalog if we had new catalog table for
> each sequence AM?

No, that's not what I meant. The number of catalog tables would be the 
same as today. Sequences look much like any other relation, with entries 
in pg_attribute catalog table for all the attributes for each sequence. 
Currently, all sequences have the same set of attributes, sequence_name, 
last_value and so forth. What I'm proposing is that there would a set of 
attributes that are common to all sequences, but in addition to that 
there could be any number of AM-specific attributes.

> It also does not really solve the amdata being dynamic
> size "issue".

Yes it would. There would not be a single amdata attribute, but the AM 
could specify any number of custom attributes, which could be fixed size 
or varlen. It would be solely the AM's responsibility to set the values 
of those attributes.

> I think just dynamic field where AM stores whatever it
> wants is enough (ie the current solution), it's just that the hackery
> that sequences storage implementation does is more visible once there
> are non-fixed fields.

Using a single amdata field is like creating a table with a single text 
column, and putting all your data in the single column separated by 
tabs. I'm sure you can make it work, but normalized data is much nicer 
to work with.

- Heikki




Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 05/11/14 13:45, Heikki Linnakangas wrote:
> On 11/04/2014 11:01 PM, Petr Jelinek wrote:
>> On 04/11/14 13:11, Heikki Linnakangas wrote:
>>> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>>>> Only the alloc and reloptions methods are required (and implemented by
>>>> the local AM).
>>>>
>>>> The caching, xlog writing, updating the page, etc is handled by
>>>> backend,
>>>> the AM does not see the tuple at all. I decided to not pass even the
>>>> struct around and just pass the relevant options because I think if we
>>>> want to abstract the storage properly then the AM should not care about
>>>> how the pg_sequence looks like at all, even if it means that the
>>>> sequence_alloc parameter list is bit long.
>>>
>>> Hmm. The division of labour between the seqam and commands/sequence.c
>>> still feels a bit funny. sequence.c keeps track of how many values have
>>> been WAL-logged, and thus usable immediately, but we still call
>>> sequence_alloc even when using up those already WAL-logged values. If
>>> you think of using this for something like a centralized sequence server
>>> in a replication cluster, you certainly don't want to make a call to the
>>> remote server for every value - you'll want to cache them.
>>>
>>> With the "local" seqam, there are two levels of caching. Each backend
>>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
>>> addition to that, the server WAL-logs 32 values at a time. If you have a
>>> remote seqam, it would most likely add a third cache, but it would
>>> interact in strange ways with the second cache.
>>>
>>> Considering a non-local seqam, the locking is also a bit strange. The
>>> server keeps the sequence page locked throughout nextval(). But if the
>>> actual state of the sequence is maintained elsewhere, there's no need to
>>> serialize the calls to the remote allocator, i.e. the sequence_alloc()
>>> calls.
>>>
>>> I'm not exactly sure what to do about that. One option is to completely
>>> move the maintenance of the "current" value, i.e. sequence.last_value,
>>> to the seqam. That makes sense from an abstraction point of view. For
>>> example with a remote server managing the sequence, storing the
>>> "current" value in the local catalog table makes no sense as it's always
>>> going to be out-of-date. The local seqam would store it as part of the
>>> am-private data. However, you would need to move the responsibility of
>>> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
>>> provide an API that the seqam can call to do that. Perhaps just let the
>>> seqam call heap_inplace_update on the sequence relation.
>>
>> My idea of how this works is - sequence_next handles the allocation and
>> the level2 caching (WAL logged cache) via amdata if it supports it, or
>> returns single value if it doesn't - then the WAL will always just write
>> the 1 value and there will basically be no level2 cache, since it is the
>> sequence_next who controls how much will be WAL-logged, what backend
>> asks for is just "suggestion".
>
> Hmm, so the AM might return an "nallocated" value less than the "fetch"
> value that sequence.c requested? As the patch stands, wouldn't that make
> sequence.c write a WAL record more often?
>

That's correct, that's also why you usually want to have some form of 
local caching if possible.

> In fact, if the seqam manages the current value outside the database
> (e.g. a "remote" seqam that gets the value from another server),
> nextval() never needs to write a WAL record.

Sure it does, you need to keep the current state in Postgres also, at 
least the current value so that you can pass correct input to 
sequence_alloc(). And you need to do this in crash-safe way so WAL is 
necessary.

I think sequences will cache in amdata if possible for that type of 
sequence and in case where it's not and it's caching on some external 
server then you'll most likely get bigger overhead from network than WAL 
anyway...

>
>>>> For the amdata handling (which is the AM's private data variable) the
>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>> reloptions so should work here also and it simplifies things a little
>>>> compared to passing pointers to pointers around and making sure
>>>> everything is allocated, etc.
>>>>
>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>> page updates of the sequence relation quite more complex that it
>>>> used to
>>>> be.
>>>
>>> It would be nice if the seqam could define exactly the columns it needs,
>>> with any datatypes. There would be a set of common attributes:
>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>> and "is_called" to that. A remote seqam that calls out to some other
>>> server might store the remote server's hostname etc.
>>>
>>> There could be a seqam function that returns a TupleDesc with the
>>> required columns, for example.
>>
>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>> each sequence AM?
>
> No, that's not what I meant. The number of catalog tables would be the
> same as today. Sequences look much like any other relation, with entries
> in pg_attribute catalog table for all the attributes for each sequence.
> Currently, all sequences have the same set of attributes, sequence_name,
> last_value and so forth. What I'm proposing is that there would a set of
> attributes that are common to all sequences, but in addition to that
> there could be any number of AM-specific attributes.

Oh, that's interesting idea, so the AM interfaces would basically return 
updated tuple and there would be some description function that returns 
tupledesc. I am bit worried that this would kill any possibility of 
ALTER SEQUENCE USING access_method. Plus I don't think it actually 
solves any real problem - serializing the internal C structs into bytea 
is not any harder than serializing them into tuple IMHO.

>
>> It also does not really solve the amdata being dynamic
>> size "issue".
>
> Yes it would. There would not be a single amdata attribute, but the AM
> could specify any number of custom attributes, which could be fixed size
> or varlen. It would be solely the AM's responsibility to set the values
> of those attributes.
>

That's not the issue I was referring to, I was talking about the page 
replacement code which is not as simple now that we have potentially 
dynamic size tuple and if tuples were different for different AMs the 
code would still have to be able to handle that case. Setting the values 
in tuple itself is not too complicated.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 11/05/2014 05:07 PM, Petr Jelinek wrote:
> On 05/11/14 13:45, Heikki Linnakangas wrote:
>> In fact, if the seqam manages the current value outside the database
>> (e.g. a "remote" seqam that gets the value from another server),
>> nextval() never needs to write a WAL record.
>
> Sure it does, you need to keep the current state in Postgres also, at
> least the current value so that you can pass correct input to
> sequence_alloc(). And you need to do this in crash-safe way so WAL is
> necessary.

Why does sequence_alloc need the current value? If it's a "remote" 
seqam, the current value is kept in the remote server, and the last 
value that was given to this PostgreSQL server is irrelevant.

That irks me with this API. The method for acquiring a new value isn't 
fully abstracted behind the AM interface, as sequence.c still needs to 
track it itself. That's useful for the local AM, of course, and maybe 
some others, but for others it's totally useless.

>>>>> For the amdata handling (which is the AM's private data variable) the
>>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>>> reloptions so should work here also and it simplifies things a little
>>>>> compared to passing pointers to pointers around and making sure
>>>>> everything is allocated, etc.
>>>>>
>>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>>> page updates of the sequence relation quite more complex that it
>>>>> used to
>>>>> be.
>>>>
>>>> It would be nice if the seqam could define exactly the columns it needs,
>>>> with any datatypes. There would be a set of common attributes:
>>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>>> and "is_called" to that. A remote seqam that calls out to some other
>>>> server might store the remote server's hostname etc.
>>>>
>>>> There could be a seqam function that returns a TupleDesc with the
>>>> required columns, for example.
>>>
>>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>>> each sequence AM?
>>
>> No, that's not what I meant. The number of catalog tables would be the
>> same as today. Sequences look much like any other relation, with entries
>> in pg_attribute catalog table for all the attributes for each sequence.
>> Currently, all sequences have the same set of attributes, sequence_name,
>> last_value and so forth. What I'm proposing is that there would a set of
>> attributes that are common to all sequences, but in addition to that
>> there could be any number of AM-specific attributes.
>
> Oh, that's interesting idea, so the AM interfaces would basically return
> updated tuple and there would be some description function that returns
> tupledesc.

Yeah, something like that.

> I am bit worried that this would kill any possibility of
> ALTER SEQUENCE USING access_method. Plus I don't think it actually
> solves any real problem - serializing the internal C structs into bytea
> is not any harder than serializing them into tuple IMHO.

I agree that serialization to bytea isn't that difficult, but it's still 
nicer to work directly with the correct data types. And it makes the 
internal state easily accessible for monitoring and debugging purposes.

>>> It also does not really solve the amdata being dynamic
>>> size "issue".
>>
>> Yes it would. There would not be a single amdata attribute, but the AM
>> could specify any number of custom attributes, which could be fixed size
>> or varlen. It would be solely the AM's responsibility to set the values
>> of those attributes.
>>
>
> That's not the issue I was referring to, I was talking about the page
> replacement code which is not as simple now that we have potentially
> dynamic size tuple and if tuples were different for different AMs the
> code would still have to be able to handle that case. Setting the values
> in tuple itself is not too complicated.

I don't see the problem with that. We deal with variable-sized tuples in 
heap pages all the time. The max size of amdata (or the extra 
AM-specific columns) is going to be determined by the block size, though.

- Heikki



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 05/11/14 18:32, Heikki Linnakangas wrote:
> On 11/05/2014 05:07 PM, Petr Jelinek wrote:
>> On 05/11/14 13:45, Heikki Linnakangas wrote:
>>> In fact, if the seqam manages the current value outside the database
>>> (e.g. a "remote" seqam that gets the value from another server),
>>> nextval() never needs to write a WAL record.
>>
>> Sure it does, you need to keep the current state in Postgres also, at
>> least the current value so that you can pass correct input to
>> sequence_alloc(). And you need to do this in crash-safe way so WAL is
>> necessary.
>
> Why does sequence_alloc need the current value? If it's a "remote"
> seqam, the current value is kept in the remote server, and the last
> value that was given to this PostgreSQL server is irrelevant.

Hmm, I am not sure if I see this usecase as practical TBH, but I also 
don't see fundamental problem with it.

> That irks me with this API. The method for acquiring a new value isn't
> fully abstracted behind the AM interface, as sequence.c still needs to
> track it itself. That's useful for the local AM, of course, and maybe
> some others, but for others it's totally useless.

Hmm, I think that kind of abstraction can only be done by passing 
current tuple and returning updated tuple (yes I realize that it's what 
you have been saying basically).

In general it sounds like the level of abstraction you'd want would be 
one where AM cares about everything except the the code that does the 
actual writes to page and WAL (but when to do those would still be 
controlled completely by AM?) and the SQL interface.
I don't see how to make that work with ALTER SEQUENCE USING to be honest 
and I do care quite a lot about that use-case (I think the ability to 
convert the "local" sequences to 3rd party ones and back is very important).

>>
>> That's not the issue I was referring to, I was talking about the page
>> replacement code which is not as simple now that we have potentially
>> dynamic size tuple and if tuples were different for different AMs the
>> code would still have to be able to handle that case. Setting the values
>> in tuple itself is not too complicated.
>
> I don't see the problem with that. We deal with variable-sized tuples in
> heap pages all the time. The max size of amdata (or the extra
> AM-specific columns) is going to be determined by the block size, though.
>

Glad to hear that. Yes the limit is block size, I think we can live with 
that at least for the moment...

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Craig Ringer
Date:
On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> I guess I could port BDR sequences to this if it would help (once we
> have bit more solid agreement that the proposed API at least
> theoretically seems ok so that I don't have to rewrite it 10 times if at
> all possible).

Because the BDR sequences rely on all the other BDR machinery I suspect
that'd be a pretty big thing to review and follow for someone who
doesn't know the BDR code.

Do you think it'd be simple to provide a blocking, transactional
sequence allocator via this API? i.e. gapless sequences, much the same
as typically implemented with SELECT ... FOR UPDATE on a counter table.

It might be more digestible standalone, and would be a handy contrib/
example extension demonstrating use of the API.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Alvaro Herrera
Date:
Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> > I guess I could port BDR sequences to this if it would help (once we
> > have bit more solid agreement that the proposed API at least
> > theoretically seems ok so that I don't have to rewrite it 10 times if at
> > all possible).
> 
> Because the BDR sequences rely on all the other BDR machinery I suspect
> that'd be a pretty big thing to review and follow for someone who
> doesn't know the BDR code.
> 
> Do you think it'd be simple to provide a blocking, transactional
> sequence allocator via this API? i.e. gapless sequences, much the same
> as typically implemented with SELECT ... FOR UPDATE on a counter table.

I think that would be a useful contrib module too; gapless allocation is
a frequent user need.  If we can build it in some reasonable way with the
seqam API, it'd show the API is good.

But on the other hand, since BDR sequences are the ultimate goal of this
patch, we need to make sure that they can be made to work with whatever
the seqam API ends up being.  It's not necessary for reviewers here to
review that code, but the BDR developers need to say "yes, this API
works for us" -- just like the slony developers (well, ssinger) did for
logical decoding.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 06/11/14 11:22, Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
>> I guess I could port BDR sequences to this if it would help (once we
>> have bit more solid agreement that the proposed API at least
>> theoretically seems ok so that I don't have to rewrite it 10 times if at
>> all possible).
>
> Because the BDR sequences rely on all the other BDR machinery I suspect
> that'd be a pretty big thing to review and follow for someone who
> doesn't know the BDR code.
>
> Do you think it'd be simple to provide a blocking, transactional
> sequence allocator via this API? i.e. gapless sequences, much the same
> as typically implemented with SELECT ... FOR UPDATE on a counter table.
>
> It might be more digestible standalone, and would be a handy contrib/
> example extension demonstrating use of the API.
>

Yes I think that's doable (once we iron out the API we can agree on).


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that
use-case(I think the ability to convert the "local" sequences to 3rd party ones and back is very important). 

What specific problems do you foresee?  There's an issue if something depends on one of the added sequence columns, but
ifthat is the case then you had *better* fail. 

I think that the debugability value of making extra sequence columns human-readable is quite high.

...Robert


Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 08/11/14 00:45, Robert Haas wrote:
> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that
use-case(I think the ability to convert the "local" sequences to 3rd party ones and back is very important).
 
>
> What specific problems do you foresee?  There's an issue if something depends on one of the added sequence columns,
butif that is the case then you had *better* fail.
 
>
> I think that the debugability value of making extra sequence columns human-readable is quite high.
>

My main problem is actually not with having tuple per-seqAM, but more 
with the fact that Heikki does not want to have last_value as compulsory 
column/parameter. How is the new AM then supposed to know where to pick 
up and if it even can pick up?

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 08/11/14 00:57, Petr Jelinek wrote:
> On 08/11/14 00:45, Robert Haas wrote:
>> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> I don't see how to make that work with ALTER SEQUENCE USING to be
>>> honest and I do care quite a lot about that use-case (I think the
>>> ability to convert the "local" sequences to 3rd party ones and back
>>> is very important).
>>
>> What specific problems do you foresee?  There's an issue if something
>> depends on one of the added sequence columns, but if that is the case
>> then you had *better* fail.
>>
>> I think that the debugability value of making extra sequence columns
>> human-readable is quite high.
>>
>
> My main problem is actually not with having tuple per-seqAM, but more
> with the fact that Heikki does not want to have last_value as compulsory
> column/parameter. How is the new AM then supposed to know where to pick
> up and if it even can pick up?
>

And obviously, once the last_value is part of the compulsory columns we 
again have to WAL log all the time for the use-case which Heikki is 
using as model, so it does not help there (just to clear what my point 
was about).


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> My main problem is actually not with having tuple per-seqAM, but more
>> with the fact that Heikki does not want to have last_value as compulsory
>> column/parameter. How is the new AM then supposed to know where to pick
>> up and if it even can pick up?

That seems pretty well impossible to know anyway.  If the pluggable AM
was handing out values at random or in some unpredictable fashion,
there may be no well-defined point where it's safe for the default AM
to resume.  Granted, in the case of replication, it probably is
possible, and maybe that's important enough to be worth catering to.

> And obviously, once the last_value is part of the compulsory columns we
> again have to WAL log all the time for the use-case which Heikki is using as
> model, so it does not help there (just to clear what my point was about).

But I don't know what to do about that.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sequence Access Method WIP

From
Craig Ringer
Date:
On 11/08/2014 12:35 AM, Petr Jelinek wrote:
>>
>> Do you think it'd be simple to provide a blocking, transactional
>> sequence allocator via this API? i.e. gapless sequences, much the same
>> as typically implemented with SELECT ... FOR UPDATE on a counter table.
>>
>> It might be more digestible standalone, and would be a handy contrib/
>> example extension demonstrating use of the API.
>>
> 
> Yes I think that's doable (once we iron out the API we can agree on).

Cool; I think that'd be an interesting self-contained example that'd
also have real-world uses ... and provide a nice succinct way to answer
those semi-regular "how do I create a sequence that doesn't have holes"
questions.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 08/11/14 03:10, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> My main problem is actually not with having tuple per-seqAM, but more
>>> with the fact that Heikki does not want to have last_value as compulsory
>>> column/parameter. How is the new AM then supposed to know where to pick
>>> up and if it even can pick up?
>
> That seems pretty well impossible to know anyway.  If the pluggable AM
> was handing out values at random or in some unpredictable fashion,
> there may be no well-defined point where it's safe for the default AM
> to resume.  Granted, in the case of replication, it probably is
> possible, and maybe that's important enough to be worth catering to.

While this is true, I think 99% of this use-case in practice is about 
converting existing schema with "local" sequence to some other sequence 
and perhaps fixing schema after people create sequence using wrong AM 
because their default is not what they thought it is.

>
>> And obviously, once the last_value is part of the compulsory columns we
>> again have to WAL log all the time for the use-case which Heikki is using as
>> model, so it does not help there (just to clear what my point was about).
>
> But I don't know what to do about that.  :-(
>

Me neither and I don't think this is actually solvable, we either have 
last_value and logcnt as mandatory columns and the central sequence 
server example Heikki is talking about will be impacted by this, or we 
leave those columns to AM implementations and we lose the ability to do 
AM change for majority of cases, and also IMHO most AMs will then have 
to implement their own last_value and logcnt logic anyway.

Honestly, I am still not convinced that the centralized sequence server 
with no local caching is something we should be optimizing for as that 
one will suffer from performance problems anyway. And it can ignore the 
last_value input from postgres if it choses to, so it's not like the 
currently proposed patch forbids implementation of such AMs.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> Why does sequence_alloc need the current value? If it's a "remote" seqam,
> the current value is kept in the remote server, and the last value that was
> given to this PostgreSQL server is irrelevant.
>
> That irks me with this API. The method for acquiring a new value isn't fully
> abstracted behind the AM interface, as sequence.c still needs to track it
> itself. That's useful for the local AM, of course, and maybe some others,
> but for others it's totally useless.

Please bear in mind what seqam is for...

At present it's only use is to provide Global Sequences. There's a few
ways of doing that, but they all look sorta similar.

The only other use we thought of was shot down, so producing the
perfect API isn't likely to help anyone. It's really not worth the
effort to produce a better API.

BDR doesn't require Global Sequences, nor are Global Sequences
restricted in their use to just BDR - lots of cluster configurations
would want something like this.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Sat, Nov 8, 2014 at 6:17 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Honestly, I am still not convinced that the centralized sequence server with
> no local caching is something we should be optimizing for as that one will
> suffer from performance problems anyway. And it can ignore the last_value
> input from postgres if it choses to, so it's not like the currently proposed
> patch forbids implementation of such AMs.

I can buy that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 11/08/2014 01:57 AM, Petr Jelinek wrote:
> My main problem is actually not with having tuple per-seqAM, but more
> with the fact that Heikki does not want to have last_value as compulsory
> column/parameter. How is the new AM then supposed to know where to pick
> up and if it even can pick up?

Call nextval(), and use the value it returns as the starting point for 
the new AM.

- Heikki




Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 09/11/14 20:47, Heikki Linnakangas wrote:
> On 11/08/2014 01:57 AM, Petr Jelinek wrote:
>> My main problem is actually not with having tuple per-seqAM, but more
>> with the fact that Heikki does not want to have last_value as compulsory
>> column/parameter. How is the new AM then supposed to know where to pick
>> up and if it even can pick up?
>
> Call nextval(), and use the value it returns as the starting point for
> the new AM.
>

Hah, so simple, works for me.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 11/08/2014 04:21 PM, Simon Riggs wrote:
> On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> Why does sequence_alloc need the current value? If it's a "remote" seqam,
>> the current value is kept in the remote server, and the last value that was
>> given to this PostgreSQL server is irrelevant.
>>
>> That irks me with this API. The method for acquiring a new value isn't fully
>> abstracted behind the AM interface, as sequence.c still needs to track it
>> itself. That's useful for the local AM, of course, and maybe some others,
>> but for others it's totally useless.
>
> Please bear in mind what seqam is for...
>
> At present it's only use is to provide Global Sequences. There's a few
> ways of doing that, but they all look sorta similar.

Ok.

> The only other use we thought of was shot down, so producing the
> perfect API isn't likely to help anyone. It's really not worth the
> effort to produce a better API.

Your argument seems to be that because this API is only used for 
creating global sequences, it's not worth it to make it better for that 
purpose. That doesn't make much sense, so that's probably not what you 
meant. I'm confused.

To be clear: I don't think this API is very good for its stated purpose, 
for implementing global sequences for use in a cluster. For the reasons 
I've mentioned before.  I'd like to see two changes to this proposal:

1. Make the AM implementation solely responsible for remembering the 
"last value". (if it's a global or remote sequence, the current value 
might not be stored in the local server at all)

2. Instead of the single amdata field, make it possible for the 
implementation to define any number of fields with any datatype in the 
tuple. That would make debugging, monitoring etc. easier.

> BDR doesn't require Global Sequences, nor are Global Sequences
> restricted in their use to just BDR - lots of cluster configurations
> would want something like this.

Sure. (I don't see how that's relevant to this discussion, however).

(marking this as "Returned with Feedback" in the commitfest)

- Heikki




Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
> To be clear: I don't think this API is very good for its stated purpose, for
> implementing global sequences for use in a cluster. For the reasons I've
> mentioned before.  I'd like to see two changes to this proposal:
> 
> 1. Make the AM implementation solely responsible for remembering the "last
> value". (if it's a global or remote sequence, the current value might not be
> stored in the local server at all)

I think that reason isn't particularly good. The practical applicability
for such a implementation doesn't seem to be particularly large.

> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
José Luis Tallón
Date:
On 12/02/2014 08:21 PM, Andres Freund wrote:
> [snip]
>> 2. Instead of the single amdata field, make it possible for the
>> implementation to define any number of fields with any datatype in the
>> tuple. That would make debugging, monitoring etc. easier.
> My main problem with that approach is that it pretty much nails the door
> shut for moving sequences into a catalog table instead of the current,
> pretty insane, approach of a physical file per sequence.

Hmm...  having done my fair bit of testing, I can say that this isn't 
actually that bad (though depends heavily on the underlying filesystem 
and workload, of course)
With this approach, I fear extreme I/O contention with an update-heavy 
workload... unless all sequence activity is finally WAL-logged and hence 
writes to the actual files become mostly sequential and asynchronous.

May I possibly suggest a file-per-schema model instead? This approach 
would certainly solve the excessive i-node consumption problem that --I 
guess-- Andres is trying to address here.
Moreover, the "one file per schema for sequences" solution would fit a 
quite common model of grouping tables (in schemas) for physical 
[tablespace] location purposes....
> Currently, with
> our without seqam, it'd not be all that hard to force it into a catalog,
> taking care to to force each tuple onto a separate page...

IMHO, this is jst as wasteful as the current approach (one-page file per 
sequence) in terms of disk usage and complicates the code a bit .... but 
I really don't see how we can have more than one sequence state per page 
without severe (page) locking problems.
However, someone with deeper knowledge of page pinning and buffer 
manager internals could certainly devise a better solution...

Just my 2c

Thanks,
    / J.L.




Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
> On 12/02/2014 08:21 PM, Andres Freund wrote:
> >[snip]
> >>2. Instead of the single amdata field, make it possible for the
> >>implementation to define any number of fields with any datatype in the
> >>tuple. That would make debugging, monitoring etc. easier.
> >My main problem with that approach is that it pretty much nails the door
> >shut for moving sequences into a catalog table instead of the current,
> >pretty insane, approach of a physical file per sequence.
> 
> Hmm...  having done my fair bit of testing, I can say that this isn't
> actually that bad (though depends heavily on the underlying filesystem and
> workload, of course)
> With this approach, I fear extreme I/O contention with an update-heavy
> workload... unless all sequence activity is finally WAL-logged and hence
> writes to the actual files become mostly sequential and asynchronous.

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.

> May I possibly suggest a file-per-schema model instead? This approach would
> certainly solve the excessive i-node consumption problem that --I guess--
> Andres is trying to address here.

I don't think that really has any advantages.

> >Currently, with
> >our without seqam, it'd not be all that hard to force it into a catalog,
> >taking care to to force each tuple onto a separate page...
> 
> IMHO, this is jst as wasteful as the current approach (one-page file per
> sequence) in terms of disk usage and complicates the code a bit .... but I
> really don't see how we can have more than one sequence state per page
> without severe (page) locking problems.

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...

> However, someone with deeper knowledge of page pinning and buffer manager
> internals could certainly devise a better solution...

I think there's pretty much no chance of accepting more than one page
per

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 02/12/14 20:21, Andres Freund wrote:
> On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
>> To be clear: I don't think this API is very good for its stated purpose, for
>> implementing global sequences for use in a cluster. For the reasons I've
>> mentioned before.  I'd like to see two changes to this proposal:
>> ...
>> 2. Instead of the single amdata field, make it possible for the
>> implementation to define any number of fields with any datatype in the
>> tuple. That would make debugging, monitoring etc. easier.
>
> My main problem with that approach is that it pretty much nails the door
> shut for moving sequences into a catalog table instead of the current,
> pretty insane, approach of a physical file per sequence. Currently, with
> our without seqam, it'd not be all that hard to force it into a catalog,
> taking care to to force each tuple onto a separate page...
>

I don't know, I think if we decide to change storage format we can do 
serialization/conversion in seqam layer, it does not seem to matter too 
much if the serialization into some opaque type is done in AM itself or 
by the API layer. Or we can have one relation for all sequences in 
single AM, etc. In general I don't think that the custom columns for AM 
approach prohibits future storage changes.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
José Luis Tallón
Date:
On 12/03/2014 11:24 AM, Andres Freund wrote:
> On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
>> snip]
> I don't think the WAL logging would need to change much in comparison to
> the current solution. We'd just add the page number to the WAL record.
>
> The biggest advantage would be to require fewer heavyweight locks,
> because the pg_sequence one could be a single fastpath lock. Currently
> we have to take the sequence's relation lock (heavyweight) and then the
> the page level lock (lwlock) for every single sequence used.

Got it, thank you for the explanation.

>> May I possibly suggest a file-per-schema model instead? This approach would
>> certainly solve the excessive i-node consumption problem that --I guess--
>> Andres is trying to address here.
> I don't think that really has any advantages.

Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have 
(1048576/8)=128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences.... save 
for *massively* multi-tenant databases.

The downside being that all that random I/O --- in general, it can't 
really be sequential unless there are very very few sequences--- can't 
be spread to other spindles. Create a "sequence_default_tablespace" GUC 
+ ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe? (I could take a shot at the patch, if deemed
worthwhile)

>> [snip]
> The overhead of a file is much more than wasting the remainder of a
> page. Alone the requirement of separate fsyncs and everything is pretty
> bothersome. The generated IO patterns are also much worse...

Yes, you are right. I stand corrected.

> [snip]
> I think there's pretty much no chance of accepting more than one page
> per sequence

Definitively.


Thanks,
    J.L.




Re: Sequence Access Method WIP

From
Jim Nasby
Date:
On 12/3/14, 8:50 AM, José Luis Tallón wrote:
>
>>> May I possibly suggest a file-per-schema model instead? This approach would
>>> certainly solve the excessive i-node consumption problem that --I guess--
>>> Andres is trying to address here.
>> I don't think that really has any advantages.
>
> Just spreading the I/O load, nothing more, it seems:
>
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for *massively* multi-tenant databases.
>
> The downside being that all that random I/O --- in general, it can't really be sequential unless there are very very
fewsequences--- can't be spread to other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE SET
TABLESPACE,to use an SSD for this purpose maybe?
 

Why not? RAID arrays typically use stripe sizes in the 128-256k range, which means only 16 or 32 sequences per stripe.

It still might make sense to allow controlling what tablespace a sequence is in, but IMHO the default should just be
pg_default.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Sequence Access Method WIP

From
Andres Freund
Date:
> >>May I possibly suggest a file-per-schema model instead? This approach would
> >>certainly solve the excessive i-node consumption problem that --I guess--
> >>Andres is trying to address here.
> >I don't think that really has any advantages.
> 
> Just spreading the I/O load, nothing more, it seems:
> 
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have
> (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for
> *massively* multi-tenant databases.
> 
> The downside being that all that random I/O --- in general, it can't really
> be sequential unless there are very very few sequences--- can't be spread to
> other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE
> SET TABLESPACE, to use an SSD for this purpose maybe?
>  (I could take a shot at the patch, if deemed worthwhile)

I think that's just so far outside the sane usecases that I really don't
see us adding complexity to reign it in. If your frequently used
sequences get flushed out to disk by anything but the checkpointer the
schema is just badly designed...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 24/11/14 12:16, Heikki Linnakangas wrote:
> I'd like to see two changes to this proposal:
>
> 1. Make the AM implementation solely responsible for remembering the
> "last value". (if it's a global or remote sequence, the current value
> might not be stored in the local server at all)
>
> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.
>

Hi,

here is the new version for next CF, there are several rough edges (more
about that later) but I think it's ready for more feedback.

I did change the API quite considerably (rewrite is the word I believe).

Instead of trying to put smarts on one or the other side of the API, I
basically created 2 APIs, one is the seqam which is responsible for
maintaining the sequence data and the other is sequence storage API
provided by postgres backend.

This means that backend cares purely about storage, and local caching
while AM cares about the data inside sequence, and decides when to WAL.

I'll start with storage API description:
- sequence_open(Oid) - open sequence with given Oid and returns
SequenceHandle (opens relation in share lock only)

- sequence_read_tuple(SequenceHandle) - returns heap tuple of the
sequence, this will also do the buffer locking

- sequence_save_tuple(SequenceHandle, newtuple, do_wal) - updates the
buffer with new sequence, optionally does WAL logging, newtuple can be
NULL in which case the tuple is not replaced (this is performance
optimization for sequences that don't need varlena/nullable columns and
can update everything inplace)

- sequence_release_tuple(SequenceHandle) - unlocks previously locked
buffer - usable for synchronization inside the seqam implementation, the
gapless sequence uses this

- sequence_close(SequenceHandle) - closes the sequence handle, also
unlocks buffer if needed

- sequence_needs_wal(SequenceHandle) - returns true if sequence needs
wal, that is if the last wal write happened before last checkpoint -
this is a little bit low level but I don't see a way around it if the
local sequence is supposed to maintain last_value and logcnt itself,
also we have similar function for relations (that checks for different
things but naming and usage is similar) so it's probably not too bad

The SequenceHandle is opaque struct so no gory details seen by the AM.

Now for the actual sequence AM API:
- seqam_extra_columns(Oid) - returns list of custom columns needed by
specified sequenceAM

- seqam_init - Initializes the columns for the custom sequenceAM, used
when creating, altering or resetting sequence. It gets sequenceam Oid,
sequence options, reloptions on input and returns updated values and
nulls arrays.

- seqam_alloc - Allocates new sequence values. It gets sequence
relation, handle and how many new values the backend wants and returns
first and last allocated value.

- seqam_setval - implements setval() support

- seqam_dump_state - pg_dump support
- seqam_restore_state - pg_dump support


The API is slightly bigger, but it also includes proper support for
dumping and restoring via pg_dump and seems somewhat cleaner, plus the
synchronous and asynchronous APIs are now same and no callbacks anywhere
(if AM needs to do something asynchronously it just calls the storage API).

There is also gapless sequence contrib module included as separate
patch, it's missing proper locking during dump/restore atm but otherwise
should be fully working (see tests). I am also using it for testing the
ALTER TABLE USING.

About the rough edges:
- The AlterSequence is not prettiest code around as we now have to
create new relation when sequence AM is changed and I don't know how to
do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correcly
- I removed the setval3 since there is no guarantee that sequence will
have is_called parameter but there is now proper pg_dump support so it's
not needed for pg_dump usecase, but it's still incompatible change
- ResetSequence basically does ALTER SEQUENCE RESET internally instead
calling setval now because of the above
- there are no docs for the C APIs yet - I don't even know where those
should be placed btw, would appreciate suggestions (I have same problem
with committs API docs)

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 10/12/14 03:33, Petr Jelinek wrote:
> On 24/11/14 12:16, Heikki Linnakangas wrote:
>
> About the rough edges:
> - The AlterSequence is not prettiest code around as we now have to
> create new relation when sequence AM is changed and I don't know how to
> do that nicely
> - I am not sure if I did the locking, command order and dependency
> handling in AlterSequence correcly

This version does AlterSequence differently and better. Didn't attach
the gapless sequence again as that one is unchanged.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 15/12/14 11:36, Petr Jelinek wrote:
> On 10/12/14 03:33, Petr Jelinek wrote:
>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>
>> About the rough edges:
>> - The AlterSequence is not prettiest code around as we now have to
>> create new relation when sequence AM is changed and I don't know how to
>> do that nicely
>> - I am not sure if I did the locking, command order and dependency
>> handling in AlterSequence correcly
>
> This version does AlterSequence differently and better. Didn't attach
> the gapless sequence again as that one is unchanged.
>
>

And another version, separated into patch-set of 3 patches.
First patch contains the seqam patch itself, not many changes there,
mainly docs/comments related. What I wrote in the previous email for
version 3 still applies.

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small change as
the sequence ams are not schema qualified, but I think it's acceptable
and saves considerable amount of boilerplate.

And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Tomas Vondra
Date:
On 12.1.2015 22:33, Petr Jelinek wrote:
> On 15/12/14 11:36, Petr Jelinek wrote:
>> On 10/12/14 03:33, Petr Jelinek wrote:
>>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>>
>>> About the rough edges:
>>> - The AlterSequence is not prettiest code around as we now have to
>>> create new relation when sequence AM is changed and I don't know how to
>>> do that nicely
>>> - I am not sure if I did the locking, command order and dependency
>>> handling in AlterSequence correcly
>>
>> This version does AlterSequence differently and better. Didn't attach
>> the gapless sequence again as that one is unchanged.
>>
>>
> 
> And another version, separated into patch-set of 3 patches.
> First patch contains the seqam patch itself, not many changes there,
> mainly docs/comments related. What I wrote in the previous email for
> version 3 still applies.

I did a review of the first part today - mostly by reading through the
diff. I plan to do a bit more thorough testing in a day or two. I'll
also look at the two (much smaller) patches.

comments/questions/general nitpicking:

(1) Why treating empty string as equal to 'local'? Isn't enforcing the   actual value a better approach?

(2) NITPICK: Maybe we could use access_method in the docs (instead of   sequence_access_method), as the 'sequence' part
isclear from the   context?
 

(3) Why index_reloptions / sequence_reloptions when both do exactly the   same thing (call to common_am_reloptions)?
I'dunderstand this if   the patch(es) then change the sequence_reloptions() but that's not   happening. Maybe that's
expectedto happen?
 

(4) DOCS: Each sequence can only use one access method at a time.
   Does that mean a sequence can change the access method during it's   lifetime? My understanding is that the AM is
fixedafter creating   the sequence?
 

(5) DOCS/NITPICK: SeqAM / SeqAm ... (probably should be the same?)

(6) cache lookup failed for relation %u
   I believe this should reference 'sequence access method' instead of   a relation.

(7) seqam_init
   I believe this is a bug (not setting nulls[4] but twice nulls[3]):

+    fcinfo.argnull[0] = seqparams == NULL;
+    fcinfo.argnull[1] = reloptions_parsed == NULL;
+    fcinfo.argnull[2] = false;
+    fcinfo.argnull[3] = false;
+    fcinfo.argnull[3] = false;

(8) check_default_seqam without a transaction
* If we aren't inside a transaction, we cannot do database access so* cannot verify the name.    Must accept the value
onfaith.
 
   In which situation this happens? Wouldn't it be better to simply   enforce the transaction and fail otherwise?

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 13/01/15 13:24, Tomas Vondra wrote:
> On 12.1.2015 22:33, Petr Jelinek wrote:
>> On 15/12/14 11:36, Petr Jelinek wrote:
>>> On 10/12/14 03:33, Petr Jelinek wrote:
>>>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>>>
>>>> About the rough edges:
>>>> - The AlterSequence is not prettiest code around as we now have to
>>>> create new relation when sequence AM is changed and I don't know how to
>>>> do that nicely
>>>> - I am not sure if I did the locking, command order and dependency
>>>> handling in AlterSequence correcly
>>>
>>> This version does AlterSequence differently and better. Didn't attach
>>> the gapless sequence again as that one is unchanged.
>>>
>>>
>>
>> And another version, separated into patch-set of 3 patches.
>> First patch contains the seqam patch itself, not many changes there,
>> mainly docs/comments related. What I wrote in the previous email for
>> version 3 still applies.
>
> I did a review of the first part today - mostly by reading through the
> diff. I plan to do a bit more thorough testing in a day or two. I'll
> also look at the two (much smaller) patches.
>

Thanks!

> comments/questions/general nitpicking:
>
> (1) Why treating empty string as equal to 'local'? Isn't enforcing the
>      actual value a better approach?
>

Álvaro mentioned on IM also, I already changed it to saner normal GUC 
with 'local' as default value in my working copy

> (2) NITPICK: Maybe we could use access_method in the docs (instead of
>      sequence_access_method), as the 'sequence' part is clear from the
>      context?

Yes.

> (3) Why index_reloptions / sequence_reloptions when both do exactly the
>      same thing (call to common_am_reloptions)? I'd understand this if
>      the patch(es) then change the sequence_reloptions() but that's not
>      happening. Maybe that's expected to happen?

That's leftover from the original design where AM was supposed to call 
this, since this is not exposed to AM anymore I think it can be single 
function now.

>
> (4) DOCS: Each sequence can only use one access method at a time.
>
>      Does that mean a sequence can change the access method during it's
>      lifetime? My understanding is that the AM is fixed after creating
>      the sequence?
>

Oh, I forgot to add ALTER SEQUENCE USING into docs, you can change AM 
even though you probably don't want to do it often, but for migrations 
it's useful.

> (8) check_default_seqam without a transaction
>
>   * If we aren't inside a transaction, we cannot do database access so
>   * cannot verify the name.    Must accept the value on faith.
>
>      In which situation this happens? Wouldn't it be better to simply
>      enforce the transaction and fail otherwise?

Reading postgresql.conf during startup, I don't think we want to fail in 
that scenario ;)


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 01/12/2015 11:33 PM, Petr Jelinek wrote:
> Second patch adds DDL support. I originally wanted to make it
> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
> which does not need to change anything (besides adding METHOD to
> unreserved keywords).
> The DDL support uses the DefineStmt infra with some very small change as
> the sequence ams are not schema qualified, but I think it's acceptable
> and saves considerable amount of boilerplate.

Do we need DDL commands for this at all? I could go either way on that 
question. We recently had a discussion on that wrt. index access methods 
[1], and Tom opined that providing DDL for creating index access methods 
is not worth it. The extension can just insert the rows into pg_seqam 
with INSERT. Do we expect sequence access methods as extensions to be 
more popular than index access methods? Maybe, because the WAL-logging 
problem doesn't exist. But OTOH, if you're writing something like a 
replication system that needs global sequences as part of it, there 
aren't that many of those, and the installation scripts will need to 
deal with more complicated stuff than inserting a row in pg_seqam.

[1] http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us

> And third patch is gapless sequence implementation updated to work with
> the new DDL support with some tests added for checking if dependencies
> work correctly. It also acts as example on how to make custom AMs.

I'll take a look at that to see how the API works, but we're not going 
to include it in the source tree, because it doesn't actually guarantee 
gaplessness. That makes it a pretty dangerous example. I'm sure we can 
come up with a better example that might even be useful. How about a 
Lamport's clock sequence, which advances once per second, in addition to 
when anyone calls nextval() ? Or a remote sequence that uses an FDW to 
call nextval() in a foreign server?

- Heikki




Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Thu, Jan 22, 2015 at 10:50 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 01/12/2015 11:33 PM, Petr Jelinek wrote:
>> Second patch adds DDL support. I originally wanted to make it
>> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
>> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
>> which does not need to change anything (besides adding METHOD to
>> unreserved keywords).
>> The DDL support uses the DefineStmt infra with some very small change as
>> the sequence ams are not schema qualified, but I think it's acceptable
>> and saves considerable amount of boilerplate.
>
> Do we need DDL commands for this at all? I could go either way on that
> question. We recently had a discussion on that wrt. index access methods
> [1], and Tom opined that providing DDL for creating index access methods is
> not worth it. The extension can just insert the rows into pg_seqam with
> INSERT. Do we expect sequence access methods as extensions to be more
> popular than index access methods? Maybe, because the WAL-logging problem
> doesn't exist. But OTOH, if you're writing something like a replication
> system that needs global sequences as part of it, there aren't that many of
> those, and the installation scripts will need to deal with more complicated
> stuff than inserting a row in pg_seqam.

I think the main reason we don't need DDL for pg_am is because there's
no real hope of anybody actually inserting anything in there whether
we have a command for it or not.  If we actually expect this to be
used, I think it should probably have some kind of DDL, because
otherwise what will pg_dump emit?  "Nothing" is bad because then dumps
won't restore, and "direct inserts to pg_seqam" doesn't seem great
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 22/01/15 16:50, Heikki Linnakangas wrote:
> On 01/12/2015 11:33 PM, Petr Jelinek wrote:
>> Second patch adds DDL support. I originally wanted to make it
>> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
>> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
>> which does not need to change anything (besides adding METHOD to
>> unreserved keywords).
>> The DDL support uses the DefineStmt infra with some very small change as
>> the sequence ams are not schema qualified, but I think it's acceptable
>> and saves considerable amount of boilerplate.
>
> Do we need DDL commands for this at all? I could go either way on that
> question. We recently had a discussion on that wrt. index access methods
> [1], and Tom opined that providing DDL for creating index access methods
> is not worth it. The extension can just insert the rows into pg_seqam
> with INSERT. Do we expect sequence access methods as extensions to be
> more popular than index access methods? Maybe, because the WAL-logging
> problem doesn't exist. But OTOH, if you're writing something like a
> replication system that needs global sequences as part of it, there
> aren't that many of those, and the installation scripts will need to
> deal with more complicated stuff than inserting a row in pg_seqam.

I don't know about popularity, and I've seen the discussion about 
indexes. The motivation for DDL for me was handling dependencies 
correctly, that's all. If we say we don't care about that (and allow 
DROP EXTENSION even though user has sequences that are using the AM) 
then we don't need DDL.

>
> [1] http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us
>
>> And third patch is gapless sequence implementation updated to work with
>> the new DDL support with some tests added for checking if dependencies
>> work correctly. It also acts as example on how to make custom AMs.
>
> I'll take a look at that to see how the API works, but we're not going
> to include it in the source tree, because it doesn't actually guarantee
> gaplessness. That makes it a pretty dangerous example. I'm sure we can
> come up with a better example that might even be useful. How about a
> Lamport's clock sequence, which advances once per second, in addition to
> when anyone calls nextval() ? Or a remote sequence that uses an FDW to
> call nextval() in a foreign server?
>

I have updated patch ready, just didn't submit it because I am otherwise 
busy this week, I hope to get to it today evening or tomorrow morning, 
so I'd wait until that with looking at the patch.

The new version (the one that is not submitted yet) of gapless sequence 
is way more ugly and probably not best example either but does guarantee 
gaplessness (it stores the last value in it's own value table). So I am 
not sure if it should be included either...

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 22/01/15 17:02, Petr Jelinek wrote:
>
> The new version (the one that is not submitted yet) of gapless sequence
> is way more ugly and probably not best example either but does guarantee
> gaplessness (it stores the last value in it's own value table). So I am
> not sure if it should be included either...
>

Here it is as promised.

Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.

- Custom columns are now stored in the catalog instead of generated by
some special interface. This makes it possible to record dependencies on
types and also seems in line with how other similar things are done.

- Removed couple of pallocs from the nextval codepath which improved
performance considerably.

- Changed how the ALTER SEQUENCE ... USING behaves - when RESET WITH
value is specified the new sequenceAM will use that one for starting, if
it's not specified the nextval from original sequenceAM will be used as
RESET value. In general handling of the RESET parameter is left to the AMs.

- Unified the reloptions for index and sequences into single function.

- Re-added setval3 for compatibility reasons - it only works on local
sequences, otherwise it throws error.

- Made the 'local' default value for the GUC instead of special handling
of the empty string.


Notable missing things:
- Docs for pg_seqam.

- pg_dump support for pg_seqam. It's embarrassing but I didn't think
about this until Robert mentioned it, I am apparently too used to handle
these things using extensions.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 01/23/2015 02:34 AM, Petr Jelinek wrote:
> On 22/01/15 17:02, Petr Jelinek wrote:
>>
>> The new version (the one that is not submitted yet) of gapless sequence
>> is way more ugly and probably not best example either but does guarantee
>> gaplessness (it stores the last value in it's own value table). So I am
>> not sure if it should be included either...
>
> Here it is as promised.

I generally like the division of labour between the seqam 
implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create 
confusion. And it's not really enough for something like a remote 
sequence AM anyway that definitely needs some extra reloptions, like the 
hostname of the remote server. The default should be 'local', unless you 
specify something else with CREATE SEQUENCE USING - no GUCs.

Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number 
instead of checking whether pg_sequence_dump_state() function exists. 
That's what we usually do when new features are introduced. And there's 
actually a bug there: you have the check backwards. (try dumping a 
database with any sequences in it; it fails)

* pg_dump should not output a USING clause when a sequence uses the 
'local' sequence. No point in adding such noise - making the SQL command 
not standard-compatible - for the 99% of databases that don't use other 
sequence AMs.

* Be careful to escape all strings correctly in pg_dump. I think you're 
missing escaping for the 'state' variable at least.

In sequence_save_tuple:
>     else
>     {
>         /*
>          * New tuple was not sent, so the original tuple was probably just
>          * changed inline, all we need to do is mark the buffer dirty and
>          * optionally log the update tuple.
>          */
>         START_CRIT_SECTION();
>
>         MarkBufferDirty(seqh->buf);
>
>         if (do_wal)
>             log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>
>         END_CRIT_SECTION();
>     }

This is wrong when checksums are enabled and !do_wal. I believe this 
should use MarkBufferDirtyHint().

> Notable changes:
> - The gapless sequence rewritten to use the transactional storage as
> that's the only way to guarantee gaplessness between dump and restore.

Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of 
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the 
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, 
updating a row in a table for every nextval() call is pretty darn 
expensive. But I don't actually see a problem with dump and restore.

Can you rewrite it without using the values table? AFAICS, there are a 
two of problems to solve:

1. If the top-level transaction aborts, you need to restore the old 
value. I suggest keeping two values in the sequence tuple: the old, 
definitely committed one, and the last value. The last value is only 
considered current if the associated XID committed; otherwise the old 
value is current. When a transaction is about to commit, it stores its 
top-level XID and the new value in the "last" field, and copies the 
previously current value to the "old" field.

2. You need to track the last value on a per-subtransaction basis, until 
the transaction commits/rolls back, in order to have "rollback to 
savepoint" to retreat the sequence's value. That can be done in 
backend-private memory, maintaining a stack of subtransactions and the 
last value of each. Use RegisterSubXactCallback to hook into subxact 
commit/abort. Just before top-level commit (in pre-commit callback), 
update the sequence tuple with the latest value, so that it gets 
WAL-logged before the commit record.

- Heikki




Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 28/01/15 18:09, Heikki Linnakangas wrote:
> On 01/23/2015 02:34 AM, Petr Jelinek wrote:
>> On 22/01/15 17:02, Petr Jelinek wrote:
>>>
>>> The new version (the one that is not submitted yet) of gapless sequence
>>> is way more ugly and probably not best example either but does guarantee
>>> gaplessness (it stores the last value in it's own value table). So I am
>>> not sure if it should be included either...
>>
>> Here it is as promised.
>
> I generally like the division of labour between the seqam
> implementations and the API now.
>
> I don't like the default_sequenceam GUC. That seems likely to create
> confusion. And it's not really enough for something like a remote
> sequence AM anyway that definitely needs some extra reloptions, like the
> hostname of the remote server. The default should be 'local', unless you
> specify something else with CREATE SEQUENCE USING - no GUCs.
>

Hmm, I am not too happy about this, I want SERIAL to work with custom 
sequences (as long as it's possible for the sequence to work that way at 
least). That's actually important feature for me. Your argument about 
this being potential problem for some sequenceAMs is valid though.

> Some comments on pg_dump:
>
> * In pg_dump's dumpSequence function, check the server version number
> instead of checking whether pg_sequence_dump_state() function exists.
> That's what we usually do when new features are introduced. And there's
> actually a bug there: you have the check backwards. (try dumping a
> database with any sequences in it; it fails)

Right.

> * pg_dump should not output a USING clause when a sequence uses the
> 'local' sequence. No point in adding such noise - making the SQL command
> not standard-compatible - for the 99% of databases that don't use other
> sequence AMs.

Well this only works if we remove the GUC. Because otherwise if GUC is 
there then you always need to either add USING clause or set the GUC in 
advance (like we do with search_path for example).

> * Be careful to escape all strings correctly in pg_dump. I think you're
> missing escaping for the 'state' variable at least.

Ouch.

> In sequence_save_tuple:
>>     else
>>     {
>>         /*
>>          * New tuple was not sent, so the original tuple was probably
>> just
>>          * changed inline, all we need to do is mark the buffer dirty and
>>          * optionally log the update tuple.
>>          */
>>         START_CRIT_SECTION();
>>
>>         MarkBufferDirty(seqh->buf);
>>
>>         if (do_wal)
>>             log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>>
>>         END_CRIT_SECTION();
>>     }
>
> This is wrong when checksums are enabled and !do_wal. I believe this
> should use MarkBufferDirtyHint().
>

Oh ok, didn't realize.

>> Notable changes:
>> - The gapless sequence rewritten to use the transactional storage as
>> that's the only way to guarantee gaplessness between dump and restore.
>
> Can you elaborate?
>
> Using the auxiliary seqam_gapless_values is a bit problematic. First of
> all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
> whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,

Yeah but at worst there are some unused rows there, it's not too bad. I 
could also create relation per sequence so that dependency code would 
handle everything correctly, but seems bit too expensive to create not 
one but two relations per sequence...

> updating a row in a table for every nextval() call is pretty darn
> expensive.

Yes it's expensive, but the gapless sequence also serializes access so 
it will never be speed daemon.

> But I don't actually see a problem with dump and restore.

The problem is that the tuple stored in the sequence relation is always 
the one with latest state (and with frozen xid), so pg_dump has no way 
of getting the value as it was at the time it took the snapshot. This 
means that after dump/restore cycle, the sequence can be further away 
than the table it's attached to and you end up with a gap there.


> Can you rewrite it without using the values table? AFAICS, there are a
> two of problems to solve:
>
> 1. If the top-level transaction aborts, you need to restore the old
> value. I suggest keeping two values in the sequence tuple: the old,
> definitely committed one, and the last value. The last value is only
> considered current if the associated XID committed; otherwise the old
> value is current. When a transaction is about to commit, it stores its
> top-level XID and the new value in the "last" field, and copies the
> previously current value to the "old" field.
>

Yes, this is how the previous implementation worked.

> 2. You need to track the last value on a per-subtransaction basis, until
> the transaction commits/rolls back, in order to have "rollback to
> savepoint" to retreat the sequence's value. That can be done in
> backend-private memory, maintaining a stack of subtransactions and the
> last value of each. Use RegisterSubXactCallback to hook into subxact
> commit/abort. Just before top-level commit (in pre-commit callback),
> update the sequence tuple with the latest value, so that it gets
> WAL-logged before the commit record.
>

I started writing something like what you described here but then I 
realized the dump/restore problem and went with the values table which 
solves both of these issues at the same time.

BTW I am happy to discuss this at FOSDEM if you are interested and will 
have time.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Hi,

sending new version that is updated along the lines of what we discussed
at FOSDEM, which means:

- back to single bytea amdata column (no custom columns)

- the dump/restore interfaces were changed to produce/accept array of
key/value pairs

- psql shows the output of dump interface in verbose mode (this makes it
easier to inspect the state)

- last_value and is_called are always present columns even if the
sequence am does not care for them (this helps with backwards
compatibility with various admin tools)

- pg_dump uses new interface for dumping only for non-local sequences so
that dumps of schemas which are not using custom seqams are restorable
to older postgres

- the default_sequenceam was changed to serial_sequenceam and only
applies to serial/bigserial columns - I would personally prefer to have
the old default but default for serial is better than nothing

- added description of pg_seqam catalog to docs

- gapless_seq still uses table internally to support snapshots for pg_dump

- pg_dump support for dumping the sequence ams


It would be nice to also have something along the lines of chapter 55.2.
(Index Access Method Functions), I plan to send that as additional patch
in few days.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> sending new version that is updated along the lines of what we discussed at
> FOSDEM, which means:
>
> - back to single bytea amdata column (no custom columns)

Why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 17/02/15 23:11, Robert Haas wrote:
> On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> sending new version that is updated along the lines of what we discussed at
>> FOSDEM, which means:
>>
>> - back to single bytea amdata column (no custom columns)
>

Well, the main argument is still future possibility of moving into 
single table for storage. And when we discussed about it in person we 
agreed that there is not too big advantage in having separate columns 
since there need to be dump/restore state interfaces anyway so you can 
see the relevant state via those as we made the output more human 
readable (and the psql support reflects that).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 18/02/15 02:59, Petr Jelinek wrote:
> On 17/02/15 23:11, Robert Haas wrote:
>> On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr@2ndquadrant.com>
>> wrote:
>>> sending new version that is updated along the lines of what we
>>> discussed at
>>> FOSDEM, which means:
>>>
>>> - back to single bytea amdata column (no custom columns)
>>
>
> Well, the main argument is still future possibility of moving into
> single table for storage. And when we discussed about it in person we
> agreed that there is not too big advantage in having separate columns
> since there need to be dump/restore state interfaces anyway so you can
> see the relevant state via those as we made the output more human
> readable (and the psql support reflects that).
>

Also makes the patch a little simpler obviously.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Slightly updated version of the patch.

Mainly rebased against current master (there were several conflicts) and
fixed some typos, no real functional change.

I also attached initial version of the API sgml doc.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and
> fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

Thanks!

With the patch, pg_class.relam column references to the pg_seqam table 
for sequences, but pg_indexam for indexes. I believe it's the first 
instance where we reuse a "foreign key" column like that. It's not a 
real foreign key, of course - that wouldn't work with a real foreign key 
at all - but it's a bit strange. That makes me a bit uncomfortable. How 
do others feel about that?

- Heikki




Re: Sequence Access Method WIP

From
Andres Freund
Date:
On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:
> With the patch, pg_class.relam column references to the pg_seqam table for
> sequences, but pg_indexam for indexes. I believe it's the first instance
> where we reuse a "foreign key" column like that. It's not a real foreign
> key, of course - that wouldn't work with a real foreign key at all - but
> it's a bit strange. That makes me a bit uncomfortable. How do others feel
> about that?

Hm. I'd modeled it more as an extension of the 'relkind' column
mentally. I.e. it further specifies how exactly the relation is
behaving. Given that the field has been added to pg_class and not
pg_index, combined with it not having index in its name, makes me think
that it actually was intended to be used the way it's done in the patch.

It's not the first column that behaves that way btw, at least pg_depend
comes to mind.

Greetings,

Andres Freund



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 20/04/15 12:05, Andres Freund wrote:
> On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:
>> With the patch, pg_class.relam column references to the pg_seqam table for
>> sequences, but pg_indexam for indexes. I believe it's the first instance
>> where we reuse a "foreign key" column like that. It's not a real foreign
>> key, of course - that wouldn't work with a real foreign key at all - but
>> it's a bit strange. That makes me a bit uncomfortable. How do others feel
>> about that?
>
> Hm. I'd modeled it more as an extension of the 'relkind' column
> mentally. I.e. it further specifies how exactly the relation is
> behaving. Given that the field has been added to pg_class and not
> pg_index, combined with it not having index in its name, makes me think
> that it actually was intended to be used the way it's done in the patch.
>

That's how I think about it too. It's also short for access method, 
nothing really suggests to me that it should be index specific by design.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and
> fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

I finally got around to have another round of review on this. I fixed a
couple of little bugs, did some minor edition on comments etc. See
attached. It is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
if you want to look at individual changes. It combines your patches 1
and 4, I think those need to be applied together. I haven't looked at
the DDL changes yet.


I'm fairly happy with the alloc API now. I'm not sure it's a good idea
for the AM to access the sequence tuple directly, though. I would seem
cleaner if it only manipulated the amdata Datum. But it's not too bad as
it is.

We went back and forth on whether 'amdata' should be a single column or
multiple columns, but I guess the single bytea column was the consensus
in the end.

The division of labour between sequence.c and the AM, in the init and
the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method,
when the only caller to amoptions() is just before the init() method?
The changes in extractRelOptions suggest that it would call
am_reloptions for sequences too, but that doesn't actually seem to be
happening.

postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity
checking, like min < max. It also has to extract the RESTART value from
the list of options. I think it would make more sense to move that to
sequence.c. The AM should only be responsible for initializing the
'amdata' portion, and for handling any AM-specific options. If the AM
doesn't support some options, like MIN and MAX value, it can check them
and throw an error, but it shouldn't be responsible for just passing
them through from the arguments to the sequence tuple.

* It might be better to form the sequence tuple before calling the init
function, and pass the ready-made tuple to it. All the other API
functions deal with the tuple (after calling sequence_read_tuple), so
that would be more consistent. The init function would need to
deconstruct it to modify it, but we're not concerned about performance here.

* The transformations of the arrays in get_state() and set_state()
functions are a bit complicated. The seqam_get_state() function returns
two C arrays, and pg_sequence_get_state() turns them into a text[]
array. Why not construct the text[] array directly in the AM? I guess
it's a bit more convenient to the AM, if the pg_sequence_get_state() do
that, but if that's an issue, you could create generic helper function
to turn two C string arrays into text[], and call that from the AM.

>     seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>
>     for (i = 0; i < count; i++)
>     {
>         if (pg_strcasecmp(keys[i], "last_value") == 0)
>             seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>                                                 CStringGetDatum(values[i])));
>         else if (pg_strcasecmp(keys[i], "is_called") == 0)
>             seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>                                                 CStringGetDatum(values[i])));
>         else
>             ereport(ERROR,
>                     (errcode(ERRCODE_SYNTAX_ERROR),
>                      errmsg("invalid state key \"%s\" for local sequence",
>                             keys[i])));
>     }
>
>     sequence_save_tuple(seqh, NULL, true);

If that error happens after having already processed a "last_value" or
"is_called" entry, you have already modified the on-disk tuple. AFAICS
that's the only instance of that bug, but sequence_read_tuple - modify
tuple in place - sequence_save_tuple pattern is quite unsafe in general.
If you modify a tuple directly in a Buffer, you should have a critical
section around it. It would make sense to start a critical section in
sequence_read_tuple(), except that not all callers want to modify the
tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
functions should be split into two:

sequence_read_tuple_for_update()
sequence_save_tuple()

* seqam_local_get_state() calls sequence_read_tuple() but not
sequence_release_tuple(). It looks like sequence_close() releases the
tuple and the buffer, but that seems sloppy. sequence_close() probably
shouldn't be calling sequence_release_tuple at all, so that you'd get a
warning at the end of transaction about the buffer leak.

- Heikki


Attachment

Re: Sequence Access Method WIP

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> * The transformations of the arrays in get_state() and set_state() functions
> are a bit complicated. The seqam_get_state() function returns two C arrays,
> and pg_sequence_get_state() turns them into a text[] array. Why not
> construct the text[] array directly in the AM? I guess it's a bit more
> convenient to the AM, if the pg_sequence_get_state() do that, but if that's
> an issue, you could create generic helper function to turn two C string
> arrays into text[], and call that from the AM.

Um, see strlist_to_textarray() in objectaddress.c if you do that.  Maybe
we need some generic place to store that kind of helper function.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 20/04/15 17:50, Heikki Linnakangas wrote:
> On 03/15/2015 09:07 PM, Petr Jelinek wrote:
>> Slightly updated version of the patch.
>>
>> Mainly rebased against current master (there were several conflicts) and
>> fixed some typos, no real functional change.
>>
>> I also attached initial version of the API sgml doc.
>
> I finally got around to have another round of review on this. I fixed a
> couple of little bugs, did some minor edition on comments etc. See
> attached. It is also available in my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
> if you want to look at individual changes. It combines your patches 1
> and 4, I think those need to be applied together. I haven't looked at
> the DDL changes yet.

Thanks!

>
> I'm fairly happy with the alloc API now. I'm not sure it's a good idea
> for the AM to access the sequence tuple directly, though. I would seem
> cleaner if it only manipulated the amdata Datum. But it's not too bad as
> it is.

Yeah, I was thinking about this myself I just liked sending 10 
parameters to the function less than this.

>
> The division of labour between sequence.c and the AM, in the init and
> the get/set_state functions, is a bit more foggy:
>
> * Do we really need a separate amoptions() method and an init() method,
> when the only caller to amoptions() is just before the init() method?
> The changes in extractRelOptions suggest that it would call
> am_reloptions for sequences too, but that doesn't actually seem to be
> happening.

Hmm yes it should and I am sure it did at some point, must have messed 
it during one of the rebases :(

And it's the reason why we need separate API function.

>
> postgres=# create sequence fooseq using local with (garbage=option);
> CREATE SEQUENCE
>
> Invalid options probably should throw an error.
>
> * Currently, the AM's init function is responsible for basic sanity
> checking, like min < max. It also has to extract the RESTART value from
> the list of options. I think it would make more sense to move that to
> sequence.c. The AM should only be responsible for initializing the
> 'amdata' portion, and for handling any AM-specific options. If the AM
> doesn't support some options, like MIN and MAX value, it can check them
> and throw an error, but it shouldn't be responsible for just passing
> them through from the arguments to the sequence tuple.

Well then we need to send restart as additional parameter to the init 
function as restart is normally not stored anywhere.

The checking is actually if new value is withing min/max but yes that 
can be done inside sequence.c I guess.

>
> * It might be better to form the sequence tuple before calling the init
> function, and pass the ready-made tuple to it. All the other API
> functions deal with the tuple (after calling sequence_read_tuple), so
> that would be more consistent. The init function would need to
> deconstruct it to modify it, but we're not concerned about performance
> here.

Right, this is actually more of a relic of the custom columns 
implementation where I didn't want to build the tuple with NULLs for 
columns that might have been specified as NOT NULL, but with the amdata 
approach it can create the tuple safely.

>
> * The transformations of the arrays in get_state() and set_state()
> functions are a bit complicated. The seqam_get_state() function returns
> two C arrays, and pg_sequence_get_state() turns them into a text[]
> array. Why not construct the text[] array directly in the AM? I guess
> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
> that, but if that's an issue, you could create generic helper function
> to turn two C string arrays into text[], and call that from the AM.

Yeah that was exactly the reasoning. Helper function works for me (will 
check what Álvaro's suggested, maybe it can be moved somewhere and reused).

>
>>     seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>
>>     for (i = 0; i < count; i++)
>>     {
>>         if (pg_strcasecmp(keys[i], "last_value") == 0)
>>             seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>
>> CStringGetDatum(values[i])));
>>         else if (pg_strcasecmp(keys[i], "is_called") == 0)
>>             seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>
>> CStringGetDatum(values[i])));
>>         else
>>             ereport(ERROR,
>>                     (errcode(ERRCODE_SYNTAX_ERROR),
>>                      errmsg("invalid state key \"%s\" for local
>> sequence",
>>                             keys[i])));
>>     }
>>
>>     sequence_save_tuple(seqh, NULL, true);
>
> If that error happens after having already processed a "last_value" or
> "is_called" entry, you have already modified the on-disk tuple. AFAICS
> that's the only instance of that bug, but sequence_read_tuple - modify
> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
> If you modify a tuple directly in a Buffer, you should have a critical
> section around it. It would make sense to start a critical section in
> sequence_read_tuple(), except that not all callers want to modify the
> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
> functions should be split into two:
>
> sequence_read_tuple_for_update()
> sequence_save_tuple()

Agreed, however I am much more concerned about the way 
seqam_local_alloc() works, see the XXX comment there. I am thinking it's 
not really safe that way, but problem is that we can't put critical 
section around it currently as the sequence_save_tuple() can potentially 
call GetTopTransactionId(). Separating it into more functions might be 
solution. Except the split into two does not really help there, it would 
have to be something like this:
sequence_tuple_update_start() - does the GetTopTransactionId bit and 
starts critical section
sequence_tuple_save() - saves the tuple/does wal
sequence_tuple_update_finish() - ends the critical section

That looks slightly cumbersome but I don't really have better idea.

>
> * seqam_local_get_state() calls sequence_read_tuple() but not
> sequence_release_tuple(). It looks like sequence_close() releases the
> tuple and the buffer, but that seems sloppy. sequence_close() probably
> shouldn't be calling sequence_release_tuple at all, so that you'd get a
> warning at the end of transaction about the buffer leak.
>

Do you mean that you want to make call to sequence_release_tuple() 
mandatory when sequence_read_tuple() was called?


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 22/04/15 22:01, Petr Jelinek wrote:
> On 20/04/15 17:50, Heikki Linnakangas wrote:
>>>
>> I finally got around to have another round of review on this. I fixed a
>> couple of little bugs, did some minor edition on comments etc. See
>> attached. It is also available in my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
>> if you want to look at individual changes. It combines your patches 1
>> and 4, I think those need to be applied together. I haven't looked at
>> the DDL changes yet.
>
> Thanks!

I merged all your changes and merged the patch#1 with patch#4.

Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam
and gave you (Heikki) commit rights there in case you want to change
anything.

>> The division of labour between sequence.c and the AM, in the init and
>> the get/set_state functions, is a bit more foggy:
>>
>> * Do we really need a separate amoptions() method and an init() method,
>> when the only caller to amoptions() is just before the init() method?
>> The changes in extractRelOptions suggest that it would call
>> am_reloptions for sequences too, but that doesn't actually seem to be
>> happening.
>
> Hmm yes it should and I am sure it did at some point, must have messed
> it during one of the rebases :(
>
> And it's the reason why we need separate API function.
>

Actually the reloption handling was broken in more than one place, I
fixed it all around. I had to teach the heap_create_with_catalog about
relam but the change does not seem to be too intrusive to me so I think
it's fine. We no longer do the update of pg_class tuple after the
sequence was created anymore and the relcache works correctly now. Also
the whole reloption handling was moved to more proper places than the
seqam_init so it's more in line with how it works for other relation kinds.

>>
>> postgres=# create sequence fooseq using local with (garbage=option);
>> CREATE SEQUENCE
>>
>> Invalid options probably should throw an error.
>>

Done, well actually the local sequence now throws error on any options.
I also updated CREATE SEQUENCE and ALTER SEQUENCE docs with the
reloptions syntax.

>> * Currently, the AM's init function is responsible for basic sanity
>> checking, like min < max. It also has to extract the RESTART value from
>> the list of options. I think it would make more sense to move that to
>> sequence.c. The AM should only be responsible for initializing the
>> 'amdata' portion, and for handling any AM-specific options. If the AM
>> doesn't support some options, like MIN and MAX value, it can check them
>> and throw an error, but it shouldn't be responsible for just passing
>> them through from the arguments to the sequence tuple.
>

I now do the checking in sequence.c where possible, the init got
generally redone to accept the newly created tuple and restart value
parameters.

One thing that I am not sure how much like is that if the AM wants to
change the amdata in the init, it now has to modify the tuple and free
the old one.

>
>>
>> * The transformations of the arrays in get_state() and set_state()
>> functions are a bit complicated. The seqam_get_state() function returns
>> two C arrays, and pg_sequence_get_state() turns them into a text[]
>> array. Why not construct the text[] array directly in the AM? I guess
>> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
>> that, but if that's an issue, you could create generic helper function
>> to turn two C string arrays into text[], and call that from the AM.
>
> Yeah that was exactly the reasoning. Helper function works for me (will
> check what Álvaro's suggested, maybe it can be moved somewhere and reused).
>

Didn't use Álvaro's code in the end as ISTM working directly with arrays
is simple enough for this use case. The only slightly ugly part is the
use of TextDatumGetCString/CStringGetDatum but I think that's survivable
(maybe the sequence.c could convert the TEXT[] to CSTRING[] or just char
*[] as this is not performance critical code path).

>>
>>>     seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>>
>>>     for (i = 0; i < count; i++)
>>>     {
>>>         if (pg_strcasecmp(keys[i], "last_value") == 0)
>>>             seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>>
>>> CStringGetDatum(values[i])));
>>>         else if (pg_strcasecmp(keys[i], "is_called") == 0)
>>>             seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>>
>>> CStringGetDatum(values[i])));
>>>         else
>>>             ereport(ERROR,
>>>                     (errcode(ERRCODE_SYNTAX_ERROR),
>>>                      errmsg("invalid state key \"%s\" for local
>>> sequence",
>>>                             keys[i])));
>>>     }
>>>
>>>     sequence_save_tuple(seqh, NULL, true);
>>
>> If that error happens after having already processed a "last_value" or
>> "is_called" entry, you have already modified the on-disk tuple. AFAICS
>> that's the only instance of that bug, but sequence_read_tuple - modify
>> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
>> If you modify a tuple directly in a Buffer, you should have a critical
>> section around it. It would make sense to start a critical section in
>> sequence_read_tuple(), except that not all callers want to modify the
>> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
>> functions should be split into two:
>>
>> sequence_read_tuple_for_update()
>> sequence_save_tuple()
>
> Agreed, however I am much more concerned about the way
> seqam_local_alloc() works, see the XXX comment there. I am thinking it's
> not really safe that way, but problem is that we can't put critical
> section around it currently as the sequence_save_tuple() can potentially
> call GetTopTransactionId(). Separating it into more functions might be
> solution. Except the split into two does not really help there, it would
> have to be something like this:
> sequence_tuple_update_start() - does the GetTopTransactionId bit and
> starts critical section
> sequence_tuple_save() - saves the tuple/does wal
> sequence_tuple_update_finish() - ends the critical section
>
> That looks slightly cumbersome but I don't really have better idea.
>

So I went with something like what was proposed:
- sequence_start_update(seqh, do_wal) - prepares the state and starts
the critical section
- sequence_apply_update(seqh, do_wal) - does buffer dirtying and wal logging
- sequence_finish_update(seqh) - closes the critical section and cleans
up the state

The API might look slightly hairy now but I don't see much difference
between having to do:
START_CRIT_SECTION()
sequence_update_tuple()
END_CRIT_SECTION()
and the above, except the above also solves the xid acquiring and shows
less internals.

I also added sequence_swap_tuple() for the use-case when the AM does not
want to do inline updates because having this separate makes things
easier and less ugly. And once you are changing whole tuple you are
already in slow/complex code path so one more function call does not matter.

>> * seqam_local_get_state() calls sequence_read_tuple() but not
>> sequence_release_tuple(). It looks like sequence_close() releases the
>> tuple and the buffer, but that seems sloppy. sequence_close() probably
>> shouldn't be calling sequence_release_tuple at all, so that you'd get a
>> warning at the end of transaction about the buffer leak.
>>

I did this (requiring the sequence_release_tuple() always) and
documented it, but personally don't like it much as it just adds one
more call to the API which already has enough of them and I really don't
see the advantage of it.


Other than things mentioned above I rebased it on current master (the
transforms commit produced several issues). And fixed several bugs like
checking min/max value in set_state, proper initialization of session
cached increment_by, etc.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Alvaro Herrera
Date:
Heikki, do you have time to go through this at this point?

Petr Jelinek wrote:

> I merged all your changes and merged the patch#1 with patch#4.
>
> Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam and
> gave you (Heikki) commit rights there in case you want to change anything.

I rebased your #1 to current master; attached.  It builds and passes
regression test, but I didn't check any further.  Doc build fails with:

$ make check
onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s
/pgsql/source/master/doc/src/sgml/postgres.sgml
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
Makefile:320: recipe for target 'check' failed
make: *** [check] Error 1

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
> Heikki, do you have time to go through this at this point?

I'm afraid I won't :-(. I did intend to, but looking at the calendar, I 
won't have the time to review this thoroughly enough to commit. Sorry.

I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax 
patch at all yet.

We discussed using a single amdata column vs. any number of am-specific 
columns. We settled on amdata, but I'm still not 100% convinced that's 
the best approach. Just as a data point, this removes the log_cnt field 
and moves it into amdata in a non-human-readable format. So for someone 
who only uses the local seqam, this just makes things slightly worse. 
For more complicated seqam's, it would be even more important to display 
the state in a human-readable format. Perhaps it's OK that each seqam 
provides its own functions or similar to do that, but I'd like to 
revisit that decision.

I still don't like the serial_sequenceam GUC. Not sure what to do 
instead. Needs some thought.

- Heikki




Re: Sequence Access Method WIP

From
Simon Riggs
Date:
On 13 May 2015 at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
Heikki, do you have time to go through this at this point?

I'm afraid I won't :-(. I did intend to, but looking at the calendar, I won't have the time to review this thoroughly enough to commit. Sorry.

I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax patch at all yet.

We discussed using a single amdata column vs. any number of am-specific columns. We settled on amdata, but I'm still not 100% convinced that's the best approach. Just as a data point, this removes the log_cnt field and moves it into amdata in a non-human-readable format. So for someone who only uses the local seqam, this just makes things slightly worse. For more complicated seqam's, it would be even more important to display the state in a human-readable format. Perhaps it's OK that each seqam provides its own functions or similar to do that, but I'd like to revisit that decision.

I still don't like the serial_sequenceam GUC. Not sure what to do instead. Needs some thought.

This has had around 2 years of thought at this point. I don't agree it needs more thought.

There is one clear use case for this and it is of benefit to many distributed architectures.

I don't see what calamity will occur if we commit this. If you don't want a sequence AM, don't ever use this.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 13/05/15 12:56, Heikki Linnakangas wrote:
> On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
>> Heikki, do you have time to go through this at this point?
>
> I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
> won't have the time to review this thoroughly enough to commit. Sorry.
>
> We discussed using a single amdata column vs. any number of am-specific
> columns. We settled on amdata, but I'm still not 100% convinced that's
> the best approach. Just as a data point, this removes the log_cnt field
> and moves it into amdata in a non-human-readable format. So for someone
> who only uses the local seqam, this just makes things slightly worse.
> For more complicated seqam's, it would be even more important to display
> the state in a human-readable format. Perhaps it's OK that each seqam
> provides its own functions or similar to do that, but I'd like to
> revisit that decision.
>

I do think it's ok for seqam to provide functions that can give you the 
data in human readable form.

I think main argument against custom human readable columns is that it 
will kill any possibility to have common storage for sequences.

There is also additional complexity in the API required for that.

The performance implications are probably small as one could still 
define opaque bytea column and store the data same way a now.

> I still don't like the serial_sequenceam GUC. Not sure what to do
> instead. Needs some thought.
>

I think it would be ok if this issue was solved by follow-up patch at 
later time.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
On 05/13/2015 02:12 PM, Simon Riggs wrote:
> This has had around 2 years of thought at this point. I don't agree it
> needs more thought.

Noted.

> There is one clear use case for this and it is of benefit to many
> distributed architectures.

Right. What's your point?

> I don't see what calamity will occur if we commit this. If you don't want a
> sequence AM, don't ever use this.

I'd like the API to be good for its purpose. Also, I did mention that 
the the current patch makes the situation slightly worse for people who 
never use this: it makes the log_cnt field non human-readable. That's a 
really minor thing, but it shows that it *does* matter how this is 
implemented, even if you only ever use the local seq AM.

- Heikki




Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 13/05/15 14:15, Heikki Linnakangas wrote:
>> I don't see what calamity will occur if we commit this. If you don't
>> want a
>> sequence AM, don't ever use this.
>
> I'd like the API to be good for its purpose. Also, I did mention that
> the the current patch makes the situation slightly worse for people who
> never use this: it makes the log_cnt field non human-readable. That's a
> really minor thing, but it shows that it *does* matter how this is
> implemented, even if you only ever use the local seq AM.
>

It definitely does matter.

I don't think we'll find perfect compromise here though, you can either 
do it one way or the other. Trust me it does not make me happy either, I 
like perfect solutions too, but when there is lack of perfect solution I 
prefer the simpler one.

Both of the solutions have drawbacks - amdata has opaque blob which does not store data in user visible way 
but that can be worked around by providing function that shows it in 
human readable way (and the dump function for many sequence types 
actually does that). - multiple columns basically kill any future ability to unify the 
storage for sequences and also adds complexity, especially around alter 
table (since it means drop/add column and stuff)

But I already wrote both versions anyway so from that perspective it 
does not matter much which part we merge.

(As a side-node I would have preferred to have this discussion earlier 
than 2 days before feature freeze because the current implementation is 
something that we agreed on several months ago so there was plenty of 
time to revisit that decision.)

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 13/05/15 06:10, Alvaro Herrera wrote:
>
> I rebased your #1 to current master; attached.  It builds and passes

Thanks

> regression test, but I didn't check any further.  Doc build fails with:
>
> $ make check
> onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s
/pgsql/source/master/doc/src/sgml/postgres.sgml
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
> Makefile:320: recipe for target 'check' failed
> make: *** [check] Error 1
>

Missing <para>, attached fixed version with your rebase and one of the
commits from Heikki that I missed last time.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Vik Fearing
Date:
I've been looking at these patches a bit and here are some comments:

Patch 1: seqam

I would like to see an example in the docs for CREATE SEQUENCE.  That's
perhaps not possible (or desirable) with only the "local" seqam?  Not sure.

In the docs for pg_class, there is no mention that relam refers to
pg_seqam for sequences but pg_am for all other types.

There are errant half-sentences in the documentation, for example "to
the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in
Sequence Access Method Functions.

I'd prefer a README instead of the long comment at the start of seqam.c.The other ams do that.

As mentioned upthread, this patch isn't a seamless replacement for
what's already there because of the amdata field.  I wasn't part of the
conversation of FOSDEM unfortunately, and there's not enough information
in this thread to know why this solution is preferred over each seqam
having its own table type with all the columns it needs.  I see that
Heikki is waffling a bit between the two, and I have a fairly strong
opinion that amdata should be split into separate columns.  The patch
already destroys and recreates what it needs when changing access method
via ALTER SEQUENCE, so I don't really see what the problem is.

There is no psql tab completion for the new USING clause in ALTER
SEQUENCE, and in looking at that I noticed we don't have tab completion
for CREATE SEQUENCE at all.  I know we don't complete everything, but if
we're going to do ALTER, I think we should do CREATE.  I'll submit a
patch for that on its own thread, but then this patch will need to
modify it to include USING.

There is no \d command for sequence access methods.  Without querying
pg_seqam directly, how does one discover what's available?

There are some unfinished comments, such as "When relam is specified,
record dependency on the" in heap.c.

Comments and code in pg_dump.c check that the server is at least 9.5.
Those'll need to be changed to at least 9.6.

Patch 2: seqam ddl

When defining a new access method for sequences, it is possible to list
the arguments multiple times (last one wins).  Other defel loops raise
an error if the argument is specified more than once.  I haven't looked
at all of such loops to see if this is the only odd man out or not, but
I prefer the error behavior.

It doesn't seem possible to create a comment for a seqam, is that
intentional?  Ah, after more review I see it is possible, just not
documented.  I think that needs to be corrected.

Same comment as above about testing for 9.5.

Patch 3: gapless_seq

I really like the idea of having a gapless sequence in contrib.
Although it has big potential to be abused, doing it manually when it's
needed (like for invoices, at least in France) is a major pain.  So big
+1 for including this.

However, the patch doesn't update the main contrib Makefile so you have
to compile it explicitly.  It also doesn't have the .sgml file in the
right place, so that's not installed either.

There is a FIXME in get_last_value_tup() which should probably be
removed in light of the code comment in catcache.c stating that pg_seqam
is too small to benefit from an index scan.


It would be nice to be able to specify an access method when declaring a
serial or bigserial column.  This could be a separate patch later on,
though.


On the whole, I think this is a pretty good patchset.  Aside from the
design decision of whether amdata is a single opaque column or a set of
columns, there are only a few things that need to be changed before it's
ready for committer, and those things are mostly documentation.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 2015-06-15 11:32, Vik Fearing wrote:
> I've been looking at these patches a bit and here are some comments:
>

Thanks for looking at this.

> Patch 1: seqam
>
> I would like to see an example in the docs for CREATE SEQUENCE.  That's
> perhaps not possible (or desirable) with only the "local" seqam?  Not sure.
>

It is possible to have example with local seqam, it might be confusing 
though, given it produces same results as not putting USING in the query.

> In the docs for pg_class, there is no mention that relam refers to
> pg_seqam for sequences but pg_am for all other types.
>
> There are errant half-sentences in the documentation, for example "to
> the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in
> Sequence Access Method Functions.

I think that's the side effect of all the rebases and rewrites over the 
2y(!) that this has been going forward and back. It can be easily fixed 
by proof reading before final submission. I didn't pay too much 
attention yet because it's not clear how the docs should look like if 
there is no real agreement on the api. (This applies to other comments 
about docs as well)

>
> I'd prefer a README instead of the long comment at the start of seqam.c.
>   The other ams do that.
>

OK, since things have been moved to separate directory, README is 
doable, I personally prefer the docs in the main .c file usually but I 
know project uses README sometimes for this.

> As mentioned upthread, this patch isn't a seamless replacement for
> what's already there because of the amdata field.  I wasn't part of the
> conversation of FOSDEM unfortunately, and there's not enough information
> in this thread to know why this solution is preferred over each seqam
> having its own table type with all the columns it needs.  I see that
> Heikki is waffling a bit between the two, and I have a fairly strong
> opinion that amdata should be split into separate columns.  The patch
> already destroys and recreates what it needs when changing access method
> via ALTER SEQUENCE, so I don't really see what the problem is.
>

FOSDEM was just about agreeing that amdata is simpler after we discussed 
it with Heikki. Nothing too important you missed there I guess.

I can try to summarize what are the differences:
- amdata is somewhat simpler in terms of code for both init, alter and 
DDL, since with custom columns you have to specify them somehow and deal 
with them in catalog, also ALTER SEQUENCE USING means that there are 
going to be colums marked as deleted which produces needless waste, etc
- amdata make it easier to change the storage model as the tuple 
descriptor is same for all sequences
- the separate columns are much nicer from user point of view
- my opinion is that separate columns also more nicely separate state 
from options and I think that if we move to separate storage model, 
there can be state table per AM which solves the tuple descriptor issue
- there is probably some slight performance benefit to amdata but I 
don't think it's big enough to be important

I personally have slight preference to separate columns design, but I am 
ok with both ways honestly.

>
> There is no \d command for sequence access methods.  Without querying
> pg_seqam directly, how does one discover what's available?
>

Good point.

>
> Patch 2: seqam ddl
>
> When defining a new access method for sequences, it is possible to list
> the arguments multiple times (last one wins).  Other defel loops raise
> an error if the argument is specified more than once.  I haven't looked
> at all of such loops to see if this is the only odd man out or not, but
> I prefer the error behavior.
>

Hmm yeah, there should be error. I think only tsearch doesn't enforce 
errors from the existing stuff, should probably be fixed as well 
(separately of course).

>
> Patch 3: gapless_seq
>
> I really like the idea of having a gapless sequence in contrib.
> Although it has big potential to be abused, doing it manually when it's
> needed (like for invoices, at least in France) is a major pain.  So big
> +1 for including this.
>

Yeah, people make gapless sequences regardless, it's better to provide 
them one that behaves correctly, also it's quite good test for the API.

>
> It would be nice to be able to specify an access method when declaring a
> serial or bigserial column.  This could be a separate patch later on,
> though.
>

The patch originally had GUC for this, but Heikki didn't like it so it's 
left for the future developments.

>
> On the whole, I think this is a pretty good patchset.  Aside from the
> design decision of whether amdata is a single opaque column or a set of
> columns, there are only a few things that need to be changed before it's
> ready for committer, and those things are mostly documentation.
>

Unfortunately the amdata being opaque vs set of columns is the main 
issue here.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Heikki Linnakangas
Date:
So, we have this patch in the commitfest again. Let's see where we are, 
and try to find a consensus on what needs to be done before this can be 
committed.

On 06/17/2015 06:51 PM, Petr Jelinek wrote:
> On 2015-06-15 11:32, Vik Fearing wrote:
>> I've been looking at these patches a bit and here are some comments:
>
> Thanks for looking at this.

+1, thanks Vik.

>> As mentioned upthread, this patch isn't a seamless replacement for
>> what's already there because of the amdata field.  I wasn't part of the
>> conversation of FOSDEM unfortunately, and there's not enough information
>> in this thread to know why this solution is preferred over each seqam
>> having its own table type with all the columns it needs.  I see that
>> Heikki is waffling a bit between the two, and I have a fairly strong
>> opinion that amdata should be split into separate columns.  The patch
>> already destroys and recreates what it needs when changing access method
>> via ALTER SEQUENCE, so I don't really see what the problem is.
>
> FOSDEM was just about agreeing that amdata is simpler after we discussed
> it with Heikki. Nothing too important you missed there I guess.
>
> I can try to summarize what are the differences:
> - amdata is somewhat simpler in terms of code for both init, alter and
> DDL, since with custom columns you have to specify them somehow and deal
> with them in catalog, also ALTER SEQUENCE USING means that there are
> going to be colums marked as deleted which produces needless waste, etc
> - amdata make it easier to change the storage model as the tuple
> descriptor is same for all sequences
> - the separate columns are much nicer from user point of view
> - my opinion is that separate columns also more nicely separate state
> from options and I think that if we move to separate storage model,
> there can be state table per AM which solves the tuple descriptor issue
> - there is probably some slight performance benefit to amdata but I
> don't think it's big enough to be important
>
> I personally have slight preference to separate columns design, but I am
> ok with both ways honestly.

Regarding the amdata issue, I'm also leaning towards set of columns. 
I've felt that way all along, but not very strongly, so I relented at 
some point when Andres felt strongly that a single column would be 
better. But the more I think about it, the more I feel that separate 
columns really would be better. As evidence, I offer this recent thread:

Tom said 
(http://www.postgresql.org/message-id/8739.1436893588@sss.pgh.pa.us):
> I really don't see what's wrong with "SELECT last_value FROM sequence",
> especially since that has worked in every Postgres version since 6.x.
> Anyone slightly worried about backwards compatibility wouldn't use
> an equivalent function even if we did add one.

If we went with the single amdata column, that would break. Or we'd need 
to leave last_value as a separate column anyway, and leave it unused for 
sequence AMs where it's not applicable. But that's a bit ugly too.

Jim Nasby said in the same thread:
> FWIW, I think it'd be better to have a pg_sequences view that's the
> equivalent of SELECT * FROM <sequence> for every sequence in the
> database. That would let you get whatever info you needed.

Creating such a view would be difficult if all the sequences have a 
different set of columns. But when you think about it, it's not really 
any better with a single amdata column. You can't easily access the data 
in the amdata column that way either.

Anyway, that's my opinion. Several others have weighed in to support 
separate columns, too, so I think that is the consensus. Separate 
columns it is.

>> There is no \d command for sequence access methods.  Without querying
>> pg_seqam directly, how does one discover what's available?
>
> Good point.

Well, you can query pg_seqam. I don't think this deserves a \d command.

>> On the whole, I think this is a pretty good patchset.  Aside from the
>> design decision of whether amdata is a single opaque column or a set of
>> columns, there are only a few things that need to be changed before it's
>> ready for committer, and those things are mostly documentation.
>
> Unfortunately the amdata being opaque vs set of columns is the main
> issue here.

There was discussion on another thread on how the current sequence AM 
API is modeled after the indexam API, at 
http://www.postgresql.org/message-id/3896.1437059303@sss.pgh.pa.us. Will 
need to do something about that too.

Petr, is this enough feedback on this patch for this commitfest, or are 
there some other issues you want to discuss before I mark this as returned?

- Heikki




Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
> Petr, is this enough feedback on this patch for this commitfest, or are
> there some other issues you want to discuss before I mark this as returned?
>

You can mark it as returned, I didn't have much time to actually do much 
useful work on this in the current CF.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Thom Brown
Date:
On 28 July 2015 at 19:51, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 2015-07-28 20:11, Heikki Linnakangas wrote:

Petr, is this enough feedback on this patch for this commitfest, or are
there some other issues you want to discuss before I mark this as returned?


You can mark it as returned, I didn't have much time to actually do much useful work on this in the current CF.

Is this now dependant on the work Alexander Korotkov is doing on the AM interface?

Thom

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 2015-09-16 13:21, Thom Brown wrote:
> On 28 July 2015 at 19:51, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
>     On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
>
>         Petr, is this enough feedback on this patch for this commitfest,
>         or are
>         there some other issues you want to discuss before I mark this
>         as returned?
>
>
>     You can mark it as returned, I didn't have much time to actually do
>     much useful work on this in the current CF.
>
>
> Is this now dependant on the work Alexander Korotkov is doing on the AM
> interface?
>

Yes.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Hi,

here is another try at this. I had some more time to think about various
corner cases (like the sequence not having TOAST).

I decided for this version to try bit different approach and that is to
use custom datatype for the sequence access method state. So there is
one extra amstate column and the access method defines the custom type.

The idea is that the custom type should be fixed width so that we can do
proper checks during the creation of access method for size and stuff.
It also makes the interface bit cleaner as we can provide method for
reading and saving of the state and abstract the internals more nicely.
It also solves the issues with critical sections vs memory allocations
when the access methods wants to save completely new state.

It also means that get_state and set_state interfaces just work with
text representation of custom type, no need to have complex conversions
to arrays and back.

One downside is that this means access methods can't use composite types
as those are not fixed width (nor plain) so the names of the individual
items in the output are somewhat hidden (see the modified create_view
test for example of how this affects things).

Other than that, this is based on the new am api by Alexander Korotkov
[1]. It extends it by adding another column called amkind to the pg_am
which can have either value "i" for index or "S" for sequence (same as
relkind in pg_class for those).

I didn't attach DDL and the gapless sequence yet, mainly because I don't
want to waste time rewriting it again in case we can't agree that this
approach is good (based on the long discussion and resulting several
rewrites wrt the multiple columns vs bytea previously).

[1] https://commitfest.postgresql.org/8/336/

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Craig Ringer
Date:

On 1 January 2016 at 07:51, Petr Jelinek <petr@2ndquadrant.com> wrote:

Other than that, this is based on the new am api by Alexander Korotkov [1]. It extends it by adding another column called amkind to the pg_am which can have either value "i" for index or "S" for sequence (same as relkind in pg_class for those).


Setting waiting-on-author in the CF app.

The good news is that the commit of the pg_am rework greatly eases the path of this patch into core.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Sequence Access Method WIP

From
Craig Ringer
Date:
Needs rework after the commit of https://commitfest.postgresql.org/8/336/

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 18 January 2016 at 09:19, Craig Ringer <craig@2ndquadrant.com> wrote:
> Needs rework after the commit of https://commitfest.postgresql.org/8/336/

Here is version that applies to current master. There is some work to
do (mostly cleanup) and the DDL is missing, but that's because I want
to know what people think about the principle of how it works now and
if it makes sense to finish it this way (explained in the original
submission for Jan CF).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Alvaro Herrera
Date:
Petr Jelinek wrote:
> On 18 January 2016 at 09:19, Craig Ringer <craig@2ndquadrant.com> wrote:
> > Needs rework after the commit of https://commitfest.postgresql.org/8/336/
> 
> Here is version that applies to current master. There is some work to
> do (mostly cleanup) and the DDL is missing, but that's because I want
> to know what people think about the principle of how it works now and
> if it makes sense to finish it this way (explained in the original
> submission for Jan CF).

I would guess that the DDL boilterplate should come from Alexander
Korotkov's patch, right?  I think a first easy step may be to combine
parts both patches so that we get the "amkind" column from this patch
and the DDL support from Alexander's patch (means that his proposed
command needs a new clause to specify the amkind); then the really tough
stuff in both Alexander's and this patch can be rebased on top of that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Sequence Access Method WIP

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I would guess that the DDL boilterplate should come from Alexander
> Korotkov's patch, right?  I think a first easy step may be to combine
> parts both patches so that we get the "amkind" column from this patch
> and the DDL support from Alexander's patch (means that his proposed
> command needs a new clause to specify the amkind);

Uh, what?  Surely we would provide a bespoke command for each possible
sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
that the provided function has the right signature, and then it would put
the correct amkind into the pg_am entry automatically.

If we skimp on this infrastructure then we're just relying on the
user not to make a typo, which doesn't seem like a good idea.

(Agreed though that a reasonable first step would be to add amkind to
pg_am and make the appropriate adjustments to existing code, ie check
that amkind is correct when fetching an index handler.  I considered
putting that into the AM API refactor patch, but desisted.)
        regards, tom lane



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I would guess that the DDL boilterplate should come from Alexander
>> Korotkov's patch, right?  I think a first easy step may be to combine
>> parts both patches so that we get the "amkind" column from this patch
>> and the DDL support from Alexander's patch (means that his proposed
>> command needs a new clause to specify the amkind);
>
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.
>
> If we skimp on this infrastructure then we're just relying on the
> user not to make a typo, which doesn't seem like a good idea.
>

Yeah it has to be completely separate command for both that do
completely different sanity checks. It can't even be called CREATE
INDEX/SEQUENCE ACCESS METHOD unless we are willing to make ACCESS a
keyword due to preexisting CREATE INDEX/SEQUENCE <name> commands. The
previous version of the patch which used somewhat different underlying
catalog structure already had DDL and honestly writing the DDL part is
quite easy. I used CREATE ACCESS METHOD FOR SEQUENCE there.

> (Agreed though that a reasonable first step would be to add amkind to
> pg_am and make the appropriate adjustments to existing code, ie check
> that amkind is correct when fetching an index handler.  I considered
> putting that into the AM API refactor patch, but desisted.)
>

Well I was wondering how to handle this as well, the submitted patch
currently just adds Asserts, because IMHO the actual ERROR should be
thrown in the DDL not in the code that just uses it.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Alvaro Herrera
Date:
Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> > Uh, what?  Surely we would provide a bespoke command for each possible
> > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Sequence Access Method WIP

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.

> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.

That seems okay.  I had the impression you were proposingCREATE ACCESS METHOD foobar TYPE 'i' USING functionname
or something like that, where it would be totally up to the user that
the amkind matched the function.  That seems unnecessarily error-prone,
not to mention that it would close the door forever on any hope that
we might allow non-superusers to execute such commands.
        regards, tom lane



Re: Sequence Access Method WIP

From
Alexander Korotkov
Date:
On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> > Uh, what?  Surely we would provide a bespoke command for each possible
> > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

Ok! Let's nail down the syntax and I can integrate it into my createam patch.
I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER handler;", but I don't insist.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Sequence Access Method WIP

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>> like that.

> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
> handler;", but I don't insist.

I think that Alvaro's idea is less likely to risk future grammar
conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
[ CONCURRENTLY ] to make me really wary of variables in the statement-name
part of the syntax.
        regards, tom lane



Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>> wrote:
>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>> like that.
>
>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>> handler;", but I don't insist.
>
> I think that Alvaro's idea is less likely to risk future grammar
> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
> part of the syntax.

Strong +1.  If we put the type of access method immediately after
CREATE, I'm almost positive we'll regret it for exactly that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 29 January 2016 at 23:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>>> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>> like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1.  If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
conflict now, that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating ACCESS METHOD object so that's
what should be after CREATE.

In any case this is slightly premature IMHO as DDL is somewhat unless
until we have sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Robert Haas
Date:
On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
> conflict now, that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.

I missed your proposal, I guess, so please don't take as having any
position on whether it's better or worse than Alvaro's.  I was only
intending to vote for the proposition that the type of access method
should follow the name of the access method.

> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.

Agreed.

> In any case this is slightly premature IMHO as DDL is somewhat unless
> until we have sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.

Generic WAL logging seems like a *great* idea to me.  But I think it's
largely independent from the access method stuff.  If we have generic
WAL logging, people can create WAL-logged stuff that is not a new
access method.  If we have access methods, they can create new access
methods that are not WAL-logged.  If we have both, then they can
create WAL-logged access methods which of course is the payoff pitch,
but I don't see it as necessary or desirable for the two systems to be
tied together in any way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 30 January 2016 at 13:48, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
>> conflict now, that's why my proposal was different, I didn't want to
>> add more keywords. I think Alvaro's proposal is fine as well.
>
> I missed your proposal, I guess, so please don't take as having any
> position on whether it's better or worse than Alvaro's.  I was only
> intending to vote for the proposition that the type of access method
> should follow the name of the access method.
>

No worries didn't mean it that way.

>> In any case this is slightly premature IMHO as DDL is somewhat unless
>> until we have sequence access methods implementation we can agree on,
>> or the generic WAL logging so that custom indexes can be crash safe.
>
> Generic WAL logging seems like a *great* idea to me.  But I think it's
> largely independent from the access method stuff.  If we have generic
> WAL logging, people can create WAL-logged stuff that is not a new
> access method.  If we have access methods, they can create new access
> methods that are not WAL-logged.  If we have both, then they can
> create WAL-logged access methods which of course is the payoff pitch,
> but I don't see it as necessary or desirable for the two systems to be
> tied together in any way.

Okay, I am only debating the usefulness of DDL for access methods in
the current situation where we only have custom index access methods
which can't create WAL records. In other words trying to nudge people
slightly back towards the actual patch(es).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Alexander Korotkov
Date:
On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 29 January 2016 at 23:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>>> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>> like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1.  If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
conflict now, that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating ACCESS METHOD object so that's
what should be after CREATE.

In any case this is slightly premature IMHO as DDL is somewhat unless
until we have sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

I've changed syntax of CREATE ACCESS METHOD syntax in the thread about index access method extendability.
Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New column amtype of pg_am stores access method type.
It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE HANDLER handler;".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 16/02/16 13:58, Alexander Korotkov wrote:
>
> I've changed syntax of CREATE ACCESS METHOD syntax in the thread about
> index access method extendability.
> Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New
> column amtype of pg_am stores access method type.
> http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com
> It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
> HANDLER handler;".
>

Yes I've seen and I will send a review within couple of days.

But first here is updated patch for sequence access methods. I went with
the previously discussed custom type as this gives us proper control
over the width of the state and making sure that it's not gonna be
toastable, etc and we need this as sequence is limited to single page.

Attached are 3 separate files. The first one (0001-create-am) is mainly
separate for the reason that there is some overlap with Alexander's
index access methods patch, so I extracted common code into separate
patch as it will make it easier to merge in case we are lucky enough to
get these patches in (but it's still based on Alexander's code). It
provides infra for defining new access methods from SQL, although
without the parser and without any actual access methods.

The 0002-seqam provides the SQL interface and support for sequence
access methods on top of the infra patch, and also provides all the
sequence access methods infrastructure and abstraction and documentation.

And finally the 0003-gapless-seq is example contrib module that
implements dependably and transitionally safe gapless sequence access
method. It's obviously slow as it has to do locking and basically
serialize all the changes to sequence so only one transaction may use it
at a time but it's truly gapless. It also serves as an example of use of
the api and as a test.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
David Steele
Date:
On 3/4/16 11:09 PM, Petr Jelinek wrote:

> But first here is updated patch for sequence access methods. I went with
> the previously discussed custom type as this gives us proper control
> over the width of the state and making sure that it's not gonna be
> toastable, etc and we need this as sequence is limited to single page.
> 
> Attached are 3 separate files. The first one (0001-create-am) is mainly
> separate for the reason that there is some overlap with Alexander's
> index access methods patch, so I extracted common code into separate
> patch as it will make it easier to merge in case we are lucky enough to
> get these patches in (but it's still based on Alexander's code). It
> provides infra for defining new access methods from SQL, although
> without the parser and without any actual access methods.
> 
> The 0002-seqam provides the SQL interface and support for sequence
> access methods on top of the infra patch, and also provides all the
> sequence access methods infrastructure and abstraction and documentation.
> 
> And finally the 0003-gapless-seq is example contrib module that
> implements dependably and transitionally safe gapless sequence access
> method. It's obviously slow as it has to do locking and basically
> serialize all the changes to sequence so only one transaction may use it
> at a time but it's truly gapless. It also serves as an example of use of
> the api and as a test.

As far as I can see Petr has addressed all the outstanding issues in
this patch and it's ready for a review.  The patch applies with some
easily-fixed conflicts:

$ git apply -3 ../other/0002-seqam-2015-03-05.patch
error: patch failed: src/include/catalog/pg_proc.h:5225
Falling back to three-way merge...
Applied patch to 'src/include/catalog/pg_proc.h' with conflicts.

Simon, you are signed up to review.  Do you have an idea of when you'll
be doing that?

-- 
-David
david@pgmasters.net



Re: Sequence Access Method WIP

From
Alvaro Herrera
Date:
Petr Jelinek wrote:

> And finally the 0003-gapless-seq is example contrib module that implements
> dependably and transitionally safe gapless sequence access method. It's
> obviously slow as it has to do locking and basically serialize all the
> changes to sequence so only one transaction may use it at a time but it's
> truly gapless. It also serves as an example of use of the api and as a test.

I'm trying to figure out this one, and I think I found something very
surprising.  This code contains this

+#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
+#define VALUES_TABLE_NAME "seqam_gapless_values"

which as far as I understand is something like a side table where values
for all the sequences are stored.  Is that right?  If it is, I admit I
didn't realize that these sequences worked in this way.  This seems
broken to me.  I had been under the impression that the values were
always stored in the sequence's relation.  Maybe I'm misunderstanding;
please explain how this works.


FWIW I find it annoying that this submission's 0001 patch and
Alexander's 0001 have so much in common, yet they aren't similar enough
to consider that either is the definite form.  Also, you have the SGML
docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
versions isn't trivial.

Anyway I'm back at reviewing Alexander's 0001, which should be okay as a
basis for this series regardless of any edits I suggest there.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 19/03/16 23:02, Alvaro Herrera wrote:
> Petr Jelinek wrote:
>
>> And finally the 0003-gapless-seq is example contrib module that implements
>> dependably and transitionally safe gapless sequence access method. It's
>> obviously slow as it has to do locking and basically serialize all the
>> changes to sequence so only one transaction may use it at a time but it's
>> truly gapless. It also serves as an example of use of the api and as a test.
>
> I'm trying to figure out this one, and I think I found something very
> surprising.  This code contains this
>
> +#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
> +#define VALUES_TABLE_NAME "seqam_gapless_values"
>
> which as far as I understand is something like a side table where values
> for all the sequences are stored.  Is that right?  If it is, I admit I
> didn't realize that these sequences worked in this way.  This seems
> broken to me.  I had been under the impression that the values were
> always stored in the sequence's relation.  Maybe I'm misunderstanding;
> please explain how this works.
>

Yes, I did explain in the original submission that added gapless 
sequence, which is about million messages upthread so it's 
understandable that it's hard to follow.

The sequence am API is non-transactional (and does not use MVCC) as 
sequences are non-trasactional and this provides significant performance 
boost. But the gapless sequence needs to handle things like 
subtransactions and rollbacks so it needs MVCC, for this reasons it also 
uses external table to handle that, the locking and stuff is still 
handled using normal sequence api but the MVCC part is handled by side 
table yes.

Just as a note to you and anybody who wasn't following in the beginning 
of this quite long thread - one of the goals of the sequence am api was 
to abstract the actual storage and WAL logging of the sequences away 
from the extension, so there is no generic WAL or anything like the 
index am provides here. At this point though, it might be worth 
reevaluating that approach if the generic WAL gets in.

>
> FWIW I find it annoying that this submission's 0001 patch and
> Alexander's 0001 have so much in common, yet they aren't similar enough
> to consider that either is the definite form.  Also, you have the SGML
> docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
> versions isn't trivial.

Well it's relatively annoying for the patch author as well. I tried to 
write it so that it's as easy to merge as possible - the common code is 
in my 0001, except for the SQL (gram.y, docs) because the SQL does not 
have any meaning until either indexam or seqeunceam gets in (which also 
makes it impossible to submit as separate infrastructure patch, because 
there is no point of having the code in without the stuff that either 
indexam or sequenceam provides on top of it).

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Hi,

I rebased this on top of the recently committed CREATE ACCESS METHOD.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:
<div dir="ltr"><div class="gmail_extra">On Thu, Mar 24, 2016 at 6:12 PM, Petr Jelinek <<a
href="mailto:petr@2ndquadrant.com"target="_blank">petr@2ndquadrant.com</a>> wrote:<br />><br />> Hi,<br
/>><br/>> I rebased this on top of the recently committed CREATE ACCESS METHOD.<br />><br /><br /></div><div
class="gmail_extra">Hi,<br/><br /></div><div class="gmail_extra">I got the above error trying to apply to the current
master:<br/><br />$ git apply /home/fabrizio/Downloads/0001-seqam-2016-03-24.patch<br />error: patch failed:
src/backend/commands/amcmds.c:29<br/>error: src/backend/commands/amcmds.c: patch does not apply<br /><br /><br
/></div><divclass="gmail_extra">There are a wrong definition at the beginning of the amcmds.c:<br /><br /> 34
<<<<<<<ours<br /> 35 static Oid  lookup_index_am_handler_func(List *handler_name, char amtype);<br
/> 36static const char *get_am_type_string(char amtype);<br /> 37 =======<br /> 38 static Oid 
lookup_am_handler_func(List*handler_name, char amtype);<br /> 39 static char *get_am_type_string(char amtype);<br /> 40
>>>>>>>theirs<br /><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">After
thissmall fix I can build and ran regress tests without errors.<br /><br />But running "check-world" I got the
error:<br/><br />make[1]: Leaving directory `/data/postgresql/src/test/regress'<br />make: Leaving directory
`/data/postgresql'<br/>+ pg_dumpall -f /data/postgresql/src/bin/pg_upgrade/tmp_check/dump1.sql<br />ok 9 - dropuser
foobar1exit code 0<br />ok 10 - SQL DROP ROLE run: SQL found in server log<br />ok 11 - fails with nonexistent user<br
/>ok<br/>t/<a href="http://080_pg_isready.pl">080_pg_isready.pl</a> ....... <br />1..10<br />ok 1 - pg_isready --help
exitcode 0<br />ok 2 - pg_isready --help goes to stdout<br />ok 3 - pg_isready --help nothing to stderr<br />ok 4 -
pg_isready--version exit code 0<br />ok 5 - pg_isready --version goes to stdout<br />ok 6 - pg_isready --version
nothingto stderr<br />ok 7 - pg_isready with invalid option nonzero exit code<br />ok 8 - pg_isready with invalid
optionprints error message<br />ok 9 - fails with no server running<br />pg_dump: [archiver (db)] query failed: ERROR: 
column"sequence_name" does not exist<br />LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...<br
/>              ^<br />pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value, increment_by, CASE WHEN
increment_by> 0 AND max_value = 9223372036854775807 THEN NULL      WHEN increment_by < 0 AND max_value = -1 THEN
NULL     ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL      WHEN
increment_by< 0 AND min_value = -9223372036854775807 THEN NULL      ELSE min_value END AS min_value, cache_value,
is_cycledFROM check_seq<br />pg_dumpall: pg_dump failed on database "regression", exiting<br />+
pg_dumpall1_status=1<br/>+ [ /data/postgresql != /data/postgresql ]<br />+
/data/postgresql/src/bin/pg_upgrade/tmp_check/install//home/fabrizio/pgsql/bin/pg_ctl-m fast stop<br />waiting for
serverto shut down.... done<br />server stopped<br />+ [ -n  ]<br />+ [ -n  ]<br />+ [ -n 1 ]<br />+ echo pg_dumpall of
pre-upgradedatabase cluster failed<br />pg_dumpall of pre-upgrade database cluster failed<br />+ exit 1<br />+ rm -rf
/tmp/pg_upgrade_check-3NUa0X<br/>make[2]: *** [check] Error 1<br />make[2]: Leaving directory
`/data/postgresql/src/bin/pg_upgrade'<br/>make[1]: *** [check-pg_upgrade-recurse] Error 2<br />make[1]: *** Waiting for
unfinishedjobs....<br /><br /><br /></div><div class="gmail_extra"><br />And testing pg_dump itself I got the same
errortrying to dump a database that contains a sequence.<br /><br />fabrizio=# create sequence x;<br />CREATE
SEQUENCE<br/>fabrizio=# \ds<br />          List of relations<br /> Schema | Name |   Type   |  Owner   <br
/>--------+------+----------+----------<br/> public | x    | sequence | fabrizio<br />(1 row)<br /><br />fabrizio=# \d
x<br/>                  Sequence "public.x"<br />    Column    |       Type        |        Value        <br
/>--------------+-------------------+---------------------<br/> start_value  | bigint            | 1<br /> increment_by
|bigint            | 1<br /> max_value    | bigint            | 9223372036854775807<br /> min_value    |
bigint           | 1<br /> cache_value  | bigint            | 1<br /> is_cycled    | boolean           | f<br
/> amstate     | seqam_local_state | (1,f,0)<br />Access Method: local<br /><br />fabrizio=# \q<br /><br
/>fabrizio@bagual:~/pgsql<br />$ bin/pg_dump > /tmp/fabrizio.sql<br />pg_dump: [archiver (db)] query failed: ERROR: 
column"sequence_name" does not exist<br />LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...<br
/>              ^<br />pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value, increment_by, CASE WHEN
increment_by> 0 AND max_value = 9223372036854775807 THEN NULL      WHEN increment_by < 0 AND max_value = -1 THEN
NULL     ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL      WHEN
increment_by< 0 AND min_value = -9223372036854775807 THEN NULL      ELSE min_value END AS min_value, cache_value,
is_cycledFROM x<br /><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Toninght I'll review
someparts of the code.<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Regards,<br
/></div><divclass="gmail_extra"><br /><br /></div><div class="gmail_extra">Att,<br /></div><div class="gmail_extra"><br
/>--<br/>Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a
href="http://www.timbira.com.br"target="_blank">http://www.timbira.com.br</a><br />>> Blog: <a
href="http://fabriziomello.github.io"target="_blank">http://fabriziomello.github.io</a><br />>> Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello"target="_blank">http://br.linkedin.com/in/fabriziomello</a><br />>>
Twitter:<a href="http://twitter.com/fabriziomello" target="_blank">http://twitter.com/fabriziomello</a><br />>>
Github:<a href="http://github.com/fabriziomello" target="_blank">http://github.com/fabriziomello</a></div></div> 

Re: Sequence Access Method WIP

From
David Steele
Date:
Hi Petr,

On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:

> fabrizio@bagual:~/pgsql
> $ bin/pg_dump > /tmp/fabrizio.sql
> pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
> does not exist
> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>                 ^
> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
> increment_by, CASE WHEN increment_by > 0 AND max_value =
> 9223372036854775807 THEN NULL      WHEN increment_by < 0 AND max_value =
> -1 THEN NULL      ELSE max_value END AS max_value, CASE WHEN
> increment_by > 0 AND min_value = 1 THEN NULL      WHEN increment_by < 0
> AND min_value = -9223372036854775807 THEN NULL      ELSE min_value END
> AS min_value, cache_value, is_cycled FROM x

It looks like a new patch is needed.  I've marked this "waiting on author".

Thanks,
-- 
-David
david@pgmasters.net



Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:


On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net> wrote:
>
> Hi Petr,
>
> On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>
>> fabrizio@bagual:~/pgsql
>> $ bin/pg_dump > /tmp/fabrizio.sql
>> pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
>> does not exist
>> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>>                 ^
>> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>> increment_by, CASE WHEN increment_by > 0 AND max_value =
>> 9223372036854775807 THEN NULL      WHEN increment_by < 0 AND max_value =
>> -1 THEN NULL      ELSE max_value END AS max_value, CASE WHEN
>> increment_by > 0 AND min_value = 1 THEN NULL      WHEN increment_by < 0
>> AND min_value = -9223372036854775807 THEN NULL      ELSE min_value END
>> AS min_value, cache_value, is_cycled FROM x
>
>
> It looks like a new patch is needed.  I've marked this "waiting on author".
>

The attached patches fix the issues previous pointed by me.

But there are other issue in the gapless-seq extension when I ran "make check":

 43   CREATE EXTENSION gapless_seq;
 44   CREATE SEQUENCE test_gapless USING gapless;
 45   SELECT nextval('test_gapless'::regclass);
 46 ! ERROR:  could not access status of transaction 1275068416
 47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or directory.
 48   BEGIN;
 49     SELECT nextval('test_gapless'::regclass);
 50 ! ERROR:  could not access status of transaction 1275068416
 51 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or directory.
 52     SELECT nextval('test_gapless'::regclass);
 53 ! ERROR:  current transaction is aborted, commands ignored until end of transaction block
 54     SELECT nextval('test_gapless'::regclass);
 55 ! ERROR:  current transaction is aborted, commands ignored until end of transaction block
 56   ROLLBACK;
 57   SELECT nextval('test_gapless'::regclass);
 58 ! ERROR:  could not access status of transaction 1275068416
 59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or directory.


And I see the same running manually:

fabrizio=# create extension gapless_seq;
CREATE EXTENSION
fabrizio=# create sequence x using gapless;
CREATE SEQUENCE
fabrizio=# select nextval('x');
ERROR:  could not access status of transaction 1258291200
DETAIL:  Could not open file "pg_subtrans/4B00": No such file or directory.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Attachment

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>
>
> On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>  >
>  > Hi Petr,
>  >
>  > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>  >
>  >> fabrizio@bagual:~/pgsql
>  >> $ bin/pg_dump > /tmp/fabrizio.sql
>  >> pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
>  >> does not exist
>  >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>  >>                 ^
>  >> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>  >> increment_by, CASE WHEN increment_by > 0 AND max_value =
>  >> 9223372036854775807 THEN NULL      WHEN increment_by < 0 AND max_value =
>  >> -1 THEN NULL      ELSE max_value END AS max_value, CASE WHEN
>  >> increment_by > 0 AND min_value = 1 THEN NULL      WHEN increment_by < 0
>  >> AND min_value = -9223372036854775807 THEN NULL      ELSE min_value END
>  >> AS min_value, cache_value, is_cycled FROM x
>  >
>  >
>  > It looks like a new patch is needed.  I've marked this "waiting on
> author".
>  >
>

Yeah there were some incompatible commits since my last rebase, fixed,
along with the pg_dump bugs..

>
> But there are other issue in the gapless-seq extension when I ran "make
> check":
>
>   43   CREATE EXTENSION gapless_seq;
>   44   CREATE SEQUENCE test_gapless USING gapless;
>   45   SELECT nextval('test_gapless'::regclass);
>   46 ! ERROR:  could not access status of transaction 1275068416
>   47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
> directory.
>   48   BEGIN;
>   49     SELECT nextval('test_gapless'::regclass);
>   50 ! ERROR:  could not access status of transaction 1275068416
>   51 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
> directory.
>   52     SELECT nextval('test_gapless'::regclass);
>   53 ! ERROR:  current transaction is aborted, commands ignored until
> end of transaction block
>   54     SELECT nextval('test_gapless'::regclass);
>   55 ! ERROR:  current transaction is aborted, commands ignored until
> end of transaction block
>   56   ROLLBACK;
>   57   SELECT nextval('test_gapless'::regclass);
>   58 ! ERROR:  could not access status of transaction 1275068416
>   59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
> directory.
>
>
> And I see the same running manually:
>
> fabrizio=# create extension gapless_seq;
> CREATE EXTENSION
> fabrizio=# create sequence x using gapless;
> CREATE SEQUENCE
> fabrizio=# select nextval('x');
> ERROR:  could not access status of transaction 1258291200
> DETAIL:  Could not open file "pg_subtrans/4B00": No such file or directory.
>
> Regards,
>

Hmm I am unable to reproduce this. What OS? Any special configure flags
you use?


--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:
<div dir="ltr"><div class="gmail_extra"><br />On Tue, Mar 29, 2016 at 2:26 PM, Petr Jelinek <<a
href="mailto:petr@2ndquadrant.com">petr@2ndquadrant.com</a>>wrote:<br />><br />> On 29/03/16 18:50, Fabrízio
deRoyes Mello wrote:<br />>><br />>><br />>><br />>> On Tue, Mar 29, 2016 at 12:25 PM, David
Steele<<a href="mailto:david@pgmasters.net">david@pgmasters.net</a><br />>> <mailto:<a
href="mailto:david@pgmasters.net">david@pgmasters.net</a>>>wrote:<br />>>  ><br />>>  > Hi
Petr,<br/>>>  ><br />>>  > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:<br />>>  ><br
/>>> >> fabrizio@bagual:~/pgsql<br />>>  >> $ bin/pg_dump > /tmp/fabrizio.sql<br />>>
 >>pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"<br />>>  >> does not
exist<br/>>>  >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...<br />>>
 >>                ^<br />>>  >> pg_dump: [archiver (db)] query was: SELECT sequence_name,
start_value,<br/>>>  >> increment_by, CASE WHEN increment_by > 0 AND max_value =<br />>>  >>
9223372036854775807THEN NULL      WHEN increment_by < 0 AND max_value =<br />>>  >> -1 THEN NULL    
 ELSEmax_value END AS max_value, CASE WHEN<br />>>  >> increment_by > 0 AND min_value = 1 THEN NULL    
 WHENincrement_by < 0<br />>>  >> AND min_value = -9223372036854775807 THEN NULL      ELSE min_value
END<br/>>>  >> AS min_value, cache_value, is_cycled FROM x<br />>>  ><br />>>  ><br
/>>> > It looks like a new patch is needed.  I've marked this "waiting on<br />>> author".<br />>>
 ><br/>>><br />><br />> Yeah there were some incompatible commits since my last rebase, fixed, along
withthe pg_dump bugs..<br />><br /><br /></div><div class="gmail_extra">Now all applies without errors, build and
"makecheck" too.<br /></div><div class="gmail_extra"><br /><br /><br />>> But there are other issue in the
gapless-seqextension when I ran "make<br />>> check":<br />>><br />>>   43   CREATE EXTENSION
gapless_seq;<br/>>>   44   CREATE SEQUENCE test_gapless USING gapless;<br />>>   45   SELECT
nextval('test_gapless'::regclass);<br/>>>   46 ! ERROR:  could not access status of transaction 1275068416<br
/>>>  47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or<br />>> directory.<br
/>>>  48   BEGIN;<br />>>   49     SELECT nextval('test_gapless'::regclass);<br />>>   50 ! ERROR:
 couldnot access status of transaction 1275068416<br />>>   51 ! DETAIL:  Could not open file "pg_subtrans/4C00":
Nosuch file or<br />>> directory.<br />>>   52     SELECT nextval('test_gapless'::regclass);<br />>>
 53 ! ERROR:  current transaction is aborted, commands ignored until<br />>> end of transaction block<br
/>>>  54     SELECT nextval('test_gapless'::regclass);<br />>>   55 ! ERROR:  current transaction is
aborted,commands ignored until<br />>> end of transaction block<br />>>   56   ROLLBACK;<br />>>   57
 SELECT nextval('test_gapless'::regclass);<br />>>   58 ! ERROR:  could not access status of transaction
1275068416<br/>>>   59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or<br />>>
directory.<br/>>><br />>><br />>> And I see the same running manually:<br />>><br />>>
fabrizio=#create extension gapless_seq;<br />>> CREATE EXTENSION<br />>> fabrizio=# create sequence x using
gapless;<br/>>> CREATE SEQUENCE<br />>> fabrizio=# select nextval('x');<br />>> ERROR:  could not
accessstatus of transaction 1258291200<br />>> DETAIL:  Could not open file "pg_subtrans/4B00": No such file or
directory.<br/>>><br />>> Regards,<br />>><br />><br />> Hmm I am unable to reproduce this.
WhatOS? Any special configure flags you use?<br />><br /><br /></div><div class="gmail_extra">In my environment the
errorremains with your last patches. <br /><br />I didn't use any special. <br /><br />./configure
--prefix=/home/fabrizio/pgsql--enable-cassert --enable-coverage --enable-tap-tests --enable-depend <br /></div><div
class="gmail_extra">make-s -j8 install<br /></div><div class="gmail_extra">make check-world<br /></div><div
class="gmail_extra"><br/><br /></div><div class="gmail_extra">My environment:<br /></div><div class="gmail_extra"><br
/>fabrizio@bagual:/d/postgresql(0002-gapless-seq-petr) <br />$ uname -a<br />Linux bagual 3.13.0-83-generic #127-Ubuntu
SMPFri Mar 11 00:25:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux<br /><br />fabrizio@bagual:/d/postgresql
(0002-gapless-seq-petr)<br />$ gcc --version<br />gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4<br />Copyright (C) 2013
FreeSoftware Foundation, Inc.<br />This is free software; see the source for copying conditions.  There is NO<br
/>warranty;not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br /><br /><br /></div><div
class="gmail_extra">Regards,<br/></div><div class="gmail_extra"><br />--<br />Fabrízio de Royes Mello<br
/>Consultoria/CoachingPostgreSQL<br />>> Timbira: <a
href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog: <a
href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/>>> Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a
href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/>>> Github: <a
href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div></div>

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>
>  >
>  > Hmm I am unable to reproduce this. What OS? Any special configure
> flags you use?
>  >
>
> In my environment the error remains with your last patches.
>
> I didn't use any special.
>
> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
> --enable-coverage --enable-tap-tests --enable-depend
> make -s -j8 install
> make check-world
>
>
> My environment:
>
> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> $ uname -a
> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> $ gcc --version
> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>

Hmm nothing special indeed, still can't reproduce, I did one blind try
for a fix. Can you test with attached?

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:


On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>
>>
>>  >
>>  > Hmm I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>>  >
>>
>> In my environment the error remains with your last patches.
>>
>> I didn't use any special.
>>
>> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> --enable-coverage --enable-tap-tests --enable-depend
>> make -s -j8 install
>> make check-world
>>
>>
>> My environment:
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ uname -a
>> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ gcc --version
>> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>
> Hmm nothing special indeed, still can't reproduce, I did one blind try for a fix. Can you test with attached?
>

Same error... Now I'll leave in a trip but when I arrive I'll try to figure out what happening debugging the code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>
>
> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>  >
>  > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>  >>
>  >>
>  >>  >
>  >>  > Hmm I am unable to reproduce this. What OS? Any special configure
>  >> flags you use?
>  >>  >
>  >>
>  >> In my environment the error remains with your last patches.
>  >>
>  >> I didn't use any special.
>  >>
>  >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>  >> --enable-coverage --enable-tap-tests --enable-depend
>  >> make -s -j8 install
>  >> make check-world
>  >>
>  >>
>  >> My environment:
>  >>
>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>  >> $ uname -a
>  >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>  >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>  >>
>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>  >> $ gcc --version
>  >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>  >> Copyright (C) 2013 Free Software Foundation, Inc.
>  >> This is free software; see the source for copying conditions.  There
> is NO
>  >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
>  >>
>  >
>  > Hmm nothing special indeed, still can't reproduce, I did one blind
> try for a fix. Can you test with attached?
>  >
>
> Same error... Now I'll leave in a trip but when I arrive I'll try to
> figure out what happening debugging the code.
>

Okay, thanks.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:


On Tue, Mar 29, 2016 at 5:58 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com
>> <mailto:petr@2ndquadrant.com>> wrote:
>>  >
>>  > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>  >>
>>  >>
>>  >>  >
>>  >>  > Hmm I am unable to reproduce this. What OS? Any special configure
>>  >> flags you use?
>>  >>  >
>>  >>
>>  >> In my environment the error remains with your last patches.
>>  >>
>>  >> I didn't use any special.
>>  >>
>>  >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>>  >> --enable-coverage --enable-tap-tests --enable-depend
>>  >> make -s -j8 install
>>  >> make check-world
>>  >>
>>  >>
>>  >> My environment:
>>  >>
>>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>>  >> $ uname -a
>>  >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>>  >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>  >>
>>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>>  >> $ gcc --version
>>  >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>>  >> Copyright (C) 2013 Free Software Foundation, Inc.
>>  >> This is free software; see the source for copying conditions.  There
>> is NO
>>  >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.
>>  >>
>>  >
>>  > Hmm nothing special indeed, still can't reproduce, I did one blind
>> try for a fix. Can you test with attached?
>>  >
>>
>> Same error... Now I'll leave in a trip but when I arrive I'll try to
>> figure out what happening debugging the code.
>>
>
> Okay, thanks.
>

Hi,

After some debugging seems that the "seqstate->xid" in "wait_for_sequence" isn't properly initialized or initialized with the wrong value (1258291200).

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Sequence Access Method WIP

From
Jose Luis Tallon
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
Needs updating code copyright years ... or is this really from 2013? [ contrib/gapless_seq/gapless_seq.c ]
Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
It does compile cleanly.

DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and VALUES_TABLE_NAME
("seqam_gapless_values")strikes me a bit: while I understand the creation of a private schema named after the
extension,it seems overkill for just a single table.
 
I would suggest to devote some reserved schema name for internal implementation details and/or AM implementation
details,if deemed reasonable.
 
On the other hand, creating the schema/table upon extension installation makes the values table use the default
tablespacefor the database, which can be good for concurrency --- direct writes to less loaded storage  (Note that
usersmay want to move this table into an SSD-backed tablespace or equivalently fast storage ... moreso when many --not
justone-- gapless sequences are being used)
 

Yet, there is <20141203102425.GT2456@alap3.anarazel.de> where Andres argues against anything different than
one-page-per-sequenceimplementations.  I guess this patch changes everything in this respect.
 

CODE seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,                               bool
restart_requested,bool is_init) -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid
readingas "is_initIALIZED"
 

DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't apply cleanly to current git master.
Please update/rebase the patch and resubmit.

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Hi,

Thanks for review.

On 30/03/16 15:22, Jose Luis Tallon wrote:
> [Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
> Needs updating code copyright years ... or is this really from 2013? [ contrib/gapless_seq/gapless_seq.c ]
> Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
>

Ouch good point, it does show how long the whole sequence am thing has 
been around.

> DESIGN
> The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and VALUES_TABLE_NAME
("seqam_gapless_values")strikes me a bit: while I understand the creation of a private schema named after the
extension,it seems overkill for just a single table.
 
> I would suggest to devote some reserved schema name for internal implementation details and/or AM implementation
details,if deemed reasonable.
 
> On the other hand, creating the schema/table upon extension installation makes the values table use the default
tablespacefor the database, which can be good for concurrency --- direct writes to less loaded storage
 
>     (Note that users may want to move this table into an SSD-backed tablespace or equivalently fast storage ...
moresowhen many --not just one-- gapless sequences are being used)
 
>

Schema is needed for the handler function as well. In general I don't 
want to add another internal schema that will be empty when no sequence 
AM is installed.

> Yet, there is <20141203102425.GT2456@alap3.anarazel.de> where Andres argues against anything different than
one-page-per-sequenceimplementations.
 
>     I guess this patch changes everything in this respect.

Not really, gapless just needs table for transactionality and as such is 
an exception. It does not change how the nontransactional sequence 
storage works though. I agree with Andres on this one.

> CODE
>    seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
>                                  bool restart_requested, bool is_init)
>    -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid reading as "is_initIALIZED"

Sounds good.

> DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't apply cleanly to current git master.
> Please update/rebase the patch and resubmit.
>

The current version of seqam is 0001-seqam-2016-03-29 which should apply 
correctly.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
Hi,

new version attached that should fix the issue. It was alignment -
honestly don't know what I was thinking using fixed alignment when the
AMs can define their own type.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:


On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> new version attached that should fix the issue. It was alignment - honestly don't know what I was thinking using fixed alignment when the AMs can define their own type.
>

Yeah... now all works fine... I had a minor issue when apply to the current master but the attached fix it and I also added tab-complete support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.

I ran all the regression tests and all passed.

I have just one question about de 0002 patch:
- There are some reason to not use TransactionId instead of uint32 in GaplessSequenceState struct?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Attachment

Re: Sequence Access Method WIP

From
Petr Jelinek
Date:
On 04/04/16 15:53, Fabrízio de Royes Mello wrote:
>
>
> On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>  >
>  > Hi,
>  >
>  > new version attached that should fix the issue. It was alignment -
> honestly don't know what I was thinking using fixed alignment when the
> AMs can define their own type.
>  >
>
> Yeah... now all works fine... I had a minor issue when apply to the
> current master but the attached fix it and I also added tab-complete
> support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.
>

Cool thanks.

> I ran all the regression tests and all passed.
>
> I have just one question about de 0002 patch:
> - There are some reason to not use TransactionId instead of uint32 in
> GaplessSequenceState struct?
>

I missed we have that :)

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Sequence Access Method WIP

From
Fabrízio de Royes Mello
Date:


On Tue, Apr 5, 2016 at 12:52 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> I have just one question about de 0002 patch:
>> - There are some reason to not use TransactionId instead of uint32 in
>> GaplessSequenceState struct?
>>
>
> I missed we have that :)
>

Attached fix it and also I changed to use TransactionIdEquals instead of compare xids directly.

- if (seqstate->xid != local_xid)
+ if (!TransactionIdEquals(seqstate->xid, local_xid))

IMHO this patch is Ready for Commiter. Changed status in CommitfestApp.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Attachment