Thread: Sequence Access Method WIP

Sequence Access Method WIP

From

Simon Riggs

Date:

16 January 2013, 02:40:40

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
  CREATE SEQUENCE foo_seq
      USING globalseq WITH (custom_option=setting);
which allows you to specify an alternate Sequence Access Method for a
sequence object,
or you can specify a default_sequence_mgr as a USERSET parameter
  SET default_sequence_mgr = globalseq;

Existing sequences can be modified to use a different SeqAM, by calling
  ALTER SEQUENCE foo_seq USING globalseq;

SeqAM is similar to IndexAM: There is a separate catalog table for
SeqAMs, but no specific API to create them. Initdb creates one
sequence am, called "local", which is the initial default. If
default_sequence_mgr is set to '' or 'local' then we use the local
seqam. The local seqam's functions are included in core.

Status is still "Work In Progress". Having said that most of the grunt
work is done and if we agree the shape of this is right, its
relatively easy going code.

postgres=# select oid, * from pg_seqam;
-[ RECORD 1 ]+--------------------
oid          | 3839
seqamname    | local
seqamalloc   | seqam_local_alloc
seqamsetval  | seqam_local_setval
seqamoptions | seqam_local_options

postgres=# select relname, relam from pg_class where relname = 'foo2';
 relname | relam
---------+-------
 foo2    |  3839

postgres=# create sequence foo5 using global;
ERROR:  access method "global" does not exist

Footprint
 backend/access/Makefile            |    2
 backend/access/common/reloptions.c |   26 +++
 backend/access/sequence/Makefile   |   17 ++
 backend/access/sequence/seqam.c    |  278 +++++++++++++++++++++++++++++++++++++
 backend/catalog/Makefile           |    2
 backend/commands/sequence.c        |  132 +++++++++++++++--
 backend/commands/tablecmds.c       |    3
 backend/nodes/copyfuncs.c          |    4
 backend/nodes/equalfuncs.c         |    4
 backend/parser/gram.y              |   84 ++++++++++-
 backend/parser/parse_utilcmd.c     |    4
 backend/utils/cache/catcache.c     |    6
 backend/utils/cache/syscache.c     |   23 +++
 backend/utils/misc/guc.c           |   12 +
 include/access/reloptions.h        |    6
 include/access/seqam.h             |   27 +++
 include/catalog/indexing.h         |    5
 include/catalog/pg_proc.h          |    6
 include/catalog/pg_seqam.h         |   70 +++++++++
 include/nodes/parsenodes.h         |    8 -
 include/utils/guc.h                |    1
 include/utils/rel.h                |   22 +-
 include/utils/syscache.h           |    2
 23 files changed, 706 insertions(+), 38 deletions(-)

Tasks to complete
* contrib module for example/testing
* Docs

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

seqam.v5.patch

Re: Sequence Access Method WIP

From

Simon Riggs

Date:

14 November 2013, 20:10:31

On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:

> SeqAm allows you to specify a plugin that alters the behaviour for
> sequence allocation and resetting, aimed specifically at clustering
> systems.
>
> New command options on end of statement allow syntax
>   CREATE SEQUENCE foo_seq
>       USING globalseq

Production version of this, ready for commit to PG 9.4

Includes test extension which allows sequences without gaps - "gapless".

Test using seq_test.psql after creating extension.

No dependencies on other patches.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Hi,

I rewrote the patch with different API along the lines of what was
discussed.

The API now consists of following functions:
sequence_alloc - allocating range of new values
The function receives the sequence relation, current value, number of
requested values amdata and relevant sequence options like min/max and
returns new amdata, new current value, number of values allocated and
also if it needs wal write (that should be returned if amdata has
changed plus other reasons the AM might have to force the wal update).

sequence_setval - notification that setval is happening
This function gets sequence relation, previous value and new value plus
the amdata and returns amdata (I can imagine some complex sequence AMs
will want to throw error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing
Basically AM can call the sequence_request_update and backend will then
call the sequence_update method of an AM with current amdata and will
write the updated amdata to disk

sequence_seqparams - function to process/validate the standard sequence
options like start position, min/max, increment by etc by the AM, it's
called in addition to the standard processing

sequence_reloptions - this is the only thing that remained unchanged
from previous patch, it's meant to pass custom options to the AM

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.

For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it used to
be. There are probably some optimizations possible there but I think the
patch is good enough for the review now, so I am adding it to October
commitfest.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

seqamv3.patch

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

04 November 2014, 12:11:47

On 10/13/2014 01:01 PM, Petr Jelinek wrote:
> Hi,
>
> I rewrote the patch with different API along the lines of what was
> discussed.

Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to 
see how it really works. The "local" one is too dummy to expose any 
possible issues.

> The API now consists of following functions:
> sequence_alloc - allocating range of new values
> The function receives the sequence relation, current value, number of
> requested values amdata and relevant sequence options like min/max and
> returns new amdata, new current value, number of values allocated and
> also if it needs wal write (that should be returned if amdata has
> changed plus other reasons the AM might have to force the wal update).
>
> sequence_setval - notification that setval is happening
> This function gets sequence relation, previous value and new value plus
> the amdata and returns amdata (I can imagine some complex sequence AMs
> will want to throw error that setval can't be done on them).
>
> sequence_request_update/sequence_update - used for background processing
> Basically AM can call the sequence_request_update and backend will then
> call the sequence_update method of an AM with current amdata and will
> write the updated amdata to disk
>
> sequence_seqparams - function to process/validate the standard sequence
> options like start position, min/max, increment by etc by the AM, it's
> called in addition to the standard processing
>
> sequence_reloptions - this is the only thing that remained unchanged
> from previous patch, it's meant to pass custom options to the AM
>
> Only the alloc and reloptions methods are required (and implemented by
> the local AM).
>
> The caching, xlog writing, updating the page, etc is handled by backend,
> the AM does not see the tuple at all. I decided to not pass even the
> struct around and just pass the relevant options because I think if we
> want to abstract the storage properly then the AM should not care about
> how the pg_sequence looks like at all, even if it means that the
> sequence_alloc parameter list is bit long.

Hmm. The division of labour between the seqam and commands/sequence.c 
still feels a bit funny. sequence.c keeps track of how many values have 
been WAL-logged, and thus usable immediately, but we still call 
sequence_alloc even when using up those already WAL-logged values. If 
you think of using this for something like a centralized sequence server 
in a replication cluster, you certainly don't want to make a call to the 
remote server for every value - you'll want to cache them.

With the "local" seqam, there are two levels of caching. Each backend 
caches some values (per the CACHE <value> option in CREATE SEQUENCE). In 
addition to that, the server WAL-logs 32 values at a time. If you have a 
remote seqam, it would most likely add a third cache, but it would 
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The 
server keeps the sequence page locked throughout nextval(). But if the 
actual state of the sequence is maintained elsewhere, there's no need to 
serialize the calls to the remote allocator, i.e. the sequence_alloc() 
calls.

I'm not exactly sure what to do about that. One option is to completely 
move the maintenance of the "current" value, i.e. sequence.last_value, 
to the seqam. That makes sense from an abstraction point of view. For 
example with a remote server managing the sequence, storing the 
"current" value in the local catalog table makes no sense as it's always 
going to be out-of-date. The local seqam would store it as part of the 
am-private data. However, you would need to move the responsibility of 
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to 
provide an API that the seqam can call to do that. Perhaps just let the 
seqam call heap_inplace_update on the sequence relation.

> For the amdata handling (which is the AM's private data variable) the
> API assumes that (Datum) 0 is NULL, this seems to work well for
> reloptions so should work here also and it simplifies things a little
> compared to passing pointers to pointers around and making sure
> everything is allocated, etc.
>
> Sadly the fact that amdata is not fixed size and can be NULL made the
> page updates of the sequence relation quite more complex that it used to
> be.

It would be nice if the seqam could define exactly the columns it needs, 
with any datatypes. There would be a set of common attributes: 
sequence_name, start_value, cache_value, increment_by, max_value, 
min_value, is_cycled. The local seqam would add "last_value", "log_cnt" 
and "is_called" to that. A remote seqam that calls out to some other 
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the 
required columns, for example.

- Heikki

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

04 November 2014, 21:02:03

On 04/11/14 13:11, Heikki Linnakangas wrote:
> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>> Hi,
>>
>> I rewrote the patch with different API along the lines of what was
>> discussed.
>
> Thanks, that's better.
>
> It would be good to see an alternative seqam to implement this API, to
> see how it really works. The "local" one is too dummy to expose any
> possible issues.
>

Yeah, I don't know what that would be, It's hard for me to find 
something that would sever purely demonstration purposes. I guess I 
could port BDR sequences to this if it would help (once we have bit more 
solid agreement that the proposed API at least theoretically seems ok so 
that I don't have to rewrite it 10 times if at all possible).

>> ...
>> Only the alloc and reloptions methods are required (and implemented by
>> the local AM).
>>
>> The caching, xlog writing, updating the page, etc is handled by backend,
>> the AM does not see the tuple at all. I decided to not pass even the
>> struct around and just pass the relevant options because I think if we
>> want to abstract the storage properly then the AM should not care about
>> how the pg_sequence looks like at all, even if it means that the
>> sequence_alloc parameter list is bit long.
>
> Hmm. The division of labour between the seqam and commands/sequence.c
> still feels a bit funny. sequence.c keeps track of how many values have
> been WAL-logged, and thus usable immediately, but we still call
> sequence_alloc even when using up those already WAL-logged values. If
> you think of using this for something like a centralized sequence server
> in a replication cluster, you certainly don't want to make a call to the
> remote server for every value - you'll want to cache them.
>
> With the "local" seqam, there are two levels of caching. Each backend
> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
> addition to that, the server WAL-logs 32 values at a time. If you have a
> remote seqam, it would most likely add a third cache, but it would
> interact in strange ways with the second cache.
>
> Considering a non-local seqam, the locking is also a bit strange. The
> server keeps the sequence page locked throughout nextval(). But if the
> actual state of the sequence is maintained elsewhere, there's no need to
> serialize the calls to the remote allocator, i.e. the sequence_alloc()
> calls.
>
> I'm not exactly sure what to do about that. One option is to completely
> move the maintenance of the "current" value, i.e. sequence.last_value,
> to the seqam. That makes sense from an abstraction point of view. For
> example with a remote server managing the sequence, storing the
> "current" value in the local catalog table makes no sense as it's always
> going to be out-of-date. The local seqam would store it as part of the
> am-private data. However, you would need to move the responsibility of
> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
> provide an API that the seqam can call to do that. Perhaps just let the
> seqam call heap_inplace_update on the sequence relation.
>

My idea of how this works is - sequence_next handles the allocation and 
the level2 caching (WAL logged cache) via amdata if it supports it, or 
returns single value if it doesn't - then the WAL will always just write 
the 1 value and there will basically be no level2 cache, since it is the 
sequence_next who controls how much will be WAL-logged, what backend 
asks for is just "suggestion".

The third level caching as you say should be solved by the 
sequence_request_update and sequence_update callback - that will enable 
the sequence AM to handle this kind of caching asynchronously without 
blocking the sequence_next unnecessarily.

That way it's possible to implement many different strategies, from no 
cache at all, to three levels of caching, with and without blocking of 
calls to sequence_next.

>> For the amdata handling (which is the AM's private data variable) the
>> API assumes that (Datum) 0 is NULL, this seems to work well for
>> reloptions so should work here also and it simplifies things a little
>> compared to passing pointers to pointers around and making sure
>> everything is allocated, etc.
>>
>> Sadly the fact that amdata is not fixed size and can be NULL made the
>> page updates of the sequence relation quite more complex that it used to
>> be.
>
> It would be nice if the seqam could define exactly the columns it needs,
> with any datatypes. There would be a set of common attributes:
> sequence_name, start_value, cache_value, increment_by, max_value,
> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
> and "is_called" to that. A remote seqam that calls out to some other
> server might store the remote server's hostname etc.
>
> There could be a seqam function that returns a TupleDesc with the
> required columns, for example.
>

Wouldn't that somewhat bloat catalog if we had new catalog table for 
each sequence AM? It also does not really solve the amdata being dynamic 
size "issue". I think just dynamic field where AM stores whatever it 
wants is enough (ie the current solution), it's just that the hackery 
that sequences storage implementation does is more visible once there 
are non-fixed fields.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

05 November 2014, 12:45:55

On 11/04/2014 11:01 PM, Petr Jelinek wrote:
> On 04/11/14 13:11, Heikki Linnakangas wrote:
>> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>>> Only the alloc and reloptions methods are required (and implemented by
>>> the local AM).
>>>
>>> The caching, xlog writing, updating the page, etc is handled by backend,
>>> the AM does not see the tuple at all. I decided to not pass even the
>>> struct around and just pass the relevant options because I think if we
>>> want to abstract the storage properly then the AM should not care about
>>> how the pg_sequence looks like at all, even if it means that the
>>> sequence_alloc parameter list is bit long.
>>
>> Hmm. The division of labour between the seqam and commands/sequence.c
>> still feels a bit funny. sequence.c keeps track of how many values have
>> been WAL-logged, and thus usable immediately, but we still call
>> sequence_alloc even when using up those already WAL-logged values. If
>> you think of using this for something like a centralized sequence server
>> in a replication cluster, you certainly don't want to make a call to the
>> remote server for every value - you'll want to cache them.
>>
>> With the "local" seqam, there are two levels of caching. Each backend
>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
>> addition to that, the server WAL-logs 32 values at a time. If you have a
>> remote seqam, it would most likely add a third cache, but it would
>> interact in strange ways with the second cache.
>>
>> Considering a non-local seqam, the locking is also a bit strange. The
>> server keeps the sequence page locked throughout nextval(). But if the
>> actual state of the sequence is maintained elsewhere, there's no need to
>> serialize the calls to the remote allocator, i.e. the sequence_alloc()
>> calls.
>>
>> I'm not exactly sure what to do about that. One option is to completely
>> move the maintenance of the "current" value, i.e. sequence.last_value,
>> to the seqam. That makes sense from an abstraction point of view. For
>> example with a remote server managing the sequence, storing the
>> "current" value in the local catalog table makes no sense as it's always
>> going to be out-of-date. The local seqam would store it as part of the
>> am-private data. However, you would need to move the responsibility of
>> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
>> provide an API that the seqam can call to do that. Perhaps just let the
>> seqam call heap_inplace_update on the sequence relation.
>
> My idea of how this works is - sequence_next handles the allocation and
> the level2 caching (WAL logged cache) via amdata if it supports it, or
> returns single value if it doesn't - then the WAL will always just write
> the 1 value and there will basically be no level2 cache, since it is the
> sequence_next who controls how much will be WAL-logged, what backend
> asks for is just "suggestion".

Hmm, so the AM might return an "nallocated" value less than the "fetch" 
value that sequence.c requested? As the patch stands, wouldn't that make 
sequence.c write a WAL record more often?

In fact, if the seqam manages the current value outside the database 
(e.g. a "remote" seqam that gets the value from another server), 
nextval() never needs to write a WAL record.

> The third level caching as you say should be solved by the
> sequence_request_update and sequence_update callback - that will enable
> the sequence AM to handle this kind of caching asynchronously without
> blocking the sequence_next unnecessarily.
>
> That way it's possible to implement many different strategies, from no
> cache at all, to three levels of caching, with and without blocking of
> calls to sequence_next.

It's nice that asynchronous operation is possible, but that seems like a 
more advanced feature. Surely an approach where you fetch a bunch of 
values when needed is going to be more common. Let's focus on the simple 
synchronous case first, and make sure the API works well for that.

>>> For the amdata handling (which is the AM's private data variable) the
>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>> reloptions so should work here also and it simplifies things a little
>>> compared to passing pointers to pointers around and making sure
>>> everything is allocated, etc.
>>>
>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>> page updates of the sequence relation quite more complex that it used to
>>> be.
>>
>> It would be nice if the seqam could define exactly the columns it needs,
>> with any datatypes. There would be a set of common attributes:
>> sequence_name, start_value, cache_value, increment_by, max_value,
>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>> and "is_called" to that. A remote seqam that calls out to some other
>> server might store the remote server's hostname etc.
>>
>> There could be a seqam function that returns a TupleDesc with the
>> required columns, for example.
>
> Wouldn't that somewhat bloat catalog if we had new catalog table for
> each sequence AM?

No, that's not what I meant. The number of catalog tables would be the 
same as today. Sequences look much like any other relation, with entries 
in pg_attribute catalog table for all the attributes for each sequence. 
Currently, all sequences have the same set of attributes, sequence_name, 
last_value and so forth. What I'm proposing is that there would a set of 
attributes that are common to all sequences, but in addition to that 
there could be any number of AM-specific attributes.

> It also does not really solve the amdata being dynamic
> size "issue".

Yes it would. There would not be a single amdata attribute, but the AM 
could specify any number of custom attributes, which could be fixed size 
or varlen. It would be solely the AM's responsibility to set the values 
of those attributes.

> I think just dynamic field where AM stores whatever it
> wants is enough (ie the current solution), it's just that the hackery
> that sequences storage implementation does is more visible once there
> are non-fixed fields.

Using a single amdata field is like creating a table with a single text 
column, and putting all your data in the single column separated by 
tabs. I'm sure you can make it work, but normalized data is much nicer 
to work with.

- Heikki

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

05 November 2014, 15:07:40

On 05/11/14 13:45, Heikki Linnakangas wrote:
> On 11/04/2014 11:01 PM, Petr Jelinek wrote:
>> On 04/11/14 13:11, Heikki Linnakangas wrote:
>>> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>>>> Only the alloc and reloptions methods are required (and implemented by
>>>> the local AM).
>>>>
>>>> The caching, xlog writing, updating the page, etc is handled by
>>>> backend,
>>>> the AM does not see the tuple at all. I decided to not pass even the
>>>> struct around and just pass the relevant options because I think if we
>>>> want to abstract the storage properly then the AM should not care about
>>>> how the pg_sequence looks like at all, even if it means that the
>>>> sequence_alloc parameter list is bit long.
>>>
>>> Hmm. The division of labour between the seqam and commands/sequence.c
>>> still feels a bit funny. sequence.c keeps track of how many values have
>>> been WAL-logged, and thus usable immediately, but we still call
>>> sequence_alloc even when using up those already WAL-logged values. If
>>> you think of using this for something like a centralized sequence server
>>> in a replication cluster, you certainly don't want to make a call to the
>>> remote server for every value - you'll want to cache them.
>>>
>>> With the "local" seqam, there are two levels of caching. Each backend
>>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
>>> addition to that, the server WAL-logs 32 values at a time. If you have a
>>> remote seqam, it would most likely add a third cache, but it would
>>> interact in strange ways with the second cache.
>>>
>>> Considering a non-local seqam, the locking is also a bit strange. The
>>> server keeps the sequence page locked throughout nextval(). But if the
>>> actual state of the sequence is maintained elsewhere, there's no need to
>>> serialize the calls to the remote allocator, i.e. the sequence_alloc()
>>> calls.
>>>
>>> I'm not exactly sure what to do about that. One option is to completely
>>> move the maintenance of the "current" value, i.e. sequence.last_value,
>>> to the seqam. That makes sense from an abstraction point of view. For
>>> example with a remote server managing the sequence, storing the
>>> "current" value in the local catalog table makes no sense as it's always
>>> going to be out-of-date. The local seqam would store it as part of the
>>> am-private data. However, you would need to move the responsibility of
>>> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
>>> provide an API that the seqam can call to do that. Perhaps just let the
>>> seqam call heap_inplace_update on the sequence relation.
>>
>> My idea of how this works is - sequence_next handles the allocation and
>> the level2 caching (WAL logged cache) via amdata if it supports it, or
>> returns single value if it doesn't - then the WAL will always just write
>> the 1 value and there will basically be no level2 cache, since it is the
>> sequence_next who controls how much will be WAL-logged, what backend
>> asks for is just "suggestion".
>
> Hmm, so the AM might return an "nallocated" value less than the "fetch"
> value that sequence.c requested? As the patch stands, wouldn't that make
> sequence.c write a WAL record more often?
>

That's correct, that's also why you usually want to have some form of 
local caching if possible.

> In fact, if the seqam manages the current value outside the database
> (e.g. a "remote" seqam that gets the value from another server),
> nextval() never needs to write a WAL record.

Sure it does, you need to keep the current state in Postgres also, at 
least the current value so that you can pass correct input to 
sequence_alloc(). And you need to do this in crash-safe way so WAL is 
necessary.

I think sequences will cache in amdata if possible for that type of 
sequence and in case where it's not and it's caching on some external 
server then you'll most likely get bigger overhead from network than WAL 
anyway...

>
>>>> For the amdata handling (which is the AM's private data variable) the
>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>> reloptions so should work here also and it simplifies things a little
>>>> compared to passing pointers to pointers around and making sure
>>>> everything is allocated, etc.
>>>>
>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>> page updates of the sequence relation quite more complex that it
>>>> used to
>>>> be.
>>>
>>> It would be nice if the seqam could define exactly the columns it needs,
>>> with any datatypes. There would be a set of common attributes:
>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>> and "is_called" to that. A remote seqam that calls out to some other
>>> server might store the remote server's hostname etc.
>>>
>>> There could be a seqam function that returns a TupleDesc with the
>>> required columns, for example.
>>
>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>> each sequence AM?
>
> No, that's not what I meant. The number of catalog tables would be the
> same as today. Sequences look much like any other relation, with entries
> in pg_attribute catalog table for all the attributes for each sequence.
> Currently, all sequences have the same set of attributes, sequence_name,
> last_value and so forth. What I'm proposing is that there would a set of
> attributes that are common to all sequences, but in addition to that
> there could be any number of AM-specific attributes.

Oh, that's interesting idea, so the AM interfaces would basically return 
updated tuple and there would be some description function that returns 
tupledesc. I am bit worried that this would kill any possibility of 
ALTER SEQUENCE USING access_method. Plus I don't think it actually 
solves any real problem - serializing the internal C structs into bytea 
is not any harder than serializing them into tuple IMHO.

>
>> It also does not really solve the amdata being dynamic
>> size "issue".
>
> Yes it would. There would not be a single amdata attribute, but the AM
> could specify any number of custom attributes, which could be fixed size
> or varlen. It would be solely the AM's responsibility to set the values
> of those attributes.
>

That's not the issue I was referring to, I was talking about the page 
replacement code which is not as simple now that we have potentially 
dynamic size tuple and if tuples were different for different AMs the 
code would still have to be able to handle that case. Setting the values 
in tuple itself is not too complicated.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

05 November 2014, 17:32:47

On 11/05/2014 05:07 PM, Petr Jelinek wrote:
> On 05/11/14 13:45, Heikki Linnakangas wrote:
>> In fact, if the seqam manages the current value outside the database
>> (e.g. a "remote" seqam that gets the value from another server),
>> nextval() never needs to write a WAL record.
>
> Sure it does, you need to keep the current state in Postgres also, at
> least the current value so that you can pass correct input to
> sequence_alloc(). And you need to do this in crash-safe way so WAL is
> necessary.

Why does sequence_alloc need the current value? If it's a "remote" 
seqam, the current value is kept in the remote server, and the last 
value that was given to this PostgreSQL server is irrelevant.

That irks me with this API. The method for acquiring a new value isn't 
fully abstracted behind the AM interface, as sequence.c still needs to 
track it itself. That's useful for the local AM, of course, and maybe 
some others, but for others it's totally useless.

>>>>> For the amdata handling (which is the AM's private data variable) the
>>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>>> reloptions so should work here also and it simplifies things a little
>>>>> compared to passing pointers to pointers around and making sure
>>>>> everything is allocated, etc.
>>>>>
>>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>>> page updates of the sequence relation quite more complex that it
>>>>> used to
>>>>> be.
>>>>
>>>> It would be nice if the seqam could define exactly the columns it needs,
>>>> with any datatypes. There would be a set of common attributes:
>>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>>> and "is_called" to that. A remote seqam that calls out to some other
>>>> server might store the remote server's hostname etc.
>>>>
>>>> There could be a seqam function that returns a TupleDesc with the
>>>> required columns, for example.
>>>
>>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>>> each sequence AM?
>>
>> No, that's not what I meant. The number of catalog tables would be the
>> same as today. Sequences look much like any other relation, with entries
>> in pg_attribute catalog table for all the attributes for each sequence.
>> Currently, all sequences have the same set of attributes, sequence_name,
>> last_value and so forth. What I'm proposing is that there would a set of
>> attributes that are common to all sequences, but in addition to that
>> there could be any number of AM-specific attributes.
>
> Oh, that's interesting idea, so the AM interfaces would basically return
> updated tuple and there would be some description function that returns
> tupledesc.

Yeah, something like that.

> I am bit worried that this would kill any possibility of
> ALTER SEQUENCE USING access_method. Plus I don't think it actually
> solves any real problem - serializing the internal C structs into bytea
> is not any harder than serializing them into tuple IMHO.

I agree that serialization to bytea isn't that difficult, but it's still 
nicer to work directly with the correct data types. And it makes the 
internal state easily accessible for monitoring and debugging purposes.

>>> It also does not really solve the amdata being dynamic
>>> size "issue".
>>
>> Yes it would. There would not be a single amdata attribute, but the AM
>> could specify any number of custom attributes, which could be fixed size
>> or varlen. It would be solely the AM's responsibility to set the values
>> of those attributes.
>>
>
> That's not the issue I was referring to, I was talking about the page
> replacement code which is not as simple now that we have potentially
> dynamic size tuple and if tuples were different for different AMs the
> code would still have to be able to handle that case. Setting the values
> in tuple itself is not too complicated.

I don't see the problem with that. We deal with variable-sized tuples in 
heap pages all the time. The max size of amdata (or the extra 
AM-specific columns) is going to be determined by the block size, though.

- Heikki

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

05 November 2014, 22:43:10

On 05/11/14 18:32, Heikki Linnakangas wrote:
> On 11/05/2014 05:07 PM, Petr Jelinek wrote:
>> On 05/11/14 13:45, Heikki Linnakangas wrote:
>>> In fact, if the seqam manages the current value outside the database
>>> (e.g. a "remote" seqam that gets the value from another server),
>>> nextval() never needs to write a WAL record.
>>
>> Sure it does, you need to keep the current state in Postgres also, at
>> least the current value so that you can pass correct input to
>> sequence_alloc(). And you need to do this in crash-safe way so WAL is
>> necessary.
>
> Why does sequence_alloc need the current value? If it's a "remote"
> seqam, the current value is kept in the remote server, and the last
> value that was given to this PostgreSQL server is irrelevant.

Hmm, I am not sure if I see this usecase as practical TBH, but I also 
don't see fundamental problem with it.

> That irks me with this API. The method for acquiring a new value isn't
> fully abstracted behind the AM interface, as sequence.c still needs to
> track it itself. That's useful for the local AM, of course, and maybe
> some others, but for others it's totally useless.

Hmm, I think that kind of abstraction can only be done by passing 
current tuple and returning updated tuple (yes I realize that it's what 
you have been saying basically).

In general it sounds like the level of abstraction you'd want would be 
one where AM cares about everything except the the code that does the 
actual writes to page and WAL (but when to do those would still be 
controlled completely by AM?) and the SQL interface.
I don't see how to make that work with ALTER SEQUENCE USING to be honest 
and I do care quite a lot about that use-case (I think the ability to 
convert the "local" sequences to 3rd party ones and back is very important).

>>
>> That's not the issue I was referring to, I was talking about the page
>> replacement code which is not as simple now that we have potentially
>> dynamic size tuple and if tuples were different for different AMs the
>> code would still have to be able to handle that case. Setting the values
>> in tuple itself is not too complicated.
>
> I don't see the problem with that. We deal with variable-sized tuples in
> heap pages all the time. The max size of amdata (or the extra
> AM-specific columns) is going to be determined by the block size, though.
>

Glad to hear that. Yes the limit is block size, I think we can live with 
that at least for the moment...

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Craig Ringer

Date:

06 November 2014, 10:22:33

On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> I guess I could port BDR sequences to this if it would help (once we
> have bit more solid agreement that the proposed API at least
> theoretically seems ok so that I don't have to rewrite it 10 times if at
> all possible).

Because the BDR sequences rely on all the other BDR machinery I suspect
that'd be a pretty big thing to review and follow for someone who
doesn't know the BDR code.

Do you think it'd be simple to provide a blocking, transactional
sequence allocator via this API? i.e. gapless sequences, much the same
as typically implemented with SELECT ... FOR UPDATE on a counter table.

It might be more digestible standalone, and would be a handy contrib/
example extension demonstrating use of the API.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Sequence Access Method WIP

From

Alvaro Herrera

Date:

07 November 2014, 15:48:38

Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> > I guess I could port BDR sequences to this if it would help (once we
> > have bit more solid agreement that the proposed API at least
> > theoretically seems ok so that I don't have to rewrite it 10 times if at
> > all possible).
> 
> Because the BDR sequences rely on all the other BDR machinery I suspect
> that'd be a pretty big thing to review and follow for someone who
> doesn't know the BDR code.
> 
> Do you think it'd be simple to provide a blocking, transactional
> sequence allocator via this API? i.e. gapless sequences, much the same
> as typically implemented with SELECT ... FOR UPDATE on a counter table.

I think that would be a useful contrib module too; gapless allocation is
a frequent user need.  If we can build it in some reasonable way with the
seqam API, it'd show the API is good.

But on the other hand, since BDR sequences are the ultimate goal of this
patch, we need to make sure that they can be made to work with whatever
the seqam API ends up being.  It's not necessary for reviewers here to
review that code, but the BDR developers need to say "yes, this API
works for us" -- just like the slony developers (well, ssinger) did for
logical decoding.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

07 November 2014, 16:35:23

On 06/11/14 11:22, Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
>> I guess I could port BDR sequences to this if it would help (once we
>> have bit more solid agreement that the proposed API at least
>> theoretically seems ok so that I don't have to rewrite it 10 times if at
>> all possible).
>
> Because the BDR sequences rely on all the other BDR machinery I suspect
> that'd be a pretty big thing to review and follow for someone who
> doesn't know the BDR code.
>
> Do you think it'd be simple to provide a blocking, transactional
> sequence allocator via this API? i.e. gapless sequences, much the same
> as typically implemented with SELECT ... FOR UPDATE on a counter table.
>
> It might be more digestible standalone, and would be a handy contrib/
> example extension demonstrating use of the API.
>

Yes I think that's doable (once we iron out the API we can agree on).


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Robert Haas

Date:

07 November 2014, 23:45:53

On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that
use-case(I think the ability to convert the "local" sequences to 3rd party ones and back is very important). 

What specific problems do you foresee?  There's an issue if something depends on one of the added sequence columns, but
ifthat is the case then you had *better* fail. 

I think that the debugability value of making extra sequence columns human-readable is quite high.

...Robert

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

07 November 2014, 23:57:51

On 08/11/14 00:45, Robert Haas wrote:
> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that
use-case(I think the ability to convert the "local" sequences to 3rd party ones and back is very important).
 
>
> What specific problems do you foresee?  There's an issue if something depends on one of the added sequence columns,
butif that is the case then you had *better* fail.
 
>
> I think that the debugability value of making extra sequence columns human-readable is quite high.
>

My main problem is actually not with having tuple per-seqAM, but more 
with the fact that Heikki does not want to have last_value as compulsory 
column/parameter. How is the new AM then supposed to know where to pick 
up and if it even can pick up?

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

08 November 2014, 00:26:57

On 08/11/14 00:57, Petr Jelinek wrote:
> On 08/11/14 00:45, Robert Haas wrote:
>> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> I don't see how to make that work with ALTER SEQUENCE USING to be
>>> honest and I do care quite a lot about that use-case (I think the
>>> ability to convert the "local" sequences to 3rd party ones and back
>>> is very important).
>>
>> What specific problems do you foresee?  There's an issue if something
>> depends on one of the added sequence columns, but if that is the case
>> then you had *better* fail.
>>
>> I think that the debugability value of making extra sequence columns
>> human-readable is quite high.
>>
>
> My main problem is actually not with having tuple per-seqAM, but more
> with the fact that Heikki does not want to have last_value as compulsory
> column/parameter. How is the new AM then supposed to know where to pick
> up and if it even can pick up?
>

And obviously, once the last_value is part of the compulsory columns we 
again have to WAL log all the time for the use-case which Heikki is 
using as model, so it does not help there (just to clear what my point 
was about).


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Robert Haas

Date:

08 November 2014, 02:10:46

On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> My main problem is actually not with having tuple per-seqAM, but more
>> with the fact that Heikki does not want to have last_value as compulsory
>> column/parameter. How is the new AM then supposed to know where to pick
>> up and if it even can pick up?

That seems pretty well impossible to know anyway.  If the pluggable AM
was handing out values at random or in some unpredictable fashion,
there may be no well-defined point where it's safe for the default AM
to resume.  Granted, in the case of replication, it probably is
possible, and maybe that's important enough to be worth catering to.

> And obviously, once the last_value is part of the compulsory columns we
> again have to WAL log all the time for the use-case which Heikki is using as
> model, so it does not help there (just to clear what my point was about).

But I don't know what to do about that.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Sequence Access Method WIP

From

Craig Ringer

Date:

08 November 2014, 11:15:25

On 11/08/2014 12:35 AM, Petr Jelinek wrote:
>>
>> Do you think it'd be simple to provide a blocking, transactional
>> sequence allocator via this API? i.e. gapless sequences, much the same
>> as typically implemented with SELECT ... FOR UPDATE on a counter table.
>>
>> It might be more digestible standalone, and would be a handy contrib/
>> example extension demonstrating use of the API.
>>
> 
> Yes I think that's doable (once we iron out the API we can agree on).

Cool; I think that'd be an interesting self-contained example that'd
also have real-world uses ... and provide a nice succinct way to answer
those semi-regular "how do I create a sequence that doesn't have holes"
questions.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

08 November 2014, 11:17:40

On 08/11/14 03:10, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> My main problem is actually not with having tuple per-seqAM, but more
>>> with the fact that Heikki does not want to have last_value as compulsory
>>> column/parameter. How is the new AM then supposed to know where to pick
>>> up and if it even can pick up?
>
> That seems pretty well impossible to know anyway.  If the pluggable AM
> was handing out values at random or in some unpredictable fashion,
> there may be no well-defined point where it's safe for the default AM
> to resume.  Granted, in the case of replication, it probably is
> possible, and maybe that's important enough to be worth catering to.

While this is true, I think 99% of this use-case in practice is about 
converting existing schema with "local" sequence to some other sequence 
and perhaps fixing schema after people create sequence using wrong AM 
because their default is not what they thought it is.

>
>> And obviously, once the last_value is part of the compulsory columns we
>> again have to WAL log all the time for the use-case which Heikki is using as
>> model, so it does not help there (just to clear what my point was about).
>
> But I don't know what to do about that.  :-(
>

Me neither and I don't think this is actually solvable, we either have 
last_value and logcnt as mandatory columns and the central sequence 
server example Heikki is talking about will be impacted by this, or we 
leave those columns to AM implementations and we lose the ability to do 
AM change for majority of cases, and also IMHO most AMs will then have 
to implement their own last_value and logcnt logic anyway.

Honestly, I am still not convinced that the centralized sequence server 
with no local caching is something we should be optimizing for as that 
one will suffer from performance problems anyway. And it can ignore the 
last_value input from postgres if it choses to, so it's not like the 
currently proposed patch forbids implementation of such AMs.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Simon Riggs

Date:

08 November 2014, 14:22:08

On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> Why does sequence_alloc need the current value? If it's a "remote" seqam,
> the current value is kept in the remote server, and the last value that was
> given to this PostgreSQL server is irrelevant.
>
> That irks me with this API. The method for acquiring a new value isn't fully
> abstracted behind the AM interface, as sequence.c still needs to track it
> itself. That's useful for the local AM, of course, and maybe some others,
> but for others it's totally useless.

Please bear in mind what seqam is for...

At present it's only use is to provide Global Sequences. There's a few
ways of doing that, but they all look sorta similar.

The only other use we thought of was shot down, so producing the
perfect API isn't likely to help anyone. It's really not worth the
effort to produce a better API.

BDR doesn't require Global Sequences, nor are Global Sequences
restricted in their use to just BDR - lots of cluster configurations
would want something like this.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Sequence Access Method WIP

From

Robert Haas

Date:

08 November 2014, 15:46:31

On Sat, Nov 8, 2014 at 6:17 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Honestly, I am still not convinced that the centralized sequence server with
> no local caching is something we should be optimizing for as that one will
> suffer from performance problems anyway. And it can ignore the last_value
> input from postgres if it choses to, so it's not like the currently proposed
> patch forbids implementation of such AMs.

I can buy that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

09 November 2014, 19:48:32

On 11/08/2014 01:57 AM, Petr Jelinek wrote:
> My main problem is actually not with having tuple per-seqAM, but more
> with the fact that Heikki does not want to have last_value as compulsory
> column/parameter. How is the new AM then supposed to know where to pick
> up and if it even can pick up?

Call nextval(), and use the value it returns as the starting point for 
the new AM.

- Heikki

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

10 November 2014, 00:39:20

On 09/11/14 20:47, Heikki Linnakangas wrote:
> On 11/08/2014 01:57 AM, Petr Jelinek wrote:
>> My main problem is actually not with having tuple per-seqAM, but more
>> with the fact that Heikki does not want to have last_value as compulsory
>> column/parameter. How is the new AM then supposed to know where to pick
>> up and if it even can pick up?
>
> Call nextval(), and use the value it returns as the starting point for
> the new AM.
>

Hah, so simple, works for me.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

24 November 2014, 11:17:03

On 11/08/2014 04:21 PM, Simon Riggs wrote:
> On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> Why does sequence_alloc need the current value? If it's a "remote" seqam,
>> the current value is kept in the remote server, and the last value that was
>> given to this PostgreSQL server is irrelevant.
>>
>> That irks me with this API. The method for acquiring a new value isn't fully
>> abstracted behind the AM interface, as sequence.c still needs to track it
>> itself. That's useful for the local AM, of course, and maybe some others,
>> but for others it's totally useless.
>
> Please bear in mind what seqam is for...
>
> At present it's only use is to provide Global Sequences. There's a few
> ways of doing that, but they all look sorta similar.

Ok.

> The only other use we thought of was shot down, so producing the
> perfect API isn't likely to help anyone. It's really not worth the
> effort to produce a better API.

Your argument seems to be that because this API is only used for 
creating global sequences, it's not worth it to make it better for that 
purpose. That doesn't make much sense, so that's probably not what you 
meant. I'm confused.

To be clear: I don't think this API is very good for its stated purpose, 
for implementing global sequences for use in a cluster. For the reasons 
I've mentioned before.  I'd like to see two changes to this proposal:

1. Make the AM implementation solely responsible for remembering the 
"last value". (if it's a global or remote sequence, the current value 
might not be stored in the local server at all)

2. Instead of the single amdata field, make it possible for the 
implementation to define any number of fields with any datatype in the 
tuple. That would make debugging, monitoring etc. easier.

> BDR doesn't require Global Sequences, nor are Global Sequences
> restricted in their use to just BDR - lots of cluster configurations
> would want something like this.

Sure. (I don't see how that's relevant to this discussion, however).

(marking this as "Returned with Feedback" in the commitfest)

- Heikki

Re: Sequence Access Method WIP

From

Andres Freund

Date:

02 December 2014, 19:21:34

On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
> To be clear: I don't think this API is very good for its stated purpose, for
> implementing global sequences for use in a cluster. For the reasons I've
> mentioned before.  I'd like to see two changes to this proposal:
> 
> 1. Make the AM implementation solely responsible for remembering the "last
> value". (if it's a global or remote sequence, the current value might not be
> stored in the local server at all)

I think that reason isn't particularly good. The practical applicability
for such a implementation doesn't seem to be particularly large.

> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

José Luis Tallón

Date:

03 December 2014, 10:00:03

On 12/02/2014 08:21 PM, Andres Freund wrote:
> [snip]
>> 2. Instead of the single amdata field, make it possible for the
>> implementation to define any number of fields with any datatype in the
>> tuple. That would make debugging, monitoring etc. easier.
> My main problem with that approach is that it pretty much nails the door
> shut for moving sequences into a catalog table instead of the current,
> pretty insane, approach of a physical file per sequence.

Hmm...  having done my fair bit of testing, I can say that this isn't 
actually that bad (though depends heavily on the underlying filesystem 
and workload, of course)
With this approach, I fear extreme I/O contention with an update-heavy 
workload... unless all sequence activity is finally WAL-logged and hence 
writes to the actual files become mostly sequential and asynchronous.

May I possibly suggest a file-per-schema model instead? This approach 
would certainly solve the excessive i-node consumption problem that --I 
guess-- Andres is trying to address here.
Moreover, the "one file per schema for sequences" solution would fit a 
quite common model of grouping tables (in schemas) for physical 
[tablespace] location purposes....
> Currently, with
> our without seqam, it'd not be all that hard to force it into a catalog,
> taking care to to force each tuple onto a separate page...

IMHO, this is jst as wasteful as the current approach (one-page file per 
sequence) in terms of disk usage and complicates the code a bit .... but 
I really don't see how we can have more than one sequence state per page 
without severe (page) locking problems.
However, someone with deeper knowledge of page pinning and buffer 
manager internals could certainly devise a better solution...

Just my 2c

Thanks,
    / J.L.

Re: Sequence Access Method WIP

From

Andres Freund

Date:

03 December 2014, 10:24:35

On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
> On 12/02/2014 08:21 PM, Andres Freund wrote:
> >[snip]
> >>2. Instead of the single amdata field, make it possible for the
> >>implementation to define any number of fields with any datatype in the
> >>tuple. That would make debugging, monitoring etc. easier.
> >My main problem with that approach is that it pretty much nails the door
> >shut for moving sequences into a catalog table instead of the current,
> >pretty insane, approach of a physical file per sequence.
> 
> Hmm...  having done my fair bit of testing, I can say that this isn't
> actually that bad (though depends heavily on the underlying filesystem and
> workload, of course)
> With this approach, I fear extreme I/O contention with an update-heavy
> workload... unless all sequence activity is finally WAL-logged and hence
> writes to the actual files become mostly sequential and asynchronous.

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.

> May I possibly suggest a file-per-schema model instead? This approach would
> certainly solve the excessive i-node consumption problem that --I guess--
> Andres is trying to address here.

I don't think that really has any advantages.

> >Currently, with
> >our without seqam, it'd not be all that hard to force it into a catalog,
> >taking care to to force each tuple onto a separate page...
> 
> IMHO, this is jst as wasteful as the current approach (one-page file per
> sequence) in terms of disk usage and complicates the code a bit .... but I
> really don't see how we can have more than one sequence state per page
> without severe (page) locking problems.

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...

> However, someone with deeper knowledge of page pinning and buffer manager
> internals could certainly devise a better solution...

I think there's pretty much no chance of accepting more than one page
per

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

03 December 2014, 13:35:47

On 02/12/14 20:21, Andres Freund wrote:
> On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
>> To be clear: I don't think this API is very good for its stated purpose, for
>> implementing global sequences for use in a cluster. For the reasons I've
>> mentioned before.  I'd like to see two changes to this proposal:
>> ...
>> 2. Instead of the single amdata field, make it possible for the
>> implementation to define any number of fields with any datatype in the
>> tuple. That would make debugging, monitoring etc. easier.
>
> My main problem with that approach is that it pretty much nails the door
> shut for moving sequences into a catalog table instead of the current,
> pretty insane, approach of a physical file per sequence. Currently, with
> our without seqam, it'd not be all that hard to force it into a catalog,
> taking care to to force each tuple onto a separate page...
>

I don't know, I think if we decide to change storage format we can do 
serialization/conversion in seqam layer, it does not seem to matter too 
much if the serialization into some opaque type is done in AM itself or 
by the API layer. Or we can have one relation for all sequences in 
single AM, etc. In general I don't think that the custom columns for AM 
approach prohibits future storage changes.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

José Luis Tallón

Date:

03 December 2014, 14:50:13

On 12/03/2014 11:24 AM, Andres Freund wrote:
> On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
>> snip]
> I don't think the WAL logging would need to change much in comparison to
> the current solution. We'd just add the page number to the WAL record.
>
> The biggest advantage would be to require fewer heavyweight locks,
> because the pg_sequence one could be a single fastpath lock. Currently
> we have to take the sequence's relation lock (heavyweight) and then the
> the page level lock (lwlock) for every single sequence used.

Got it, thank you for the explanation.

>> May I possibly suggest a file-per-schema model instead? This approach would
>> certainly solve the excessive i-node consumption problem that --I guess--
>> Andres is trying to address here.
> I don't think that really has any advantages.

Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have 
(1048576/8)=128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences.... save 
for *massively* multi-tenant databases.

The downside being that all that random I/O --- in general, it can't 
really be sequential unless there are very very few sequences--- can't 
be spread to other spindles. Create a "sequence_default_tablespace" GUC 
+ ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe? (I could take a shot at the patch, if deemed
worthwhile)

>> [snip]
> The overhead of a file is much more than wasting the remainder of a
> page. Alone the requirement of separate fsyncs and everything is pretty
> bothersome. The generated IO patterns are also much worse...

Yes, you are right. I stand corrected.

> [snip]
> I think there's pretty much no chance of accepting more than one page
> per sequence

Definitively.


Thanks,
    J.L.

Re: Sequence Access Method WIP

From

Jim Nasby

Date:

04 December 2014, 16:33:48

On 12/3/14, 8:50 AM, José Luis Tallón wrote:
>
>>> May I possibly suggest a file-per-schema model instead? This approach would
>>> certainly solve the excessive i-node consumption problem that --I guess--
>>> Andres is trying to address here.
>> I don't think that really has any advantages.
>
> Just spreading the I/O load, nothing more, it seems:
>
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for *massively* multi-tenant databases.
>
> The downside being that all that random I/O --- in general, it can't really be sequential unless there are very very
fewsequences--- can't be spread to other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE SET
TABLESPACE,to use an SSD for this purpose maybe?
 

Why not? RAID arrays typically use stripe sizes in the 128-256k range, which means only 16 or 32 sequences per stripe.

It still might make sense to allow controlling what tablespace a sequence is in, but IMHO the default should just be
pg_default.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Sequence Access Method WIP

From

Andres Freund

Date:

04 December 2014, 18:01:49

> >>May I possibly suggest a file-per-schema model instead? This approach would
> >>certainly solve the excessive i-node consumption problem that --I guess--
> >>Andres is trying to address here.
> >I don't think that really has any advantages.
> 
> Just spreading the I/O load, nothing more, it seems:
> 
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have
> (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for
> *massively* multi-tenant databases.
> 
> The downside being that all that random I/O --- in general, it can't really
> be sequential unless there are very very few sequences--- can't be spread to
> other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE
> SET TABLESPACE, to use an SSD for this purpose maybe?
>  (I could take a shot at the patch, if deemed worthwhile)

I think that's just so far outside the sane usecases that I really don't
see us adding complexity to reign it in. If your frequently used
sequences get flushed out to disk by anything but the checkpointer the
schema is just badly designed...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

10 December 2014, 02:34:20

On 24/11/14 12:16, Heikki Linnakangas wrote:
> I'd like to see two changes to this proposal:
>
> 1. Make the AM implementation solely responsible for remembering the
> "last value". (if it's a global or remote sequence, the current value
> might not be stored in the local server at all)
>
> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.
>

Hi,

here is the new version for next CF, there are several rough edges (more
about that later) but I think it's ready for more feedback.

I did change the API quite considerably (rewrite is the word I believe).

Instead of trying to put smarts on one or the other side of the API, I
basically created 2 APIs, one is the seqam which is responsible for
maintaining the sequence data and the other is sequence storage API
provided by postgres backend.

This means that backend cares purely about storage, and local caching
while AM cares about the data inside sequence, and decides when to WAL.

I'll start with storage API description:
- sequence_open(Oid) - open sequence with given Oid and returns
SequenceHandle (opens relation in share lock only)

- sequence_read_tuple(SequenceHandle) - returns heap tuple of the
sequence, this will also do the buffer locking

- sequence_save_tuple(SequenceHandle, newtuple, do_wal) - updates the
buffer with new sequence, optionally does WAL logging, newtuple can be
NULL in which case the tuple is not replaced (this is performance
optimization for sequences that don't need varlena/nullable columns and
can update everything inplace)

- sequence_release_tuple(SequenceHandle) - unlocks previously locked
buffer - usable for synchronization inside the seqam implementation, the
gapless sequence uses this

- sequence_close(SequenceHandle) - closes the sequence handle, also
unlocks buffer if needed

- sequence_needs_wal(SequenceHandle) - returns true if sequence needs
wal, that is if the last wal write happened before last checkpoint -
this is a little bit low level but I don't see a way around it if the
local sequence is supposed to maintain last_value and logcnt itself,
also we have similar function for relations (that checks for different
things but naming and usage is similar) so it's probably not too bad

The SequenceHandle is opaque struct so no gory details seen by the AM.

Now for the actual sequence AM API:
- seqam_extra_columns(Oid) - returns list of custom columns needed by
specified sequenceAM

- seqam_init - Initializes the columns for the custom sequenceAM, used
when creating, altering or resetting sequence. It gets sequenceam Oid,
sequence options, reloptions on input and returns updated values and
nulls arrays.

- seqam_alloc - Allocates new sequence values. It gets sequence
relation, handle and how many new values the backend wants and returns
first and last allocated value.

- seqam_setval - implements setval() support

- seqam_dump_state - pg_dump support
- seqam_restore_state - pg_dump support


The API is slightly bigger, but it also includes proper support for
dumping and restoring via pg_dump and seems somewhat cleaner, plus the
synchronous and asynchronous APIs are now same and no callbacks anywhere
(if AM needs to do something asynchronously it just calls the storage API).

There is also gapless sequence contrib module included as separate
patch, it's missing proper locking during dump/restore atm but otherwise
should be fully working (see tests). I am also using it for testing the
ALTER TABLE USING.

About the rough edges:
- The AlterSequence is not prettiest code around as we now have to
create new relation when sequence AM is changed and I don't know how to
do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correcly
- I removed the setval3 since there is no guarantee that sequence will
have is_called parameter but there is now proper pg_dump support so it's
not needed for pg_dump usecase, but it's still incompatible change
- ResetSequence basically does ALTER SEQUENCE RESET internally instead
calling setval now because of the above
- there are no docs for the C APIs yet - I don't even know where those
should be placed btw, would appreciate suggestions (I have same problem
with committs API docs)

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

15 December 2014, 10:37:13

On 10/12/14 03:33, Petr Jelinek wrote:
> On 24/11/14 12:16, Heikki Linnakangas wrote:
>
> About the rough edges:
> - The AlterSequence is not prettiest code around as we now have to
> create new relation when sequence AM is changed and I don't know how to
> do that nicely
> - I am not sure if I did the locking, command order and dependency
> handling in AlterSequence correcly

This version does AlterSequence differently and better. Didn't attach
the gapless sequence again as that one is unchanged.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-seqam-v5.patch

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

12 January 2015, 21:34:08

On 15/12/14 11:36, Petr Jelinek wrote:
> On 10/12/14 03:33, Petr Jelinek wrote:
>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>
>> About the rough edges:
>> - The AlterSequence is not prettiest code around as we now have to
>> create new relation when sequence AM is changed and I don't know how to
>> do that nicely
>> - I am not sure if I did the locking, command order and dependency
>> handling in AlterSequence correcly
>
> This version does AlterSequence differently and better. Didn't attach
> the gapless sequence again as that one is unchanged.
>
>

And another version, separated into patch-set of 3 patches.
First patch contains the seqam patch itself, not many changes there,
mainly docs/comments related. What I wrote in the previous email for
version 3 still applies.

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small change as
the sequence ams are not schema qualified, but I think it's acceptable
and saves considerable amount of boilerplate.

And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

On 22/01/15 17:02, Petr Jelinek wrote:
>
> The new version (the one that is not submitted yet) of gapless sequence
> is way more ugly and probably not best example either but does guarantee
> gaplessness (it stores the last value in it's own value table). So I am
> not sure if it should be included either...
>

Here it is as promised.

Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.

- Custom columns are now stored in the catalog instead of generated by
some special interface. This makes it possible to record dependencies on
types and also seems in line with how other similar things are done.

- Removed couple of pallocs from the nextval codepath which improved
performance considerably.

- Changed how the ALTER SEQUENCE ... USING behaves - when RESET WITH
value is specified the new sequenceAM will use that one for starting, if
it's not specified the nextval from original sequenceAM will be used as
RESET value. In general handling of the RESET parameter is left to the AMs.

- Unified the reloptions for index and sequences into single function.

- Re-added setval3 for compatibility reasons - it only works on local
sequences, otherwise it throws error.

- Made the 'local' default value for the GUC instead of special handling
of the empty string.


Notable missing things:
- Docs for pg_seqam.

- pg_dump support for pg_seqam. It's embarrassing but I didn't think
about this until Robert mentioned it, I am apparently too used to handle
these things using extensions.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Hi,

sending new version that is updated along the lines of what we discussed
at FOSDEM, which means:

- back to single bytea amdata column (no custom columns)

- the dump/restore interfaces were changed to produce/accept array of
key/value pairs

- psql shows the output of dump interface in verbose mode (this makes it
easier to inspect the state)

- last_value and is_called are always present columns even if the
sequence am does not care for them (this helps with backwards
compatibility with various admin tools)

- pg_dump uses new interface for dumping only for non-local sequences so
that dumps of schemas which are not using custom seqams are restorable
to older postgres

- the default_sequenceam was changed to serial_sequenceam and only
applies to serial/bigserial columns - I would personally prefer to have
the old default but default for serial is better than nothing

- added description of pg_seqam catalog to docs

- gapless_seq still uses table internally to support snapshots for pg_dump

- pg_dump support for dumping the sequence ams


It would be nice to also have something along the lines of chapter 55.2.
(Index Access Method Functions), I plan to send that as additional patch
in few days.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Slightly updated version of the patch.

Mainly rebased against current master (there were several conflicts) and
fixed some typos, no real functional change.

I also attached initial version of the API sgml doc.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and
> fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

I finally got around to have another round of review on this. I fixed a
couple of little bugs, did some minor edition on comments etc. See
attached. It is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
if you want to look at individual changes. It combines your patches 1
and 4, I think those need to be applied together. I haven't looked at
the DDL changes yet.

I'm fairly happy with the alloc API now. I'm not sure it's a good idea
for the AM to access the sequence tuple directly, though. I would seem
cleaner if it only manipulated the amdata Datum. But it's not too bad as
it is.

We went back and forth on whether 'amdata' should be a single column or
multiple columns, but I guess the single bytea column was the consensus
in the end.

The division of labour between sequence.c and the AM, in the init and
the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method,
when the only caller to amoptions() is just before the init() method?
The changes in extractRelOptions suggest that it would call
am_reloptions for sequences too, but that doesn't actually seem to be
happening.

postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity
checking, like min < max. It also has to extract the RESTART value from
the list of options. I think it would make more sense to move that to
sequence.c. The AM should only be responsible for initializing the
'amdata' portion, and for handling any AM-specific options. If the AM
doesn't support some options, like MIN and MAX value, it can check them
and throw an error, but it shouldn't be responsible for just passing
them through from the arguments to the sequence tuple.

* It might be better to form the sequence tuple before calling the init
function, and pass the ready-made tuple to it. All the other API
functions deal with the tuple (after calling sequence_read_tuple), so
that would be more consistent. The init function would need to
deconstruct it to modify it, but we're not concerned about performance here.

* The transformations of the arrays in get_state() and set_state()
functions are a bit complicated. The seqam_get_state() function returns
two C arrays, and pg_sequence_get_state() turns them into a text[]
array. Why not construct the text[] array directly in the AM? I guess
it's a bit more convenient to the AM, if the pg_sequence_get_state() do
that, but if that's an issue, you could create generic helper function
to turn two C string arrays into text[], and call that from the AM.

>     seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>
>     for (i = 0; i < count; i++)
>     {
>         if (pg_strcasecmp(keys[i], "last_value") == 0)
>             seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>                                                 CStringGetDatum(values[i])));
>         else if (pg_strcasecmp(keys[i], "is_called") == 0)
>             seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>                                                 CStringGetDatum(values[i])));
>         else
>             ereport(ERROR,
>                     (errcode(ERRCODE_SYNTAX_ERROR),
>                      errmsg("invalid state key \"%s\" for local sequence",
>                             keys[i])));
>     }
>
>     sequence_save_tuple(seqh, NULL, true);

If that error happens after having already processed a "last_value" or
"is_called" entry, you have already modified the on-disk tuple. AFAICS
that's the only instance of that bug, but sequence_read_tuple - modify
tuple in place - sequence_save_tuple pattern is quite unsafe in general.
If you modify a tuple directly in a Buffer, you should have a critical
section around it. It would make sense to start a critical section in
sequence_read_tuple(), except that not all callers want to modify the
tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
functions should be split into two:

sequence_read_tuple_for_update()
sequence_save_tuple()

* seqam_local_get_state() calls sequence_read_tuple() but not
sequence_release_tuple(). It looks like sequence_close() releases the
tuple and the buffer, but that seems sloppy. sequence_close() probably
shouldn't be calling sequence_release_tuple at all, so that you'd get a
warning at the end of transaction about the buffer leak.

- Heikki

Attachment

seqam-v8+api-doc-heikki.patch

Re: Sequence Access Method WIP

From

Alvaro Herrera

Date:

20 April 2015, 16:22:31

Heikki Linnakangas wrote:

> * The transformations of the arrays in get_state() and set_state() functions
> are a bit complicated. The seqam_get_state() function returns two C arrays,
> and pg_sequence_get_state() turns them into a text[] array. Why not
> construct the text[] array directly in the AM? I guess it's a bit more
> convenient to the AM, if the pg_sequence_get_state() do that, but if that's
> an issue, you could create generic helper function to turn two C string
> arrays into text[], and call that from the AM.

Um, see strlist_to_textarray() in objectaddress.c if you do that.  Maybe
we need some generic place to store that kind of helper function.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

22 April 2015, 20:02:07

On 20/04/15 17:50, Heikki Linnakangas wrote:
> On 03/15/2015 09:07 PM, Petr Jelinek wrote:
>> Slightly updated version of the patch.
>>
>> Mainly rebased against current master (there were several conflicts) and
>> fixed some typos, no real functional change.
>>
>> I also attached initial version of the API sgml doc.
>
> I finally got around to have another round of review on this. I fixed a
> couple of little bugs, did some minor edition on comments etc. See
> attached. It is also available in my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
> if you want to look at individual changes. It combines your patches 1
> and 4, I think those need to be applied together. I haven't looked at
> the DDL changes yet.

Thanks!

>
> I'm fairly happy with the alloc API now. I'm not sure it's a good idea
> for the AM to access the sequence tuple directly, though. I would seem
> cleaner if it only manipulated the amdata Datum. But it's not too bad as
> it is.

Yeah, I was thinking about this myself I just liked sending 10 
parameters to the function less than this.

>
> The division of labour between sequence.c and the AM, in the init and
> the get/set_state functions, is a bit more foggy:
>
> * Do we really need a separate amoptions() method and an init() method,
> when the only caller to amoptions() is just before the init() method?
> The changes in extractRelOptions suggest that it would call
> am_reloptions for sequences too, but that doesn't actually seem to be
> happening.

Hmm yes it should and I am sure it did at some point, must have messed 
it during one of the rebases :(

And it's the reason why we need separate API function.

>
> postgres=# create sequence fooseq using local with (garbage=option);
> CREATE SEQUENCE
>
> Invalid options probably should throw an error.
>
> * Currently, the AM's init function is responsible for basic sanity
> checking, like min < max. It also has to extract the RESTART value from
> the list of options. I think it would make more sense to move that to
> sequence.c. The AM should only be responsible for initializing the
> 'amdata' portion, and for handling any AM-specific options. If the AM
> doesn't support some options, like MIN and MAX value, it can check them
> and throw an error, but it shouldn't be responsible for just passing
> them through from the arguments to the sequence tuple.

Well then we need to send restart as additional parameter to the init 
function as restart is normally not stored anywhere.

The checking is actually if new value is withing min/max but yes that 
can be done inside sequence.c I guess.

>
> * It might be better to form the sequence tuple before calling the init
> function, and pass the ready-made tuple to it. All the other API
> functions deal with the tuple (after calling sequence_read_tuple), so
> that would be more consistent. The init function would need to
> deconstruct it to modify it, but we're not concerned about performance
> here.

Right, this is actually more of a relic of the custom columns 
implementation where I didn't want to build the tuple with NULLs for 
columns that might have been specified as NOT NULL, but with the amdata 
approach it can create the tuple safely.

>
> * The transformations of the arrays in get_state() and set_state()
> functions are a bit complicated. The seqam_get_state() function returns
> two C arrays, and pg_sequence_get_state() turns them into a text[]
> array. Why not construct the text[] array directly in the AM? I guess
> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
> that, but if that's an issue, you could create generic helper function
> to turn two C string arrays into text[], and call that from the AM.

Yeah that was exactly the reasoning. Helper function works for me (will 
check what Álvaro's suggested, maybe it can be moved somewhere and reused).

>
>>     seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>
>>     for (i = 0; i < count; i++)
>>     {
>>         if (pg_strcasecmp(keys[i], "last_value") == 0)
>>             seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>
>> CStringGetDatum(values[i])));
>>         else if (pg_strcasecmp(keys[i], "is_called") == 0)
>>             seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>
>> CStringGetDatum(values[i])));
>>         else
>>             ereport(ERROR,
>>                     (errcode(ERRCODE_SYNTAX_ERROR),
>>                      errmsg("invalid state key \"%s\" for local
>> sequence",
>>                             keys[i])));
>>     }
>>
>>     sequence_save_tuple(seqh, NULL, true);
>
> If that error happens after having already processed a "last_value" or
> "is_called" entry, you have already modified the on-disk tuple. AFAICS
> that's the only instance of that bug, but sequence_read_tuple - modify
> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
> If you modify a tuple directly in a Buffer, you should have a critical
> section around it. It would make sense to start a critical section in
> sequence_read_tuple(), except that not all callers want to modify the
> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
> functions should be split into two:
>
> sequence_read_tuple_for_update()
> sequence_save_tuple()

Agreed, however I am much more concerned about the way 
seqam_local_alloc() works, see the XXX comment there. I am thinking it's 
not really safe that way, but problem is that we can't put critical 
section around it currently as the sequence_save_tuple() can potentially 
call GetTopTransactionId(). Separating it into more functions might be 
solution. Except the split into two does not really help there, it would 
have to be something like this:
sequence_tuple_update_start() - does the GetTopTransactionId bit and 
starts critical section
sequence_tuple_save() - saves the tuple/does wal
sequence_tuple_update_finish() - ends the critical section

That looks slightly cumbersome but I don't really have better idea.

>
> * seqam_local_get_state() calls sequence_read_tuple() but not
> sequence_release_tuple(). It looks like sequence_close() releases the
> tuple and the buffer, but that seems sloppy. sequence_close() probably
> shouldn't be calling sequence_release_tuple at all, so that you'd get a
> warning at the end of transaction about the buffer leak.
>

Do you mean that you want to make call to sequence_release_tuple() 
mandatory when sequence_read_tuple() was called?


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

27 April 2015, 15:50:01

On 22/04/15 22:01, Petr Jelinek wrote:
> On 20/04/15 17:50, Heikki Linnakangas wrote:
>>>
>> I finally got around to have another round of review on this. I fixed a
>> couple of little bugs, did some minor edition on comments etc. See
>> attached. It is also available in my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
>> if you want to look at individual changes. It combines your patches 1
>> and 4, I think those need to be applied together. I haven't looked at
>> the DDL changes yet.
>
> Thanks!

I merged all your changes and merged the patch#1 with patch#4.

Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam
and gave you (Heikki) commit rights there in case you want to change
anything.

>> The division of labour between sequence.c and the AM, in the init and
>> the get/set_state functions, is a bit more foggy:
>>
>> * Do we really need a separate amoptions() method and an init() method,
>> when the only caller to amoptions() is just before the init() method?
>> The changes in extractRelOptions suggest that it would call
>> am_reloptions for sequences too, but that doesn't actually seem to be
>> happening.
>
> Hmm yes it should and I am sure it did at some point, must have messed
> it during one of the rebases :(
>
> And it's the reason why we need separate API function.
>

Actually the reloption handling was broken in more than one place, I
fixed it all around. I had to teach the heap_create_with_catalog about
relam but the change does not seem to be too intrusive to me so I think
it's fine. We no longer do the update of pg_class tuple after the
sequence was created anymore and the relcache works correctly now. Also
the whole reloption handling was moved to more proper places than the
seqam_init so it's more in line with how it works for other relation kinds.

>>
>> postgres=# create sequence fooseq using local with (garbage=option);
>> CREATE SEQUENCE
>>
>> Invalid options probably should throw an error.
>>

Done, well actually the local sequence now throws error on any options.
I also updated CREATE SEQUENCE and ALTER SEQUENCE docs with the
reloptions syntax.

>> * Currently, the AM's init function is responsible for basic sanity
>> checking, like min < max. It also has to extract the RESTART value from
>> the list of options. I think it would make more sense to move that to
>> sequence.c. The AM should only be responsible for initializing the
>> 'amdata' portion, and for handling any AM-specific options. If the AM
>> doesn't support some options, like MIN and MAX value, it can check them
>> and throw an error, but it shouldn't be responsible for just passing
>> them through from the arguments to the sequence tuple.
>

I now do the checking in sequence.c where possible, the init got
generally redone to accept the newly created tuple and restart value
parameters.

One thing that I am not sure how much like is that if the AM wants to
change the amdata in the init, it now has to modify the tuple and free
the old one.

>
>>
>> * The transformations of the arrays in get_state() and set_state()
>> functions are a bit complicated. The seqam_get_state() function returns
>> two C arrays, and pg_sequence_get_state() turns them into a text[]
>> array. Why not construct the text[] array directly in the AM? I guess
>> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
>> that, but if that's an issue, you could create generic helper function
>> to turn two C string arrays into text[], and call that from the AM.
>
> Yeah that was exactly the reasoning. Helper function works for me (will
> check what Álvaro's suggested, maybe it can be moved somewhere and reused).
>

Didn't use Álvaro's code in the end as ISTM working directly with arrays
is simple enough for this use case. The only slightly ugly part is the
use of TextDatumGetCString/CStringGetDatum but I think that's survivable
(maybe the sequence.c could convert the TEXT[] to CSTRING[] or just char
*[] as this is not performance critical code path).

>>
>>>     seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>>
>>>     for (i = 0; i < count; i++)
>>>     {
>>>         if (pg_strcasecmp(keys[i], "last_value") == 0)
>>>             seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>>
>>> CStringGetDatum(values[i])));
>>>         else if (pg_strcasecmp(keys[i], "is_called") == 0)
>>>             seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>>
>>> CStringGetDatum(values[i])));
>>>         else
>>>             ereport(ERROR,
>>>                     (errcode(ERRCODE_SYNTAX_ERROR),
>>>                      errmsg("invalid state key \"%s\" for local
>>> sequence",
>>>                             keys[i])));
>>>     }
>>>
>>>     sequence_save_tuple(seqh, NULL, true);
>>
>> If that error happens after having already processed a "last_value" or
>> "is_called" entry, you have already modified the on-disk tuple. AFAICS
>> that's the only instance of that bug, but sequence_read_tuple - modify
>> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
>> If you modify a tuple directly in a Buffer, you should have a critical
>> section around it. It would make sense to start a critical section in
>> sequence_read_tuple(), except that not all callers want to modify the
>> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
>> functions should be split into two:
>>
>> sequence_read_tuple_for_update()
>> sequence_save_tuple()
>
> Agreed, however I am much more concerned about the way
> seqam_local_alloc() works, see the XXX comment there. I am thinking it's
> not really safe that way, but problem is that we can't put critical
> section around it currently as the sequence_save_tuple() can potentially
> call GetTopTransactionId(). Separating it into more functions might be
> solution. Except the split into two does not really help there, it would
> have to be something like this:
> sequence_tuple_update_start() - does the GetTopTransactionId bit and
> starts critical section
> sequence_tuple_save() - saves the tuple/does wal
> sequence_tuple_update_finish() - ends the critical section
>
> That looks slightly cumbersome but I don't really have better idea.
>

So I went with something like what was proposed:
- sequence_start_update(seqh, do_wal) - prepares the state and starts
the critical section
- sequence_apply_update(seqh, do_wal) - does buffer dirtying and wal logging
- sequence_finish_update(seqh) - closes the critical section and cleans
up the state

The API might look slightly hairy now but I don't see much difference
between having to do:
START_CRIT_SECTION()
sequence_update_tuple()
END_CRIT_SECTION()
and the above, except the above also solves the xid acquiring and shows
less internals.

I also added sequence_swap_tuple() for the use-case when the AM does not
want to do inline updates because having this separate makes things
easier and less ugly. And once you are changing whole tuple you are
already in slow/complex code path so one more function call does not matter.

>> * seqam_local_get_state() calls sequence_read_tuple() but not
>> sequence_release_tuple(). It looks like sequence_close() releases the
>> tuple and the buffer, but that seems sloppy. sequence_close() probably
>> shouldn't be calling sequence_release_tuple at all, so that you'd get a
>> warning at the end of transaction about the buffer leak.
>>

I did this (requiring the sequence_release_tuple() always) and
documented it, but personally don't like it much as it just adds one
more call to the API which already has enough of them and I really don't
see the advantage of it.

Other than things mentioned above I rebased it on current master (the
transforms commit produced several issues). And fixed several bugs like
checking min/max value in set_state, proper initialization of session
cached increment_by, etc.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From

Alvaro Herrera

Date:

13 May 2015, 04:10:17

Heikki, do you have time to go through this at this point?

Petr Jelinek wrote:

> I merged all your changes and merged the patch#1 with patch#4.
>
> Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam and
> gave you (Heikki) commit rights there in case you want to change anything.

I rebased your #1 to current master; attached.  It builds and passes
regression test, but I didn't check any further.  Doc build fails with:

$ make check
onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s
/pgsql/source/master/doc/src/sgml/postgres.sgml
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
Makefile:320: recipe for target 'check' failed
make: *** [check] Error 1

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

seqam-v9ah.patch

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

13 May 2015, 10:56:18

On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
> Heikki, do you have time to go through this at this point?

I'm afraid I won't :-(. I did intend to, but looking at the calendar, I 
won't have the time to review this thoroughly enough to commit. Sorry.

I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax 
patch at all yet.

We discussed using a single amdata column vs. any number of am-specific 
columns. We settled on amdata, but I'm still not 100% convinced that's 
the best approach. Just as a data point, this removes the log_cnt field 
and moves it into amdata in a non-human-readable format. So for someone 
who only uses the local seqam, this just makes things slightly worse. 
For more complicated seqam's, it would be even more important to display 
the state in a human-readable format. Perhaps it's OK that each seqam 
provides its own functions or similar to do that, but I'd like to 
revisit that decision.

I still don't like the serial_sequenceam GUC. Not sure what to do 
instead. Needs some thought.

- Heikki

Re: Sequence Access Method WIP

From

Simon Riggs

Date:

13 May 2015, 11:13:08

On 13 May 2015 at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
Heikki, do you have time to go through this at this point?

I'm afraid I won't :-(. I did intend to, but looking at the calendar, I won't have the time to review this thoroughly enough to commit. Sorry.

I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax patch at all yet.

We discussed using a single amdata column vs. any number of am-specific columns. We settled on amdata, but I'm still not 100% convinced that's the best approach. Just as a data point, this removes the log_cnt field and moves it into amdata in a non-human-readable format. So for someone who only uses the local seqam, this just makes things slightly worse. For more complicated seqam's, it would be even more important to display the state in a human-readable format. Perhaps it's OK that each seqam provides its own functions or similar to do that, but I'd like to revisit that decision.

I still don't like the serial_sequenceam GUC. Not sure what to do instead. Needs some thought.

This has had around 2 years of thought at this point. I don't agree it needs more thought.

There is one clear use case for this and it is of benefit to many distributed architectures.

I don't see what calamity will occur if we commit this. If you don't want a sequence AM, don't ever use this.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

13 May 2015, 11:23:33

On 13/05/15 12:56, Heikki Linnakangas wrote:
> On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
>> Heikki, do you have time to go through this at this point?
>
> I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
> won't have the time to review this thoroughly enough to commit. Sorry.
>
> We discussed using a single amdata column vs. any number of am-specific
> columns. We settled on amdata, but I'm still not 100% convinced that's
> the best approach. Just as a data point, this removes the log_cnt field
> and moves it into amdata in a non-human-readable format. So for someone
> who only uses the local seqam, this just makes things slightly worse.
> For more complicated seqam's, it would be even more important to display
> the state in a human-readable format. Perhaps it's OK that each seqam
> provides its own functions or similar to do that, but I'd like to
> revisit that decision.
>

I do think it's ok for seqam to provide functions that can give you the 
data in human readable form.

I think main argument against custom human readable columns is that it 
will kill any possibility to have common storage for sequences.

There is also additional complexity in the API required for that.

The performance implications are probably small as one could still 
define opaque bytea column and store the data same way a now.

> I still don't like the serial_sequenceam GUC. Not sure what to do
> instead. Needs some thought.
>

I think it would be ok if this issue was solved by follow-up patch at 
later time.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

13 May 2015, 12:16:02

On 05/13/2015 02:12 PM, Simon Riggs wrote:
> This has had around 2 years of thought at this point. I don't agree it
> needs more thought.

Noted.

> There is one clear use case for this and it is of benefit to many
> distributed architectures.

Right. What's your point?

> I don't see what calamity will occur if we commit this. If you don't want a
> sequence AM, don't ever use this.

I'd like the API to be good for its purpose. Also, I did mention that 
the the current patch makes the situation slightly worse for people who 
never use this: it makes the log_cnt field non human-readable. That's a 
really minor thing, but it shows that it *does* matter how this is 
implemented, even if you only ever use the local seq AM.

- Heikki

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

13 May 2015, 13:45:56

On 13/05/15 14:15, Heikki Linnakangas wrote:
>> I don't see what calamity will occur if we commit this. If you don't
>> want a
>> sequence AM, don't ever use this.
>
> I'd like the API to be good for its purpose. Also, I did mention that
> the the current patch makes the situation slightly worse for people who
> never use this: it makes the log_cnt field non human-readable. That's a
> really minor thing, but it shows that it *does* matter how this is
> implemented, even if you only ever use the local seq AM.
>

It definitely does matter.

I don't think we'll find perfect compromise here though, you can either 
do it one way or the other. Trust me it does not make me happy either, I 
like perfect solutions too, but when there is lack of perfect solution I 
prefer the simpler one.

Both of the solutions have drawbacks - amdata has opaque blob which does not store data in user visible way 
but that can be worked around by providing function that shows it in 
human readable way (and the dump function for many sequence types 
actually does that). - multiple columns basically kill any future ability to unify the 
storage for sequences and also adds complexity, especially around alter 
table (since it means drop/add column and stuff)

But I already wrote both versions anyway so from that perspective it 
does not matter much which part we merge.

(As a side-node I would have preferred to have this discussion earlier 
than 2 days before feature freeze because the current implementation is 
something that we agreed on several months ago so there was plenty of 
time to revisit that decision.)

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

13 May 2015, 16:40:50

On 13/05/15 06:10, Alvaro Herrera wrote:
>
> I rebased your #1 to current master; attached.  It builds and passes

Thanks

> regression test, but I didn't check any further.  Doc build fails with:
>
> $ make check
> onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s
/pgsql/source/master/doc/src/sgml/postgres.sgml
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here;
missingone of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD"
start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
> Makefile:320: recipe for target 'check' failed
> make: *** [check] Error 1
>

Missing <para>, attached fixed version with your rebase and one of the
commits from Heikki that I missed last time.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From

Vik Fearing

Date:

15 June 2015, 09:33:10

I've been looking at these patches a bit and here are some comments:

Patch 1: seqam

I would like to see an example in the docs for CREATE SEQUENCE. That's
perhaps not possible (or desirable) with only the "local" seqam? Not sure.

In the docs for pg_class, there is no mention that relam refers to
pg_seqam for sequences but pg_am for all other types.

There are errant half-sentences in the documentation, for example "to
the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in
Sequence Access Method Functions.

I'd prefer a README instead of the long comment at the start of seqam.c.The other ams do that.

As mentioned upthread, this patch isn't a seamless replacement for
what's already there because of the amdata field. I wasn't part of the
conversation of FOSDEM unfortunately, and there's not enough information
in this thread to know why this solution is preferred over each seqam
having its own table type with all the columns it needs. I see that
Heikki is waffling a bit between the two, and I have a fairly strong
opinion that amdata should be split into separate columns. The patch
already destroys and recreates what it needs when changing access method
via ALTER SEQUENCE, so I don't really see what the problem is.

There is no psql tab completion for the new USING clause in ALTER
SEQUENCE, and in looking at that I noticed we don't have tab completion
for CREATE SEQUENCE at all. I know we don't complete everything, but if
we're going to do ALTER, I think we should do CREATE. I'll submit a
patch for that on its own thread, but then this patch will need to
modify it to include USING.

There is no \d command for sequence access methods. Without querying
pg_seqam directly, how does one discover what's available?

There are some unfinished comments, such as "When relam is specified,
record dependency on the" in heap.c.

Comments and code in pg_dump.c check that the server is at least 9.5.
Those'll need to be changed to at least 9.6.

Patch 2: seqam ddl

When defining a new access method for sequences, it is possible to list
the arguments multiple times (last one wins). Other defel loops raise
an error if the argument is specified more than once. I haven't looked
at all of such loops to see if this is the only odd man out or not, but
I prefer the error behavior.

It doesn't seem possible to create a comment for a seqam, is that
intentional? Ah, after more review I see it is possible, just not
documented. I think that needs to be corrected.

Same comment as above about testing for 9.5.

Patch 3: gapless_seq

I really like the idea of having a gapless sequence in contrib.
Although it has big potential to be abused, doing it manually when it's
needed (like for invoices, at least in France) is a major pain. So big
+1 for including this.

However, the patch doesn't update the main contrib Makefile so you have
to compile it explicitly. It also doesn't have the .sgml file in the
right place, so that's not installed either.

There is a FIXME in get_last_value_tup() which should probably be
removed in light of the code comment in catcache.c stating that pg_seqam
is too small to benefit from an index scan.

It would be nice to be able to specify an access method when declaring a
serial or bigserial column. This could be a separate patch later on,
though.

On the whole, I think this is a pretty good patchset. Aside from the
design decision of whether amdata is a single opaque column or a set of
columns, there are only a few things that need to be changed before it's
ready for committer, and those things are mostly documentation.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

17 June 2015, 15:51:35

On 2015-06-15 11:32, Vik Fearing wrote:
> I've been looking at these patches a bit and here are some comments:
>

Thanks for looking at this.

> Patch 1: seqam
>
> I would like to see an example in the docs for CREATE SEQUENCE.  That's
> perhaps not possible (or desirable) with only the "local" seqam?  Not sure.
>

It is possible to have example with local seqam, it might be confusing 
though, given it produces same results as not putting USING in the query.

> In the docs for pg_class, there is no mention that relam refers to
> pg_seqam for sequences but pg_am for all other types.
>
> There are errant half-sentences in the documentation, for example "to
> the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in
> Sequence Access Method Functions.

I think that's the side effect of all the rebases and rewrites over the 
2y(!) that this has been going forward and back. It can be easily fixed 
by proof reading before final submission. I didn't pay too much 
attention yet because it's not clear how the docs should look like if 
there is no real agreement on the api. (This applies to other comments 
about docs as well)

>
> I'd prefer a README instead of the long comment at the start of seqam.c.
>   The other ams do that.
>

OK, since things have been moved to separate directory, README is 
doable, I personally prefer the docs in the main .c file usually but I 
know project uses README sometimes for this.

> As mentioned upthread, this patch isn't a seamless replacement for
> what's already there because of the amdata field.  I wasn't part of the
> conversation of FOSDEM unfortunately, and there's not enough information
> in this thread to know why this solution is preferred over each seqam
> having its own table type with all the columns it needs.  I see that
> Heikki is waffling a bit between the two, and I have a fairly strong
> opinion that amdata should be split into separate columns.  The patch
> already destroys and recreates what it needs when changing access method
> via ALTER SEQUENCE, so I don't really see what the problem is.
>

FOSDEM was just about agreeing that amdata is simpler after we discussed 
it with Heikki. Nothing too important you missed there I guess.

I can try to summarize what are the differences:
- amdata is somewhat simpler in terms of code for both init, alter and 
DDL, since with custom columns you have to specify them somehow and deal 
with them in catalog, also ALTER SEQUENCE USING means that there are 
going to be colums marked as deleted which produces needless waste, etc
- amdata make it easier to change the storage model as the tuple 
descriptor is same for all sequences
- the separate columns are much nicer from user point of view
- my opinion is that separate columns also more nicely separate state 
from options and I think that if we move to separate storage model, 
there can be state table per AM which solves the tuple descriptor issue
- there is probably some slight performance benefit to amdata but I 
don't think it's big enough to be important

I personally have slight preference to separate columns design, but I am 
ok with both ways honestly.

>
> There is no \d command for sequence access methods.  Without querying
> pg_seqam directly, how does one discover what's available?
>

Good point.

>
> Patch 2: seqam ddl
>
> When defining a new access method for sequences, it is possible to list
> the arguments multiple times (last one wins).  Other defel loops raise
> an error if the argument is specified more than once.  I haven't looked
> at all of such loops to see if this is the only odd man out or not, but
> I prefer the error behavior.
>

Hmm yeah, there should be error. I think only tsearch doesn't enforce 
errors from the existing stuff, should probably be fixed as well 
(separately of course).

>
> Patch 3: gapless_seq
>
> I really like the idea of having a gapless sequence in contrib.
> Although it has big potential to be abused, doing it manually when it's
> needed (like for invoices, at least in France) is a major pain.  So big
> +1 for including this.
>

Yeah, people make gapless sequences regardless, it's better to provide 
them one that behaves correctly, also it's quite good test for the API.

>
> It would be nice to be able to specify an access method when declaring a
> serial or bigserial column.  This could be a separate patch later on,
> though.
>

The patch originally had GUC for this, but Heikki didn't like it so it's 
left for the future developments.

>
> On the whole, I think this is a pretty good patchset.  Aside from the
> design decision of whether amdata is a single opaque column or a set of
> columns, there are only a few things that need to be changed before it's
> ready for committer, and those things are mostly documentation.
>

Unfortunately the amdata being opaque vs set of columns is the main 
issue here.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Heikki Linnakangas

Date:

28 July 2015, 18:11:55

So, we have this patch in the commitfest again. Let's see where we are, 
and try to find a consensus on what needs to be done before this can be 
committed.

On 06/17/2015 06:51 PM, Petr Jelinek wrote:
> On 2015-06-15 11:32, Vik Fearing wrote:
>> I've been looking at these patches a bit and here are some comments:
>
> Thanks for looking at this.

+1, thanks Vik.

>> As mentioned upthread, this patch isn't a seamless replacement for
>> what's already there because of the amdata field.  I wasn't part of the
>> conversation of FOSDEM unfortunately, and there's not enough information
>> in this thread to know why this solution is preferred over each seqam
>> having its own table type with all the columns it needs.  I see that
>> Heikki is waffling a bit between the two, and I have a fairly strong
>> opinion that amdata should be split into separate columns.  The patch
>> already destroys and recreates what it needs when changing access method
>> via ALTER SEQUENCE, so I don't really see what the problem is.
>
> FOSDEM was just about agreeing that amdata is simpler after we discussed
> it with Heikki. Nothing too important you missed there I guess.
>
> I can try to summarize what are the differences:
> - amdata is somewhat simpler in terms of code for both init, alter and
> DDL, since with custom columns you have to specify them somehow and deal
> with them in catalog, also ALTER SEQUENCE USING means that there are
> going to be colums marked as deleted which produces needless waste, etc
> - amdata make it easier to change the storage model as the tuple
> descriptor is same for all sequences
> - the separate columns are much nicer from user point of view
> - my opinion is that separate columns also more nicely separate state
> from options and I think that if we move to separate storage model,
> there can be state table per AM which solves the tuple descriptor issue
> - there is probably some slight performance benefit to amdata but I
> don't think it's big enough to be important
>
> I personally have slight preference to separate columns design, but I am
> ok with both ways honestly.

Regarding the amdata issue, I'm also leaning towards set of columns. 
I've felt that way all along, but not very strongly, so I relented at 
some point when Andres felt strongly that a single column would be 
better. But the more I think about it, the more I feel that separate 
columns really would be better. As evidence, I offer this recent thread:

Tom said 
(http://www.postgresql.org/message-id/8739.1436893588@sss.pgh.pa.us):
> I really don't see what's wrong with "SELECT last_value FROM sequence",
> especially since that has worked in every Postgres version since 6.x.
> Anyone slightly worried about backwards compatibility wouldn't use
> an equivalent function even if we did add one.

If we went with the single amdata column, that would break. Or we'd need 
to leave last_value as a separate column anyway, and leave it unused for 
sequence AMs where it's not applicable. But that's a bit ugly too.

Jim Nasby said in the same thread:
> FWIW, I think it'd be better to have a pg_sequences view that's the
> equivalent of SELECT * FROM <sequence> for every sequence in the
> database. That would let you get whatever info you needed.

Creating such a view would be difficult if all the sequences have a 
different set of columns. But when you think about it, it's not really 
any better with a single amdata column. You can't easily access the data 
in the amdata column that way either.

Anyway, that's my opinion. Several others have weighed in to support 
separate columns, too, so I think that is the consensus. Separate 
columns it is.

>> There is no \d command for sequence access methods.  Without querying
>> pg_seqam directly, how does one discover what's available?
>
> Good point.

Well, you can query pg_seqam. I don't think this deserves a \d command.

>> On the whole, I think this is a pretty good patchset.  Aside from the
>> design decision of whether amdata is a single opaque column or a set of
>> columns, there are only a few things that need to be changed before it's
>> ready for committer, and those things are mostly documentation.
>
> Unfortunately the amdata being opaque vs set of columns is the main
> issue here.

There was discussion on another thread on how the current sequence AM 
API is modeled after the indexam API, at 
http://www.postgresql.org/message-id/3896.1437059303@sss.pgh.pa.us. Will 
need to do something about that too.

Petr, is this enough feedback on this patch for this commitfest, or are 
there some other issues you want to discuss before I mark this as returned?

- Heikki

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

28 July 2015, 18:51:30

On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
> Petr, is this enough feedback on this patch for this commitfest, or are
> there some other issues you want to discuss before I mark this as returned?
>

You can mark it as returned, I didn't have much time to actually do much 
useful work on this in the current CF.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Thom Brown

Date:

16 September 2015, 11:22:09

On 28 July 2015 at 19:51, Petr Jelinek <petr@2ndquadrant.com> wrote:

On 2015-07-28 20:11, Heikki Linnakangas wrote:

Petr, is this enough feedback on this patch for this commitfest, or are
there some other issues you want to discuss before I mark this as returned?

You can mark it as returned, I didn't have much time to actually do much useful work on this in the current CF.

Is this now dependant on the work Alexander Korotkov is doing on the AM interface?

Thom

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

17 September 2015, 02:36:20

On 2015-09-16 13:21, Thom Brown wrote:
> On 28 July 2015 at 19:51, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
>     On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
>
>         Petr, is this enough feedback on this patch for this commitfest,
>         or are
>         there some other issues you want to discuss before I mark this
>         as returned?
>
>
>     You can mark it as returned, I didn't have much time to actually do
>     much useful work on this in the current CF.
>
>
> Is this now dependant on the work Alexander Korotkov is doing on the AM
> interface?
>

Yes.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

31 December 2015, 23:52:01

Hi,

here is another try at this. I had some more time to think about various
corner cases (like the sequence not having TOAST).

I decided for this version to try bit different approach and that is to
use custom datatype for the sequence access method state. So there is
one extra amstate column and the access method defines the custom type.

The idea is that the custom type should be fixed width so that we can do
proper checks during the creation of access method for size and stuff.
It also makes the interface bit cleaner as we can provide method for
reading and saving of the state and abstract the internals more nicely.
It also solves the issues with critical sections vs memory allocations
when the access methods wants to save completely new state.

It also means that get_state and set_state interfaces just work with
text representation of custom type, no need to have complex conversions
to arrays and back.

One downside is that this means access methods can't use composite types
as those are not fixed width (nor plain) so the names of the individual
items in the output are somewhat hidden (see the modified create_view
test for example of how this affects things).

Other than that, this is based on the new am api by Alexander Korotkov
[1]. It extends it by adding another column called amkind to the pg_am
which can have either value "i" for index or "S" for sequence (same as
relkind in pg_class for those).

I didn't attach DDL and the gapless sequence yet, mainly because I don't
want to waste time rewriting it again in case we can't agree that this
approach is good (based on the long discussion and resulting several
rewrites wrt the multiple columns vs bytea previously).

[1] https://commitfest.postgresql.org/8/336/

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

seqam-2015-12-31.patch

Re: Sequence Access Method WIP

From

Craig Ringer

Date:

18 January 2016, 08:17:50

On 1 January 2016 at 07:51, Petr Jelinek <petr@2ndquadrant.com> wrote:

Other than that, this is based on the new am api by Alexander Korotkov [1]. It extends it by adding another column called amkind to the pg_am which can have either value "i" for index or "S" for sequence (same as relkind in pg_class for those).

This patch will no longer apply after 65c5fcd353 (http://github.com/postgres/postgres/commit/65c5fcd353) as outlined in http://www.postgresql.org/message-id/10804.1453077804@sss.pgh.pa.us .

Setting waiting-on-author in the CF app.

The good news is that the commit of the pg_am rework greatly eases the path of this patch into core.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Sequence Access Method WIP

From

Craig Ringer

Date:

18 January 2016, 08:20:08

Needs rework after the commit of https://commitfest.postgresql.org/8/336/

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

27 January 2016, 16:57:18

On 18 January 2016 at 09:19, Craig Ringer <craig@2ndquadrant.com> wrote:
> Needs rework after the commit of https://commitfest.postgresql.org/8/336/

Here is version that applies to current master. There is some work to
do (mostly cleanup) and the DDL is missing, but that's because I want
to know what people think about the principle of how it works now and
if it makes sense to finish it this way (explained in the original
submission for Jan CF).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-seqam-2015-01-28.patch

Re: Sequence Access Method WIP

From

Alvaro Herrera

Date:

29 January 2016, 12:00:51

Petr Jelinek wrote:
> On 18 January 2016 at 09:19, Craig Ringer <craig@2ndquadrant.com> wrote:
> > Needs rework after the commit of https://commitfest.postgresql.org/8/336/
> 
> Here is version that applies to current master. There is some work to
> do (mostly cleanup) and the DDL is missing, but that's because I want
> to know what people think about the principle of how it works now and
> if it makes sense to finish it this way (explained in the original
> submission for Jan CF).

I would guess that the DDL boilterplate should come from Alexander
Korotkov's patch, right?  I think a first easy step may be to combine
parts both patches so that we get the "amkind" column from this patch
and the DDL support from Alexander's patch (means that his proposed
command needs a new clause to specify the amkind); then the really tough
stuff in both Alexander's and this patch can be rebased on top of that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Sequence Access Method WIP

From

Tom Lane

Date:

29 January 2016, 13:48:44

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I would guess that the DDL boilterplate should come from Alexander
> Korotkov's patch, right?  I think a first easy step may be to combine
> parts both patches so that we get the "amkind" column from this patch
> and the DDL support from Alexander's patch (means that his proposed
> command needs a new clause to specify the amkind);

Uh, what?  Surely we would provide a bespoke command for each possible
sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
that the provided function has the right signature, and then it would put
the correct amkind into the pg_am entry automatically.

If we skimp on this infrastructure then we're just relying on the
user not to make a typo, which doesn't seem like a good idea.

(Agreed though that a reasonable first step would be to add amkind to
pg_am and make the appropriate adjustments to existing code, ie check
that amkind is correct when fetching an index handler.  I considered
putting that into the AM API refactor patch, but desisted.)
        regards, tom lane

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

29 January 2016, 14:12:07

On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I would guess that the DDL boilterplate should come from Alexander
>> Korotkov's patch, right?  I think a first easy step may be to combine
>> parts both patches so that we get the "amkind" column from this patch
>> and the DDL support from Alexander's patch (means that his proposed
>> command needs a new clause to specify the amkind);
>
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.
>
> If we skimp on this infrastructure then we're just relying on the
> user not to make a typo, which doesn't seem like a good idea.
>

Yeah it has to be completely separate command for both that do
completely different sanity checks. It can't even be called CREATE
INDEX/SEQUENCE ACCESS METHOD unless we are willing to make ACCESS a
keyword due to preexisting CREATE INDEX/SEQUENCE <name> commands. The
previous version of the patch which used somewhat different underlying
catalog structure already had DDL and honestly writing the DDL part is
quite easy. I used CREATE ACCESS METHOD FOR SEQUENCE there.

> (Agreed though that a reasonable first step would be to add amkind to
> pg_am and make the appropriate adjustments to existing code, ie check
> that amkind is correct when fetching an index handler.  I considered
> putting that into the AM API refactor patch, but desisted.)
>

Well I was wondering how to handle this as well, the submitted patch
currently just adds Asserts, because IMHO the actual ERROR should be
thrown in the DDL not in the code that just uses it.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Alvaro Herrera

Date:

29 January 2016, 15:36:57

Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> > Uh, what?  Surely we would provide a bespoke command for each possible
> > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Sequence Access Method WIP

From

Tom Lane

Date:

29 January 2016, 16:03:26

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.

> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.

That seems okay.  I had the impression you were proposingCREATE ACCESS METHOD foobar TYPE 'i' USING functionname
or something like that, where it would be totally up to the user that
the amkind matched the function.  That seems unnecessarily error-prone,
not to mention that it would close the door forever on any hope that
we might allow non-superusers to execute such commands.
        regards, tom lane

Re: Sequence Access Method WIP

From

Alexander Korotkov

Date:

29 January 2016, 20:56:28

On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> > Uh, what? Surely we would provide a bespoke command for each possible
> > sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

Ok! Let's nail down the syntax and I can integrate it into my createam patch.

I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER handler;", but I don't insist.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com

The Russian Postgres Company

Re: Sequence Access Method WIP

From

Tom Lane

Date:

29 January 2016, 22:24:17

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>> like that.

> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
> handler;", but I don't insist.

I think that Alvaro's idea is less likely to risk future grammar
conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
[ CONCURRENTLY ] to make me really wary of variables in the statement-name
part of the syntax.
        regards, tom lane

Re: Sequence Access Method WIP

From

Robert Haas

Date:

29 January 2016, 22:59:56

On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>> wrote:
>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>> like that.
>
>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>> handler;", but I don't insist.
>
> I think that Alvaro's idea is less likely to risk future grammar
> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
> part of the syntax.

Strong +1.  If we put the type of access method immediately after
CREATE, I'm almost positive we'll regret it for exactly that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

30 January 2016, 12:38:37

On 29 January 2016 at 23:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>>> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>> like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1.  If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
conflict now, that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating ACCESS METHOD object so that's
what should be after CREATE.

In any case this is slightly premature IMHO as DDL is somewhat unless
until we have sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Robert Haas

Date:

30 January 2016, 12:48:43

On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
> conflict now, that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.

I missed your proposal, I guess, so please don't take as having any
position on whether it's better or worse than Alvaro's.  I was only
intending to vote for the proposition that the type of access method
should follow the name of the access method.

> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.

Agreed.

> In any case this is slightly premature IMHO as DDL is somewhat unless
> until we have sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.

Generic WAL logging seems like a *great* idea to me.  But I think it's
largely independent from the access method stuff.  If we have generic
WAL logging, people can create WAL-logged stuff that is not a new
access method.  If we have access methods, they can create new access
methods that are not WAL-logged.  If we have both, then they can
create WAL-logged access methods which of course is the payoff pitch,
but I don't see it as necessary or desirable for the two systems to be
tied together in any way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

30 January 2016, 13:04:33

On 30 January 2016 at 13:48, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
>> conflict now, that's why my proposal was different, I didn't want to
>> add more keywords. I think Alvaro's proposal is fine as well.
>
> I missed your proposal, I guess, so please don't take as having any
> position on whether it's better or worse than Alvaro's.  I was only
> intending to vote for the proposition that the type of access method
> should follow the name of the access method.
>

No worries didn't mean it that way.

>> In any case this is slightly premature IMHO as DDL is somewhat unless
>> until we have sequence access methods implementation we can agree on,
>> or the generic WAL logging so that custom indexes can be crash safe.
>
> Generic WAL logging seems like a *great* idea to me.  But I think it's
> largely independent from the access method stuff.  If we have generic
> WAL logging, people can create WAL-logged stuff that is not a new
> access method.  If we have access methods, they can create new access
> methods that are not WAL-logged.  If we have both, then they can
> create WAL-logged access methods which of course is the payoff pitch,
> but I don't see it as necessary or desirable for the two systems to be
> tied together in any way.

Okay, I am only debating the usefulness of DDL for access methods in
the current situation where we only have custom index access methods
which can't create WAL records. In other words trying to nudge people
slightly back towards the actual patch(es).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Alexander Korotkov

Date:

16 February 2016, 12:59:21

On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

On 29 January 2016 at 23:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
>>> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>> like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1. If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
conflict now, that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating ACCESS METHOD object so that's
what should be after CREATE.

In any case this is slightly premature IMHO as DDL is somewhat unless
until we have sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

I've changed syntax of CREATE ACCESS METHOD syntax in the thread about index access method extendability.

Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New column amtype of pg_am stores access method type.

http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com

It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE HANDLER handler;".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

05 March 2016, 04:09:43

On 16/02/16 13:58, Alexander Korotkov wrote:
>
> I've changed syntax of CREATE ACCESS METHOD syntax in the thread about
> index access method extendability.
> Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New
> column amtype of pg_am stores access method type.
> http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com
> It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
> HANDLER handler;".
>

Yes I've seen and I will send a review within couple of days.

But first here is updated patch for sequence access methods. I went with
the previously discussed custom type as this gives us proper control
over the width of the state and making sure that it's not gonna be
toastable, etc and we need this as sequence is limited to single page.

Attached are 3 separate files. The first one (0001-create-am) is mainly
separate for the reason that there is some overlap with Alexander's
index access methods patch, so I extracted common code into separate
patch as it will make it easier to merge in case we are lucky enough to
get these patches in (but it's still based on Alexander's code). It
provides infra for defining new access methods from SQL, although
without the parser and without any actual access methods.

The 0002-seqam provides the SQL interface and support for sequence
access methods on top of the infra patch, and also provides all the
sequence access methods infrastructure and abstraction and documentation.

And finally the 0003-gapless-seq is example contrib module that
implements dependably and transitionally safe gapless sequence access
method. It's obviously slow as it has to do locking and basically
serialize all the changes to sequence so only one transaction may use it
at a time but it's truly gapless. It also serves as an example of use of
the api and as a test.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Hi,

I rebased this on top of the recently committed CREATE ACCESS METHOD.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net> wrote:
>
> Hi Petr,
>
> On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>
>> fabrizio@bagual:~/pgsql
>> $ bin/pg_dump > /tmp/fabrizio.sql
>> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
>> does not exist
>> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>> ^
>> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>> increment_by, CASE WHEN increment_by > 0 AND max_value =
>> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value =
>> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
>> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0
>> AND min_value = -9223372036854775807 THEN NULL ELSE min_value END
>> AS min_value, cache_value, is_cycled FROM x
>
>
> It looks like a new patch is needed. I've marked this "waiting on author".
>

The attached patches fix the issues previous pointed by me.

But there are other issue in the gapless-seq extension when I ran "make check":

43   CREATE EXTENSION gapless_seq;
44   CREATE SEQUENCE test_gapless USING gapless;
45   SELECT nextval('test_gapless'::regclass);
46 ! ERROR: could not access status of transaction 1275068416
47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
48   BEGIN;
49     SELECT nextval('test_gapless'::regclass);
50 ! ERROR: could not access status of transaction 1275068416
51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.
52     SELECT nextval('test_gapless'::regclass);
53 ! ERROR: current transaction is aborted, commands ignored until end of transaction block
54     SELECT nextval('test_gapless'::regclass);
55 ! ERROR: current transaction is aborted, commands ignored until end of transaction block
56   ROLLBACK;
57   SELECT nextval('test_gapless'::regclass);
58 ! ERROR: could not access status of transaction 1275068416
59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or directory.

And I see the same running manually:

fabrizio=# create extension gapless_seq;
CREATE EXTENSION
fabrizio=# create sequence x using gapless;
CREATE SEQUENCE
fabrizio=# select nextval('x');
ERROR: could not access status of transaction 1258291200
DETAIL: Could not open file "pg_subtrans/4B00": No such file or directory.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Attachment

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

29 March 2016, 17:26:48

On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>
>
> On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>  >
>  > Hi Petr,
>  >
>  > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>  >
>  >> fabrizio@bagual:~/pgsql
>  >> $ bin/pg_dump > /tmp/fabrizio.sql
>  >> pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"
>  >> does not exist
>  >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>  >>                 ^
>  >> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>  >> increment_by, CASE WHEN increment_by > 0 AND max_value =
>  >> 9223372036854775807 THEN NULL      WHEN increment_by < 0 AND max_value =
>  >> -1 THEN NULL      ELSE max_value END AS max_value, CASE WHEN
>  >> increment_by > 0 AND min_value = 1 THEN NULL      WHEN increment_by < 0
>  >> AND min_value = -9223372036854775807 THEN NULL      ELSE min_value END
>  >> AS min_value, cache_value, is_cycled FROM x
>  >
>  >
>  > It looks like a new patch is needed.  I've marked this "waiting on
> author".
>  >
>

Yeah there were some incompatible commits since my last rebase, fixed,
along with the pg_dump bugs..

>
> But there are other issue in the gapless-seq extension when I ran "make
> check":
>
>   43   CREATE EXTENSION gapless_seq;
>   44   CREATE SEQUENCE test_gapless USING gapless;
>   45   SELECT nextval('test_gapless'::regclass);
>   46 ! ERROR:  could not access status of transaction 1275068416
>   47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
> directory.
>   48   BEGIN;
>   49     SELECT nextval('test_gapless'::regclass);
>   50 ! ERROR:  could not access status of transaction 1275068416
>   51 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
> directory.
>   52     SELECT nextval('test_gapless'::regclass);
>   53 ! ERROR:  current transaction is aborted, commands ignored until
> end of transaction block
>   54     SELECT nextval('test_gapless'::regclass);
>   55 ! ERROR:  current transaction is aborted, commands ignored until
> end of transaction block
>   56   ROLLBACK;
>   57   SELECT nextval('test_gapless'::regclass);
>   58 ! ERROR:  could not access status of transaction 1275068416
>   59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or
> directory.
>
>
> And I see the same running manually:
>
> fabrizio=# create extension gapless_seq;
> CREATE EXTENSION
> fabrizio=# create sequence x using gapless;
> CREATE SEQUENCE
> fabrizio=# select nextval('x');
> ERROR:  could not access status of transaction 1258291200
> DETAIL:  Could not open file "pg_subtrans/4B00": No such file or directory.
>
> Regards,
>

Hmm I am unable to reproduce this. What OS? Any special configure flags
you use?


--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From

Fabrízio de Royes Mello

Date:

29 March 2016, 17:47:00

<div dir="ltr"><div class="gmail_extra"><br />On Tue, Mar 29, 2016 at 2:26 PM, Petr Jelinek <<a
href="mailto:petr@2ndquadrant.com">petr@2ndquadrant.com</a>>wrote:<br />><br />> On 29/03/16 18:50, Fabrízio
deRoyes Mello wrote:<br />>><br />>><br />>><br />>> On Tue, Mar 29, 2016 at 12:25 PM, David
Steele<<a href="mailto:david@pgmasters.net">david@pgmasters.net</a><br />>> <mailto:<a
href="mailto:david@pgmasters.net">david@pgmasters.net</a>>>wrote:<br />>>  ><br />>>  > Hi
Petr,<br/>>>  ><br />>>  > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:<br />>>  ><br
/>>> >> fabrizio@bagual:~/pgsql<br />>>  >> $ bin/pg_dump > /tmp/fabrizio.sql<br />>>
 >>pg_dump: [archiver (db)] query failed: ERROR:  column "sequence_name"<br />>>  >> does not
exist<br/>>>  >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...<br />>>
 >>                ^<br />>>  >> pg_dump: [archiver (db)] query was: SELECT sequence_name,
start_value,<br/>>>  >> increment_by, CASE WHEN increment_by > 0 AND max_value =<br />>>  >>
9223372036854775807THEN NULL      WHEN increment_by < 0 AND max_value =<br />>>  >> -1 THEN NULL    
 ELSEmax_value END AS max_value, CASE WHEN<br />>>  >> increment_by > 0 AND min_value = 1 THEN NULL    
 WHENincrement_by < 0<br />>>  >> AND min_value = -9223372036854775807 THEN NULL      ELSE min_value
END<br/>>>  >> AS min_value, cache_value, is_cycled FROM x<br />>>  ><br />>>  ><br
/>>> > It looks like a new patch is needed.  I've marked this "waiting on<br />>> author".<br />>>
 ><br/>>><br />><br />> Yeah there were some incompatible commits since my last rebase, fixed, along
withthe pg_dump bugs..<br />><br /><br /></div><div class="gmail_extra">Now all applies without errors, build and
"makecheck" too.<br /></div><div class="gmail_extra"><br /><br /><br />>> But there are other issue in the
gapless-seqextension when I ran "make<br />>> check":<br />>><br />>>   43   CREATE EXTENSION
gapless_seq;<br/>>>   44   CREATE SEQUENCE test_gapless USING gapless;<br />>>   45   SELECT
nextval('test_gapless'::regclass);<br/>>>   46 ! ERROR:  could not access status of transaction 1275068416<br
/>>>  47 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or<br />>> directory.<br
/>>>  48   BEGIN;<br />>>   49     SELECT nextval('test_gapless'::regclass);<br />>>   50 ! ERROR:
 couldnot access status of transaction 1275068416<br />>>   51 ! DETAIL:  Could not open file "pg_subtrans/4C00":
Nosuch file or<br />>> directory.<br />>>   52     SELECT nextval('test_gapless'::regclass);<br />>>
 53 ! ERROR:  current transaction is aborted, commands ignored until<br />>> end of transaction block<br
/>>>  54     SELECT nextval('test_gapless'::regclass);<br />>>   55 ! ERROR:  current transaction is
aborted,commands ignored until<br />>> end of transaction block<br />>>   56   ROLLBACK;<br />>>   57
 SELECT nextval('test_gapless'::regclass);<br />>>   58 ! ERROR:  could not access status of transaction
1275068416<br/>>>   59 ! DETAIL:  Could not open file "pg_subtrans/4C00": No such file or<br />>>
directory.<br/>>><br />>><br />>> And I see the same running manually:<br />>><br />>>
fabrizio=#create extension gapless_seq;<br />>> CREATE EXTENSION<br />>> fabrizio=# create sequence x using
gapless;<br/>>> CREATE SEQUENCE<br />>> fabrizio=# select nextval('x');<br />>> ERROR:  could not
accessstatus of transaction 1258291200<br />>> DETAIL:  Could not open file "pg_subtrans/4B00": No such file or
directory.<br/>>><br />>> Regards,<br />>><br />><br />> Hmm I am unable to reproduce this.
WhatOS? Any special configure flags you use?<br />><br /><br /></div><div class="gmail_extra">In my environment the
errorremains with your last patches. <br /><br />I didn't use any special. <br /><br />./configure
--prefix=/home/fabrizio/pgsql--enable-cassert --enable-coverage --enable-tap-tests --enable-depend <br /></div><div
class="gmail_extra">make-s -j8 install<br /></div><div class="gmail_extra">make check-world<br /></div><div
class="gmail_extra"><br/><br /></div><div class="gmail_extra">My environment:<br /></div><div class="gmail_extra"><br
/>fabrizio@bagual:/d/postgresql(0002-gapless-seq-petr) <br />$ uname -a<br />Linux bagual 3.13.0-83-generic #127-Ubuntu
SMPFri Mar 11 00:25:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux<br /><br />fabrizio@bagual:/d/postgresql
(0002-gapless-seq-petr)<br />$ gcc --version<br />gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4<br />Copyright (C) 2013
FreeSoftware Foundation, Inc.<br />This is free software; see the source for copying conditions.  There is NO<br
/>warranty;not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br /><br /><br /></div><div
class="gmail_extra">Regards,<br/></div><div class="gmail_extra"><br />--<br />Fabrízio de Royes Mello<br
/>Consultoria/CoachingPostgreSQL<br />>> Timbira: <a
href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog: <a
href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/>>> Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a
href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/>>> Github: <a
href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div></div>

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

29 March 2016, 19:59:57

On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>
>  >
>  > Hmm I am unable to reproduce this. What OS? Any special configure
> flags you use?
>  >
>
> In my environment the error remains with your last patches.
>
> I didn't use any special.
>
> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
> --enable-coverage --enable-tap-tests --enable-depend
> make -s -j8 install
> make check-world
>
>
> My environment:
>
> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> $ uname -a
> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
> $ gcc --version
> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>

Hmm nothing special indeed, still can't reproduce, I did one blind try
for a fix. Can you test with attached?

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0002-gapless-seq-2016-03-29-2.patch

Re: Sequence Access Method WIP

From

Fabrízio de Royes Mello

Date:

29 March 2016, 20:08:12

On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>
>>
>> >
>> > Hmm I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>> >
>>
>> In my environment the error remains with your last patches.
>>
>> I didn't use any special.
>>
>> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> --enable-coverage --enable-tap-tests --enable-depend
>> make -s -j8 install
>> make check-world
>>
>>
>> My environment:
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ uname -a
>> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ gcc --version
>> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>
> Hmm nothing special indeed, still can't reproduce, I did one blind try for a fix. Can you test with attached?
>

Same error... Now I'll leave in a trip but when I arrive I'll try to figure out what happening debugging the code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

29 March 2016, 20:58:55

On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>
>
> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>  >
>  > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>  >>
>  >>
>  >>  >
>  >>  > Hmm I am unable to reproduce this. What OS? Any special configure
>  >> flags you use?
>  >>  >
>  >>
>  >> In my environment the error remains with your last patches.
>  >>
>  >> I didn't use any special.
>  >>
>  >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>  >> --enable-coverage --enable-tap-tests --enable-depend
>  >> make -s -j8 install
>  >> make check-world
>  >>
>  >>
>  >> My environment:
>  >>
>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>  >> $ uname -a
>  >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>  >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>  >>
>  >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>  >> $ gcc --version
>  >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>  >> Copyright (C) 2013 Free Software Foundation, Inc.
>  >> This is free software; see the source for copying conditions.  There
> is NO
>  >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
>  >>
>  >
>  > Hmm nothing special indeed, still can't reproduce, I did one blind
> try for a fix. Can you test with attached?
>  >
>
> Same error... Now I'll leave in a trip but when I arrive I'll try to
> figure out what happening debugging the code.
>

Okay, thanks.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Fabrízio de Royes Mello

Date:

30 March 2016, 04:06:40

On Tue, Mar 29, 2016 at 5:58 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr@2ndquadrant.com
>> <mailto:petr@2ndquadrant.com>> wrote:
>> >
>> > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>> >>
>> >>
>> >> >
>> >> > Hmm I am unable to reproduce this. What OS? Any special configure
>> >> flags you use?
>> >> >
>> >>
>> >> In my environment the error remains with your last patches.
>> >>
>> >> I didn't use any special.
>> >>
>> >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> >> --enable-coverage --enable-tap-tests --enable-depend
>> >> make -s -j8 install
>> >> make check-world
>> >>
>> >>
>> >> My environment:
>> >>
>> >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> >> $ uname -a
>> >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >> fabrizio@bagual:/d/postgresql (0002-gapless-seq-petr)
>> >> $ gcc --version
>> >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> >> Copyright (C) 2013 Free Software Foundation, Inc.
>> >> This is free software; see the source for copying conditions. There
>> is NO
>> >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.
>> >>
>> >
>> > Hmm nothing special indeed, still can't reproduce, I did one blind
>> try for a fix. Can you test with attached?
>> >
>>
>> Same error... Now I'll leave in a trip but when I arrive I'll try to
>> figure out what happening debugging the code.
>>
>
> Okay, thanks.
>

Hi,

After some debugging seems that the "seqstate->xid" in "wait_for_sequence" isn't properly initialized or initialized with the wrong value (1258291200).

Re: Sequence Access Method WIP

From

Jose Luis Tallon

Date:

30 March 2016, 13:23:06

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
Needs updating code copyright years ... or is this really from 2013? [ contrib/gapless_seq/gapless_seq.c ]
Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
It does compile cleanly.

DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and VALUES_TABLE_NAME
("seqam_gapless_values")strikes me a bit: while I understand the creation of a private schema named after the
extension,it seems overkill for just a single table.

I would suggest to devote some reserved schema name for internal implementation details and/or AM implementation
details,if deemed reasonable.

On the other hand, creating the schema/table upon extension installation makes the values table use the default
tablespacefor the database, which can be good for concurrency --- direct writes to less loaded storage (Note that
usersmay want to move this table into an SSD-backed tablespace or equivalently fast storage ... moreso when many --not
justone-- gapless sequences are being used)

Yet, there is <20141203102425.GT2456@alap3.anarazel.de> where Andres argues against anything different than
one-page-per-sequenceimplementations. I guess this patch changes everything in this respect.

CODE seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value, bool
restart_requested,bool is_init) -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid
readingas "is_initIALIZED"

DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't apply cleanly to current git master.
Please update/rebase the patch and resubmit.

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

30 March 2016, 15:32:51

Hi,

Thanks for review.

On 30/03/16 15:22, Jose Luis Tallon wrote:
> [Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
> Needs updating code copyright years ... or is this really from 2013? [ contrib/gapless_seq/gapless_seq.c ]
> Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
>

Ouch good point, it does show how long the whole sequence am thing has 
been around.

> DESIGN
> The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and VALUES_TABLE_NAME
("seqam_gapless_values")strikes me a bit: while I understand the creation of a private schema named after the
extension,it seems overkill for just a single table.
 
> I would suggest to devote some reserved schema name for internal implementation details and/or AM implementation
details,if deemed reasonable.
 
> On the other hand, creating the schema/table upon extension installation makes the values table use the default
tablespacefor the database, which can be good for concurrency --- direct writes to less loaded storage
 
>     (Note that users may want to move this table into an SSD-backed tablespace or equivalently fast storage ...
moresowhen many --not just one-- gapless sequences are being used)
 
>

Schema is needed for the handler function as well. In general I don't 
want to add another internal schema that will be empty when no sequence 
AM is installed.

> Yet, there is <20141203102425.GT2456@alap3.anarazel.de> where Andres argues against anything different than
one-page-per-sequenceimplementations.
 
>     I guess this patch changes everything in this respect.

Not really, gapless just needs table for transactionality and as such is 
an exception. It does not change how the nontransactional sequence 
storage works though. I agree with Andres on this one.

> CODE
>    seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
>                                  bool restart_requested, bool is_init)
>    -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid reading as "is_initIALIZED"

Sounds good.

> DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't apply cleanly to current git master.
> Please update/rebase the patch and resubmit.
>

The current version of seqam is 0001-seqam-2016-03-29 which should apply 
correctly.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

31 March 2016, 12:19:39

Hi,

new version attached that should fix the issue. It was alignment -
honestly don't know what I was thinking using fixed alignment when the
AMs can define their own type.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Sequence Access Method WIP

From

Fabrízio de Royes Mello

Date:

04 April 2016, 13:53:30

On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> new version attached that should fix the issue. It was alignment - honestly don't know what I was thinking using fixed alignment when the AMs can define their own type.
>

Yeah... now all works fine... I had a minor issue when apply to the current master but the attached fix it and I also added tab-complete support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.

I ran all the regression tests and all passed.

I have just one question about de 0002 patch:
- There are some reason to not use TransactionId instead of uint32 in GaplessSequenceState struct?

Regards,

Attachment

Re: Sequence Access Method WIP

From

Petr Jelinek

Date:

05 April 2016, 15:52:54

On 04/04/16 15:53, Fabrízio de Royes Mello wrote:
>
>
> On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>  >
>  > Hi,
>  >
>  > new version attached that should fix the issue. It was alignment -
> honestly don't know what I was thinking using fixed alignment when the
> AMs can define their own type.
>  >
>
> Yeah... now all works fine... I had a minor issue when apply to the
> current master but the attached fix it and I also added tab-complete
> support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.
>

Cool thanks.

> I ran all the regression tests and all passed.
>
> I have just one question about de 0002 patch:
> - There are some reason to not use TransactionId instead of uint32 in
> GaplessSequenceState struct?
>

I missed we have that :)

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Sequence Access Method WIP

From

Fabrízio de Royes Mello

Date:

05 April 2016, 17:54:49

On Tue, Apr 5, 2016 at 12:52 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> I have just one question about de 0002 patch:
>> - There are some reason to not use TransactionId instead of uint32 in
>> GaplessSequenceState struct?
>>
>
> I missed we have that :)
>

Attached fix it and also I changed to use TransactionIdEquals instead of compare xids directly.

- if (seqstate->xid != local_xid)
+ if (!TransactionIdEquals(seqstate->xid, local_xid))

IMHO this patch is Ready for Commiter. Changed status in CommitfestApp.

Regards,