Advices on custom data type and extension development - Mailing list pgsql-hackers

From Luciano Coutinho Barcellos
Subject Advices on custom data type and extension development
Date
Msg-id 569DAF4C.8010302@geocontrol.com.br
Whole thread Raw
Responses Re: Advices on custom data type and extension development  (Kevin Grittner <kgrittn@gmail.com>)
Re: Advices on custom data type and extension development  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Dear friends,
    I'm planning to develop an extension, and I'm here for getting some 
help. But I would like to share the problem I intend to solve. Maybe my 
desired solution is not a good option.
    What I have:
        * a lot of data being generated every day, which are mainly 
queried by an immutable column of type date or timestamp;        * as a standard, almost every table has a bigserial id
column
 
as a primary key;        * data is huge enough to demand table partitioning, which is 
implemented as suggested in Postgres documentation, by using triggers 
and table inheritance. A function called by cron deal with creation of 
new partitions.
    What I would like to develop first is a custom type (let's call it 
datedserial) for replacing bigserial as the primary key:
        * the type would be 8 bytes long, being 4 dedicated to storing 
the Date, and 4 dedicated to storing a serial within that day;        * the text representation of the type would show
itsdate and 
 
its serial number (something like '2015-10-02.0000007296' as a canonical 
form, but which could accept inputs like '20151002.0000007296');        * as a consequence of this internal
representation,the serial 
 
part could not be greater than 4 billion and some;        * support for operator classes allowing the type being used
in
 
GIN and GIST indexes would be optional for now.
    That would allow me to have a compact primary key which I could use 
to partition the table based on the object's date. That would also allow 
me to partition detail tables on the foreign key column having this data 
type. Besides that, just by examining the value, mainly when used as a 
foreign key, I could infer where the record belongs to.
    When I have a working custom data type, I would go to the next and 
harder part. I would like to create a new structure like a sequence, and 
it should behave exactly like sequences, but separated by a date space. 
So I would have functions similar to the following:
        * createsequencegroup(sequence_group_name text): create a new 
named structure for managing the sequence group;        * nextval(sequence_group_name text, context_date date): return

next value of the sequence (as a datedserial) belonging to the sequence 
group and associated with the context date. The value returned have the 
context_date in its date part and the next value for that date in the 
sequence part. The first call for a specific date would return 1 for the 
sequence part. Concerning to concurrency and transactions, the function 
behaves exactly like nextval(sequence_group_name text);        * currval(sequence_group_name text, context_date date):
the
 
currval function counterpart;        * setval(sequence_group_name text, context_date date, int4 
value): the setval function counterpart;        * freeze_before(sequence_group_name text, freeze_date date): 
disallow using the sequence group with context dates before the freeze_date.
    I would consider extending the data type to allow including 
information about the cluster which generated the values. This way, the 
user could set a configuration entry defining a byte value for 
identifying the cluster among others involved in replication, so that 
the sequence group could have different sequences not only for different 
dates, but for different nodes as well.
    As I've said, I would like to package the resulting work as an 
extension.
    For now, I would like some help about where to start. I've 
downloaded the postgres source code and have successfully compiled it 
using my Ubuntu desktop, although have not tested the resulting binary. 
Should I create a folder in the contrib directory and use another 
extension as a starting point? Is this the recommended path? Or is this 
too much and I should create a separate project?
    Thanks in advance.
    Best regards,    Luciano Barcellos




pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: pgindent-polluted commits
Next
From: Amit Kapila
Date:
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive