Thread: Immutable datastore library?

Immutable datastore library?

From
Guyren Howe
Date:
I would like to use Postgres as an immutable data store. A subject table would have a timestamp column, and that would
be added to what would otherwise be the primary key.

Trap updates and turn them into inserts. Have an associated _deleted table. Trap deletes and turn them into inserts of
the primary key into that table.

Create a view that only shows the non-deleted records with the most recent timestamps.

Stored procedure to do all that to a table. Event trigger to drop and re-create the view on changes to the table
columns.
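
A minimal sketch of that scheme for one table, with all names hypothetical (the stored procedure would generate the equivalent of this per table):

```sql
-- Hypothetical "subject" table: the timestamp joins the natural key.
CREATE TABLE subject (
    id         integer NOT NULL,
    payload    text,
    changed_at timestamptz NOT NULL DEFAULT clock_timestamp(),
    PRIMARY KEY (id, changed_at)
);

CREATE TABLE subject_deleted (
    id         integer NOT NULL,
    deleted_at timestamptz NOT NULL DEFAULT clock_timestamp(),
    PRIMARY KEY (id, deleted_at)
);

-- Trap updates: insert a new row version instead of modifying in place.
CREATE FUNCTION subject_update() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO subject (id, payload) VALUES (NEW.id, NEW.payload);
    RETURN NULL;  -- suppress the original UPDATE
END $$;

-- Trap deletes: record the key in the _deleted table instead.
CREATE FUNCTION subject_delete() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO subject_deleted (id) VALUES (OLD.id);
    RETURN NULL;  -- suppress the original DELETE
END $$;

CREATE TRIGGER subject_no_update BEFORE UPDATE ON subject
    FOR EACH ROW EXECUTE PROCEDURE subject_update();
CREATE TRIGGER subject_no_delete BEFORE DELETE ON subject
    FOR EACH ROW EXECUTE PROCEDURE subject_delete();

-- Current state: latest version of each id that has not been deleted.
CREATE VIEW subject_current AS
SELECT DISTINCT ON (id) *
FROM subject s
WHERE NOT EXISTS (SELECT 1 FROM subject_deleted d WHERE d.id = s.id)
ORDER BY id, changed_at DESC;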

I think that's about it. Seems an obvious and nice general-purpose thing to do. It also seems like something someone
might have already done, but googling "immutable postgres" pulls up things about writing functions.

Does anyone know of a project that has already done this?

Re: Immutable datastore library?

From
Thomas Munro
Date:
On Tue, Oct 18, 2016 at 9:39 AM, Guyren Howe <guyren@gmail.com> wrote:
> I would like to use Postgres as an immutable data store. A subject table would have a timestamp column, and that
> would be added to what would otherwise be the primary key.
>
> Trap updates and turn them into inserts. Have an associated _deleted table. Trap deletes and turn them into inserts
> of the primary key into that table.
>
> Create a view that only shows the non-deleted records with the most recent timestamps.
>
> Stored procedure to do all that to a table. Event trigger to drop and re-create the view on changes to the table
columns.
>
> I think that's about it. Seems an obvious and nice general-purpose thing to do. It also seems like something someone
> might have already done, but googling "immutable postgres" pulls up things about writing functions.
>
> Does anyone know of a project that has already done this?

The term you need is 'temporal' and searching for that you'll find
several projects providing something like that for PostgreSQL.

I've done some projects using temporal (and bitemporal) models based
on Richard Snodgrass's excellent book (which I hear is widely read at
utility companies among others), without any special library support:

http://www.cs.arizona.edu/~rts/tdbbook.pdf

His work influenced the SQL standard which I expect/hope is inspiring
those projects.  SQL:2011 has a temporal feature that does this sort
of thing automatically.
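
For reference, the SQL:2011 system-versioning syntax looks roughly like the following (it is implemented in Db2, MariaDB, and SQL Server's temporal tables; core PostgreSQL does not implement it, though extensions emulate it — column names here are illustrative):

```sql
CREATE TABLE subject (
    id        integer PRIMARY KEY,
    payload   varchar(100),
    sys_start timestamp(6) GENERATED ALWAYS AS ROW START,
    sys_end   timestamp(6) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (sys_start, sys_end)
) WITH SYSTEM VERSIONING;

-- Superseded row versions are retained automatically; query the past with:
SELECT * FROM subject FOR SYSTEM_TIME AS OF TIMESTAMP '2016-10-01 00:00:00';
```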

--
Thomas Munro
http://www.enterprisedb.com


Re: Immutable datastore library?

From
Guyren Howe
Date:
On Oct 17, 2016, at 13:54 , Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> I've done some projects using temporal (and bitemporal) models based
> on Richard Snodgrass's excellent book (which I hear is widely read at
> utility companies among others), without any special library support:
>
> http://www.cs.arizona.edu/~rts/tdbbook.pdf

I would need something robust. Can anyone recommend an implementation they've tested and know to be reliable?

Re: Immutable datastore library?

From
"Mike Sofen"
Date:
From: Guyren Howe   Sent: Monday, October 17, 2016 1:40 PM
I would like to use Postgres as an immutable data store. A subject table
would have a timestamp column, and that would be added to what would
otherwise be the primary key.  Trap updates and turn them into inserts. Have
an associated _deleted table. Trap deletes and turn them into inserts of the
primary key into that table.  Create a view that only shows the non-deleted
records with the most recent timestamps.

Stored procedure to do all that to a table. Event trigger to drop and
re-create the view on changes to the table columns.
----------------------
A couple years ago at another firm, I designed and built a real time ODS
(operational data store) for a large healthcare company, on SQL Server 2012
that supported your exact requirements, since the ODS was the primary data
feed for their data warehouse.

My solution leveraged the T-SQL MERGE statement (roughly an "upsert" in PG) to detect
inserts/updates/deletes.  We don't allow physical row deletes in medical
data, so these were logical deletes, i.e. an update setting an
"InactivatedDatetime" column to non-null.  I used a checksum function
in the Update branch to detect whether the inbound data had changed at all, to
avoid creating dry updates (a new Update row that would otherwise be written
even though nothing in the data changed).
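
In Postgres terms, that dry-update suppression can be sketched with INSERT ... ON CONFLICT (available since 9.5) plus a WHERE clause comparing stored and incoming values — table and column names here are hypothetical:

```sql
INSERT INTO ods.patient AS p (patient_id, name, ward)
VALUES ($1, $2, $3)
ON CONFLICT (patient_id) DO UPDATE
SET name       = EXCLUDED.name,
    ward       = EXCLUDED.ward,
    updated_at = now()
-- Only perform the update when something actually changed:
WHERE (p.name, p.ward) IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.ward);
```

Row-wise IS DISTINCT FROM plays the role of the checksum: it is null-safe, so a column going from NULL to a value still counts as a change.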

Ok that's the internals for the write to the ODS.  I wrapped the entire
Merge statement inside of another insert statement using the equivalent of
the PG "Into" function, which took every column from the ODS write and wrote
the same data to a "History" table of the same name - those writes were
always inserts, creating a persistent, complete picture of every write to
the ODS.  Each row going into the History tables was marked with a "D"
(delete), "I" (insert) or "U" (update).  The History data was used for both
auditing and for nightly batches feeding the data warehouse, where row type
(D, I, or U) drove their Type 2 processing.  As you can imagine, the table
design was crucial to the success of this model.
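
A rough PG analogue of that double write is a data-modifying CTE, where the upsert's RETURNING clause feeds a second insert into the history table within a single statement (names hypothetical; the xmax trick for distinguishing inserts from updates relies on an implementation detail, not a documented API):

```sql
WITH upserted AS (
    INSERT INTO ods.patient AS p (patient_id, name, ward)
    VALUES ($1, $2, $3)
    ON CONFLICT (patient_id) DO UPDATE
        SET name = EXCLUDED.name, ward = EXCLUDED.ward
    RETURNING p.patient_id, p.name, p.ward,
              -- implementation-detail hack: xmax = 0 for fresh inserts
              CASE WHEN xmax = 0 THEN 'I' ELSE 'U' END AS row_type
)
INSERT INTO history.patient (patient_id, name, ward, row_type, recorded_at)
SELECT patient_id, name, ward, row_type, now()
FROM upserted;
```

Because both writes happen in one statement, they commit or roll back together, matching the all-or-nothing behaviour described above.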

This was ultra-efficient on the real time data flowing in from the hospital
and clinical EMRs (different systems): in one transaction/one query, I was
able to double-write the data and ensure that either both writes completed or
both failed with an error.  The "batches" were small (up to 100k rows) and
processed in under 100ms.  But even when there was a network outage and we
had to do a catch-up load with millions of rows, it ran very quickly.  IOWs,
the double-write overhead was very modest, especially with modern disk
performance.

Mike Sofen (Synthetic Genomics)