Thread: Immutable datastore library?
I would like to use Postgres as an immutable data store. A subject table would have a timestamp column, and that would be added to what would otherwise be the primary key.

Trap updates and turn them into inserts. Have an associated _deleted table. Trap deletes and turn them into inserts of the primary key into that table.

Create a view that only shows the non-deleted records with the most recent timestamps.

Stored procedure to do all that to a table. Event trigger to drop and re-create the view on changes to the table columns.

I think that's about it. Seems an obvious and nice general-purpose thing to do. It also seems like something someone might have already done, but googling "immutable postgres" pulls up things about writing functions.

Does anyone know of a project that has already done this?
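For concreteness, here is roughly what I have in mind for a single table (the generic stored procedure and the event trigger are omitted, and all names are made up):

-- Hypothetical subject table; the timestamp joins the would-be primary key.
CREATE TABLE subject (
    id          integer     NOT NULL,
    payload     text,
    recorded_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (id, recorded_at)
);

CREATE TABLE subject_deleted (
    id          integer     NOT NULL,
    deleted_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (id, deleted_at)
);

-- Trap updates and turn them into inserts of a new version.
-- (The generic stored procedure would generate the column list per table.)
CREATE FUNCTION subject_update_to_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO subject (id, payload) VALUES (NEW.id, NEW.payload);
    RETURN NULL;  -- suppress the original UPDATE
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER subject_no_update
    BEFORE UPDATE ON subject
    FOR EACH ROW EXECUTE PROCEDURE subject_update_to_insert();

-- Trap deletes and record the primary key in the _deleted table instead.
CREATE FUNCTION subject_delete_to_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO subject_deleted (id) VALUES (OLD.id);
    RETURN NULL;  -- suppress the original DELETE
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER subject_no_delete
    BEFORE DELETE ON subject
    FOR EACH ROW EXECUTE PROCEDURE subject_delete_to_insert();

-- Only the most recent, non-deleted version of each id.
CREATE VIEW subject_current AS
SELECT DISTINCT ON (s.id) s.*
FROM subject s
WHERE NOT EXISTS (SELECT 1 FROM subject_deleted d WHERE d.id = s.id)
ORDER BY s.id, s.recorded_at DESC;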
On Tue, Oct 18, 2016 at 9:39 AM, Guyren Howe <guyren@gmail.com> wrote:
> I would like to use Postgres as an immutable data store. A subject table would have a timestamp column, and that would be added to what would otherwise be the primary key.
>
> Trap updates and turn them into inserts. Have an associated _deleted table. Trap deletes and turn them into inserts of the primary key into that table.
>
> Create a view that only shows the non-deleted records with the most recent timestamps.
>
> Stored procedure to do all that to a table. Event trigger to drop and re-create the view on changes to the table columns.
>
> I think that's about it. Seems an obvious and nice general-purpose thing to do. It also seems like something someone might have already done, but googling "immutable postgres" pulls up things about writing functions.
>
> Does anyone know of a project that has already done this?

The term you need is 'temporal', and searching for that you'll find several projects providing something like that for PostgreSQL.

I've done some projects using temporal (and bitemporal) models based on Richard Snodgrass's excellent book (which I hear is widely read at utility companies, among others), without any special library support:

http://www.cs.arizona.edu/~rts/tdbbook.pdf

His work influenced the SQL standard, which I expect/hope is inspiring those projects. SQL:2011 has a temporal feature that does this sort of thing automatically.

--
Thomas Munro
http://www.enterprisedb.com
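To make "temporal" concrete: the heart of the valid-time model in Snodgrass's book is a pair of period columns on every row, which get closed off rather than updated in place. A bare-bones sketch with made-up names (the projects and the SQL:2011 feature automate and extend this considerably):

-- Each row carries the period during which it was (or is) the truth.
CREATE TABLE account_history (
    account_id integer     NOT NULL,
    balance    numeric     NOT NULL,
    valid_from timestamptz NOT NULL,
    valid_to   timestamptz NOT NULL DEFAULT 'infinity',
    PRIMARY KEY (account_id, valid_from)
);

-- "Current" rows are the ones whose period is still open.
CREATE VIEW account AS
SELECT account_id, balance
FROM account_history
WHERE valid_to = 'infinity';

-- A change closes the old period and opens a new one; nothing is overwritten.
BEGIN;
UPDATE account_history
SET valid_to = now()
WHERE account_id = 42 AND valid_to = 'infinity';

INSERT INTO account_history (account_id, balance, valid_from)
VALUES (42, 100.00, now());
COMMIT;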
On Oct 17, 2016, at 13:54, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> I've done some projects using temporal (and bitemporal) models based
> on Richard Snodgrass's excellent book (which I hear is widely read at
> utility companies among others), without any special library support:
>
> http://www.cs.arizona.edu/~rts/tdbbook.pdf

I would need something robust. Can anyone recommend an implementation they've tested and know to be reliable?
From: Guyren Howe
Sent: Monday, October 17, 2016 1:40 PM

I would like to use Postgres as an immutable data store. A subject table would have a timestamp column, and that would be added to what would otherwise be the primary key.

Trap updates and turn them into inserts. Have an associated _deleted table. Trap deletes and turn them into inserts of the primary key into that table.

Create a view that only shows the non-deleted records with the most recent timestamps.

Stored procedure to do all that to a table. Event trigger to drop and re-create the view on changes to the table columns.

----------------------

A couple of years ago at another firm, I designed and built a real-time ODS (operational data store) for a large healthcare company, on SQL Server 2012, that supported your exact requirements, since the ODS was the primary data feed for their data warehouse.

My solution leveraged the MERGE T-SQL statement ("upsert" in PG) to detect inserts/updates/deletes. We don't allow physical row deletes in medical data, so these were logical deletes, aka an update to an "InactivatedDatetime" column making it not null. I used a checksum function in the update branch to detect whether the inbound data had changed at all, to avoid creating dry updates (otherwise a new update row would be written even with no change in the data).

OK, that's the internals for the write to the ODS. I wrapped the entire MERGE statement inside another insert statement using the equivalent of PG's "INTO" syntax, which took every column from the ODS write and wrote the same data to a "History" table of the same name - those writes were always inserts, creating a persistent, complete picture of every write to the ODS. Each row going into the History tables was marked with a "D" (delete), "I" (insert) or "U" (update). The History data was used for both auditing and for nightly batches feeding the data warehouse, where row type (D, I, or U) drove their Type 2 processing. As you can imagine, the table design was crucial to the success of this model.

This was ultra-efficient on the real-time data flowing in from the hospital and clinical EMRs (different systems) - in one transaction/one query, I was able to double-write the data and ensure both writes completed, or fail both and error. The "batches" were small - 100k rows or less - and processed in under 100ms. But even when there was a network outage and we had to do a catch-up load with millions of rows, it ran very quickly. IOWs, the double-write overhead was very modest, especially with modern disk performance.

Mike Sofen (Synthetic Genomics)
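Mike's implementation is SQL Server-specific, but the same single-statement double write can be sketched in Postgres with a writable CTE. This is only an illustrative analogue with invented table and column names, not his actual design:

-- Upsert into the ODS table and record every row actually written in a
-- history table, all in one statement.
CREATE TABLE patient_ods (
    patient_id integer PRIMARY KEY,
    payload    jsonb,
    checksum   text
);

CREATE TABLE patient_history (
    patient_id integer,
    payload    jsonb,
    checksum   text,
    row_type   char(1),      -- 'I' insert, 'U' update ('D' not shown: deletes arrive as logical updates)
    written_at timestamptz
);

CREATE TABLE staging (LIKE patient_ods);

WITH ods_write AS (
    INSERT INTO patient_ods AS t (patient_id, payload, checksum)
    SELECT s.patient_id, s.payload, s.checksum
    FROM staging s
    ON CONFLICT (patient_id) DO UPDATE
        SET payload  = EXCLUDED.payload,
            checksum = EXCLUDED.checksum
        WHERE t.checksum IS DISTINCT FROM EXCLUDED.checksum   -- skip "dry" updates
    RETURNING patient_id, payload, checksum,
              (xmax = 0) AS was_insert   -- implementation-detail trick: xmax is 0 only on fresh inserts
)
INSERT INTO patient_history (patient_id, payload, checksum, row_type, written_at)
SELECT patient_id, payload, checksum,
       CASE WHEN was_insert THEN 'I' ELSE 'U' END,
       now()
FROM ods_write;

Rows whose checksum is unchanged are filtered out by the DO UPDATE ... WHERE clause, so they never reach the history table, mirroring the dry-update suppression Mike describes; and because both writes happen in one statement, they commit or fail together.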