Reconstructing transaction content after the fact - Mailing list pgsql-general

From Lionel Bouton
Subject Reconstructing transaction content after the fact
Date
Msg-id 091ea8f4-1530-40b7-ff01-56576903a4b1@jtek.fr
Whole thread Raw
Responses Re: Reconstructing transaction content after the fact  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Reconstructing transaction content after the fact  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general
Hi,

I'm a long time PostgreSQL user (since 7.x), familiar with the basics of
PostgreSQL internals and I'm currently participating in an audit of a
software voting solution which should for obvious reasons avoid to store
any link between the voter register and the actual votes.

The software uses a modern version of PostgreSQL (9.6 or later) and uses
a transaction to store a vote and the corresponding register entry in
two separate tables.

There's already an obvious angle of attack to correlate a vote and a
register entry : the WAL. That said the WAL can be configured to be
short lived to minimize the attack surface. By my understanding a low
checkpoint_timeout value should allow postgresql to recycle/remove old
WAL files early unless the system is overloaded. This can become a bit
messy but if the system is tightly controlled I believe the attack
surface can be reduced.

Another angle I could think of are the transaction ids. If I'm not
mistaken they are stored on disk inside the files storing table and
index data (my quick read of the documentation lets me think the txid is
stored in t_xmin in the HeapTupleHeaderData for a row content).
These values might be cleaned up when the data is visible by all
currently running transaction but I think this isn't needed and so
probably not done (this information is stored in the visibility map). So
reading the raw content from disk you should be able to correlate data
in several tables from a single transaction by comparing the txid.

Are txids in table file data indeed a means by which you can recover the
data written by a single transaction (assuming the txids don't overflow
the 32bit limit during the life of the cluster) ?

Are these t_xmin values ever cleaned up (by VACUUM or another mechanism)
? If positive is there a way to configure the approximate time during
which these values can be recovered ?

Is there a way to access these values by connecting to a PostgreSQL
server instead of analyzing in-memory or on-disk data ?

Best regards,

-- 
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/




pgsql-general by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: How bad is using queries with thousands of values for operators IN or ANY?
Next
From: Tom Lane
Date:
Subject: Re: Reconstructing transaction content after the fact