Home > mailing lists

Reconstructing transaction content after the fact - Mailing list pgsql-general

From	Lionel Bouton
Subject	Reconstructing transaction content after the fact
Date	September 2, 2020 18:45:53
Msg-id	091ea8f4-1530-40b7-ff01-56576903a4b1@jtek.fr Whole thread Raw
Responses	Re: Reconstructing transaction content after the fact (Tom Lane <tgl@sss.pgh.pa.us>) Re: Reconstructing transaction content after the fact (Francisco Olarte <folarte@peoplecall.com>)
List	pgsql-general

Tree view

Hi,

I'm a long time PostgreSQL user (since 7.x), familiar with the basics of
PostgreSQL internals and I'm currently participating in an audit of a
software voting solution which should for obvious reasons avoid to store
any link between the voter register and the actual votes.

The software uses a modern version of PostgreSQL (9.6 or later) and uses
a transaction to store a vote and the corresponding register entry in
two separate tables.

There's already an obvious angle of attack to correlate a vote and a
register entry : the WAL. That said the WAL can be configured to be
short lived to minimize the attack surface. By my understanding a low
checkpoint_timeout value should allow postgresql to recycle/remove old
WAL files early unless the system is overloaded. This can become a bit
messy but if the system is tightly controlled I believe the attack
surface can be reduced.

Another angle I could think of are the transaction ids. If I'm not
mistaken they are stored on disk inside the files storing table and
index data (my quick read of the documentation lets me think the txid is
stored in t_xmin in the HeapTupleHeaderData for a row content).
These values might be cleaned up when the data is visible by all
currently running transaction but I think this isn't needed and so
probably not done (this information is stored in the visibility map). So
reading the raw content from disk you should be able to correlate data
in several tables from a single transaction by comparing the txid.

Are txids in table file data indeed a means by which you can recover the
data written by a single transaction (assuming the txids don't overflow
the 32bit limit during the life of the cluster) ?

Are these t_xmin values ever cleaned up (by VACUUM or another mechanism)
? If positive is there a way to configure the approximate time during
which these values can be recovered ?

Is there a way to access these values by connecting to a PostgreSQL
server instead of analyzing in-memory or on-disk data ?

Best regards,

-- 
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/

pgsql-general by date:

From: Kyotaro Horiguchi
Date: 02 September 2020, 04:50:33
Subject: Re: How bad is using queries with thousands of values for operators IN or ANY?

From: Tom Lane
Date: 02 September 2020, 19:50:18
Subject: Re: Reconstructing transaction content after the fact

Reconstructing transaction content after the fact - Mailing list pgsql-general

Previous

Next