Re: System column xmin makes anonymity hard - Mailing list pgsql-general

From Francisco Olarte
Subject Re: System column xmin makes anonymity hard
Msg-id CA+bJJbw8OcvO-tY48BFKpejBWEE5gJdjLeRBVbgvaP27Z29Lag@mail.gmail.com
In response to System column xmin makes anonymity hard  (Johannes Linke <johannes.linke@posteo.de>)
List pgsql-general
Johannes.

On Tue, May 12, 2020 at 8:05 PM Johannes Linke <johannes.linke@posteo.de> wrote:
> since 9.4, VACUUM FREEZE just sets a flag bit instead of overwriting xmin with FrozenTransactionId [1]. This makes it harder to build applications with a focus on data reduction.
> We have an app that lets people anonymously vote on stuff exactly once. So we save the vote in one table without any explicit connection to the voting user, and separate from that a flag that this person gave their vote. That has to happen in the same transaction for obvious reasons, but now the xmin of those two data points allows to connect them and to de-anonymize the vote.

> We can of course obfuscate this connection, but our goal is to not keep this data at all to make it impossible to de-anonymize all existing votes even when gaining access to the server. The best idea we had so far is more of a workaround: Do dummy updates to large parts of the vote table on every insert so lots of tuples have the same xmin, and then VACUUMing.[2]

And even without the xmin, someone could dump the ctid values and
correlate them if you are not careful.

Your problem is going to be hard to solve without taking extra steps. I
think a transaction which moves out all the votes for a period ( using
an insert into with the result of a delete returning ) and then inserts
them back ( with something like an insert into of a select order by
random ) may work ( you may even throw a shuffled flag along the way ).
And then throw in a vacuum so the next batch of inserts overwrites the
freed space.
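
To make the delete-and-reinsert idea concrete, here is a minimal sketch
of it. It is only an illustration under stated assumptions: it uses
Python's built-in sqlite3 module as a stand-in so that it runs on its
own, and the table votes, its columns period and choice, and the
function reshuffle_votes are hypothetical names, not anything from the
original application. SQLite has no xmin or ctid, so this shows only the
data movement; on PostgreSQL the same move would be written with delete
returning feeding an insert into ... select ... order by random(),
followed by a vacuum.

```python
import random
import sqlite3


def reshuffle_votes(conn, period):
    """Remove every vote of `period` and put them back in random order.

    Everything happens inside one explicit transaction, so either all
    votes are rewritten or none are. Returns the number of votes moved.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT choice FROM votes WHERE period = ?", (period,)
        ).fetchall()
        conn.execute("DELETE FROM votes WHERE period = ?", (period,))
        random.shuffle(rows)  # break any link to the original insert order
        conn.executemany(
            "INSERT INTO votes (period, choice) VALUES (?, ?)",
            [(period, choice) for (choice,) in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return len(rows)


# Hypothetical demo data; isolation_level=None means we control
# transactions ourselves with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE votes (period TEXT NOT NULL, choice TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO votes (period, choice) VALUES ('2020-05', ?)",
    [("yes",), ("no",), ("yes",)],
)
moved = reshuffle_votes(conn, "2020-05")
```

The tally is unchanged by the move, only the storage order (and, on
PostgreSQL, the inserting transaction) differs afterwards.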

But for someone with the appropriate access to the system, partial
deanonymization is possible unless you take very good measures. Think
of it, here in Spain we use ballot boxes. But voter order is recorded
( they do a double entry check, you get looked up in an alphabetic list,
your name is copied onto a time ordered list, and your position on that
list is recorded in the alphabetic one, all on paper, nice system, easy
to audit, hard to cheat ). If you could freeze time, you could carefully
pick up votes from the box and partially correlate them with the list;
even with boxes much larger than the voting envelopes, they tend to
stack in a nice order. And this is with paper; computers are much
better at incidentally ordering everything, because it is easier to do
it this way.

> Does anyone have a suggestion better than this? Is there any chance this changes anytime soon? Should I post this to -hackers?

Something which may be useful is to use a staging table for newly
inserted votes and move them in batches, shuffling them, to a more
permanent one periodically, and use a view to join them. You can even
do that with some fancy partitioning and an extra field. And move
some users' already-voted flags too, in a different transaction. Doing
some of these things and adding some old votes to the moving sets
should make things difficult to track, but it all depends on how
hard your anonymization requirements are ( I mean, the paper system
I've described leaves my vote perfectly identifiable when I've just
voted, but it is regarded as a non issue in general, and I suspect any
system you can think of leaves the last vote identifiable for a finite
amount of time ). In general, move data around, in single transactions
so you do not lose anything, like shaking a ballot box periodically (
but ensure the lid is properly taped first ).
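
The staging-table variant could be sketched roughly as follows. Again
this is an assumption-laden illustration, not a tested recipe for
PostgreSQL: it runs against Python's built-in sqlite3 so it is
self-contained, and the names votes_staging, votes_permanent, all_votes
and flush_staging are made up for the example. It leaves out the
already-voted flags and the mixing-in of old votes.

```python
import random
import sqlite3

# isolation_level=None: transactions are opened and closed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE votes_staging   (choice TEXT NOT NULL);
    CREATE TABLE votes_permanent (choice TEXT NOT NULL);
    -- Readers only ever query the view, so they see every vote
    -- regardless of which table currently holds it.
    CREATE VIEW all_votes AS
        SELECT choice FROM votes_staging
        UNION ALL
        SELECT choice FROM votes_permanent;
""")


def flush_staging(conn):
    """Move all staged votes to the permanent table in shuffled order.

    The copy and the delete share one transaction, so no vote is lost
    or counted twice. Returns the number of votes moved.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        batch = conn.execute("SELECT choice FROM votes_staging").fetchall()
        random.shuffle(batch)
        conn.executemany(
            "INSERT INTO votes_permanent (choice) VALUES (?)", batch
        )
        conn.execute("DELETE FROM votes_staging")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return len(batch)


TALLY = "SELECT choice, COUNT(*) FROM all_votes GROUP BY choice ORDER BY choice"

conn.executemany(
    "INSERT INTO votes_staging (choice) VALUES (?)",
    [("yes",), ("no",), ("no",)],
)
before = conn.execute(TALLY).fetchall()
flushed = flush_staging(conn)
after = conn.execute(TALLY).fetchall()
```

The tally read through the view is the same before and after the flush;
only where the rows live changes. How often to flush, and how large a
batch must be before it is moved, depends on how strong the anonymity
requirement is.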

Francisco Olarte.


