Re: [PATCH] Incremental backup: add backup profile to base backup - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: [PATCH] Incremental backup: add backup profile to base backup
Date
Msg-id CAO_YK0UcnV8oUNg7zKnEFf21K0F0+R58vfLsNg1E+A9K1YdO4w@mail.gmail.com
In response to Re: [PATCH] Incremental backup: add backup profile to base backup  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
<div dir="ltr"><br /><div class="gmail_extra"><div class="gmail_quote">On Mon, Aug 18, 2014 at 10:05 AM, Heikki
Linnakangas <span dir="ltr">&lt;<a href="mailto:hlinnakangas@vmware.com"
target="_blank">hlinnakangas@vmware.com</a>&gt;</span> wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">On 08/18/2014 08:05 AM, Alvaro Herrera
wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Marco Nenciarini wrote:<br /><br /><blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> To calculate the md5 checksum I've used the md5
code present in pgcrypto<br /> contrib as the code in src/include/libpq/md5.h is not suitable for large<br /> files.
Since a core feature cannot depend on a piece of contrib, I've<br /> moved the files<br /><br />
contrib/pgcrypto/md5.c<br /> contrib/pgcrypto/md5.h<br /><br /> to<br /><br /> src/backend/utils/hash/md5.c<br />
src/include/utils/md5.h<br /><br /> changing the pgcrypto extension to use them.<br /></blockquote><br /> We already
have the FNV checksum implementation in the backend -- can't<br /> we use that one for this and avoid messing with
MD5?<br /><br /> (I don't think we're looking for a cryptographic hash here. Am I wrong?)<br /></blockquote><br
/></div>Hmm. Any user that can update a table can craft an update such that the table file's checksum still matches the one recorded in an older backup. That
may seem like an onerous task; to correctly calculate the checksum of a file in a previous backup, you need to know the LSNs
and the exact data, including deleted data, on every block in the table, and then construct a suitable INSERT or UPDATE
that modifies the table such that you get a collision. But for some tables it could be trivial; you might know that a
table was bulk-loaded with a particular LSN and there are no dead tuples. Or you can simply create your own table and
insert exactly the data you want. Messing with your own table might seem harmless, but it would, e.g., let you construct a
case where an index points to a tuple that doesn't exist anymore, or there's a row that doesn't pass a CHECK constraint
that was added later. Even if there's no direct security issue with that, you don't want that kind of uncertainty from a
backup solution.<br /><br /> But more to the point, I thought the consensus was to use the highest LSN of all the blocks
in the file, no? That's essentially free to calculate (if you have to read all the data anyway), and isn't vulnerable to
collisions.<span class=""><font color="#888888"><br /><br /> - Heikki</font></span><div class=""><div class="h5"><br
/><br/><br /><br /> -- <br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org"
target="_blank">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br /></div></div></blockquote></div><br />We
also have both crc32 and crc64 implementations in pg_crc. If the goal is just verifying file integrity (we can't really
protect against intentional modification), CRC sounds more appropriate to me.<br /></div></div>
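Heikki's alternative, using the highest LSN of all the blocks in a file, can be sketched in C roughly as below. This is a minimal illustration under stated assumptions, not code from the patch: it assumes the default 8 kB block size and that each page begins with the page LSN stored as two native-endian uint32 halves (the xlogid/xrecoff layout of PageXLogRecPtr). SKETCH_BLCKSZ, page_lsn() and file_max_lsn() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default PostgreSQL block size (BLCKSZ = 8192). */
#define SKETCH_BLCKSZ 8192

/* Read the LSN at the start of a page header. Assumes the on-disk
 * PageXLogRecPtr layout: uint32 xlogid (high half) followed by
 * uint32 xrecoff (low half), both in native byte order. */
static uint64_t page_lsn(const unsigned char *page)
{
    uint32_t xlogid, xrecoff;

    memcpy(&xlogid, page, sizeof xlogid);
    memcpy(&xrecoff, page + sizeof xlogid, sizeof xrecoff);
    return ((uint64_t) xlogid << 32) | xrecoff;
}

/* Hypothetical helper: scan a relation segment file block by block and
 * store the highest page LSN in *max_lsn (0 for an empty file).
 * Returns 0 on success, -1 on a read error or a partial trailing block. */
static int file_max_lsn(FILE *fp, uint64_t *max_lsn)
{
    unsigned char page[SKETCH_BLCKSZ];
    size_t n;

    *max_lsn = 0;
    while ((n = fread(page, 1, sizeof page, fp)) == sizeof page) {
        uint64_t lsn = page_lsn(page);

        if (lsn > *max_lsn)
            *max_lsn = lsn;
    }
    if (ferror(fp) || n != 0)
        return -1;
    return 0;
}
```

An incremental backup could then, for example, compare a file's highest LSN with the start LSN of the reference backup and re-copy the file only when it is newer. There is no checksum to collide with: a page changed through WAL-logged modifications after that point carries a larger LSN. Whether every relevant change advances the page LSN (hint-bit updates, for instance, may not) is a separate question this sketch does not address.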

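For comparison, the CRC approach suggested above could look like the following sketch of the standard reflected CRC-32 (polynomial 0xEDB88320, the variant used by zlib and Ethernet). It is written standalone and does not reproduce pg_crc's actual macros; crc32_begin(), crc32_update(), crc32_end() and crc32_file() are hypothetical names. The begin/update/end split lets a file be checksummed in fixed-size chunks, which is presumably the property Marco had in mind when he said the libpq MD5 code is not suitable for large files.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t crc32_table[256];
static int crc32_table_ready = 0;

/* Build the lookup table for the reflected CRC-32 polynomial. */
static void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;

        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc32_table[i] = c;
    }
}

/* Start a new checksum; lazily builds the table (not thread-safe). */
static uint32_t crc32_begin(void)
{
    if (!crc32_table_ready) {
        crc32_init_table();
        crc32_table_ready = 1;
    }
    return 0xFFFFFFFFu;
}

/* Feed one chunk of data into a running checksum. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = crc32_table[(crc ^ buf[i]) & 0xFF] ^ (crc >> 8);
    return crc;
}

/* Finish the checksum (final XOR). */
static uint32_t crc32_end(uint32_t crc)
{
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: checksum a whole file in 8 kB chunks, so memory
 * use does not grow with file size. Returns 0 on success, -1 on error. */
static int crc32_file(FILE *fp, uint32_t *out)
{
    unsigned char buf[8192];
    size_t n;
    uint32_t crc = crc32_begin();

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    if (ferror(fp))
        return -1;
    *out = crc32_end(crc);
    return 0;
}
```

As Arthur notes, this only guards against accidental corruption: CRC-32 is linear over GF(2), so anyone who controls the data can deliberately produce a matching checksum, which is exactly the collision scenario Heikki describes.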