Re: Enabling Checksums - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Enabling Checksums
Date
Msg-id 1362430809.23497.127.camel@sussancws0025
In response to Re: Enabling Checksums  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Enabling Checksums  ("ktm@rice.edu" <ktm@rice.edu>)
Re: Enabling Checksums  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
> Yeah, fragmentation will certainly hurt some workloads. But how badly, 
> and which workloads, and how does that compare with the work that 
> PostgreSQL has to do to maintain the checksums? I'd like to see some 
> data on those things.

I think we all would. Btrfs will be a major filesystem in a few years,
and we should be ready to support it.

Unfortunately, it's easier said than done. What you're talking about
seems like a significant benchmark report that encompasses a lot of
workloads. And there's a concern that a lot of it will be invalidated if
the btrfs developers are still improving its performance.

> If you're serious enough about your data that you want checksums, you 
> should be able to choose your filesystem.

I simply disagree. I am targeting my feature at casual users. They may
not have a lot of data or a dedicated DBA, but the data they do have
might be very important transactional data.

And right now, if they take a backup of their data, it will contain all
of the corruption from the original. And since corruption is silent
today, they would probably think the backup is fine, and might delete
the previous good backups.
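
To make that concrete, here is a minimal sketch of how a per-page
checksum lets a backup step notice damage instead of copying it
silently. This is an illustration only, not the patch's algorithm or
page layout: zlib.crc32 is a stand-in, and seal, is_intact and backup
are hypothetical names.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size; the layout below is made up


def seal(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC32 of the payload, followed by the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_intact(page: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored value."""
    stored = int.from_bytes(page[:4], "big")
    return stored == zlib.crc32(page[4:])


def backup(pages):
    """Copy the pages, but fail loudly if any page does not verify."""
    bad = [i for i, page in enumerate(pages) if not is_intact(page)]
    if bad:
        raise ValueError(f"corrupt pages: {bad}")
    return list(pages)
```

Without the check in backup, a flipped bit would be copied along with
everything else and nobody would know until the data was needed.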

> An apples-to-apples comparison is to run the benchmark and see what 
> happens. If it gets fragmented as hell on btrfs, and performance tanks 
> because of that, then that's your result. If avoiding fragmentation is 
> critical to the workload, then with btrfs you'll want to run the 
> defragmenter in the background to keep it in order, and factor that into 
> the test case.

Again, easier said than done. To get real fragmentation problems, the
data set needs to be huge, and we need to reach a steady state of this
background defrag process, and a million other things.

> I realize that performance testing is laborious. But we can't skip it 
> and assume that the patch performs fine, because it's hard to benchmark.

You aren't asking me to benchmark the patch in question. You are asking
me to benchmark a filesystem that very few people actually run postgres
on in production. I don't think that's a reasonable requirement.

Regards,
	Jeff Davis



