Re: Btrfs clone WIP patch - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Btrfs clone WIP patch
Date
Msg-id 513B71A7.6080506@2ndQuadrant.com
Whole thread Raw
In response to Re: Btrfs clone WIP patch  (Jonathan Rogers <jrogers@socialserve.com>)
Responses Re: Btrfs clone WIP patch
List pgsql-hackers
On 3/1/13 1:40 AM, Jonathan Rogers wrote:
> I've been thinking about both of these issues and decided to try a
> different approach. This patch adds GUC options for two external
> commands

This is a reasonable approach for a proof of concept patch.  I like the 
idea you're playing with here, as a useful boost for both production 
deployments and development synchronization jobs.  There are some major 
coding hurdles for this idea to go beyond proof of concept into 
committed feature though, so I'm just going to dump all the obvious ones 
I can see on you too.  Feel free to ignore those and focus on the fun 
usability / capability parts for a while first.  Just want you to 
realize what work is in your way, but not visible to you necessarily.

With so many GUCs already, adding more of them to support something with 
a fairly narrow user base will be hard to justify committing.  This sort 
of thing seems like you could load it as an extension instead.  I don't 
think this is worth doing yet.  For now the GUC approach is fine, and it 
keeps your code sample small.

I'm going to start with how to continue validating this idea is useful 
without worrying about those, to get some more testing/benchmarking 
samples, and for all of that job what you're doing so far is fine.  The 
road toward something to commit is going to take a lot more complicated 
code eventually though.  There's no big conceptual leap involved, it's 
just where the changes need to go and how much they need to touch to do 
this accurately in all cases isn't as simple as your demo.  Handling 
unusual situations and race conditions turns out to be almost all the 
code in what you're replacing, and you'll need similar work in the 
replacements.
> I have just been experimenting with ZFS and it does not seem to have any> capability or interface for cloning
ordinaryfiles or directories so the> configuration is not as straightforward. However, I was able to set up a> Postgres
clusteras a hierarchy of ZFS file systems in the same pool> with each directory under "base" being a separate file
systemand> configure Postgres to call shell scripts which call zfs snapshot and> clone commands to do the cloning and
deleting.

To make things easier for a reviewer, it's useful to include working 
examples like this as part of your submission.  For this one that would 
include:

-A sample script that created a test pool
-Working postgresql.conf changes compatible with ZFS
-Test program showing how to exercise the feature

I think I can see how to construct such an example for the btrfs 
version, but having you show that explicitly (preferably with a whole 
sample session executing it) will also help reviewers.  Remember:  if 
you want to get your submission off to a good start, the reviewer should 
be able to run your sample test, see the code work, and do something fun 
within a few seconds of compiling it.  Make that easy for them, and your 
reviewer will start with a good impression of you and a positive outlook 
for the change.

Now onto the code nitpicking!

= Extension vs. GUC =

In addition to not polluting the postgresql.conf.sample, there's another 
reason this might make for better extension material eventually.  Adding 
new execv calls that are building strings like this is just generally a 
risky thing.  It would be nice from a security perspective if that 
entire mechanism wasn't even introduced into the server at all unless 
someone loaded the extension.

An extension implementation will end up being more code, both to add a 
useful hook for replace these calls and for the extension packaging 
itself.  Having a bit more code in contrib/ but less in src & 
postgresql.conf is probably a net positive trade though.

= Diff reduction / code refactoring =

Looks like you added a "File Operation Options" entry into guc.c but 
then not use it here?  I would just keep this in an existing category 
for now, try to reduce the diff length of the proof of concept version 
as much as possible in the beginning.

On the topic of smaller diffs, the similar cut and paste sections of the 
two entries that both do fork/exec/waitpid should really be refactored 
into one function.  The archiver does something similar for running 
archive_command, there may be some reuse or refactoring of its code to 
present this interface.

Again, this sort of refactoring is not necessary as a POC patch.  But it 
will probably come up if this moves toward commit candidate.

= Recursion and directory navigation =

> In either case, the directories are copied recursively while the
> Postgres internal copydir function does not recurse. I don't think that
> should be a problem since there shouldn't be nested directories in the
> first place.

copydir takes an option for whether it should recurse or not.

The rm side of makes me twitch for a number of reasons.  First off, 
there's just the general scariness of the concept of shelling out to run 
rm recursively with some text string you build.  The worst time I saw a 
bug in that sort of code destroyed a terabyte, and the last time I saw 
such a bug was only a week ago.  Validation you're doing the right thing 
is always job #1 in removing files.

Specifically here, take a look at src/port/dirmod.c for a) its comments 
on race conditions and b) how it does error handling.  The external rm 
will need to duplicate all of that.  I don't see how you could possibly 
replace rmtree with something simplier that has the same necessary 
properties.  I think what you're actually going to need is a hook to 
replace both the unlink and rmdir calls with your alternate 
implementation idea.

The same sort of issue is in your external_copydir.  Iterating into 
subdirectories when it doesn't happen now just isn't safe, even though 
the one expected case you're hooking won't be any different.  You really 
can't just do that.  Would this work instead, and is there any concern 
about files that start with a "."?

"cp * --reflink=auto"

Regardless, you need to keep most of the structure to copydir anyway. 
Error handling, handling cancellation, and fsync calls are all vital 
things.  You probably have to make the forked command copy a single file 
at a time to get the same interrupt handling behavior.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Enabling Checksums
Next
From: Thom Brown
Date:
Subject: Re: [pgsql-advocacy] Call for Google Summer of Code mentors, admins