Re: PATCH: Exclude unlogged tables from base backups - Mailing list pgsql-hackers

From David Steele
Subject Re: PATCH: Exclude unlogged tables from base backups
Msg-id 3a0be571-dc15-7384-849e-ad8f69412986@pgmasters.net
In response to Re: PATCH: Exclude unlogged tables from base backups  (Adam Brightwell <adam.brightwell@crunchydata.com>)
List pgsql-hackers
On 1/24/18 4:02 PM, Adam Brightwell wrote:
>>> If a new unlogged relation is created after the unloggedHash has been
>>> constructed but before the files are sent, we cannot exclude that
>>> relation. This would not be a problem if the backup is short, because
>>> the new unlogged relation is unlikely to grow very large. However, if
>>> the backup runs for a long time, we could include a large main fork in
>>> the backup.
>>
>> This is a good point.  It's per database directory, which makes it a
>> little better, but maybe not by much.
>>
>> Three options here:
>>
>> 1) Leave it as is knowing that unlogged relations created during the
>> backup may be copied and document it that way.
>>
>> 2) Construct a list for SendDir() to work against so the gap between
>> creating that and creating the unlogged hash is as small as possible.
>> The downside here is that the list may be very large and take up a lot
>> of memory.
>>
>> 3) Check each file that looks like a relation in the loop to see if it
>> has an init fork.  This might affect performance since an
>> opendir/readdir loop would be required for every relation.
>>
>> Personally, I'm in favor of #1, at least for the time being.  I've
>> updated the docs as indicated in case you and Adam agree.
> 
> I agree with #1 and feel the updated docs are reasonable and
> sufficient to address this case for now.
> 
> I have retested these patches against master at d6ab720360.
> 
> All tests succeed.
> 
> Marking "Ready for Committer".

Thanks, Adam!

Actually, I was talking to Stephen about this, and it seems like #3 would
be more practical if we just stat'd the init fork for each relation file
found.  I doubt the stat would add much overhead, and we could track each
unlogged relation in a hash table to reduce the overhead even more.
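A minimal sketch of that per-file check, assuming the usual on-disk naming (relfilenode digits, an optional `_fsm`/`_vm`/`_init` fork suffix, an optional `.N` segment number). Every name below (`exclude_from_backup`, `UnloggedSet`, `parse_rel_file`, etc.) is hypothetical and not taken from the patch. A small fixed-size open-addressing set stands in for the hash table; it only caches relfilenodes already found to have an init fork, so later segments and the FSM/VM forks of the same relation skip the stat(). It is meant to be reset for each database directory, since a relfilenode only identifies a relation within one directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

typedef enum { FORK_MAIN, FORK_FSM, FORK_VM, FORK_INIT } ForkKind;

/* Hypothetical cache of relfilenodes known to be unlogged (open addressing). */
#define UNLOGGED_SET_SIZE 1024          /* must be a power of two */
typedef struct {
    uint32_t keys[UNLOGGED_SET_SIZE];
    bool used[UNLOGGED_SET_SIZE];
} UnloggedSet;

/* Clear the cache; call once per database directory. */
static void set_reset(UnloggedSet *s) { memset(s, 0, sizeof(*s)); }

static size_t set_slot(uint32_t key)
{
    return (size_t) (key * 2654435761u) & (UNLOGGED_SET_SIZE - 1);
}

static bool set_contains(const UnloggedSet *s, uint32_t key)
{
    size_t i = set_slot(key);
    for (size_t probes = 0; probes < UNLOGGED_SET_SIZE; probes++) {
        if (!s->used[i])
            return false;
        if (s->keys[i] == key)
            return true;
        i = (i + 1) & (UNLOGGED_SET_SIZE - 1);
    }
    return false;
}

static void set_add(UnloggedSet *s, uint32_t key)
{
    size_t i = set_slot(key);
    for (size_t probes = 0; probes < UNLOGGED_SET_SIZE; probes++) {
        if (!s->used[i]) {
            s->used[i] = true;
            s->keys[i] = key;
            return;
        }
        if (s->keys[i] == key)
            return;
        i = (i + 1) & (UNLOGGED_SET_SIZE - 1);
    }
    /* Set is full: just don't cache; correctness never depends on the cache. */
}

/* Parse "<relfilenode>[_fsm|_vm|_init][.<segno>]"; false if not a relation file. */
static bool parse_rel_file(const char *name, uint32_t *relnode, ForkKind *fork)
{
    const char *p = name;
    unsigned long long n = 0;

    if (*p < '0' || *p > '9')
        return false;
    for (; *p >= '0' && *p <= '9'; p++) {
        n = n * 10 + (unsigned long long) (*p - '0');
        if (n > UINT32_MAX)
            return false;
    }
    *fork = FORK_MAIN;
    if (strncmp(p, "_fsm", 4) == 0) { *fork = FORK_FSM; p += 4; }
    else if (strncmp(p, "_vm", 3) == 0) { *fork = FORK_VM; p += 3; }
    else if (strncmp(p, "_init", 5) == 0) { *fork = FORK_INIT; p += 5; }
    if (*p == '.') {                    /* optional segment number */
        if (p[1] < '0' || p[1] > '9')
            return false;
        for (p++; *p >= '0' && *p <= '9'; p++)
            ;
    }
    if (*p != '\0')
        return false;
    *relnode = (uint32_t) n;
    return true;
}

/* One stat() per check: does "<dir>/<relnode>_init" exist as a regular file? */
static bool init_fork_exists(const char *dir, uint32_t relnode)
{
    char path[4096];
    struct stat st;
    int len = snprintf(path, sizeof(path), "%s/%u_init", dir, (unsigned) relnode);

    if (len < 0 || (size_t) len >= sizeof(path))
        return false;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* True if file `name` in `dir` should be left out of the base backup. */
static bool exclude_from_backup(UnloggedSet *cache, const char *dir, const char *name)
{
    uint32_t relnode;
    ForkKind fork;

    assert(cache != NULL && dir != NULL && name != NULL);
    if (!parse_rel_file(name, &relnode, &fork))
        return false;                   /* not a relation file: always copy */
    if (fork == FORK_INIT)
        return false;                   /* the init fork itself is always copied */
    if (set_contains(cache, relnode))
        return true;                    /* already known to be unlogged */
    if (init_fork_exists(dir, relnode)) {
        set_add(cache, relnode);
        return true;
    }
    return false;
}
```

In this sketch the check runs when each file is reached rather than against a hash built once up front, so a relation created after the directory scan started is still excluded as long as its init fork exists by the time its other files come up; a main fork examined before its init fork appears would still be copied.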

I'll look at that tomorrow and see if I can work out something practical.

-- 
-David
david@pgmasters.net
