Re: pg_checksums (or checksums in general) vs tableam - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: pg_checksums (or checksums in general) vs tableam
Date
Msg-id CABUevEyvObKHt7JBPL6qT=xN+ycg=S2U85Ug4xUuB+9rjHXV4A@mail.gmail.com
In response to Re: pg_checksums (or checksums in general) vs tableam  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers


On Thu, Jul 11, 2019 at 2:30 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jul 10, 2019 at 09:19:03AM -0700, Andres Freund wrote:
>> On July 10, 2019 9:12:18 AM PDT, Magnus Hagander <magnus@hagander.net> wrote:
>>> That would be fine, if we actually knew. Should we (or have we already?)
>>> defined a rule that they are not allowed to use the same naming standard
>>> unless they have the same type of header?
>>
>> No, don't think we have already.  There's the related problem of
>> what to include in base backups, too.
>
> Yes.  This one needs a careful design and I am not sure exactly what
> that would be.  At least one new callback would be needed, called from
> basebackup.c to decide if a given file should be backed up or not
> based on a path.

That wouldn't be nearly enough, of course. We have to think of everybody who uses the pg_start_backup/pg_stop_backup functions (including the deprecated versions we don't want to get rid of :P). So whatever it is, it has to be externally reachable. And just calling something before you start your backup won't be enough, as files can show up during the backup, etc.

Having a strict naming standard would help a lot with that; then you'd just need the metadata. For example, one could say that each non-default storage engine has to put all its files in a subdirectory, and inside that subdirectory it can name them whatever it wants. If we do that, then all a backup tool would need to know about is the set of possible subdirectories in the current installation (and *that* doesn't change frequently).

 
> But then how do you make sure that a path applies to
> one table AM or another, by using a regex given by all table AMs to
> see if there is a match?  How do we handle conflicts?  I am not sure
> either that it is a good design to restrict table AMs to have a given
> format for paths as that actually limits the possibilities when it
> comes to splitting data across multiple files for attributes and/or
> tablespaces.  (I am a pessimistic guy by nature.)

As long as the restriction contains enough wildcards, it should hopefully be enough :) E.g. data/base/1234/zheap/whatever.they.like. 
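
To make the subdirectory idea a bit more concrete, here is a rough Python sketch of how an external backup tool might map a file path to a storage engine. Everything in it is an assumption for illustration: the KNOWN_AM_SUBDIRS set, the classify_relation_path name, and the rule that files sitting directly in the database directory belong to the default engine. Nothing like this exists in PostgreSQL today, and it only looks at base/<dboid>/, not at tablespaces.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical: the subdirectory names claimed by non-default table AMs
# in this installation. A real tool would have to obtain this list somehow.
KNOWN_AM_SUBDIRS = frozenset({"zheap"})


def classify_relation_path(path: str,
                           am_subdirs: frozenset = KNOWN_AM_SUBDIRS) -> Optional[str]:
    """Return which storage engine a file under base/<dboid>/ belongs to.

    Returns "default" for files directly in the database directory,
    the subdirectory name for files under a known AM subdirectory,
    and None for anything else (outside base/, or unknown subdirectory).
    """
    parts = PurePosixPath(path).parts
    try:
        base_idx = parts.index("base")
    except ValueError:
        return None  # not under base/ at all

    rest = parts[base_idx + 1:]
    # Expect at least <dboid>/<something>, with a numeric database OID.
    if len(rest) < 2 or not rest[0].isdigit():
        return None

    if len(rest) == 2:
        # e.g. base/1234/16384: a file directly in the database directory
        return "default"

    # e.g. base/1234/zheap/whatever.they.like: anything below the AM's
    # subdirectory is owned by that AM, whatever it is named.
    if rest[1] in am_subdirs:
        return rest[1]
    return None


if __name__ == "__main__":
    for p in ("data/base/1234/zheap/whatever.they.like",
              "data/base/1234/16384",
              "data/base/1234/unknown/file"):
        print(p, "->", classify_relation_path(p))
```

The point is only that the tool needs the list of subdirectories and nothing else; it never has to understand the file names inside them, so files that appear during the backup are still classified correctly.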
