Thread: Scanning pg_tablespace from walsender

Scanning pg_tablespace from walsender

From
Magnus Hagander
Date:
I'm working on completing Heikki's patch for streaming base backups,
and have run into a problem:

In order to dump all tablespaces properly, I have to know where they
are (d'uh). In order to do that, I need to scan pg_tablespace.
However, scanning that from walsender gives me:

FATAL:  cannot read pg_class without having selected a database


Which means I somehow have to get pg_tablespace into the cache without
reading pg_class, I guess. Similar to how we do for pg_database for
example.

Can someone throw me a pointer or two on how to actually do that? :-)
Am I correct in assuming I need to add it to
RelationCacheInitializePhase2(), and to do that, need to figure out
how to define a TableSpaceRelation_Rowtype_Id in the headers? Or is
there an easier way I'm missing?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Scanning pg_tablespace from walsender

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> I'm working on completing Heikki's patch for streaming base backups,
> and have run into a problem:

> In order to dump all tablespaces properly, I have to know where they
> are (d'uh). In order to do that, I need to scan pg_tablespace.

Wait a minute.  Isn't this problem about to metastasize into "I need to
read *every* global catalog from walsender"?  If not, why not?  If so,
I think we need another answer.
        regards, tom lane


Re: Scanning pg_tablespace from walsender

From
Robert Haas
Date:
On Mon, Jan 3, 2011 at 10:25 AM, Magnus Hagander <magnus@hagander.net> wrote:
> I'm working on completing Heikki's patch for streaming base backups,
> and have run into a problem:
>
> In order to dump all tablespaces properly, I have to know where they
> are (d'uh).

Can you get that directly from the filesystem layout?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Scanning pg_tablespace from walsender

From
Magnus Hagander
Date:
On Mon, Jan 3, 2011 at 16:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> I'm working on completing Heikki's patch for streaming base backups,
>> and have run into a problem:
>
>> In order to dump all tablespaces properly, I have to know where they
>> are (d'uh). In order to do that, I need to scan pg_tablespace.
>
> Wait a minute.  Isn't this problem about to metastasize into "I need to
> read *every* global catalog from walsender"?  If not, why not?  If so,
> I think we need another answer.

Um, why would I need that? I need to be able to find all files, which
means I need to find all tablespaces. I don't see how that would turn
into "every global catalog"?

(It already needs pg_authid and similar for the login, but that's
shared with all others)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Scanning pg_tablespace from walsender

From
Magnus Hagander
Date:
On Mon, Jan 3, 2011 at 16:34, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 3, 2011 at 10:25 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> I'm working on completing Heikki's patch for streaming base backups,
>> and have run into a problem:
>>
>> In order to dump all tablespaces properly, I have to know where they
>> are (d'uh).
>
> Can you get that directly from the filesystem layout?

Hmm. I guess we could enumerate the pg_tblspc directory, and call
readlink() on all the symlinks in there. Assuming all platforms can do
readlink() (we'd obviously need a special windows implementation,  but
that's doable I guess).

I just figured it'd be a lot cleaner to read it from our own catalogs...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Scanning pg_tablespace from walsender

From
Robert Haas
Date:
On Mon, Jan 3, 2011 at 10:37 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Jan 3, 2011 at 16:34, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jan 3, 2011 at 10:25 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> I'm working on completing Heikki's patch for streaming base backups,
>>> and have run into a problem:
>>>
>>> In order to dump all tablespaces properly, I have to know where they
>>> are (d'uh).
>>
>> Can you get that directly from the filesystem layout?
>
> Hmm. I guess we could enumerate the pg_tblspc directory, and call
> readlink() on all the symlinks in there. Assuming all platforms can do
> readlink() (we'd obviously need a special windows implementation,  but
> that's doable I guess).
>
> I just figured it'd be a lot cleaner to read it from our own catalogs...

I don't even see why you'd need readlink.  Can't you just traverse the symlinks?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Scanning pg_tablespace from walsender

From
Magnus Hagander
Date:
On Mon, Jan 3, 2011 at 16:40, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 3, 2011 at 10:37 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Mon, Jan 3, 2011 at 16:34, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Jan 3, 2011 at 10:25 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> I'm working on completing Heikki's patch for streaming base backups,
>>>> and have run into a problem:
>>>>
>>>> In order to dump all tablespaces properly, I have to know where they
>>>> are (d'uh).
>>>
>>> Can you get that directly from the filesystem layout?
>>
>> Hmm. I guess we could enumerate the pg_tblspc directory, and call
>> readlink() on all the symlinks in there. Assuming all platforms can do
>> readlink() (we'd obviously need a special windows implementation,  but
>> that's doable I guess).
>>
>> I just figured it'd be a lot cleaner to read it from our own catalogs...
>
> I don't even see why you'd need readlink.  Can't you just traverse the symlinks?

Well, they need to be put back in the same location on the other
machine (slave in case of replication, tarball otherwise). If I just
traverse the symlinks, they'll just appears as a subdirectory of
pg_tblspc on the other machine, won't they?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Scanning pg_tablespace from walsender

From
Alvaro Herrera
Date:
Excerpts from Magnus Hagander's message of lun ene 03 12:25:28 -0300 2011:

> Can someone throw me a pointer or two on how to actually do that? :-)
> Am I correct in assuming I need to add it to
> RelationCacheInitializePhase2(), and to do that, need to figure out
> how to define a TableSpaceRelation_Rowtype_Id in the headers? Or is
> there an easier way I'm missing?

I think you just need to add BKI_ROWTYPE_OID and BKI_SCHEMA_MACRO to
pg_tablespace.h (yes, and pick a suitable OID for the rowtype).  Then
figure out formrdesc.

HTH

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Scanning pg_tablespace from walsender

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Jan 3, 2011 at 16:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Wait a minute. �Isn't this problem about to metastasize into "I need to
>> read *every* global catalog from walsender"? �If not, why not? �If so,
>> I think we need another answer.

> Um, why would I need that? I need to be able to find all files, which
> means I need to find all tablespaces. I don't see how that would turn
> into "every global catalog"?

Well, if you just need to find all the files, scan the symlinks in
$PGDATA/pg_tblspc/.  Don't turn a filesystem problem into a catalog
problem.
        regards, tom lane


Re: Scanning pg_tablespace from walsender

From
Robert Haas
Date:
On Mon, Jan 3, 2011 at 10:42 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Well, they need to be put back in the same location on the other
> machine (slave in case of replication, tarball otherwise). If I just
> traverse the symlinks, they'll just appears as a subdirectory of
> pg_tblspc on the other machine, won't they?

Sure, I guess you'd need to read the links if you want it to work that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Scanning pg_tablespace from walsender

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 3, 2011 at 10:42 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> Well, they need to be put back in the same location on the other
>> machine (slave in case of replication, tarball otherwise). If I just
>> traverse the symlinks, they'll just appears as a subdirectory of
>> pg_tblspc on the other machine, won't they?

> Sure, I guess you'd need to read the links if you want it to work that way.

Well, you're quietly ignoring a whole bunch of issues there, like
whether the tablespaces *should* be in the identical locations on the
other machine and how you'll deal with it if not.  Eventually there's
going to need to be some sort of "tablespace mapping" option for
replication.  But anyway, taking a base backup is fundamentally defined
as "scan the filesystem, paying no attention to catalogs" and ISTM that
it obviously should be the same way for tablespaces.
        regards, tom lane


Re: Scanning pg_tablespace from walsender

From
Magnus Hagander
Date:
On Mon, Jan 3, 2011 at 17:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jan 3, 2011 at 10:42 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> Well, they need to be put back in the same location on the other
>>> machine (slave in case of replication, tarball otherwise). If I just
>>> traverse the symlinks, they'll just appears as a subdirectory of
>>> pg_tblspc on the other machine, won't they?
>
>> Sure, I guess you'd need to read the links if you want it to work that way.
>
> Well, you're quietly ignoring a whole bunch of issues there, like
> whether the tablespaces *should* be in the identical locations on the
> other machine and how you'll deal with it if not.  Eventually there's
> going to need to be some sort of "tablespace mapping" option for
> replication.  But anyway, taking a base backup is fundamentally defined
> as "scan the filesystem, paying no attention to catalogs" and ISTM that
> it obviously should be the same way for tablespaces.

I'm doing that now, and it works fine on my linux box. Haven't looked
at a win32 implementation yet, but that can certainly be done.

As for relocating tablespaces - yes, that would be very useful. But at
this point, we *do* require them to be at the same place on the box
you restore to (whether it's a backup or a slave).

That said, it seems we don't actually ever *care* - from my quick grep
of the source, it seems we never ever read the location from the
catalog - we just store it there for reference. So in theory, we
should be able to relocate a tablespace by just changing the symlink.
But that would leave pg_tablespace and the filesystem out of sync, so
we probably shouldn't do that.

Either way, relocating tablespaces is for the future, let's start with
being able to do streaming base backups at all.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Scanning pg_tablespace from walsender

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Jan 3, 2011 at 10:42 AM, Magnus Hagander <magnus@hagander.net> wrote:
> > Well, they need to be put back in the same location on the other
> > machine (slave in case of replication, tarball otherwise). If I just
> > traverse the symlinks, they'll just appears as a subdirectory of
> > pg_tblspc on the other machine, won't they?
>
> Sure, I guess you'd need to read the links if you want it to work that way.

Have to admit, I'm not entirely sure if this is really the behavior that
makes the most sense.  My gut reaction to this is that it'd make more
sense for them to end up as directories rather than symlinks to places
that might not exist on the slave, or that might not be writable by PG
on the slave.  I can see arguments either way though and so I really
don't like the idea of it being forced one way or the other.

Here's my 2c- make it optional on the slave side and then don't complain
if the symlink already exists (even if it goes somewhere else).  My
thinking is that if someone needs to have the tablespaces reside
somewhere else on the slave, they could say "don't create the symlinks"
in the recovery config, and then manually create the symlinks where they
need them to go.
Thanks,
    Stephen

Re: Scanning pg_tablespace from walsender

From
Magnus Hagander
Date:
On Mon, Jan 3, 2011 at 17:17, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Mon, Jan 3, 2011 at 10:42 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> > Well, they need to be put back in the same location on the other
>> > machine (slave in case of replication, tarball otherwise). If I just
>> > traverse the symlinks, they'll just appears as a subdirectory of
>> > pg_tblspc on the other machine, won't they?
>>
>> Sure, I guess you'd need to read the links if you want it to work that way.
>
> Have to admit, I'm not entirely sure if this is really the behavior that
> makes the most sense.  My gut reaction to this is that it'd make more
> sense for them to end up as directories rather than symlinks to places
> that might not exist on the slave, or that might not be writable by PG
> on the slave.  I can see arguments either way though and so I really
> don't like the idea of it being forced one way or the other.
>
> Here's my 2c- make it optional on the slave side and then don't complain
> if the symlink already exists (even if it goes somewhere else).  My
> thinking is that if someone needs to have the tablespaces reside
> somewhere else on the slave, they could say "don't create the symlinks"
> in the recovery config, and then manually create the symlinks where they
> need them to go.

It's already a basic requirement that they need to be the same -
replicating over a CREATE TABLESPACE is going to fail otherwise..
Being able to deal with that in a nice way would be good, of course,
but we don't have that today.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Scanning pg_tablespace from walsender

From
Stephen Frost
Date:
* Magnus Hagander (magnus@hagander.net) wrote:
> It's already a basic requirement that they need to be the same -
> replicating over a CREATE TABLESPACE is going to fail otherwise..
> Being able to deal with that in a nice way would be good, of course,
> but we don't have that today.

If CREATE TABLESPACE replication also looked at the flag I was
proposing, it could work. :)  Of course, the admin wouldn't be able
to move the directory/change the symlink to where they actually want
it to be w/o taking the replication server down, but I'm not sure
that's a show-stopper...

It's certainly not the cleanest/nicest approach, don't get me wrong, but
I really hate the idea of forcing people to have an identical filesystem
layout on the slave that they have on the master.
Stephen