Thread: odd problem !

odd problem !

From
Oleg Bartunov
Date:
Hi there,

below is the problem I just bitten when play with toy db. I did:

1.initdb -D ./t1
2. pg_ctl -D ./t1 start
3. createdb test
4. psql test -c "create table a (f integer);"
5. run script which populates table a in background   perl bgupdate.pl &
6. cp -a ./t1 ./t2
6. pg_ctl -D ./t1 stop   it's waiting for bgupdate.pl, so I killed script
7. pg_ct -D ./t2 start   see, that everything is fine.
8. pg_ctl -D ./t2 stop
9. pg_ctl -D ./t1 start
10.

pg@zen:~/test$ psql -l        List of databases   Name    |  Owner   | Encoding 
-----------+----------+---------- template0 | postgres | KOI8 template1 | postgres | KOI8 test      | postgres | KOI8
(3 rows)

11.
pg@zen:~/test$ psql test
FATAL:  database "test" does not exist
psql: FATAL:  database "test" does not exist

So, how it's possible ?

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: odd problem !

From
Oleg Bartunov
Date:
OK, here is more cleaner cut and paste from my notebook:

pg@zen:~/test$ initdb -D ./t1
pg@zen:~/test$  pg_ctl -D ./t1 start
postmaster starting
pg@zen:~/test$ LOG:  database system was shut down at 2005-03-23 01:09:34 MSK
LOG:  checkpoint record is at 0/A2C844
LOG:  redo record is at 0/A2C844; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 544; next OID: 17230
LOG:  database system is ready

pg@zen:~/test$ createdb test
CREATE DATABASE
pg@zen:~/test$ psql test -c "create table a (f integer);"
CREATE TABLE

$i); commit;"; done
COMMIT
COMMIT
COMMIT
COMMIT
COMMIT
COMMIT
COMMIT
COMMIT
COMMIT
COMMIT
pg@zen:~/test$ cp -a ./t1 ./t2
pg@zen:~/test$ rm ./t2/postmaster.pid 
rm: remove regular file ./t2/postmaster.pid'? y
pg@zen:~/test$ pg_ctl -D ./t1 stop
LOG:  received smart shutdown request
LOG:  shutting down
waiting for postmaster to shut down....LOG:  database system is shut down done
postmaster stopped
pg@zen:~/test$ LOG:  database system was interrupted at 2005-03-23 01:10:06 MSK
LOG:  checkpoint record is at 0/A2C844
LOG:  redo record is at 0/A2C844; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 544; next OID: 17230
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/A2C880
LOG:  record with zero length at 0/A4FD38
LOG:  redo done at 0/A4FD10
LOG:  database system is ready

pg@zen:~/test$ psql test -c "select count(*) from a;" count 
-------    10
(1 row)

pg@zen:~/test$ pg_ctl -D ./t2 stop
waiting for postmaster to shut down....LOG:  received smart shutdown request
LOG:  shutting down
LOG:  database system is shut down done
postmaster stopped

pg@zen:~/test$ pg_ctl -D ./t1 start
postmaster starting
pg@zen:~/test$ LOG:  database system was shut down at 2005-03-23 01:11:40 MSK
LOG:  checkpoint record is at 0/A4FD38
LOG:  redo record is at 0/A4FD38; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 568; next OID: 17243
LOG:  database system is ready

pg@zen:~/test$ psql test -c "select count(*) from a;"
ERROR:  relation "a" does not exist
ERROR:  relation "a" does not exist

So, where is my table 'a' ?

    Oleg

On Wed, 23 Mar 2005, Oleg Bartunov wrote:

> Hi there,
>
> below is the problem I just bitten when play with toy db. I did:
>
> 1.initdb -D ./t1
> 2. pg_ctl -D ./t1 start
> 3. createdb test
> 4. psql test -c "create table a (f integer);"
> 5. run script which populates table a in background
>   perl bgupdate.pl &
> 6. cp -a ./t1 ./t2
> 6. pg_ctl -D ./t1 stop
>   it's waiting for bgupdate.pl, so I killed script
> 7. pg_ct -D ./t2 start
>   see, that everything is fine.
> 8. pg_ctl -D ./t2 stop
> 9. pg_ctl -D ./t1 start
> 10.
>
> pg@zen:~/test$ psql -l
>        List of databases
>   Name    |  Owner   | Encoding -----------+----------+----------
> template0 | postgres | KOI8
> template1 | postgres | KOI8
> test      | postgres | KOI8
> (3 rows)
>
> 11.
> pg@zen:~/test$ psql test
> FATAL:  database "test" does not exist
> psql: FATAL:  database "test" does not exist
>
> So, how it's possible ?
>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: odd problem !

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> below is the problem I just bitten when play with toy db. I did:

> 1.initdb -D ./t1
> 2. pg_ctl -D ./t1 start
> 3. createdb test
> 4. psql test -c "create table a (f integer);"
> 5. run script which populates table a in background
>     perl bgupdate.pl &
> 6. cp -a ./t1 ./t2

I would not really expect this to produce a usable copy at all...

> 6. pg_ctl -D ./t1 stop
>     it's waiting for bgupdate.pl, so I killed script
> 7. pg_ct -D ./t2 start
>     see, that everything is fine.

Exactly how much did you test?  However, that doesn't seem relevant
to your subsequent problem with the original.

> 8. pg_ctl -D ./t2 stop
> 9. pg_ctl -D ./t1 start
> 10.

> pg@zen:~/test$ psql -l
>          List of databases
>     Name    |  Owner   | Encoding 
> -----------+----------+----------
>   template0 | postgres | KOI8
>   template1 | postgres | KOI8
>   test      | postgres | KOI8
> (3 rows)

> 11.
> pg@zen:~/test$ psql test
> FATAL:  database "test" does not exist
> psql: FATAL:  database "test" does not exist

What PG version is this exactly?  I suppose that you're seeing one of
the GetRawDatabase corner cases, but I'm not quite sure why the shutdown
and restart after the createdb wouldn't have fixed it.  Does vacuuming
pg_database and then checkpointing make it work?
        regards, tom lane


Re: odd problem !

From
Oleg Bartunov
Date:
On Tue, 22 Mar 2005, Tom Lane wrote:

>>   template1 | postgres | KOI8
>>   test      | postgres | KOI8
>> (3 rows)
>
>> 11.
>> pg@zen:~/test$ psql test
>> FATAL:  database "test" does not exist
>> psql: FATAL:  database "test" does not exist
>
> What PG version is this exactly?  I suppose that you're seeing one of
> the GetRawDatabase corner cases, but I'm not quite sure why the shutdown
> and restart after the createdb wouldn't have fixed it.  Does vacuuming
> pg_database and then checkpointing make it work?

REL8_0_STABLE, I believe. I posted another problem, now "cut and pasted".


>
>             regards, tom lane
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: odd problem !

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
>> What PG version is this exactly?

> REL8_0_STABLE, I believe. I posted another problem, now "cut and pasted".

I've been able to duplicate this here.  What is happening is that the
damage to ./t1 is being done when you start the postmaster in ./t2.
It looks to me like the fundamental problem is that the t2 postmaster
is replaying the WAL-logged CREATE DATABASE command from t1's xlog, and
*that log entry contains an absolute path name*.  So the CREATE replay
is wiping out t1's "test" database subdirectory.

This isn't a problem in normal use of course, but it'd be a serious
issue for someone engaging in WAL-shipping, if their backup postmaster
were living at a different absolute path.  We probably need to think
about whether we can make CREATE DATABASE log only relative paths.

Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
Not sure how to deal with that.
        regards, tom lane


Re: odd problem !

From
Oleg Bartunov
Date:
On Tue, 22 Mar 2005, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
>>> What PG version is this exactly?
>
>> REL8_0_STABLE, I believe. I posted another problem, now "cut and pasted".
>
> I've been able to duplicate this here.  What is happening is that the
> damage to ./t1 is being done when you start the postmaster in ./t2.
> It looks to me like the fundamental problem is that the t2 postmaster
> is replaying the WAL-logged CREATE DATABASE command from t1's xlog, and
> *that log entry contains an absolute path name*.  So the CREATE replay
> is wiping out t1's "test" database subdirectory.

I suspected this.
btw,is there any utility to see WAL log in human-readable format ?

>
> This isn't a problem in normal use of course, but it'd be a serious
> issue for someone engaging in WAL-shipping, if their backup postmaster
> were living at a different absolute path.  We probably need to think

right, this is normal situation if you backup to the same server.
Not sure how it's usefull, but still

> about whether we can make CREATE DATABASE log only relative paths.

any problem ?

>
> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
> Not sure how to deal with that.
>

in general case it's impossible. Just speculating, what if we have some
dedicated directory doing symbolical links there for all tablespaces ?


>             regards, tom lane
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: odd problem !

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
>> This isn't a problem in normal use of course, but it'd be a serious
>> issue for someone engaging in WAL-shipping, if their backup postmaster
>> were living at a different absolute path.  We probably need to think

> right, this is normal situation if you backup to the same server.
> Not sure how it's usefull, but still

>> about whether we can make CREATE DATABASE log only relative paths.

> any problem ?

I've committed a fix --- please check that it solves your problem.
(I have to leave for a few hours so I can't check it myself right away.)

>> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
>> Not sure how to deal with that.

> in general case it's impossible.

Yeah :-( ... needs more thought.
        regards, tom lane


Re: odd problem !

From
Bruce Momjian
Date:
Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
> >> This isn't a problem in normal use of course, but it'd be a serious
> >> issue for someone engaging in WAL-shipping, if their backup postmaster
> >> were living at a different absolute path.  We probably need to think
> 
> > right, this is normal situation if you backup to the same server.
> > Not sure how it's usefull, but still
> 
> >> about whether we can make CREATE DATABASE log only relative paths.
> 
> > any problem ?
> 
> I've committed a fix --- please check that it solves your problem.
> (I have to leave for a few hours so I can't check it myself right away.)
> 
> >> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
> >> Not sure how to deal with that.
> 
> > in general case it's impossible.
> 
> Yeah :-( ... needs more thought.

Is this a TODO?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: odd problem !

From
Oleg Bartunov
Date:
I checked stable branch. no problem now.
    Oleg
On Tue, 22 Mar 2005, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
>>> This isn't a problem in normal use of course, but it'd be a serious
>>> issue for someone engaging in WAL-shipping, if their backup postmaster
>>> were living at a different absolute path.  We probably need to think
>
>> right, this is normal situation if you backup to the same server.
>> Not sure how it's usefull, but still
>
>>> about whether we can make CREATE DATABASE log only relative paths.
>
>> any problem ?
>
> I've committed a fix --- please check that it solves your problem.
> (I have to leave for a few hours so I can't check it myself right away.)
>
>>> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
>>> Not sure how to deal with that.
>
>> in general case it's impossible.
>
> Yeah :-( ... needs more thought.
>
>             regards, tom lane
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: odd problem !

From
Simon Riggs
Date:
On Tue, 2005-03-22 at 21:42 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Oleg Bartunov <oleg@sai.msu.su> writes:
> > >> This isn't a problem in normal use of course, but it'd be a serious
> > >> issue for someone engaging in WAL-shipping, if their backup postmaster
> > >> were living at a different absolute path.  We probably need to think
> > 
> > > right, this is normal situation if you backup to the same server.
> > > Not sure how it's usefull, but still
> > 
> > >> about whether we can make CREATE DATABASE log only relative paths.
> > 
> > > any problem ?
> > 
> > I've committed a fix --- please check that it solves your problem.
> > (I have to leave for a few hours so I can't check it myself right away.)
> > 
> > >> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
> > >> Not sure how to deal with that.
> > 
> > > in general case it's impossible.
> > 
> > Yeah :-( ... needs more thought.
> 
> Is this a TODO?
> 

Yes, it is.

In my experience, most people create only a single Database, then define
their Tablespaces (or change them rarely, if ever). So I've always
regarded CREATE DATABASE and TABLESPACE as lower priority items. I'd
always recommend taking another full backup immediately following such
changes, rather than rely on everything working in the rollforward. That
was true with PITR and similarly true with log shipping based upon PITR.

Best Regards, Simon Riggs



Re: odd problem !

From
Simon Riggs
Date:
On Tue, 2005-03-22 at 19:07 -0500, Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
> >> This isn't a problem in normal use of course, but it'd be a serious
> >> issue for someone engaging in WAL-shipping, if their backup postmaster
> >> were living at a different absolute path.  We probably need to think
> 
> > right, this is normal situation if you backup to the same server.
> > Not sure how it's usefull, but still
> 
> >> about whether we can make CREATE DATABASE log only relative paths.
> 
> > any problem ?
> 
> I've committed a fix --- please check that it solves your problem.
> (I have to leave for a few hours so I can't check it myself right away.)
> 
> >> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
> >> Not sure how to deal with that.
> 
> > in general case it's impossible.
> 
> Yeah :-( ... needs more thought.

Just realised that this isn't just a problem with log-shipping.

If you restore a database onto a server that already has a data
directory on it in the same place as the one referenced by the restored
copy, then you'll overwrite the existing one.

That means for RPM distros where the path is initially fixed, you could
wipe out a database when trying to perform a recovery from another
system...

Oh dear. Yes, needs more thought and a bug fix for inclusion in 8.0.2+
is probably warranted.

Best Regards, Simon Riggs



Re: odd problem !

From
Simon Riggs
Date:
On Wed, 2005-03-23 at 13:07 +0000, Simon Riggs wrote:
> On Tue, 2005-03-22 at 19:07 -0500, Tom Lane wrote:
> > Oleg Bartunov <oleg@sai.msu.su> writes:
> > >> This isn't a problem in normal use of course, but it'd be a serious
> > >> issue for someone engaging in WAL-shipping, if their backup postmaster
> > >> were living at a different absolute path.  We probably need to think
> > 
> > > right, this is normal situation if you backup to the same server.
> > > Not sure how it's usefull, but still
> > 
> > >> about whether we can make CREATE DATABASE log only relative paths.
> > 
> > > any problem ?
> > 
> > I've committed a fix --- please check that it solves your problem.
> > (I have to leave for a few hours so I can't check it myself right away.)
> > 
> > >> Log-shipping CREATE TABLESPACE commands is even more interesting :-(.
> > >> Not sure how to deal with that.
> > 
> > > in general case it's impossible.
> > 
> > Yeah :-( ... needs more thought.
> 
> Just realised that this isn't just a problem with log-shipping.

...Just a problem with me reading the whole of a mail... :-(

Best Regards, Simon Riggs



Re: odd problem !

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2005-03-22 at 21:42 -0500, Bruce Momjian wrote:
>> Is this a TODO?

> Yes, it is.

> In my experience, most people create only a single Database, then define
> their Tablespaces (or change them rarely, if ever). So I've always
> regarded CREATE DATABASE and TABLESPACE as lower priority items. I'd
> always recommend taking another full backup immediately following such
> changes, rather than rely on everything working in the rollforward. That
> was true with PITR and similarly true with log shipping based upon PITR.

In the post-initlocation world, I think CREATE/DROP DATABASE are no big
problems.  The bug Oleg encountered was really just a thinko or
premature optimization on my part: these operations should log database
and tablespace OIDs and let the recovery process recompute directory
pathnames from those, but I had stupidly made them log absolute paths
instead.

There may still be some race-condition-type risks associated with CREATE
DATABASE but I don't think they are any worse for WAL replay than they
are for normal operation.  You just can't safely modify a DB that is an
active CREATE DATABASE template.

DROP TABLESPACE isn't a problem either; we were already logging just the
tablespace OID, so that should work fine.

The hard part is CREATE TABLESPACE, and the reason it's hard is that
someone might possibly want the tablespace to be located at a different
place on the recipient machine than it is on the master.  I do not see a
reasonable way to support that at the moment.  For the moment I think we
should just document that as a gotcha, and make a TODO item to find a
way to do it.
        regards, tom lane


Re: odd problem !

From
Gavin Sherry
Date:
On Wed, 23 Mar 2005, Tom Lane wrote:

> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2005-03-22 at 21:42 -0500, Bruce Momjian wrote:
> >> Is this a TODO?
>
> > Yes, it is.
>
> > In my experience, most people create only a single Database, then define
> > their Tablespaces (or change them rarely, if ever). So I've always
> > regarded CREATE DATABASE and TABLESPACE as lower priority items. I'd
> > always recommend taking another full backup immediately following such
> > changes, rather than rely on everything working in the rollforward. That
> > was true with PITR and similarly true with log shipping based upon PITR.
>

[snip]

> The hard part is CREATE TABLESPACE, and the reason it's hard is that
> someone might possibly want the tablespace to be located at a different
> place on the recipient machine than it is on the master.  I do not see a
> reasonable way to support that at the moment.  For the moment I think we
> should just document that as a gotcha, and make a TODO item to find a
> way to do it.

Here is a docs patch which adds a note to the section on PITR.

Gavin

Re: odd problem !

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
>> The hard part is CREATE TABLESPACE, and the reason it's hard is that
>> someone might possibly want the tablespace to be located at a different
>> place on the recipient machine than it is on the master.  I do not see a
>> reasonable way to support that at the moment.  For the moment I think we
>> should just document that as a gotcha, and make a TODO item to find a
>> way to do it.

> Here is a docs patch which adds a note to the section on PITR.

I already applied something about this to CVS.  Do you want to modify
the wording of what's there now?
        regards, tom lane


Re: odd problem !

From
Gavin Sherry
Date:
On Thu, 24 Mar 2005, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> >> The hard part is CREATE TABLESPACE, and the reason it's hard is that
> >> someone might possibly want the tablespace to be located at a different
> >> place on the recipient machine than it is on the master.  I do not see a
> >> reasonable way to support that at the moment.  For the moment I think we
> >> should just document that as a gotcha, and make a TODO item to find a
> >> way to do it.
>
> > Here is a docs patch which adds a note to the section on PITR.
>
> I already applied something about this to CVS.  Do you want to modify
> the wording of what's there now?

No, it looks good. I didn't see your commit.

Gavin