Thread: pg_upgrade: delete_old_cluster.sh issues

pg_upgrade: delete_old_cluster.sh issues

From: Marc Mamin

Hello,
 
IMHO, there is a serious issue in the script to clean the old data directory
when running pg_upgrade in link mode.
 
in short: When working with symbolic links, the first step in delete_old_cluster.sh
is to delete the old $PGDATA folder that may contain tablespaces used by the new instance.
 
in long, our use case:
 
our postgres data directories are organized as follows:
 
1) they are all registered in a root location, e.g. /opt/app,
   but can be located somewhere else using symbolic links:
 
   ll /opt/app/
   ...
   postgresql-data-1 -> /pgdata/postgresql-data-1
 
2) we have fixed names for root locations of tablespaces within $PGDATA.
   these can be real folders or again symbolic links to some other places:
 
   ll /pgdata/postgresql-data-1
   ...
   tblspc_data
   tblspc_idx -> /datarep/pg1/tblspc_idx
 
   (additionally, each schema has its own tablespaces in these locations, but this is not relevant here)
 
3) we also have some custom content within $PGDATA, e.g. an extra log folder used by our deployment script
 
After running pg_upgrade, checking the tablespace location within the NEW instance:
 
ll pg_tblspc
 
16428 -> /opt/app/postgresql-data-1/tblspc_data/foo
16429 -> /opt/app/postgresql-data-1/tblspc_idx/foo
 
which, after resolving the symbolic links, is equivalent to:
 
  /pgdata/postgresql-data-1/tblspc_data/foo (x)
  /datarep/pg1/tblspc_idx/foo               (y)
 
I called pg_upgrade using the true paths (no symbolic links):
 
./pg_upgrade \
  --link \
  --check \
  --old-datadir "/pgdata/postgresql-data-1" \
  --new-datadir "/pgdata/postgresql_93-data-1"
 
now, checking what the cleanup script would like to do:
 
cat delete_old_cluster.sh
#!/bin/sh
 
(a) rm -rf /pgdata/postgresql-data-1
(b) rm -rf /opt/app/postgresql-data-1/tblspc_data/foo/PG_9.1_201105231
(c) rm -rf /opt/app/postgresql-data-1/tblspc_idx/foo/PG_9.1_201105231
 
a: will delete the folder (x), which contains data for the NEW Postgres instance!
b: already deleted by (a)
c: still exists in /datarep/pg1/tblspc_idx/foo but can't be found,
   as the symbolic link in /pgdata/postgresql-data-1 has already been deleted by (a)
 
moreover, our custom content in $OLD_PGDATA would be gone too
 
It seems that these issues could all be avoided
by first removing the expected content of $OLD_PGDATA
and only then removing $OLD_PGDATA itself if it is empty
(or by adding a note to the output of pg_upgrade):
 
replace
 
rm -rf /pgdata/postgresql-data-1
 
with
 
cd /pgdata/postgresql-data-1 || exit 1   # never run the rm commands elsewhere
rm -rf base
rm -rf global
rm -rf pg_clog
rm -rf pg_hba.conf        # (*)
rm -rf pg_ident.conf      # (*)
rm -rf pg_log
rm -rf pg_multixact
rm -rf pg_notify
rm -rf pg_serial
rm -rf pg_stat_tmp
rm -rf pg_subtrans
rm -rf pg_tblspc
rm -rf pg_twophase
rm -rf PG_VERSION         # (*)
rm -rf pg_xlog
rm -rf postgresql.conf    # (*)
rm -rf postmaster.log
rm -rf postmaster.opts    # (*)
 
(*):  could be nice to keep as a reference.
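
The proposal above (remove only the expected entries, then remove the
directory itself only if it is empty) could be sketched as a small shell
function. This is only an illustration, not what pg_upgrade generates: the
function name remove_old_cluster_files is hypothetical, and the entry list
is simply the one given above for a 9.1 data directory.

```shell
#!/bin/sh
# Hypothetical helper: remove only the entries listed above from an old
# data directory, then remove the directory itself only if it is empty.
remove_old_cluster_files() {
    old_pgdata=$1
    # Refuse to touch anything that does not look like a data directory.
    [ -f "$old_pgdata/PG_VERSION" ] || {
        echo "$old_pgdata does not look like a data directory" >&2
        return 2
    }
    for entry in base global pg_clog pg_hba.conf pg_ident.conf pg_log \
                 pg_multixact pg_notify pg_serial pg_stat_tmp pg_subtrans \
                 pg_tblspc pg_twophase PG_VERSION pg_xlog postgresql.conf \
                 postmaster.log postmaster.opts; do
        rm -rf -- "$old_pgdata/$entry"
    done
    # rmdir refuses to remove a non-empty directory, so tablespaces and
    # custom content stored under $old_pgdata survive.
    if rmdir -- "$old_pgdata" 2>/dev/null; then
        echo "removed $old_pgdata"
    else
        echo "kept $old_pgdata, remaining entries:" >&2
        ls -A -- "$old_pgdata" >&2
        return 1
    fi
}
```

Called as remove_old_cluster_files /pgdata/postgresql-data-1, it would leave
tblspc_data, tblspc_idx and the custom log folder in place and report them.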
 
best regards,
 
Marc Mamin
 

Re: pg_upgrade: delete_old_cluster.sh issues

From: Bruce Momjian

On Tue, Nov 12, 2013 at 10:35:58AM +0000, Marc Mamin wrote:
> Hello,
>  
> IMHO, there is a serious issue in the script to clean the old data directory
> when running pg_upgrade in link mode.
>  
> in short: When working with symbolic links, the first step in
> delete_old_cluster.sh
> is to delete the old $PGDATA folder that may contain tablespaces used by the
> new instance.
>  
> in long, our use case:

Rather than removing files/directories individually, which would be
difficult to maintain, we decided in pg_upgrade 9.3 to detect
tablespaces in the old data directory and report that and not create a
delete script.  Here is the commit:
      http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4765dd79219b9697d84f5c2c70f3fe00455609a1

The problem with your setup is that while you didn't pass symbolic links
to pg_upgrade, you did use symbolic links when defining the tablespaces,
so pg_upgrade couldn't recognize that the symbolic links were inside the
old data directory.
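
Until pg_upgrade detects this itself, the situation can be checked by hand
before running delete_old_cluster.sh. The following is a minimal sketch, not
pg_upgrade code: the function names are hypothetical, and it assumes the
tablespace links are the entries in pg_tblspc of the new data directory, as
shown earlier in the thread. It resolves every link to its physical path and
reports those that end up inside the old data directory.

```shell
#!/bin/sh
# Resolve a directory to its physical path (all symbolic links followed).
physical_path() {
    (cd -- "$1" 2>/dev/null && pwd -P)
}

# Hypothetical check: print every tablespace link of the new cluster whose
# target lies inside the old data directory.
# Returns 0 if none is found, 1 if at least one is found.
tablespaces_inside_old_datadir() {
    new_pgdata=$1
    old_pgdata=$2
    old_real=$(physical_path "$old_pgdata") || return 2
    found=0
    for link in "$new_pgdata"/pg_tblspc/*; do
        [ -e "$link" ] || continue      # unmatched glob or dangling link
        target=$(physical_path "$link") || continue
        case "$target/" in
            "$old_real"/*)
                echo "$link -> $target"
                found=1
                ;;
        esac
    done
    return "$found"
}
```

With the layout from the original report, calling
tablespaces_inside_old_datadir /pgdata/postgresql_93-data-1 /pgdata/postgresql-data-1
should report 16428 (tblspc_data) but not 16429, whose target lives under
/datarep.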

We could use readlink() to go walk over all symbolic links, but that
seems quite complex.  We could use stat() and make sure there are no
matching inodes in the old data directory, or that they are in a
different file system.  We could look for a directory named after the PG
catversion in the old data directory.  We could update the docs.
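
The third option (looking for a directory named after the PG catversion) can
be approximated from the shell. This is a rough sketch under assumptions: the
function name is hypothetical, and the name pattern is inferred from the
PG_9.1_201105231 directories in the generated script quoted earlier in the
thread, so it may not cover every release.

```shell
#!/bin/sh
# Hypothetical check: list tablespace version directories (named like
# PG_9.1_201105231) that are physically stored inside the given data
# directory. -H follows a symbolic link given as the starting point only;
# links found below it are not followed, so locations that are merely
# linked from the data directory are not reported.
find_tablespace_dirs_in_datadir() {
    find -H "$1" -type d -name 'PG_[0-9]*_[0-9]*' -print
}
```

Any output for the old data directory would mean that rm -rf on that
directory also removes tablespace data.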

I am not sure what to do.  We never expected people would put
tablespaces in the data directory.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: pg_upgrade: delete_old_cluster.sh issues

From: Bruce Momjian

On Mon, Nov 18, 2013 at 10:13:19PM -0500, Bruce Momjian wrote:
> On Tue, Nov 12, 2013 at 10:35:58AM +0000, Marc Mamin wrote:
> > Hello,
> >
> > IMHO, there is a serious issue in the script to clean the old data directory
> > when running pg_upgrade in link mode.
> >
> > in short: When working with symbolic links, the first step in
> > delete_old_cluster.sh
> > is to delete the old $PGDATA folder that may contain tablespaces used by the
> > new instance.
> >
> > in long, our use case:
>
> Rather than removing files/directories individually, which would be
> difficult to maintain, we decided in pg_upgrade 9.3 to detect
> tablespaces in the old data directory and report that and not create a
> delete script.  Here is the commit:
>
>        http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4765dd79219b9697d84f5c2c70f3fe00455609a1
>
> The problem with your setup is that while you didn't pass symbolic links
> to pg_upgrade, you did use symbolic links when defining the tablespaces,
> so pg_upgrade couldn't recognize that the symbolic links were inside the
> old data directory.
>
> We could use readlink() to go walk over all symbolic links, but that
> seems quite complex.  We could use stat() and make sure there are no
> matching inodes in the old data directory, or that they are in a
> different file system.  We could look for a directory named after the PG
> catversion in the old data directory.  We could update the docs.
>
> I am not sure what to do.  We never expected people would put
> tablespaces in the data directory.

I went with a doc patch, attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
