Thread: BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)

BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)

From
maxim.boguk@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      7573
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.2.0
Operating system:   Linux
Description:


Hi,

today, while migrating a test database (with no critical data, which was
a good thing), I found a very nasty corner case when using
delete_old_cluster.sh after pg_upgrade.

The test database has a somewhat unusual tablespace layout: the main
tablespace partition was mounted inside the data directory of the old
cluster.
E.g.:
data directory - /var/lib/postgresql/9.2/main
main tablespace (another partition mount point) -
/var/lib/postgresql/9.2/main/largedb

Now the funny part: the migration was successful, but after a few days I
decided to clear the old cluster data.
I checked the content of delete_old_cluster.sh but found nothing suspicious,
just one line:
rm -rf /var/lib/postgresql/9.2/main

Well, I know I should have been more careful, but as a result that command
deleted all the tablespace data on the other partition, including the 9.2
version tablespace.

It was a surprise.

Maybe it is a good idea to add:
       --one-file-system
              when removing a hierarchy recursively, skip any directory that
              is on a file system different from that of the corresponding
              command line argument

to the rm call in that script.

However, it is a Linux-only feature.

PS: Yes, I know that keeping any foreign data inside the PostgreSQL data
directory is a bad idea.
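
As an illustration of the layout described above, the following is a minimal sketch of a pre-flight check that lists directories under a data directory that sit on a different file system, which is the case where a plain rm -rf crosses into another partition. The function name list_foreign_mounts is hypothetical, and the sketch assumes GNU stat (for -c %d) and a find that supports -xdev; it is not part of pg_upgrade or of the generated script.

```shell
#!/bin/sh
# Hypothetical helper: print every directory under $1 that lives on a
# different file system than $1 itself (for example, a tablespace partition
# mounted inside the data directory).
# Assumption: GNU stat, where "-c %d" prints the device number.
list_foreign_mounts() {
    top=$1
    top_dev=$(stat -c %d "$top") || return 1
    # -xdev keeps find from descending into other file systems; the mount
    # point directory itself is still visited, which is all we need here.
    find "$top" -xdev -type d | while IFS= read -r dir; do
        dev=$(stat -c %d "$dir") || continue
        if [ "$dev" != "$top_dev" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Possible usage before running the cleanup script:
#   list_foreign_mounts /var/lib/postgresql/9.0/main
# With GNU rm, the option quoted above would be used as:
#   rm -rf --one-file-system /var/lib/postgresql/9.0/main
```

An empty result only says that no nested mount point was found; it says nothing about foreign data stored on the same file system.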

Re: BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)

From
Bruce Momjian
Date:
On Fri, Sep 28, 2012 at 01:18:26AM +0000, maxim.boguk@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7573
> Logged by:          Maxim Boguk
> Email address:      maxim.boguk@gmail.com
> PostgreSQL version: 9.2.0
> Operating system:   Linux
> Description:
>
> Hi,
>
> today, while migrating a test database (with no critical data, which was
> a good thing), I found a very nasty corner case when using
> delete_old_cluster.sh after pg_upgrade.
>
> The test database has a somewhat unusual tablespace layout: the main
> tablespace partition was mounted inside the data directory of the old
> cluster.
> E.g.:
> data directory - /var/lib/postgresql/9.2/main
> main tablespace (another partition mount point) -
> /var/lib/postgresql/9.2/main/largedb

Can you show us the data directory path of the old and new clusters?

pg_upgrade really doesn't know what is inside that old cluster, so it
just deletes everything under the data directory.

I guess I could check if the path of the old cluster somehow matches the
leading path of the new cluster, but I doubt that would be fool-proof
either, e.g. symlinks.
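
A minimal sketch of the kind of leading-path check mentioned above, with symbolic links resolved first. The function name is_inside is hypothetical, and the sketch assumes a realpath utility (as shipped with GNU coreutils) and that both paths exist. It compares paths only, so it cannot notice a mount point or data that was simply placed inside the old cluster directory.

```shell
#!/bin/sh
# Hypothetical check: succeed (status 0) if path $2 is equal to, or located
# under, path $1 once symbolic links are resolved; status 1 if it is not,
# status 2 if either path cannot be resolved.
# Assumption: a realpath utility is available and both paths exist.
is_inside() {
    parent=$(realpath "$1") || return 2
    child=$(realpath "$2") || return 2
    # Everything is inside the root directory.
    [ "$parent" = "/" ] && return 0
    # Append a slash so that /a/bc is not treated as being under /a/b.
    case "$child/" in
        "$parent"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage:
#   if is_inside "$old_datadir" "$new_datadir"; then
#       echo "new cluster lives inside the old one; refusing to delete" >&2
#   fi
```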

> Maybe it is a good idea to add:
>        --one-file-system
>               when removing a hierarchy recursively, skip any directory that
>               is on a file system different from that of the corresponding
>               command line argument
>
> to the rm call in that script.
>
> However, it is a Linux-only feature.
>
> PS: Yes, I know that keeping any foreign data inside the PostgreSQL data
> directory is a bad idea.

I don't see how adding --one-file-system would help us.  They could have
placed it under the old cluster in the same file system.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)

From
Maxim Boguk
Date:
> > The test database has a somewhat unusual tablespace layout: the main
> > tablespace partition was mounted inside the data directory of the old
> > cluster.
> > E.g.:
> > data directory - /var/lib/postgresql/9.2/main
> > main tablespace (another partition mount point) -
> > /var/lib/postgresql/9.2/main/largedb
>
> Can you show us the data directory path of the old and new clusters?
>

--old-datadir=/var/lib/postgresql/9.0/main
--new-datadir=/var/lib/postgresql/9.2/main

The second partition, used as a tablespace, was mounted at:
/var/lib/postgresql/9.0/main/largedb


> pg_upgrade really doesn't know what is inside that old cluster, so it
> just deletes everything under the data directory.
>

Hmm... maybe it is a good idea to try the opposite approach:
the default layout of directories and files in a PostgreSQL data directory
is well documented and almost never changes.
Maybe instead of rm -rf on the whole data directory, rm -rf only the files
and directories that surely belong to PostgreSQL?

Something along the lines of:
1) rm -rf base global pg_clog pg_multixact ... and so on
2) produce a warning if any unusual files are left in the data directory
after that (but do not delete them).
3) delete the data directory itself only if it is completely empty after
steps 1 and 2

PS: I know that solution will not be completely error-proof, but it will
prevent the most probable data-loss scenarios. So it's better than nothing.
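
The three steps proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that pg_upgrade generates: the function name cleanup_old_datadir is hypothetical, and the list of entries is just the four names given in the message, so it is deliberately incomplete.

```shell
#!/bin/sh
# Hypothetical sketch of the selective cleanup described above.
# Assumption: only the entries named in the loop belong to PostgreSQL;
# a real data directory contains more than these four.
cleanup_old_datadir() {
    datadir=$1
    if [ ! -d "$datadir" ]; then
        echo "not a directory: $datadir" >&2
        return 1
    fi

    # Step 1: remove only the entries assumed to belong to PostgreSQL.
    # ${datadir:?} aborts if the variable is empty, so "/base" is never hit.
    for name in base global pg_clog pg_multixact; do
        rm -rf "${datadir:?}/$name"
    done

    # Step 2: warn about anything left over, but do not delete it.
    leftovers=$(ls -A "$datadir")
    if [ -n "$leftovers" ]; then
        echo "warning: unexpected entries left in $datadir:" >&2
        printf '%s\n' "$leftovers" >&2
        return 2
    fi

    # Step 3: remove the data directory itself only because it is now empty.
    rmdir "$datadir"
}

# Possible usage:
#   cleanup_old_datadir /var/lib/postgresql/9.0/main
```

With the layout from this report, the largedb mount point would be reported as a leftover and the function would return status 2 without touching it.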

PS: I also think deleting postgresql.conf and pg_hba.conf from the old data
directory is the wrong move too... if the admin forgets to copy pg_hba.conf
to the new cluster, these settings could be lost forever after
delete_old_cluster.sh.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."

Re: BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)

From
Bruce Momjian
Date:
On Thu, Oct  4, 2012 at 10:40:19AM +1000, Maxim Boguk wrote:
>
>     > The test database has a somewhat unusual tablespace layout: the main
>     > tablespace partition was mounted inside the data directory of the old
>     > cluster.
>     > E.g.:
>     > data directory - /var/lib/postgresql/9.2/main
>     > main tablespace (another partition mount point) -
>     > /var/lib/postgresql/9.2/main/largedb
>
>     Can you show us the data directory path of the old and new clusters?
>
>
> --old-datadir=/var/lib/postgresql/9.0/main
> --new-datadir=/var/lib/postgresql/9.2/main
>
> The second partition, used as a tablespace, was mounted at:
> /var/lib/postgresql/9.0/main/largedb
>
>
>
>     pg_upgrade really doesn't know what is inside that old cluster, so it
>     just deletes everything under the data directory.
>
>
> Hmm... maybe it is a good idea to try the opposite approach:
> the default layout of directories and files in a PostgreSQL data directory
> is well documented and almost never changes.
> Maybe instead of rm -rf on the whole data directory, rm -rf only the files
> and directories that surely belong to PostgreSQL?
>
> Something along the lines of:
> 1) rm -rf base global pg_clog pg_multixact ... and so on
> 2) produce a warning if any unusual files are left in the data directory
> after that (but do not delete them).
> 3) delete the data directory itself only if it is completely empty after
> steps 1 and 2
>
> PS: I know that solution will not be completely error-proof, but it will
> prevent the most probable data-loss scenarios. So it's better than nothing.
>
> PS: I also think deleting postgresql.conf and pg_hba.conf from the old data
> directory is the wrong move too... if the admin forgets to copy pg_hba.conf
> to the new cluster, these settings could be lost forever after
> delete_old_cluster.sh.

This all seems like a step backwards and adds complexity that will fail.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +