Re: pg_basebackup bug: base backup is double the size of the database - Mailing list pgsql-admin
From | David G Johnston |
---|---|
Subject | Re: pg_basebackup bug: base backup is double the size of the database |
Date | |
Msg-id | 1421906546792-5834987.post@n5.nabble.com |
In response to | pg_basebackup bug: base backup is double the size of the database (Craig James <cjames@emolecules.com>) |
Responses | Re: Re: pg_basebackup bug: base backup is double the size of the database (three replies) |
List | pgsql-admin |
Craig James-2 wrote
> We've encountered a serious bug with pg_basebackup. It seems to be
> following hard links and duplicating all files in the tablespaces rather
> than preserving links.

This entire sentence doesn't make sense to me. How does one "follow" a hard link? A soft link, yes, but a hard link is an alias to the actual data. I'm not sure directory hard-linking is even allowed or used, so "following" in that sense doesn't compute...

> # ls -l /data/postgres-9.3/main/pg_tblspc/16747
> lrwxrwxrwx 1 postgres postgres 27 2014-08-18 11:28
> /data/postgres-9.3/main/pg_tblspc/16747 -> /postgres/tablespaces/uorsy/
>
> # du -sh /data/postgres-9.3/tablespaces/uorsy
> *35G* /data/postgres-9.3/tablespaces/uorsy

Your tablespace points to "/postgres/tablespaces/uorsy/" yet you proceed to show us the contents of "/data/postgres-9.3/tablespaces/uorsy"...

> # du -sh /data/postgres-9.3/tablespaces/uorsy/*
> *35G* /data/postgres-9.3/tablespaces/uorsy/8208624
> *8.1M* /data/postgres-9.3/tablespaces/uorsy/PG_9.3_201306121
> 4.0K /data/postgres-9.3/tablespaces/uorsy/pgsql_tmp
> 4.0K /data/postgres-9.3/tablespaces/uorsy/PG_VERSION
>
> # find /data/postgres-9.3/tablespaces/uorsy \! -links 1 -type f | wc -l
> *740*
>
> In other words, this tablespace has 35G of real data, plus 740 hard links
> that effectively duplicate each data file.

I can't quite figure out what to make of the above. As others have said, it looks like user error at first glance, and we do not have the benefit of exploring the system or a failing test case that would let us rule that out and start exploring how pg_upgrade (if indeed that is even the culprit) could be at fault.

Even if you didn't manually create the hard links, some configuration allowed them to be created where they didn't belong. It very well could be something that is incorrectly allowed but unusual enough that it isn't accounted for in pg_upgrade et al. Guessing what exactly that might be is likely to be a futile effort, especially since it could be something as simple as an errant copy command that caused the situation to exist.

> When we look at the same data in the archive that pg_basebackup creates
> (invoked via barman), we find this:
>
> # du -sh /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747
> *70G* /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747
>
> # du -sh /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747/*
> *35G*
> /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747/8208624
> *35G*
> /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747/PG_9.3_201306121
> 4.0K
> /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747/pgsql_tmp
> 4.0K
> /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747/PG_VERSION
>
> # find /pg_archive/staging/base/20150114T170002/pgdata/pg_tblspc/16747 \!
> -links 1 -type f | wc -l
> *0*
>
> That is, no hard links, and all of the data files are duplicated.

Of course the backup is going to create its own copy of the files... if it were to store hard (or soft) links, the restoration would fail if the data being pointed to were to become corrupt.

> And of course, when we try to actually use this archive to recover, it's
> twice the
> size as the original database and doesn't fit on our disks.
>
> My guess is that pg_basebackup is using (or doing the equivalent of)
> rsync(1) without the --hard-links option, and that these hard links were
> created by pg_upgrade when we went from 8.4.17 to 9.3.5.

And how, exactly, did you perform the pg_upgrade? As mentioned down-thread, pg_upgrade does use hard links, specifically to avoid duplication of data (in exchange you lose the ability to easily fall back to the old database version). I'm doubtful that it, by itself, is contributing to this problem, but again my experience in this area is limited. What you have shown us to this point is far from conclusive.

> What can we do to fix this? The whole cluster is about 350 databases and
> 800GB.

Unfortunately I've gotten as far as I can with the limited, and slightly conflicting, information provided and the documentation for pg_upgrade and tablespaces/physical-database-layout. At first glance there seem to be some gaps in the documentation, but without actually exploring the capability it's only a gut feeling from trying to answer some questions while reading your post. Some of that could also be not knowing whether what you show is "normal". Specifically, what is uorsy/8208624 in [...]9.3/tablespaces?

There are two things that can be discovered here:

Is there a bug in pg_upgrade or some other tool that you are using?
How do you manually fix whatever went wrong with your installation?

You likely care more about the former, but that likely requires more interaction than is convenient to provide via e-mail. You might have better luck on IRC or with actual support people. If you truly think this was caused by a bug, then reproducing it in a self-contained script would be most helpful to the community.

The other fix, though obviously more costly in terms of time, is to pg_dump and restore to a clean setup. That is likely not necessary: your database is currently operational, so some of what you are seeing must be garbage that was somehow dumped there at some point in the past. Others have already hinted that the hard links are said garbage. Now you get to decide whether to act on that assumption or obtain more information first.

David J.

--
View this message in context: http://postgresql.nabble.com/pg-basebackup-bug-base-backup-is-double-the-size-of-the-database-tp5834912p5834987.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.
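One way to "obtain more information first" is to list which path names share each inode, rather than only counting files with a link count above one as the quoted `find` command does. The following is a minimal Python sketch, not something taken from the thread: the function names `hardlink_groups` and `extra_bytes_if_links_not_preserved` are made up for illustration, and it assumes a POSIX filesystem where `st_nlink` and `st_ino` are meaningful. It only measures what is on disk; it says nothing about how pg_basebackup or pg_upgrade behave.

```python
import os
import stat
from collections import defaultdict


def hardlink_groups(root):
    """Group regular files under root by inode, keeping only files
    whose link count is greater than one (i.e. hard-linked files).

    Returns {(device, inode): [(path, size_bytes, link_count), ...]}.
    Symbolic links are not followed (os.walk default, plus lstat).
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                groups[key].append((path, st.st_size, st.st_nlink))
    return dict(groups)


def extra_bytes_if_links_not_preserved(groups):
    """Bytes a copy would add if every path name became its own file.

    Only names found inside the scanned tree are counted; names that
    live outside it do not inflate a copy of this tree.
    """
    extra = 0
    for entries in groups.values():
        size = entries[0][1]  # all names of one inode have the same size
        extra += size * (len(entries) - 1)
    return extra
```

Calling `hardlink_groups` on the tablespace directory and printing each group would show, for example, whether every file under one subdirectory has its second name under another subdirectory of the same tree. A group whose recorded link count is larger than the number of paths found has additional names outside the scanned directory.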