Thread: Always include encoding of database in pg_dumpall
I've have a PostgreSQL database cluster that I've continually upgraded from 7.1 to 9.1 without problems using pg_dumpall and psql. When migrating to 9.2, I decided to change the default encoding for the database cluster from SQL_ASCII to UTF8. When I went to restore my database backup (created using 9.1's pg_dumpall), the tables containing data not in UTF8 format were empty. This appears to be caused by pg_dumpall omitting the ENCODING when dumping if it is the same as the current database cluster's default encoding. I believe this is a bad idea, as there is no guarantee the backup will be restored on a cluster with the same default encoding. This patch always sets the ENCODING for the databases, even if it is the same as the current database cluster default encoding. Thoughts? I'm not sure if similar changes should be made for LC_COLLATE or LC_CTYPE, but that's something that should be considered. Please CC me when responding as I don't currently subscribe to the list. Thanks, Jeremy
Attachment
Jeremy Evans <code@jeremyevans.net> writes: > I've have a PostgreSQL database cluster that I've continually upgraded > from 7.1 to 9.1 without problems using pg_dumpall and psql. When > migrating to 9.2, I decided to change the default encoding for the > database cluster from SQL_ASCII to UTF8. When I went to restore my > database backup (created using 9.1's pg_dumpall), the tables > containing data not in UTF8 format were empty. I assume there were some errors reported? > This appears to be caused by pg_dumpall omitting the ENCODING when > dumping if it is the same as the current database cluster's default > encoding. I believe this is a bad idea, as there is no guarantee the > backup will be restored on a cluster with the same default encoding. I'm not persuaded by this argument. The choice to omit default encoding and locale settings was intentional; if you choose to load into an installation with different defaults, it's probably because you *want* to change the encoding/locale you're using. What you propose would make it impossible to do that. Indeed, I'd have thought that was exactly the use-case you were trying to accomplish here. > I'm not sure if similar changes should be made for LC_COLLATE or > LC_CTYPE, but that's something that should be considered. It would be a particularly bad idea to change the behavior for the locale names, because those aren't consistent across platforms. If we didn't suppress them when possible, cross-platform restores would likely become impossible. regards, tom lane
On 10/19 06:55, Tom Lane wrote: > Jeremy Evans <code@jeremyevans.net> writes: > > I've have a PostgreSQL database cluster that I've continually upgraded > > from 7.1 to 9.1 without problems using pg_dumpall and psql. When > > migrating to 9.2, I decided to change the default encoding for the > > database cluster from SQL_ASCII to UTF8. When I went to restore my > > database backup (created using 9.1's pg_dumpall), the tables > > containing data not in UTF8 format were empty. > > I assume there were some errors reported? Yes. Of course, using psql to restore the dump, the error messages are interspersed with all of the notices output, making it harder to see the errors. > > This appears to be caused by pg_dumpall omitting the ENCODING when > > dumping if it is the same as the current database cluster's default > > encoding. I believe this is a bad idea, as there is no guarantee the > > backup will be restored on a cluster with the same default encoding. > > I'm not persuaded by this argument. The choice to omit default encoding > and locale settings was intentional; if you choose to load into an > installation with different defaults, it's probably because you *want* > to change the encoding/locale you're using. What you propose would make > it impossible to do that. Indeed, I'd have thought that was exactly > the use-case you were trying to accomplish here. My use case was just to change the default encoding to UTF8 so I wouldn't have to set the encoding manually when creating databases in the future, I didn't want to change the encoding of any of my current databases. I think the current behavior gives preference to usability at the expense of reliability, but I understand that that might be an appropriate decision to make. > > I'm not sure if similar changes should be made for LC_COLLATE or > > LC_CTYPE, but that's something that should be considered. > > It would be a particularly bad idea to change the behavior for the > locale names, because those aren't consistent across platforms. > If we didn't suppress them when possible, cross-platform restores > would likely become impossible. That makes sense. Thanks for considering my patch. :) Jeremy