Thread: Some 8.4 changes needed according to pg_migrator testing

Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
I was just talking to Bruce about his results from testing pg_migrator,
and we realized there are a couple of changes that we need to slip into
the core code before 8.4 goes final.

1. pg_dumpall dumps CREATE DATABASE commands that include the source
database's encoding, lc_collate, and lc_ctype settings ... but if
dumping from a pre-8.4 server it just omits the lc_ settings.  This
is flat-out wrong (independently of pg_migrator).  The correct behavior
when dumping from pre-8.4 is to get the server-wide locale settings
and include those in the CREATE DATABASE commands.  Otherwise you're
not restoring the full state of the database correctly.  This is
critical in view of the 8.4 changes to make CREATE DATABASE enforce
encoding-vs-locale match --- if you try to load the dump into a server
with a different default locale, it'll probably fail, and there's
absolutely no reason why it should.

2. There seem to be some corner cases where creating a table in the new
database will not create a toast table even though there was one in the
previous instance.  (I'm not 100% convinced that this can happen if we
create and then drop dropped columns, for instance ... but I'm not
convinced it can't happen, either.)  If there is a toast table in the
old database then pg_migrator must bring it over because it might
possibly contain live data.  However, as toasting.c is presently coded
there is no way to force it to create a toast table.  I think we should
change AlterTableCreateToastTable to add a "bool force" parameter.
Alternatively we could add a separate entry point, but the option seems
a bit cleaner.

Barring objections I'll commit changes for both of these before beta2.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Alvaro Herrera
Date:
Tom Lane wrote:

> 2. There seem to be some corner cases where creating a table in the new
> database will not create a toast table even though there was one in the
> previous instance.  (I'm not 100% convinced that this can happen if we
> create and then drop dropped columns, for instance ... but I'm not
> convinced it can't happen, either.)  If there is a toast table in the
> old database then pg_migrator must bring it over because it might
> possibly contain live data.  However, as toasting.c is presently coded
> there is no way to force it to create a toast table.  I think we should
> change AlterTableCreateToastTable to add a "bool force" parameter.
> Alternatively we could add a separate entry point, but the option seems
> a bit cleaner.

Hmm, what about toast reloptions?  They are not a problem now of course, but
could be in a 8.4->8.5 migration.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> 2. There seem to be some corner cases where creating a table in the new
>> database will not create a toast table even though there was one in the
>> previous instance.

> Hmm, what about toast reloptions?  They are not a problem now of course, but
> could be in a 8.4->8.5 migration.

I don't think it's an issue.  The type of scenario we are looking at is
where there is no need for a toast table *now*, but there might be some
old rows hanging around that got toasted anyway.  (Say, you originally
had two wide varchar columns and then dropped one.)  It seems unlikely
that preserving the reloptions for the toast table is going to be all
that critical in this type of scenario.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
I wrote:
> 1. pg_dumpall dumps CREATE DATABASE commands that include the source
> database's encoding, lc_collate, and lc_ctype settings ... but if
> dumping from a pre-8.4 server it just omits the lc_ settings.  This
> is flat-out wrong (independently of pg_migrator).  The correct behavior
> when dumping from pre-8.4 is to get the server-wide locale settings
> and include those in the CREATE DATABASE commands.

Actually, there's another issue that comes to mind here: since we are
relying on platform-dependent locale names, including those in the dump
is going to pose a severe problem for porting dumps across platforms
(where "different platform" could even mean "different libc release").
We're already at risk with respect to dumps from 8.4, even without the
above-proposed change.

I am not sure what we can do about this.  Ideas?
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
"David E. Wheeler"
Date:
On May 7, 2009, at 10:18 AM, Tom Lane wrote:

> Actually, there's another issue that comes to mind here: since we are
> relying on platform-dependent locale names, including those in the  
> dump
> is going to pose a severe problem for porting dumps across platforms
> (where "different platform" could even mean "different libc release").
> We're already at risk with respect to dumps from 8.4, even without the
> above-proposed change.
>
> I am not sure what we can do about this.  Ideas?

Abandon platform-dependent locales?

Kidding! (Sort of.)

Best,

David


Re: Some 8.4 changes needed according to pg_migrator testing

From
Alvaro Herrera
Date:
Tom Lane wrote:

> Actually, there's another issue that comes to mind here: since we are
> relying on platform-dependent locale names, including those in the dump
> is going to pose a severe problem for porting dumps across platforms
> (where "different platform" could even mean "different libc release").
> We're already at risk with respect to dumps from 8.4, even without the
> above-proposed change.
> 
> I am not sure what we can do about this.  Ideas?

I don't think there's much we can do apart from telling the user not to
move stuff across platforms that do not have equally named locales.
Maybe what we can do is have a mechanism for pg_restore to map one
locale from the dump file into another.  So the user can specify a file
with lines like
"en_US := English_UnitedStates"
etc

(For text dumps, the only solution would be for the user to edit the
dump manually; perhaps provide a pg_dump switch to avoid dumping
locale config?).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Some 8.4 changes needed according to pg_migrator testing

From
Magnus Hagander
Date:
Alvaro Herrera wrote:
> Tom Lane wrote:
> 
>> Actually, there's another issue that comes to mind here: since we are
>> relying on platform-dependent locale names, including those in the dump
>> is going to pose a severe problem for porting dumps across platforms
>> (where "different platform" could even mean "different libc release").
>> We're already at risk with respect to dumps from 8.4, even without the
>> above-proposed change.
>>
>> I am not sure what we can do about this.  Ideas?
> 
> I don't think there's much we can do apart from telling the user not to
> move stuff across platforms that do not have equally named locales.
> Maybe what we can do is have a mechanism for pg_restore to map one
> locale from the dump file into another.  So the user can specify a file
> with lines like
> "en_US := English_UnitedStates"
> etc
> 
> (For text dumps, the only solution would be for the user to edit the
> dump manually; perhaps provide a pg_dump switch to avoid dumping
> locale config?).

We have a pg_dump switch that sets the encoding. Perhaps we could have a
pg_dump switch that "fakes" the output locale? Seems awfully kludgy
though - I'd much rather see us supporting it on pg_restore and just say
that if you are dumping in plaintext, well, use a plaintext editor to
edit it.

//Magnus



Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Alvaro Herrera wrote:
>> (For text dumps, the only solution would be for the user to edit the
>> dump manually; perhaps provide a pg_dump switch to avoid dumping
>> locale config?).

> We have a pg_dump switch that sets the encoding. Perhaps we could have a
> pg_dump switch that "fakes" the output locale? Seems awfully kludgy
> though - I'd much rather see us supporting it on pg_restore and just say
> that if you are dumping in plaintext, well, use a plaintext editor to
> edit it.

I don't think a solution that requires you to know about this in advance
(ie when you make the dump) is going to be very satisfactory.

I'm inclined to think that the most usable answer is to have some way of
getting CREATE DATABASE itself to apply a locale-name mapping.

Or we could try to make the user-visible locale names
platform-independent in the first place, a la David's not-silly-at-all
suggestion.  I think the part that goes "en_US" or whatever is actually
quite well standardized (except for good ol' Windows, but we could
provide a mapping from the Unix-style names to Windows names).  It's the
encoding-name part that's not very stable.  If we could hide that from
the user and tack it on internally, things would be much better.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Bernd Helmle
Date:

--On 7. Mai 2009 15:32:01 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think the part that goes "en_US" or whatever is actually
> quite well standardized (except for good ol' Windows, but we could
> provide a mapping from the Unix-style names to Windows names).

I like this idea, but i could imagine that this is pretty hard to maintain, 
once someone decides to change things suddenly?

-- 
Thanks
Bernd


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Bernd Helmle <mailings@oopsware.de> writes:
> --On 7. Mai 2009 15:32:01 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the part that goes "en_US" or whatever is actually
>> quite well standardized (except for good ol' Windows, but we could
>> provide a mapping from the Unix-style names to Windows names).

> I like this idea, but i could imagine that this is pretty hard to maintain, 
> once someone decides to change things suddenly?

Well, we'd probably want to make sure there was an escape-hatch whereby
you could specify an exact platform-dependent locale name, in case
whatever we were doing didn't work on a particular platform.  I just don't
want that to be the norm.

Possibly it would work to first try the locale name as given by the
user, and if that doesn't work (either isn't recognized, or doesn't seem
to use the right encoding) then try to map/modify it.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Bruce Momjian
Date:
Tom Lane wrote:
> 2. There seem to be some corner cases where creating a table in the new
> database will not create a toast table even though there was one in the
> previous instance.  (I'm not 100% convinced that this can happen if we
> create and then drop dropped columns, for instance ... but I'm not
> convinced it can't happen, either.)  If there is a toast table in the
> old database then pg_migrator must bring it over because it might
> possibly contain live data.  However, as toasting.c is presently coded
> there is no way to force it to create a toast table.  I think we should
> change AlterTableCreateToastTable to add a "bool force" parameter.
> Alternatively we could add a separate entry point, but the option seems
> a bit cleaner.

The bottom line is that the TOAST logic was so fluid on when it thinks a
TOAST table is needed that even if it it consistent from 8.3 -> 8.4, it
would likely break in some later release and it was just safer to add a
boolean.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Some 8.4 changes needed according to pg_migrator testing

From
"David E. Wheeler"
Date:
On May 7, 2009, at 12:32 PM, Tom Lane wrote:

> Or we could try to make the user-visible locale names
> platform-independent in the first place, a la David's not-silly-at-all
> suggestion.

Actually, what I was thinking of was using a platform-independent  
locale infrastructure: the inconsistency in behavior between platforms  
is astonishing and annoying. But this works as a stopgap.

Best,

David


Re: Some 8.4 changes needed according to pg_migrator testing

From
Peter Eisentraut
Date:
On Thursday 07 May 2009 20:54:37 Alvaro Herrera wrote:
> I don't think there's much we can do apart from telling the user not to
> move stuff across platforms that do not have equally named locales.

The other part of the problem is that there is no guarantee that equally or 
similarly named locales behave the same.  There will necessarily be a user-
beware factor here, and perhaps blowing up when the locale name is not 
recognized is safer and alerts about this fact better than trying to make it 
match by force somehow.


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Thursday 07 May 2009 20:54:37 Alvaro Herrera wrote:
>> I don't think there's much we can do apart from telling the user not to
>> move stuff across platforms that do not have equally named locales.

> The other part of the problem is that there is no guarantee that equally or 
> similarly named locales behave the same.  There will necessarily be a user-
> beware factor here, and perhaps blowing up when the locale name is not 
> recognized is safer and alerts about this fact better than trying to make it 
> match by force somehow.

We have never before operated on the assumption that it's okay for
pg_dump output to be locked to particular platforms, and I do not think
we should start now.  So we've got to do *something* about this.
(Do you really want PG Windows users to be unable to port to a better
platform?)

I note also that the problem is in pg_dumpall not pg_dump, so pg_restore
is not going to help us --- if someone has a problem his only recourse
would be manual editing of a possibly-monstrous SQL script file.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On Thursday 07 May 2009 20:54:37 Alvaro Herrera wrote:
>>> I don't think there's much we can do apart from telling the user not to
>>> move stuff across platforms that do not have equally named locales.
> 
>> The other part of the problem is that there is no guarantee that equally or 
>> similarly named locales behave the same.  There will necessarily be a user-
>> beware factor here, and perhaps blowing up when the locale name is not 
>> recognized is safer and alerts about this fact better than trying to make it 
>> match by force somehow.
> 
> We have never before operated on the assumption that it's okay for
> pg_dump output to be locked to particular platforms, and I do not think
> we should start now.  So we've got to do *something* about this.
> (Do you really want PG Windows users to be unable to port to a better
> platform?)
> 
> I note also that the problem is in pg_dumpall not pg_dump, so pg_restore
> is not going to help us --- if someone has a problem his only recourse
> would be manual editing of a possibly-monstrous SQL script file.

How about only outputting the LC_COLLATE/CTYPE options for databases 
that use a non-default setting? In the common scenarios where you have 
the same collation for the whole cluster it would work just like in 
previous releases. If you dump and restore a database with default 
locale to a cluster initialized with a different locale, the database is 
restored with the default locale of the target cluster. But if you 
explicitly set a database to use a different locale, that would be 
preserved in dumps with the caveat that you'd have to change it manually 
if you restore to a cluster on a different platform.

If we go with that, we should probably make the notion of a default 
collation explicit. We could set pg_database.datcollate/datctype column 
to NULL to mean "use the cluster default".

I don't find the idea of creating mapping tables of locale names very 
appetizing. Looking at our encoding name mapping table, there's quite a 
few different spellings of different encoding names alone, let alone all 
locale names.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Some 8.4 changes needed according to pg_migrator testing

From
Peter Eisentraut
Date:
On Friday 08 May 2009 19:09:51 Heikki Linnakangas wrote:
> How about only outputting the LC_COLLATE/CTYPE options for databases
> that use a non-default setting? In the common scenarios where you have
> the same collation for the whole cluster it would work just like in
> previous releases. If you dump and restore a database with default
> locale to a cluster initialized with a different locale, the database is
> restored with the default locale of the target cluster. But if you
> explicitly set a database to use a different locale, that would be
> preserved in dumps with the caveat that you'd have to change it manually
> if you restore to a cluster on a different platform.

That was my latest thinking as well.  And it preserves the not-uncommon use 
case that you pg_dumpall and restore your database after having initdb'ed with 
a different locale/encoding in order to, say, switch to Unicode.



Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Friday 08 May 2009 19:09:51 Heikki Linnakangas wrote:
>> How about only outputting the LC_COLLATE/CTYPE options for databases
>> that use a non-default setting?

> That was my latest thinking as well.

Logically we should handle database encoding the same way, no?

I'm not really satisfied with this as a long-term solution, but it may
be the most we can hope to squeeze into 8.4.  I'm willing to go make the
pg_dumpall changes if there are not objections/better ideas.

>> If we go with that, we should probably make the notion of a default 
>> collation explicit. We could set pg_database.datcollate/datctype column 
>> to NULL to mean "use the cluster default".

I don't find that to be a good idea, mainly because there *is* no
"cluster default" anymore.  I think it's okay for pg_dumpall to treat
the template0 settings as being the "default" for its purposes, but keep
in mind that there is nothing much stopping a DBA from replacing
template0 with another DB that has different settings.  We can't have
that break existing DBs.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> If we go with that, we should probably make the notion of a default  
> collation explicit. We could set pg_database.datcollate/datctype column  
> to NULL to mean "use the cluster default".

I'm not sure how this would work.  If I initdb with a certain
locale/encoding and then create a database with default locale/encoding,
how would a restore work on a cluster that has been initdb'd with a
different locale/encoding?  If you don't dump the locale specification,
it could very well not match what the user intended.


> I don't find the idea of creating mapping tables of locale names very  
> appetizing. Looking at our encoding name mapping table, there's quite a  
> few different spellings of different encoding names alone, let alone all  
> locale names.

Yeah, it doesn't seem pleasant that way.  When I proposed it I was
thinking that the mapping would be specified by the user.  OTOH since
it's only pg_dumpall output that's the problem, this idea is not be very
useful because there will be no way for a tool to do the replacement.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Heikki Linnakangas wrote:
>> If we go with that, we should probably make the notion of a default  
>> collation explicit. We could set pg_database.datcollate/datctype column  
>> to NULL to mean "use the cluster default".

> I'm not sure how this would work.  If I initdb with a certain
> locale/encoding and then create a database with default locale/encoding,
> how would a restore work on a cluster that has been initdb'd with a
> different locale/encoding?  If you don't dump the locale specification,
> it could very well not match what the user intended.

As Peter suggested, that's more or less the point.  As long as we still
have pg_dump output include the client_encoding setting, it is sensible
to try to load a dump into a different default database encoding/locale;
and in fact you can not guarantee that the new platform's locales behave
exactly the same anyway.

One problem that just occurred to me is that this solution may not be
adequate for pg_migrator.  It will need to check that the new database
has the same "default" settings (where "default" means "those of
template0") as the old installation did.  I think that's probably doable
but we'll have to check.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Bruce Momjian
Date:
Tom Lane wrote:
> One problem that just occurred to me is that this solution may not be
> adequate for pg_migrator.  It will need to check that the new database
> has the same "default" settings (where "default" means "those of
> template0") as the old installation did.  I think that's probably doable
> but we'll have to check.

I have already coded pg_migrator to get the 8.4 template0 locale  and
check for a match of that against pg_controldata of 8.3.  Are you saying
I need to spin through all the databases after they are created and
check again?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> One problem that just occurred to me is that this solution may not be
>> adequate for pg_migrator.  It will need to check that the new database
>> has the same "default" settings (where "default" means "those of
>> template0") as the old installation did.  I think that's probably doable
>> but we'll have to check.

> I have already coded pg_migrator to get the 8.4 template0 locale  and
> check for a match of that against pg_controldata of 8.3.  Are you saying
> I need to spin through all the databases after they are created and
> check again?

No, that sounds like it should be sufficient ... but you need to check
the template0 encodings match too, assuming that we make encoding and
locale work the same here.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> One problem that just occurred to me is that this solution may not be
> >> adequate for pg_migrator.  It will need to check that the new database
> >> has the same "default" settings (where "default" means "those of
> >> template0") as the old installation did.  I think that's probably doable
> >> but we'll have to check.
> 
> > I have already coded pg_migrator to get the 8.4 template0 locale  and
> > check for a match of that against pg_controldata of 8.3.  Are you saying
> > I need to spin through all the databases after they are created and
> > check again?
> 
> No, that sounds like it should be sufficient ... but you need to check
> the template0 encodings match too, assuming that we make encoding and
> locale work the same here.

OK, are you suggesting checking the pg_database.encoding for
template0 for both servers because that information isn't in either
pg_controldatas?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Some 8.4 changes needed according to pg_migrator testing

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> No, that sounds like it should be sufficient ... but you need to check
>> the template0 encodings match too, assuming that we make encoding and
>> locale work the same here.

> OK, are you suggesting checking the pg_database.encoding for
> template0 for both servers because that information isn't in either
> pg_controldatas?

Right.  Compare old template0 encoding against new template0 encoding,
and old locale information from pg_controldata against new template0
locale information.  If these match then it will be safe to let
pg_dumpall omit the corresponding CREATE DATABASE parameters.
        regards, tom lane


Re: Some 8.4 changes needed according to pg_migrator testing

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> No, that sounds like it should be sufficient ... but you need to check
> >> the template0 encodings match too, assuming that we make encoding and
> >> locale work the same here.
> 
> > OK, are you suggesting checking the pg_database.encoding for
> > template0 for both servers because that information isn't in either
> > pg_controldatas?
> 
> Right.  Compare old template0 encoding against new template0 encoding,
> and old locale information from pg_controldata against new template0
> locale information.  If these match then it will be safe to let
> pg_dumpall omit the corresponding CREATE DATABASE parameters.

OK, encoding check added to pg_migrator.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +