Thread: Extensions, patch v18 (merge against master, bitrot-only-fixes)
Hi, Well $subject says about it all really. The bitrot of course comes from the fact that the last in-commitfest-dependency has been commited in, and I kept a version of pg_execute_sql_file() in the extension's patch. Following the discussions, and as exposing this function is not necessary for CREATE EXTENSION to work as needed, it's not exposed at the SQL level anymore. Please note that the SQL scripts seem to be encoded in latin9. I don't think there's a policy about that already, and I'd like to get confirmation about that before to go ahead and add encoding = latin9 to the control files. Whatever the master sources encoding policy is, I think explicitely marking it in the control files will be a good idea. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachment
On Thu, Dec 16, 2010 at 7:49 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Please note that the SQL scripts seem to be encoded in latin9. Seems like an odd choice. Why not UTF-8? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Dec 16, 2010 at 7:49 AM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >> Please note that the SQL scripts seem to be encoded in latin9. > > Seems like an odd choice. Why not UTF-8? Not a choice, just what's already in… -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Dec 16, 2010 at 9:04 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Dec 16, 2010 at 7:49 AM, Dimitri Fontaine >> <dimitri@2ndquadrant.fr> wrote: >>> Please note that the SQL scripts seem to be encoded in latin9. >> >> Seems like an odd choice. Why not UTF-8? > > Not a choice, just what's already in… Sure, I get it. I'm guessing that many of the scripts will work in a wide variety of encodings because they're a subset of ASCII. Should we think about converting the others to UTF-8, or is that a bad idea? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2010/12/16 Robert Haas <robertmhaas@gmail.com>: > On Thu, Dec 16, 2010 at 7:49 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > >> Please note that the SQL scripts seem to be encoded in latin9. > > Seems like an odd choice. Why not UTF-8? Latin 9 = ISO 8859-15 = a more modern version of Latin 1 (ISO 8859-1), which includes the € symbol for example. I.e., it's not that weird. As long as there are no non-ASCII characters, it seems even likely that some tools might detect a simple ASCII text file as Latin 9 by default. Nicolas
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Dec 16, 2010 at 9:04 AM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >>>> Please note that the SQL scripts seem to be encoded in latin9. >>> Seems like an odd choice. �Why not UTF-8? >> Not a choice, just what's already in� > Sure, I get it. I'm guessing that many of the scripts will work in a > wide variety of encodings because they're a subset of ASCII. Should > we think about converting the others to UTF-8, or is that a bad idea? I would think that we want to establish the same policy as we have for dictionary files: they're assumed to be UTF-8. I don't believe there should be an encoding option at all. If we didn't need one for dictionary files, there is *surely* no reason why we have to have one for extension SQL files. regards, tom lane
Extensions, patch v19 (encoding brainfart fix) (was: Extensions, patch v18 (merge against master, bitrot-only-fixes))
From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > I would think that we want to establish the same policy as we have for > dictionary files: they're assumed to be UTF-8. I don't believe there > should be an encoding option at all. If we didn't need one for > dictionary files, there is *surely* no reason why we have to have one > for extension SQL files. Brain-fart from me in the v18, that I've produced while being pressed by other distractions. Fixed now in the attached, v19. Sorry about that, I was too eager to produce the no-moving-parts "final" patch. Time to find this quote about parallelism and time lost I guess. So, attached patch fixes the v18 regression wrt to script file encoding and establish UTF-8 as the default encoding to consider to read a script file. Thanks for your comments. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachment
On Dec 16, 2010, at 8:19 AM, Tom Lane wrote: > I would think that we want to establish the same policy as we have for > dictionary files: they're assumed to be UTF-8. I don't believe there > should be an encoding option at all. If we didn't need one for > dictionary files, there is *surely* no reason why we have to have one > for extension SQL files. +1 KISS. David
Excerpts from Dimitri Fontaine's message of jue dic 16 09:49:31 -0300 2010: > Hi, > > Well $subject says about it all really. The bitrot of course comes from > the fact that the last in-commitfest-dependency has been commited in, > and I kept a version of pg_execute_sql_file() in the extension's patch. I thought the suggestion of having "version = 9.1devel" line in each contrib's module extension file was a joke. It is certainly not going to be sustainable in the long run -- I don't think we want to be modifying all control files each release. We need to find a better way to fix this. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > I thought the suggestion of having "version = 9.1devel" line in each > contrib's module extension file was a joke. It is certainly not going > to be sustainable in the long run -- I don't think we want to be > modifying all control files each release. We need to find a better way > to fix this. +1. Naively enough, getting this from the Makefile looked obvious to me. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Alvaro Herrera <alvherre@commandprompt.com> writes: > I thought the suggestion of having "version = 9.1devel" line in each > contrib's module extension file was a joke. It is certainly not going > to be sustainable in the long run -- I don't think we want to be > modifying all control files each release. We need to find a better way > to fix this. Consider also the use case where we only update contrib module minor version when they do receive an update in the tree. If we get to do that, maybe it's good enough with plain .control files. Of course having some Makefile or other support doesn't prevent anything here, it just allows some more flexibility. That was considered some more complexity not solving any real world use-case on-list, though. FWIW, it seems very likely that should we ship without the Make support I'll have to add it to every extension I maintain, separately. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> I thought the suggestion of having "version = 9.1devel" line in each >> contrib's module extension file was a joke. It is certainly not going >> to be sustainable in the long run -- I don't think we want to be >> modifying all control files each release. We need to find a better way >> to fix this. > Naively enough, getting this from the Makefile looked obvious to me. Putting those numbers in the Makefile instead of the control file surely does nothing to alleviate Alvaro's maintenance concern. However, the only way I can see to fix this "automatically" is to have the makefiles propagate PG_VERSION_NUM (or one of the other values set by configure) into generated control files. I don't think that's what we want either. If we do that, then people are going to be forced to go through an ALTER EXTENSION UPGRADE cycle whether or not anything actually changed in the extension's SQL definitions. We really only want the extension's SQL version to change when there was a meaningful change in the SQL commands. I'm not sure that I see a better answer than hand-maintained version numbers in each extension SQL file. But if that's where we're going, they should be in the SQL files, not in either the Makefiles or control files. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > However, the only way I can see to fix this "automatically" is to have > the makefiles propagate PG_VERSION_NUM (or one of the other values set > by configure) into generated control files. Ah, somewhat like what I was asked to remove from the patch, right? -EXTVERSION = $(VERSION) > I don't think that's what > we want either. If we do that, then people are going to be forced to > go through an ALTER EXTENSION UPGRADE cycle whether or not anything > actually changed in the extension's SQL definitions. We really only > want the extension's SQL version to change when there was a meaningful > change in the SQL commands. Well that's just a facility, and very useful for people that have a Make variable where the extension's version is already held. This problem of having a VERSION that moves when the extension possibly didn't change seems pretty specific to PostgreSQL, not typical for extensions authors. > I'm not sure that I see a better answer than hand-maintained version > numbers in each extension SQL file. But if that's where we're going, > they should be in the SQL files, not in either the Makefiles or control > files. You lost me. Are you wanting extension authors to maintain meta data in the SQL script, with some new syntax support, or maybe from calling a function? How do you think this will play with upgrading? That doesn't sound any simpler to me. The whole goal of the control file is to be able to register the extension in the catalog and get an OID there to manage the dependencies (so that we get a single command per extension in pg_dump output). The first design proposed an SQL command to fill in the catalog, but that command must be separate from what the DBA will have to type in to have the server execute the script, so this idea has been beaten by the control file idea (thanks goes to Heikki). Now, we could go with the current patch for starters and add some facilities around 9.2 when we have experienced the maintenance burden. And extension authors will probably be asking for some more facilities too, by then. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Excerpts from Tom Lane's message of jue dic 16 17:10:10 -0300 2010: > However, the only way I can see to fix this "automatically" is to have > the makefiles propagate PG_VERSION_NUM (or one of the other values set > by configure) into generated control files. I don't think that's what > we want either. If we do that, then people are going to be forced to > go through an ALTER EXTENSION UPGRADE cycle whether or not anything > actually changed in the extension's SQL definitions. We really only > want the extension's SQL version to change when there was a meaningful > change in the SQL commands. I've been wondering if we should stop thinking that each contrib's module version is the same as the backend version. What if we declare them all 1.0.0 for the 9.1 release, and increment the version manually each time there's a change? So a module that stays unchanged for 9.2 would still be 1.0.0. (The .so file would still need to be recompiled for the new server version, because of PG_MODULE_MAGIC.) In that case, having hand-maintained version numbers in control or wherever is not so much of a problem; the committer that touches the module needs to ensure that the version number is incremented too. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of jue dic 16 17:10:10 -0300 2010: >> However, the only way I can see to fix this "automatically" is to have >> the makefiles propagate PG_VERSION_NUM (or one of the other values set >> by configure) into generated control files. I don't think that's what >> we want either. If we do that, then people are going to be forced to >> go through an ALTER EXTENSION UPGRADE cycle whether or not anything >> actually changed in the extension's SQL definitions. We really only >> want the extension's SQL version to change when there was a meaningful >> change in the SQL commands. > I've been wondering if we should stop thinking that each contrib's > module version is the same as the backend version. Right, they would have to be decoupled if you believe what I said above. > In that case, having hand-maintained version numbers in control or > wherever is not so much of a problem; the committer that touches the > module needs to ensure that the version number is incremented too. It would be tolerable, if perhaps error-prone. But we've seldom blown it on catversion, and this would be a comparable requirement. regards, tom lane
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> However, the only way I can see to fix this "automatically" is to have >> the makefiles propagate PG_VERSION_NUM (or one of the other values set >> by configure) into generated control files. > > Ah, somewhat like what I was asked to remove from the patch, right? I've been pointed off-list that this message ain't conveying the meaning I'm attaching it, sorry about that. What I mean is that should we change our opinion again the code to do that has already been written. Allow me to insist on "we": it's not like I feel forced into changing the design again and again, I realise I'm acting eagerly upon group decision making steps before to ensure it's the final step. Well, we all have been reading over-stressed exchanges on this list before, right? :) Will now step back on the topic, or try to... Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: Extensions, patch v19 (encoding brainfart fix) (was: Extensions, patch v18 (merge against master, bitrot-only-fixes))
From
Itagaki Takahiro
Date:
On Fri, Dec 17, 2010 at 02:00, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > So, attached patch fixes the v18 regression wrt to script file encoding > and establish UTF-8 as the default encoding to consider to read a script > file. Thanks for your comments. You probably compared wrong versions of source trees. The patch contains many diffs not related to EXTENSION. It cannot be applied cleanly. BTW, only earthdistance.sql.in has @extschema@. Didn't you forget to remove it? --- b/contrib/earthdistance/earthdistance.sql.in ! SET search_path = @extschema@; I've not read the patch well yet, but I'd like to make sure of two specs in the patch: #1. I found the patch specifies "version" in each control file. We will need to maintain them manually, but I assume it was a conclusion in the discussion for v18 patch. So, no more changes are required here. --- b/contrib/XXX/XXX.control + version = '9.1devel' #2. The patch replaces "\i XXX.sql" to "CREATE EXTENSION XXX". They are a bit different because CREATE EXTENSION uses the installed sql files instead of the source directory. But I think this is the correct behavior. We should have used only installed files because they are used in "make *installcheck*". *** a/contrib/XXX/sql/XXX.sql *************** SET client_min_messages = warning; ! \set ECHO none ! \i XXX.sql ! \set ECHO all RESET client_min_messages; --- 7,13 ---- SET client_min_messages = warning; ! CREATE EXTENSION XXX; RESET client_min_messages; -- Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > You probably compared wrong versions of source trees. The patch contains > many diffs not related to EXTENSION. It cannot be applied cleanly. Ouch, sorry about that, will revise soon'ish. Meanwhile the git repo is still available here: http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension > BTW, only earthdistance.sql.in has @extschema@. > Didn't you forget to remove it? No. It so happens that earthdistance depends on cube and that we agreed on not implement dependencies between extensions for the first patch. The problem here is that ALTER EXTENSION earthdistance SET SCHEMA foo; would relocate cube objects too, because there's nothing to stop the recursive walking at the right spot. So earthdistance.control states that earthdistance is not relocatable for now. And a non relocatable extension will use @extschema@ so that users still get a say in where to install it… > I've not read the patch well yet, but I'd like to make sure of > two specs in the patch: > > #1. I found the patch specifies "version" in each control file. We will > need to maintain them manually, but I assume it was a conclusion in the > discussion for v18 patch. So, no more changes are required here. Exactly, the consensus seems to be that we *want* to maintain separate versions number for each contrib. > #2. The patch replaces "\i XXX.sql" to "CREATE EXTENSION XXX". They are a > bit different because CREATE EXTENSION uses the installed sql files instead > of the source directory. But I think this is the correct behavior. We should > have used only installed files because they are used in "make *installcheck*". Yeah. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Extensions, patch v20 (bitrot fixes) (was: Extensions, patch v19 (encoding brainfart fix))
From
Dimitri Fontaine
Date:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: >> You probably compared wrong versions of source trees. The patch contains >> many diffs not related to EXTENSION. It cannot be applied cleanly. > > Ouch, sorry about that, will revise soon'ish. Meanwhile the git repo is > still available here: > > http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension Here it is, attached. The problem was of course due to git branches and missing local pulling and merging. Going gradually from 7 to 2 branches causes that, sometimes. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachment
Re: Extensions, patch v20 (bitrot fixes) (was: Extensions, patch v19 (encoding brainfart fix))
From
Robert Haas
Date:
On Sat, Dec 18, 2010 at 2:39 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Here it is, attached. The problem was of course due to git branches and > missing local pulling and merging. Going gradually from 7 to 2 branches > causes that, sometimes. I spent a little time looking at this tonight. I'm going to give you the same general advice that I've given other people who have submitted very large patches of this type: it'll be a lot easier to get this committed if we can break it into smaller pieces. A pretty easy thing to do would be to separate the changes that turn the existing contrib modules into extensions into its own patch. A possibly more controversial idea is to actually remove the notion of a relocatable extension from this patch, and all the supporting machinery, and make that a follow-on patch. I think it was arranged like that before and somehow got collapsed, but maybe the notion is worth revisiting, because this is a lot of code: 224 files changed, 4713 insertions(+), 293 deletions(-) That issue aside, I think there is still a fair amount of cleanup and code review that needs to happen here before this can be considered for commit. Here are a few things I noticed on a first read-through: - pg_execute_sql_string() and pg_execute_sql_file() are documented, but are not actually part of the patch. - The description of the NO USER DATA option is almost content-free. It basically says "this option is here because it wouldn't work if we didn't have this option". - listDependentObjects() appears to be largely cut-and-pasted from some other function. It calls AcquireDeletionLock() and the comments mention constructing a list of objects to delete. It sure doesn't seem right to be acquiring AccessExclusiveLocks on lots of things in a function that's there to support the pg_extension_objects SRF. - This bit certainly violates our message style guidelines: + elog(NOTICE, + "Installing extension '%s' from '%s', in schema %s, %s user data", + stmt->extname, + get_extension_absolute_path(control->script), + schema ? schema : "public", + create_extension_with_user_data ? "with" : "without"); User-facing messages need to be ereport, not elog, so they can be translated, and mustn't be assembled from pieces, as you've done with "%s user data". - Has the issue of changing custom_variable_classes from PGC_SIGHUP to PGC_SUSET been discussed? I am not sure whether that's an OK thing to do. If it is OK, then the documentation also needs updating. - This comment looks like leftovers: + /* FIXME: add PGC_EXTENSION so that we don't abuse PGC_SIGHUP here? */ + SetConfigOption("custom_variable_classes", + newval, PGC_SIGHUP, PGC_S_EXTENSION); Apologies if I missed the previous discussion of this, but why are we adding a new GUC context? - Did we decide to ditch the encoding parameter for extension scripts and mandate UTF-8? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Extensions, patch v20 (bitrot fixes) (was: Extensions, patch v19 (encoding brainfart fix))
From
"David E. Wheeler"
Date:
On Dec 18, 2010, at 7:04 PM, Robert Haas wrote: > - Did we decide to ditch the encoding parameter for extension scripts > and mandate UTF-8? +1 It was certainly suggested. I think it's a good idea, at least with a first cut. Best, David
Re: Extensions, patch v20 (bitrot fixes) (was: Extensions, patch v19 (encoding brainfart fix))
From
Robert Haas
Date:
On Sat, Dec 18, 2010 at 10:04 PM, Robert Haas <robertmhaas@gmail.com> wrote: > - Has the issue of changing custom_variable_classes from PGC_SIGHUP to > PGC_SUSET been discussed? I am not sure whether that's an OK thing to > do. If it is OK, then the documentation also needs updating. > > - This comment looks like leftovers: > > + /* FIXME: add PGC_EXTENSION so that we don't abuse PGC_SIGHUP here? */ > + SetConfigOption("custom_variable_classes", > + newval, PGC_SIGHUP, PGC_S_EXTENSION); > > Apologies if I missed the previous discussion of this, but why are we > adding a new GUC context? Looking at this a little more, I am inclined to think that ExtensionSetCVC() is entirely unacceptable. Our backend startup is high enough already. Sequential scanning the pg_extension catalog on every startup to spare the DBA the trouble of setting up postgresql.conf strikes me as a bad trade-off. If you were to come back and say that custom_variable_classes is a vile hack from the darkest reaches of wherever vile hacks originate from, I would agree with you. Having a GUC that controls what names can be used as GUCs is probably a bad design, and I'd like to come up with something better. But I don't think this is it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Thanks for your review and your time. Trying to answer some of your points there: Robert Haas <robertmhaas@gmail.com> writes: > I spent a little time looking at this tonight. I'm going to give you > the same general advice that I've given other people who have > submitted very large patches of this type: it'll be a lot easier to > get this committed if we can break it into smaller pieces. A pretty > easy thing to do would be to separate the changes that turn the > existing contrib modules into extensions into its own patch. Those are pretty mechanical, so applying them after would be quite easy. That said I don't know exactly how much it would ease the review, but if you say it does, I'll separate the patches. > A > possibly more controversial idea is to actually remove the notion of a > relocatable extension from this patch, and all the supporting > machinery, and make that a follow-on patch. I think it was arranged > like that before and somehow got collapsed, but maybe the notion is > worth revisiting, because this is a lot of code: > > 224 files changed, 4713 insertions(+), 293 deletions(-) It was like that before indeed. The problem is how to reliably implement the CREATE EXTENSION WITH SCHEMA, which is like the second best thing about all this infrastructure work just behind the pg_dump support. I'm not clear that this option should come on its own in a later patch or that doing so simplifies anything, but not opinionated about that: let's organise ourselves as best as we are able to. Again, will split the code in a third patch if that's what's necessary. > That issue aside, I think there is still a fair amount of cleanup and > code review that needs to happen here before this can be considered > for commit. Here are a few things I noticed on a first read-through: > > - pg_execute_sql_string() and pg_execute_sql_file() are documented, > but are not actually part of the patch. Ouch, git ain't that magic after all. Will clean-up next version(s). > - The description of the NO USER DATA option is almost content-free. > It basically says "this option is here because it wouldn't work if we > didn't have this option". + (see <xref linkend="functions-extension">), this option allow + extension authors to prepare installation scripts that will avoid + creating user data when restoring. I'd need some specifics here to be able to do much better. My guess is that an example is what needed, and it would fit in the main section of the manual about it, 35.16. Putting it all together: create extension. > - listDependentObjects() appears to be largely cut-and-pasted from > some other function. It calls AcquireDeletionLock() and the comments > mention constructing a list of objects to delete. It sure doesn't > seem right to be acquiring AccessExclusiveLocks on lots of things in a > function that's there to support the pg_extension_objects SRF. Ah well, true. AccessShareLock seems the minimum we can take, as we surely want to prevent the extension and its objects disappearing under us, right? > - This bit certainly violates our message style guidelines: > > + elog(NOTICE, > + "Installing extension '%s' from '%s', in schema %s, > %s user data", > + stmt->extname, > + get_extension_absolute_path(control->script), > + schema ? schema : "public", > + create_extension_with_user_data ? "with" : "without"); > > User-facing messages need to be ereport, not elog, so they can be > translated, and mustn't be assembled from pieces, as you've done with > "%s user data". This message is pretty useful for debug and review of the patch, but I'm not sure we want to keep after that... > - Has the issue of changing custom_variable_classes from PGC_SIGHUP to > PGC_SUSET been discussed? I am not sure whether that's an OK thing to > do. If it is OK, then the documentation also needs updating. I've been talking about it since the very firsts versions of the patch on this list but didn't receive much answer. You have a detailed mail about that later in the thread, will start a new thread about that from there. > - This comment looks like leftovers: > > + /* FIXME: add PGC_EXTENSION so that we don't abuse PGC_SIGHUP here? */ > + SetConfigOption("custom_variable_classes", > + newval, PGC_SIGHUP, PGC_S_EXTENSION); > > Apologies if I missed the previous discussion of this, but why are we > adding a new GUC context? Well we're using PGC_SIGHUP and PGC_S_EXTENSION, should we use PGC_EXTENSION too is what the comment asks, so it's not very clear but not a leftover. We're adding a new GUC source here, not a new context. Let's include that in the new thread about cvc, though. > - Did we decide to ditch the encoding parameter for extension scripts > and mandate UTF-8? No we didn't, we decided that the default encoding is UTF-8 and that any contrib script defaults to UTF-8, so that it's not necessary to care about the 'encoding' parameter in the control file there. Itagaki required that the extension code is encoding aware, and it wasn't clear that the control file 'encoding' parameter would be enough in his side of the world, so the encoding support is currently offered to authors in the control file and users still can override it in the CREATE EXTENSION command. I'm about sure that we don't want to remove that support, and I think that the command part of it ain't required, but that's for people with complex encoding problems to tell us IMO. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Extensions and custom_variable_classes (was: Extensions, patch v20 (bitrot fixes))
From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes: > Looking at this a little more, I am inclined to think that > ExtensionSetCVC() is entirely unacceptable. Our backend startup is > high enough already. Sequential scanning the pg_extension catalog on > every startup to spare the DBA the trouble of setting up > postgresql.conf strikes me as a bad trade-off. That seems like a reasonable viewpoint, but... > If you were to come > back and say that custom_variable_classes is a vile hack from the > darkest reaches of wherever vile hacks originate from, I would agree > with you. Having a GUC that controls what names can be used as GUCs > is probably a bad design, and I'd like to come up with something > better. But I don't think this is it. ... users IMO should not be concerned with custom_variable_classes at all. The only users that know about it have already written an extension in C. If some are using it in another context then even with my edits they still can do so. Now, for people following but not reading the patch, what's in is that in order for extensions using custom_variable_classes to work without the user having to care about it, I've added an step at backend startup time to seqscan pg_extension and update custom_variable_classes from this catalog. When adding new entries to custom_variable_classes, known invalid placeholders used to raise an error, that's no longer the case, we keep them aside, as this comment in guc-file.l tells: * 1. The class name is not valid according to the (new) setting * of custom_variable_classes. If so, we would reject, butto * support custom_variable_classes being PGC_SUSET, we maintain * a list of not-yet-valid items. * * This list willbe processed at assign_custom_variable_classes * time: each time cvc changes, we have candidates to * consider. Sucha time is for example ExtensionSetCVC() from * backend post init, where we read additional cvc in * pg_extension catalog. Now, I can see this mechanism evolving in several directions. Either we remove it entirely, or we add a list ok known custom_variable_classes and SINVAL support for it so that as soon as the setting changes, all concurrent backends know about it, and so that you have a catalog cache to walk through rather than a real catalog to seqscan and backend init. Other ideas? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Dec 19, 2010 at 5:30 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I spent a little time looking at this tonight. I'm going to give you >> the same general advice that I've given other people who have >> submitted very large patches of this type: it'll be a lot easier to >> get this committed if we can break it into smaller pieces. A pretty >> easy thing to do would be to separate the changes that turn the >> existing contrib modules into extensions into its own patch. > > Those are pretty mechanical, so applying them after would be quite > easy. That said I don't know exactly how much it would ease the review, > but if you say it does, I'll separate the patches. Well, I don't know who is ultimately going to end up applying this. I can only say what helps me. If somebody else wants to jump in here and say they're good to review and commit the whole thing in one go, so be it... >> A >> possibly more controversial idea is to actually remove the notion of a >> relocatable extension from this patch, and all the supporting >> machinery, and make that a follow-on patch. I think it was arranged >> like that before and somehow got collapsed, but maybe the notion is >> worth revisiting, because this is a lot of code: >> >> 224 files changed, 4713 insertions(+), 293 deletions(-) > > It was like that before indeed. The problem is how to reliably implement > the CREATE EXTENSION WITH SCHEMA, which is like the second best thing > about all this infrastructure work just behind the pg_dump support. I'm > not clear that this option should come on its own in a later patch or > that doing so simplifies anything, but not opinionated about that: let's > organise ourselves as best as we are able to. I'm not totally certain of it either, but I think it might. >> - The description of the NO USER DATA option is almost content-free. >> It basically says "this option is here because it wouldn't work if we >> didn't have this option". > > + (see <xref linkend="functions-extension">), this option allow > + extension authors to prepare installation scripts that will avoid > + creating user data when restoring. > > I'd need some specifics here to be able to do much better. My guess is > that an example is what needed, and it would fit in the main section of > the manual about it, 35.16. Putting it all together: create extension. No, I think you need to describe what the option actually does, as opposed to what problem it solves. >> - listDependentObjects() appears to be largely cut-and-pasted from >> some other function. It calls AcquireDeletionLock() and the comments >> mention constructing a list of objects to delete. It sure doesn't >> seem right to be acquiring AccessExclusiveLocks on lots of things in a >> function that's there to support the pg_extension_objects SRF. > > Ah well, true. AccessShareLock seems the minimum we can take, as we > surely want to prevent the extension and its objects disappearing under > us, right? On two minutes thought, I'm not entirely sure that we need to take locks at all here, but if we do, AccessShareLock is certainly the right one. One idea I have is that we might want to move AcquireDeletionLock to objectaddress.c or maybe even somewhere in storage/lmgr, rename it to something like LockObjectByAddress, and give it a second argument for the lock strength. >> - This bit certainly violates our message style guidelines: >> >> + elog(NOTICE, >> + "Installing extension '%s' from '%s', in schema %s, >> %s user data", >> + stmt->extname, >> + get_extension_absolute_path(control->script), >> + schema ? schema : "public", >> + create_extension_with_user_data ? "with" : "without"); >> >> User-facing messages need to be ereport, not elog, so they can be >> translated, and mustn't be assembled from pieces, as you've done with >> "%s user data". > > This message is pretty useful for debug and review of the patch, but I'm > not sure we want to keep after that... Either fix it or take it out. If you're proposing this for commit, it shouldn't contain anything you don't want committed. >> - Has the issue of changing custom_variable_classes from PGC_SIGHUP to >> PGC_SUSET been discussed? I am not sure whether that's an OK thing to >> do. If it is OK, then the documentation also needs updating. > > I've been talking about it since the very firsts versions of the patch > on this list but didn't receive much answer. You have a detailed mail > about that later in the thread, will start a new thread about that from > there. Sorry, I missed that discussion. As you know I try to follow -hackers pretty closely, but apparently not closely enough in this case. Can you provide links to the archives? >> - This comment looks like leftovers: >> >> + /* FIXME: add PGC_EXTENSION so that we don't abuse PGC_SIGHUP here? */ >> + SetConfigOption("custom_variable_classes", >> + newval, PGC_SIGHUP, PGC_S_EXTENSION); >> >> Apologies if I missed the previous discussion of this, but why are we >> adding a new GUC context? > > Well we're using PGC_SIGHUP and PGC_S_EXTENSION, should we use > PGC_EXTENSION too is what the comment asks, so it's not very clear but > not a leftover. We're adding a new GUC source here, not a new context. > > Let's include that in the new thread about cvc, though. OK. >> - Did we decide to ditch the encoding parameter for extension scripts >> and mandate UTF-8? > > No we didn't, we decided that the default encoding is UTF-8 and that any > contrib script defaults to UTF-8, so that it's not necessary to care > about the 'encoding' parameter in the control file there. > > Itagaki required that the extension code is encoding aware, and it > wasn't clear that the control file 'encoding' parameter would be enough > in his side of the world, so the encoding support is currently offered > to authors in the control file and users still can override it in the > CREATE EXTENSION command. > > I'm about sure that we don't want to remove that support, and I think > that the command part of it ain't required, but that's for people with > complex encoding problems to tell us IMO. Oh, I wasn't aware that Itagaki-san had objected to Tom's proposal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>> - Did we decide to ditch the encoding parameter for extension scripts >>> and mandate UTF-8? >> >> No we didn't, we decided that the default encoding is UTF-8 and that any >> contrib script defaults to UTF-8, so that it's not necessary to care >> about the 'encoding' parameter in the control file there. >> >> Itagaki required that the extension code is encoding aware, and it >> wasn't clear that the control file 'encoding' parameter would be enough >> in his side of the world, so the encoding support is currently offered >> to authors in the control file and users still can override it in the >> CREATE EXTENSION command. I think 'encoding' parameter is enough. Of course embedding encoding specifiers in SQL files are better, but there would be technical problems. (Just for reference: http://www.python.org/dev/peps/pep-0263/ ) >> I'm about sure that we don't want to remove that support, and I think >> that the command part of it ain't required, but that's for people with >> complex encoding problems to tell us IMO. > > Oh, I wasn't aware that Itagaki-san had objected to Tom's proposal. I agree that "the default encoding is UTF-8", but it should be configurable by the 'encoding' parameter in control files. If we use UTF-8 as the the default encoding, we need to treat 3 encodings at once (server, client, and UTF-8) anyway. So, I think no additional complexity will be added even if we support a configurable encoding as the third encoding. -- Itagaki Takahiro
Re: Extensions and custom_variable_classes (was: Extensions, patch v20 (bitrot fixes))
From
Robert Haas
Date:
On Sun, Dec 19, 2010 at 5:41 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > ... users IMO should not be concerned with custom_variable_classes at > all. The only users that know about it have already written an extension > in C. I agree with the goal. > Now, I can see this mechanism evolving in several directions. Either we > remove it entirely, or we add a list ok known custom_variable_classes > and SINVAL support for it so that as soon as the setting changes, all > concurrent backends know about it, and so that you have a catalog cache > to walk through rather than a real catalog to seqscan and backend init. It'd certainly be better to jigger this so that we do syscache lookups for each CVC that is actually needed rather than seq-scanning the whole catalog. Even that is going to be slower than the present method, but at least the overhead is proportional to the number of CVCs you're actually using - and in particular, it's zero if no CVCs are in use, which must be regarded as one of the common cases. But I'm not sure that by itself is enough to save this proposal. Right now, custom_variable_classes is a server-wide setting. But what happens if you install an extension into database A and then set a related configuration parameter in postgresql.conf? When connecting to database A, things are OK. But as soon as someone tries to connect to database B, the wheels come off. Either the connection fails (which seems pretty undesirable) or it works but we don't process those GUCs (which is outright broken if the extension requires that every backend see the same value - think for example of a PGC_POSTMASTER option controlling shared memory allocation). My gut feeling at the moment is that, given enough time and thought, there may well be a way to work through all of this and come up with a design that works. But we're pretty tight on time right now, and even if we had a perfect design today, I think I'd still argue for putting it in a separate patch. I think that the value of extensions is first and foremost that they can simplify installing, removing, dumping, and restoring extensions. I'd rather see us nail that, and then worry about custom_variable_classes as a separate issue, likely for 9.2 or beyond. Otherwise, I am afraid (as I am for all the big patches we have in the works at this point) that we may end up with nothing. That would be a real shame. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Extensions and custom_variable_classes (was: Extensions, patch v20 (bitrot fixes))
From
Itagaki Takahiro
Date:
On Sun, Dec 19, 2010 at 23:45, Robert Haas <robertmhaas@gmail.com> wrote: > I think I'd still argue for putting > it in a separate patch. I think that the value of extensions is first > and foremost that they can simplify installing, removing, dumping, and > restoring extensions. I'd rather see us nail that, and then worry > about custom_variable_classes as a separate issue, likely for 9.2 or > beyond. +1 to split the custom_variable_classes issue. It's a longstanding TODO item, but EXTENSION is still very useful without the fix. It's my guess that ExtensionSetCVC() can initialize modules one-by-one and be called on demand, for example, at SHOW, searching pg_settings, or assignment to variables with unknown classes. -- Itagaki Takahiro
Re: Extensions and custom_variable_classes (was: Extensions, patch v20 (bitrot fixes))
From
Tom Lane
Date:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Now, for people following but not reading the patch, what's in is that > in order for extensions using custom_variable_classes to work without > the user having to care about it, I've added an step at backend startup > time to seqscan pg_extension and update custom_variable_classes from > this catalog. I agree with Robert that that is an utterly horrid, broken concept. Just to point out one concrete problem: the postmaster reads postgresql.conf too, so it would have to do this as well in order to parse postgresql.conf correctly. Please remove it. If you think of a non-broken way to do this later, we can revisit the problem then. regards, tom lane
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: >> Oh, I wasn't aware that Itagaki-san had objected to Tom's proposal. > I agree that "the default encoding is UTF-8", but it should be > configurable by the 'encoding' parameter in control files. Why is it necessary to have such a parameter at all? AFAICS it just adds complexity for little if any gain. Most extension files will probably be pure ASCII anyway. Dictionary files are *far* more likely to contain non-ASCII characters. If we've gotten along fine with requiring dictionary files to be UTF8, I can't see any reason why we can't or shouldn't take the same approach to extension files. > So, I think no additional complexity will be added even if we > support a configurable encoding as the third encoding. This is nonsense. The mere existence of the parameter requires code to support it and user documentation to explain it. regards, tom lane
Re: Extensions and custom_variable_classes (was: Extensions, patch v20 (bitrot fixes))
From
Robert Haas
Date:
On Sun, Dec 19, 2010 at 11:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I agree with Robert that that is an utterly horrid, broken concept. That's a little stronger than what I said, though the same general idea. > Just to point out one concrete problem: the postmaster reads > postgresql.conf too, so it would have to do this as well in order to > parse postgresql.conf correctly. This is an even more serious problem than what I was pointing out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane <tgl@sss.pgh.pa.us> writes: > Just to point out one concrete problem: the postmaster reads > postgresql.conf too, so it would have to do this as well in order to > parse postgresql.conf correctly. Well, if I parse you correctly, in my patch, it does. There's another GUC array where to store invalid placeholders so that we can revisit later. That even allows custom_variable_classes not to be parsed first in the list, even if that's not put into use in the patch. > Please remove it. If you think of a non-broken way to do this later, > we can revisit the problem then. Sure. I'm curious about understanding how it's broken and if there's any will at all to have something better (and what better would be), but it's now obvious that it's not going to help the extension patch. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > +1 to split the custom_variable_classes issue. It's a longstanding > TODO item, but EXTENSION is still very useful without the fix. > > It's my guess that ExtensionSetCVC() can initialize modules one-by-one > and be called on demand, for example, at SHOW, searching pg_settings, > or assignment to variables with unknown classes. Just so that we're on the same line here, if we are to remove custom_variable_classes then we don't keep any GUC related code into the extension patch, right? So that ExtensionSetCVC() and friends disappear entirely. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Tom Lane <tgl@sss.pgh.pa.us> writes: > Why is it necessary to have such a parameter at all? AFAICS it just > adds complexity for little if any gain. Most extension files will > probably be pure ASCII anyway. Dictionary files are *far* more likely > to contain non-ASCII characters. If we've gotten along fine with > requiring dictionary files to be UTF8, I can't see any reason why we > can't or shouldn't take the same approach to extension files. Don't forget that extensions are not just contrib or third party Open Source software, but a lot of in-house code, mostly functions written in SQL and PLpgSQL. In non-English speaking countries, the parameter names and comments are typically not written in English. When we're talking Japan they have some quite specifics needs and I came to understand that the database encoding and the script encoding are not to be the same, usually. So I still vote for handling this parameter. >> So, I think no additional complexity will be added even if we >> support a configurable encoding as the third encoding. > > This is nonsense. The mere existence of the parameter requires code > to support it and user documentation to explain it. The whole documentation needs to be: <varlistentry> <term><replaceable class="parameter">encoding</replaceable></term> <listitem> <para> The <literal>encoding</literal> in which the script file is read. </para> </listitem> </varlistentry> The code to support that is on the order of 25 lines of code, once we get rid of the SQL command level support for it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Dec 20, 2010 at 01:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I agree that "the default encoding is UTF-8", but it should be >> configurable by the 'encoding' parameter in control files. > > Why is it necessary to have such a parameter at all? UTF-8 is not a superset of all encodings. -- Itagaki Takahiro
On Sun, Dec 19, 2010 at 1:35 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Just to point out one concrete problem: the postmaster reads >> postgresql.conf too, so it would have to do this as well in order to >> parse postgresql.conf correctly. > > Well, if I parse you correctly, in my patch, it does. There's another > GUC array where to store invalid placeholders so that we can revisit > later. That even allows custom_variable_classes not to be parsed first > in the list, even if that's not put into use in the patch. I bet it doesn't. The *postmaster* never connects to a database, so which copy of pg_extension does it ever read? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 20, 2010 at 03:39, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Just so that we're on the same line here, if we are to remove > custom_variable_classes then we don't keep any GUC related code into the > extension patch, right? So that ExtensionSetCVC() and friends disappear > entirely. I think so. It would be better to remove the CVC support and related code. Preloading modules that defines CVC is a good direction to fix the issue, but we need more consideration about where to do it. -- Itagaki Takahiro
Robert Haas <robertmhaas@gmail.com> writes: > I bet it doesn't. The *postmaster* never connects to a database, so > which copy of pg_extension does it ever read? None, which does it need to read? My answer is none, you're saying it's wrong, I don't get why. postmaster surely has no business with what's in a specific database and no use at all of placeholder GUCs, right? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > I think so. It would be better to remove the CVC support and related code. Will isolate that into another branch just in case and prepare a patch with that removed. > Preloading modules that defines CVC is a good direction to fix the issue, > but we need more consideration about where to do it. I checked for doing that too, but both shared_preload_libraries and local_preload_libraries are managed way earlier than the point where we can connect to a database a check what extensions it contains. So that appeared to me as a dead-end. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Dec 20, 2010 at 3:56 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I bet it doesn't. The *postmaster* never connects to a database, so >> which copy of pg_extension does it ever read? > > None, which does it need to read? My answer is none, you're saying it's > wrong, I don't get why. postmaster surely has no business with what's in > a specific database and no use at all of placeholder GUCs, right? Let's take a concrete example. Suppose the user installs the extension 'pg_stat_statements' and puts the following into their postgresql.conf: pg_stat_statements.max = 31415; The effect of that has to be that the postmaster adds a certain amount of space to PostgreSQL's initial shared memory allocation. That means the postmaster has to know that pg_stat_statements is a valid custom variable class. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > The effect of that has to be that the postmaster adds a certain amount > of space to PostgreSQL's initial shared memory allocation. That means > the postmaster has to know that pg_stat_statements is a valid custom > variable class. Ah. Yes. Indeed. So you still needed to edit postgresql.conf in such cases. Well there's only 1 contrib module that touches SHM, but is also happen to be the only one that needs custom_variable_classes support… Meanwhile, the custom_variable_classes related code has been removed from the extension's git branch, and the WITH ENCODING option is no more (the control file encoding is still there and defaults to UTF-8). Will continue collecting improvements ideas and cleanup needs as we go, unless you prefer to see new patches (they are easy enough to produce). http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Dec 20, 2010 at 11:36 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> The effect of that has to be that the postmaster adds a certain amount >> of space to PostgreSQL's initial shared memory allocation. That means >> the postmaster has to know that pg_stat_statements is a valid custom >> variable class. > > Ah. Yes. Indeed. So you still needed to edit postgresql.conf in such > cases. Well there's only 1 contrib module that touches SHM, but is also > happen to be the only one that needs custom_variable_classes support… > > Meanwhile, the custom_variable_classes related code has been removed > from the extension's git branch, and the WITH ENCODING option is no more > (the control file encoding is still there and defaults to UTF-8). > > Will continue collecting improvements ideas and cleanup needs as we go, > unless you prefer to see new patches (they are easy enough to produce). > > http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension Patches are better for me, anyway... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Patches are better for me, anyway... Here it is then, version 21. Changes: - docs cleanup - downgrade a debug message from NOTICE to DEBUG1 - get rid of custom_variable_classes - get rid of CREATE EXTENSION WITH ENCODING option - some context for WITH NO DATA option docs The docs are updated in their HTML form too, if that's easier to read: http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachment
Excerpts from Dimitri Fontaine's message of lun dic 20 14:25:14 -0300 2010: > Robert Haas <robertmhaas@gmail.com> writes: > > Patches are better for me, anyway... > > Here it is then, version 21. Changes: Just noticed a small problem: you're removing the "SET search_path" lines in contrib Makefiles but you're leaving the "Adjust this setting to control where the objects get created" line behind, which should be removed as well. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: > On Mon, Dec 20, 2010 at 01:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I agree that "the default encoding is UTF-8", but it should be > >> configurable by the 'encoding' parameter in control files. > > > > Why is it necessary to have such a parameter at all? > > UTF-8 is not a superset of all encodings. I think you mean Unicode is not a superset of all character sets. I've heard this before but never found what's missing. [citation needed]? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: > On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: > > On Mon, Dec 20, 2010 at 01:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> I agree that "the default encoding is UTF-8", but it should be > > >> configurable by the 'encoding' parameter in control files. > > > > > > Why is it necessary to have such a parameter at all? > > > > UTF-8 is not a superset of all encodings. > > I think you mean Unicode is not a superset of all character sets. I've > heard this before but never found what's missing. [citation needed]? Windows-1252, ISO-2022-JP-2 and EUC-TW are such encodings. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: >> I think you mean Unicode is not a superset of all character sets. I've >> heard this before but never found what's missing. [citation needed]? > Windows-1252, ISO-2022-JP-2 and EUC-TW are such encodings. [citation needed]? Exactly what characters are missing, and why would the Unicode people have chosen to leave them out? It's not like they've not heard of those encodings, I'm sure. regards, tom lane
On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: > >> I think you mean Unicode is not a superset of all character sets. I've > >> heard this before but never found what's missing. [citation needed]? > > > Windows-1252, ISO-2022-JP-2 and EUC-TW are such encodings. > > [citation needed]? Exactly what characters are missing, and why would > the Unicode people have chosen to leave them out? It's not like they've > not heard of those encodings, I'm sure. > > regards, tom lane > Here is an interesting description of some of the gotchas: http://en.wikipedia.org/wiki/Windows-1252 Regards, Ken
On Dec 20, 2010, at 11:53 AM, Kenneth Marshall wrote: > Here is an interesting description of some of the gotchas: > > http://en.wikipedia.org/wiki/Windows-1252 FWIW, those are gotchas translating between Windows 1252 and Latin-1. Windows 1252's nerbles translate to UTF-8 just fine. David
Kenneth Marshall <ktm@rice.edu> writes: > On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: >> [citation needed]? Exactly what characters are missing, and why would >> the Unicode people have chosen to leave them out? It's not like they've >> not heard of those encodings, I'm sure. > Here is an interesting description of some of the gotchas: > http://en.wikipedia.org/wiki/Windows-1252 Well, it's interesting, but I see no glyphs on that page that lack Unicode assignments. regards, tom lane
On Mon, Dec 20, 2010 at 03:08:48PM -0500, Tom Lane wrote: > Kenneth Marshall <ktm@rice.edu> writes: > > On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: > >> [citation needed]? Exactly what characters are missing, and why would > >> the Unicode people have chosen to leave them out? It's not like they've > >> not heard of those encodings, I'm sure. > > > Here is an interesting description of some of the gotchas: > > http://en.wikipedia.org/wiki/Windows-1252 > > Well, it's interesting, but I see no glyphs on that page that lack > Unicode assignments. > > regards, tom lane > You are correct. I mis-read the text. Regards, Ken
2010/12/20 Martijn van Oosterhout <kleptog@svana.org>: > On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: > >> UTF-8 is not a superset of all encodings. > > I think you mean Unicode is not a superset of all character sets. I've > heard this before but never found what's missing. [citation needed]? From <URL:http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings>: "Unicode is supposed to solve all encoding problems in all languages of the world. [..] There are still controversies. For Japanese, the kanji characters have been unified with Chinese, that is a character considered to be the same in both Japanese and Chinese have been given one and the same code number in Unicode, even if they look a little different. This process, called Han unification, has caused controversy." For examples (my browser doesn't show any differences though, probably because I don't have the corresponding fonts): <URL:http://en.wikipedia.org/wiki/Han_unification#Examples_of_language_dependent_characters> Nicolas
On Mon, Dec 20, 2010 at 10:15:56PM +0100, Nicolas Barbier wrote: > >From <URL:http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings>: > > "Unicode is supposed to solve all encoding problems in all languages > of the world. [..] There are still controversies. For Japanese, the > kanji characters have been unified with Chinese, that is a character > considered to be the same in both Japanese and Chinese have been given > one and the same code number in Unicode, even if they look a little > different. This process, called Han unification, has caused > controversy." From http://en.wikipedia.org/wiki/CJK_Unified_Ideographs: "However, the source separation rule states that characters encoded separately in an earlier character set would remain separate in the new Unicode encoding." From all the references I've seen this has been applied everywhere and any failures to roundtrip conversions are considered bugs and I can't believe that at this point they havn't all been fixed. This is kind of underscored by the fact that references always point to theoretical problems rather than actual lists of characters that can't be converted. ISTM that since all the mapping tables are public it should be a SMOP to *prove* roundtrip conversions are safe, or identify the problems. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
On Tue, Dec 21, 2010 at 08:04, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Mon, Dec 20, 2010 at 10:15:56PM +0100, Nicolas Barbier wrote: >> >From <URL:http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings>: > ISTM that since all the mapping tables are public it should be a SMOP > to *prove* roundtrip conversions are safe, or identify the problems. Another issue in Japanese users is EUDC (End User Defined Character). Unfortunately for both postgres developers and application developers in Japan, many machine dependence characters are still used in popular mobile phones in Japan. Their native encoding is SHIFT_JIS, and we have an EUDC mapping for SHIFT_JIS to/from EUC_JP. But we don't have for UTF-8 to/from other encodings. That is one of the reasons why we cannot move to the UTF-8 world completely. Imagine that a module that manipulate EUDC text. It will be written in EUC_JP because SHIFT_JIS is not supported in postgres. Also, it cannot be rewritten in UTF-8 because there are no mapping for the characters used in it. -- Itagaki Takahiro