Thread: A different approach to extension NO USER DATA feature
As I work through the extensions patch, the aspect of it that I like the least is the NO USER DATA clause and related functions. I think it's badly designed, badly implemented, and doesn't solve the problem. If we accept it as-is, we'll soon have to rework it, and at that point we'll be left with a large backwards-compatibility issue. To recap: the problem we need to solve is the possibility that an extension has configuration tables whose contents might get altered within a particular database; if so, we need to have pg_dump dump out those alterations so that the configuration will be preserved. This isn't just an academic issue; contrib/tsearch2 for example had tables exactly like that, before it got absorbed into core, and I believe PostGIS has such today. The solution offered by the submitted patch looks like this: 1. The CREATE EXTENSION command has an optional clause NO USER DATA, which users are not supposed to include when initially loading an extension, but which pg_dump will always include when emitting CREATE EXTENSION in dump scripts. 2. There is a function pg_extension_with_user_data() that an extension's SQL script can call to detect whether NO USER DATA was specified. 3. There is a function pg_extension_flag_dump(oid) that an extension's SQL script can call to mark a created object as (sort of) not part of the extension after all, so that pg_dump will dump that object anyway. The way that you'd use these features to solve the problem is to make the extension's SQL script flag the configuration table for dumping, and also avoid creating the table or loading any predefined entries into it if NO USER DATA is active. (Which seems to mean that that option is named backwards, but I digress.) IMO this approach has got numerous serious problems: 1. Because the flag operates at the whole-SQL-object level, the only way to have a configuration table is to exclude the whole table definition from being treated as part of the extension. This makes it impossible to upgrade the table definition (eg, add columns or indexes) when installing a new version of the extension. This also makes it really easy to break the extension during selective restores ... there's no assurance the table will get created at all. 2. Also, because we're working at the whole-SQL-object level, there's no good way to deal with the possibility that the configuration table contains some rows created by the extension itself and others added by the user. All such rows will get dumped and reloaded separately from the extension, meaning there's no way to upgrade the extension-provided initial data either. 3. Use of pg_extension_with_user_data() requires the extension SQL script to contain conditional logic, which is not easy unless you want to assume plpgsql is available. Which extensions really should not do. (Once again: plpgsql may be installed by default, but that doesn't mean it's guaranteed to be present.) 4. If the SQL script uses pg_extension_with_user_data() to decide not to create the table, then it has no way to apply pg_extension_flag_dump, with the result that after a dump and reload cycle, the configuration table doesn't appear to be related to the extension at all. Thus the user has another way to break the extension, which is to accidentally drop the configuration table. So basically I think this approach doesn't work, can't be made to work, and imposes unreasonable burdens on extension authors anyway. If we're going to support user-modifiable configuration tables, we need to do it a different way. After a bit of thought I believe that we can fix this if we are willing to teach pg_dump explicitly about extension configuration tables. The behavior we want for those is for the table schema definition to never be dumped (the table should always be created by CREATE EXTENSION), but for some subset of the table data to get dumped, excluding any system-provided rows. An extension that wants to make use of that ability would probably need to add a boolean column "system_data" or something similar to its configuration tables. Having done so, the extension's SQL script could call a function that identifies the configuration table and tells how to decide which rows to dump, along the lines of pg_extension_partial_dump (table_name regclass, where_condition text) for example, SELECT pg_extension_partial_dump('my_table', 'WHERE NOT system_data'); I'm envisioning that this function would add that information to the pg_extension entry, and then pg_dump would use it to select out a subset of rows that it would dump as user data, even though it's not dumping the table definition. One interesting point is that for this to work during pg_upgrade, we'd need to include those user-data rows in the upgrade dump, otherwise they'd get lost. So at least for pg_upgrade, those rows need to be considered schema not data. I'm not sure if we should handle them that way all the time or whether pg_dump should special-case this for binary upgrades. I've not attempted to implement this idea, but offhand it looks like it would take a day or two's work, which seems well worthwhile to expend now in the hope of getting this feature right the first time. Comments? regards, tom lane
On Feb 6, 2011, at 10:23 AM, Tom Lane wrote: > I've not attempted to implement this idea, but offhand it looks like > it would take a day or two's work, which seems well worthwhile to > expend now in the hope of getting this feature right the first tim Seems worthwhile, and a much better approach. But do we really want to limit it only to tables? Best, David
On Sun, Feb 6, 2011 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've not attempted to implement this idea, but offhand it looks like > it would take a day or two's work, which seems well worthwhile to > expend now in the hope of getting this feature right the first time. > > Comments? The design you propose seems pretty good, and I don't think we can commit this without this change. The scheduling implications are not great, but since this seems like an important patch I guess we should live with it, so +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Tom Lane <tgl@sss.pgh.pa.us> writes: > As I work through the extensions patch, the aspect of it that I like the > least is the NO USER DATA clause and related functions. I think it's > badly designed, badly implemented, and doesn't solve the problem. I'm not loving it either, but wanting to stay general enough, then that's all I came up with. > After a bit of thought I believe that we can fix this if we are willing > to teach pg_dump explicitly about extension configuration tables. Ok, if we're now talking about tables only (which makes a lot of sense, but somehow I didn't want to restrict it that way).… > SELECT pg_extension_partial_dump('my_table', 'WHERE NOT system_data'); > > I'm envisioning that this function would add that information to the > pg_extension entry, and then pg_dump would use it to select out a subset > of rows that it would dump as user data, even though it's not dumping > the table definition. …then what about taking it the extra mile and have something even more simple and easier to check and control? From your idea, where I get is that we could force the system_data column name and type. We're talking about only supporting tables, and a boolean expression is all we will be able to use. Then, if the column name and type are fixed, we only need the non-qualified relation name. I'd expect that in the control file, so that we don't have to provide new SQL level tools. People that need a complex boolean expression will be able to maintain the column from a trigger. If we get there, then all we get to store is an oidvector column in the catalog, the list of the table OIDs we will have to only dump the content of those rows WHERE NOT system_data. Or do you want to keep some generality here? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 06/02/11 18:23, Tom Lane wrote: > After a bit of thought I believe that we can fix this if we are willing > to teach pg_dump explicitly about extension configuration tables. > The behavior we want for those is for the table schema definition to > never be dumped (the table should always be created by CREATE EXTENSION), > but for some subset of the table data to get dumped, excluding any > system-provided rows. [snip] > pg_extension_partial_dump (table_name regclass, where_condition text) Possible alternative approach? 1. Extension provides list of config tables/views/set-returning functions to be dumped via e.g. my_config_tables() 2. They get dumped, but each as a TEMP TABLE (need unique names for multiple extensions though). 3. On restore, tables are created and populated, then read_your_config(<ARRAY-OF-TABLE-NAMES>) is called in the extension. This separates the configuration-for-user from configuration-for-extension. It allows the extension to decide whether to load the new config or reject it. It lets you test/demonstrate multiple configurations fairly simply. The "system_data" column scenario can then be a default implementation of read_your_config(). -- Richard Huxton Archonet Ltd
On Mon, Feb 7, 2011 at 4:18 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Or do you want to keep some generality here? I think it might be slightly advantageous to keep some generality, because some people might already have catalog columns that do this (but with a different name) or might have other reasons for needing an integer "entry type" column from which the system property can be inferred. I can't think how many times I've written a web interface that lets users edit a foo, but denies them the ability to edit any foo where bar_id = 1 (because those are the magic ones that get created some other way). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 7, 2011 at 4:18 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> Or do you want to keep some generality here? > I think it might be slightly advantageous to keep some generality, Yeah. I had also thought about hard-wiring the WHERE clause, but there's at least one big objection to that: it fails to cover cases where there's no need for a flag column because all the entries are user-provided. The catalog representation I'd been thinking of was a regclass[] array for the table names and a text[] array of the same length for the WHERE clauses. It's slightly ugly but no big deal. There are likely to be some other array columns in pg_extension before we're done, anyway --- one I'd been thinking about a bit was OIDs of modules this one depends on. The current design doesn't cope very well with modules that depend on other ones. regards, tom lane
Richard Huxton <dev@archonet.com> writes: > Possible alternative approach? > 1. Extension provides list of config tables/views/set-returning > functions to be dumped via e.g. my_config_tables() > 2. They get dumped, but each as a TEMP TABLE (need unique names for > multiple extensions though). > 3. On restore, tables are created and populated, then > read_your_config(<ARRAY-OF-TABLE-NAMES>) is called in the extension. This is kind of problematic because it requires those tables to be created before the extension is created. One difficulty with that is it would break parallel pg_restore. Another is that it's not unusual for the config tables to not be populatable at all before the extension's been loaded. For example, IIRC how the old contrib/tsearch2 config tables worked, there were regproc columns that contained references to functions created by tsearch2. It is true that the extension author may sometimes need to run some code when the user-provided data is loaded into the config table, but in the design I suggested, that could be handled by attaching triggers to the config table. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > one I'd been thinking about a bit was OIDs of modules this one depends > on. The current design doesn't cope very well with modules that depend > on other ones. Or even at all. I guess here "modules" is referring to shared object libraries, right? Or are you already thinking about extension that depend on other extensions, like earthdistance depends on cube? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> one I'd been thinking about a bit was OIDs of modules this one depends >> on. The current design doesn't cope very well with modules that depend >> on other ones. > Or even at all. I guess here "modules" is referring to shared object > libraries, right? Or are you already thinking about extension that > depend on other extensions, like earthdistance depends on cube? Sorry, I meant module == extension. If it's not intended that we try to support dependent extensions yet, I'd be fine with leaving that for 9.2. However, if earthdistance already has such a dependency, maybe we can't put that issue off. regards, tom lane
On Feb6, 2011, at 19:23 , Tom Lane wrote: > After a bit of thought I believe that we can fix this if we are willing > to teach pg_dump explicitly about extension configuration tables. > The behavior we want for those is for the table schema definition to > never be dumped (the table should always be created by CREATE EXTENSION), > but for some subset of the table data to get dumped, excluding any > system-provided rows. An extension that wants to make use of that > ability would probably need to add a boolean column "system_data" or > something similar to its configuration tables. Having done so, > the extension's SQL script could call a function that identifies the > configuration table and tells how to decide which rows to dump, > along the lines of We could avoid the need for a per-row "system_data" flag if we required extensions to split user-editable and system-provided configuration data into different tables. For convenient access to the configuration data, the extension could let the user-editable table inherit from the system-provided one, or use a view to combine the two. We'd then only need a per-table flag which tells pg_dump to dump the data (but not the structure), so instead of a pg_extension_partial_dump (table_name regclass, where_condition text) we'd have pg_extension_userdata(table_name regclass). Apart from getting rid of the slightly ugly userdata flag-column, this would also make it easier for users to distinguish between user-editable and system-provided configuration options. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > We could avoid the need for a per-row "system_data" flag if we required > extensions to split user-editable and system-provided configuration data > into different tables. For convenient access to the configuration data, > the extension could let the user-editable table inherit from the > system-provided one, or use a view to combine the two. Yeah, this is another approach that one could take instead of having per-row flags. I'm not sure that it's better, much less so much better that we should force extensions to do it that way and not the other. But it's definitely another argument against making a hard-wired assumption that there will be a flag column. regards, tom lane
On Feb 7, 2011, at 8:57 AM, Tom Lane wrote: > Yeah, this is another approach that one could take instead of having > per-row flags. I'm not sure that it's better, much less so much better > that we should force extensions to do it that way and not the other. > But it's definitely another argument against making a hard-wired > assumption that there will be a flag column. Would the flag column usually be invisible, like the system oid column? Best, David
"David E. Wheeler" <david@kineticode.com> writes: > On Feb 7, 2011, at 8:57 AM, Tom Lane wrote: >> Yeah, this is another approach that one could take instead of having >> per-row flags. I'm not sure that it's better, much less so much better >> that we should force extensions to do it that way and not the other. >> But it's definitely another argument against making a hard-wired >> assumption that there will be a flag column. > Would the flag column usually be invisible, like the system oid column? No, it's just a column. regards, tom lane