Thread: A different approach to extension NO USER DATA feature

A different approach to extension NO USER DATA feature

From
Tom Lane
Date:
As I work through the extensions patch, the aspect of it that I like the
least is the NO USER DATA clause and related functions.  I think it's
badly designed, badly implemented, and doesn't solve the problem.
If we accept it as-is, we'll soon have to rework it, and at that point
we'll be left with a large backwards-compatibility issue.

To recap: the problem we need to solve is the possibility that an
extension has configuration tables whose contents might get altered
within a particular database; if so, we need to have pg_dump dump out
those alterations so that the configuration will be preserved.  This
isn't just an academic issue; contrib/tsearch2 for example had tables
exactly like that, before it got absorbed into core, and I believe
PostGIS has such today.

The solution offered by the submitted patch looks like this:

1. The CREATE EXTENSION command has an optional clause NO USER DATA,
which users are not supposed to include when initially loading an
extension, but which pg_dump will always include when emitting CREATE
EXTENSION in dump scripts.

2. There is a function pg_extension_with_user_data() that an extension's
SQL script can call to detect whether NO USER DATA was specified.

3. There is a function pg_extension_flag_dump(oid) that an extension's
SQL script can call to mark a created object as (sort of) not part of
the extension after all, so that pg_dump will dump that object anyway.

The way that you'd use these features to solve the problem is to make
the extension's SQL script flag the configuration table for dumping, and
also avoid creating the table or loading any predefined entries into it
if NO USER DATA is active.  (Which seems to mean that that option is
named backwards, but I digress.)

IMO this approach has got numerous serious problems:

1. Because the flag operates at the whole-SQL-object level, the only
way to have a configuration table is to exclude the whole table
definition from being treated as part of the extension.  This makes
it impossible to upgrade the table definition (eg, add columns or
indexes) when installing a new version of the extension.  This also
makes it really easy to break the extension during selective restores
... there's no assurance the table will get created at all.

2. Also, because we're working at the whole-SQL-object level, there's
no good way to deal with the possibility that the configuration table
contains some rows created by the extension itself and others added by
the user.  All such rows will get dumped and reloaded separately from
the extension, meaning there's no way to upgrade the extension-provided
initial data either.

3. Use of pg_extension_with_user_data() requires the extension SQL
script to contain conditional logic, which is not easy unless you want
to assume plpgsql is available.  Which extensions really should not do.
(Once again: plpgsql may be installed by default, but that doesn't mean
it's guaranteed to be present.)

4. If the SQL script uses pg_extension_with_user_data() to decide not to
create the table, then it has no way to apply pg_extension_flag_dump,
with the result that after a dump and reload cycle, the configuration
table doesn't appear to be related to the extension at all.  Thus the
user has another way to break the extension, which is to accidentally
drop the configuration table.

So basically I think this approach doesn't work, can't be made to
work, and imposes unreasonable burdens on extension authors anyway.
If we're going to support user-modifiable configuration tables,
we need to do it a different way.

After a bit of thought I believe that we can fix this if we are willing
to teach pg_dump explicitly about extension configuration tables.
The behavior we want for those is for the table schema definition to
never be dumped (the table should always be created by CREATE EXTENSION),
but for some subset of the table data to get dumped, excluding any
system-provided rows.  An extension that wants to make use of that
ability would probably need to add a boolean column "system_data" or
something similar to its configuration tables.  Having done so,
the extension's SQL script could call a function that identifies the
configuration table and tells how to decide which rows to dump,
along the lines of

pg_extension_partial_dump (table_name regclass, where_condition text)

for example,

SELECT pg_extension_partial_dump('my_table', 'WHERE NOT system_data');

I'm envisioning that this function would add that information to the
pg_extension entry, and then pg_dump would use it to select out a subset
of rows that it would dump as user data, even though it's not dumping
the table definition.

One interesting point is that for this to work during pg_upgrade,
we'd need to include those user-data rows in the upgrade dump, otherwise
they'd get lost.  So at least for pg_upgrade, those rows need to be
considered schema not data.  I'm not sure if we should handle them that
way all the time or whether pg_dump should special-case this for binary
upgrades.

I've not attempted to implement this idea, but offhand it looks like
it would take a day or two's work, which seems well worthwhile to
expend now in the hope of getting this feature right the first time.

Comments?
        regards, tom lane


Re: A different approach to extension NO USER DATA feature

From
"David E. Wheeler"
Date:
On Feb 6, 2011, at 10:23 AM, Tom Lane wrote:

> I've not attempted to implement this idea, but offhand it looks like
> it would take a day or two's work, which seems well worthwhile to
> expend now in the hope of getting this feature right the first tim

Seems worthwhile, and a much better approach. But do we really want to limit it only to tables?

Best,

David



Re: A different approach to extension NO USER DATA feature

From
Robert Haas
Date:
On Sun, Feb 6, 2011 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've not attempted to implement this idea, but offhand it looks like
> it would take a day or two's work, which seems well worthwhile to
> expend now in the hope of getting this feature right the first time.
>
> Comments?

The design you propose seems pretty good, and I don't think we can
commit this without this change.  The scheduling implications are not
great, but since this seems like an important patch I guess we should
live with it, so +1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: A different approach to extension NO USER DATA feature

From
Dimitri Fontaine
Date:
Hi,

Tom Lane <tgl@sss.pgh.pa.us> writes:
> As I work through the extensions patch, the aspect of it that I like the
> least is the NO USER DATA clause and related functions.  I think it's
> badly designed, badly implemented, and doesn't solve the problem.

I'm not loving it either, but wanting to stay general enough, then
that's all I came up with.

> After a bit of thought I believe that we can fix this if we are willing
> to teach pg_dump explicitly about extension configuration tables.

Ok, if we're now talking about tables only (which makes a lot of sense,
but somehow I didn't want to restrict it that way).…

> SELECT pg_extension_partial_dump('my_table', 'WHERE NOT system_data');
>
> I'm envisioning that this function would add that information to the
> pg_extension entry, and then pg_dump would use it to select out a subset
> of rows that it would dump as user data, even though it's not dumping
> the table definition.

…then what about taking it the extra mile and have something even more
simple and easier to check and control?

From your idea, where I get is that we could force the system_data
column name and type.  We're talking about only supporting tables, and a
boolean expression is all we will be able to use.  Then, if the column
name and type are fixed, we only need the non-qualified relation name.
I'd expect that in the control file, so that we don't have to provide
new SQL level tools.

People that need a complex boolean expression will be able to maintain
the column from a trigger.

If we get there, then all we get to store is an oidvector column in the
catalog, the list of the table OIDs we will have to only dump the
content of those rows WHERE NOT system_data.

Or do you want to keep some generality here?
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: A different approach to extension NO USER DATA feature

From
Richard Huxton
Date:
On 06/02/11 18:23, Tom Lane wrote:
> After a bit of thought I believe that we can fix this if we are willing
> to teach pg_dump explicitly about extension configuration tables.
> The behavior we want for those is for the table schema definition to
> never be dumped (the table should always be created by CREATE EXTENSION),
> but for some subset of the table data to get dumped, excluding any
> system-provided rows.
[snip]
> pg_extension_partial_dump (table_name regclass, where_condition text)

Possible alternative approach?

1. Extension provides list of config tables/views/set-returning 
functions to be dumped via e.g. my_config_tables()
2. They get dumped, but each as a TEMP TABLE (need unique names for 
multiple extensions though).
3. On restore, tables are created and populated, then 
read_your_config(<ARRAY-OF-TABLE-NAMES>) is called in the extension.

This separates the configuration-for-user from 
configuration-for-extension. It allows the extension to decide whether 
to load the new config or reject it. It lets you test/demonstrate 
multiple configurations fairly simply.

The "system_data" column scenario can then be a default implementation 
of read_your_config().

--   Richard Huxton  Archonet Ltd


Re: A different approach to extension NO USER DATA feature

From
Robert Haas
Date:
On Mon, Feb 7, 2011 at 4:18 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Or do you want to keep some generality here?

I think it might be slightly advantageous to keep some generality,
because some people might already have catalog columns that do this
(but with a different name) or might have other reasons for needing an
integer "entry type" column from which the system property can be
inferred.  I can't think how many times I've written a web interface
that lets users edit a foo, but denies them the ability to edit any
foo where bar_id = 1 (because those are the magic ones that get
created some other way).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: A different approach to extension NO USER DATA feature

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Feb 7, 2011 at 4:18 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>> Or do you want to keep some generality here?

> I think it might be slightly advantageous to keep some generality,

Yeah.  I had also thought about hard-wiring the WHERE clause, but
there's at least one big objection to that: it fails to cover cases
where there's no need for a flag column because all the entries are
user-provided.

The catalog representation I'd been thinking of was a regclass[] array
for the table names and a text[] array of the same length for the WHERE
clauses.  It's slightly ugly but no big deal.  There are likely to be
some other array columns in pg_extension before we're done, anyway ---
one I'd been thinking about a bit was OIDs of modules this one depends
on.  The current design doesn't cope very well with modules that depend
on other ones.
        regards, tom lane


Re: A different approach to extension NO USER DATA feature

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> Possible alternative approach?

> 1. Extension provides list of config tables/views/set-returning 
> functions to be dumped via e.g. my_config_tables()
> 2. They get dumped, but each as a TEMP TABLE (need unique names for 
> multiple extensions though).
> 3. On restore, tables are created and populated, then 
> read_your_config(<ARRAY-OF-TABLE-NAMES>) is called in the extension.

This is kind of problematic because it requires those tables to be
created before the extension is created.  One difficulty with that is it
would break parallel pg_restore.  Another is that it's not unusual for
the config tables to not be populatable at all before the extension's
been loaded.  For example, IIRC how the old contrib/tsearch2 config
tables worked, there were regproc columns that contained references to
functions created by tsearch2.

It is true that the extension author may sometimes need to run some code
when the user-provided data is loaded into the config table, but in the
design I suggested, that could be handled by attaching triggers to the
config table.
        regards, tom lane


Re: A different approach to extension NO USER DATA feature

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> one I'd been thinking about a bit was OIDs of modules this one depends
> on.  The current design doesn't cope very well with modules that depend
> on other ones.

Or even at all.  I guess here "modules" is referring to shared object
libraries, right?  Or are you already thinking about extension that
depend on other extensions, like earthdistance depends on cube?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: A different approach to extension NO USER DATA feature

From
Tom Lane
Date:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> one I'd been thinking about a bit was OIDs of modules this one depends
>> on.  The current design doesn't cope very well with modules that depend
>> on other ones.

> Or even at all.  I guess here "modules" is referring to shared object
> libraries, right?  Or are you already thinking about extension that
> depend on other extensions, like earthdistance depends on cube?

Sorry, I meant module == extension.  If it's not intended that we try to
support dependent extensions yet, I'd be fine with leaving that for 9.2.
However, if earthdistance already has such a dependency, maybe we can't
put that issue off.
        regards, tom lane


Re: A different approach to extension NO USER DATA feature

From
Florian Pflug
Date:
On Feb6, 2011, at 19:23 , Tom Lane wrote:
> After a bit of thought I believe that we can fix this if we are willing
> to teach pg_dump explicitly about extension configuration tables.
> The behavior we want for those is for the table schema definition to
> never be dumped (the table should always be created by CREATE EXTENSION),
> but for some subset of the table data to get dumped, excluding any
> system-provided rows.  An extension that wants to make use of that
> ability would probably need to add a boolean column "system_data" or
> something similar to its configuration tables.  Having done so,
> the extension's SQL script could call a function that identifies the
> configuration table and tells how to decide which rows to dump,
> along the lines of

We could avoid the need for a per-row "system_data" flag if we required
extensions to split user-editable and system-provided configuration data
into different tables. For convenient access to the configuration data,
the extension could let the user-editable table inherit from the 
system-provided one, or use a view to combine the two.

We'd then only need a per-table flag which tells pg_dump to dump the
data (but not the structure), so instead of a
 pg_extension_partial_dump (table_name regclass, where_condition text)

we'd have
 pg_extension_userdata(table_name regclass).

Apart from getting rid of the slightly ugly userdata flag-column, this
would also make it easier for users to distinguish between user-editable
and system-provided configuration options.

best regards,
Florian Pflug



Re: A different approach to extension NO USER DATA feature

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> We could avoid the need for a per-row "system_data" flag if we required
> extensions to split user-editable and system-provided configuration data
> into different tables. For convenient access to the configuration data,
> the extension could let the user-editable table inherit from the 
> system-provided one, or use a view to combine the two.

Yeah, this is another approach that one could take instead of having
per-row flags.  I'm not sure that it's better, much less so much better
that we should force extensions to do it that way and not the other.
But it's definitely another argument against making a hard-wired
assumption that there will be a flag column.
        regards, tom lane


Re: A different approach to extension NO USER DATA feature

From
"David E. Wheeler"
Date:
On Feb 7, 2011, at 8:57 AM, Tom Lane wrote:

> Yeah, this is another approach that one could take instead of having
> per-row flags.  I'm not sure that it's better, much less so much better
> that we should force extensions to do it that way and not the other.
> But it's definitely another argument against making a hard-wired
> assumption that there will be a flag column.

Would the flag column usually be invisible, like the system oid column?

Best,

David



Re: A different approach to extension NO USER DATA feature

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Feb 7, 2011, at 8:57 AM, Tom Lane wrote:
>> Yeah, this is another approach that one could take instead of having
>> per-row flags.  I'm not sure that it's better, much less so much better
>> that we should force extensions to do it that way and not the other.
>> But it's definitely another argument against making a hard-wired
>> assumption that there will be a flag column.

> Would the flag column usually be invisible, like the system oid column?

No, it's just a column.
        regards, tom lane