[PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade - Mailing list pgsql-hackers
| From | Jimmy Angelakos |
|---|---|
| Subject | [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade |
| Date | |
| Msg-id | CAA0-ca0sPhggS1EK25FmpFh_efbT=ZSZWgEOyKzHiUEC27LPnA@mail.gmail.com Whole thread Raw |
| Responses |
Re: [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade
|
| List | pgsql-hackers |
Hi All,
I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is my first code patch, so any feedback on the approach will be appreciated!
The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the schema from $oldcluster to $newcluster. Because this excludes all table data, it leaves out data in extension config tables registered with pg_extension_config_dump().
In $newcluster, binary_upgrade_create_empty_extension() creates the extensions without populating any table data. The extensions' CREATE EXTENSION scripts never get executed so any INSERTs are skipped. As a consequence, if any CREATE TABLE statement in $newcluster requires validation against these empty config tables, the upgrade fails. As an example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial reference system definitions (SRIDs). When a table has, e.g. a geometry column that specifies an SRID, this gets validated during the CREATE TABLE:
CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR: Cannot find SRID (27700) in spatial_ref_sys
This will happen for any SRID-constrained column, which will prevent many real-world PostGIS deployments from being able to pg_upgrade. To summarise the problem, our ordering is wrong here because extension configuration data must be present before user tables that depend on it get created, but --schema-only strips this data.
The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps extension config table data in SECTION_PRE_DATA during --binary-upgrade ONLY. This restores the needed data between extension creation and user object creation, allowing the DDL to succeed.
Four files are modified in bin/pg_dump:
pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between DO_EXTENSION and DO_TYPE
pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE
pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType = DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during --binary-upgrade ONLY. As it depends on the table def, the COPY will be emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with description "EXTENSION DATA" using dumpTableData_copy(). This allows the config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true, call makeExtensionDataInfo() instead of makeTableDataInfo(). Additionally, skip extcondition filter because we need to dump all rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in addBoundaryDependencies()
pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar to BLOB, BLOB METADATA, etc. to include extension config table data in --schema-only dumps during --binary-upgrade ONLY.
What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and do not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA" entries populate tables like spatial_ref_sys with COPY. Subsequent CREATE TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to $newcluster, making spatial_ref_sys byte-for-byte identical to its previous state.
This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade). DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for any extension that registers config tables that will be needed for DDL validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster PostGIS 3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on PG19-devel.
Thanks in advance for your review! Please find attached the patch for HEAD. I believe this should be easily backpatchable to (at least) PG15, and will be happy to work on backports.
Best regards,
Jimmy
I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is my first code patch, so any feedback on the approach will be appreciated!
The problem:
============
pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the schema from $oldcluster to $newcluster. Because this excludes all table data, it leaves out data in extension config tables registered with pg_extension_config_dump().
In $newcluster, binary_upgrade_create_empty_extension() creates the extensions without populating any table data. The extensions' CREATE EXTENSION scripts never get executed so any INSERTs are skipped. As a consequence, if any CREATE TABLE statement in $newcluster requires validation against these empty config tables, the upgrade fails. As an example,
PostGIS registers config table spatial_ref_sys to hold ~8500 spatial reference system definitions (SRIDs). When a table has, e.g. a geometry column that specifies an SRID, this gets validated during the CREATE TABLE:
CREATE TABLE points (id int, location geometry(Point, 27700));
ERROR: Cannot find SRID (27700) in spatial_ref_sys
This will happen for any SRID-constrained column, which will prevent many real-world PostGIS deployments from being able to pg_upgrade. To summarise the problem, our ordering is wrong here because extension configuration data must be present before user tables that depend on it get created, but --schema-only strips this data.
The patch:
==========
We are adding a new dump object type DO_EXTENSION_DATA that dumps extension config table data in SECTION_PRE_DATA during --binary-upgrade ONLY. This restores the needed data between extension creation and user object creation, allowing the DDL to succeed.
Four files are modified in bin/pg_dump:
pg_dump.h:
Add DO_EXTENSION_DATA to the DumpableObjectType enum, between DO_EXTENSION and DO_TYPE
pg_dump_sort.c:
Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE
pg_dump.c:
1. Add makeExtensionDataInfo() to create a TableDataInfo with objType = DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during --binary-upgrade ONLY. As it depends on the table def, the COPY will be emitted after the CREATE TABLE.
2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with description "EXTENSION DATA" using dumpTableData_copy(). This allows the config table data to go into the schema-only dump.
3. In processExtensionTables(), when dopt->binary_upgrade is true, call makeExtensionDataInfo() instead of makeTableDataInfo(). Additionally, skip extcondition filter because we need to dump all rows here.
4. Include DO_EXTENSION_DATA in pre-data boundary in addBoundaryDependencies()
pg_backup_archiver.c:
Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar to BLOB, BLOB METADATA, etc. to include extension config table data in --schema-only dumps during --binary-upgrade ONLY.
What ends up happening:
=======================
The inserted rows are basically scaffolding to allow the upgrade, and do not persist. The pg_upgrade sequence goes like:
1. pg_dump includes $oldcluster extension config data in schema-only dump
2. pg_restore replays the dump into $newcluster and "EXTENSION DATA" entries populate tables like spatial_ref_sys with COPY. Subsequent CREATE TABLEs with e.g. SRID-constrained columns pass validation.
3. pg_upgrade transfers all data files from $oldcluster to $newcluster, making spatial_ref_sys byte-for-byte identical to its previous state.
This patch:
1. Does NOT affect normal pg_dumps (without --binary-upgrade). DO_EXTENSION_DATA objects are not created in this case.
2. Leaves binary_upgrade_create_empty_extension() unchanged.
3. Is not PostGIS-specific, and should solve this class of problem for any extension that registers config tables that will be needed for DDL validation.
4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster PostGIS 3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on PG19-devel.
Thanks in advance for your review! Please find attached the patch for HEAD. I believe this should be easily backpatchable to (at least) PG15, and will be happy to work on backports.
Best regards,
Jimmy
Attachment
pgsql-hackers by date: