Further cleanup of pg_dump/pg_restore item selection code - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Further cleanup of pg_dump/pg_restore item selection code |
Date | |
Msg-id | 32668.1516848577@sss.pgh.pa.us Whole thread Raw |
Responses |
Further cleanup of pg_dump/pg_restore item selection code
Re: Further cleanup of pg_dump/pg_restore item selection code |
List | pgsql-hackers |
As I've been hacking on the pg_dump code recently, I got annoyed at the ugliness of its code for deciding whether or not to emit database-related TOC entries. That logic is implemented in a completely different place from other TOC entry selection decisions, and it has a bunch of strange behaviors that can really only be characterized as bugs. An example is that pg_dump --create -t sometable somedb will not emit a CREATE DATABASE command as you might expect, because the use of -t prevents the DATABASE TOC entry from being made in the first place. Also, pg_dump --clean --create -t sometable somedb will not emit any DROP commands: the code expects that with the combination of --clean and --create, it should emit only DROP DATABASE, but nothing happens because there's no DATABASE entry. The same behaviors occur if you do e.g. pg_dump -Fc -t sometable somedb | pg_restore --create which is another undocumented misbehavior: the docs for pg_restore do not suggest that --create might fail if the source archive was selective. Another set of problems is that if you use pg_restore's item selection switches (-t etc), those do not restore ACLs, comments, or security labels associated with the selected object(s). This is strange, and unlike the behavior of the comparable switches in pg_dump, and certainly undocumented. Attached is a patch that proposes to clean this up. It arranges things so that CREATE DATABASE (and, if --clean, DROP DATABASE) is emitted if and only if you said --create, regardless of other selectivity switches. And it fixes selective pg_restore to include subsidiary ACLs, comments, and security labels. This does not go all the way towards making pg_restore's item selection switches dump subsidiary objects the same as pg_dump would, because pg_restore doesn't really have enough info to deal with indexes and table constraints the way pg_dump does. Perhaps we could intuit the parent table by examining object dependencies, but that would be a bunch of new logic that seems like fit material for a different patch. In the meantime I just added a disclaimer to pg_restore.sgml explaining this. I also made an effort to make _tocEntryRequired() a little better organized and better documented. It's grown a lot of different behaviors over the years without much thought about layout or keeping related checks together. (There's a chunk in the middle of the function that now needs to be indented one stop to the right, but in the interests of keeping the patch reviewable, I've not done so yet.) Comments? regards, tom lane diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml index a2ebf75..b1b507f 100644 *** a/doc/src/sgml/ref/pg_restore.sgml --- b/doc/src/sgml/ref/pg_restore.sgml *************** *** 446,451 **** --- 446,456 ---- flag of <application>pg_dump</application>. There is not currently any provision for wild-card matching in <application>pg_restore</application>, nor can you include a schema name within its <option>-t</option>. + And, while <application>pg_dump</application>'s <option>-t</option> + flag will also dump subsidiary objects (such as indexes) of the + selected table(s), + <application>pg_restore</application>'s <option>-t</option> + flag does not include such subsidiary objects. </para> </note> diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index fb03793..051df9e 100644 *** a/src/bin/pg_dump/pg_backup_archiver.c --- b/src/bin/pg_dump/pg_backup_archiver.c *************** static void _selectOutputSchema(ArchiveH *** 70,76 **** static void _selectTablespace(ArchiveHandle *AH, const char *tablespace); static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te); static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te); ! static teReqs _tocEntryRequired(TocEntry *te, teSection curSection, RestoreOptions *ropt); static RestorePass _tocEntryRestorePass(TocEntry *te); static bool _tocEntryIsACL(TocEntry *te); static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te); --- 70,76 ---- static void _selectTablespace(ArchiveHandle *AH, const char *tablespace); static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te); static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te); ! static teReqs _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH); static RestorePass _tocEntryRestorePass(TocEntry *te); static bool _tocEntryIsACL(TocEntry *te); static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te); *************** ProcessArchiveRestoreOptions(Archive *AH *** 312,318 **** if (te->section != SECTION_NONE) curSection = te->section; ! te->reqs = _tocEntryRequired(te, curSection, ropt); } /* Enforce strict names checking */ --- 312,318 ---- if (te->section != SECTION_NONE) curSection = te->section; ! te->reqs = _tocEntryRequired(te, curSection, AH); } /* Enforce strict names checking */ *************** RestoreArchive(Archive *AHX) *** 488,496 **** * In createDB mode, issue a DROP *only* for the database as a * whole. Issuing drops against anything else would be wrong, * because at this point we're connected to the wrong database. ! * Conversely, if we're not in createDB mode, we'd better not ! * issue a DROP against the database at all. (The DATABASE ! * PROPERTIES entry, if any, works like the DATABASE entry.) */ if (ropt->createDB) { --- 488,495 ---- * In createDB mode, issue a DROP *only* for the database as a * whole. Issuing drops against anything else would be wrong, * because at this point we're connected to the wrong database. ! * (The DATABASE PROPERTIES entry, if any, should be treated like ! * the DATABASE entry.) */ if (ropt->createDB) { *************** RestoreArchive(Archive *AHX) *** 498,509 **** strcmp(te->desc, "DATABASE PROPERTIES") != 0) continue; } - else - { - if (strcmp(te->desc, "DATABASE") == 0 || - strcmp(te->desc, "DATABASE PROPERTIES") == 0) - continue; - } /* Otherwise, drop anything that's selected and has a dropStmt */ if (((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0) && te->dropStmt) --- 497,502 ---- *************** restore_toc_entry(ArchiveHandle *AH, Toc *** 752,776 **** AH->currentTE = te; - /* Work out what, if anything, we want from this entry */ - reqs = te->reqs; - - /* - * Ignore DATABASE and related entries unless createDB is specified. We - * must check this here, not in _tocEntryRequired, because !createDB - * should not prevent emitting these entries to an archive file. - */ - if (!ropt->createDB && - (strcmp(te->desc, "DATABASE") == 0 || - strcmp(te->desc, "DATABASE PROPERTIES") == 0 || - (strcmp(te->desc, "ACL") == 0 && - strncmp(te->tag, "DATABASE ", 9) == 0) || - (strcmp(te->desc, "COMMENT") == 0 && - strncmp(te->tag, "DATABASE ", 9) == 0) || - (strcmp(te->desc, "SECURITY LABEL") == 0 && - strncmp(te->tag, "DATABASE ", 9) == 0))) - reqs = 0; - /* Dump any relevant dump warnings to stderr */ if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0) { --- 745,750 ---- *************** restore_toc_entry(ArchiveHandle *AH, Toc *** 780,785 **** --- 754,762 ---- write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt); } + /* Work out what, if anything, we want from this entry */ + reqs = te->reqs; + defnDumped = false; /* *************** PrintTOCSummary(Archive *AHX) *** 1191,1197 **** if (te->section != SECTION_NONE) curSection = te->section; if (ropt->verbose || ! (_tocEntryRequired(te, curSection, ropt) & (REQ_SCHEMA | REQ_DATA)) != 0) { char *sanitized_name; char *sanitized_schema; --- 1168,1174 ---- if (te->section != SECTION_NONE) curSection = te->section; if (ropt->verbose || ! (_tocEntryRequired(te, curSection, AH) & (REQ_SCHEMA | REQ_DATA)) != 0) { char *sanitized_name; char *sanitized_schema; *************** StrictNamesCheck(RestoreOptions *ropt) *** 2824,2839 **** } } static teReqs ! _tocEntryRequired(TocEntry *te, teSection curSection, RestoreOptions *ropt) { teReqs res = REQ_SCHEMA | REQ_DATA; /* ENCODING and STDSTRINGS items are treated specially */ if (strcmp(te->desc, "ENCODING") == 0 || strcmp(te->desc, "STDSTRINGS") == 0) return REQ_SPECIAL; /* If it's an ACL, maybe ignore it */ if (ropt->aclsSkip && _tocEntryIsACL(te)) return 0; --- 2801,2842 ---- } } + /* + * Determine whether we want to restore this TOC entry. + * + * Returns 0 if entry should be skipped, or some combination of the + * REQ_SCHEMA and REQ_DATA bits if we want to restore schema and/or data + * portions of this TOC entry, or REQ_SPECIAL if it's a special entry. + */ static teReqs ! _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH) { teReqs res = REQ_SCHEMA | REQ_DATA; + RestoreOptions *ropt = AH->public.ropt; /* ENCODING and STDSTRINGS items are treated specially */ if (strcmp(te->desc, "ENCODING") == 0 || strcmp(te->desc, "STDSTRINGS") == 0) return REQ_SPECIAL; + /* + * DATABASE and DATABASE PROPERTIES also have a special rule: they are + * restored in createDB mode, and not restored otherwise, independently of + * all else. + */ + if (strcmp(te->desc, "DATABASE") == 0 || + strcmp(te->desc, "DATABASE PROPERTIES") == 0) + { + if (ropt->createDB) + return REQ_SCHEMA; + else + return 0; + } + + /* + * Process exclusions that affect certain classes of TOC entries. + */ + /* If it's an ACL, maybe ignore it */ if (ropt->aclsSkip && _tocEntryIsACL(te)) return 0; *************** _tocEntryRequired(TocEntry *te, teSectio *** 2842,2852 **** if (ropt->no_publications && strcmp(te->desc, "PUBLICATION") == 0) return 0; ! /* If it's security labels, maybe ignore it */ if (ropt->no_security_labels && strcmp(te->desc, "SECURITY LABEL") == 0) return 0; ! /* If it's a subcription, maybe ignore it */ if (ropt->no_subscriptions && strcmp(te->desc, "SUBSCRIPTION") == 0) return 0; --- 2845,2855 ---- if (ropt->no_publications && strcmp(te->desc, "PUBLICATION") == 0) return 0; ! /* If it's a security label, maybe ignore it */ if (ropt->no_security_labels && strcmp(te->desc, "SECURITY LABEL") == 0) return 0; ! /* If it's a subscription, maybe ignore it */ if (ropt->no_subscriptions && strcmp(te->desc, "SUBSCRIPTION") == 0) return 0; *************** _tocEntryRequired(TocEntry *te, teSectio *** 2870,2876 **** return 0; } ! /* Check options for selective dump/restore */ if (ropt->schemaNames.head != NULL) { /* If no namespace is specified, it means all. */ --- 2873,2924 ---- return 0; } ! /* Ignore it if rejected by idWanted[] (cf. SortTocFromFile) */ ! if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1]) ! return 0; ! ! /* ! * Check options for selective dump/restore. ! */ ! if (strcmp(te->desc, "ACL") == 0 || ! strcmp(te->desc, "COMMENT") == 0 || ! strcmp(te->desc, "SECURITY LABEL") == 0) ! { ! /* Database properties react to createDB, not selectivity options. */ ! if (strncmp(te->tag, "DATABASE ", 9) == 0) ! { ! if (!ropt->createDB) ! return 0; ! } ! else if (ropt->schemaNames.head != NULL || ! ropt->schemaExcludeNames.head != NULL || ! ropt->selTypes) ! { ! /* ! * In a selective dump/restore, we want to restore these dependent ! * TOC entry types only if their parent object is being restored. ! * Without selectivity options, we let through everything in the ! * archive. Note there may be such entries with no parent, eg ! * non-default ACLs for built-in objects. ! * ! * This code depends on the parent having been marked already, ! * which should be the case; if it isn't, perhaps due to ! * SortTocFromFile rearrangement, skipping the dependent entry ! * seems prudent anyway. ! * ! * Ideally we'd handle, eg, table CHECK constraints this way too. ! * But it's hard to tell which of their dependencies is the one to ! * consult. ! */ ! if (te->nDeps != 1 || ! TocIDRequired(AH, te->dependencies[0]) == 0) ! return 0; ! } ! } ! else ! { ! /* Apply selective-restore rules for standalone TOC entries. */ ! /* XXX reindent before committing */ if (ropt->schemaNames.head != NULL) { /* If no namespace is specified, it means all. */ *************** _tocEntryRequired(TocEntry *te, teSectio *** 2926,2934 **** else return 0; } /* ! * Check if we had a dataDumper. Indicates if the entry is schema or data */ if (!te->hadDumper) { --- 2974,2986 ---- else return 0; } + } /* ! * Determine whether the TOC entry contains schema and/or data components, ! * and mask off inapplicable REQ bits. If it had a dataDumper, assume ! * it's both schema and data. Otherwise it's probably schema-only, but ! * there are exceptions. */ if (!te->hadDumper) { *************** _tocEntryRequired(TocEntry *te, teSectio *** 2952,2957 **** --- 3004,3013 ---- res = res & ~REQ_DATA; } + /* If there's no definition command, there's no schema component */ + if (!te->defn || !te->defn[0]) + res = res & ~REQ_SCHEMA; + /* * Special case: <Init> type with <Max OID> tag; this is obsolete and we * always ignore it. *************** _tocEntryRequired(TocEntry *te, teSectio *** 2963,2974 **** if (ropt->schemaOnly) { /* ! * The sequence_data option overrides schema-only for SEQUENCE SET. * ! * In binary-upgrade mode, even with schema-only set, we do not mask ! * out large objects. Only large object definitions, comments and ! * other information should be generated in binary-upgrade mode (not ! * the actual data). */ if (!(ropt->sequence_data && strcmp(te->desc, "SEQUENCE SET") == 0) && !(ropt->binary_upgrade && --- 3019,3030 ---- if (ropt->schemaOnly) { /* ! * The sequence_data option overrides schemaOnly for SEQUENCE SET. * ! * In binary-upgrade mode, even with schemaOnly set, we do not mask ! * out large objects. (Only large object definitions, comments and ! * other metadata should be generated in binary-upgrade mode, not the ! * actual data, but that need not concern us here.) */ if (!(ropt->sequence_data && strcmp(te->desc, "SEQUENCE SET") == 0) && !(ropt->binary_upgrade && *************** _tocEntryRequired(TocEntry *te, teSectio *** 2986,2999 **** if (ropt->dataOnly) res = res & REQ_DATA; - /* Mask it if we don't have a schema contribution */ - if (!te->defn || strlen(te->defn) == 0) - res = res & ~REQ_SCHEMA; - - /* Finally, if there's a per-ID filter, limit based on that as well */ - if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1]) - return 0; - return res; } --- 3042,3047 ---- diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 50399c9..749182e 100644 *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *************** main(int argc, char **argv) *** 632,637 **** --- 632,644 ---- #endif /* + * If emitting an archive format, we always want to emit a DATABASE item, + * in case --create is specified at pg_restore time. + */ + if (!plainText) + dopt.outputCreateDB = 1; + + /* * On Windows we can only have at most MAXIMUM_WAIT_OBJECTS (= 64 usually) * parallel jobs because that's the maximum limit for the * WaitForMultipleObjects() call. *************** main(int argc, char **argv) *** 841,847 **** dumpStdStrings(fout); /* The database items are always next, unless we don't want them at all */ ! if (dopt.include_everything && !dopt.dataOnly) dumpDatabase(fout); /* Now the rearrangeable objects. */ --- 848,854 ---- dumpStdStrings(fout); /* The database items are always next, unless we don't want them at all */ ! if (dopt.outputCreateDB) dumpDatabase(fout); /* Now the rearrangeable objects. */
pgsql-hackers by date: