Re: Non-text mode for pg_dumpall - Mailing list pgsql-hackers
From: Mahendra Singh Thalor
Subject: Re: Non-text mode for pg_dumpall
Msg-id: CAKYtNAqXTvfAw-y4FHvzprg72cFrC63cge9xMVDO_R4=NHc5rA@mail.gmail.com
In response to: Re: Non-text mode for pg_dumpall (Noah Misch <noah@leadboat.com>)
Responses: Re: Non-text mode for pg_dumpall; Re: Non-text mode for pg_dumpall
List: pgsql-hackers
Thanks, Noah, for the feedback.

On Wed, 16 Jul 2025 at 05:50, Noah Misch <noah@leadboat.com> wrote:
>
> On Thu, Jul 10, 2025 at 12:21:03AM +0530, Mahendra Singh Thalor wrote:
> > On Wed, 9 Jul 2025 at 02:58, Noah Misch <noah@leadboat.com> wrote:
> > > On Fri, Apr 04, 2025 at 04:11:05PM -0400, Andrew Dunstan wrote:
> > > > Thanks. I have pushed these now with a few further small tweaks.
> > >
> > > This drops all databases:
> > >
> > >   pg_dumpall --clean -Fd -f /tmp/dump
> > >   pg_restore -d template1 --globals-only /tmp/dump
> > >
> > > That didn't match my expectations given this help text:
> > >
> > >   $ pg_restore --help|grep global
> > >     -g, --globals-only           restore only global objects, no databases
> >
> > Databases are global objects so due to --clean command, we are putting
> > drop commands in global.dat for all the databases. While restoring, we
> > used the "--globals-only" option so we are dropping all these
> > databases by global.dat file.
> >
> > Please let us know your expectations for this specific case.
>
> Be consistent with "pg_dump". A quick check suggests "pg_dump --clean"
> affects plain format only. For non-plain formats, only the pg_restore
> argument governs the final commands:
>
> $ rm -r /tmp/dump; pg_dump --clean -Fd -f /tmp/dump && pg_restore -f- /tmp/dump | grep DROP
> $ rm -r /tmp/dump; pg_dump --clean -Fd -f /tmp/dump && pg_restore --clean -f- /tmp/dump | grep DROP
> DROP TABLE public.example;
>
> That said, you should audit code referencing the --clean flag and see if
> there's more to it than that quick test suggests. Note that aligning with
> pg_dump will require changes for object types beyond databases. "pg_restore
> --clean" of a global dump should emit DROP TABLESPACE and DROP ROLE as
> appropriate, regardless of whether the original pg_dumpall had --clean.
>
> For my earlier example (pg_dumpall --clean; pg_restore --globals-only) I
> expect the same outcome as plain-format "pg_dumpall --globals-only", which is
> no databases dropped or created. The help line says "no databases". Plain
> "pg_dumpall --globals-only" and even "pg_dumpall --globals-only --clean" do
> not drop or create databases.

We give pg_restore a pg_dumpall dump that contains a global.dat file, and
with --clean that file already contains DROP DATABASE commands, so when
'--globals-only' is used we drop the databases simply because those DROP
commands are there. As of now, restore has no filter for the global.dat
file.

If a user wants to restore only globals (without dropping databases), they
should use '--globals-only' with pg_dumpall. Alternatively, if we don't want
global.dat to drop databases, we would have to add a filter in pg_restore,
which is hard to implement because global.dat holds plain SQL commands.

I think, for this case, we can make some more doc changes. Example:
pg_restore --globals-only restores the whole global.dat file, including any
DROP commands, so it may drop databases if such commands are present.

@Andrew Dunstan, please add your opinion.

>
> > > commit 1495eff wrote:
> > > > --- a/src/bin/pg_dump/pg_dumpall.c
> > > > +++ b/src/bin/pg_dump/pg_dumpall.c
> > >
> > > > @@ -1612,9 +1683,27 @@ dumpDatabases(PGconn *conn)
> > > >                 continue;
> > > >         }
> > > >
> > > > +       /*
> > > > +        * If this is not a plain format dump, then append dboid and dbname to
> > > > +        * the map.dat file.
> > > > +        */
> > > > +       if (archDumpFormat != archNull)
> > > > +       {
> > > > +               if (archDumpFormat == archCustom)
> > > > +                       snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".dmp", db_subdir, oid);
> > > > +               else if (archDumpFormat == archTar)
> > > > +                       snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".tar", db_subdir, oid);
> > > > +               else
> > > > +                       snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\"", db_subdir, oid);
> > >
> > > Use appendShellString() instead. Plain mode already does that for the
> > > "pg_dumpall -f" argument, which is part of db_subdir here. We don't want
> > > weird filename characters to work out differently for plain vs. non-plain
> > > mode. Also, it's easier to search for appendShellString() than to search for
> > > open-coded shell quoting.
> >
> > Yes, we can use appendShellString also. We are using snprintf in the
> > pg_dump.c file also.
> > Ex: snprintf(tagbuf, sizeof(tagbuf), "LARGE OBJECTS %u..%u",
> >              loinfo->looids[0], loinfo->looids[loinfo->numlos - 1]);
>
> It's true snprintf() is not banned in these programs, but don't use it to do
> the quoting for OS shell command lines or fragments thereof. dbfilepath is a
> fragment of an OS shell command line. The LARGE OBJECTS string is not one of
> those. Hence, the LARGE OBJECTS scenario should keep using snprintf().
>
> > If we want to use appendShellString, I can write a patch for these.
> > Please let me know your opinion.
>
> Use appendShellString() for shell quoting. Don't attempt to use it for other
> purposes.

Okay. Fixed in the attached patch.
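Roughly, the change is along these lines (an illustrative sketch only, not
the exact code of the attached patch; it reuses the dumpDatabases() variables
archDumpFormat, db_subdir and oid from the hunk quoted above, and assumes
`cmd` is the PQExpBufferData used to build the pg_dump command line):

    /*
     * Sketch: build the unquoted path first, then let appendShellString()
     * (from fe_utils/string_utils.h) do the shell quoting when the path is
     * added to the command line, instead of open-coding the quotes.
     */
    char        dbfilepath[MAXPGPATH];

    if (archDumpFormat == archCustom)
        snprintf(dbfilepath, MAXPGPATH, "%s/%s.dmp", db_subdir, oid);
    else if (archDumpFormat == archTar)
        snprintf(dbfilepath, MAXPGPATH, "%s/%s.tar", db_subdir, oid);
    else
        snprintf(dbfilepath, MAXPGPATH, "%s/%s", db_subdir, oid);

    appendPQExpBufferStr(&cmd, " -f ");
    appendShellString(&cmd, dbfilepath);    /* quotes weird filename characters */

That way, plain and non-plain modes go through the same quoting routine.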
>
> > > > --- a/src/bin/pg_dump/pg_restore.c
> > > > +++ b/src/bin/pg_dump/pg_restore.c
> > >
> > > > +/*
> > > > + * read_one_statement
> > > > + *
> > > > + * This will start reading from passed file pointer using fgetc and read till
> > > > + * semicolon(sql statement terminator for global.dat file)
> > > > + *
> > > > + * EOF is returned if end-of-file input is seen; time to shut down.
> > >
> > > What makes it okay to use this particular subset of SQL lexing?
> >
> > To support complex syntax, we used this code from another file.
>
> I'm hearing that you copied this code from somewhere. Running
> "git grep 'time to shut down'" suggests you copied it from
> InteractiveBackend(). Is that right? I do see other similarities between
> read_one_statement() and InteractiveBackend().
>
> Copying InteractiveBackend() provides negligible assurance that this is the
> right subset of SQL lexing. Only single-user mode uses InteractiveBackend().
> Single-user mode survives mostly as a last resort for recovering from having
> reached xidStopLimit, is rarely used, and only superusers write queries to it.

Yes, we copied this from InteractiveBackend() to read statements from the
global.dat file.
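For context, the core of what was copied is essentially a read-until-semicolon
loop over the file, along these lines (a deliberately simplified sketch, not
the code as it appears in the patch; the real function, like
InteractiveBackend(), also tracks quoting state so that a ';' inside a string
literal or dollar quoting does not end the statement, which is exactly the
lexing subset being questioned):

    /* Simplified sketch of the idea (illustrative only). */
    static int
    read_one_statement(PQExpBuffer buf, FILE *fp)
    {
        int         c;

        resetPQExpBuffer(buf);
        while ((c = fgetc(fp)) != EOF)
        {
            appendPQExpBufferChar(buf, (char) c);
            if (c == ';')
                return 0;       /* collected one SQL statement */
        }
        return EOF;             /* end of global.dat; time to shut down */
    }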
>
> > > > +/*
> > > > + * get_dbnames_list_to_restore
> > > > + *
> > > > + * This will mark for skipping any entries from dbname_oid_list that pattern match an
> > > > + * entry in the db_exclude_patterns list.
> > > > + *
> > > > + * Returns the number of database to be restored.
> > > > + *
> > > > + */
> > > > +static int
> > > > +get_dbnames_list_to_restore(PGconn *conn,
> > > > +                            SimpleOidStringList *dbname_oid_list,
> > > > +                            SimpleStringList db_exclude_patterns)
> > > > +{
> > > > +       int         count_db = 0;
> > > > +       PQExpBuffer query;
> > > > +       PGresult   *res;
> > > > +
> > > > +       query = createPQExpBuffer();
> > > > +
> > > > +       if (!conn)
> > > > +               pg_log_info("considering PATTERN as NAME for --exclude-database option as no db connection while doing pg_restore.");
> > >
> > > When do we not have a connection here? We'd need to document this behavior
> > > variation if it stays, but I'd prefer if we can just rely on having a
> > > connection.
> >
> > Yes, we can document this behavior.
>
> My review asked a question there. I don't see an answer to that question.
> Would you answer that question?

Example: if there is no active database to connect to, not even
postgres/template1, then we treat the PATTERN as a plain NAME. This is a rare
case. In the attached patch, I also added a doc line for this case.

> > @@ -1612,9 +1683,27 @@ dumpDatabases(PGconn *conn)
> >                 continue;
> >         }
> >
> > +       /*
> > +        * If this is not a plain format dump, then append dboid and dbname to
> > +        * the map.dat file.
> > +        */
> > +       if (archDumpFormat != archNull)
> > +       {
> > +               if (archDumpFormat == archCustom)
> > +                       snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".dmp", db_subdir, oid);
> > +               else if (archDumpFormat == archTar)
> > +                       snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".tar", db_subdir, oid);
> > +               else
> > +                       snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\"", db_subdir, oid);
>
> Use appendShellString() instead. Plain mode already does that for the
> "pg_dumpall -f" argument, which is part of db_subdir here. We don't want
> weird filename characters to work out differently for plain vs. non-plain
> mode. Also, it's easier to search for appendShellString() than to search for
> open-coded shell quoting.

Fixed.

> > @@ -1641,19 +1727,30 @@ dumpDatabases(PGconn *conn)
> >         if (filename)
> >                 fclose(OPF);
> >
> > -       ret = runPgDump(dbname, create_opts);
> > +       ret = runPgDump(dbname, create_opts, dbfilepath, archDumpFormat);
> >         if (ret != 0)
> >                 pg_fatal("pg_dump failed on database \"%s\", exiting", dbname);
> >
> >         if (filename)
> >         {
> > -               OPF = fopen(filename, PG_BINARY_A);
> > +               char    global_path[MAXPGPATH];
> > +
> > +               if (archDumpFormat != archNull)
> > +                       snprintf(global_path, MAXPGPATH, "%s/global.dat", filename);
> > +               else
> > +                       snprintf(global_path, MAXPGPATH, "%s", filename);
> > +
> > +               OPF = fopen(global_path, PG_BINARY_A);
> >                 if (!OPF)
> >                         pg_fatal("could not re-open the output file \"%s\": %m",
> > -                                filename);
> > +                                global_path);
>
> Minor item: plain mode benefits from reopening, because pg_dump appended to
> the plain output file. There's no analogous need to reopen global.dat, since
> just this one process writes to global.dat.

Fixed.

> > @@ -1672,17 +1770,36 @@ runPgDump(const char *dbname, const char *create_opts)
> >         initPQExpBuffer(&connstrbuf);
> >         initPQExpBuffer(&cmd);
> >
> > -       printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > -                         pgdumpopts->data, create_opts);
> > -
> >         /*
> > -        * If we have a filename, use the undocumented plain-append pg_dump
> > -        * format.
> > +        * If this is not a plain format dump, then append file name and dump
> > +        * format to the pg_dump command to get archive dump.
> >          */
> > -       if (filename)
> > -               appendPQExpBufferStr(&cmd, " -Fa ");
> > +       if (archDumpFormat != archNull)
> > +       {
> > +               printfPQExpBuffer(&cmd, "\"%s\" -f %s %s", pg_dump_bin,
> > +                                 dbfile, create_opts);
> > +
> > +               if (archDumpFormat == archDirectory)
> > +                       appendPQExpBufferStr(&cmd, " --format=directory ");
> > +               else if (archDumpFormat == archCustom)
> > +                       appendPQExpBufferStr(&cmd, " --format=custom ");
> > +               else if (archDumpFormat == archTar)
> > +                       appendPQExpBufferStr(&cmd, " --format=tar ");
> > +       }
> >         else
> > -               appendPQExpBufferStr(&cmd, " -Fp ");
> > +       {
> > +               printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > +                                 pgdumpopts->data, create_opts);
>
> This uses pgdumpopts for plain mode only, so many pg_dumpall options silently
> have no effect in non-plain mode. Example:
>
> strace -f pg_dumpall --lock-wait-timeout=10 2>&1 >/dev/null | grep exec
> strace -f pg_dumpall --lock-wait-timeout=10 -Fd -f /tmp/dump3 2>&1 >/dev/null | grep exec

Fixed.
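The fix is essentially to pass pgdumpopts->data to pg_dump in both branches,
roughly like this (an illustrative sketch, not the exact code of the attached
patch; `fmt` is a hypothetical variable holding "directory", "custom" or
"tar", and dbfile is assumed to be already shell-quoted as discussed above):

    /*
     * Sketch: include pgdumpopts->data for archive formats too, so options
     * such as --lock-wait-timeout reach the per-database pg_dump runs.
     */
    if (archDumpFormat != archNull)
        printfPQExpBuffer(&cmd, "\"%s\" %s %s --format=%s -f %s",
                          pg_dump_bin, pgdumpopts->data, create_opts,
                          fmt, dbfile);
    else
        printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
                          pgdumpopts->data, create_opts);

With that, options such as --lock-wait-timeout should reach the child pg_dump
for -Fd dumps as well.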
> > +       /* If database is already created, then don't set createDB flag. */
> > +       if (opts->cparams.dbname)
> > +       {
> > +               PGconn     *test_conn;
> > +
> > +               test_conn = ConnectDatabase(db_cell->str, NULL, opts->cparams.pghost,
> > +                                           opts->cparams.pgport, opts->cparams.username, TRI_DEFAULT,
> > +                                           false, progname, NULL, NULL, NULL, NULL);
> > +               if (test_conn)
> > +               {
> > +                       PQfinish(test_conn);
> > +
> > +                       /* Use already created database for connection. */
> > +                       opts->createDB = 0;
> > +                       opts->cparams.dbname = db_cell->str;
> > +               }
> > +               else
> > +               {
> > +                       /* we'll have to create it */
> > +                       opts->createDB = 1;
> > +                       opts->cparams.dbname = connected_db;
> > +               }
>
> In released versions, "pg_restore --create" fails if the database exists, and
> pg_restore w/o --create fails unless the database exists. I think we should
> continue that pattern in this new feature. If not, pg_restore should document
> how it treats pg_dumpall-sourced dumps with the "create if not exists"
> semantics appearing here.

Added one more doc line for this case.

Here, I am attaching a patch. Please let me know your feedback.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com