Thread: [PATCH] minor bug fix for pg_dump --clean
Hello, When using pg_dump (or pg_restore) with option "--clean", there is some SQL code to drop every objects at the beginning. The DROP statement for a view involving circular dependencies is : CREATE OR REPLACE VIEW [...] (see commit message of d8c05aff for a much better explanation) If the view is not in the "public" schema, and the target database is empty, this statement fails, because the schema hasn't been created yet. The attached patches are a TAP test which can be used to reproduce the bug, and a proposed fix. They apply to the master branch. Best regards, Frédéric
Attachment
Hi,
Good catch! Here are several points for improvement:
1. pg_dump.c:17380
Maybe better to write simpler
appendPQExpBuffer(delcmd, "CREATE SCHEMA IF NOT EXISTS %s;\n", tbinfo->dobj.namespace->dobj.name);
because there is a schema name inside the `tbinfo->dobj.namespace->dobj.name`
2. pg_backup_archiver.c:588
Here are no necessary spaces before and after braces, and no spaces around the '+' sign.
( strncmp(dropStmt, "CREATE SCHEMA IF NOT EXISTS", 27) == 0 &&
strstr(dropStmt+29, "CREATE OR REPLACE VIEW") ))
Best regards,
Viktoria Shepard
чт, 1 сент. 2022 г. в 12:13, Frédéric Yhuel <frederic.yhuel@dalibo.com>:
Hello,
When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.
The DROP statement for a view involving circular dependencies is :
CREATE OR REPLACE VIEW [...]
(see commit message of d8c05aff for a much better explanation)
If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.
The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.
Best regards,
Frédéric
=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes: > When using pg_dump (or pg_restore) with option "--clean", there is some > SQL code to drop every objects at the beginning. Yup ... > The DROP statement for a view involving circular dependencies is : > CREATE OR REPLACE VIEW [...] > (see commit message of d8c05aff for a much better explanation) > If the view is not in the "public" schema, and the target database is > empty, this statement fails, because the schema hasn't been created yet. > The attached patches are a TAP test which can be used to reproduce the > bug, and a proposed fix. They apply to the master branch. I am disinclined to accept this as a valid bug, because there's never been any guarantee that a --clean script would execute error-free in a database that doesn't match what the source database contains. (The pg_dump documentation used to say that in so many words. I see that whoever added the --if-exists option was under the fond illusion that that fixes all cases, which it surely does not. We need to back off the promises a bit there.) An example of a case that won't execute error-free is if the view having a circular dependency includes a column of a non-built-in data type. If you try to run that in an empty database, you'll get an error from the CREATE OR REPLACE VIEW's reference to that data type. For instance, if I adjust your test case to make the "payload" column be of type hstore, I get something like psql:dumpresult.sql:22: ERROR: type "public.hstore" does not exist LINE 4: NULL::public.hstore AS payload; ^ The same type of failure occurs for user-defined functions and operators that use a non-built-in type, and I'm sure there are more cases in the same vein. But it gets *really* messy if the target database isn't completely empty, but contains objects with different properties than the dump script expects; for example, if the view being discussed here exists with a different column set than the script thinks, or if the dependency chains aren't all the same. If this fix were cleaner I might be inclined to accept it anyway, but it's not very clean at all --- for example, it's far from obvious to me what are the side-effects of changing the filter in RestoreArchive like that. Nor am I sure that the schema you want to create is guaranteed to get dropped again later in every use-case. So I think mainly what we ought to do here is to adjust the documentation to make it clearer that --clean is not guaranteed to work without errors unless the target database has the same set of objects as the source. --if-exists can reduce the set of error cases, but not eliminate it. Possibly we should be more enthusiastic about recommending --create --clean (ie, drop and recreate the whole database) instead. regards, tom lane
On 10/24/22 03:01, Tom Lane wrote: > =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes: >> When using pg_dump (or pg_restore) with option "--clean", there is some >> SQL code to drop every objects at the beginning. > > Yup ... > >> The DROP statement for a view involving circular dependencies is : >> CREATE OR REPLACE VIEW [...] >> (see commit message of d8c05aff for a much better explanation) >> If the view is not in the "public" schema, and the target database is >> empty, this statement fails, because the schema hasn't been created yet. >> The attached patches are a TAP test which can be used to reproduce the >> bug, and a proposed fix. They apply to the master branch. > > I am disinclined to accept this as a valid bug, because there's never > been any guarantee that a --clean script would execute error-free in > a database that doesn't match what the source database contains. > > (The pg_dump documentation used to say that in so many words. > I see that whoever added the --if-exists option was under the > fond illusion that that fixes all cases, which it surely does not. > We need to back off the promises a bit there.) > > An example of a case that won't execute error-free is if the view > having a circular dependency includes a column of a non-built-in > data type. If you try to run that in an empty database, you'll > get an error from the CREATE OR REPLACE VIEW's reference to that > data type. For instance, if I adjust your test case to make > the "payload" column be of type hstore, I get something like > > psql:dumpresult.sql:22: ERROR: type "public.hstore" does not exist > LINE 4: NULL::public.hstore AS payload; > ^ > > The same type of failure occurs for user-defined functions and > operators that use a non-built-in type, and I'm sure there are > more cases in the same vein. But it gets *really* messy if > the target database isn't completely empty, but contains objects > with different properties than the dump script expects; for example, > if the view being discussed here exists with a different column set > than the script thinks, or if the dependency chains aren't all the > same. > > If this fix were cleaner I might be inclined to accept it anyway, > but it's not very clean at all --- for example, it's far from > obvious to me what are the side-effects of changing the filter > in RestoreArchive like that. Nor am I sure that the schema > you want to create is guaranteed to get dropped again later in > every use-case. > Hi Tom, Viktoria, Thank you for your review Viktoria! Thank you for this detailed explanation, Tom! I didn't have great hope for this patch. I thought that the TAP test could be accepted, but now I can see that it is clearly useless. > So I think mainly what we ought to do here is to adjust the > documentation to make it clearer that --clean is not guaranteed > to work without errors unless the target database has the same > set of objects as the source. --if-exists can reduce the set > of error cases, but not eliminate it. Possibly we should be > more enthusiastic about recommending --create --clean (ie, > drop and recreate the whole database) instead. > I beleive a documentation patch would be useful, indeed. Best regards, Frédéric
FYI, this was improved in a recent commit: commit 75af0f401f Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Fri Sep 29 13:13:54 2023 -0400 Doc: improve description of dump/restore's --clean and --if-exists. Try to make these option descriptions a little clearer for novices. Per gripe from Attila Gulyás. Discussion: https://postgr.es/m/169590536647.3727336.11070254203649648453@wrigleys.postgresql.org --------------------------------------------------------------------------- On Mon, Oct 24, 2022 at 09:02:46AM +0200, Frédéric Yhuel wrote: > On 10/24/22 03:01, Tom Lane wrote: > > =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes: > > > When using pg_dump (or pg_restore) with option "--clean", there is some > > > SQL code to drop every objects at the beginning. > > > > Yup ... > > > > > The DROP statement for a view involving circular dependencies is : > > > CREATE OR REPLACE VIEW [...] > > > (see commit message of d8c05aff for a much better explanation) > > > If the view is not in the "public" schema, and the target database is > > > empty, this statement fails, because the schema hasn't been created yet. > > > The attached patches are a TAP test which can be used to reproduce the > > > bug, and a proposed fix. They apply to the master branch. > > > > I am disinclined to accept this as a valid bug, because there's never > > been any guarantee that a --clean script would execute error-free in > > a database that doesn't match what the source database contains. > > > > (The pg_dump documentation used to say that in so many words. > > I see that whoever added the --if-exists option was under the > > fond illusion that that fixes all cases, which it surely does not. > > We need to back off the promises a bit there.) > > > > An example of a case that won't execute error-free is if the view > > having a circular dependency includes a column of a non-built-in > > data type. If you try to run that in an empty database, you'll > > get an error from the CREATE OR REPLACE VIEW's reference to that > > data type. For instance, if I adjust your test case to make > > the "payload" column be of type hstore, I get something like > > > > psql:dumpresult.sql:22: ERROR: type "public.hstore" does not exist > > LINE 4: NULL::public.hstore AS payload; > > ^ > > > > The same type of failure occurs for user-defined functions and > > operators that use a non-built-in type, and I'm sure there are > > more cases in the same vein. But it gets *really* messy if > > the target database isn't completely empty, but contains objects > > with different properties than the dump script expects; for example, > > if the view being discussed here exists with a different column set > > than the script thinks, or if the dependency chains aren't all the > > same. > > > > If this fix were cleaner I might be inclined to accept it anyway, > > but it's not very clean at all --- for example, it's far from > > obvious to me what are the side-effects of changing the filter > > in RestoreArchive like that. Nor am I sure that the schema > > you want to create is guaranteed to get dropped again later in > > every use-case. > > > > Hi Tom, Viktoria, > > Thank you for your review Viktoria! > > Thank you for this detailed explanation, Tom! I didn't have great hope for > this patch. I thought that the TAP test could be accepted, but now I can see > that it is clearly useless. > > > > So I think mainly what we ought to do here is to adjust the > > documentation to make it clearer that --clean is not guaranteed > > to work without errors unless the target database has the same > > set of objects as the source. --if-exists can reduce the set > > of error cases, but not eliminate it. Possibly we should be > > more enthusiastic about recommending --create --clean (ie, > > drop and recreate the whole database) instead. > > > > I beleive a documentation patch would be useful, indeed. > > Best regards, > Frédéric > > -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.