Thread: [PATCH] minor bug fix for pg_dump --clean

[PATCH] minor bug fix for pg_dump --clean

From
Frédéric Yhuel
Date:
Hello,

When using pg_dump (or pg_restore) with option "--clean", there is some 
SQL code to drop every objects at the beginning.

The DROP statement for a view involving circular dependencies is :

CREATE OR REPLACE VIEW [...]

(see commit message of d8c05aff for a much better explanation)

If the view is not in the "public" schema, and the target database is 
empty, this statement fails, because the schema hasn't been created yet.

The attached patches are a TAP test which can be used to reproduce the 
bug, and a proposed fix. They apply to the master branch.

Best regards,
Frédéric
Attachment

Re: [PATCH] minor bug fix for pg_dump --clean

From
Виктория Шепард
Date:
Hi,

Good catch! Here are several points for improvement:
1. pg_dump.c:17380
Maybe better to write simpler

appendPQExpBuffer(delcmd, "CREATE SCHEMA IF NOT EXISTS %s;\n", tbinfo->dobj.namespace->dobj.name);

because there is a schema name inside the `tbinfo->dobj.namespace->dobj.name`

2. pg_backup_archiver.c:588
Here are no necessary spaces before and after braces, and no spaces around the '+' sign.
( strncmp(dropStmt, "CREATE SCHEMA IF NOT EXISTS", 27) == 0 &&
strstr(dropStmt+29, "CREATE OR REPLACE VIEW") ))

Best regards,
Viktoria Shepard

чт, 1 сент. 2022 г. в 12:13, Frédéric Yhuel <frederic.yhuel@dalibo.com>:
Hello,

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.

The DROP statement for a view involving circular dependencies is :

CREATE OR REPLACE VIEW [...]

(see commit message of d8c05aff for a much better explanation)

If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.

The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.

Best regards,
Frédéric

Re: [PATCH] minor bug fix for pg_dump --clean

From
Tom Lane
Date:
=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:
> When using pg_dump (or pg_restore) with option "--clean", there is some 
> SQL code to drop every objects at the beginning.

Yup ...

> The DROP statement for a view involving circular dependencies is :
> CREATE OR REPLACE VIEW [...]
> (see commit message of d8c05aff for a much better explanation)
> If the view is not in the "public" schema, and the target database is 
> empty, this statement fails, because the schema hasn't been created yet.
> The attached patches are a TAP test which can be used to reproduce the 
> bug, and a proposed fix. They apply to the master branch.

I am disinclined to accept this as a valid bug, because there's never
been any guarantee that a --clean script would execute error-free in
a database that doesn't match what the source database contains.

(The pg_dump documentation used to say that in so many words.
I see that whoever added the --if-exists option was under the
fond illusion that that fixes all cases, which it surely does not.
We need to back off the promises a bit there.)

An example of a case that won't execute error-free is if the view
having a circular dependency includes a column of a non-built-in
data type.  If you try to run that in an empty database, you'll
get an error from the CREATE OR REPLACE VIEW's reference to that
data type.  For instance, if I adjust your test case to make
the "payload" column be of type hstore, I get something like

psql:dumpresult.sql:22: ERROR:  type "public.hstore" does not exist
LINE 4:     NULL::public.hstore AS payload;
                  ^

The same type of failure occurs for user-defined functions and
operators that use a non-built-in type, and I'm sure there are
more cases in the same vein.  But it gets *really* messy if
the target database isn't completely empty, but contains objects
with different properties than the dump script expects; for example,
if the view being discussed here exists with a different column set
than the script thinks, or if the dependency chains aren't all the
same.

If this fix were cleaner I might be inclined to accept it anyway,
but it's not very clean at all --- for example, it's far from
obvious to me what are the side-effects of changing the filter
in RestoreArchive like that.  Nor am I sure that the schema
you want to create is guaranteed to get dropped again later in
every use-case.

So I think mainly what we ought to do here is to adjust the
documentation to make it clearer that --clean is not guaranteed
to work without errors unless the target database has the same
set of objects as the source.  --if-exists can reduce the set
of error cases, but not eliminate it.  Possibly we should be
more enthusiastic about recommending --create --clean (ie,
drop and recreate the whole database) instead.

            regards, tom lane



Re: [PATCH] minor bug fix for pg_dump --clean

From
Frédéric Yhuel
Date:
On 10/24/22 03:01, Tom Lane wrote:
> =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:
>> When using pg_dump (or pg_restore) with option "--clean", there is some
>> SQL code to drop every objects at the beginning.
> 
> Yup ...
> 
>> The DROP statement for a view involving circular dependencies is :
>> CREATE OR REPLACE VIEW [...]
>> (see commit message of d8c05aff for a much better explanation)
>> If the view is not in the "public" schema, and the target database is
>> empty, this statement fails, because the schema hasn't been created yet.
>> The attached patches are a TAP test which can be used to reproduce the
>> bug, and a proposed fix. They apply to the master branch.
> 
> I am disinclined to accept this as a valid bug, because there's never
> been any guarantee that a --clean script would execute error-free in
> a database that doesn't match what the source database contains.
> 
> (The pg_dump documentation used to say that in so many words.
> I see that whoever added the --if-exists option was under the
> fond illusion that that fixes all cases, which it surely does not.
> We need to back off the promises a bit there.)
> 
> An example of a case that won't execute error-free is if the view
> having a circular dependency includes a column of a non-built-in
> data type.  If you try to run that in an empty database, you'll
> get an error from the CREATE OR REPLACE VIEW's reference to that
> data type.  For instance, if I adjust your test case to make
> the "payload" column be of type hstore, I get something like
> 
> psql:dumpresult.sql:22: ERROR:  type "public.hstore" does not exist
> LINE 4:     NULL::public.hstore AS payload;
>                    ^
> 
> The same type of failure occurs for user-defined functions and
> operators that use a non-built-in type, and I'm sure there are
> more cases in the same vein.  But it gets *really* messy if
> the target database isn't completely empty, but contains objects
> with different properties than the dump script expects; for example,
> if the view being discussed here exists with a different column set
> than the script thinks, or if the dependency chains aren't all the
> same.
> 
> If this fix were cleaner I might be inclined to accept it anyway,
> but it's not very clean at all --- for example, it's far from
> obvious to me what are the side-effects of changing the filter
> in RestoreArchive like that.  Nor am I sure that the schema
> you want to create is guaranteed to get dropped again later in
> every use-case.
> 

Hi Tom, Viktoria,

Thank you for your review Viktoria!

Thank you for this detailed explanation, Tom! I didn't have great hope 
for this patch. I thought that the TAP test could be accepted, but now I 
can see that it is clearly useless.


> So I think mainly what we ought to do here is to adjust the
> documentation to make it clearer that --clean is not guaranteed
> to work without errors unless the target database has the same
> set of objects as the source.  --if-exists can reduce the set
> of error cases, but not eliminate it.  Possibly we should be
> more enthusiastic about recommending --create --clean (ie,
> drop and recreate the whole database) instead.
> 

I beleive a documentation patch would be useful, indeed.

Best regards,
Frédéric



Re: [PATCH] minor bug fix for pg_dump --clean

From
Bruce Momjian
Date:
FYI, this was improved in a recent commit:

    commit 75af0f401f
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date:   Fri Sep 29 13:13:54 2023 -0400
    
        Doc: improve description of dump/restore's --clean and --if-exists.
    
        Try to make these option descriptions a little clearer for novices.
        Per gripe from Attila Gulyás.
    
        Discussion: https://postgr.es/m/169590536647.3727336.11070254203649648453@wrigleys.postgresql.org


---------------------------------------------------------------------------

On Mon, Oct 24, 2022 at 09:02:46AM +0200, Frédéric Yhuel wrote:
> On 10/24/22 03:01, Tom Lane wrote:
> > =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:
> > > When using pg_dump (or pg_restore) with option "--clean", there is some
> > > SQL code to drop every objects at the beginning.
> > 
> > Yup ...
> > 
> > > The DROP statement for a view involving circular dependencies is :
> > > CREATE OR REPLACE VIEW [...]
> > > (see commit message of d8c05aff for a much better explanation)
> > > If the view is not in the "public" schema, and the target database is
> > > empty, this statement fails, because the schema hasn't been created yet.
> > > The attached patches are a TAP test which can be used to reproduce the
> > > bug, and a proposed fix. They apply to the master branch.
> > 
> > I am disinclined to accept this as a valid bug, because there's never
> > been any guarantee that a --clean script would execute error-free in
> > a database that doesn't match what the source database contains.
> > 
> > (The pg_dump documentation used to say that in so many words.
> > I see that whoever added the --if-exists option was under the
> > fond illusion that that fixes all cases, which it surely does not.
> > We need to back off the promises a bit there.)
> > 
> > An example of a case that won't execute error-free is if the view
> > having a circular dependency includes a column of a non-built-in
> > data type.  If you try to run that in an empty database, you'll
> > get an error from the CREATE OR REPLACE VIEW's reference to that
> > data type.  For instance, if I adjust your test case to make
> > the "payload" column be of type hstore, I get something like
> > 
> > psql:dumpresult.sql:22: ERROR:  type "public.hstore" does not exist
> > LINE 4:     NULL::public.hstore AS payload;
> >                    ^
> > 
> > The same type of failure occurs for user-defined functions and
> > operators that use a non-built-in type, and I'm sure there are
> > more cases in the same vein.  But it gets *really* messy if
> > the target database isn't completely empty, but contains objects
> > with different properties than the dump script expects; for example,
> > if the view being discussed here exists with a different column set
> > than the script thinks, or if the dependency chains aren't all the
> > same.
> > 
> > If this fix were cleaner I might be inclined to accept it anyway,
> > but it's not very clean at all --- for example, it's far from
> > obvious to me what are the side-effects of changing the filter
> > in RestoreArchive like that.  Nor am I sure that the schema
> > you want to create is guaranteed to get dropped again later in
> > every use-case.
> > 
> 
> Hi Tom, Viktoria,
> 
> Thank you for your review Viktoria!
> 
> Thank you for this detailed explanation, Tom! I didn't have great hope for
> this patch. I thought that the TAP test could be accepted, but now I can see
> that it is clearly useless.
> 
> 
> > So I think mainly what we ought to do here is to adjust the
> > documentation to make it clearer that --clean is not guaranteed
> > to work without errors unless the target database has the same
> > set of objects as the source.  --if-exists can reduce the set
> > of error cases, but not eliminate it.  Possibly we should be
> > more enthusiastic about recommending --create --clean (ie,
> > drop and recreate the whole database) instead.
> > 
> 
> I beleive a documentation patch would be useful, indeed.
> 
> Best regards,
> Frédéric
> 
> 

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.