Thread: Detect supported SET parameters when pg_restore is run

Detect supported SET parameters when pg_restore is run

From
Vitaly Burovoy
Date:
Hackers,

At work we use several major versions of PostgreSQL, and developers
use non-local clusters for developing and debugging.
We do dump/restore schemas/data via custom/dir formats and we have to
keep several client versions for 9.2, 9.4 and 9.5 versions on local
workstations because after pg_restore95 connects to 9.2, it fails when
it sets run-time parameters via SET:

vitaly@work 01:21:26 ~ $ pg_restore95 --host DEV_HOST_9_2 -d DBNAME
--data-only -e -1 -Fc arhcivefile
Password:
pg_restore95: [archiver (db)] Error while INITIALIZING:
pg_restore95: [archiver (db)] could not execute query: ERROR:
unrecognized configuration parameter "lock_timeout"
    Command was: SET lock_timeout = 0;

Of course, it can be fixed avoiding "--single-transaction", but if
there is inconsistent schema (or stricter constraints) part of
schema/data is already changed/inserted and a lot of errors are
generated for the next pg_restore run.

The pd_dump has checks in "setup_connection" function to detect what
to send after connection is done for dumping, but there is no checks
in _doSetFixedOutputState for restoring. If there are checks it is
possible to use a single version pg_dump96/pg_restore96 to
dump/restore, for example 9.2->9.2 as well as 9.4->9.4 and so on.

The only trouble we have is in "SET" block and after some research I
discovered it is possible not to send unsupported SET options to the
database.

Please, find attached simple patch.

For restoring to stdout (or dumping to a plain SQL file) I left
current behavior: all options in the SET block are written.
Also I left "SET row_security = on;" if "enable_row_security" is set
to break restoring to a DB non-supported version.

--
Best regards,
Vitaly Burovoy

Attachment

Re: Detect supported SET parameters when pg_restore is run

From
Robert Haas
Date:
On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy
<vitaly.burovoy@gmail.com> wrote:
> At work we use several major versions of PostgreSQL, and developers
> use non-local clusters for developing and debugging.
> We do dump/restore schemas/data via custom/dir formats and we have to
> keep several client versions for 9.2, 9.4 and 9.5 versions on local
> workstations because after pg_restore95 connects to 9.2, it fails when
> it sets run-time parameters via SET:
>
> vitaly@work 01:21:26 ~ $ pg_restore95 --host DEV_HOST_9_2 -d DBNAME
> --data-only -e -1 -Fc arhcivefile
> Password:
> pg_restore95: [archiver (db)] Error while INITIALIZING:
> pg_restore95: [archiver (db)] could not execute query: ERROR:
> unrecognized configuration parameter "lock_timeout"
>     Command was: SET lock_timeout = 0;

I think our policy is that a newer pg_dump needs to work with an older
version of the database, but I don't think we make any similar
guarantee for other tools, such as pg_restore.  It's an interesting
question whether we should try a little harder to do that, but I
suspect it might require more changes than what you've got in this
patch....

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Detect supported SET parameters when pg_restore is run

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy
> <vitaly.burovoy@gmail.com> wrote:
>> We do dump/restore schemas/data via custom/dir formats and we have to
>> keep several client versions for 9.2, 9.4 and 9.5 versions on local
>> workstations because after pg_restore95 connects to 9.2, it fails when
>> it sets run-time parameters via SET:

> I think our policy is that a newer pg_dump needs to work with an older
> version of the database, but I don't think we make any similar
> guarantee for other tools, such as pg_restore.  It's an interesting
> question whether we should try a little harder to do that, but I
> suspect it might require more changes than what you've got in this
> patch....

The general policy has always been that pg_dump output is only expected to
restore without errors into a server that's the same or newer version as
pg_dump (regardless of the server version being dumped from).  If you try
to restore into an older server version, you may get some errors, which we
try to minimize the scope of when possible.  But it will never be possible
to promise none at all.  I think the correct advice here is simply "don't
use pg_restore -e -1 when trying to restore into an older server version".

Taking a step back, there are any number of ways that you might get
errors during a pg_restore into a DB that's not set up exactly as pg_dump
expected.  Missing roles or tablespaces, for example.  Again, the dump
output is constructed so that you can survive those problems and bull
ahead ... but not with "-e -1".  I don't see a very good reason why
older-server-version shouldn't be considered the same type of problem.

The patch as given seems rather broken anyway --- won't it change text
output from pg_dump as well as on-line output from pg_restore?  (That is,
it looks to me like the SETs emitted by pg_dump to text format would
depend on the source server version, which they absolutely should not.
Either that, or the patch is overwriting pg_dump's idea of what the
source server version is at the start of the output phase, which is
likely to break all kinds of stuff when dumping from an older server.)

It's possible that we could alleviate this specific symptom by arranging
for the BEGIN caused by "-1" to come out after the initial SETs, but
I'm not sure how complicated that would be or whether it's worth the
trouble.
        regards, tom lane



Re: Detect supported SET parameters when pg_restore is run

From
Vitaly Burovoy
Date:
On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy
>> <vitaly.burovoy@gmail.com> wrote:
>>> We do dump/restore schemas/data via custom/dir formats and we have to
>>> keep several client versions for 9.2, 9.4 and 9.5 versions on local
>>> workstations because after pg_restore95 connects to 9.2, it fails when
>>> it sets run-time parameters via SET:
>
>> I think our policy is that a newer pg_dump needs to work with an older
>> version of the database, but I don't think we make any similar
>> guarantee for other tools, such as pg_restore.  It's an interesting
>> question whether we should try a little harder to do that, but I
>> suspect it might require more changes than what you've got in this
>> patch....

Well... I'm not inclined to insert support of restoring from a higher
major version to a lower one, because it can lead to security issues
(e.g. with RLS). But my observation is that all new features supported
by the "pg_dump" are "incremental", e.g. the last feature "parallel"
for pg_dump/pg_restore --- lack of "PARALLEL UNSAFE" (which is by
default) from 9.6 and lack of it from pre-9.6.

That behavior allows newer versions of pg_restore to use dumps from DB
of older versions because of lack of new features grammar. With the
patch I'm able to use pg_dump96/pg_restore96 for our database of 9.2
(dump from 9.2 and restore to 9.2).

> The general policy has always been that pg_dump output is only expected to
> restore without errors into a server that's the same or newer version as
> pg_dump (regardless of the server version being dumped from).  If you try
> to restore into an older server version, you may get some errors, which we
> try to minimize the scope of when possible.  But it will never be possible
> to promise none at all.  I think the correct advice here is simply "don't
> use pg_restore -e -1 when trying to restore into an older server version".

Why can't I use it if pg_dump92/pg_restore92 have the same behavior as
pg_dump96/pg_restore96 except the SET block?
The patch does not give guarantee of a restoration, it just avoids
setting unsupported parameters for pg_restore the same way as pg_dump
does.
The other issues are for solving by the user who wants to restore to a
DB of older version.

> Taking a step back, there are any number of ways that you might get
> errors during a pg_restore into a DB that's not set up exactly as pg_dump
> expected.  Missing roles or tablespaces, for example.

It does not depends on a pg_dump/pg_restore version and can be soved
by using command line options:
--no-tablespaces --no-owner --no-privileges

> Again, the dump output is constructed so that you can survive those problems
> and bull ahead ... but not with "-e -1".

I think "-e -1" was invented specially for it --- stop restoring if
something is going wrong. Wasn't it?

> I don't see a very good reason why
> older-server-version shouldn't be considered the same type of problem.
>
> The patch as given seems rather broken anyway --- won't it change text
> output from pg_dump as well as on-line output from pg_restore?

My opinion --- no (and I wrote it in the initial letter): because it
is impossible to know what version of a database is used for that
plain text output. Users who use output to a plain text are able to
use sed (or something else) to delete unwanted rows.

> (That is,
> it looks to me like the SETs emitted by pg_dump to text format would
> depend on the source server version, which they absolutely should not.
> Either that, or the patch is overwriting pg_dump's idea of what the
> source server version is at the start of the output phase, which is
> likely to break all kinds of stuff when dumping from an older server.)

I agree, that's why I left current behavior as is for the plain text output.

> It's possible that we could alleviate this specific symptom by arranging
> for the BEGIN caused by "-1" to come out after the initial SETs, but
> I'm not sure how complicated that would be or whether it's worth the
> trouble.

It leads to change ERRORs to WARNINGs but current behavior discussed
above is left the same.
Why don't just avoid SET parameters when we know for sure they are not
supported by the server to which pg_restore is connected?

-- 
Best regards,
Vitaly Burovoy



Re: Detect supported SET parameters when pg_restore is run

From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The general policy has always been that pg_dump output is only expected to
>> restore without errors into a server that's the same or newer version as
>> pg_dump (regardless of the server version being dumped from).

> Why can't I use it if pg_dump92/pg_restore92 have the same behavior as
> pg_dump96/pg_restore96 except the SET block?

That's a pretty large "if", and not one I'm prepared to commit to.
Before you assert that it's true, you might want to reflect on the
size of the diff between 9.2 and 9.6 pg_dump (hint: it's over 20K
lines in -u format).

>> Either that, or the patch is overwriting pg_dump's idea of what the
>> source server version is at the start of the output phase, which is
>> likely to break all kinds of stuff when dumping from an older server.)

> I agree, that's why I left current behavior as is for the plain text output.

I'm not exactly convinced that you did.  There's only one copy of
Archive->remoteVersion, and you're overwriting it long before the
dump process is over.  That'd break anything that consulted it to
find the source server's version after RestoreArchive starts.

It might not be a bad thing to clearly distinguish source server version
from (expected?) target server version, but pg_dump doesn't do that
currently, and making it do so is probably not entirely trivial.
I think your patch confuses that distinction further, which
is not going to be helpful in the long run.

I could get behind a patch that split remoteVersion into sourceVersion
and targetVersion and made sure that all the existing references to
remoteVersion were updated to the correct one of those.  After that
we could do something like what you have here in a less shaky fashion.
As Robert noted, there'd probably still be a long way to go before it'd
really work to use a newer pg_dump with an older target version, but at
least we'd not be building on quicksand.
        regards, tom lane



Re: Detect supported SET parameters when pg_restore is run

From
Vitaly Burovoy
Date:
On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
>> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The general policy has always been that pg_dump output is only expected
>>> to
>>> restore without errors into a server that's the same or newer version as
>>> pg_dump (regardless of the server version being dumped from).
>
>> Why can't I use it if pg_dump92/pg_restore92 have the same behavior as
>> pg_dump96/pg_restore96 except the SET block?
>
> That's a pretty large "if", and not one I'm prepared to commit to.
> Before you assert that it's true, you might want to reflect on the
> size of the diff between 9.2 and 9.6 pg_dump (hint: it's over 20K
> lines in -u format).
>
>>> Either that, or the patch is overwriting pg_dump's idea of what the
>>> source server version is at the start of the output phase, which is
>>> likely to break all kinds of stuff when dumping from an older server.)
>
>> I agree, that's why I left current behavior as is for the plain text
>> output.
>
> I'm not exactly convinced that you did.  There's only one copy of
> Archive->remoteVersion, and you're overwriting it long before the
> dump process is over.

I'm sorry, I'm not so good at knowing the code base but I see that my
patch changes Archive->remoteVersion in the "RestoreArchive" which is
called after all schema is fetched to pg_dump's structs and just
before output is written, moreover, there is a comment that it is a
final stage (9.2 has the same block of code):   ...   /*    * And finally we can do the actual output.    *...    */
if(plainText)       RestoreArchive(fout);
 
   CloseArchive(fout);
   exit_nicely(0);
}

It does not seem that I'm "overwriting it long before the dump process
is over"... Also pg_dump -v proves that changing remoteVersion happens
after all "pg_dump: finding *", "pg_dump: reading *" and "pg_dump:
saving *".

> That'd break anything that consulted it to
> find the source server's version after RestoreArchive starts.

I'm sorry again, could you or anyone else point me what part of the
code use Archive->remoteVersion after schema is read?
I set up break point at the line   AHX->remoteVersion = 999999;
and ran pg_dump with plain text output to a file (via "-f" option).
When the line was reached I set up break points at all lines I could
find by a script:

egrep -n -r --include '*.c' 'remoteVersion [><]' src/bin/pg_dump/ |
awk -F: '{print "b "$1":"$2}'

(there were 217 of them) and continued debuging. All three fired break
points were expectedly in _doSetFixedOutputState (at the lines where
my patch introduced them) after which the program exited normally
without stopping.

Also I wonder that the process of "restoring" consults
Archive->remoteVersion because currently in the pg_restore to plain
text output remoteVersion is zero. It means restoring process would
output schema to the oldest supported version, but is not so.

> I could get behind a patch that split remoteVersion into sourceVersion
> and targetVersion and made sure that all the existing references to
> remoteVersion were updated to the correct one of those.  After that
> we could do something like what you have here in a less shaky fashion.
> As Robert noted, there'd probably still be a long way to go before it'd
> really work to use a newer pg_dump with an older target version, but at
> least we'd not be building on quicksand.

It means support of restoring from a newer version to an older one.
What's for? If new features are used it is impossible to restore
(port) them to an older version, if they are not, restoring process is
done (i guess) in 99% of cases.

-- 
Best regards,
Vitaly Burovoy



Re: Detect supported SET parameters when pg_restore is run

From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not exactly convinced that you did.  There's only one copy of
>> Archive->remoteVersion, and you're overwriting it long before the
>> dump process is over.

> It does not seem that I'm "overwriting it long before the dump process
> is over"...

There's a lot that happens during RestoreArchive.  Even if none of it
inspects remoteVersion today, I do not think that's a safe assumption to
make going forward.  The easiest counterexample is that this very bit of
code you want to add does so.  I really do not want to get into a design
that says "remoteVersion means the source server version until we reach
RestoreArchive, and the target version afterwards".  That way madness
lies.  If we're going to try altering the emitted SQL based on target
version, let's first create a separation between those concepts; otherwise
I will bet that we add more bugs than we remove.

(The other thing I'd want here is a --target-version option so that
you could get the same output alterations in pg_dump or pg_restore to
text.  Otherwise it's nigh undebuggable, and certainly much harder
to test than it needs to be.)
        regards, tom lane



Re: Detect supported SET parameters when pg_restore is run

From
Vitaly Burovoy
Date:
On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
>> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm not exactly convinced that you did.  There's only one copy of
>>> Archive->remoteVersion, and you're overwriting it long before the
>>> dump process is over.
>
>> It does not seem that I'm "overwriting it long before the dump process
>> is over"...
>
> There's a lot that happens during RestoreArchive.  Even if none of it
> inspects remoteVersion today, I do not think that's a safe assumption to
> make going forward.

And... Khm... Note that even _now_ AHX->remoteVersion is set to a
database version pg_restore connects to... So all the code has it
during restoring process...

> The easiest counterexample is that this very bit of
> code you want to add does so.

The only change I've done is set remoteVersion to the maximum allowed
when output is the plain text format.

> I really do not want to get into a design
> that says "remoteVersion means the source server version until we reach
> RestoreArchive, and the target version afterwards".  That way madness lies.

It is only if you think about "remoteVersion" as
"sourceServerVersion", but even now it is not so.

Moreover RestoreArchive is a delimter only for pg_dump and only when
output is a plain text.
For other modes of the pg_dump RestoreArchive is not called at all.

> If we're going to try altering the emitted SQL based on target version,
> let's first create a separation between those concepts;

I've just found there is _archiveHandle.archiveRemoteVersion. Is it a
parameter you were searched for?
The pg_restore code does not use it (just as remoteVersion), but it
can do so if it is necessary.

> otherwise I will bet that we add more bugs than we remove.
>
> (The other thing I'd want here is a --target-version option so that
> you could get the same output alterations in pg_dump or pg_restore to
> text.  Otherwise it's nigh undebuggable, and certainly much harder
> to test than it needs to be.)

I thought that way. I'm ready to introduce that parameter, but again,
I see now it will influence only SET parameters. Does it worth it?

-- 
Best regards,
Vitaly Burovoy



Re: Detect supported SET parameters when pg_restore is run

From
Vitaly Burovoy
Date:
On 9/27/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (The other thing I'd want here is a --target-version option so that
>> you could get the same output alterations in pg_dump or pg_restore to
>> text.  Otherwise it's nigh undebuggable, and certainly much harder
>> to test than it needs to be.)
>
> I thought that way. I'm ready to introduce that parameter, but again,
> I see now it will influence only SET parameters. Does it worth it?

The only reason I have not implemented it was attempt to avoid users
being confused who could think that result of pg_dump (we need it
there for the plain text output) or pg_restore can be converted for
target version to be restored without new features (but now it is
wrong).

-- 
Best regards,
Vitaly Burovoy



Re: Detect supported SET parameters when pg_restore is run

From
Pavel Stehule
Date:


2016-09-27 23:12 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not exactly convinced that you did.  There's only one copy of
>> Archive->remoteVersion, and you're overwriting it long before the
>> dump process is over.

> It does not seem that I'm "overwriting it long before the dump process
> is over"...

There's a lot that happens during RestoreArchive.  Even if none of it
inspects remoteVersion today, I do not think that's a safe assumption to
make going forward.  The easiest counterexample is that this very bit of
code you want to add does so.  I really do not want to get into a design
that says "remoteVersion means the source server version until we reach
RestoreArchive, and the target version afterwards".  That way madness
lies.  If we're going to try altering the emitted SQL based on target
version, let's first create a separation between those concepts; otherwise
I will bet that we add more bugs than we remove.

(The other thing I'd want here is a --target-version option so that
you could get the same output alterations in pg_dump or pg_restore to
text.  Otherwise it's nigh undebuggable, and certainly much harder
to test than it needs to be.)

This options likes like very good idea.

Regards

Pavel
 

                        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Detect supported SET parameters when pg_restore is run

From
Peter Eisentraut
Date:
On 9/27/16 6:57 PM, Vitaly Burovoy wrote:
> On 9/27/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>> On 9/27/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> (The other thing I'd want here is a --target-version option so that
>>> you could get the same output alterations in pg_dump or pg_restore to
>>> text.  Otherwise it's nigh undebuggable, and certainly much harder
>>> to test than it needs to be.)
>>
>> I thought that way. I'm ready to introduce that parameter, but again,
>> I see now it will influence only SET parameters. Does it worth it?
> 
> The only reason I have not implemented it was attempt to avoid users
> being confused who could think that result of pg_dump (we need it
> there for the plain text output) or pg_restore can be converted for
> target version to be restored without new features (but now it is
> wrong).

I'm in favor of making this work.  But I also agree with the earlier
commenters that we shouldn't overload remoteVersion to mean sometimes
source and sometimes target version.  It would probably be enough for
now to leave remoteVersion to mean source version, introduce a new field
targetVersion, default that to remoteVersion in plain text format, and
set it to the actual remote version when restoring directly to a
database.  It will need a bit of work to tie it all together, but it
shouldn't be too difficult.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services