Thread: Re: [GENERAL] pg_dump behaves differently for different archive formats
Tom Lane wrote on Dec 16, 2013: > Albe Laurenz <laurenz.albe@wien.gv.at> writes: >> Restoring a "plain format" dump and a "custom format" dump of >> the same database can lead to different results: >> pg_dump organizes the SQL statements it creates in "TOC entries". >> If a custom format dump is restored with pg_restore, all >> SQL statements in a TOC entry will be executed as a single command >> and thus in a single transaction. > Yeah, this is a bug I think. pg_dump was designed around the idea > that the output would be executed as a simple script, and in a > number of places there's an expectation that one SQL statement > can fail without affecting following ones. So if pg_restore can't > provide that behavior it's not good. > > On the other hand, I'm not sure how much enthusiasm there'd be for > complex or fragile changes to fix this. A lot of people invariably > run restores in single-transaction mode and don't really care about > fault-tolerant restores. Also, it's easy enough to dodge the problem > if you must: just pipe the output into psql rather than > direct-to-database. > > So to me the question is can we fix this without doing something like > duplicating psql's lexer? If we have to parse out the statements > contained in each text blob, it's probably going to be too painful. > Some cautionary history about this sort of thing can be read at > http://www.postgresql.org/message-id/flat/18006.1325700782@sss.pgh.pa.us I thought that changing the dump format for this would be too much trouble, so I came up with the attached. It assumes that custom- or tar-format archives are written by pg_dump and cannot contain arbitrary SQL statements, which allows me to get away with very simple parsing. If this is not shot down immediately on account of fragility, I'd add it to the next commitfest page. The problem has been a pain point for my co-workers in the past; using single-transaction mode doesn't work for us, since we have custom objects in our template database that cause expected errors when a dump is restored. Yours, Laurenz Albe
Attachment
Albe Laurenz <laurenz.albe@wien.gv.at> writes: > I thought that changing the dump format for this would be too > much trouble, so I came up with the attached. > It assumes that custom- or tar-format archives are written by pg_dump > and cannot contain arbitrary SQL statements, which allows me to get away > with very simple parsing. I don't think this can be trusted in the least. To begin with, where'd you get the idea dumps cannot contain "arbitrary SQL statements"? CREATE RULE at least could contain some pretty weird stuff. This thing doesn't look like it's even bothering to count nested parentheses, so it will certainly fail on a multi-statement rule. I believe you're also at risk of SQL injection attacks from failing to account for multibyte characters in non-ASCII-safe client encodings. While those specific problems could no doubt be fixed, I object to the entire concept of assuming that what pg_dump emits is always going to be trivially parsable. If we are to go down this path, I think we have to replicate what psql is doing to identify statement boundaries ... and as I mentioned upthread, that's rather a lot of code :-( regards, tom lane
Re: Re: [GENERAL] pg_dump behaves differently for different archive formats
From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Albe Laurenz <laurenz.albe@wien.gv.at> writes: > > I thought that changing the dump format for this would be too > > much trouble, so I came up with the attached. If we're going to change this, it seems to me that the only option would be to change the dump format... Just off-the-cuff, I'm wondering if we could actually not change the real 'format' but simply promote each ACL entry (and similar cases..) to top-level objects and declare that TOC entries should be single statements. > While those specific problems could no doubt be fixed, I object to the > entire concept of assuming that what pg_dump emits is always going to be > trivially parsable. If we are to go down this path, I think we have to > replicate what psql is doing to identify statement boundaries ... and > as I mentioned upthread, that's rather a lot of code :-( Agreed. If we want this, we should handle it on the pg_dump side, not try and work it out on the pg_restore side. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > If we're going to change this, it seems to me that the only option would > be to change the dump format... Just off-the-cuff, I'm wondering if we > could actually not change the real 'format' but simply promote each ACL > entry (and similar cases..) to top-level objects and declare that TOC > entries should be single statements. I don't think we want even more TOC entries, but it would not be unreasonable to insist that the statement(s) within a TOC entry be subdivided somehow. Essentially the payload of a TOC entry becomes a list of strings rather than just one string. That would mean that the problem could not be fixed for existing archive files; but that seems OK, given the rather small number of complaints so far. If we had something like that, I'd be strongly inclined to get rid of the existing convention whereby comments and ACL commands are separate TOC entries, and make them part of the parent object's TOC entry (which'd mean we'd want to label the sub-strings so we can tell whether they are main object, comment, or ACL). The fewer TOC entries we can have, the better; there is no reason why comments/ACLs should be independently sortable. regards, tom lane
On Mon, Jul 28, 2014 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> If we're going to change this, it seems to me that the only option would >> be to change the dump format... Just off-the-cuff, I'm wondering if we >> could actually not change the real 'format' but simply promote each ACL >> entry (and similar cases..) to top-level objects and declare that TOC >> entries should be single statements. > > I don't think we want even more TOC entries, but it would not be > unreasonable to insist that the statement(s) within a TOC entry be > subdivided somehow. Essentially the payload of a TOC entry becomes > a list of strings rather than just one string. > > That would mean that the problem could not be fixed for existing archive > files; but that seems OK, given the rather small number of complaints > so far. > > If we had something like that, I'd be strongly inclined to get rid of > the existing convention whereby comments and ACL commands are separate > TOC entries, and make them part of the parent object's TOC entry (which'd > mean we'd want to label the sub-strings so we can tell whether they are > main object, comment, or ACL). The fewer TOC entries we can have, the > better; there is no reason why comments/ACLs should be independently > sortable. Maybe, but I think people will still want an option to skip restoring them altogether (at least for ACLs). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jul 28, 2014 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If we had something like that, I'd be strongly inclined to get rid of >> the existing convention whereby comments and ACL commands are separate >> TOC entries, and make them part of the parent object's TOC entry (which'd >> mean we'd want to label the sub-strings so we can tell whether they are >> main object, comment, or ACL). The fewer TOC entries we can have, the >> better; there is no reason why comments/ACLs should be independently >> sortable. > Maybe, but I think people will still want an option to skip restoring > them altogether (at least for ACLs). Sure; we already have such an option, and I'm not suggesting removing it. The implementation would change though: it would have to look at the individual command labels to see which part(s) of a TOC entry to print out. regards, tom lane