Thread: pg_restore COPY error handling
* Stephen Frost (sfrost@snowman.net) wrote: > Needs to be changed to handle whitespace in front of the actual 'COPY', > unless someone else has a better idea. This should be reasonably > trivial to do though... If you'd like me to make that change and send > in a new patch, just let me know. Fixed, using isspace(). Also added an output message to make it a bit clearer when a failed COPY has been detected, etc. Updated patch attached. Thanks, Stephen
Attachment
This needs a little style cleanup but I can do that. Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- Stephen Frost wrote: > * Stephen Frost (sfrost@snowman.net) wrote: > > Needs to be changed to handle whitespace in front of the actual 'COPY', > > unless someone else has a better idea. This should be reasonably > > trivial to do though... If you'd like me to make that change and send > > in a new patch, just let me know. > > Fixed, using isspace(). Also added an output message to make it a bit > clearer when a failed COPY has been detected, etc. > > Updated patch attached. > > Thanks, > > Stephen [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce, et al, * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > This needs a little style cleanup but I can do that. Thanks! > Your patch has been added to the PostgreSQL unapplied patches list at: > > http://momjian.postgresql.org/cgi-bin/pgpatches > > It will be applied as soon as one of the PostgreSQL committers reviews > and approves it. Great! It'd be really nice to have this fix in 8.1.3... :) Thanks again, Stephen > --------------------------------------------------------------------------- > Stephen Frost wrote: > > * Stephen Frost (sfrost@snowman.net) wrote: > > > Needs to be changed to handle whitespace in front of the actual 'COPY', > > > unless someone else has a better idea. This should be reasonably > > > trivial to do though... If you'd like me to make that change and send > > > in a new patch, just let me know. > > > > Fixed, using isspace(). Also added an output message to make it a bit > > clearer when a failed COPY has been detected, etc. > > > > Updated patch attached. > > > > Thanks, > > > > Stephen > > [ Attachment, skipping... ] > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachment
Stephen Frost wrote: -- Start of PGP signed section. > Bruce, et al, > > * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > > This needs a little style cleanup but I can do that. > > Thanks! > > > Your patch has been added to the PostgreSQL unapplied patches list at: > > > > http://momjian.postgresql.org/cgi-bin/pgpatches > > > > It will be applied as soon as one of the PostgreSQL committers reviews > > and approves it. > > Great! It'd be really nice to have this fix in 8.1.3... :) No, it will not be in 8.1.3. It is a new feature. --------------------------------------------------------------------------- > > Thanks again, > > Stephen > > > --------------------------------------------------------------------------- > > Stephen Frost wrote: > > > * Stephen Frost (sfrost@snowman.net) wrote: > > > > Needs to be changed to handle whitespace in front of the actual 'COPY', > > > > unless someone else has a better idea. This should be reasonably > > > > trivial to do though... If you'd like me to make that change and send > > > > in a new patch, just let me know. > > > > > > Fixed, using isspace(). Also added an output message to make it a bit > > > clearer when a failed COPY has been detected, etc. > > > > > > Updated patch attached. > > > > > > Thanks, > > > > > > Stephen > > > > [ Attachment, skipping... ] > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 4: Have you searched our list archives? > > > > > > http://archives.postgresql.org > > > > -- > > Bruce Momjian | http://candle.pha.pa.us > > pgman@candle.pha.pa.us | (610) 359-1001 > > + If your life is a hard drive, | 13 Roberts Road > > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 -- End of PGP section, PGP failed! -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: > Stephen Frost wrote: > > Great! It'd be really nice to have this fix in 8.1.3... :) > > No, it will not be in 8.1.3. It is a new feature. I'd really appriciate it if you could review my post to pgsql-bugs, Message-ID: <20060120144957.GG6026@ns.snowman.net>. If this patch is considered a new feature then I'd like to either revisit that decision or be given some direction on what a bug-fix patch for the grows-to-3G bug would look like. Thanks, Stephen
Attachment
Stephen Frost wrote: -- Start of PGP signed section. > * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > > Stephen Frost wrote: > > > Great! It'd be really nice to have this fix in 8.1.3... :) > > > > No, it will not be in 8.1.3. It is a new feature. > > I'd really appriciate it if you could review my post to pgsql-bugs, > Message-ID: <20060120144957.GG6026@ns.snowman.net>. If this patch is > considered a new feature then I'd like to either revisit that decision > or be given some direction on what a bug-fix patch for the grows-to-3G > bug would look like. Oh, so when it skips the \., it reads the rest of the file and bombs out? I thought it just continued fine. How is that failure happening? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, 2006-02-01 at 17:20 -0500, Stephen Frost wrote: > * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > > Stephen Frost wrote: > > > Great! It'd be really nice to have this fix in 8.1.3... :) > > > > No, it will not be in 8.1.3. It is a new feature. > > I'd really appriciate it if you could review my post to pgsql-bugs, > Message-ID: <20060120144957.GG6026@ns.snowman.net>. If this patch is > considered a new feature then I'd like to either revisit that decision > or be given some direction on what a bug-fix patch for the grows-to-3G > bug would look like. > Stephen, this is a hopeless way of giving a reference. Many users don't keep list emails. If you want to refer to a previous post you should give a reference to the web archives. I assume you are referring to this post: http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php cheers andrew
Andrew Dunstan wrote: > On Wed, 2006-02-01 at 17:20 -0500, Stephen Frost wrote: > > * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > > > Stephen Frost wrote: > > > > Great! It'd be really nice to have this fix in 8.1.3... :) > > > > > > No, it will not be in 8.1.3. It is a new feature. > > > > I'd really appriciate it if you could review my post to pgsql-bugs, > > Message-ID: <20060120144957.GG6026@ns.snowman.net>. If this patch is > > considered a new feature then I'd like to either revisit that decision > > or be given some direction on what a bug-fix patch for the grows-to-3G > > bug would look like. > > > > > Stephen, > > this is a hopeless way of giving a reference. Many users don't keep list > emails. If you want to refer to a previous post you should give a > reference to the web archives. > > I assume you are referring to this post: > http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php OK, that helps. The solution is to "not do that", meaning install postgis before the restore or something. Anyway, adding the patch seems like too large a risk for a minor release, especially since you are the first person to complain about it that I remember. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Andrew Dunstan wrote: >> I assume you are referring to this post: >> http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php > OK, that helps. The solution is to "not do that", meaning install > postgis before the restore or something. Anyway, adding the patch seems > like too large a risk for a minor release, especially since you are the > first person to complain about it that I remember. I don't think you should see this as something specific to PostGIS. If I interpret Stephen's report properly, any sort of failure during COPY IN would lead to undesirable behavior in pg_restore. This is not super surprising because the original design approach for pg_restore was "bomb out on any sort of difficulty whatsoever". That was justly complained about, and now it will try to keep going after SQL-command errors ... but it sounds like the COPY-data processing part didn't get fixed properly. I take no position on whether Stephen's patch is any good --- I haven't looked at it --- but if he's correct about the problem then I agree it's a bug fix. Before deciding whether it deserves to be back-patched, we at least ought to look closely enough to characterize the situations in which pg_restore will fail. regards, tom lane
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: > Stephen Frost wrote: > -- Start of PGP signed section. > > * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > > > Stephen Frost wrote: > > > > Great! It'd be really nice to have this fix in 8.1.3... :) > > > > > > No, it will not be in 8.1.3. It is a new feature. > > > > I'd really appriciate it if you could review my post to pgsql-bugs, > > Message-ID: <20060120144957.GG6026@ns.snowman.net>. If this patch is > > considered a new feature then I'd like to either revisit that decision > > or be given some direction on what a bug-fix patch for the grows-to-3G > > bug would look like. > > Oh, so when it skips the \., it reads the rest of the file and bombs > out? I thought it just continued fine. How is that failure happening? It doesn't actually bomb out either. On the system I was working with what happened was that it hung after reading in the COPY data. It looked like it got stuck somehow while trying to parse the data as an SQL command. Thanks, Stephen
Attachment
* Andrew Dunstan (andrew@dunslane.net) wrote: > this is a hopeless way of giving a reference. Many users don't keep list > emails. If you want to refer to a previous post you should give a > reference to the web archives. Sorry, I'm actually pretty used to using Message IDs for references (we do it alot on the Debian lists). I'll try to be better in the future though, honestly, I'm not really a big fan of the Postgres mailing list archive setup. :/ > I assume you are referring to this post: > http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php Yup, that's the one, thanks. Stephen
Attachment
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: > Andrew Dunstan wrote: > > this is a hopeless way of giving a reference. Many users don't keep list > > emails. If you want to refer to a previous post you should give a > > reference to the web archives. > > > > I assume you are referring to this post: > > http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php > > OK, that helps. The solution is to "not do that", meaning install > postgis before the restore or something. Anyway, adding the patch seems > like too large a risk for a minor release, especially since you are the > first person to complain about it that I remember. It's not PostGIS specific, it's any case where a COPY command fails for any reason. I'll be following up with the Debian maintainer regarding being able to install things before doing the restore/pg_upgradecluster (at the moment there's no hook from pg_upgradecluster between creating the database and trying to load the data). I understand that's not really a PostgreSQL issue but I do think pg_restore should be able to handle a COPY command failing sanely... :/ I didn't think the patch was terribly large or complicated... The one thing I don't like about it is that it doesn't seem to be very easy to differentiate between a COPY command failing and some other SQL command failing. If there's a better way to detect this than to just check for '<whitespace>COPY', I'd love to hear it. :) I'd also be happy to make any changes necessary to have the patch accepted, of course... Thanks, Stephen
Attachment
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Andrew Dunstan wrote: > >> I assume you are referring to this post: > >> http://archives.postgresql.org/pgsql-bugs/2006-01/msg00188.php > > > OK, that helps. The solution is to "not do that", meaning install > > postgis before the restore or something. Anyway, adding the patch seems > > like too large a risk for a minor release, especially since you are the > > first person to complain about it that I remember. > > I don't think you should see this as something specific to PostGIS. > If I interpret Stephen's report properly, any sort of failure during > COPY IN would lead to undesirable behavior in pg_restore. I'm reasonably confident this is exactly the case. > This is not super surprising because the original design approach for > pg_restore was "bomb out on any sort of difficulty whatsoever". That > was justly complained about, and now it will try to keep going after > SQL-command errors ... but it sounds like the COPY-data processing > part didn't get fixed properly. Exactly so. Possibly because it appears to be non-trivial to detect when it was a *COPY* command which failed (to differentiate it from some other command). What I did was check for '<whitespace>COPY'. I'd be happy to change that if anyone has a better suggestion though. There might be a way to pass that information down into the pg_archive_db but I don't think it's available at that level currently and I didn't see any way to get the answer from libpq about what *kind* of command failed. > I take no position on whether Stephen's patch is any good --- I haven't > looked at it --- but if he's correct about the problem then I agree it's > a bug fix. Before deciding whether it deserves to be back-patched, we > at least ought to look closely enough to characterize the situations in > which pg_restore will fail. I have some level of confidence that this is the only case along these lines because it's the only case where pg_restore isn't dealing with SQL commands but with a dataset which has to be handled in a special way. pg_restore checks if the command sent was a successful COPY command and then treats everything after it in a special way; it doesn't do that for any other commands... Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> This is not super surprising because the original design approach for >> pg_restore was "bomb out on any sort of difficulty whatsoever". That >> was justly complained about, and now it will try to keep going after >> SQL-command errors ... but it sounds like the COPY-data processing >> part didn't get fixed properly. > Exactly so. Possibly because it appears to be non-trivial to detect > when it was a *COPY* command which failed (to differentiate it from some > other command). What I did was check for '<whitespace>COPY'. I'd be > happy to change that if anyone has a better suggestion though. ISTM you should be depending on the archive structure: at some level, at least, pg_restore knows darn well whether it is dealing with table data or SQL commands. Also, it might be possible to depend on whether libpq has entered the "copy in" state. I'm not sure this works very well, because if there's an error in the COPY command itself (not in the following data) then we probably don't ever get into "copy in" state. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> This is not super surprising because the original design approach for > >> pg_restore was "bomb out on any sort of difficulty whatsoever". That > >> was justly complained about, and now it will try to keep going after > >> SQL-command errors ... but it sounds like the COPY-data processing > >> part didn't get fixed properly. > > > Exactly so. Possibly because it appears to be non-trivial to detect > > when it was a *COPY* command which failed (to differentiate it from some > > other command). What I did was check for '<whitespace>COPY'. I'd be > > happy to change that if anyone has a better suggestion though. > > ISTM you should be depending on the archive structure: at some level, > at least, pg_restore knows darn well whether it is dealing with table > data or SQL commands. Right, it does higher up in the code but I don't believe that knowledge is passed down to the level it needs to be because it wasn't necessary before and the module API didn't include a way to pass that information into a given module. I expect this is why the code currently depends on libpq to tell it when it has entered a 'copy in' state. I'll look into this though and see just how invasive it would be to get the information to that level. > Also, it might be possible to depend on whether libpq has entered the > "copy in" state. I'm not sure this works very well, because if there's > an error in the COPY command itself (not in the following data) then we > probably don't ever get into "copy in" state. This is what the code currently depends on, and libpq (properly, imv) doesn't enter the 'copy in' state when the COPY command itself fails. That's exactly how we end up in this situation where pg_restore thinks it's looking at a SQL command (because libpq said it wasn't in a COPY state) when it's really getting COPY data from the higher level in the code. Thanks, Stephen
Attachment
Tom Lane said: > > Also, it might be possible to depend on whether libpq has entered the > "copy in" state. I'm not sure this works very well, because if there's > an error in the COPY command itself (not in the following data) then we > probably don't ever get into "copy in" state. > Could we not refrain from sending data if we detect that we are not in copy in state? cheers andrew
* Andrew Dunstan (andrew@dunslane.net) wrote: > Tom Lane said: > > Also, it might be possible to depend on whether libpq has entered the > > "copy in" state. I'm not sure this works very well, because if there's > > an error in the COPY command itself (not in the following data) then we > > probably don't ever get into "copy in" state. > > Could we not refrain from sending data if we detect that we are not in copy > in state? You have to know at that level if it's data you're getting or an SQL statement though. SQL statements can fail and things move along happily. Essentially what my patch does is detect when a COPY command fails and then circular-file the data that comes in after it from the upper-level. When I started to look at the way pg_restore worked I had initially thought I could make the higher-level code realize that the COPY command failed through a return-code and have it not pass the data down but the return codes didn't appear to be very well defined to handle that kind of communication... (As in, it looked like trying to make that happen would break other things), I can look at it again though. Thanks, Stephen
Attachment
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > ISTM you should be depending on the archive structure: at some level, > at least, pg_restore knows darn well whether it is dealing with table > data or SQL commands. Alright, to do this, ExecuteSqlCommandBuf would need to be modified to return an error-code other than 1 for when a command fails. Thankfully, only pg_backup_archiver.c uses ExecuteSqlCommandBuf, in ahwrite. ahwrite would then need to return a value when there's an error (at the moment it's defined to return int but everything appears to ignore its return value). We'd then need ahprintf to return a negative value when it fails (at the moment it returns 'cnt' which appears to be the size of what was written, except that nothing looks at the return value of ahprintf either). Once we have ahprintf returning a negative value when it fails, we can modify pg_backup_archiver.c around line 332 to check if the ahprintf failed for a COPY statement and if so to skip the: (*AH->PrintTocDataPtr) (AH, te, ropt); // on line 335. I'd be happy to work this up, and I think it would work, but it seems kind of ugly since then we'd have ahwrite and ahprintf returning error codes which in 99% of the cases wouldn't be checked which doesn't seem terribly clean. :/ Thanks, Stephen
Attachment
* Stephen Frost (sfrost@snowman.net) wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > ISTM you should be depending on the archive structure: at some level, > > at least, pg_restore knows darn well whether it is dealing with table > > data or SQL commands. > [...] > I'd be happy to work this up, and I think it would work, but it seems > kind of ugly since then we'd have ahwrite and ahprintf returning error > codes which in 99% of the cases wouldn't be checked which doesn't seem > terribly clean. :/ Looking at this again, I'm not 100% sure it'd work quite that cleanly. I'm not sure you can just skip that PrintTocDataPtr call, depending on the how the input is coming in... There might be a way to modify _PrintData (in pg_backup_custom.c, and the others, which is what is the function behind PrintTocDataPtr) to somehow check an AH variable which essentially says "the data command failed, just ignore the input", and we'd need to set the AH variable somewhere else, perhaps using the return value setup I described before... All of this is sounding rather invasive though. I have to admit that I'm not exactly a big fan of the pg_dump/pg_restore modular system. :/ Thanks, Stephen
Attachment
Stephen Frost wrote: -- Start of PGP signed section. > * Stephen Frost (sfrost@snowman.net) wrote: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > > ISTM you should be depending on the archive structure: at some level, > > > at least, pg_restore knows darn well whether it is dealing with table > > > data or SQL commands. > > > [...] > > I'd be happy to work this up, and I think it would work, but it seems > > kind of ugly since then we'd have ahwrite and ahprintf returning error > > codes which in 99% of the cases wouldn't be checked which doesn't seem > > terribly clean. :/ > > Looking at this again, I'm not 100% sure it'd work quite that cleanly. > I'm not sure you can just skip that PrintTocDataPtr call, depending on > the how the input is coming in... There might be a way to modify > _PrintData (in pg_backup_custom.c, and the others, which is what is > the function behind PrintTocDataPtr) to somehow check an AH variable > which essentially says "the data command failed, just ignore the input", > and we'd need to set the AH variable somewhere else, perhaps using the > return value setup I described before... Whatever we do is going to be cleaner than looking for "<space>*COPY". > All of this is sounding rather invasive though. I have to admit that > I'm not exactly a big fan of the pg_dump/pg_restore modular system. :/ Hence TODO has: o Remove unnecessary function pointer abstractions in pg_dump source code -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Stephen Frost <sfrost@snowman.net> writes: > I'd be happy to work this up, and I think it would work, but it seems > kind of ugly since then we'd have ahwrite and ahprintf returning error > codes which in 99% of the cases wouldn't be checked which doesn't seem > terribly clean. :/ I agree. I wonder if it wouldn't be cleaner to pass the information in the other direction, ie, send a boolean down to PrintTocData saying "you are sending SQL commands" or "you are sending COPY data". Then, instead of depending only on the libpq state to decide what to do in ExecuteSqlCommandBuf, we could cross-check: if we're sending SQL data and the libpq state is wrong, just discard the line. The immediate problem you saw is fairly clear at this point: ExecuteSqlCommandBuf and its subroutines attempt to parse the data well enough to determine command boundaries (if SQL commands) or line boundaries (if COPY data). If they are misinformed about what they are processing then the parsing gets totally confused --- it's not hard to imagine the code thinking the entire COPY dump is an incomplete SQL command. So driving this from an upper-level indication of what we are currently sending, rather than libpq status, ought to be more robust. BTW, we'd not need to mess with a ton of routine APIs to make this happen --- just add a flag in ArchiveHandle. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > I agree. I wonder if it wouldn't be cleaner to pass the information in > the other direction, ie, send a boolean down to PrintTocData saying "you > are sending SQL commands" or "you are sending COPY data". Then, instead > of depending only on the libpq state to decide what to do in > ExecuteSqlCommandBuf, we could cross-check: if we're sending SQL data > and the libpq state is wrong, just discard the line. I believe the attached patch does this now. Under my test case it correctly handled things. I'm certainly happier with it this way and apologize for not realizing this better approach sooner. Please comment. Thanks! Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > I believe the attached patch does this now. Under my test case it > correctly handled things. I'm certainly happier with it this way and > apologize for not realizing this better approach sooner. Please > comment. Applied (with trivial stylistic changes) as far back as 8.0, which was the first release that would try to continue after an error. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I believe the attached patch does this now. Under my test case it > > correctly handled things. I'm certainly happier with it this way and > > apologize for not realizing this better approach sooner. Please > > comment. > > Applied (with trivial stylistic changes) as far back as 8.0, which > was the first release that would try to continue after an error. Great, thanks! Now if I can convince you to look at the Kerberos patch I posted on -hackers... ;) Thanks again, Stephen