Thread: Error in pg_restore (could not close data file: Success)

Error in pg_restore (could not close data file: Success)

From
Andrii Tkach
Date:
Hello,
After restoring 'directory' backup with pg_restore (PostgreSQL 10.6) I've got a message:
pg_restore: [directory archiver] could not close data file: Success
pg_restore: [parallel archiver] a worker process died unexpectedly
In this thread: https://www.postgresql.org/message-id/CAFcNs%2Bos5ExGvXMBrBBzzuJJamoHt5-zdJdxX39nkVG0KUxwsw%40mail.gmail.com
there is only one answer. I'm interesting, is it normal behaivor of pg_restore and backup restored normaly or not ?

Regards, Andrii

Re: Error in pg_restore (could not close data file: Success)

From
Kyotaro Horiguchi
Date:
At Tue, 20 Oct 2020 13:48:25 +0300, Andrii Tkach <and7ua@gmail.com> wrote in 
> Hello,
> After restoring 'directory' backup with pg_restore (PostgreSQL 10.6) I've
> got a message:
> pg_restore: [directory archiver] could not close data file: Success
> pg_restore: [parallel archiver] a worker process died unexpectedly
> In this thread:
> https://www.postgresql.org/message-id/CAFcNs%2Bos5ExGvXMBrBBzzuJJamoHt5-zdJdxX39nkVG0KUxwsw%40mail.gmail.com
> there is only one answer. I'm interesting, is it normal behaivor of
> pg_restore and backup restored normaly or not ?

That would be a broken compressed file, maybe caused by disk full.

This reminded me of a thread. The issue above seems to be the same
with this:


https://www.postgresql.org/message-id/flat/20200416.181945.759179589924840062.horikyota.ntt%40gmail.com#ed85c5fda64873c45811be4e3027a2ea

Me> Hmm. Sounds reasonable.  I'm going to do that.  Thanks!

But somehow that haven't happened, I'll come up with a new version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error in pg_restore (could not close data file: Success)

From
Kyotaro Horiguchi
Date:
At Wed, 21 Oct 2020 13:45:15 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
>
https://www.postgresql.org/message-id/flat/20200416.181945.759179589924840062.horikyota.ntt%40gmail.com#ed85c5fda64873c45811be4e3027a2ea
> 
> Me> Hmm. Sounds reasonable.  I'm going to do that.  Thanks!
> 
> But somehow that haven't happened, I'll come up with a new version.

pg_restore shows the following error instead of "Success" for broken
compressed file.

pg_restore: error: could not close data file "d/3149.dat": zlib error: error reading or writing compressed file


0001:

cfclose() calls fatal() instead of returning the result to the callers
on error, which isobviously okay for all existing callers that are
handling errors from the function. Other callers ignored the returned
value but we should fatal() on error of the function.

At least for me, gzerror doesn't return a message (specifically,
returns NULL) after gzclose failure so currently cfclose shows its own
messages for erros of gzclose().  Am I missing something?

0002:

cfread has the same code with get_cfp_error() and cfgetc uses
sterror() after gzgetc(). It would be suitable for a separate patch,
but 0002 fixes those bugs.  I changed _EndBlob() to show the cause of
an error.

Did not do in this patch:

We could do further small refactoring to remove temporary variables in
pg_backup_directory.c for _StartData(), InitArchiveFmt_Directory,
_LoadBlobs(), _StartBlobs() and _CloseArchive(), but I left them as is
for the ease of back-patching.

Now that we have the file name in the context variable so we could
show the file name in all error messages, but that change was large
and there's a part where that change is a bit more complex so I didn't
do that.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 15190e795dbc4a6245d96d37ee561624769ab7ae Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Wed, 21 Oct 2020 11:50:02 +0900
Subject: [PATCH 1/2] Fix error handling for file close of pg_dump

Errors from cfclose is handled as if it were fclose, but the function
returned errors from gzclose and the callers show incorrect error
messages like "could not close: Success". Fix that behavior by moving
the error handling code into cfclose.
---
 src/bin/pg_dump/compress_io.c         | 38 +++++++++++++++++++++++----
 src/bin/pg_dump/compress_io.h         |  2 +-
 src/bin/pg_dump/pg_backup_directory.c | 37 +++++++++++++++++---------
 3 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 1417401086..28d45fd299 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -645,31 +645,59 @@ cfgets(cfp *fp, char *buf, int len)
         return fgets(buf, len, fp->uncompressedfp);
 }
 
-int
-cfclose(cfp *fp)
+/*
+ * cfclose close the stream
+ *
+ * Returns if successfully closed the cfp.  End the process with fatal() on
+ * error. filedesc is a label that identifies the file type and filename is the
+ * name of the file, both of which are used for error reporting.
+ */
+void
+cfclose(cfp *fp, const char *filedesc, const char *filename)
 {
     int            result;
 
     if (fp == NULL)
     {
         errno = EBADF;
-        return EOF;
+
+        fatal("could not close %s \"%s\": %m", filedesc, filename);
     }
 #ifdef HAVE_LIBZ
     if (fp->compressedfp)
     {
         result = gzclose(fp->compressedfp);
         fp->compressedfp = NULL;
+
+        switch (result)
+        {
+            case Z_OK:
+                break;
+                
+            case Z_STREAM_ERROR:
+                fatal("could not close %s \"%s\": zlib error: invalid parameter",
+                      filedesc, filename);
+            case Z_ERRNO:
+                fatal("could not close %s \"%s\": %m", filedesc, filename);
+            case Z_BUF_ERROR:
+                fatal("could not close %s \"%s\": zlib error: error reading or writing compressed file",
+                      filedesc, filename);
+            default:
+                fatal("could not close %s \"%s\": zlib error: unknown gzclose error: %d",
+                      filedesc, filename, result);
+        }
     }
     else
 #endif
     {
         result = fclose(fp->uncompressedfp);
         fp->uncompressedfp = NULL;
+
+        if (result != 0)
+            fatal("could not close %s \"%s\": %m", filedesc, filename);
     }
+
     free_keep_errno(fp);
-
-    return result;
 }
 
 int
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index d2e6e1b854..97986636c7 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -63,7 +63,7 @@ extern int    cfread(void *ptr, int size, cfp *fp);
 extern int    cfwrite(const void *ptr, int size, cfp *fp);
 extern int    cfgetc(cfp *fp);
 extern char *cfgets(cfp *fp, char *buf, int len);
-extern int    cfclose(cfp *fp);
+extern void    cfclose(cfp *fp, const char *filedesc, const char *filename);
 extern int    cfeof(cfp *fp);
 extern const char *get_cfp_error(cfp *fp);
 
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 48fa7cb1a3..4b1cf4a5a3 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -51,8 +51,13 @@ typedef struct
     char       *directory;
 
     cfp           *dataFH;            /* currently open data file */
+    char        dataFName[MAXPGPATH]; /* file name for dataFH, for error
+                                       * message use only */
 
     cfp           *blobsTocFH;        /* file handle for blobs.toc */
+    char        blobsTocFName[MAXPGPATH];    /* file name for blobs.toc, for
+                                             * error message use only */
+
     ParallelState *pstate;        /* for parallel backup / restore */
 } lclContext;
 
@@ -207,6 +212,7 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
             fatal("could not open input file \"%s\": %m", fname);
 
         ctx->dataFH = tocFH;
+        strlcpy(ctx->dataFName, fname, MAXPGPATH);
 
         /*
          * The TOC of a directory format dump shares the format code of the
@@ -218,9 +224,9 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
         ReadToc(AH);
 
         /* Nothing else in the file, so close it again... */
-        if (cfclose(tocFH) != 0)
-            fatal("could not close TOC file: %m");
+        cfclose(tocFH, "TOC file", fname);
         ctx->dataFH = NULL;
+        ctx->dataFName[0] = 0;
     }
 }
 
@@ -369,9 +375,11 @@ _EndData(ArchiveHandle *AH, TocEntry *te)
     lclContext *ctx = (lclContext *) AH->formatData;
 
     /* Close the file */
-    cfclose(ctx->dataFH);
+    Assert(AH->mode == archModeWrite);
+    cfclose(ctx->dataFH, "output file",    ctx->dataFName);
 
     ctx->dataFH = NULL;
+    ctx->dataFName[0] = 0;
 }
 
 /*
@@ -402,8 +410,8 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
     }
 
     free(buf);
-    if (cfclose(cfp) != 0)
-        fatal("could not close data file \"%s\": %m", filename);
+
+    cfclose(cfp, "data file", filename);
 }
 
 /*
@@ -466,11 +474,10 @@ _LoadBlobs(ArchiveHandle *AH)
         fatal("error reading large object TOC file \"%s\"",
               fname);
 
-    if (cfclose(ctx->blobsTocFH) != 0)
-        fatal("could not close large object TOC file \"%s\": %m",
-              fname);
+    cfclose(ctx->blobsTocFH, "large object TOC file", fname);
 
     ctx->blobsTocFH = NULL;
+    ctx->blobsTocFName[0] = 0;
 
     EndRestoreBlobs(AH);
 }
@@ -584,6 +591,7 @@ _CloseArchive(ArchiveHandle *AH)
         if (tocFH == NULL)
             fatal("could not open output file \"%s\": %m", fname);
         ctx->dataFH = tocFH;
+        strlcpy(ctx->dataFName, fname, MAXPGPATH);
 
         /*
          * Write 'tar' in the format field of the toc.dat file. The directory
@@ -594,8 +602,9 @@ _CloseArchive(ArchiveHandle *AH)
         WriteHead(AH);
         AH->format = archDirectory;
         WriteToc(AH);
-        if (cfclose(tocFH) != 0)
-            fatal("could not close TOC file: %m");
+        cfclose(tocFH, "TOC file", fname);
+        ctx->dataFH = NULL;
+        ctx->dataFName[0] = 0;
         WriteDataChunks(AH, ctx->pstate);
 
         ParallelBackupEnd(AH, ctx->pstate);
@@ -646,6 +655,8 @@ _StartBlobs(ArchiveHandle *AH, TocEntry *te)
     ctx->blobsTocFH = cfopen_write(fname, "ab", 0);
     if (ctx->blobsTocFH == NULL)
         fatal("could not open output file \"%s\": %m", fname);
+
+    strlcpy(ctx->blobsTocFName, fname, MAXPGPATH);
 }
 
 /*
@@ -680,8 +691,9 @@ _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
     int            len;
 
     /* Close the BLOB data file itself */
-    cfclose(ctx->dataFH);
+    cfclose(ctx->dataFH, "BLOB data file", ctx->dataFName);
     ctx->dataFH = NULL;
+    ctx->dataFName[0] = 0;
 
     /* register the blob in blobs.toc */
     len = snprintf(buf, sizeof(buf), "%u blob_%u.dat\n", oid, oid);
@@ -699,8 +711,9 @@ _EndBlobs(ArchiveHandle *AH, TocEntry *te)
 {
     lclContext *ctx = (lclContext *) AH->formatData;
 
-    cfclose(ctx->blobsTocFH);
+    cfclose(ctx->blobsTocFH, "large object TOC file", ctx->blobsTocFName);
     ctx->blobsTocFH = NULL;
+    ctx->blobsTocFName[0] = 0;
 }
 
 /*
-- 
2.18.4

From 3de87e733ca7041bad06f8e42b8917d040d61355 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Wed, 21 Oct 2020 14:10:04 +0900
Subject: [PATCH 2/2] Use correct function to get error message in gzread and
 gzgetc.

cfread() had the equivalent code with get_crp_error(), and cfgetc()
used strerror() to get error message of gzgetc(). Use get_cfp_error()
at the two places.
---
 src/bin/pg_dump/compress_io.c         | 10 ++--------
 src/bin/pg_dump/pg_backup_directory.c |  3 ++-
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 28d45fd299..686472e045 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -577,13 +577,7 @@ cfread(void *ptr, int size, cfp *fp)
     {
         ret = gzread(fp->compressedfp, ptr, size);
         if (ret != size && !gzeof(fp->compressedfp))
-        {
-            int            errnum;
-            const char *errmsg = gzerror(fp->compressedfp, &errnum);
-
-            fatal("could not read from input file: %s",
-                  errnum == Z_ERRNO ? strerror(errno) : errmsg);
-        }
+            fatal("could not read from input file: %s", get_cfp_error(fp));
     }
     else
 #endif
@@ -618,7 +612,7 @@ cfgetc(cfp *fp)
         if (ret == EOF)
         {
             if (!gzeof(fp->compressedfp))
-                fatal("could not read from input file: %s", strerror(errno));
+                fatal("could not read from input file: %s", get_cfp_error(fp));
             else
                 fatal("could not read from input file: end of file");
         }
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 4b1cf4a5a3..56e0c27e29 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -698,7 +698,8 @@ _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
     /* register the blob in blobs.toc */
     len = snprintf(buf, sizeof(buf), "%u blob_%u.dat\n", oid, oid);
     if (cfwrite(buf, len, ctx->blobsTocFH) != len)
-        fatal("could not write to blobs TOC file");
+        fatal("could not write to blobs TOC file: %s",
+              get_cfp_error(ctx->blobsTocFH));
 }
 
 /*
-- 
2.18.4


Re: Error in pg_restore (could not close data file: Success)

From
Andrii Tkach
Date:
Maybe it would be better to commit this patches to mainstream, but I don't realy know.

ср, 21 окт. 2020 г. в 09:20, Kyotaro Horiguchi <horikyota.ntt@gmail.com>:
At Wed, 21 Oct 2020 13:45:15 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> https://www.postgresql.org/message-id/flat/20200416.181945.759179589924840062.horikyota.ntt%40gmail.com#ed85c5fda64873c45811be4e3027a2ea
>
> Me> Hmm. Sounds reasonable.  I'm going to do that.  Thanks!
>
> But somehow that haven't happened, I'll come up with a new version.

pg_restore shows the following error instead of "Success" for broken
compressed file.

pg_restore: error: could not close data file "d/3149.dat": zlib error: error reading or writing compressed file


0001:

cfclose() calls fatal() instead of returning the result to the callers
on error, which isobviously okay for all existing callers that are
handling errors from the function. Other callers ignored the returned
value but we should fatal() on error of the function.

At least for me, gzerror doesn't return a message (specifically,
returns NULL) after gzclose failure so currently cfclose shows its own
messages for erros of gzclose().  Am I missing something?

0002:

cfread has the same code with get_cfp_error() and cfgetc uses
sterror() after gzgetc(). It would be suitable for a separate patch,
but 0002 fixes those bugs.  I changed _EndBlob() to show the cause of
an error.

Did not do in this patch:

We could do further small refactoring to remove temporary variables in
pg_backup_directory.c for _StartData(), InitArchiveFmt_Directory,
_LoadBlobs(), _StartBlobs() and _CloseArchive(), but I left them as is
for the ease of back-patching.

Now that we have the file name in the context variable so we could
show the file name in all error messages, but that change was large
and there's a part where that change is a bit more complex so I didn't
do that.


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center