Thread: Allow pg_archivecleanup to ignore extensions

Allow pg_archivecleanup to ignore extensions

From
Greg Smith
Date:
One bit of feedback I keep getting from people who archive their WAL
files is that the fairly new pg_archivecleanup utility doesn't handle
the case where those archives are compressed.  As the sort of users who
are concerned about compression are also often ones with giant archives
they struggle to clean up, they would certainly appreciate having a
bundled utility to take care of that.

The attached patch provides an additional option to the utility to
provide this capability.  It just strips a provided extension off any
matching file it considers before running the test for whether it should
be deleted or not.  It includes updates to the usage message and some
docs about how this might be used.  Code by Jaime Casanova and myself.
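
As a rough, standalone illustration of the stripping step described above (not the patch code itself), the idea can be sketched as below. The helper name strip_extension and the WAL_FNAME_LEN constant are hypothetical stand-ins chosen for this sketch; the patch does the equivalent work inline using additional_ext and XLOG_DATA_FNAME_LEN. This sketch also requires the remaining name to be at least 24 characters, which is slightly stricter than the length check in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed length of a WAL segment file name: 24 hexadecimal characters. */
#define WAL_FNAME_LEN 24

/*
 * Hypothetical helper, not part of the patch: remove "ext" from the end of
 * "name" in place.  Returns true if the name was shortened, false if it was
 * left untouched.
 */
static bool
strip_extension(char *name, const char *ext)
{
    size_t      name_len;
    size_t      ext_len;

    assert(name != NULL);

    /* No extension requested: nothing to do. */
    if (ext == NULL || ext[0] == '\0')
        return false;

    name_len = strlen(name);
    ext_len = strlen(ext);

    /* What remains must be at least as long as a WAL segment name. */
    if (name_len < ext_len || name_len - ext_len < WAL_FNAME_LEN)
        return false;

    /* The tail of the name must match the extension exactly. */
    if (strcmp(name + name_len - ext_len, ext) != 0)
        return false;

    name[name_len - ext_len] = '\0';
    return true;
}
```

Called on a writable copy of a directory entry name with ".gz" as the extension, this would shorten "000000010000000000000025.gz" to the bare segment name and leave unrelated short names such as "README.gz" alone. The original name still has to be kept around, since that is the file that actually gets unlinked.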

Here's an example of it working:

$ psql -c "show archive_command"
                  archive_command
----------------------------------------------------
 cp -i %p archive/%f < /dev/null && gzip archive/%f
[Yes, I know that can be written more cleanly.  I call external scripts
with more serious error handling than you can put into a single command
line for this sort of thing in production.]

$ psql -c "select pg_start_backup('test',true)"
$ psql -c "select pg_stop_backup()"
$ psql -c "checkpoint"
$ psql -c "select pg_switch_xlog()"

$ cd $PGDATA/archive
$ ls
000000010000000000000025.gz
000000010000000000000026.gz
000000010000000000000027.gz
000000010000000000000028.00000020.backup.gz
000000010000000000000028.gz
000000010000000000000029.gz

$ pg_archivecleanup -d -x .gz `pwd` 000000010000000000000028.00000020.backup
pg_archivecleanup: keep WAL file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000028" and later
pg_archivecleanup: removing file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000025.gz"
pg_archivecleanup: removing file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000027.gz"
pg_archivecleanup: removing file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000026.gz"
$ ls
000000010000000000000028.00000020.backup.gz
000000010000000000000028.gz
000000010000000000000029.gz
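
The result above follows from the test the utility applies to each name once the extension is gone: the name must be exactly 24 hexadecimal characters, and everything after the 8-character timeline prefix must sort before the oldest segment to keep. A minimal sketch of that test, assuming the cutoff segment name has already been derived from the .backup file name, might look like this. The function name is_removable and the WAL_FNAME_LEN constant are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed length of a WAL segment file name: 24 hexadecimal characters. */
#define WAL_FNAME_LEN 24

/*
 * Hypothetical helper: decide whether "walfile" (extension already stripped)
 * is older than "cutoff", the oldest segment that must be kept.  The first
 * 8 characters (the timeline) are skipped in the comparison.
 */
static bool
is_removable(const char *walfile, const char *cutoff)
{
    assert(walfile != NULL && cutoff != NULL);
    assert(strlen(cutoff) == WAL_FNAME_LEN);

    return strlen(walfile) == WAL_FNAME_LEN &&
           strspn(walfile, "0123456789ABCDEF") == WAL_FNAME_LEN &&
           strcmp(walfile + 8, cutoff + 8) < 0;
}
```

With a cutoff of 000000010000000000000028, segments 25 through 27 qualify for removal, while 28, 29, and the .backup file itself (which is longer than 24 characters) do not.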

We recently got some on-list griping that pg_standby doesn't handle
archive files that are compressed, either.  Given how the job I'm
working on this week is going, I'll probably have to add that feature
next.  That's actually an easier source code hack than this one, because
of how the pg_standby code modularizes the concept of a restore command.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/contrib/pg_archivecleanup/pg_archivecleanup.c b/contrib/pg_archivecleanup/pg_archivecleanup.c
index 7989207..a95b659 100644
*** a/contrib/pg_archivecleanup/pg_archivecleanup.c
--- b/contrib/pg_archivecleanup/pg_archivecleanup.c
*************** const char *progname;
*** 36,41 ****
--- 36,42 ----

  /* Options and defaults */
  bool        debug = false;        /* are we debugging? */
+ char       *additional_ext = NULL;    /* Extension to remove from filenames */

  char       *archiveLocation;    /* where to find the archive? */
  char       *restartWALFileName; /* the file from which we can restart restore */
*************** static void
*** 93,105 ****
--- 94,135 ----
  CleanupPriorWALFiles(void)
  {
      int            rc;
+     int            chop_at;
      DIR           *xldir;
      struct dirent *xlde;
+     char        walfile[MAXPGPATH];

      if ((xldir = opendir(archiveLocation)) != NULL)
      {
          while ((xlde = readdir(xldir)) != NULL)
          {
+             strlcpy(walfile, xlde->d_name, MAXPGPATH);
+             /*
+              * Remove any specified additional extension from the filename
+              * before testing it against the conditions below.
+              */
+             if (additional_ext)
+             {
+                 chop_at = strlen(walfile) - strlen(additional_ext);
+                 /*
+                  * Only chop if this is long enough to be a file name and the
+                  * extension matches.
+                  */
+                 if ((chop_at >= (XLOG_DATA_FNAME_LEN - 1)) &&
+                     (strcmp(walfile + chop_at,additional_ext)==0))
+                 {
+                     walfile[chop_at] = '\0';
+                     /*
+                      * This is too chatty even for regular debug output, but
+                      * leaving it in for program testing.
+                      */
+                     if (false)
+                         fprintf(stderr,
+                             "removed extension='%s' from file=%s result=%s\n",
+                             additional_ext,xlde->d_name,walfile);
+                 }
+             }
+
              /*
               * We ignore the timeline part of the XLOG segment identifiers in
               * deciding whether a segment is still needed.    This ensures that
*************** CleanupPriorWALFiles(void)
*** 113,122 ****
               * file. Note that this means files are not removed in the order
               * they were originally written, in case this worries you.
               */
!             if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
!             strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
!                 strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)
              {
                  snprintf(WALFilePath, MAXPGPATH, "%s/%s",
                           archiveLocation, xlde->d_name);
                  if (debug)
--- 143,156 ----
               * file. Note that this means files are not removed in the order
               * they were originally written, in case this worries you.
               */
!             if (strlen(walfile) == XLOG_DATA_FNAME_LEN &&
!             strspn(walfile, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
!                 strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
              {
+                 /*
+                  * Use the original file name again now, including any extension
+                  * that might have been chopped off before testing the sequence.
+                  */
                  snprintf(WALFilePath, MAXPGPATH, "%s/%s",
                           archiveLocation, xlde->d_name);
                  if (debug)
*************** usage(void)
*** 214,219 ****
--- 248,254 ----
             "  pg_archivecleanup /mnt/server/archiverdir 000000010000000000000010.00000020.backup\n");
      printf("\nOptions:\n");
      printf("  -d                 generates debug output (verbose mode)\n");
+     printf("  -x EXT             clean up files if they have this extension\n");
      printf("  --help             show this help, then exit\n");
      printf("  --version          output version information, then exit\n");
      printf("\nReport bugs to <pgsql-bugs@postgresql.org>.\n");
*************** main(int argc, char **argv)
*** 241,253 ****
          }
      }

!     while ((c = getopt(argc, argv, "d")) != -1)
      {
          switch (c)
          {
              case 'd':            /* Debug mode */
                  debug = true;
                  break;
              default:
                  fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
                  exit(2);
--- 276,291 ----
          }
      }

!     while ((c = getopt(argc, argv, "x:d")) != -1)
      {
          switch (c)
          {
              case 'd':            /* Debug mode */
                  debug = true;
                  break;
+             case 'x':
+                 additional_ext = optarg;
+                 break;
              default:
                  fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
                  exit(2);
diff --git a/doc/src/sgml/pgarchivecleanup.sgml b/doc/src/sgml/pgarchivecleanup.sgml
index 725f3ed..0c215fb 100644
*** a/doc/src/sgml/pgarchivecleanup.sgml
--- b/doc/src/sgml/pgarchivecleanup.sgml
*************** pg_archivecleanup:  removing file "archi
*** 98,103 ****
--- 98,118 ----
        </listitem>
       </varlistentry>

+      <varlistentry>
+       <term><option>-x</option> <replaceable>extension</></term>
+       <listitem>
+        <para>
+         When using the program as a standalone utility, provide an extension
+         that will be stripped from all file names before deciding if they
+         should be deleted.  This is typically useful for cleaning up archives
+         that have been compressed during storage, and therefore have had an
+         extension added by the compression program.  Note that the
+         <filename>.backup</> file name passed to the program should not
+         include the extension.
+        </para>
+       </listitem>
+      </varlistentry>
+
      </variablelist>
     </para>


Re: Allow pg_archivecleanup to ignore extensions

From
Euler Taveira de Oliveira
Date:
On 08-02-2011 04:57, Greg Smith wrote:
> We recently got some on-list griping that pg_standby doesn't handle
> archive files that are compressed, either. Given how the job I'm working
> on this week is going, I'll probably have to add that feature next.
> That's actually an easier source code hack than this one, because of how
> the pg_standby code modularizes the concept of a restore command.
>
This was already proposed a few years ago [1]. I have used a modified 
pg_standby with this feature for a year or so.


[1] http://archives.postgresql.org/message-id/e4ccc24e0810222010p12bae2f4xa3a11cb2bc51bd89%40mail.gmail.com


--   Euler Taveira de Oliveira  http://www.timbira.com/


Re: Allow pg_archivecleanup to ignore extensions

From
Robert Haas
Date:
On Tue, Feb 8, 2011 at 2:57 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> One bit of feedback I keep getting from people who archive their WAL files
> is that the fairly new pg_archivecleanup utility doesn't handle the case
> where those archives are compressed.  As the sort of users who are concerned
> about compression are also often ones with giant archives they struggle to
> clean up, they would certainly appreciate having a bundled utility to take
> care of that.

Please add this patch to the currently open CommitFest at

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Allow pg_archivecleanup to ignore extensions

From
Josh Berkus
Date:
Simon, Greg,

This patch[1] is for some reason marked "waiting on Author".  But I
can't find that there's been any review of it searching the list.
What's going on with it?  Has it been reviewed?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Allow pg_archivecleanup to ignore extensions

From
Simon Riggs
Date:
On Sun, Jul 10, 2011 at 7:13 PM, Josh Berkus <josh@agliodbs.com> wrote:

> This patch[1] is for some reason marked "waiting on Author".  But I
> can't find that there's been any review of it searching the list.
> What's going on with it?  Has it been reviewed?

Yes, I reviewed it on list. Some minor changes were discussed. I'm
with Greg now, so we'll discuss and handle it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Allow pg_archivecleanup to ignore extensions

From
Josh Berkus
Date:
On 7/12/11 7:38 AM, Simon Riggs wrote:
> On Sun, Jul 10, 2011 at 7:13 PM, Josh Berkus <josh@agliodbs.com> wrote:
> 
>> This patch[1] is for some reason marked "waiting on Author".  But I
>> can't find that there's been any review of it searching the list.
>> What's going on with it?  Has it been reviewed?
> 
> Yes, I reviewed it on list. Some minor changes were discussed. I'm
> with Greg now, so we'll discuss and handle it.

I couldn't find the review searching the archives.  Can you please link
it in the Commitfest application?  Thanks.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Allow pg_archivecleanup to ignore extensions

From
Josh Berkus
Date:
On 7/12/11 11:17 AM, Josh Berkus wrote:
> On 7/12/11 7:38 AM, Simon Riggs wrote:
>> On Sun, Jul 10, 2011 at 7:13 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>
>>> This patch[1] is for some reason marked "waiting on Author".  But I
>>> can't find that there's been any review of it searching the list.
>>> What's going on with it?  Has it been reviewed?
>>
>> Yes, I reviewed it on list. Some minor changes were discussed. I'm
>> with Greg now, so we'll discuss and handle it.
> 
> I couldn't find the review searching the archives.  Can you please link
> it in the Commitfest application?  Thanks.

Given the total lack of activity on this patch, I'm bumping it.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com