Thread: [PATCH] Add --ordered option to pg_dump

[PATCH] Add --ordered option to pg_dump

From
Bob Lunney
Date:
I needed a way to run diffs on two
database dumps to see what data developers put in their
local databases versus the initial database load.  The
pg_dump utility with --inserts works well for this, but
since the order in which the data is returned of the server
is not guaranteed I hacked this patch to make life
simpler. 

Using --ordered will order the data by primary key or
unique index, if one exists, and use the "smallest" ordering
(i.e. least number of columns required for a unique
order). 

Note that --ordered could crush your database server if you
try to order very large tables, so use judiciously.

This is my first patch submission, so I hope I've followed
protocol.  If not, please be gentle!

Regards,

Bob Lunney
bob_lunney dot yahoo dot com


diff -cNr src/bin/pg_dump/pg_dump.c.orig
src/bin/pg_dump/pg_dump.c
*** src/bin/pg_dump/pg_dump.c.orig      2010-04-06
11:21:48.000000000 -0400                       
           
--- src/bin/pg_dump/pg_dump.c   2010-04-15
10:28:49.000000000 -0400                       
                   
***************                             
                                       
                         
*** 111,116 ****                           
                                       
                           
--- 111,117 ----                           
                                       
                           
  static int    disable_dollar_quoting = 0;       
                                       
                   
  static int    dump_inserts = 0;               
                                       
                     
  static int    column_inserts = 0;             
                                       
                     
+ static int      ordered = 0;                 
                                       
                       
                                       
                                       
                               
                                       
                                       
                               
  static void help(const char *progname);           
                                       
                 
***************                             
                                       
                         
*** 275,280 ****                           
                                       
                           
--- 276,282 ----                           
                                       
                           
                {"inserts", no_argument,
&dump_inserts, 1},                         
                         
                {"lock-wait-timeout",
required_argument, NULL, 2},                     
                       
                {"no-tablespaces", no_argument,
&outputNoTablespaces, 1},                   
                 
+               {"ordered", no_argument,
&ordered, 1},                           
                             
                {"role", required_argument, NULL,
3},                                     
                   
                {"use-set-session-authorization",
no_argument, &use_setsessauth, 1},             
             
                                       
                                       
                               
***************                             
                                       
                         
*** 493,498 ****                           
                                       
                           
--- 495,506 ----                           
                                       
                           
                exit(1);                 
                                       
                             
        }                             
                                       
                               
                                       
                                       
                               
+       if (!dump_inserts && ordered)     
                                       
                           
+       {                             
                                       
                               
+               write_msg(NULL, "option --ordered
cannot be used without --inserts or --column_inserts\n"); 
 
+               exit(1);                 
                                       
                             
+       }                             
                                       
                               
+                                     
                                       
                               
        /* open the output file */             
                                       
                       
        if (pg_strcasecmp(format, "a") == 0 ||
pg_strcasecmp(format, "append") == 0)               
           
        {
***************
*** 822,827 ****
--- 830,836 ----
        printf(_("  --disable-dollar-quoting   
disable dollar quoting, use SQL standard quoting\n"));
        printf(_("  --disable-triggers         
disable triggers during data-only restore\n"));
        printf(_("  --no-tablespaces           
do not dump tablespace assignments\n"));
+       printf(_("  --ordered               
   order insert statements by primary key or unique
index\n"));
        printf(_("  --role=ROLENAME         
   do SET ROLE before dump\n"));
        printf(_(" 
--use-set-session-authorization\n"
                         "           
                  use SET SESSION AUTHORIZATION
commands instead of\n"
***************
*** 1210,1215 ****
--- 1219,1225 ----
        TableDataInfo *tdinfo = (TableDataInfo *)
dcontext;
        TableInfo  *tbinfo = tdinfo->tdtable;
        const char *classname = tbinfo->dobj.name;
+       PQExpBuffer p = createPQExpBuffer();
        PQExpBuffer q = createPQExpBuffer();
        PGresult   *res;
        int                     tuple;
***************
*** 1239,1244 ****
--- 1249,1292 ----
                                       
                                     
   classname));
        }

+       /*
+        * If --ordered is specified on the command
line get the primary key or
+        * unique index column positions and order the
select using those.  If
+        * neither exist say so and just do an
unordered select.  The ORDER BY and
+        * LIMIT 1 insure we're using the least number
of rows in the
+        * sort.
+        */
+
+       if (ordered)
+       {
+               appendPQExpBuffer(p, "SELECT
array_to_string(indkey, ','), array_length(i.indkey, 1) "
+                                     
                  "  FROM pg_catalog.pg_index i "
+                                     
                  " WHERE (i.indisprimary = true or
i.indisunique = true) "
+                                     
                  "   AND i.indisvalid = true "
+                                     
                  "   AND i.indrelid =
'%s'::regclass "
+                                     
                  " ORDER BY 2, 1 LIMIT 1",
+                                     
             
   fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname));
+
+               res = PQexec(g_conn, p->data);
+               check_sql_result(res, g_conn,
p->data, PGRES_TUPLES_OK);
+
+               if (PQntuples(res) == 0)
+               {
+                       write_msg(NULL, "Table
%s does not have a primary key or unique index.  The dump
for this table is unordered.\n",
+                                     
     
   fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname));
+               }
+               else
+               {
+                       const char *s =
PQgetvalue(res, 0, 0);
+                       if (s != NULL)
+                       {
+                           
   appendPQExpBuffer(q, " ORDER BY %s", s);
+                       }
+               }
+               PQclear(res);
+               destroyPQExpBuffer(p);
+       }
+
        res = PQexec(g_conn, q->data);
        check_sql_result(res, g_conn, q->data,
PGRES_COMMAND_OK);



diff -cNr doc/man1/pg_dump.1.orig doc/man1/pg_dump.1
*** doc/man1/pg_dump.1.orig     2010-04-15
12:05:54.000000000 -0400
--- doc/man1/pg_dump.1  2010-04-15 12:30:55.000000000
-0400
***************
*** 356,361 ****
--- 356,368 ----
  an error in reloading a row causes only that row to be
lost rather
  than the entire table contents.
  .TP
+ \fB--ordered\fR
+ Order the data being dumped.  This option is mainly
useful
+ for running a \fBdiff(1)\fR on two plain dump files to
detect
+ data changes, and will be very slow
+ when dumping large tables.  Must be used with
+ \fB--inserts\fR or \fB--column-inserts\fR.
+ .TP
  \fB--disable-dollar-quoting\fR
  This option disables the use of dollar quoting for
function bodies,
  and forces them to be quoted using SQL standard string
syntax.

diff -cNr doc/src/sgml/ref/pg_dump.sgml.orig
doc/src/sgml/ref/pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml.orig  2010-04-15
12:18:49.000000000 -0400
--- doc/src/sgml/ref/pg_dump.sgml       2010-04-15
12:30:01.000000000 -0400
***************
*** 561,566 ****
--- 561,582 ----
       </varlistentry>

       <varlistentry>
+   
   <term><option>--ordered</option></term>
+       <listitem>
+        <para>
+         Order the data being dumped.  This option
is mainly useful
+         for running a
<application>diff</application>
+         on two plain dump files to detect data
changes,
+         and will be very slow
+         when dumping large tables.  Must be used
with
+         <option>--inserts</optionor
+     
   <option>--column-inserts</option>.
+        </para>
+       </listitem>
+      </varlistentry>
+
+
+      <varlistentry>
       
<term><option>--disable-dollar-quoting</></term>
        <listitem>
         <para>






Re: [PATCH] Add --ordered option to pg_dump

From
Peter Eisentraut
Date:
On tor, 2010-04-15 at 10:48 -0700, Bob Lunney wrote:
> I needed a way to run diffs on two
> database dumps to see what data developers put in their
> local databases versus the initial database load.

Maybe pg_comparator would help you?



Re: [PATCH] Add --ordered option to pg_dump

From
Mark Kirkwood
Date:
Peter Eisentraut wrote:
> On tor, 2010-04-15 at 10:48 -0700, Bob Lunney wrote:
>   
>> I needed a way to run diffs on two
>> database dumps to see what data developers put in their
>> local databases versus the initial database load.
>>     
>
> Maybe pg_comparator would help you?
>
>
>   
Or DBIx::Compare if you like perl :-)

Mark


Re: [PATCH] Add --ordered option to pg_dump

From
Bob Lunney
Date:
Thanks for the suggestion, Peter.  It looks like pg_comparator is for comparing the contents of two different servers.
Ineed to compare the contents of two dump files from the same server separated by time and busy developers.
 

Regards,

Bob Lunney

--- On Thu, 4/15/10, Peter Eisentraut <peter_e@gmx.net> wrote:

> From: Peter Eisentraut <peter_e@gmx.net>
> Subject: Re: [HACKERS] [PATCH] Add --ordered option to pg_dump
> To: "Bob Lunney" <bob_lunney@yahoo.com>
> Cc: pgsql-hackers@postgresql.org
> Date: Thursday, April 15, 2010, 4:36 PM
> On tor, 2010-04-15 at 10:48 -0700,
> Bob Lunney wrote:
> > I needed a way to run diffs on two
> > database dumps to see what data developers put in
> their
> > local databases versus the initial database load.
> 
> Maybe pg_comparator would help you?
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

     


Re: [PATCH] Add --ordered option to pg_dump

From
Thomas Sibley
Date:
I came across this thread from a web search when looking for prior art.
I'm reviving it to alert other interested folks to a flaw in the
provided patch should they try to use it.  The feature request seems
somewhat common on the web, and the patch here would cause some fun
debugging sessions otherwise.

On Thu, Apr 15, 2010 at 10:48:25AM -0700, Bob Lunney wrote:
> Using --ordered will order the data by primary key or
> unique index, if one exists, and use the "smallest" ordering
> (i.e. least number of columns required for a unique
> order). 

The provided patch for pg_dump.c fails on a corner case with deleted
columns.  It's likely rare, but tripping over it will either cause an
unexpected order (at best) or a failed dump (at worst).

> +       if (ordered)
> +       {
> +               appendPQExpBuffer(p, "SELECT array_to_string(indkey, ','), array_length(i.indkey, 1) "
> +                                    "  FROM pg_catalog.pg_index i "
> +                                    " WHERE (i.indisprimary = true or i.indisunique = true) "
> +                                    "   AND i.indisvalid = true "
> +                                    "   AND i.indrelid = '%s'::regclass "
> +                                    " ORDER BY 2, 1 LIMIT 1",

In this stanza the internal column numbers of the selected index keys
are joined with commas to produce a string.

> +                       const char *s = PQgetvalue(res, 0, 0);
> +                       if (s != NULL)
> +                       {
> +                           appendPQExpBuffer(q, " ORDER BY %s", s);
> +                       }

Then that string is used when constructing the ORDER BY clause for the
data retrieval query.  The query is a SELECT * FROM ... which means the
positional ORDER BY takes internal column numbers and uses them for the
implicit select-list columns.  That's fine if * is guaranteed to match
the internal column order, but that's not true in cases where the table
has a deleted column between two columns in the index.  The query will
either order by an unexpected column not part of the index (with
potentially very slow results), or it will fail if it overruns the
length of the select-list.  A SQL script demonstrating the failure mode
is attached.

To work around this corner case, the patch should map the internal
column numbers to column names (via pg_attribute or somesuch) and then
use those in the ORDER BY.

Hopefully this'll save someone else a bit of debugging in the future.

Thomas

Attachment