Thread: pg_dump doesn't dump binary compatible casts

pg_dump doesn't dump binary compatible casts

From
Jan Wieck
Date:
The offending source code is in pg_dump.c line 3953, where at the lack 
of an underlying conversion function and thus no clear namespace 
relationship pg_dump simply ignores the cast.

IMHO a binary compatible cast should be dumped if one or both namespaces 
of the underlying data types is included in the dump.

I classify this problem as a bug. Objections?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: pg_dump doesn't dump binary compatible casts

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> I classify this problem as a bug. Objections?

The question is not whether it is a bug, the question is what is correct
behavior instead.

> IMHO a binary compatible cast should be dumped if one or both namespaces 
> of the underlying data types is included in the dump.

That would just replace one bug with another (viz, failure to restore
if one of the datatypes wasn't included in the dump).  Also, what of
user-defined casts between two system datatypes?  pg_catalog is never
considered part of the dump AFAIR.

The first idea that came to mind is to dump the cast if both underlying
types are either system types or included in the dump.  But I think that
would end up dumping built-in binary casts, which is no good either.
I hate to think of looking at the cast's OID vs. lastsysoid to decide
if it was a built-in cast, but maybe there's no other way.

Ideas anyone?
        regards, tom lane


Re: pg_dump doesn't dump binary compatible casts

From
Jan Wieck
Date:

Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>> I classify this problem as a bug. Objections?
> 
> The question is not whether it is a bug, the question is what is correct
> behavior instead.

Not yet, but when we have a decision and a fix it'll be the criterium 
for applying it to 7.4 and maybe backpatch into 7.3.

> 
>> IMHO a binary compatible cast should be dumped if one or both namespaces 
>> of the underlying data types is included in the dump.
> 
> That would just replace one bug with another (viz, failure to restore
> if one of the datatypes wasn't included in the dump).  Also, what of
> user-defined casts between two system datatypes?  pg_catalog is never
> considered part of the dump AFAIR.
> 
> The first idea that came to mind is to dump the cast if both underlying
> types are either system types or included in the dump.  But I think that
> would end up dumping built-in binary casts, which is no good either.
> I hate to think of looking at the cast's OID vs. lastsysoid to decide
> if it was a built-in cast, but maybe there's no other way.

User defined casts between builtin types are a bad thing anyway. They 
are only good for special backward compatibility cases or to support 
applications that are broken by design anyway. And a user defined binary 
cast between builtin types is just the worst case. Thus, braking the 
application with every dump+restore to remind the programmer that 
there's still something he has to fix is the _right thing_.

CREATE CAST for that beast should spit out a popup asking
    "Do you really ..."

with "No" being the default! Well, a WARNING might do for now.

That eliminated, a cast should be dumped if one or more of the three 
objects (source, target and function) are not builtin AND all the 
non-builtin objects belong to namespaces included in the dump.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: pg_dump doesn't dump binary compatible casts

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> That eliminated, a cast should be dumped if one or more of the three 
> objects (source, target and function) are not builtin AND all the 
> non-builtin objects belong to namespaces included in the dump.

Where "builtin" is defined as "belongs to pg_catalog schema", you mean?
I think that works for me ...
        regards, tom lane


Re: pg_dump doesn't dump binary compatible casts

From
Greg Stark
Date:
Jan Wieck <JanWieck@yahoo.com> writes:

> braking the application with every dump+restore to remind the programmer
> that there's still something he has to fix is the _right thing_.

I would hate to be the sysadmin restoring a database after a hardware failure
scratching my head to try to figure out why the restore isn't working.

[But then I'm not a fan of treating pg_dump files as if they were backups.]

-- 
greg



Re: pg_dump doesn't dump binary compatible casts

From
"Matthew T. O'Connor"
Date:
On Tue, 2003-09-23 at 22:40, Greg Stark wrote:
> [But then I'm not a fan of treating pg_dump files as if they were backups.]

If you don't use pg_dump for backups what do you use?  Stop the database
and copy the data directory? That is not a valid choice for most people.



Re: pg_dump doesn't dump binary compatible casts

From
Greg Stark
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:

> On Tue, 2003-09-23 at 22:40, Greg Stark wrote:
> > [But then I'm not a fan of treating pg_dump files as if they were backups.]
> 
> If you don't use pg_dump for backups what do you use?  Stop the database
> and copy the data directory? That is not a valid choice for most people.

It's currently the only way to do a real backup though. I haven't gone to
production yet with my first postgres project but that's certainly what I'm
planning to recommend to my client.

pg_dump files are not backups. When you've restored a pg_dump you don't
necessarily get back the exact same data you had when you dumped it. Every
piece of data is serialized and then unserialized, space has to be allocated
anew, the DDL has to be reinterpreted. pg_dump has to spend the time to
rebuild tables and indexes too. A straight physical backup doesn't have to do
any of that.

pg_dumps are a logical export, not a physical backup. A lot of work goes into
guaranteeing pg_dump/pg_restore reliably generates equivalent databases, but
even so there are always corner cases as the original post describes. What if
there's a bug and some piece of data doesn't unserialize correctly (arrays
didn't in 7.3 if their lower bound wasn't 1 for example). Or if there's a bug
and the database core dumps trying to rebuild indexes? Or for that matter if
the machine available just wasn't powerful enough to generate the indexes at
all.

I would tend to prefer pg_dump/pg_restore files if they were an option because
the resulting database would be more compact, the indexes more balanced, and
any data corruption would be cleaned up. but I wouldn't feel safe unless I had
a real physical backup ready to use in case the pg_dump didn't work, or took
too long.

Every time I've tried to restore from a pg_dump I've run into small problems
like this. That's fine when I'm doing development, but in a crisis you have to
have an image you can restore and be sure it'll be *exactly* the same as it
was before the failure.

Fwiw, there is a hardware solution to doing 24x7 operation without online
backups. You store the database on a raid mirror (preferably a 3-way mirror so
you're never without mirroring). When the time comes to do the backup you
break the mirror, use the inactive side to do the backup. The backup you get
is exactly what you would have gotten had the machine crashed at the point in
time that you broke the mirror. We used to do something similar even with
Oracle because it was easier and faster than its hot backup system. This
solution requires money for good raid controllers and lots of disks though.


-- 
greg



Re: pg_dump doesn't dump binary compatible casts

From
Jan Wieck
Date:

Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> That eliminated, a cast should be dumped if one or more of the three
>> objects (source, target and function) are not builtin AND all the
>> non-builtin objects belong to namespaces included in the dump.
>
> Where "builtin" is defined as "belongs to pg_catalog schema", you mean?
> I think that works for me ...

Okay,

I have more precisely defined "builtin" as "belongs to a schema who's
name starts with pg_", as that is how we distinguish what namespaces get
dumped in the first place.

The patch is applied to 7.4 according to being a bug. The same patch
would cleanly apply to 7.3.4 as well, so my question is do we want to
backpatch?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
diff -crN ../src.orig/bin/pg_dump/pg_dump.c ./bin/pg_dump/pg_dump.c
*** ../src.orig/bin/pg_dump/pg_dump.c    Tue Sep 23 10:10:31 2003
--- ./bin/pg_dump/pg_dump.c    Sat Sep 27 10:18:55 2003
***************
*** 3912,3917 ****
--- 3912,3918 ----
      PQExpBuffer query = createPQExpBuffer();
      PQExpBuffer defqry = createPQExpBuffer();
      PQExpBuffer delqry = createPQExpBuffer();
+     PQExpBuffer castsig = createPQExpBuffer();
      int            ntups;
      int            i;

***************
*** 3941,3956 ****
          char       *castcontext = PQgetvalue(res, i, 4);
          int            fidx = -1;
          const char *((*deps)[]);

          if (strcmp(castfunc, "0") != 0)
              fidx = findFuncByOid(finfo, numFuncs, castfunc);

          /*
!          * We treat the cast as being in the namespace of the underlying
!          * function.  This doesn't handle binary compatible casts.  Where
!          * should those go?
           */
!         if (fidx < 0 || !finfo[fidx].pronamespace->dump)
              continue;

          /* Make a dependency to ensure function is dumped first */
--- 3942,3992 ----
          char       *castcontext = PQgetvalue(res, i, 4);
          int            fidx = -1;
          const char *((*deps)[]);
+         int            source_idx;
+         int            target_idx;

          if (strcmp(castfunc, "0") != 0)
              fidx = findFuncByOid(finfo, numFuncs, castfunc);

          /*
!          * As per discussion we dump casts if one or more of the underlying
!          * objects (the conversion function and the two data types) are not
!          * builtin AND if all of the non-builtin objects namespaces are
!          * included in the dump. Builtin meaning, the namespace name does
!          * not start with "pg_".
           */
!         source_idx = findTypeByOid(tinfo, numTypes, castsource);
!         target_idx = findTypeByOid(tinfo, numTypes, casttarget);
!
!         /*
!          * Skip this cast if all objects are from pg_
!          */
!         if ((fidx < 0 || strncmp(finfo[fidx].pronamespace->nspname, "pg_", 3) == 0) &&
!                 strncmp(tinfo[source_idx].typnamespace->nspname, "pg_", 3) == 0 &&
!                 strncmp(tinfo[target_idx].typnamespace->nspname, "pg_", 3) == 0)
!             continue;
!
!         /*
!          * Skip cast if function isn't from pg_ and that namespace is
!          * not dumped.
!          */
!         if (fidx >= 0 &&
!                 strncmp(finfo[fidx].pronamespace->nspname, "pg_", 3) != 0 &&
!                 !finfo[fidx].pronamespace->dump)
!             continue;
!
!         /*
!          * Same for the Source type
!          */
!         if (strncmp(tinfo[source_idx].typnamespace->nspname, "pg_", 3) != 0 &&
!                 !tinfo[source_idx].typnamespace->dump)
!             continue;
!
!         /*
!          * and the target type.
!          */
!         if (strncmp(tinfo[target_idx].typnamespace->nspname, "pg_", 3) != 0 &&
!                 !tinfo[target_idx].typnamespace->dump)
              continue;

          /* Make a dependency to ensure function is dumped first */
***************
*** 3966,3971 ****
--- 4002,4008 ----

          resetPQExpBuffer(defqry);
          resetPQExpBuffer(delqry);
+         resetPQExpBuffer(castsig);

          appendPQExpBuffer(delqry, "DROP CAST (%s AS %s);\n",
                            getFormattedTypeName(castsource, zeroAsNone),
***************
*** 3987,3995 ****
              appendPQExpBuffer(defqry, " AS IMPLICIT");
          appendPQExpBuffer(defqry, ";\n");

          ArchiveEntry(fout, castoid,
!                      format_function_signature(&finfo[fidx], false),
!                      finfo[fidx].pronamespace->nspname, "",
                       "CAST", deps,
                       defqry->data, delqry->data,
                       NULL, NULL, NULL);
--- 4024,4036 ----
              appendPQExpBuffer(defqry, " AS IMPLICIT");
          appendPQExpBuffer(defqry, ";\n");

+         appendPQExpBuffer(castsig, "CAST (%s AS %s)",
+                           getFormattedTypeName(castsource, zeroAsNone),
+                           getFormattedTypeName(casttarget, zeroAsNone));
+
          ArchiveEntry(fout, castoid,
!                      castsig->data,
!                      tinfo[source_idx].typnamespace->nspname, "",
                       "CAST", deps,
                       defqry->data, delqry->data,
                       NULL, NULL, NULL);
***************
*** 4000,4005 ****
--- 4041,4047 ----
      destroyPQExpBuffer(query);
      destroyPQExpBuffer(defqry);
      destroyPQExpBuffer(delqry);
+     destroyPQExpBuffer(castsig);
  }



Re: pg_dump doesn't dump binary compatible casts

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> I have more precisely defined "builtin" as "belongs to a schema who's 
> name starts with pg_", as that is how we distinguish what namespaces get 
> dumped in the first place.

Check.

> The patch is applied to 7.4 according to being a bug. The same patch 
> would cleanly apply to 7.3.4 as well, so my question is do we want to 
> backpatch?

No objection here.
        regards, tom lane


Re: pg_dump doesn't dump binary compatible casts

From
"Joshua D. Drake"
Date:
Hello,
 Don't know if my vote counts here, but ANYTHING that makes pg_dump 
more correct should be
backpatched. It is one thing to have index bloat, it is entirely another 
to not be able to correctly backup
and restore.
 Also did you guys patch the create schema bug? I have a patch I can 
submit if not.

Sincerely,

Joshua Drake


Tom Lane wrote:

>Jan Wieck <JanWieck@Yahoo.com> writes:
>  
>
>>I have more precisely defined "builtin" as "belongs to a schema who's 
>>name starts with pg_", as that is how we distinguish what namespaces get 
>>dumped in the first place.
>>    
>>
>
>Check.
>
>  
>
>>The patch is applied to 7.4 according to being a bug. The same patch 
>>would cleanly apply to 7.3.4 as well, so my question is do we want to 
>>backpatch?
>>    
>>
>
>No objection here.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>  
>




Re: pg_dump doesn't dump binary compatible casts

From
Jan Wieck
Date:
Joshua D. Drake wrote:
> Hello,
> 
>   Don't know if my vote counts here, but ANYTHING that makes pg_dump 
> more correct should be
> backpatched. It is one thing to have index bloat, it is entirely another 
> to not be able to correctly backup
> and restore.

Patch applied to REL7_3_STABLE.


Jan

> Tom Lane wrote:
>>
>>No objection here.

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #