Thread: pg_dump vs. TRANSFORMs

pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
Greetings all,

While testing pg_dump, I discovered that there seems to be an issue when
it comes to TRANSFORMs.  I'll be the first to admit that I'm not
terribly familiar with transforms, but I do know that if you create one
using functions from pg_catalog (as our regression tests do), the CREATE
TRANSFORM statement isn't dumped out and therefore we don't test the
dump/restore of transforms, even with our current pg_upgrade test
process.

In the regression tests, we do:

CREATE TRANSFORM FOR int LANGUAGE SQL (       FROM SQL WITH FUNCTION varchar_transform(internal),       TO SQL WITH
FUNCTIONint4recv(internal)); 

but if you do that and then run pg_dump, you don't see any mention of
any transforms in the dump file.

This is because there's a couple of checks towards the top of
dumpTransform():
   /* Cannot dump if we don't have the transform functions' info */   if (OidIsValid(transform->trffromsql))   {
 fromsqlFuncInfo = findFuncByOid(transform->trffromsql);       if (fromsqlFuncInfo == NULL)           return;   }   if
(OidIsValid(transform->trftosql))  {          tosqlFuncInfo = findFuncByOid(transform->trftosql);       if
(tosqlFuncInfo== NULL)           return;   } 

These checks are looking for the functions used by the transform in the
list of functions that pg_dump has loaded, but in 9.5, we don't load any
of the function in pg_catalog, and even with my patches, we only dump
the functions in pg_catalog that have an ACL which has been changed from
the default.

Given that there are lots of functions in pg_catalog, we probably don't
want to just load them all, all the time, but what we should probably do
is grab any functions which are depended on by any transforms.  Or we
could change dumpTransform() to actually issue a query to get the
function information which it needs when we've decided to dump a given
transform.

For my 2c, this is a bug that needs to be fixed and back-patched to 9.5.

I'm going to continue working on my pg_dump test suite for now, but I
can look at fixing this after PGCon if no one else fixes it first.

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Peter Eisentraut
Date:
On 5/4/16 2:39 PM, Stephen Frost wrote:
> These checks are looking for the functions used by the transform in the
> list of functions that pg_dump has loaded, but in 9.5, we don't load any
> of the function in pg_catalog, and even with my patches, we only dump
> the functions in pg_catalog that have an ACL which has been changed from
> the default.

This issue is not specific to transforms.  For example, if you create a 
user-defined cast using a built-in function, the cast won't be dumped. 
Obviously, this is a problem, but it needs a more general solution.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 5/4/16 2:39 PM, Stephen Frost wrote:
> >These checks are looking for the functions used by the transform in the
> >list of functions that pg_dump has loaded, but in 9.5, we don't load any
> >of the function in pg_catalog, and even with my patches, we only dump
> >the functions in pg_catalog that have an ACL which has been changed from
> >the default.
>
> This issue is not specific to transforms.  For example, if you
> create a user-defined cast using a built-in function, the cast won't
> be dumped. Obviously, this is a problem, but it needs a more general
> solution.

Ah, I hadn't gotten to casts yet with my test framework.

I'm certainly open to suggestions on how to address this, but I will
point out that we already have a number of cases where we issue
additional queries against the database from the dump*() functions when
we need to collect additional information.  That's not wonderful because
it ends up being slow, but it does work.

Trying to figure out what functions we need at getFuncs() time isn't
terribly easy.  On the other hand, it might not be *that* bad to just
pull in all functions, if we use the same technique that we use for
tables, and mark the ones we aren't dumping as not "interesting."

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 5/4/16 2:39 PM, Stephen Frost wrote:
>> These checks are looking for the functions used by the transform in the
>> list of functions that pg_dump has loaded, but in 9.5, we don't load any
>> of the function in pg_catalog, and even with my patches, we only dump
>> the functions in pg_catalog that have an ACL which has been changed from
>> the default.

> This issue is not specific to transforms.  For example, if you create a 
> user-defined cast using a built-in function, the cast won't be dumped. 
> Obviously, this is a problem, but it needs a more general solution.

Actually, I believe it will be dumped.  selectDumpableCast believes it
should dump casts with OID >= FirstNormalObjectId.  That's a kluge no
doubt, but reasonably effective; looks like we've been doing that since
9.0.

pg_dump appears not to have a special-case selectDumpableTransform
routine.  Instead it falls back on the generic selectDumpableObject
for transforms.  That's a bad idea because the only useful bit of
knowledge selectDumpableObject has is to look at the containing
namespace ... and transforms don't have one, just as casts don't.

My recommendation is to clone that cast logic for transforms.
        regards, tom lane



Re: pg_dump vs. TRANSFORMs

From
Peter Eisentraut
Date:
On 5/4/16 11:23 PM, Tom Lane wrote:
> Actually, I believe it will be dumped.  selectDumpableCast believes it
> should dump casts with OID >= FirstNormalObjectId.  That's a kluge no
> doubt, but reasonably effective; looks like we've been doing that since
> 9.0.
>
> pg_dump appears not to have a special-case selectDumpableTransform
> routine.  Instead it falls back on the generic selectDumpableObject
> for transforms.  That's a bad idea because the only useful bit of
> knowledge selectDumpableObject has is to look at the containing
> namespace ... and transforms don't have one, just as casts don't.
>
> My recommendation is to clone that cast logic for transforms.

Hmm.  The way I understand it is that Stephen wants to make dumping that 
test case work.  But note that that test case is bogus; it wouldn't 
actually do anything useful in practice.  There aren't any functions in 
the system catalog that could be used for actual transforms.  So making 
these changes in pg_dump isn't really of much practical value.  Perhaps 
it would be easier to change the test case or adjust the testing procedure?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > On 5/4/16 2:39 PM, Stephen Frost wrote:
> >> These checks are looking for the functions used by the transform in the
> >> list of functions that pg_dump has loaded, but in 9.5, we don't load any
> >> of the function in pg_catalog, and even with my patches, we only dump
> >> the functions in pg_catalog that have an ACL which has been changed from
> >> the default.
>
> > This issue is not specific to transforms.  For example, if you create a
> > user-defined cast using a built-in function, the cast won't be dumped.
> > Obviously, this is a problem, but it needs a more general solution.
>
> Actually, I believe it will be dumped.  selectDumpableCast believes it
> should dump casts with OID >= FirstNormalObjectId.  That's a kluge no
> doubt, but reasonably effective; looks like we've been doing that since
> 9.0.

No, it isn't dumped.  An interesting example test case is this:

-----
create function pg_catalog.toint(varchar) returns integer strict immutable language sql as 'select cast($1 as
integer);';

create cast (varchar as integer) with function toint(varchar);
-----

Note that neither the function nor the cast is dumped, and this was with
9.5.  This is because:

getFuncs() intentionally skips all functions in pg_catalog (unless they
are members of extensions...).  dumpCast() attempts to find the function
referenced by the cast in the set of functions which getFuncs()
collected, and simply gives up and doesn't dump the cast if it can't
find the function.

For my 2c, this coding pattern:

--------------
funcInfo = findFuncByOid(cast->castfunc);
if (funcInfo == NULL)return;
--------------

in pg_dump is really bad.  Thankfully, it looks like that's only
happening in dumpCast() and dumpTransform().  We used to have a similar
issue, it looks like, with extensions, which was fixed in 89c29c0.  It
looks like we need to either stop excluding the functions in pg_catalog
during getFuncs(), or add in more exceptions to the "don't collect info
about functions in pg_catalog" rule.

I'm also inclined to think that we should fix dumping of non-initdb
functions which have been created in pg_catalog.  I'm not sure how far
to take that though, as we have a similar issue for most object types
when it comes to pg_catalog.  Perhaps selectDumpableObject() should be
considering FirstNormalObjectId and constraining what component we dump
for from-initdb objects to only ACL, and pg_catalog, as a namespace,
should be set to dump all components (in HEAD, and just set to not dump
anything for from-initdb objects in back-branches, but everything for
user-defined objects).

> pg_dump appears not to have a special-case selectDumpableTransform
> routine.  Instead it falls back on the generic selectDumpableObject
> for transforms.  That's a bad idea because the only useful bit of
> knowledge selectDumpableObject has is to look at the containing
> namespace ... and transforms don't have one, just as casts don't.
>
> My recommendation is to clone that cast logic for transforms.

We should do this also, if we don't change selectDumpableObject, but
we should do it because we might have from-initdb transforms one day
and the current logic would end up selecting those transforms to dump
out (though dumpTransform would end up skipping them currently because
they'd be referring to functions that we didn't get information about,
but hopefully we're going to fix that).

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 5/4/16 11:23 PM, Tom Lane wrote:
> >Actually, I believe it will be dumped.  selectDumpableCast believes it
> >should dump casts with OID >= FirstNormalObjectId.  That's a kluge no
> >doubt, but reasonably effective; looks like we've been doing that since
> >9.0.
> >
> >pg_dump appears not to have a special-case selectDumpableTransform
> >routine.  Instead it falls back on the generic selectDumpableObject
> >for transforms.  That's a bad idea because the only useful bit of
> >knowledge selectDumpableObject has is to look at the containing
> >namespace ... and transforms don't have one, just as casts don't.
> >
> >My recommendation is to clone that cast logic for transforms.
>
> Hmm.  The way I understand it is that Stephen wants to make dumping
> that test case work.  But note that that test case is bogus; it
> wouldn't actually do anything useful in practice.  There aren't any
> functions in the system catalog that could be used for actual
> transforms.  So making these changes in pg_dump isn't really of much
> practical value.  Perhaps it would be easier to change the test case
> or adjust the testing procedure?

I strongly disagree with the idea that this is only an issue with the
testing system.  What if we add functions in the future that are
created by initdb and *are* useful for transforms?  What about casts?
There are a lot of functions in pg_catalog that a user might wish to use
to define their own cast.  This also doesn't do anything about users
creating functions in pg_catalog.

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Jim Nasby
Date:
On 5/5/16 8:33 AM, Stephen Frost wrote:
> I strongly disagree with the idea that this is only an issue with the
> testing system.  What if we add functions in the future that are
> created by initdb and *are* useful for transforms?  What about casts?
> There are a lot of functions in pg_catalog that a user might wish to use
> to define their own cast.  This also doesn't do anything about users
> creating functions in pg_catalog.

+1 to all of that. I know that I've at least created casts using 
built-in functions during testing, and I think I might be doing it in 
some of my extensions.

As for transforms, I suspect we're going to end up with some of those in 
initdb in the future, because it's currently the only way you can 
improve existing type transformations without breaking existing PL code.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
All,

* Stephen Frost (sfrost@snowman.net) wrote:
> While testing pg_dump, I discovered that there seems to be an issue when
> it comes to TRANSFORMs.

[...]

As pointed out by Peter E, this also impacts CASTs.  Attached is a patch
which addresses both by simply also pulling any functions which are
referenced from pg_cast or pg_transform when they have OIDs at or after
FirstNormalObjectId.  I also modified dumpCast() and dumpTransform() to
complain loudly if they're unable to dump out the cast or transform due
to not finding the function definition(s) necessary.

This'll need to be back-patched to 9.5 for the pg_transform look up and
all the way for pg_cast, though I don't expect that to be too difficult.

We don't do anything else with FirstNormalObjectId in SQL code in
pg_dump, though we obviously use it all over the place in the actual
code based on the OIDs returned from the database.  Still, does anyone
see an issue with using it in a query?  Without it, we end grabbing the
info for 100+ or so functions in a default install that we don't need,
which isn't horrible, but there were concerns raised before about
pg_dump performance for very small databases.

This also adds in regression tests to pg_dump for casts and transforms
and the pg_upgrade testing with the regression database will now
actually test the dump/restore of transforms (which it didn't previously
because the transforms weren't dumped).

Thoughts?

Thanks!

Stephen

Attachment

Re: pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> As pointed out by Peter E, this also impacts CASTs.  Attached is a patch
> which addresses both by simply also pulling any functions which are
> referenced from pg_cast or pg_transform when they have OIDs at or after
> FirstNormalObjectId.  I also modified dumpCast() and dumpTransform() to
> complain loudly if they're unable to dump out the cast or transform due
> to not finding the function definition(s) necessary.

Please do not hack the already-overcomplicated query in getFuncs without
bothering to adjust the long comment that describes what it's doing.

I have a vague feeling that the code for dumping casts and/or transforms
may have some assumptions that the underlying function is also being
dumped.  Although maybe the assumption was really only what's fixed here,
ie that there be a DumpableObject for the function.  Anyway, take a close
look for that.

Looks reasonable otherwise.
        regards, tom lane



Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > As pointed out by Peter E, this also impacts CASTs.  Attached is a patch
> > which addresses both by simply also pulling any functions which are
> > referenced from pg_cast or pg_transform when they have OIDs at or after
> > FirstNormalObjectId.  I also modified dumpCast() and dumpTransform() to
> > complain loudly if they're unable to dump out the cast or transform due
> > to not finding the function definition(s) necessary.
>
> Please do not hack the already-overcomplicated query in getFuncs without
> bothering to adjust the long comment that describes what it's doing.

Right, just wanted to make sure no one had issue with this approach.

> I have a vague feeling that the code for dumping casts and/or transforms
> may have some assumptions that the underlying function is also being
> dumped.  Although maybe the assumption was really only what's fixed here,
> ie that there be a DumpableObject for the function.  Anyway, take a close
> look for that.

I'll look around and see, though my hunch is that, at some point, we
were just pulling all functions and then an optimization was added to
exclude pg_catalog and no one noticed that it broke casts using built-in
functions.

> Looks reasonable otherwise.

Ok, great, I'll get to work on building and testing all supported
versions of pg_dump vs. all server versions that I can reasonably get to
build on my current laptop, which I expect to be a matrix of 9.2-master
pg_dump against server versions ~7.4-master...

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I have a vague feeling that the code for dumping casts and/or transforms
>> may have some assumptions that the underlying function is also being
>> dumped.  Although maybe the assumption was really only what's fixed here,
>> ie that there be a DumpableObject for the function.  Anyway, take a close
>> look for that.

> I'll look around and see, though my hunch is that, at some point, we
> were just pulling all functions and then an optimization was added to
> exclude pg_catalog and no one noticed that it broke casts using built-in
> functions.

Nah, that's historical revisionism --- the exclusion for system functions
is very ancient.  It certainly predates transforms altogether, and
probably predates the cast-dumping code in anything like its current form.
I think the expectation was that casts using built-in functions were
also built-in and so needn't be dumped.  There may be a comment about it
somewhere, which would need to be revised now.

(Actually, the most likely way in which this would break things is if
it started causing built-in casts to get dumped ... have you checked?)
        regards, tom lane



Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I have a vague feeling that the code for dumping casts and/or transforms
> >> may have some assumptions that the underlying function is also being
> >> dumped.  Although maybe the assumption was really only what's fixed here,
> >> ie that there be a DumpableObject for the function.  Anyway, take a close
> >> look for that.
>
> > I'll look around and see, though my hunch is that, at some point, we
> > were just pulling all functions and then an optimization was added to
> > exclude pg_catalog and no one noticed that it broke casts using built-in
> > functions.
>
> Nah, that's historical revisionism --- the exclusion for system functions
> is very ancient.  It certainly predates transforms altogether, and
> probably predates the cast-dumping code in anything like its current form.

Oh, certainly it predates transforms entirely, but I completely expect
that code to have been copied from dumpCast().

> I think the expectation was that casts using built-in functions were
> also built-in and so needn't be dumped.  There may be a comment about it
> somewhere, which would need to be revised now.

I'll look and see.

> (Actually, the most likely way in which this would break things is if
> it started causing built-in casts to get dumped ... have you checked?)

I did and it doesn't cause built-in casts to be dumped because we have
the FirstNormalObjectId check in selectDumpableCast().  We don't have
that for transforms (it just uses the generic selectDumpableObject
routine), but we also don't have any built-in transforms.

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I have a vague feeling that the code for dumping casts and/or transforms
> >> may have some assumptions that the underlying function is also being
> >> dumped.  Although maybe the assumption was really only what's fixed here,
> >> ie that there be a DumpableObject for the function.  Anyway, take a close
> >> look for that.
>
> > I'll look around and see, though my hunch is that, at some point, we
> > were just pulling all functions and then an optimization was added to
> > exclude pg_catalog and no one noticed that it broke casts using built-in
> > functions.
>
> Nah, that's historical revisionism --- the exclusion for system functions
> is very ancient.  It certainly predates transforms altogether, and
> probably predates the cast-dumping code in anything like its current form.

Apparently, that exclusion goes back to 7.3.

> I think the expectation was that casts using built-in functions were
> also built-in and so needn't be dumped.  There may be a comment about it
> somewhere, which would need to be revised now.

I don't see anything in current code or back to 9.2.  Going back
farther, I do find comments about skipping casts which only refer to
things in pg_* namespaces, which, of course, isn't correct either.

As such, creating a cast like so:

create cast (timestamptz as interval) with function age(timestamptz) as
assignment;

results in a cast which no version of pg_dump will actually dump out of
any PG server version since CREATE CAST was added.  I spent the effort
to get all the way back to 7.1 up and running, tho CREATE CAST wasn't
actually added until 7.3.

> (Actually, the most likely way in which this would break things is if
> it started causing built-in casts to get dumped ... have you checked?)

So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
have functions for server versions 7.3-8.0.  Casts which do have a
function aren't included though, be they user-defined or not, because
they're excluded by getFuncs() and dumpCast() just punts.

With my change, pg_dump'ing against 8.0 and earlier will dump out all
casts, including those with functions, since the function definitions
will now be pulled in for them by getFuncs().

What isn't clear to me is what to do about this.  Given the lack of
anyone complaining, and that this would at least ensure that the
user-defined casts are dumped, we could just go with this change and
tell people who are dumping against 8.0 and earlier databases to ignore
the errors from the extra CREATE CAST commands (they shouldn't hurt
anything, after all) during the restore.

Older versions of pg_dump don't have much to offer us regarding this
case- they didn't dump out user-defined casts which only used components
from pg_catalog either.

Ok, thinking a bit more on this and testing, it looks like we record the
initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
Therefore, we could use that as the gating factor for getFuncs() instead
of FirstNormalObjectId and then I can also modify getCasts() to exclude
pinned casts, which should fix pg_dump against 8.0 and earlier.

I'll start working on a patch for that, please let me know if you see
any huge glaring holes in my thinking.

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Ok, thinking a bit more on this and testing, it looks like we record the
> initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
> Therefore, we could use that as the gating factor for getFuncs() instead
> of FirstNormalObjectId and then I can also modify getCasts() to exclude
> pinned casts, which should fix pg_dump against 8.0 and earlier.

Don't really like that; won't it make the getFuncs query substantially
more expensive?
        regards, tom lane



Re: pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> (Actually, the most likely way in which this would break things is if
>> it started causing built-in casts to get dumped ... have you checked?)

> So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
> in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
> have functions for server versions 7.3-8.0.  Casts which do have a
> function aren't included though, be they user-defined or not, because
> they're excluded by getFuncs() and dumpCast() just punts.

> With my change, pg_dump'ing against 8.0 and earlier will dump out all
> casts, including those with functions, since the function definitions
> will now be pulled in for them by getFuncs().

I poked into that, and you're right --- it wasn't until 8.1 (commit
2193a856a) that we had a hard-and-fast rule that initdb-assigned OIDs
would be less than FirstNormalObjectId.  Before that, the cutoff was
variable and was recorded in pg_database.datlastsysoid.

> What isn't clear to me is what to do about this.  Given the lack of
> anyone complaining, and that this would at least ensure that the
> user-defined casts are dumped, we could just go with this change and
> tell people who are dumping against 8.0 and earlier databases to ignore
> the errors from the extra CREATE CAST commands (they shouldn't hurt
> anything, after all) during the restore.

There's a lot to be said for that.  It dumped too much before, it'll
dump a bit more now, but neither case is fatal.  And it's unlikely
that anybody really cares anymore.

If you do want to do something about this, the way would be to retrieve
datlastsysoid and use that as the cutoff with a pre-8.1 server.  I think
there used to be code to do things that way in pg_dump; we must have
removed it (rather prematurely).
        regards, tom lane



Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Ok, thinking a bit more on this and testing, it looks like we record the
> > initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
> > Therefore, we could use that as the gating factor for getFuncs() instead
> > of FirstNormalObjectId and then I can also modify getCasts() to exclude
> > pinned casts, which should fix pg_dump against 8.0 and earlier.
>
> Don't really like that; won't it make the getFuncs query substantially
> more expensive?

Adding the check against pg_depend doesn't appear to make it much slower
than having the check against FirstNormalObjectId.  Adding either seems
to cause the getFuncs() query to go from ~1.7ms to ~3.4ms for HEAD, at
least on my system, which was also built with debugging and asserts and
all that, so the difference in the field is probably less.

Going back to older versions, the difference drops because we drop the
pg_init_privs logic for pre-9.6, and then drop the check for transforms
in pre-9.5.

All versions are just a few ms from HEAD down to 8.4.

In 8.3 and older, we do start to get slower because we don't use a
Merge Anti Join and instead use a Nested Loop Left Join.  I was getting
about:

8.3 - ~35ms
8.2 - ~44ms
8.1 - ~66ms
8.0 - ~82ms
7.4 - ~64ms
7.3 - ~61ms

Again, these were Assert and debug-enabled builds, though, of course,
that's not going to make up for the lack of anti-join.

I'm really not sure that's worth troubling over though as I am not aware
of anyone with hundreds of 8.3 or earlier databases that they're going
to need a really fast pg_dump to work on.

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> (Actually, the most likely way in which this would break things is if
> >> it started causing built-in casts to get dumped ... have you checked?)
>
> > So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
> > in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
> > have functions for server versions 7.3-8.0.  Casts which do have a
> > function aren't included though, be they user-defined or not, because
> > they're excluded by getFuncs() and dumpCast() just punts.
>
> > With my change, pg_dump'ing against 8.0 and earlier will dump out all
> > casts, including those with functions, since the function definitions
> > will now be pulled in for them by getFuncs().
>
> I poked into that, and you're right --- it wasn't until 8.1 (commit
> 2193a856a) that we had a hard-and-fast rule that initdb-assigned OIDs
> would be less than FirstNormalObjectId.  Before that, the cutoff was
> variable and was recorded in pg_database.datlastsysoid.

Oh, I see.

> > What isn't clear to me is what to do about this.  Given the lack of
> > anyone complaining, and that this would at least ensure that the
> > user-defined casts are dumped, we could just go with this change and
> > tell people who are dumping against 8.0 and earlier databases to ignore
> > the errors from the extra CREATE CAST commands (they shouldn't hurt
> > anything, after all) during the restore.
>
> There's a lot to be said for that.  It dumped too much before, it'll
> dump a bit more now, but neither case is fatal.  And it's unlikely
> that anybody really cares anymore.

Well, yes, but still, if it's not too hard to do...

> If you do want to do something about this, the way would be to retrieve
> datlastsysoid and use that as the cutoff with a pre-8.1 server.  I think
> there used to be code to do things that way in pg_dump; we must have
> removed it (rather prematurely).

That's a good point, we might be doing things wrong in other places in
the code by using FirstNormalObjectId on pre-8.1 servers.

What I suggest then is an independent patch which uses a different
variable than FirstNormalObjectId and sets it correctly based on the
version of database that we're connecting to, and after that's working
correctly for HEAD vs. HEAD-8.0, and 9.6-9.2 vs. 9.6-7.1 (all I was able
to get running within a few hours..  if someone wants to test against
7.0 or earlier that's fine, or if someone can provide a clean patch to
make 7.0 work for me, that's fine too) and after that looks good and is
committed, I'll rebase this work on that.

That said, at least initial testing makes it look like it's still going
to be in the 10s-of-ms on 8.3 and earlier.  Looking at it a bit more, it
looks like part of the problem there is that, for some reason, we're
using a sequential scan inside a nested loop instead of using the
pg_cast_oid_index..  Setting enable_seqscan = false turns that into a
Bitmap Heap Scan which gets it down to only a few ms again.  ANALYZE
doesn't seem to help either, though I'm still not terribly concerned
about this.

Thanks!

Stephen

Re: pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> That's a good point, we might be doing things wrong in other places in
> the code by using FirstNormalObjectId on pre-8.1 servers.

> What I suggest then is an independent patch which uses a different
> variable than FirstNormalObjectId and sets it correctly based on the
> version of database that we're connecting to,

+1

> and after that's working
> correctly for HEAD vs. HEAD-8.0, and 9.6-9.2 vs. 9.6-7.1 (all I was able
> to get running within a few hours..  if someone wants to test against
> 7.0 or earlier that's fine, or if someone can provide a clean patch to
> make 7.0 work for me, that's fine too) and after that looks good and is
> committed, I'll rebase this work on that.

pg_dump never intended to support pre-7.0 servers.  I do have 7.0-7.3
servers in captivity and can do testing if you like.
        regards, tom lane



Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I have a vague feeling that the code for dumping casts and/or transforms
> >> may have some assumptions that the underlying function is also being
> >> dumped.  Although maybe the assumption was really only what's fixed here,
> >> ie that there be a DumpableObject for the function.  Anyway, take a close
> >> look for that.
>
> > I'll look around and see, though my hunch is that, at some point, we
> > were just pulling all functions and then an optimization was added to
> > exclude pg_catalog and no one noticed that it broke casts using built-in
> > functions.
>
> Nah, that's historical revisionism --- the exclusion for system functions
> is very ancient.  It certainly predates transforms altogether, and
> probably predates the cast-dumping code in anything like its current form.

Apparently, that exclusion goes back to 7.3.

> I think the expectation was that casts using built-in functions were
> also built-in and so needn't be dumped.  There may be a comment about it
> somewhere, which would need to be revised now.

I don't see anything in current code or back to 9.2.  Going back
farther, I do find comments about skipping casts which only refer to
things in pg_* namespaces, which, of course, isn't correct either.

As such, creating a cast like so:

create cast (timestamptz as interval) with function age(timestamptz) as
assignment;

results in a cast which no version of pg_dump will actually dump out of
any PG server version since CREATE CAST was added.  I spent the effort
to get all the way back to 7.1 up and running, tho CREATE CAST wasn't
actually added until 7.3.

> (Actually, the most likely way in which this would break things is if
> it started causing built-in casts to get dumped ... have you checked?)

So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
have functions for server versions 7.3-8.0.  Casts which do have a
function aren't included though, be they user-defined or not, because
they're excluded by getFuncs() and dumpCast() just punts.

With my change, pg_dump'ing against 8.0 and earlier will dump out all
casts, including those with functions, since the function definitions
will now be pulled in for them by getFuncs().

What isn't clear to me is what to do about this.  Given the lack of
anyone complaining, and that this would at least ensure that the
user-defined casts are dumped, we could just go with this change and
tell people who are dumping against 8.0 and earlier databases to ignore
the errors from the extra CREATE CAST commands (they shouldn't hurt
anything, after all) during the restore.

Older versions of pg_dump don't have much to offer us regarding this
case- they didn't dump out user-defined casts which only used components
from pg_catalog either.

Ok, thinking a bit more on this and testing, it looks like we record the
initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
Therefore, we could use that as the gating factor for getFuncs() instead
of FirstNormalObjectId and then I can also modify getCasts() to exclude
pinned casts, which should fix pg_dump against 8.0 and earlier.

I'll start working on a patch for that, please let me know if you see
any huge glaring holes in my thinking.

Thanks!

Stephen

Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Ok, thinking a bit more on this and testing, it looks like we record the
> initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
> Therefore, we could use that as the gating factor for getFuncs() instead
> of FirstNormalObjectId and then I can also modify getCasts() to exclude
> pinned casts, which should fix pg_dump against 8.0 and earlier.

Don't really like that; won't it make the getFuncs query substantially
more expensive?
        regards, tom lane



Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> (Actually, the most likely way in which this would break things is if
>> it started causing built-in casts to get dumped ... have you checked?)

> So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
> in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
> have functions for server versions 7.3-8.0.  Casts which do have a
> function aren't included though, be they user-defined or not, because
> they're excluded by getFuncs() and dumpCast() just punts.

> With my change, pg_dump'ing against 8.0 and earlier will dump out all
> casts, including those with functions, since the function definitions
> will now be pulled in for them by getFuncs().

I poked into that, and you're right --- it wasn't until 8.1 (commit
2193a856a) that we had a hard-and-fast rule that initdb-assigned OIDs
would be less than FirstNormalObjectId.  Before that, the cutoff was
variable and was recorded in pg_database.datlastsysoid.

> What isn't clear to me is what to do about this.  Given the lack of
> anyone complaining, and that this would at least ensure that the
> user-defined casts are dumped, we could just go with this change and
> tell people who are dumping against 8.0 and earlier databases to ignore
> the errors from the extra CREATE CAST commands (they shouldn't hurt
> anything, after all) during the restore.

There's a lot to be said for that.  It dumped too much before, it'll
dump a bit more now, but neither case is fatal.  And it's unlikely
that anybody really cares anymore.

If you do want to do something about this, the way would be to retrieve
datlastsysoid and use that as the cutoff with a pre-8.1 server.  I think
there used to be code to do things that way in pg_dump; we must have
removed it (rather prematurely).
        regards, tom lane



Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Ok, thinking a bit more on this and testing, it looks like we record the
> > initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
> > Therefore, we could use that as the gating factor for getFuncs() instead
> > of FirstNormalObjectId and then I can also modify getCasts() to exclude
> > pinned casts, which should fix pg_dump against 8.0 and earlier.
>
> Don't really like that; won't it make the getFuncs query substantially
> more expensive?

Adding the check against pg_depend doesn't appear to make it much slower
than having the check against FirstNormalObjectId.  Adding either seems
to cause the getFuncs() query to go from ~1.7ms to ~3.4ms for HEAD, at
least on my system, which was also built with debugging and asserts and
all that, so the difference in the field is probably less.

Going back to older versions, the difference drops because we drop the
pg_init_privs logic for pre-9.6, and then drop the check for transforms
in pre-9.5.

All versions are just a few ms from HEAD down to 8.4.

In 8.3 and older, we do start to get slower because we don't use a
Merge Anti Join and instead use a Nested Loop Left Join.  I was getting
about:

8.3 - ~35ms
8.2 - ~44ms
8.1 - ~66ms
8.0 - ~82ms
7.4 - ~64ms
7.3 - ~61ms

Again, these were Assert and debug-enabled builds, though, of course,
that's not going to make up for the lack of anti-join.

I'm really not sure that's worth troubling over though as I am not aware
of anyone with hundreds of 8.3 or earlier databases that they're going
to need a really fast pg_dump to work on.

Thanks!

Stephen

Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> (Actually, the most likely way in which this would break things is if
> >> it started causing built-in casts to get dumped ... have you checked?)
>
> > So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
> > in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
> > have functions for server versions 7.3-8.0.  Casts which do have a
> > function aren't included though, be they user-defined or not, because
> > they're excluded by getFuncs() and dumpCast() just punts.
>
> > With my change, pg_dump'ing against 8.0 and earlier will dump out all
> > casts, including those with functions, since the function definitions
> > will now be pulled in for them by getFuncs().
>
> I poked into that, and you're right --- it wasn't until 8.1 (commit
> 2193a856a) that we had a hard-and-fast rule that initdb-assigned OIDs
> would be less than FirstNormalObjectId.  Before that, the cutoff was
> variable and was recorded in pg_database.datlastsysoid.

Oh, I see.

> > What isn't clear to me is what to do about this.  Given the lack of
> > anyone complaining, and that this would at least ensure that the
> > user-defined casts are dumped, we could just go with this change and
> > tell people who are dumping against 8.0 and earlier databases to ignore
> > the errors from the extra CREATE CAST commands (they shouldn't hurt
> > anything, after all) during the restore.
>
> There's a lot to be said for that.  It dumped too much before, it'll
> dump a bit more now, but neither case is fatal.  And it's unlikely
> that anybody really cares anymore.

Well, yes, but still, if it's not too hard to do...

> If you do want to do something about this, the way would be to retrieve
> datlastsysoid and use that as the cutoff with a pre-8.1 server.  I think
> there used to be code to do things that way in pg_dump; we must have
> removed it (rather prematurely).

That's a good point, we might be doing things wrong in other places in
the code by using FirstNormalObjectId on pre-8.1 servers.

What I suggest then is an independent patch which uses a different
variable than FirstNormalObjectId and sets it correctly based on the
version of database that we're connecting to, and after that's working
correctly for HEAD vs. HEAD-8.0, and 9.6-9.2 vs. 9.6-7.1 (all I was able
to get running within a few hours..  if someone wants to test against
7.0 or earlier that's fine, or if someone can provide a clean patch to
make 7.0 work for me, that's fine too) and after that looks good and is
committed, I'll rebase this work on that.

That said, at least initial testing makes it look like it's still going
to be in the 10s-of-ms on 8.3 and earlier.  Looking at it a bit more, it
looks like part of the problem there is that, for some reason, we're
using a sequential scan inside a nested loop instead of using the
pg_cast_oid_index..  Setting enable_seqscan = false turns that into a
Bitmap Heap Scan which gets it down to only a few ms again.  ANALYZE
doesn't seem to help either, though I'm still not terribly concerned
about this.

Thanks!

Stephen

Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> That's a good point, we might be doing things wrong in other places in
> the code by using FirstNormalObjectId on pre-8.1 servers.

> What I suggest then is an independent patch which uses a different
> variable than FirstNormalObjectId and sets it correctly based on the
> version of database that we're connecting to,

+1

> and after that's working
> correctly for HEAD vs. HEAD-8.0, and 9.6-9.2 vs. 9.6-7.1 (all I was able
> to get running within a few hours..  if someone wants to test against
> 7.0 or earlier that's fine, or if someone can provide a clean patch to
> make 7.0 work for me, that's fine too) and after that looks good and is
> committed, I'll rebase this work on that.

pg_dump never intended to support pre-7.0 servers.  I do have 7.0-7.3
servers in captivity and can do testing if you like.
        regards, tom lane



Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > That's a good point, we might be doing things wrong in other places in
> > the code by using FirstNormalObjectId on pre-8.1 servers.
>
> > What I suggest then is an independent patch which uses a different
> > variable than FirstNormalObjectId and sets it correctly based on the
> > version of database that we're connecting to,
>
> +1

Alright, here we go- patches for every currently supported branch, each
tested from that version back to 7.1, back to 7.3 with a user-defined
CAST, and back to 9.5 with a user-defined TRANSFORM.  Also includes
regression tests for the TAP test structure in master and 9.6.

> pg_dump never intended to support pre-7.0 servers.  I do have 7.0-7.3
> servers in captivity and can do testing if you like.

You are certainly welcome to test and make sure I didn't break anything
for 7.0 servers, but I don't *think* I changed any code paths which have
differences between 7.0 and 7.1 (which I did test against).  That said,
I'm honestly not entirely sure what getCasts() is doing querying out the
"casts" from a 7.1 or 7.0 database.  The query does work, but none of
the "casts" returned have an OID beyond datlastsysoid and I'm not really
sure how to create one or if creating one is really a supported
operation.  If someone was able to create something that getCasts()
would try to dump out, and it used only built-in functions, they will
at least get an error now letting them know that it failed, instead of
having that "cast" silently ignored.

Trying to adjust the query in getFuncs() to account for that case makes
me concerned that we'd actually break more than fix things, and I really
don't like the idea of trying to blindly fix it for 7.0.

Thanks!

Stephen

Attachment

Re: [HACKERS] pg_dump vs. TRANSFORMs

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > That's a good point, we might be doing things wrong in other places in
> > the code by using FirstNormalObjectId on pre-8.1 servers.
>
> > What I suggest then is an independent patch which uses a different
> > variable than FirstNormalObjectId and sets it correctly based on the
> > version of database that we're connecting to,
>
> +1

Done.

Thanks!

Stephen