Thread: lo_create(oid, bytea) breaks every extant release of libpq

lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
While investigating a different issue, I was astonished to find that
pg_restore in HEAD is incapable of restoring dumps containing large
objects: it fails with messages like

pg_restore: [archiver] could not create large object 100000: ERROR:  function call message contains 1 arguments but
functionrequires 2
 

After some investigation, I find that:

1. Commit c50b7c09d added a function lo_create(oid, bytea).

2. libpq's lo_initialize function, in fe-lobj.c, assumes that
the pg_catalog schema will contain *only one* function named
lo_create (and likewise for lo_open and a dozen other names).
With more than one lo_create function, it's luck of the draw
which one's OID ends up in the PGlobjfuncs struct.  If it's
the wrong one, subsequent attempts to use libpq's lo_create()
function fail as above.

While there's certainly a good argument to be made for making
lo_initialize do that query differently, there's no way that we
can fix every copy of libpq that's in the field.  I think we have to
consider that "there can be only one lo_create" is effectively part of
the protocol spec for the foreseeable future.  (It'd be easy enough
to add a check in the opr_sanity regression test to catch violations
of this rule.)

It's also extremely unfortunate that the regression tests don't
create (or at least don't leave behind) any large objects, as we
might then have possibly caught this bug much earlier.

Meanwhile, we have to either revert the addition of lo_create(oid,
bytea) altogether, or choose a different name for it.  Suggestions?
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tatsuo Ishii
Date:
> Meanwhile, we have to either revert the addition of lo_create(oid,
> bytea) altogether, or choose a different name for it.  Suggestions?

I wonder if there's any use case where we need to store bytea into
large objects. Don't we already have bytea data type? If the use case
is for large data which does not fit in a tuple, I am afraid that the
query string could become extremely big one.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Noah Misch
Date:
On Wed, Jun 11, 2014 at 11:57:20PM -0400, Tom Lane wrote:
> While there's certainly a good argument to be made for making
> lo_initialize do that query differently, there's no way that we
> can fix every copy of libpq that's in the field.  I think we have to
> consider that "there can be only one lo_create" is effectively part of
> the protocol spec for the foreseeable future.  (It'd be easy enough
> to add a check in the opr_sanity regression test to catch violations
> of this rule.)
> 
> It's also extremely unfortunate that the regression tests don't
> create (or at least don't leave behind) any large objects, as we
> might then have possibly caught this bug much earlier.

Agreed.

> Meanwhile, we have to either revert the addition of lo_create(oid,
> bytea) altogether, or choose a different name for it.  Suggestions?

lo_new() or lo_make()?  An earlier draft of the patch that added
lo_create(oid, bytea) had a similar function named make_lo().

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Pavel Stehule
Date:



2014-06-12 6:22 GMT+02:00 Tatsuo Ishii <ishii@postgresql.org>:
> Meanwhile, we have to either revert the addition of lo_create(oid,
> bytea) altogether, or choose a different name for it.  Suggestions?

I wonder if there's any use case where we need to store bytea into
large objects. Don't we already have bytea data type? If the use case
is for large data which does not fit in a tuple, I am afraid that the
query string could become extremely big one.

I know a one use case - and I used it in my one application. For one my customer I wrote a application that ensures a data change between two independent subjects based on XML. It was relative simple architecture - applications solved a communication, and stored procedures solved a content - good success was a using of SQL/XML. After few years the communication protocol was enhanced about attached tiff scans - serialized in base64 in result XML doc. I had to quickly fix a this application with minimal impacts to others applications. And LO API is perfect for transporting binary data from/to database. But next I needed a functions for conversion between bytea and LO.

Regards

Pavel
 

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Wed, Jun 11, 2014 at 11:57:20PM -0400, Tom Lane wrote:
>> Meanwhile, we have to either revert the addition of lo_create(oid,
>> bytea) altogether, or choose a different name for it.  Suggestions?

> lo_new() or lo_make()?  An earlier draft of the patch that added
> lo_create(oid, bytea) had a similar function named make_lo().

I think we want to stick to the lo_xxx naming convention, whatever
xxx ends up being.

I was idly thinking that we might want to focus on the fact that this
function not only creates a LO but loads some data into it.  lo_make
isn't too bad, but we could also consider lo_load, lo_import, etc.
(lo_import is not one of the names we have to avoid overloading.
OTOH, there's already a 2-argument form of it, so maybe there'd be
issues with resolving calls with unknown-literal arguments.)
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Pavel Stehule
Date:



2014-06-12 6:54 GMT+02:00 Noah Misch <noah@leadboat.com>:
On Wed, Jun 11, 2014 at 11:57:20PM -0400, Tom Lane wrote:
> While there's certainly a good argument to be made for making
> lo_initialize do that query differently, there's no way that we
> can fix every copy of libpq that's in the field.  I think we have to
> consider that "there can be only one lo_create" is effectively part of
> the protocol spec for the foreseeable future.  (It'd be easy enough
> to add a check in the opr_sanity regression test to catch violations
> of this rule.)
>
> It's also extremely unfortunate that the regression tests don't
> create (or at least don't leave behind) any large objects, as we
> might then have possibly caught this bug much earlier.

Agreed.

> Meanwhile, we have to either revert the addition of lo_create(oid,
> bytea) altogether, or choose a different name for it.  Suggestions?

lo_new() or lo_make()?  An earlier draft of the patch that added
lo_create(oid, bytea) had a similar function named make_lo().

+1 for lo_new

Regards

Pavel

 

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Pavel Stehule
Date:



2014-06-12 7:08 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Noah Misch <noah@leadboat.com> writes:
> On Wed, Jun 11, 2014 at 11:57:20PM -0400, Tom Lane wrote:
>> Meanwhile, we have to either revert the addition of lo_create(oid,
>> bytea) altogether, or choose a different name for it.  Suggestions?

> lo_new() or lo_make()?  An earlier draft of the patch that added
> lo_create(oid, bytea) had a similar function named make_lo().

I think we want to stick to the lo_xxx naming convention, whatever
xxx ends up being.

I was idly thinking that we might want to focus on the fact that this
function not only creates a LO but loads some data into it.  lo_make
isn't too bad, but we could also consider lo_load, lo_import, etc.
(lo_import is not one of the names we have to avoid overloading.
OTOH, there's already a 2-argument form of it, so maybe there'd be
issues with resolving calls with unknown-literal arguments.)


I have not any problem with lo_new, lo_make. lo_import is related to import from host system. I am not sure about lo_load, but I am not able to specify arguments why not.

Pavel
 
                        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
>> lo_new() or lo_make()?  An earlier draft of the patch that added
>> lo_create(oid, bytea) had a similar function named make_lo().

It appears that lo_make() has a small plurality, but before we lock
that name in, there was one other idea that occurred to me: the
underlying C function is currently named lo_create_bytea(), and
that seems like not an awful choice for the SQL name either.

Any other votes out there?
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Andres Freund
Date:
On 2014-06-12 10:48:01 -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> >> lo_new() or lo_make()?  An earlier draft of the patch that added
> >> lo_create(oid, bytea) had a similar function named make_lo().
> 
> It appears that lo_make() has a small plurality, but before we lock
> that name in, there was one other idea that occurred to me: the
> underlying C function is currently named lo_create_bytea(), and
> that seems like not an awful choice for the SQL name either.
> 
> Any other votes out there?

I prefer lo_create_bytea() or even lo_create_from_bytea(),
lo_from_bytea().

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> >> lo_new() or lo_make()?  An earlier draft of the patch that added
> >> lo_create(oid, bytea) had a similar function named make_lo().
> 
> It appears that lo_make() has a small plurality, but before we lock
> that name in, there was one other idea that occurred to me: the
> underlying C function is currently named lo_create_bytea(), and
> that seems like not an awful choice for the SQL name either.
> 
> Any other votes out there?

I was also going to suggest lo_create_bytea().  Another similar
possibility would be lo_from_bytea() -- or, since we have overloading
(and we can actually use it in this case without breaking libpq), we
could just have lo_from(oid, bytea).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Robert Haas
Date:
On Thu, Jun 12, 2014 at 10:56 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>> >> lo_new() or lo_make()?  An earlier draft of the patch that added
>> >> lo_create(oid, bytea) had a similar function named make_lo().
>>
>> It appears that lo_make() has a small plurality, but before we lock
>> that name in, there was one other idea that occurred to me: the
>> underlying C function is currently named lo_create_bytea(), and
>> that seems like not an awful choice for the SQL name either.
>>
>> Any other votes out there?
>
> I was also going to suggest lo_create_bytea().

That sounds good to me, too.

Presumably we should also fix libpq to not be so dumb.  I mean, it
doesn't help with the immediate problem, since as you say there could
be non-upgraded copies of libpq out there for the indefinite future,
but it still seems like something we oughta fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Any other votes out there?

> I was also going to suggest lo_create_bytea().  Another similar
> possibility would be lo_from_bytea() -- or, since we have overloading
> (and we can actually use it in this case without breaking libpq), we
> could just have lo_from(oid, bytea).

Andres also mentioned lo_from_bytea(), and I kinda like it too.
I don't like plain lo_from(), as it's not too apparent what it's
supposed to get data "from" --- someone might think the second
parameter was supposed to be a file name a la lo_import(),
for example.
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Pavel Stehule
Date:
<p dir="ltr">Lo_from_bytea sounds me better than lo_create_bytea

Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Presumably we should also fix libpq to not be so dumb.  I mean, it
> doesn't help with the immediate problem, since as you say there could
> be non-upgraded copies of libpq out there for the indefinite future,
> but it still seems like something we oughta fix.

It's been in the back of my mind for awhile that doing a dynamic query at
all here is pretty pointless.  It's not like the OIDs of those functions
ever have or ever will move.  It would probably be more robust if we
just let libpq be a consumer of fmgroids.h and refer directly to the
constants F_LO_CREATE etc.

I think there was some previous discussion about this, possibly tied
to also having a better-defined way to let clients depend on hard-wired
type OIDs, but I'm too lazy to search the archives right now.
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I was also going to suggest lo_create_bytea().  Another similar
>> possibility would be lo_from_bytea() -- or, since we have overloading
>> (and we can actually use it in this case without breaking libpq), we
>> could just have lo_from(oid, bytea).

> Andres also mentioned lo_from_bytea(), and I kinda like it too.
> I don't like plain lo_from(), as it's not too apparent what it's
> supposed to get data "from" --- someone might think the second
> parameter was supposed to be a file name a la lo_import(),
> for example.

Since the discussion seems to have trailed off, I'm going to run with
lo_from_bytea().  The plan is:

1. Rename the function.
2. Add an opr_sanity regression test memorializing what we should get
from lo_initialize()'s query.
3. Make sure that the regression tests leave behind a few large objects,
so that testing of pg_dump/pg_restore using the regression database
will exercise the large-object code paths.

It'd be a good thing if the TAP tests for client programs included
testing of pg_dump/pg_restore, but that's a bit beyond my competence
with that tool ... anyone care to step up?

Redoing or getting rid of lo_initialize()'s query would be a good thing
too; but that does not seem like material for back-patching into 9.4,
while I think all the above are.
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Noah Misch
Date:
On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote:
> Since the discussion seems to have trailed off, I'm going to run with
> lo_from_bytea().  The plan is:
> 
> 1. Rename the function.
> 2. Add an opr_sanity regression test memorializing what we should get
> from lo_initialize()'s query.
> 3. Make sure that the regression tests leave behind a few large objects,
> so that testing of pg_dump/pg_restore using the regression database
> will exercise the large-object code paths.

Sounds good.

> It'd be a good thing if the TAP tests for client programs included
> testing of pg_dump/pg_restore, but that's a bit beyond my competence
> with that tool ... anyone care to step up?

The pg_upgrade test suite covers this well.

> Redoing or getting rid of lo_initialize()'s query would be a good thing
> too; but that does not seem like material for back-patching into 9.4,
> while I think all the above are.

I agree all the above make sense for 9.4.  I also wouldn't object to a
hardening of lo_initialize() sneaking in at this stage.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote:
>> It'd be a good thing if the TAP tests for client programs included
>> testing of pg_dump/pg_restore, but that's a bit beyond my competence
>> with that tool ... anyone care to step up?

> The pg_upgrade test suite covers this well.

Um, not really: what pg_upgrade exercises is "pg_dump -s" which entirely
fails to cover the data-transfer code paths.  It would not have found
this problem.

BTW, after further testing I realized that it was quite accidental that
I found it either.  pg_restore only uses libpq's lo_create() function
when restoring an "old_blob_style" archive, ie one generated by 8.4
or earlier.  That's what I happened to try to do last night, but it's
pure luck that I did.

Poking around with making the largeobject regression test leave
some stuff behind, I found out that:

1. That regression test includes the text of a Robert Frost poem that
AFAICT is still under copyright.  I think we'd better replace it with
something by someone a bit more safely dead.

2. I tried to add a COMMENT ON LARGE OBJECT to one of the not-removed
blobs.  While pg_upgrade didn't fail on transferring the blob, it
*did* fail to transfer the comment on it, which seems like a bug.
Bruce, have you got any idea how to fix that?
        regards, tom lane



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Noah Misch
Date:
On Thu, Jun 12, 2014 at 02:53:23PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote:
> >> It'd be a good thing if the TAP tests for client programs included
> >> testing of pg_dump/pg_restore, but that's a bit beyond my competence
> >> with that tool ... anyone care to step up?
> 
> > The pg_upgrade test suite covers this well.
> 
> Um, not really: what pg_upgrade exercises is "pg_dump -s" which entirely
> fails to cover the data-transfer code paths.  It would not have found
> this problem.

I see.  TAP suite coverage for a data-included dump/restore would be worth its
weight, agreed.

> BTW, after further testing I realized that it was quite accidental that
> I found it either.  pg_restore only uses libpq's lo_create() function
> when restoring an "old_blob_style" archive, ie one generated by 8.4
> or earlier.  That's what I happened to try to do last night, but it's
> pure luck that I did.

That could have easily remained undiscovered until release day.  Not good.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: lo_create(oid, bytea) breaks every extant release of libpq

From
Tom Lane
Date:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Presumably we should also fix libpq to not be so dumb.  I mean, it
>> doesn't help with the immediate problem, since as you say there could
>> be non-upgraded copies of libpq out there for the indefinite future,
>> but it still seems like something we oughta fix.

> It's been in the back of my mind for awhile that doing a dynamic query at
> all here is pretty pointless.  It's not like the OIDs of those functions
> ever have or ever will move.  It would probably be more robust if we
> just let libpq be a consumer of fmgroids.h and refer directly to the
> constants F_LO_CREATE etc.

I thought a bit more about this.  Ignore for the moment the larger
question of whether we want to consider fmgroids.h as something we'd
export to clients outside the immediate core infrastructure; that
will definitely take more thought than we can expend if we want to
slip this into 9.4.  It still seems reasonable for libpq to use it.
The actual code changes in fe-lobj.c are trivial enough (and would
consist mostly of code removal).  We would need to deal with the fact
that some of the support functions are not present in older backends,
but I think testing PQserverVersion is adequate for that purpose.

The hard part seems to be making sure that fmgroids.h is available to
reference, since it's a generated file and not guaranteed to be there
a-priori.  In a gmake-driven build I have the skillz to deal with that,
but I am not sure what to do in the various Windows build systems,
especially for the client-code-only build methods.  The path of least
resistance would be to just assume fmgroids.h is available, which would
work fine when building from a tarball, or probably when doing a full
build including backend (MSVC builds aren't parallel are they?).  But
what about a client-only build?
        regards, tom lane