Re: 64-bit API for large object - Mailing list pgsql-hackers

From: Tatsuo Ishii
Subject: Re: 64-bit API for large object
Date:
Msg-id: 20120930.113817.1476465722250402343.t-ishii@sraoss.co.jp
In response to: Re: 64-bit API for large object (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses: Re: 64-bit API for large object
List: pgsql-hackers
KaiGai-san,

Thank you for the review.

> I checked this patch. It looks good, but there are still some points to be
> discussed.
> 
> * I have a question. What is the meaning of INT64_IS_BUSTED?
>   It seems to be a marker indicating a platform without 64-bit support.
>   However, commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
>   says the following:
>   | Remove all the special-case code for INT64_IS_BUSTED, per decision that
>   | we're not going to support that anymore.

Agreed.

> * At inv_seek(), it seems to me it checks offset correctness in the wrong way,
>   as follows:
> |          case SEEK_SET:
> |              if (offset < 0)
> |                  elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
> |              obj_desc->offset = offset;
> |              break;
>   That would be a right assumption if the large object size were restricted to 2GB,
>   but the largest positive int64 is larger than the expected limit.
>   So it seems to me the offset should be compared with (INT_MAX * PAGE_SIZE)
>   instead.

Point taken. However, checking offset < 0 still seems valid, because a
negative offset can be passed to inv_seek(), no?  Also, I think the
upper limit for the seek position should be defined as (INT_MAX *
LOBLKSIZE) rather than (INT_MAX * PAGE_SIZE). Probably (INT_MAX *
LOBLKSIZE) should be defined in pg_largeobject.h as:

/*
 * Maximum byte length for each large object
 */
#define MAX_LARGE_OBJECT_SIZE    ((int64) INT_MAX * LOBLKSIZE)
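
For reference (not part of the patch itself): with the default 8KB BLCKSZ,
LOBLKSIZE is BLCKSZ / 4 = 2048, so this limit works out to about 4TB (or 16TB
with a 32KB BLCKSZ), matching the numbers quoted later in the thread. A minimal
standalone sketch of the arithmetic, assuming those default values:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration only: in the backend these values come from pg_config.h
 * (BLCKSZ) and pg_largeobject.h (LOBLKSIZE). */
#define BLCKSZ        8192
#define LOBLKSIZE    (BLCKSZ / 4)

int
main(void)
{
    int64_t    max_lo_size = (int64_t) INT_MAX * LOBLKSIZE;

    printf("MAX_LARGE_OBJECT_SIZE = %lld bytes (about %lld GB)\n",
           (long long) max_lo_size,
           (long long) (max_lo_size / (1024 * 1024 * 1024)));
    return 0;
}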

Then the offset checks in inv_seek() will be:
    case SEEK_SET:
        if (offset < 0 || offset >= MAX_LARGE_OBJECT_SIZE)
            elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
        obj_desc->offset = offset;
        break;
    case SEEK_CUR:
        if ((offset + obj_desc->offset) < 0 ||
            (offset + obj_desc->offset) >= MAX_LARGE_OBJECT_SIZE)
            elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
        obj_desc->offset += offset;
        break;
    case SEEK_END:
        {
            int64        pos = inv_getsize(obj_desc) + offset;

            if (pos < 0 || pos >= MAX_LARGE_OBJECT_SIZE)
                elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
            obj_desc->offset = pos;
        }

What do you think?

> * At inv_write(), it definitely needs a check to prevent writes beyond the 4TB limit.
>   If obj_desc->offset is a bit below 4TB, an additional 1GB write
>   will break the head of the large object because of "pageno" overflow.

OK. I will add a check:

if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
    elog(ERROR, "invalid write request size: %d", nbytes);
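
To make the "pageno" point concrete: roughly speaking, inv_api.c derives the
page number as an int32 from offset / LOBLKSIZE, so once the offset passes
INT_MAX pages the value wraps. A small standalone illustration (not code from
the patch; LOBLKSIZE = 2048 assumes the default 8KB BLCKSZ):

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define LOBLKSIZE 2048        /* BLCKSZ / 4 with the default 8KB BLCKSZ */

int
main(void)
{
    /* An offset just below the 4TB boundary still gives a sane page number. */
    int64_t    offset = (int64_t) INT_MAX * LOBLKSIZE - LOBLKSIZE;
    int32_t    pageno = (int32_t) (offset / LOBLKSIZE);

    printf("offset=%lld -> pageno=%d\n", (long long) offset, pageno);

    /* Writing a little more pushes the page count past INT_MAX; the cast
     * to int32 then wraps (to a negative value on common platforms), so
     * without the range check the write would land on the wrong pages
     * and corrupt the large object. */
    offset += 2 * LOBLKSIZE;
    pageno = (int32_t) (offset / LOBLKSIZE);

    printf("offset=%lld -> pageno=%d\n", (long long) offset, pageno);
    return 0;
}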

> * Please also add checks in inv_read() to prevent LargeObjectDesc->offset
>   from unexpectedly overflowing the 4TB boundary.

OK. I will add a check:

if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
    elog(ERROR, "invalid read request size: %d", nbytes);

> * At inv_truncate(), the variable "off" is redefined as int64. Is that change
>   really needed? Its only use is to store the result of "len % LOBLKSIZE".

Your point is correct. Back to int32.
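
For the record, the reason int32 is enough there: even with len widened to
int64, len % LOBLKSIZE is always in [0, LOBLKSIZE), i.e. at most 2047 with the
default block size. A tiny sanity check (illustration only, assuming the
default LOBLKSIZE):

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define LOBLKSIZE 2048        /* default BLCKSZ / 4 */

int
main(void)
{
    /* Even near the 4TB limit, len % LOBLKSIZE stays below LOBLKSIZE,
     * so an int32 "off" is sufficient in inv_truncate(). */
    int64_t    len = (int64_t) INT_MAX * LOBLKSIZE - 123;
    int32_t    off = (int32_t) (len % LOBLKSIZE);

    printf("len=%lld -> off=%d\n", (long long) len, off);
    return 0;
}
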
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Thanks,
> 
> 2012/9/24 Nozomi Anzai <anzai@sraoss.co.jp>:
>> Here is 64-bit API for large object version 2 patch.
>>
>>> I checked this patch. It can be applied onto the latest master branch
>>> without any problems. My comments are below.
>>>
>>> 2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:
>>> > Ok, here is the patch to implement the 64-bit API for large objects, to
>>> > allow large objects of up to 4TB (or 16TB if BLCKSZ is changed to
>>> > 32KB). The patch is based on Jeremy Drake's patch posted on September
>>> > 23, 2005
>>> > (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
>>> > and reasonably updated/edited to adapt to PostgreSQL 9.3 by Nozomi Anzai
>>> > for the backend part and Yugo Nagata for the rest (including the
>>> > documentation patch).
>>> >
>>> > Here are changes made in the patch:
>>> >
>>> > 1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)
>>> >
>>> > lo_initialize() gathers the OIDs of the backend's 64-bit large object
>>> > handling functions, namely lo_lseek64, lo_tell64, and lo_truncate64.
>>> >
>>> > If a client calls the lo_*64 functions and the backend does not support
>>> > them, the lo_*64 functions return an error to the caller. There might be an
>>> > argument that calls to the lo_*64 functions could automatically be redirected
>>> > to the older 32-bit API; I don't know whether this is worth the trouble, though.
>>> >
>>> I think it should definitely return an error code when a user tries to
>>> use the lo_*64 functions against a v9.2 or older backend, because
>>> falling back to the 32-bit API can raise unexpected errors if the
>>> application intends to seek beyond 2GB.
>>>
>>> > Currently lo_initialize() throws an error if one of the OIDs is not
>>> > available. I doubt we should do the same for the 64-bit functions, since this
>>> > would make 9.3 libpq unable to access large objects stored in pre-9.2
>>> > PostgreSQL servers.
>>> >
>>> It seems to me this is a situation where we should split the pre-9.2 and
>>> post-9.3 cases using the condition "conn->sversion >= 90300".
>>
>> Fixed accordingly, and tested it by deleting lo_tell64's row from pg_proc.
>>
>>
>>> > To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
>>> > is a pointer to the 64-bit integer and the actual data is placed somewhere
>>> > else. There might be another way: add a new member to union u to store the
>>> > 64-bit integer:
>>> >
>>> > typedef struct
>>> > {
>>> >         int                     len;
>>> >         int                     isint;
>>> >         union
>>> >         {
>>> >                 int                *ptr;                /* can't use void (dec compiler barfs)   */
>>> >                 int                     integer;
>>> >                 int64           bigint;         /* 64-bit integer */
>>> >         }                       u;
>>> > } PQArgBlock;
>>> >
>>> > I'm a little bit worried about this way because PQArgBlock is a public
>>> > interface.
>>> >
>>> I'm inclined to add a new field to the union; that seems to me the more
>>> straightforward approach.
>>> For example, the manner of lo_seek64() seems confusing to me:
>>> it sets 1 on the "isint" field even though a pointer is actually delivered.
>>>
>>> +       argv[1].isint = 1;
>>> +       argv[1].len = 8;
>>> +       argv[1].u.ptr = (int *) &len;
>>
>> Your proposal was not adopted per discussion.
>>
>>
>>> > Also we add new type "pg_int64":
>>> >
>>> > #ifndef NO_PG_INT64
>>> > #define HAVE_PG_INT64 1
>>> > typedef long long int pg_int64;
>>> > #endif
>>> >
>>> > in postgres_ext.h per suggestion from Tom Lane:
>>> > http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>>> >
>>> I'm uncertain about the context of this discussion.
>>>
>>> Does it matter if we include <stdint.h> and use int64_t instead
>>> of the self-defined data type?
>>
>> Your proposal was not adopted per discussion.
>> Per discussion, endianness translation was moved to fe-lobj.c.
>>
>>
>>> > 2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)
>>> >
>>> > Add lo_lseek64, lo_tell64, and lo_truncate64 so that they can handle
>>> > 64-bit seek positions and data lengths. loread64 and lowrite64 are not
>>> > added because if a program tries to read/write more than 2GB at once,
>>> > it would be a sign that the program needs to be re-designed anyway.
>>> >
>>> I think that is reasonable.
>>>
>>> > 3) Backend inv_api.c functions (Nozomi Anzai)
>>> >
>>> > No need to add new functions. Just extend them to handle 64-bit data.
>>> >
>>> > BTW, what will happen if an older 32-bit libpq accesses large objects
>>> > over 2GB?
>>> >
>>> > lo_read and lo_write: they can read or write lobjs using the 32-bit API as
>>> > long as the requested read/write data length is smaller than 2GB. So I
>>> > think we can safely allow them to access lobjs over 2GB.
>>> >
>>> > lo_lseek: again, as long as the requested offset is smaller than 2GB, there
>>> > would be no problem.
>>> >
>>> > lo_tell: if the current seek position is beyond 2GB, it returns an error.
>>> >
>>> Even though repeated lo_lseek() calls may move the offset up to 4TB, that also
>>> makes it impossible to use lo_tell() to obtain the current offset, so I think
>>> it is reasonable behavior.
>>>
>>> However, the error code is not an appropriate one.
>>>
>>> +       if (INT_MAX < offset)
>>> +       {
>>> +               ereport(ERROR,
>>> +                               (errcode(ERRCODE_UNDEFINED_OBJECT),
>>> +                                errmsg("invalid large-object
>>> descriptor: %d", fd)));
>>> +               PG_RETURN_INT32(-1);
>>> +       }
>>>
>>> According to the manpage of lseek(2)
>>>     EOVERFLOW
>>>         The resulting file offset cannot be represented in an off_t.
>>>
>>> Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
>>
>> Changed the error code and error message. We added a new error code
>> "ERRCODE_UNDEFINED_OBJECT (22P07)".
>>
>>
>>> > 4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata)
>>> >
>>> > Comments and suggestions are welcome.
>>> >
>>> Miscellaneous comments are below.
>>>
>>> A regression test would be helpful. Even though there is no need to try to
>>> create a 4TB large object, it is helpful to write some chunks around the design
>>> boundary. Could you add some test cases that write some chunks around the 4TB offset?
>>
>> Added 64-bit lobj test items to the regression test and confirmed they work
>> correctly.
>>
>>
>>> Thanks,
>>> --
>>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>>
>>>
>>
>>
>> --
>> Nozomi Anzai
>> SRA OSS, Inc. Japan
>>
>>
>>
> 
> 
> 
> -- 
> KaiGai Kohei <kaigai@kaigai.gr.jp>
> 
> 


