Re: 64-bit API for large object - Mailing list pgsql-hackers

From Amit kapila
Subject Re: 64-bit API for large object
Date
Msg-id 6C0B27F7206C9E4CA54AE035729E9C38285378EC@szxeml509-mbs
Whole thread Raw
In response to Re: 64-bit API for large object  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: 64-bit API for large object
List pgsql-hackers
>On Sunday, October 07, 2012 5:42 AM Tatsuo Ishii wrote:

>Ok, committed with minor editings(fix header comments in testlo64.c).
>Thank you Kaigai-san for review!

Hello Tatsuo Ishii San,

Today when I tried to build the latest code on my windows m/c, I got few errors from the checkin of this patch.

lo_hton64 (due to -- unint32_t).\src\interfaces\libpq\fe-lobj.c(1049) : error C2065: 'uint32_t' : undeclared identifier
inv_seek (due to   -- MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(389) : error C2065: 'LOBLKSIZELL' : undeclared identifier
inv_read ((due to   -- MAX_LARGE_OBJECT_SIZE))
\src\backend\storage\large_object\inv_api.c(441) : error C2065: 'LOBLKSIZELL' : undeclared identifier


It may be some settings problem of my m/c if it is okay on some other windows m/c.


With Regards,
Amit Kapila.

> As a committer, I have looked into the patch and it seems it's good to
> commit. However I want to make a small enhancement in the
> documentation part:
>
> 1) lo_open section needs to mention about new 64bit APIs. Also it
>    should include description about lo_truncate, but this is not 64bit
>    APIs author's fault since it should had been there when lo_truncate
>    was added.
>
> 2) Add mention that 64bit APIs are only available in PostgreSQL 9.3 or
>    later and if the API is requested against older version of servers
>    it will fail.
>
> If there's no objection, I would like commit attached patches.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> Hi Anzai-san,
>>
>> The latest patch is fair enough for me, so let me hand over its reviewing
>> for comitters.
>>
>> Thanks,
>>
>> 2012/10/1 Nozomi Anzai <anzai@sraoss.co.jp>:
>>> Here is 64-bit API for large object version 3 patch.
>>>
>>>> I checked this patch. It looks good, but here are still some points to be
>>>> discussed.
>>>>
>>>> * I have a question. What is the meaning of INT64_IS_BUSTED?
>>>>   It seems to me a marker to indicate a platform without 64bit support.
>>>>   However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
>>>>   says as follows:
>>>>   | Remove all the special-case code for INT64_IS_BUSTED, per decision that
>>>>   | we're not going to support that anymore.
>>>
>>> Removed INT64_IS_BUSTED.
>>>
>>>
>>>> * At inv_seek(), it seems to me it checks offset correctness with wrong way,
>>>>   as follows:
>>>> |          case SEEK_SET:
>>>> |              if (offset < 0)
>>>> |                  elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
>>>> |              obj_desc->offset = offset;
>>>> |              break;
>>>>   It is a right assumption, if large object size would be restricted to 2GB.
>>>>   But the largest positive int64 is larger than expected limitation.
>>>>   So, it seems to me it should be compared with (INT_MAX * PAGE_SIZE)
>>>>   instead.
>>>
>>> Fixed.
>>>
>>>
>>>> * At inv_write(), it definitely needs a check to prevent data-write upper 4TB.
>>>>   In case when obj_desc->offset is a bit below 4TB, an additional 1GB write
>>>>   will break head of the large object because of "pageno" overflow.
>>>
>>> Added a such check.
>>>
>>>
>>>> * Please also add checks on inv_read() to prevent LargeObjectDesc->offset
>>>>   unexpectedly overflows 4TB boundary.
>>>
>>> Added a such check.
>>>
>>>
>>>> * At inv_truncate(), variable "off" is re-defined to int64. Is it really needed
>>>>   change? All its usage is to store the result of "len % LOBLKSIZE".
>>>
>>> Fixed and back to int32.
>>>
>>>
>>>> Thanks,
>>>>
>>>> 2012/9/24 Nozomi Anzai <anzai@sraoss.co.jp>:
>>>> > Here is 64-bit API for large object version 2 patch.
>>>> >
>>>> >> I checked this patch. It can be applied onto the latest master branch
>>>> >> without any problems. My comments are below.
>>>> >>
>>>> >> 2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:
>>>> >> > Ok, here is the patch to implement 64-bit API for large object, to
>>>> >> > allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
>>>> >> > 32KB). The patch is based on Jeremy Drake's patch posted on September
>>>> >> > 23, 2005
>>>> >> > (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
>>>> >> > and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
>>>> >> > for the backend part and Yugo Nagata for the rest(including
>>>> >> > documentation patch).
>>>> >> >
>>>> >> > Here are changes made in the patch:
>>>> >> >
>>>> >> > 1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)
>>>> >> >
>>>> >> > lo_initialize() gathers backend 64-bit large object handling
>>>> >> > function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.
>>>> >> >
>>>> >> > If client calls lo_*64 functions and backend does not support them,
>>>> >> > lo_*64 functions return error to caller. There might be an argument
>>>> >> > since calls to lo_*64 functions can automatically be redirected to
>>>> >> > 32-bit older API. I don't know this is worth the trouble though.
>>>> >> >
>>>> >> I think it should definitely return an error code when user tries to
>>>> >> use lo_*64 functions towards the backend v9.2 or older, because
>>>> >> fallback to 32bit API can raise unexpected errors if application
>>>> >> intends to seek the area over than 2GB.
>>>> >>
>>>> >> > Currently lo_initialize() throws an error if one of oids are not
>>>> >> > available. I doubt we do the same way for 64-bit functions since this
>>>> >> > will make 9.3 libpq unable to access large objects stored in pre-9.2
>>>> >> > PostgreSQL servers.
>>>> >> >
>>>> >> It seems to me the situation to split the case of pre-9.2 and post-9.3
>>>> >> using a condition of "conn->sversion >= 90300".
>>>> >
>>>> > Fixed so, and tested it by deleteing the lo_tell64's row from pg_proc.
>>>> >
>>>> >
>>>> >> > To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
>>>> >> > is a pointer to 64-bit integer and actual data is placed somewhere
>>>> >> > else. There might be other way: add new member to union u to store
>>>> >> > 64-bit integer:
>>>> >> >
>>>> >> > typedef struct
>>>> >> > {
>>>> >> >         int                     len;
>>>> >> >         int                     isint;
>>>> >> >         union
>>>> >> >         {
>>>> >> >                 int                *ptr;                /* can't use void (dec compiler barfs)   */
>>>> >> >                 int                     integer;
>>>> >> >                 int64           bigint;         /* 64-bit integer */
>>>> >> >         }                       u;
>>>> >> > } PQArgBlock;
>>>> >> >
>>>> >> > I'm a little bit worried about this way because PQArgBlock is a public
>>>> >> > interface.
>>>> >> >
>>>> >> I'm inclined to add a new field for the union; that seems to me straight
>>>> >> forward approach.
>>>> >> For example, the manner in lo_seek64() seems to me confusable.
>>>> >> It set 1 on "isint" field even though pointer is delivered actually.
>>>> >>
>>>> >> +       argv[1].isint = 1;
>>>> >> +       argv[1].len = 8;
>>>> >> +       argv[1].u.ptr = (int *) &len;
>>>> >
>>>> > Your proposal was not adopted per discussion.
>>>> >
>>>> >
>>>> >> > Also we add new type "pg_int64":
>>>> >> >
>>>> >> > #ifndef NO_PG_INT64
>>>> >> > #define HAVE_PG_INT64 1
>>>> >> > typedef long long int pg_int64;
>>>> >> > #endif
>>>> >> >
>>>> >> > in postgres_ext.h per suggestion from Tom Lane:
>>>> >> > http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>>>> >> >
>>>> >> I'm uncertain about context of this discussion.
>>>> >>
>>>> >> Does it make matter if we include <stdint.h> and use int64_t instead
>>>> >> of the self defined data type?
>>>> >
>>>> > Your proposal was not adopted per discussion.
>>>> > Per discussion, endiannness translation was moved to fe-lobj.c.
>>>> >
>>>> >
>>>> >> > 2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)
>>>> >> >
>>>> >> > Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
>>>> >> > 64-bit seek position and data length. loread64 and lowrite64 are not
>>>> >> > added because if a program tries to read/write more than 2GB at once,
>>>> >> > it would be a sign that the program need to be re-designed anyway.
>>>> >> >
>>>> >> I think it is a reasonable.
>>>> >>
>>>> >> > 3) Backend inv_api.c functions(Nozomi Anzai)
>>>> >> >
>>>> >> > No need to add new functions. Just extend them to handle 64-bit data.
>>>> >> >
>>>> >> > BTW , what will happen if older 32-bit libpq accesses large objects
>>>> >> > over 2GB?
>>>> >> >
>>>> >> > lo_read and lo_write: they can read or write lobjs using 32-bit API as
>>>> >> > long as requested read/write data length is smaller than 2GB. So I
>>>> >> > think we can safely allow them to access over 2GB lobjs.
>>>> >> >
>>>> >> > lo_lseek: again as long as requested offset is smaller than 2GB, there
>>>> >> > would be no problem.
>>>> >> >
>>>> >> > lo_tell:if current seek position is beyond 2GB, returns an error.
>>>> >> >
>>>> >> Even though iteration of lo_lseek() may move the offset to 4TB, it also
>>>> >> makes unavailable to use lo_tell() to obtain the current offset, so I think
>>>> >> it is reasonable behavior.
>>>> >>
>>>> >> However, error code is not an appropriate one.
>>>> >>
>>>> >> +       if (INT_MAX < offset)
>>>> >> +       {
>>>> >> +               ereport(ERROR,
>>>> >> +                               (errcode(ERRCODE_UNDEFINED_OBJECT),
>>>> >> +                                errmsg("invalid large-object
>>>> >> descriptor: %d", fd)));
>>>> >> +               PG_RETURN_INT32(-1);
>>>> >> +       }
>>>> >>
>>>> >> According to the manpage of lseek(2)
>>>> >>     EOVERFLOW
>>>> >>         The resulting file offset cannot be represented in an off_t.
>>>> >>
>>>> >> Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
>>>> >
>>>> > Changed the error code and error message. We added a new error code
>>>> > "ERRCODE_UNDEFINED_OBJECT (22P07)".
>>>> >
>>>> >
>>>> >> > 4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)
>>>> >> >
>>>> >> > Comments and suggestions are welcome.
>>>> >> >
>>>> >> miscellaneous comments are below.
>>>> >>
>>>> >> Regression test is helpful. Even though no need to try to create 4TB large
>>>> >> object, it is helpful to write some chunks around the design boundary.
>>>> >> Could you add some test cases that writes some chunks around 4TB offset.
>>>> >
>>>> > Added 64-bit lobj test items into regression test and confirmed it worked
>>>> > rightly.
>>>> >
>>>> >
>>>> >> Thanks,
>>>> >> --
>>>> >> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>>> >> To make changes to your subscription:
>>>> >> http://www.postgresql.org/mailpref/pgsql-hackers
>>>> >
>>>> >
>>>> > --
>>>> > Nozomi Anzai
>>>> > SRA OSS, Inc. Japan
>>>> >
>>>> >
>>>> > --
>>>> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>>> > To make changes to your subscription:
>>>> > http://www.postgresql.org/mailpref/pgsql-hackers
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>>>
>>>>
>>>> --
>>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>>>
>>> --
>>> Nozomi Anzai
>>> SRA OSS, Inc. Japan
>>>
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>>
>>
>>
>> --
>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



pgsql-hackers by date:

Previous
From: Atri Sharma
Date:
Subject: Re: Regarding identifying a foreign scan
Next
From: Tatsuo Ishii
Date:
Subject: Re: 64-bit API for large object