Re: 64-bit API for large object - Mailing list pgsql-hackers
From: Tatsuo Ishii
Subject: Re: 64-bit API for large object
Msg-id: 20120930.113817.1476465722250402343.t-ishii@sraoss.co.jp
In response to: Re: 64-bit API for large object (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses: Re: 64-bit API for large object
List: pgsql-hackers
KaiGai-san,

Thank you for the review.

> I checked this patch. It looks good, but here are still some points to be
> discussed.
>
> * I have a question. What is the meaning of INT64_IS_BUSTED?
>   It seems to me a marker to indicate a platform without 64bit support.
>   However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
>   says as follows:
>   | Remove all the special-case code for INT64_IS_BUSTED, per decision that
>   | we're not going to support that anymore.

Agreed.

> * At inv_seek(), it seems to me it checks offset correctness in a wrong way,
>   as follows:
>   |	case SEEK_SET:
>   |		if (offset < 0)
>   |			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
>   |		obj_desc->offset = offset;
>   |		break;
>   That is a right assumption if the large object size were restricted to 2GB.
>   But the largest positive int64 is larger than the expected limit, so it
>   seems to me it should be compared with (INT_MAX * PAGE_SIZE) instead.

Point taken. However, checking offset < 0 still seems valid, because it is
possible to pass a negative offset to inv_seek(), no? Also, I think the upper
limit for the seek position should be defined as (INT_MAX * LOBLKSIZE) rather
than (INT_MAX * PAGE_SIZE). (INT_MAX * LOBLKSIZE) could be defined in
pg_largeobject.h as:

/*
 * Maximum byte length for each large object
 */
#define MAX_LARGE_OBJECT_SIZE	INT64CONST(INT_MAX * LOBLKSIZE)

Then the offset check in inv_seek() would be:

	case SEEK_SET:
		if (offset < 0 || offset >= MAX_LARGE_OBJECT_SIZE)
			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
		obj_desc->offset = offset;
		break;
	case SEEK_CUR:
		if ((offset + obj_desc->offset) < 0 ||
			(offset + obj_desc->offset) >= MAX_LARGE_OBJECT_SIZE)
			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
		obj_desc->offset += offset;
		break;
	case SEEK_END:
		{
			int64		pos = inv_getsize(obj_desc) + offset;

			if (pos < 0 || pos >= MAX_LARGE_OBJECT_SIZE)
				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
			obj_desc->offset = pos;
		}

What do you think?

> * At inv_write(), it definitely needs a check to prevent data writes above 4TB.
>   In case obj_desc->offset is a bit below 4TB, an additional 1GB write
>   will break the head of the large object because of "pageno" overflow.

Ok. I will add this check:

	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
		elog(ERROR, "invalid write request size: %d", nbytes);

> * Please also add checks on inv_read() to prevent LargeObjectDesc->offset
>   unexpectedly overflowing the 4TB boundary.

Ok. I will add this check:

	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
		elog(ERROR, "invalid read request size: %d", nbytes);

> * At inv_truncate(), variable "off" is re-defined to int64. Is that change
>   really needed? All its usage is to store the result of "len % LOBLKSIZE".

Your point is correct. Back to int32.
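For illustration, here is a minimal standalone sketch of the range check above.
This is only my illustration, not the inv_api.c code itself; it assumes
LOBLKSIZE is 2048 (BLCKSZ/4 with the default 8kB block size), and the limit
follows the MAX_LARGE_OBJECT_SIZE definition proposed above:

	/*
	 * Sketch only: pg_largeobject.pageno is a 32-bit int, so the highest
	 * addressable byte is (INT_MAX * LOBLKSIZE) - 1; anything at or beyond
	 * that limit must be rejected before it is turned into a page number.
	 */
	#include <stdio.h>
	#include <stdint.h>
	#include <limits.h>

	#define LOBLKSIZE 2048
	#define MAX_LARGE_OBJECT_SIZE ((int64_t) INT_MAX * LOBLKSIZE)	/* ~4TB */

	/* the kind of range check proposed for inv_seek()/inv_read()/inv_write() */
	static int
	lo_offset_ok(int64_t offset)
	{
		return offset >= 0 && offset < MAX_LARGE_OBJECT_SIZE;
	}

	int
	main(void)
	{
		int64_t probes[] = {
			0,							/* start of the object: fine  */
			MAX_LARGE_OBJECT_SIZE - 1,	/* last valid byte: fine      */
			MAX_LARGE_OBJECT_SIZE,		/* first invalid byte: reject */
			-1							/* negative seek: reject      */
		};

		for (int i = 0; i < 4; i++)
			printf("offset %lld -> %s (pageno would be %lld)\n",
				   (long long) probes[i],
				   lo_offset_ok(probes[i]) ? "ok" : "rejected",
				   (long long) (probes[i] / LOBLKSIZE));
		return 0;
	}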
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Thanks,
>
> 2012/9/24 Nozomi Anzai <anzai@sraoss.co.jp>:
>> Here is the 64-bit API for large object version 2 patch.
>>
>>> I checked this patch. It can be applied onto the latest master branch
>>> without any problems. My comments are below.
>>>
>>> 2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:
>>> > Ok, here is the patch to implement a 64-bit API for large objects, to
>>> > allow large objects of up to 4TB (or 16TB if BLCKSZ is changed to 32KB).
>>> > The patch is based on Jeremy Drake's patch posted on September 23, 2005
>>> > (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
>>> > and reasonably updated/edited to adapt to PostgreSQL 9.3 by Nozomi Anzai
>>> > for the backend part and Yugo Nagata for the rest (including the
>>> > documentation patch).
>>> >
>>> > Here are the changes made in the patch:
>>> >
>>> > 1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)
>>> >
>>> > lo_initialize() gathers the OIDs of the backend's 64-bit large object
>>> > handling functions, namely lo_lseek64, lo_tell64, and lo_truncate64.
>>> >
>>> > If a client calls the lo_*64 functions and the backend does not support
>>> > them, the lo_*64 functions return an error to the caller. There might be
>>> > an argument that calls to lo_*64 functions could automatically be
>>> > redirected to the older 32-bit API. I don't know whether this is worth
>>> > the trouble though.
>>> >
>>> I think it should definitely return an error code when the user tries to
>>> use lo_*64 functions against a backend of v9.2 or older, because a
>>> fallback to the 32-bit API can raise unexpected errors if the application
>>> intends to seek to an area beyond 2GB.
>>>
>>> > Currently lo_initialize() throws an error if one of the OIDs is not
>>> > available. I doubt we should do the same for the 64-bit functions, since
>>> > this would make the 9.3 libpq unable to access large objects stored in
>>> > pre-9.2 PostgreSQL servers.
>>> >
>>> It seems to me the way to split the pre-9.2 and post-9.3 cases is to use
>>> a condition of "conn->sversion >= 90300".
>>
>> Fixed so, and tested it by deleting the lo_tell64 row from pg_proc.
>>
>>
>>> > To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
>>> > is a pointer to the 64-bit integer and the actual data is placed somewhere
>>> > else. There might be another way: add a new member to union u to store
>>> > a 64-bit integer:
>>> >
>>> > typedef struct
>>> > {
>>> > 	int			len;
>>> > 	int			isint;
>>> > 	union
>>> > 	{
>>> > 		int		   *ptr;	/* can't use void (dec compiler barfs) */
>>> > 		int			integer;
>>> > 		int64		bigint;		/* 64-bit integer */
>>> > 	}			u;
>>> > } PQArgBlock;
>>> >
>>> > I'm a little bit worried about this way because PQArgBlock is a public
>>> > interface.
>>> >
>>> I'm inclined to add a new field to the union; that seems to me the more
>>> straightforward approach.
>>> For example, the manner in lo_seek64() seems confusing to me:
>>> it sets 1 on the "isint" field even though a pointer is actually delivered.
>>>
>>> +       argv[1].isint = 1;
>>> +       argv[1].len = 8;
>>> +       argv[1].u.ptr = (int *) &len;
>>
>> Your proposal was not adopted per discussion.
>>
>>
>>> > Also we add a new type "pg_int64":
>>> >
>>> > #ifndef NO_PG_INT64
>>> > #define HAVE_PG_INT64 1
>>> > typedef long long int pg_int64;
>>> > #endif
>>> >
>>> > in postgres_ext.h per suggestion from Tom Lane:
>>> > http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>>> >
>>> I'm uncertain about the context of this discussion.
>>>
>>> Does it matter if we include <stdint.h> and use int64_t instead
>>> of the self-defined data type?
>>
>> Your proposal was not adopted per discussion.
>> Per discussion, endianness translation was moved to fe-lobj.c.
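As an aside, here is a minimal sketch of what that client-side endianness
translation amounts to. This is only an illustration, not the fe-lobj.c code
from the patch, and the helper name hton64 is invented for the example; the
typedef stands in for the proposed pg_int64:

	#include <stdint.h>
	#include <string.h>
	#include <arpa/inet.h>		/* htonl() */

	typedef int64_t pg_int64;	/* stand-in for the proposed libpq typedef */

	/* convert a 64-bit value to network (big-endian) byte order */
	static pg_int64
	hton64(pg_int64 host64)
	{
		uint32_t	hi = htonl((uint32_t) (host64 >> 32));
		uint32_t	lo = htonl((uint32_t) host64);
		pg_int64	net64;

		/* high word first, then low word */
		memcpy((char *) &net64, &hi, 4);
		memcpy((char *) &net64 + 4, &lo, 4);
		return net64;
	}

The converted value is what u.ptr would point at in the PQArgBlock, along the
lines of the lo_seek64() snippet quoted above.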
>>> > 2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)
>>> >
>>> > Add lo_lseek64, lo_tell64, and lo_truncate64 so that they can handle
>>> > 64-bit seek positions and data lengths. loread64 and lowrite64 are not
>>> > added because if a program tries to read/write more than 2GB at once,
>>> > it would be a sign that the program needs to be re-designed anyway.
>>> >
>>> I think that is reasonable.
>>>
>>> > 3) Backend inv_api.c functions (Nozomi Anzai)
>>> >
>>> > No need to add new functions. Just extend them to handle 64-bit data.
>>> >
>>> > BTW, what will happen if an older 32-bit libpq accesses large objects
>>> > over 2GB?
>>> >
>>> > lo_read and lo_write: they can read or write lobjs using the 32-bit API
>>> > as long as the requested read/write data length is smaller than 2GB. So
>>> > I think we can safely allow them to access lobjs over 2GB.
>>> >
>>> > lo_lseek: again, as long as the requested offset is smaller than 2GB,
>>> > there would be no problem.
>>> >
>>> > lo_tell: if the current seek position is beyond 2GB, it returns an error.
>>> >
>>> Even though iteration of lo_lseek() may move the offset to 4TB, it also
>>> makes it impossible to use lo_tell() to obtain the current offset, so I
>>> think that is reasonable behavior.
>>>
>>> However, the error code is not an appropriate one.
>>>
>>> +       if (INT_MAX < offset)
>>> +       {
>>> +               ereport(ERROR,
>>> +                               (errcode(ERRCODE_UNDEFINED_OBJECT),
>>> +                                errmsg("invalid large-object descriptor: %d", fd)));
>>> +               PG_RETURN_INT32(-1);
>>> +       }
>>>
>>> According to the manpage of lseek(2):
>>>        EOVERFLOW
>>>               The resulting file offset cannot be represented in an off_t.
>>>
>>> Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
>>
>> Changed the error code and error message. We added a new error code
>> "ERRCODE_UNDEFINED_OBJECT (22P07)".
>>
>>
>>> > 4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata)
>>> >
>>> > Comments and suggestions are welcome.
>>> >
>>> Miscellaneous comments are below.
>>>
>>> A regression test would be helpful. Even though there is no need to try to
>>> create a 4TB large object, it is helpful to write some chunks around the
>>> design boundary. Could you add some test cases that write some chunks
>>> around the 4TB offset?
>>
>> Added 64-bit lobj test items to the regression test and confirmed it works
>> correctly.
>>
>>
>>> Thanks,
>>> --
>>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>
>>
>> --
>> Nozomi Anzai
>> SRA OSS, Inc. Japan
>
>
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>
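To make the idea of writing "some chunks around the design boundary" concrete,
here is a rough sketch in the spirit of testlo64.c, not the actual test file:
it seeks to just under the 4TB limit with the proposed 64-bit client API,
writes a small chunk, and reads it back. Connection setup and error handling
are trimmed, and the default LOBLKSIZE of 2048 is assumed for the limit:

	#include <stdio.h>
	#include <string.h>
	#include "libpq-fe.h"
	#include "libpq/libpq-fs.h"		/* INV_READ, INV_WRITE */

	static void
	boundary_probe(PGconn *conn)
	{
		/* a few pages under INT_MAX * 2048 (~4TB with default LOBLKSIZE) */
		pg_int64	target = ((pg_int64) 2147483647) * 2048 - 8192;
		char		out[] = "chunk near the 4TB boundary";
		char		in[sizeof(out)];
		Oid			loid;
		int			fd;

		PQclear(PQexec(conn, "BEGIN"));	/* LO calls need a transaction */

		loid = lo_creat(conn, INV_READ | INV_WRITE);
		fd = lo_open(conn, loid, INV_READ | INV_WRITE);

		if (lo_lseek64(conn, fd, target, SEEK_SET) < 0)
			fprintf(stderr, "lo_lseek64 failed: %s", PQerrorMessage(conn));

		lo_write(conn, fd, out, sizeof(out));
		printf("offset after write: %lld\n", (long long) lo_tell64(conn, fd));

		lo_lseek64(conn, fd, target, SEEK_SET);	/* back to the chunk */
		lo_read(conn, fd, in, sizeof(in));
		printf("read back: %s\n", in);

		lo_close(conn, fd);
		PQclear(PQexec(conn, "COMMIT"));
	}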