Thread: Getting the length of varlength data using PG_DETOAST_DATUM_SLICE or similar?
From: Mark Dilger
Hello, could anyone tell me, for a user-contributed variable-length data type, how can you access the length of the data without pulling the entire thing from disk? Is there a function or macro for this?

As a first cut, I tried using the PG_DETOAST_DATUM_SLICE macro, but to no avail. Grepping through the release source for version 8.1.2, I find very little usage of the PG_GETARG_*_SLICE and PG_DETOAST_DATUM_SLICE macros (and hence little clue how they are intended to be used). The only files where I find them referenced are:

    doc/src/sgml/xfunc.sgml
    src/backend/utils/adt/varlena.c
    src/include/fmgr.h

I am writing a variable-length data type and trying to optimize the disk usage in certain functions. There are cases where the return value of the function can be determined from the length of the data and a prefix of the data without fetching the whole data from disk. (The prefix alone is insufficient -- I need to also know the length for the optimization to work.)

The first field of the data type is the length, as follows:

    typedef struct datatype_foo {
        int32 length;
        char data[];
    } datatype_foo;

But when I fetch the function arguments using

    datatype_foo *a = (datatype_foo *)
        PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(0), 0, BLCKSZ);

the length field is set to the length of the fetched slice, not the length of the data as it exists on disk. Is there some other function that gets the length without pulling more than the first block?

Thanks for any insight,

--Mark
Have you looked at the 8.1.X built-in function pg_column_size()?

-- Bruce Momjian
It looks like pg_column_size gives you the actual size on disk, i.e. after compression.

What looks interesting for you would be byteaoctetlen or the function it wraps, toast_raw_datum_size. See src/backend/access/heap/tuptoaster.c. pg_column_size calls toast_datum_size, while byteaoctetlen/textoctetlen call toast_raw_datum_size.

On Sat, 11 Feb 2006, Bruce Momjian wrote:
> Have you looked at the 8.1.X built-in function pg_column_size()?
Bruce Momjian wrote:
> Have you looked at the 8.1.X built-in function pg_column_size()?

Thanks Bruce for the lead. I didn't know what to grep for; this helps. The header comment for that function says "Return the size of a datum, possibly compressed". I take it the uncompressed length is not available -- that this is as close as I'm going to get. I haven't traced through the function yet; maybe it does what I need. I'll look at this some more now that I have a starting point.

Thanks again!

mark
Jeremy Drake wrote:
> It looks like pg_column_size gives you the actual size on disk, i.e. after
> compression.
>
> What looks interesting for you would be byteaoctetlen or the function it
> wraps, toast_raw_datum_size. See src/backend/access/heap/tuptoaster.c.
> pg_column_size calls toast_datum_size, while byteaoctetlen/textoctetlen
> call toast_raw_datum_size.

Ok, for anyone following the thread, this code works for me:

    int true_size_arg_zero = toast_raw_datum_size(PG_GETARG_DATUM(0));
    int true_size_arg_one = toast_raw_datum_size(PG_GETARG_DATUM(1));

Be sure to #include "access/tuptoaster.h".

Thanks Jeremy!
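
The pattern discussed in this thread (take the full length from metadata, then fetch only a prefix) can be illustrated with a small standalone model. This is a sketch only and does not use the PostgreSQL API: stored_value, slice, toy_raw_size, toy_fetch_slice and toy_long_with_prefix are all hypothetical names. toy_raw_size loosely plays the role of toast_raw_datum_size and toy_fetch_slice the role of PG_DETOAST_DATUM_SLICE; the bytes_read counter exists only to show how much payload was touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a value stored out of line (not a PostgreSQL type). */
typedef struct stored_value {
    size_t      raw_size;   /* full payload size, kept as metadata */
    const char *payload;    /* pretend this lives on disk */
    size_t      bytes_read; /* payload bytes "fetched" so far */
} stored_value;

/* Same shape as datatype_foo in the thread: a length followed by data. */
typedef struct slice {
    int32_t length;         /* length of this slice only, not of the whole value */
    char    data[];
} slice;

/* Reads metadata only; no payload bytes are touched. */
static size_t toy_raw_size(const stored_value *v)
{
    return v->raw_size;
}

/* Copies at most count bytes starting at offset; the header describes the copy. */
static slice *toy_fetch_slice(stored_value *v, size_t offset, size_t count)
{
    assert(offset <= v->raw_size);
    size_t avail = v->raw_size - offset;
    size_t n = count < avail ? count : avail;
    slice *s = malloc(sizeof(slice) + n);
    if (s == NULL)
        return NULL;
    s->length = (int32_t) n;
    memcpy(s->data, v->payload + offset, n);
    v->bytes_read += n;
    return s;
}

/* Hypothetical predicate: value is at least min_len bytes and starts with prefix. */
static int toy_long_with_prefix(stored_value *v, size_t min_len, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t total = toy_raw_size(v);

    /* Decided from the length alone: nothing is fetched. */
    if (total < min_len || total < plen)
        return 0;

    /* Otherwise fetch just enough bytes to compare the prefix. */
    slice *s = toy_fetch_slice(v, 0, plen);
    if (s == NULL)
        return 0;
    int match = (size_t) s->length == plen && memcmp(s->data, prefix, plen) == 0;
    free(s);
    return match;
}
```

In this model a slice's length field reports only the bytes copied, which mirrors the behaviour described in the original question, while the full size has to come from a separate metadata lookup.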