Efficient slicing/substring of TOAST values (for comment) - Mailing list pgsql-patches

From John Gray
Subject Efficient slicing/substring of TOAST values (for comment)
Msg-id 1002549159.23074.40.camel@adzuki
List pgsql-patches
Hi all,

I attach a patch which adds access routines for efficient extraction of
parts of TOAST values.

The principal additions are two routines in tuptoaster.c,
heap_tuple_untoast_attr_slice and toast_fetch_datum_slice. The latter
uses extra index scankeys to retrieve only the TOAST chunks which
contain the requested substring. This will provide a performance benefit
if you repeatedly extract small portions (e.g. file headers) from
TOASTed values, as only one or two chunks will need to be fetched. This
function is only invoked for external, uncompressed storage.
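
To illustrate why only one or two chunks need fetching, here is a minimal,
self-contained sketch of the chunk-range arithmetic. It is not the code from
the patch: SKETCH_CHUNK_SIZE, ChunkRange and slice_chunk_range are hypothetical
names, and the chunk size is an assumed value chosen for the example. The
first and last chunk numbers are the kind of bounds that extra index scankeys
could restrict the scan to.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed chunk size, for illustration only (not taken from the patch). */
#define SKETCH_CHUNK_SIZE 2000

/* Hypothetical result type: which chunks cover the requested slice. */
typedef struct ChunkRange
{
    int32_t first_chunk;   /* sequence number of the first chunk needed */
    int32_t last_chunk;    /* sequence number of the last chunk needed */
    int32_t first_offset;  /* where the slice starts inside first_chunk */
    int32_t num_chunks;    /* how many chunks have to be fetched */
} ChunkRange;

/*
 * Map a zero-based (offset, length) slice of a value of total_size bytes
 * onto the range of chunks that contain it. The length is clamped so the
 * slice never runs past the end of the value.
 */
static ChunkRange
slice_chunk_range(int32_t total_size, int32_t offset, int32_t length)
{
    ChunkRange r;

    assert(total_size > 0);
    assert(offset >= 0 && offset < total_size);
    assert(length > 0);

    if (length > total_size - offset)
        length = total_size - offset;

    r.first_chunk = offset / SKETCH_CHUNK_SIZE;
    r.last_chunk = (offset + length - 1) / SKETCH_CHUNK_SIZE;
    r.first_offset = offset % SKETCH_CHUNK_SIZE;
    r.num_chunks = r.last_chunk - r.first_chunk + 1;
    return r;
}
```

With these assumptions, a 16-byte header at offset 0 of a 10000-byte value
touches a single chunk, and a slice that straddles a chunk boundary touches
two.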

The public access routine (heap_tuple_untoast_attr_slice) does take care
of slicing values that are stored compressed or inline, but doesn't
provide any performance benefit in those cases.
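
For contrast, the fallback for compressed or inline values can be pictured as
a plain copy out of the fully materialized value. This is a sketch under that
assumption, not the patch's code, and slice_from_full_value is a hypothetical
name. The whole value has to be in memory before the slice is taken, which is
why there is no saving in these cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy up to `length` bytes starting at zero-based `offset` from a value
 * that has already been fully fetched (and decompressed, if necessary).
 * Returns the number of bytes actually copied into `out`, which the
 * caller must have sized to hold at least `length` bytes.
 */
static size_t
slice_from_full_value(const char *value, size_t value_size,
                      size_t offset, size_t length, char *out)
{
    assert(value != NULL && out != NULL);

    if (offset >= value_size)
        return 0;               /* slice starts past the end: nothing to copy */
    if (length > value_size - offset)
        length = value_size - offset;   /* clamp to the end of the value */

    memcpy(out, value + offset, length);
    return length;
}
```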

The access macros are in the same vein as the existing ones:
PG_GETARG_TEXT_P_SLICE(n,start,length)
for example.

What I haven't done:

1. Documentation. If this patch is appropriate or acceptable, I'll add
documentation.

2. Changed e.g. textsubstr and byteasubstr to use this method.
textsubstr is complicated by the multibyte support: the fast method is
only applicable in a non-multibyte environment. Also, the SQL negative
offset rule is not embodied in what I've added, and the subscripts are
zero-based. This was on the assumption that if the data was binary (e.g.
JPEG/JFIF data) and the user's intent was to extract the header, it
would be clearer to use zero-based offsets.

3. Added any facility to force a column to have attstorage 'e'. At
present it appears to be defaulted from typstorage, but I couldn't see
any problem with changing it after table creation. Would a keyword to
CREATE TABLE to override the default attstorage be useful? It could help
if the user knew that the data for a column would not be very
compressible (there would be a performance gain in not trying to
compress it, and just storing it externally uncompressed).
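
On item 2, the gap between SQL substring arguments and the zero-based slices
described above could be bridged by a small conversion step. The following is
a sketch only, assuming the usual SQL behaviour that positions are one-based
and that a start position below 1 consumes part of the requested length.
ZeroBasedSlice and sql_substring_to_slice are hypothetical names, and the
conversion only makes sense where one character is one byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a zero-based byte offset and a length. */
typedef struct ZeroBasedSlice
{
    int32_t offset;
    int32_t length;
} ZeroBasedSlice;

/*
 * Convert SQL-style (one-based start, length) arguments into a zero-based
 * slice. If the start lies before position 1, the positions before the
 * string are dropped and the length shrinks accordingly.
 */
static ZeroBasedSlice
sql_substring_to_slice(int32_t sql_start, int32_t sql_length)
{
    ZeroBasedSlice s = {0, 0};
    int64_t end;

    assert(sql_length >= 0);

    /* One past the last requested position, still one-based. */
    end = (int64_t) sql_start + sql_length;
    if (end <= 1)
        return s;               /* the request lies entirely before the string */

    if (sql_start < 1)
        sql_start = 1;          /* negative-offset rule: begin at position 1 */

    s.offset = sql_start - 1;
    s.length = (int32_t) (end - sql_start);
    return s;
}
```

Under these assumptions, a request starting at position -1 with length 4
becomes offset 0 with length 2, and a request starting at position 1 with
length 4 becomes offset 0 with length 4.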

Of course, this may just all be useless feature bloat or not up to
scratch coding-wise (and please say so if it is) but please let me know
if it's worth me documenting this or adding any more to it.

(diffs against versions current in CVS as of twenty minutes ago or so)

Regards

John
