Re: Efficient slicing/substring of TOAST values (for - Mailing list pgsql-patches

From John Gray
Subject Re: Efficient slicing/substring of TOAST values (for
Date
Msg-id 1002669495.23073.637.camel@adzuki
Whole thread Raw
In response to Re: Efficient slicing/substring of TOAST values (for comment)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Efficient slicing/substring of TOAST values (for comment)
List pgsql-patches
On Mon, 2001-10-08 at 16:06, Tom Lane wrote:
> John Gray <jgray@azuli.co.uk> writes:
> > 2. Changed e.g. textsubstr and byteasubstr to use this method.
> > textsubstr is complicated by the multibyte support -the fast method is
> > only applicable in a non-multibyte environment.
>
> More accurately, it's only applicable in single-byte character sets,
> which is a useful fraction of the MULTIBYTE world too.  Keep in mind
> that MULTIBYTE is likely to default to ON in the near future, so there's
> little point in such an exercise if you are going to ifdef it out.
> I believe there's a routine somewhere that will tell you whether the
> current database encoding is single-byte or multi-byte.
>
I've found pg_database_encoding_max_length() which seems to be the right
thing.

There may be some prospect of performance improvement even for multibyte
cases. For a substr(attr,start,len) call, I could fetch the segment
starting from the start of the attribute through to
pg_database_encoding_max_length()*(start+len). If this is less than the
length of attr then we can save time and memory. The existing multibyte
substring operation is then applied to this returned string.

> > 3. Added any facility to force a column to have attstorage 'e'. At
> > present it appears to be defaulted from typstorage, but I couldn't see
> > any problem with changing it after table creation. Would a keyword to
> > CREATE TABLE to override the default attstorage be useful?
>
> I've been envisioning adding an ALTER TABLE variant to do this, rather
> than mucking up CREATE TABLE --- compare the handling of attstattarget.
>
I've had a look, and it seems that AlterTableAlterColumnStatistics
contains a lot of template code (and only a small amount actually
dealing with stats). If we had different Node types for the different
Column flag-changing operations (especially those which don't touch
tuple data at all) then the current AlterTableAlterColumnStatistics
might become AlterTableAlterColumnFlags (?) and take several kinds of
nodes. Thus all the inheritance-supporting code wouldn't need to be
duplicated. Alternatively, cloning the routine wouldn't be too much
work.

> No, this has been a to-do item for awhile.  I think it's a tad too late
> to think in terms of squeezing it into 7.2, but keep working on it for
> 7.3.
>
Shall do. I'll develop something for substr operations in line with
above.

> Have you got any ideas about the update side of the problem?
>

Well, if a user is only updating a small part of a toasted value, then
only the corresponding chunks need be updated -assuming that the update
is Procrustean i.e. that the substitute string is coerced to the same
length as the segment it replaces, by truncation or blank padding. What
I'm not sure about is the syntax for specifying such an update. Would we
just overload the syntax for an array?

e.g. UPDATE objects SET obj_data[32:47]='0123456789abcdef';

I can see the other problem being that an updated toast value currently
gets a new valueid. Maybe I'll need to work through this more, but I
can't see why that need be the case -i.e. the toast update could just
perform a normal update on each chunk rather than creating new ones and
deleting old ones?

Regards

John




pgsql-patches by date:

Previous
From: Patrice Hédé
Date:
Subject: Re: [HACKERS] Unicode combining characters
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] Unicode combining characters