Chuck McDevitt wrote:
> What if the block of text is split in the middle of a multibyte character?
> I don't think it is safe to assume raw blocks always end on a character boundary.
Yeah, it's not. I realized that myself after submitting. The generic approach
is to loop with pg_mblen() to find the maximum safe length. For UTF-8,
and probably many other multi-byte encodings as well, we can detect
whether a byte is the first byte of a multi-byte character just by
looking at the high bits of the byte.
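
To make the two ideas concrete, here is a rough standalone sketch, not actual
backend code. All names in it (mblen_fn, utf8_mblen, safe_length_generic,
utf8_safe_length) are made up for illustration; mblen_fn only stands in for
pg_mblen(), and the sketch assumes the block starts on a character boundary
and holds valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_mblen(): length of the character starting at s. */
typedef size_t (*mblen_fn) (const unsigned char *s);

/* UTF-8: the high bits of the first byte give the sequence length. */
static size_t
utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((*s & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((*s & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((*s & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;                   /* invalid lead byte: treat as one byte */
}

/* Generic approach: walk forward one character at a time. */
static size_t
safe_length_generic(const unsigned char *buf, size_t len, mblen_fn mblen_cb)
{
    size_t      pos = 0;

    assert(buf != NULL || len == 0);
    while (pos < len)
    {
        size_t      l = mblen_cb(buf + pos);

        if (pos + l > len)
            break;              /* last character is cut off */
        pos += l;
    }
    return pos;
}

/* UTF-8 shortcut: look only at the last few bytes of the block. */
static size_t
utf8_safe_length(const unsigned char *buf, size_t len)
{
    size_t      pos = len;
    size_t      back = 0;

    assert(buf != NULL || len == 0);
    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (pos > 0 && back < 3 && (buf[pos - 1] & 0xC0) == 0x80)
    {
        pos--;
        back++;
    }
    if (pos == 0)
        return len;             /* no lead byte in sight; leave as is */
    pos--;                      /* pos now indexes the last lead byte */
    if (pos + utf8_mblen(buf + pos) > len)
        return pos;             /* incomplete character: cut before it */
    return len;
}
```

The forward loop costs O(n) per block but works for any encoding whose
character length can be read from the leading byte(s); the UTF-8 version
inspects at most four bytes at the end of the block.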
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com