Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
Date	August 27, 2014 05:13:44
Msg-id	1409116426.2335.455.camel@jeff-desktop
In response to	Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
List	pgsql-hackers

On Mon, 2014-08-25 at 17:41 +0300, Heikki Linnakangas wrote:
> Actually, that gets optimized to a constant in the planner:

Oops, thank you (and Tom).

> your patch seems to be about 2x-3x as slow as unpatched master. So this
> needs some optimization. A couple of ideas:

I didn't see anywhere near that kind of regression. On unpatched master,
with your test case, I saw it stabilize to about 680ms. With
similar-escape-1, I saw about 775ms (15% regression). Are those at all
close to your numbers? Is there a chance you used an unoptimized build
for one of them, or left asserts enabled?

> 1. If the escape string is in fact a single-byte character, you can
> proceed with the loop just as it is today, without the pg_mblen calls.
>
> 2. Since pg_mblen() will always return an integer between 1-6, it would
> probably be faster to replace the memcpy() and memcmp() calls with
> simple for-loops iterating byte-by-byte.
>
> In very brief testing, with the 1. change above, the performance with
> this patch is back to what it's without the patch. See attached.
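A minimal sketch of idea 2 above, for illustration only: because a character spans only a handful of bytes, a plain loop can stand in for memcmp() and memcpy(). The helper names mbchar_equal and mbchar_copy are hypothetical and do not come from either patch in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: compare two characters of len bytes each, one byte
 * at a time, instead of calling memcmp().
 */
static bool
mbchar_equal(const char *a, const char *b, int len)
{
    int         i;

    for (i = 0; i < len; i++)
    {
        if (a[i] != b[i])
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: copy one character of len bytes, one byte at a
 * time, instead of calling memcpy().
 */
static void
mbchar_copy(char *dst, const char *src, int len)
{
    int         i;

    for (i = 0; i < len; i++)
        dst[i] = src[i];
}
```

Whether this is actually faster than the library calls depends on the compiler and platform, so it would need to be measured.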

The particular patch has a mistake: the first branch is always taken
because pg_mblen() won't return 0. It's also fairly ugly to set mblen in
the test for the branch that doesn't use it.

Attached is a patch implementing the same idea, though: only use the
multibyte path if *both* the escape char and the current character from
the pattern are multibyte.
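To illustrate the two-path idea described above, here is a standalone sketch. It is not the code from similar-escape-3.patch. It assumes UTF-8, uses a hypothetical sketch_mblen() in place of pg_mblen(), and only counts escape occurrences (count_escapes is also a made-up name) rather than translating a pattern.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8 character
 * starting at s, judged from the lead byte only.  Never returns 0.
 */
static int
sketch_mblen(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: step over one byte */
}

/*
 * Count occurrences of the escape character esc (elen bytes) in pat
 * (plen bytes).
 */
static int
count_escapes(const char *pat, size_t plen, const char *esc, size_t elen)
{
    int         count = 0;
    size_t      i = 0;

    while (i < plen)
    {
        if (elen > 1 && ((unsigned char) pat[i] & 0x80))
        {
            /* Slow path: escape and current character are both multibyte. */
            size_t      mblen = (size_t) sketch_mblen(pat + i);

            if (mblen > plen - i)
                mblen = plen - i;   /* truncated trailing character */
            if (mblen == elen && memcmp(pat + i, esc, elen) == 0)
                count++;
            i += mblen;
        }
        else
        {
            /*
             * Fast path: advance one byte.  With a single-byte escape this
             * is safe under the UTF-8 assumption, because every byte of a
             * multibyte character has its high bit set and so cannot equal
             * an ASCII escape byte.
             */
            if (elen == 1 && pat[i] == esc[0])
                count++;
            i++;
        }
    }
    return count;
}
```

With a single-byte escape the loop never calls sketch_mblen() at all, which is the case the quoted benchmark exercises.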

I also changed the comment to more clearly state the behavior upon which
we're relying. I hope what I said is accurate.

Regards,
    Jeff Davis

Attachment

similar-escape-3.patch
