Re: speed up text_position() for utf-8 - Mailing list pgsql-hackers

From John Naylor
Subject Re: speed up text_position() for utf-8
Date
Msg-id CAFBsxsFUoxgYQ22xjFTVtq7UAoZbHFTigEeQn3bV=2PCyqgSpw@mail.gmail.com
In response to Re: speed up text_position() for utf-8  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: speed up text_position() for utf-8
List pgsql-hackers
Attached is a short patch series exploring some ideas for inlining
pg_utf_mblen().

0001 puts the main implementation of pg_utf_mblen() into an inline
function and uses this in pg_mblen(). This is somewhat faster in the
strpos tests, so that gives some measure of the speedup expected for
other callers. Text search seems to call this a lot, so this might
have noticeable benefit.
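
To illustrate the shape of the 0001 idea, here is a minimal sketch, not the patch itself. The names utf8_mblen_inline() and utf8_mblen() are hypothetical stand-ins: the first is a static inline function that determines the sequence length from the lead byte, and the second is an out-of-line wrapper that existing callers could keep using. Hot paths could then call the inline version directly and skip the function-call overhead.

```c
/*
 * Hypothetical inline helper (illustrative name, not from the patch):
 * byte length of the UTF-8 sequence starting at s, decided from the
 * lead byte alone.
 */
static inline int
utf8_mblen_inline(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;				/* 0xxxxxxx: ASCII */
	else if ((*s & 0xe0) == 0xc0)
		return 2;				/* 110xxxxx: two-byte sequence */
	else if ((*s & 0xf0) == 0xe0)
		return 3;				/* 1110xxxx: three-byte sequence */
	else if ((*s & 0xf8) == 0xf0)
		return 4;				/* 11110xxx: four-byte sequence */
	else
		return 1;				/* invalid lead byte: step over one byte */
}

/*
 * Hypothetical out-of-line wrapper: keeps a normal function symbol for
 * callers that cannot or need not inline.
 */
int
utf8_mblen(const unsigned char *s)
{
	return utf8_mblen_inline(s);
}
```

Whether the compiler actually inlines the helper into a given caller depends on optimization settings, so any speedup would need to be measured rather than assumed.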

0002 refactors text_position_get_match_pos() to use
pg_mbstrlen_with_len(). This itself is significantly faster when
combined with 0001, likely because the latter can inline the call to
pg_mblen(). The intention is to speed up more than just text_position.
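
As a rough sketch of the 0002 approach (again, not the patch itself): a substring match found at some byte offset can be turned into a character position by counting the characters in the bytes that precede it. The names utf8_strlen_with_len() and match_char_pos() are hypothetical, and the code assumes valid UTF-8 input with the limit not exceeding the buffer length.

```c
/* Hypothetical inline length helper, as it might look after inlining. */
static inline int
utf8_mblen_inline(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;
	else if ((*s & 0xe0) == 0xc0)
		return 2;
	else if ((*s & 0xf0) == 0xe0)
		return 3;
	else if ((*s & 0xf8) == 0xf0)
		return 4;
	else
		return 1;
}

/*
 * Count the characters in the first `limit` bytes of mbstr, stopping
 * early at a NUL byte. Hypothetical stand-in for a
 * pg_mbstrlen_with_len()-style routine.
 */
int
utf8_strlen_with_len(const char *mbstr, int limit)
{
	int			len = 0;

	while (limit > 0 && *mbstr)
	{
		int			l = utf8_mblen_inline((const unsigned char *) mbstr);

		limit -= l;
		mbstr += l;
		len++;
	}
	return len;
}

/*
 * 1-based character position of a match, given a pointer to its first
 * byte within haystack.
 */
int
match_char_pos(const char *haystack, const char *match)
{
	return utf8_strlen_with_len(haystack, (int) (match - haystack)) + 1;
}
```

Because the length helper is visible in the same translation unit, the counting loop has a chance of being compiled without a per-character function call, which is the effect described above.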

0003 explicitly specializes pg_mbstrlen_with_len() for the inline version
of pg_utf_mblen(), but it turns out to be almost as slow as master for
ascii. It doesn't help if I undo the previous change in pg_mblen(), and
I haven't investigated why yet.

0002 looks good now, but the experience with 0003 makes me hesitant to
propose this seriously until I can figure out what's going on there.

The test is the same as earlier: a worst-case substring search, with times
in milliseconds.

 patch  | no match | ascii | multibyte
--------+----------+-------+-----------
 PG11   |     1220 |  1220 |      1150
 master |      385 |  2420 |      1980
 0001   |      390 |  2180 |      1670
 0002   |      389 |  1330 |      1100
 0003   |      391 |  2100 |      1360


-- 
John Naylor
EDB: http://www.enterprisedb.com
