Re: speed up verifying UTF-8 - Mailing list pgsql-hackers

From John Naylor
Subject Re: speed up verifying UTF-8
Msg-id CAFBsxsEdUk96E1QLK1AEd8LudSd6Wo8k+w6_+KYYMgwJKAVy0g@mail.gmail.com
In response to Re: speed up verifying UTF-8  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: speed up verifying UTF-8  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers

I wrote:

> Naively, the shift-based DFA requires 64-bit integers to encode the transitions, but I recently came across an idea from Dougall Johnson of using the Z3 SMT solver to pack the transitions into 32-bit integers [1]. That halves the size of the transition table for free. I adapted that effort to the existing conventions in v22 and arrived at the attached Python script.
> [...]
> I'll include something like the attached text file diff in the next patch. Some comments are now outdated, but this is good enough for demonstration.

Attached is v23 incorporating the 32-bit transition table, with the necessary comment adjustments.

--
John Naylor
EDB: http://www.enterprisedb.com