speed up verifying UTF-8 - Mailing list pgsql-hackers

From John Naylor
Subject speed up verifying UTF-8
Date
Msg-id CAFBsxsHii1-wbwN7vEbpzK03VJJL=EXegJSz6RSXbXZeaUB2jA@mail.gmail.com
In response to Re: [POC] verifying UTF-8 using SIMD instructions  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: speed up verifying UTF-8
List pgsql-hackers
For v10, I've split the patch up into two parts. 0001 uses pure C everywhere. This is much smaller and easier to review, and gets us the most bang for the buck. 

One concern Heikki raised upthread is that platforms with poor unaligned-memory access will see a regression. We could easily add an #ifdef to take care of that, but I haven't done so here.
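As a rough illustration of the kind of guard meant here (not the code in the attached patch), a compile-time switch could choose between a word-at-a-time loop and a plain byte loop. The macro DEMO_HAVE_FAST_UNALIGNED and the function demo_ascii_prefix_len are hypothetical names made up for this sketch, and the list of architectures is only an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical macro, invented for this sketch: define it only on platforms
 * where unaligned 8-byte loads are assumed to be cheap.
 */
#if defined(__x86_64__) || defined(__aarch64__)
#define DEMO_HAVE_FAST_UNALIGNED 1
#endif

/* Return the number of leading bytes that are ASCII (high bit clear). */
static size_t
demo_ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		i = 0;

#ifdef DEMO_HAVE_FAST_UNALIGNED
	/* Word-at-a-time path: stop at the first chunk with any high bit set. */
	while (len - i >= sizeof(uint64_t))
	{
		uint64_t	chunk;

		memcpy(&chunk, s + i, sizeof(chunk));
		if (chunk & UINT64_C(0x8080808080808080))
			break;
		i += sizeof(chunk);
	}
#endif

	/* Byte-at-a-time path: finishes the tail, or does all the work. */
	while (i < len && s[i] < 0x80)
		i++;

	return i;
}
```

With the macro undefined, only the byte loop is compiled, so such platforms would keep byte-at-a-time behavior and give the same results.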

To recap: On ASCII-only input with storage taken out of the picture, profiles of COPY FROM show the share of time spent verifying UTF-8 dropping from nearly 10% to just over 1%. In microbenchmarks found earlier in this thread, this works out to about 7 times faster. On multibyte/mixed input, 0001 is a bit faster, but not really enough to make a difference in COPY performance.
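For readers who have not followed the thread, here is a minimal sketch of why ASCII-only input can be so much cheaper: whole chunks are accepted with one mask test, and only chunks containing a high bit fall back to a byte-wise check. This is not the code in 0001. All demo_* names are hypothetical, the 8-byte stride is an assumption, and the sketch does not try to match the patch's handling of edge cases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STRIDE sizeof(uint64_t)

/* True if all 8 bytes at s have the high bit clear, i.e. are ASCII. */
static bool
demo_chunk_is_ascii(const unsigned char *s)
{
	uint64_t	chunk;

	memcpy(&chunk, s, sizeof(chunk));
	return (chunk & UINT64_C(0x8080808080808080)) == 0;
}

/* Slow path: byte length of the valid sequence at s, or 0 if invalid. */
static size_t
demo_utf8_seq_len(const unsigned char *s, size_t len)
{
	unsigned char c = s[0];
	size_t		need;
	uint32_t	cp;
	uint32_t	min;

	if (c < 0x80)
		return 1;
	else if ((c & 0xE0) == 0xC0) { need = 2; cp = c & 0x1F; min = 0x80; }
	else if ((c & 0xF0) == 0xE0) { need = 3; cp = c & 0x0F; min = 0x800; }
	else if ((c & 0xF8) == 0xF0) { need = 4; cp = c & 0x07; min = 0x10000; }
	else
		return 0;				/* stray continuation or invalid lead byte */

	if (len < need)
		return 0;				/* truncated sequence */

	for (size_t i = 1; i < need; i++)
	{
		if ((s[i] & 0xC0) != 0x80)
			return 0;			/* not a continuation byte */
		cp = (cp << 6) | (s[i] & 0x3F);
	}

	/* Reject overlong forms, surrogates, and values above U+10FFFF. */
	if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	return need;
}

/* Verify a buffer, skipping all-ASCII chunks with a single test each. */
static bool
demo_utf8_is_valid(const unsigned char *s, size_t len)
{
	while (len > 0)
	{
		size_t		l;

		if (len >= DEMO_STRIDE && demo_chunk_is_ascii(s))
		{
			s += DEMO_STRIDE;
			len -= DEMO_STRIDE;
			continue;
		}
		l = demo_utf8_seq_len(s, len);
		if (l == 0)
			return false;
		s += l;
		len -= l;
	}
	return true;
}
```

On pure ASCII the loop in demo_utf8_is_valid does one load and one mask test per 8 bytes. On multibyte or mixed input most chunks fail that test and the work falls to the byte-wise path, which fits the smaller gain reported for that case.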

0002 adds the SSE4 implementation on x86-64, and is equally fast on all input, at the cost of greater complexity.

To reflect the split, I've changed the thread subject and the commitfest title.
