
speed up verifying UTF-8 - Mailing list pgsql-hackers

From	John Naylor
Subject	speed up verifying UTF-8
Date	June 2, 2021 19:26:41
Msg-id	CAFBsxsHii1-wbwN7vEbpzK03VJJL=EXegJSz6RSXbXZeaUB2jA@mail.gmail.com
In response to	Re: [POC] verifying UTF-8 using SIMD instructions (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: speed up verifying UTF-8
List	pgsql-hackers


For v10, I've split the patch up into two parts. 0001 uses pure C everywhere. This is much smaller and easier to review, and gets us the most bang for the buck.

One concern Heikki raised upthread is that platforms with poor unaligned-memory access will see a regression. We could easily add an #ifdef to take care of that, but I haven't done so here.
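As a rough illustration of what such a guard could look like (this is not code from the patch), the sketch below checks for ASCII-only input eight bytes at a time and compiles the chunked loop out when a macro is defined. The macro name SLOW_UNALIGNED_ACCESS and both function names are hypothetical, chosen only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical macro, not a name from the patch: a build would define it
 * on platforms where unaligned 8-byte loads are slow.
 */
/* #define SLOW_UNALIGNED_ACCESS 1 */

/* Byte-at-a-time check: true if no byte has its high bit set. */
static bool
is_ascii_bytewise(const unsigned char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		if (s[i] & 0x80)
			return false;
	}
	return true;
}

/* Chunked check, with the chunked loop removed on guarded platforms. */
static bool
is_ascii(const unsigned char *s, size_t len)
{
#ifndef SLOW_UNALIGNED_ACCESS
	while (len >= sizeof(uint64_t))
	{
		uint64_t	chunk;

		/* memcpy expresses a possibly unaligned load in portable C */
		memcpy(&chunk, s, sizeof(chunk));

		/* any high bit set means a non-ASCII byte is in this chunk */
		if (chunk & UINT64_C(0x8080808080808080))
			return false;

		s += sizeof(chunk);
		len -= sizeof(chunk);
	}
#endif
	/* handle the tail, or the whole input when the guard is active */
	return is_ascii_bytewise(s, len);
}
```

With the macro defined, every call takes the byte-wise path, so a platform with slow unaligned loads would in principle keep its existing behaviour.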

To recap: On ASCII-only input with storage taken out of the picture, profiles of COPY FROM show a reduction from nearly 10% down to just over 1%. In microbenchmarks found earlier in this thread, this works out to about 7 times faster. On multibyte/mixed input, 0001 is a bit faster, but not really enough to make a difference in COPY performance.
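To make the difference between ASCII-only and mixed input concrete, here is a minimal sketch of the general technique: skip eight bytes at a time while they are all ASCII, and fall back to per-character validation otherwise. The function names are made up for this example, and it makes no attempt to reproduce the patch's actual code or any PostgreSQL-specific rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
static bool
is_cont(unsigned char b)
{
	return (b & 0xC0) == 0x80;
}

/*
 * Length of the valid UTF-8 sequence starting at s, or -1 if invalid.
 * Requires len >= 1. Hypothetical helper for this example only.
 */
static int
utf8_seq_len(const unsigned char *s, size_t len)
{
	unsigned char b = s[0];

	if (b < 0x80)
		return 1;
	if (b >= 0xC2 && b <= 0xDF)
		return (len >= 2 && is_cont(s[1])) ? 2 : -1;
	if (b >= 0xE0 && b <= 0xEF)
	{
		if (len < 3 || !is_cont(s[1]) || !is_cont(s[2]))
			return -1;
		if (b == 0xE0 && s[1] < 0xA0)
			return -1;			/* overlong encoding */
		if (b == 0xED && s[1] > 0x9F)
			return -1;			/* UTF-16 surrogate range */
		return 3;
	}
	if (b >= 0xF0 && b <= 0xF4)
	{
		if (len < 4 || !is_cont(s[1]) || !is_cont(s[2]) || !is_cont(s[3]))
			return -1;
		if (b == 0xF0 && s[1] < 0x90)
			return -1;			/* overlong encoding */
		if (b == 0xF4 && s[1] > 0x8F)
			return -1;			/* above U+10FFFF */
		return 4;
	}
	return -1;					/* stray continuation, 0xC0/0xC1, or > 0xF4 */
}

/* Verify a buffer, skipping 8 bytes at a time while they are all ASCII. */
static bool
utf8_verify(const unsigned char *s, size_t len)
{
	while (len > 0)
	{
		int			l;

		if (len >= sizeof(uint64_t))
		{
			uint64_t	chunk;

			memcpy(&chunk, s, sizeof(chunk));
			if ((chunk & UINT64_C(0x8080808080808080)) == 0)
			{
				/* fast path: no high bits set, all eight bytes are ASCII */
				s += sizeof(chunk);
				len -= sizeof(chunk);
				continue;
			}
		}

		/* slow path: validate one character */
		l = utf8_seq_len(s, len);
		if (l < 0)
			return false;
		s += l;
		len -= (size_t) l;
	}
	return true;
}
```

On pure ASCII the loop rarely leaves the fast path, while on multibyte input most iterations go through the per-character check, which is in line with the gain being concentrated in the ASCII-only case.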

0002 adds the SSE4 implementation on x86-64, and is equally fast on all input, at the cost of greater complexity.

To reflect the split, I've changed the thread subject and the commitfest title.

John Naylor

EDB: http://www.enterprisedb.com

