Re: Perform COPY FROM encoding conversions in larger chunks - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: Perform COPY FROM encoding conversions in larger chunks
Date	December 22, 2020 20:01:48
Msg-id	CAFBsxsH4Zum8e+i1jGjQhGW+8fYWwJ7EqOKCx6P_cUzOJUK9qA@mail.gmail.com
In response to	Perform COPY FROM encoding conversions in larger chunks (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses	Re: Perform COPY FROM encoding conversions in larger chunks
List	pgsql-hackers

On Wed, Dec 16, 2020 at 8:18 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Currently, COPY FROM parses the input one line at a time. Each line is
> converted to the database encoding separately, or if the file encoding
> matches the database encoding, we just check that the input is valid for
> the encoding. It would be more efficient to do the encoding
> conversion/verification in larger chunks. At least potentially; the
> current conversion/verification implementations work one byte at a time so
> it doesn't matter too much, but there are faster algorithms out there
> that use SIMD instructions or lookup tables that benefit from larger inputs.

Hi Heikki,

This is great news. I've seen examples of such algorithms and that'd be nice to have. I haven't studied the patch in detail, but it looks fine on the whole.

In 0004, it seems you have some doubts about upgrade compatibility. Is that because user-defined conversions would no longer have the right signature?

--
John Naylor
EDB: http://www.enterprisedb.com
