Re: Parallel copy - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel copy
Msg-id 20200222002802.yew5buvrd2yrjkm6@development
In response to Re: Parallel copy  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
On Fri, Feb 21, 2020 at 02:54:31PM +0200, Ants Aasma wrote:
>On Thu, 20 Feb 2020 at 18:43, David Fetter <david@fetter.org> wrote:
>> On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
>> > I think the wc2 is showing that maybe instead of parallelizing the
>> > parsing, we might instead try using a different tokenizer/parser and
>> > make the implementation more efficient instead of just throwing more
>> > CPUs at it.
>>
>> That was what I had in mind.
>>
>> > I don't know if our code is similar to what wc does; maybe parsing
>> > csv is more complicated than what wc does.
>>
>> CSV parsing differs from wc in that there are more states in the state
>> machine, but I don't see anything fundamentally different.
>
>The trouble with a state machine based approach is that the state
>transitions form a dependency chain, which means that at best the
>processing rate will be 4-5 cycles per byte (L1 latency to fetch the
>next state).
>
>I whipped together a quick prototype that uses SIMD and bitmap
>manipulations to do the equivalent of CopyReadLineText() in csv mode
>including quote and escape handling; this runs at 0.25-0.5 cycles per
>byte.
>

Interesting. How does that compare to what we currently have?
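
To make the comparison concrete, here is a rough sketch of the two styles
being discussed. It is not Ants' prototype and not the actual
CopyReadLineText() code. The function names (find_eol_scalar,
find_eol_bitmap, match_mask, prefix_xor) are made up for illustration, and
it assumes the quote and escape characters are both '"', so a doubled quote
simply toggles the quote state twice. The bitmaps are built with a plain
loop here; a real SIMD version would presumably use a byte compare plus a
movemask-style instruction for that step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar baseline: one state transition per byte. Each iteration needs
 * in_quote from the previous one, which is the dependency chain.
 */
static size_t
find_eol_scalar(const char *buf, size_t len)
{
    int         in_quote = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '"')
            in_quote = !in_quote;
        else if (buf[i] == '\n' && !in_quote)
            return i;
    }
    return len;                 /* no unquoted newline found */
}

/* Bitmap with bit i set when buf[i] == c, for a block of up to 64 bytes. */
static uint64_t
match_mask(const char *buf, size_t len, char c)
{
    uint64_t    mask = 0;

    assert(len <= 64);
    for (size_t i = 0; i < len; i++)
        mask |= (uint64_t) (buf[i] == c) << i;
    return mask;
}

/* Bit i of the result is the XOR of bits 0..i of x. */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/*
 * Bitmap variant: the quote state for 64 bytes is computed at once, so the
 * only value carried from block to block is a single "inside quotes" flag.
 */
static size_t
find_eol_bitmap(const char *buf, size_t len)
{
    uint64_t    carry = 0;      /* all ones if previous block ended quoted */

    for (size_t base = 0; base < len; base += 64)
    {
        size_t      n = (len - base < 64) ? len - base : 64;
        uint64_t    quotes = match_mask(buf + base, n, '"');
        uint64_t    newlines = match_mask(buf + base, n, '\n');
        uint64_t    in_quote = prefix_xor(quotes) ^ carry;
        uint64_t    eols = newlines & ~in_quote;

        if (eols != 0)
        {
            size_t      bit = 0;

            while ((eols & 1) == 0)     /* index of lowest set bit */
            {
                eols >>= 1;
                bit++;
            }
            return base + bit;
        }
        /* Bit 63 holds the quote state after the last byte of the block. */
        carry = (uint64_t) 0 - (in_quote >> 63);
    }
    return len;
}
```

Both functions return the offset of the first newline outside quotes, or
len if there is none. The sketch says nothing about actual cycle counts; it
only shows why the bitmap form avoids a per-byte dependency on the previous
state.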


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
