Re: COPY FROM performance improvements - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: COPY FROM performance improvements
Msg-id: 200506240358.j5O3wga20563@candle.pha.pa.us
In response to: COPY FROM performance improvements ("Alon Goldshuv" <agoldshuv@greenplum.com>)
List: pgsql-hackers

Sounds great!

---------------------------------------------------------------------------

Alon Goldshuv wrote:
> This is a second iteration of a previous thread that didn't resolve a few
> weeks ago. I made some more modifications to the code to make it compatible
> with the current COPY FROM code, and it should be more agreeable this time.
> 
> The main premise of the new code is that it improves the text data parsing
> speed by about 4-5x, resulting in total improvements of between 15% and
> 95% for data importing (gains at the higher end will occur on large data
> rows with few columns, implying more parsing and less converting to internal
> format). This is done by replacing char-at-a-time parsing with buffered
> parsing, using fast scan routines, and keeping loading/appending into the
> line and attribute buffers to a minimum.
> 
> The new code passes both COPY regression tests (copy, copy2) and doesn't
> break any of the others.
> 
> It also supports encoding conversions (thanks to Peter and Tatsuo for your
> feedback) and the 3 line-end types. Having said that, using COPY with
> different encodings was only minimally tested. We are looking into creating
> new tests and hopefully adding them to the PostgreSQL regression suite one
> day if the community wants them.
> 
> This new code improves parsing of the delimited data format. BINARY and CSV
> will stay the same and will be executed separately for now (therefore there
> is some code duplication). In the future I plan to write improvements to the
> CSV path too, so that it will be executed without duplication of code.
> 
> I am still missing support for data that uses COPY_OLD_FE (question: what
> are the use cases? When will it be used? Please advise).
> 
> I'll send out the patch soon. It's basically there to show that there is a
> way to load data faster. In future releases of the patch it will be more
> complete and elegant.
> 
> I'd appreciate any comments/advice.
> 
> Alon.
> 
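
To make the buffered-parsing idea described above concrete, here is a minimal
sketch in C. It is not taken from the patch. The function name split_line and
its parameters are hypothetical, and it assumes one complete line is already
in a buffer. It ignores backslash escapes, NULL markers and encoding
conversion, all of which the real COPY code has to handle. It only shows the
difference in approach: the delimiter is located with memchr over the
remaining bytes, rather than by examining one character per loop iteration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration only, not code from the patch).
 * Records the start offset and length of each delimiter-separated field in
 * one already-buffered line. Returns the total number of fields found; only
 * the first max_fields entries are stored in starts[] and lens[].
 */
static size_t
split_line(const char *line, size_t len, char delim,
           size_t *starts, size_t *lens, size_t max_fields)
{
    size_t nfields = 0;
    size_t pos = 0;

    for (;;)
    {
        /* One library call scans the rest of the buffer for the delimiter. */
        const char *hit = memchr(line + pos, delim, len - pos);
        size_t end = hit ? (size_t) (hit - line) : len;

        if (nfields < max_fields)
        {
            starts[nfields] = pos;
            lens[nfields] = end - pos;
        }
        nfields++;

        if (hit == NULL)
            break;              /* no more delimiters: last field done */
        pos = end + 1;          /* continue just past the delimiter */
    }
    return nfields;
}
```

Because only offsets and lengths are recorded, no bytes are copied while
scanning, which is in the spirit of the "minimum amount of loading/appending"
goal mentioned above. A caller would pass the line buffer and its length along
with two arrays sized for the expected number of columns.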

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

