Re: CopyReadLineText optimization - Mailing list pgsql-patches

From Luke Lonergan
Subject Re: CopyReadLineText optimization
Date
Msg-id 014F2941B0A1EA47BD61D21526B806E901075565@MI8NYCMAIL08.Mi8.com
In response to CopyReadLineText optimization  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-patches
Cool!  It's been a while since we've done the same kind of thing :-)

- Luke

> -----Original Message-----
> From: pgsql-patches-owner@postgresql.org
> [mailto:pgsql-patches-owner@postgresql.org] On Behalf Of
> Heikki Linnakangas
> Sent: Saturday, February 23, 2008 5:30 PM
> To: pgsql-patches@postgresql.org
> Subject: [PATCHES] CopyReadLineText optimization
>
> The purpose of CopyReadLineText is to scan the input buffer,
> and find the next newline, taking into account any escape
> characters. It currently operates in a loop, one byte at a
> time, searching for LF, CR, or a backslash. That's a bit
> slow: I've been running oprofile on COPY, and I've seen
> CopyReadLine take around 10% of the CPU time, and Joshua
> Drake just posted a very similar profile to hackers.
>
> Attached is a patch that modifies CopyReadLineText so that it
> uses memchr to speed up the scan. The nice thing about memchr
> is that we can take advantage of any clever optimizations
> that might be in libc or the compiler.
>
> In the tests I've been running, it roughly halves the time
> spent in CopyReadLine (including the new memchr calls), thus
> reducing the total CPU overhead by ~5%. I'm planning to run
> more tests with data that has backslashes and with different
> width tables to see what the worst-case and best-case
> performance is like. Also, it doesn't work for CSV format at
> the moment; that needs to be fixed.
>
> 5% isn't exactly breathtaking, but it's a start. I tried the
> same trick in CopyReadAttributesText, but unfortunately it
> doesn't seem to help there because you need to "stop" the
> efficient word-at-a-time scan that memchr does (at least with
> glibc, YMMV) whenever there's a column separator, while in
> CopyReadLineText you get to process the whole line in one
> call, assuming there are no backslashes.
>
> --
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
>
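
For readers following along, the memchr-based scan described in the quoted
message might look roughly like the sketch below. This is not the code from
the attached patch: the function name find_unescaped_newline is hypothetical,
and the sketch assumes LF-only line endings and backslash escapes, ignoring
CR and CSV quoting entirely.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return a pointer to the first
 * unescaped '\n' in buf[0..len), or NULL if there is none.
 * Assumes LF terminators and backslash escapes only.
 */
static const char *
find_unescaped_newline(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;

    while (p < end)
    {
        /* Let libc locate the next candidate newline. */
        const char *nl = memchr(p, '\n', (size_t) (end - p));
        /* Only look for backslashes up to that newline (or buffer end). */
        const char *limit = nl ? nl : end;
        const char *bs = memchr(p, '\\', (size_t) (limit - p));

        if (bs == NULL)
            return nl;      /* nothing escaped before the newline, if any */

        if (end - bs < 2)
            return NULL;    /* trailing backslash: caller needs more data */

        /* Skip the backslash and the character it escapes, then rescan. */
        p = bs + 2;
    }
    return NULL;
}
```

When a line contains no backslashes, the loop body runs once and the whole
line is covered by two memchr calls, which is the case the quoted message
expects to benefit most.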
