Re: CopyReadLineText optimization - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: CopyReadLineText optimization
Date
Msg-id 200803031942.m23JgG129730@momjian.us
In response to Re: CopyReadLineText optimization  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-patches
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > Attached is a patch that modifies CopyReadLineText so that it uses
> > memchr to speed up the scan. The nice thing about memchr is that we can
> > take advantage of any clever optimizations that might be in libc or
> > the compiler.
>
> Here's an updated version of the patch. The principle is the same, but
> the same optimization is now used for CSV input as well, and there are
> more comments.
>
> I still need to do more benchmarking. I mentioned a ~5% speedup on the
> test I ran earlier, which was a load of the lineitem table from TPC-H.
> It looks like with cheaper data types the gain can be much bigger;
> here's an oprofile from loading the TPC-H partsupp table:
>
> Before:
>
> samples  %        image name               symbol name
> 5146     25.7635  postgres                 CopyReadLine
> 4089     20.4716  postgres                 DoCopy
> 1449      7.2544  reiserfs                 (no symbols)
> 1369      6.8539  postgres                 pg_verify_mbstr_len
> 1013      5.0716  libc-2.7.so              memcpy
> 749       3.7499  libc-2.7.so              ____strtod_l_internal
> 598       2.9939  postgres                 heap_formtuple
> 548       2.7436  libc-2.7.so              ____strtol_l_internal
> 403       2.0176  libc-2.7.so              memset
> 309       1.5470  libc-2.7.so              strlen
> 208       1.0414  postgres                 AllocSetAlloc
> ...
>
> After:
>
> samples  %        image name               symbol name
> 4165     25.7879  postgres                 DoCopy
> 1574      9.7455  postgres                 pg_verify_mbstr_len
> 1520      9.4112  reiserfs                 (no symbols)
> 1005      6.2225  libc-2.7.so              memchr
> 986       6.1049  libc-2.7.so              memcpy
> 632       3.9131  libc-2.7.so              ____strtod_l_internal
> 589       3.6468  postgres                 heap_formtuple
> 546       3.3806  libc-2.7.so              ____strtol_l_internal
> 386       2.3899  libc-2.7.so              memset
> 366       2.2661  postgres                 CopyReadLine
> 287       1.7770  libc-2.7.so              strlen
> 215       1.3312  postgres                 LWLockAcquire
> 208       1.2878  postgres                 hash_any
> 176       1.0897  postgres                 LWLockRelease
> 161       0.9968  postgres                 InputFunctionCall
> 157       0.9721  postgres                 AllocSetAlloc
> ...
>
> The profile shows that with the patch, ~8.5% of the CPU time is spent in
> CopyReadLine+memchr, vs. ~25.8% in CopyReadLine before. That's quite a
> significant speedup.
>
> I still need to test the worst-case performance, with input that has a
> lot of escapes. It would be interesting to hear reports with this patch
> from people on different platforms. These results are from my laptop
> with a 32-bit Intel CPU, running Linux. There could be big differences
> between memchr implementations.
>
> --
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
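
For readers who want to see the idea in the quoted message in concrete
form, here is a minimal sketch. It is not the code from the patch. It
assumes text-format input in which a backslash escapes the byte that
follows it, and the name find_line_end is hypothetical. The scan lets
memchr locate the next newline, then checks only the bytes before it for
a backslash:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not taken from the patch).
 * Returns the offset of the first unescaped '\n' in buf[0..len),
 * or -1 if the buffer holds no complete line.
 */
static long
find_line_end(const char *buf, size_t len)
{
    size_t      pos = 0;

    assert(buf != NULL);

    while (pos < len)
    {
        /* Let libc find the next newline candidate. */
        const char *nl = memchr(buf + pos, '\n', len - pos);
        size_t      limit = nl ? (size_t) (nl - buf) : len;

        /* Look for a backslash only in the bytes before that candidate. */
        const char *bs = memchr(buf + pos, '\\', limit - pos);

        if (bs == NULL)
            return nl ? (long) limit : -1;

        /* Skip the backslash and the byte it escapes, then rescan. */
        pos = (size_t) (bs - buf) + 2;
    }
    return -1;
}
```

With input that has few escapes, nearly all of the work happens inside
memchr. With input full of backslashes the loop restarts often, which is
the worst case the quoted message says still needs testing.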



--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
