Re: COPY FROM performance improvements - Mailing list pgsql-patches

From Andrew Dunstan
Subject Re: COPY FROM performance improvements
Msg-id 1688.24.211.165.134.1119696313.squirrel@www.dunslane.net
In response to Re: COPY FROM performance improvements  ("Luke Lonergan" <llonergan@greenplum.com>)
Responses Re: COPY FROM performance improvements  ("Luke Lonergan" <llonergan@greenplum.com>)
Re: COPY FROM performance improvements  ("Alon Goldshuv" <agoldshuv@greenplum.com>)
List pgsql-patches
Luke Lonergan said:
> I've attached Alon's patch ported to the CVS trunk.  It applies cleanly
> and passes the regressions.  With fsync=false it is 40% faster loading
> a sample dataset with 15 columns of varied type.  It's 19% faster with
> fsync=true.
>
> This patch separates the CopyFrom code into two pieces, the new logic
> for delimited data and the existing logic for CSV and Binary.
>


A few quick comments - I will probably have many more later when I have
time to review this in depth.

1. Postgres does context diffs for patches, not unidiffs.

2. This comment raises a flag in my mind:

+ * each attribute begins. If a specific attribute is not used for this
+ * COPY command (ommitted from the column list), a value of 0 will be assigned.
+ * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
+ * attr_offsets may look something like this after this routine
+ * returns: [0,20,0,0,55]. That means that column "a" value starts
+ * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.

Would it not be better to mark missing attributes with something that can't
be a valid offset, like -1?
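
To make the concern concrete: in the example above, column "a" legitimately
starts at offset 0, so 0 cannot also mean "not in the column list". A minimal
sketch of the -1 approach follows. Only attr_offsets comes from the patch;
the sentinel name, both functions and their parameters are hypothetical and
are not how the patch is actually structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: a real byte offset is never negative. */
#define ATTR_NOT_IN_COLUMN_LIST (-1)

/*
 * Fill attr_offsets[] for a table with natts columns.
 * used[i] is true if column i appears in the COPY column list;
 * starts[] holds the byte offsets of the used columns, in column order.
 * (Illustrative helper only, not code from the patch.)
 */
static void
fill_attr_offsets(int *attr_offsets, const bool *used, int natts,
                  const int *starts)
{
    int next = 0;

    assert(natts >= 0);
    for (int i = 0; i < natts; i++)
    {
        if (used[i])
            attr_offsets[i] = starts[next++];
        else
            attr_offsets[i] = ATTR_NOT_IN_COLUMN_LIST;
    }
}

/* True if the attribute was supplied, even when it starts at offset 0. */
static bool
attr_is_present(const int *attr_offsets, int attno)
{
    return attr_offsets[attno] >= 0;
}
```

With this, table foo(a,b,c,d,e) and COPY foo(a,b,e) would give
[0,20,-1,-1,55], and column "a" at offset 0 is no longer ambiguous.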


3. This comment needs improving:

+/*
+ * Copy FROM file to relation with faster processing.
+ */

4. We should indeed do this for CSV, especially since a lot of the relevant
logic for detecting attribute starts is already there for CSV in
CopyReadLine. I'm prepared to help you do that if necessary, since I'm
guilty of perpetrating that code.

cheers

andrew


