Thread: COPY FROM performance improvements

COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
Here is the patch I described in -hackers yesterday (copied again below).

Alon.


This is a second iteration of a previous thread that wasn't resolved a few
weeks ago. I made some more modifications to the code to make it compatible
with the current COPY FROM code, and it should be more agreeable this time.

The main premise of the new code is that it improves text data parsing
speed by about 4-5x, resulting in total improvements of between 15% and 95%
for data importing (the higher-range gains occur on large data rows without
many columns, implying more parsing and less converting to internal format).
This is done by replacing char-at-a-time parsing with buffered parsing,
using fast scan routines, and doing a minimal amount of loading/appending
into the line and attribute buffers.

The new code passes both COPY regression tests (copy, copy2) and doesn't
break any of the others.

It also supports encoding conversions (thanks, Peter and Tatsuo, for your
feedback) and the three line-end types. Having said that, using COPY with
different encodings was only minimally tested. We are looking into creating
new tests and hopefully adding them to the postgres regression suite one day,
if the community desires it.

This new code improves the delimited data format parsing. BINARY and CSV
stay the same and are executed separately for now (therefore there is some
code duplication). In the future I plan to improve the CSV path too, so that
it will be executed without code duplication.

I am still missing support for data that uses COPY_OLD_FE (question: what
are the use cases? When would it be used? Please advise).

The patch is basically there to show that there is a
way to load data faster. In future releases of the patch it will be more
complete and elegant.

I'd appreciate any comments/advice.

Alon.


Attachment

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Note that this patch is against 8.0.2.

- Luke


Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
We're porting this to 8.1; we should be able to get it done within the next
couple of days.  The changes needed are to the way that the unmodified CSV
and binary code paths are preserved alongside the changed "delimited" text
COPY.

- Luke



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
I've attached Alon's patch ported to the CVS trunk.  It applies cleanly and
passes the regressions.  With fsync=false it is 40% faster loading a sample
dataset with 15 columns of varied type.  It's 19% faster with fsync=true.

This patch separates the CopyFrom code into two pieces, the new logic for
delimited data and the existing logic for CSV and Binary.

- Luke


Attachment

Re: COPY FROM performance improvements

From
"Andrew Dunstan"
Date:
Luke Lonergan said:
> I've attached Alon's patch ported to the CVS trunk.  It applies cleanly
> and passes the regressions.  With fsync=false it is 40% faster loading
> a sample dataset with 15 columns of varied type.  It's 19% faster with
> fsync=true.
>
> This patch separates the CopyFrom code into two pieces, the new logic
> for delimited data and the existing logic for CSV and Binary.
>


A few quick comments - I will probably have many more later when I have
time to review this in depth.

1. Postgres does context diffs for patches, not unidiffs.

2. This comment raises a flag in my mind:

+ * each attribute begins. If a specific attribute is not used for this
+ * COPY command (ommitted from the column list), a value of 0 will be assigned.
+ * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
+ * attr_offsets may look something like this after this routine
+ * returns: [0,20,0,0,55]. That means that column "a" value starts
+ * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.

Would it not be better to mark missing attributes with something that can't
be a valid offset, like -1?


3. This comment needs improving:

+/*
+ * Copy FROM file to relation with faster processing.
+ */

4. We should indeed do this for CSV, especially since a lot of the relevant
logic for detecting attribute starts is already there for CSV in
CopyReadLine. I'm prepared to help you do that if necessary, since I'm
guilty of perpetrating that code.

cheers

andrew



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Andrew,

> 1. Postgres does context diffs for patches, not unidiffs.

Yah - sorry.  I used diff to get the "--ignore-space-change" feature.  I've
attached a "cvs diff -c" with identical content.

> 2. This comment raises a flag in my mind:
>
> + * each attribute begins. If a specific attribute is not used for this
> + * COPY command (ommitted from the column list), a value of 0 will be assigned.
> + * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
> + * attr_offsets may look something like this after this routine
> + * returns: [0,20,0,0,55]. That means that column "a" value starts
> + * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.
>
> Would it not be better to mark missing attributes with something that can't
> be a valid offset, like -1?
>
> 3. This comment needs improving:
>
> +/*
> + * Copy FROM file to relation with faster processing.
> + */

Good comments! I'll look for Alon to respond on these.

> 4. We should indeed do this for CSV, especially since a lot of the relevant
> logic for detecting attribute starts is already there for CSV in
> CopyReadLine. I'm prepared to help you do that if necessary, since I'm
> guilty of perpetrating that code.

That would be awesome!  We'll help you with testing and regressions when
you're ready.

- Luke


Attachment

Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
Andrew,

Thanks for your initial feedback.

> 2. This comment raises a flag in my mind:
>
> + * each attribute begins. If a specific attribute is not used for this
> + * COPY command (ommitted from the column list), a value of 0 will be assigned.
> + * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
> + * attr_offsets may look something like this after this routine
> + * returns: [0,20,0,0,55]. That means that column "a" value starts
> + * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.
>
> Would it not be better to mark missing attributes with something that can't
> be a valid offset, like -1?

That's probably a good idea. Generally, the offsets of attributes that are
not in the column list will not be looked at in the conversion loop
(FunctionCall3), but it wouldn't hurt to make them invalid just in case.
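
As a tiny standalone illustration of that sentinel idea (hypothetical
values, not the patch code):

    #include <stdio.h>

    #define NUM_ATTRS 5

    int
    main(void)
    {
        /* offsets for table foo(a,b,c,d,e) loaded with COPY foo(a,b,e) */
        int     attr_offsets[NUM_ATTRS];
        int     i;

        /* mark every attribute "not present" first */
        for (i = 0; i < NUM_ATTRS; i++)
            attr_offsets[i] = -1;

        /* the parser fills in only the columns named in the COPY list */
        attr_offsets[0] = 0;        /* "a" starts at byte 0 of the buffer */
        attr_offsets[1] = 20;       /* "b" starts at byte 20 */
        attr_offsets[4] = 55;       /* "e" starts at byte 55 */

        /* the conversion loop can skip sentinels instead of trusting 0 */
        for (i = 0; i < NUM_ATTRS; i++)
        {
            if (attr_offsets[i] < 0)
                continue;           /* column omitted from the COPY list */
            printf("attribute %d starts at offset %d\n", i, attr_offsets[i]);
        }
        return 0;
    }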

> 3. This comment needs improving:
>
> +/*
> + * Copy FROM file to relation with faster processing.
> + */
True. Probably along with several others that I missed. There was a lot of
new code and not a lot of time... I'll make a second pass at comments.

> 4. We should indeed do this for CSV, especially since a lot of the relevant
> logic for detecting attribute starts is already there for CSV in
> CopyReadLine. I'm prepared to help you do that if necessary, since I'm
> guilty of perpetrating that code.

That will be great. My idea is to eventually keep processing for BINARY in
one routine while text goes in another (CSV + delimited), the reason being
that there is a lot in common in the processing of the two text types.
CopyReadLineBuffered will have to be changed just a little to accommodate
embedded newlines inside CSV quotes, and CopyReadAttributeCSV would need to
change as well.

Question for you -- you'll probably notice that I made the escape of the
delimited text format (previously '\\') a variable, "escapec". The reason is
that some people might like to choose an escape for delimited COPY too, or
even disable it, so it's a good base to start with. However, I just noticed
that CSV uses that exact variable name... Sorry about that. Any suggestion
on how to rename it to something different?

Thx,
Alon.

>
> cheers
>
> andrew
>
>
>



Re: COPY FROM performance improvements

From
"Andrew Dunstan"
Date:
Luke Lonergan said:

>> 4. We should indeed do this for CSV, especially since a lot of the
>> relevant logic for detecting attribute starts is already there for CSV
>> in CopyReadLine. I'm prepared to help you do that if necessary, since
>> I'm guilty of perpetrating that code.
>
> That would be awesome!  We'll help you with testing and regressions
> when you're ready.
>

I didn't say I'd do it, I said I'd help you to do it ;-)

Alon's comments elsewhere look close to the mark.

cheers

andrew




Re: COPY FROM performance improvements

From
"Andrew Dunstan"
Date:
Alon Goldshuv said:
> Andrew,
>

>> 4. We should indeed do this for CSV, especially since a lot of the
>> relevant logic for detecting attribute starts is already there for CSV
>> in CopyReadLine. I'm prepared to help you do that if necessary, since
>> I'm guilty of perpetrating that code.
>
> That will be great. My idea is to eventually keep processing for BINARY
> in one routine while text can be in another (CSV+delimited). The reason
> being is that there is a lot in common for the processing of the 2 text
> types. The CopyReadLineBuffered will have to be changed just a little
> to accommodate embedded newlines inside CSV quotes, and the
> CopyReadAttributeCSV would need to change as well.
>

That sounds right.

> Question for you -- you'll probably notice that I made the escape of
> the delimited text format (previously '\\') a constant variable
> "escapec". Reason being - maybe some people would like to choose an
> escape for delimited COPY too, or even disable it. So it's a good base
> to start with. However, I just noticed that CSV uses that exact
> variable name... Sorry about that. Any suggestion on how to rename it
> to something different?
>

I haven't seen any demand for users to be allowed to specify an escape char
in text mode (BTW, that's what the docs call what you have called delimited
mode). Maybe there's a case for it, but I somewhat doubt it.

As for variable names, choose some variant that seems reasonable. If we need
to distinguish we should use csv_escapec and text_escapec. But I'm not sure
we can't use the same variable name for both cases, unless they would have
conflicting scopes.

What I would like to have is a high level description of
. how the new text mode code differs from the old text mode code, and
. which part of the change is responsible for how much performance gain.

Maybe I have missed that in previous discussion, but this change is
sufficiently invasive that I think you owe that to the reviewers.

cheers

andrew

cheers

andrew



Re: COPY FROM performance improvements

From
"Michael Paesold"
Date:
Luke Lonergan wrote:
> I've attached Alon's patch ported to the CVS trunk.  It applies cleanly
> and
> passes the regressions.  With fsync=false it is 40% faster loading a
> sample
> dataset with 15 columns of varied type.  It's 19% faster with fsync=true.
>
> This patch separates the CopyFrom code into two pieces, the new logic for
> delimited data and the existing logic for CSV and Binary.
>
> - Luke

What about this one:
    case COPY_OLD_FE:
+    ereport(ERROR,
+      (errcode(ERRCODE_CONNECTION_FAILURE),
+       errmsg("Old FE protocal not yet supported in fast COPY processing
(CopyGetData())")));
+

This can't stay like it is, can it?
Who uses the old FE protocol for COPY? Any drivers?

Otherwise I think the speed improvement is very fine. :-)

Best Regards,
Michael Paesold


Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
> What about this one:
>     case COPY_OLD_FE:
> +    ereport(ERROR,
> +      (errcode(ERRCODE_CONNECTION_FAILURE),
> +       errmsg("Old FE protocal not yet supported in fast COPY processing
> (CopyGetData())")));
> +
>
> This can't stay like it is, can it?
> Who uses the old FE protocol for COPY? Any drivers?
In my message submitting the patch I indicated that this is the only
thing that will not work, and also wondered what the use cases of the old
FE protocol are... So I am joining you in looking for an answer :-)

Alon.


>
> Otherwise I think the speed improvement is very fine. :-)
>
> Best Regards,
> Michael Paesold
>
>



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Andrew,

> What I would like to have is a high level description of
> . how the new text mode code differs from the old text mode code, and
> . which part of the change is responsible for how much performance gain.
>
> Maybe I have missed that in previous discussion, but this change is
> sufficiently invasive that I think you owe that to the reviewers.

You're right - this is a nearly full replacement of the code for the text
mode, so some explanation is necessary.

We have users that are focused on Business Intelligence and Data Warehousing
use cases.  They routinely load files of sizes upward of 1GB and some load n
x 10GB per load. They normally use a text format because it's the easiest
and fastest to produce and should also be fast to load.  In the absence of a
configurable format loader like Oracle's SQL*Loader, the COPY FROM path is
what they use.  We'll leave the formatting discussion for later, because
there are definitely big improvements needed to serve this audience, but
there is too much info to cover here before 8.1 freeze, so back to
performance.

Our customers noticed that PostgreSQL's COPY text performance was
substantially slower than Oracle and far slower than the underlying disk
subsystems.  Typical performance ranged from 4MB/s to 8MB/s and was
bottlenecked on CPU.  Disk subsystems we use are typically capable of n x
100MB/s write rates.  This was proving to be a big problem for daily update
loads on warehouses, so we looked into what the slowdown was.

We profiled the copy input path and found that the combination of
per-character I/O (fgetc, etc.) and the subsequent per-character logic was
responsible for a large fraction of the time it took to load the data.  The
parsing routine was running at 17 MB/s on the fastest (Opteron) CPUs at very
high compiler optimization, whether taking input from the network or from a
file in a backend COPY.  Given other logic we've worked with for
high-performance I/O subsystems, we knew we could improve this to well over
100MB/s on typical machines, but it would require rewriting the logic to
expose more to the compiler and CPU.

An improvement would require exposing much longer runs of instructions,
with efficient access to memory, to the compiler/CPU, so that pipelining and
micro-parallelism can become effective and higher I/O input rates can be
reached.  That is why the current patch reads a buffer of input, then scans
it for the "special" bytes (line ends, escapes, delimiters), avoiding copies
until they can be done efficiently in bulk operations.  The resulting code
now runs at over 100MB/s, which exposes the remainder of the COPY path as
the new bottleneck.  Now we see between 20% and 90% performance improvements
in COPY speed, which will make many customers in BI/DW very happy.

I do not believe that significant performance gains can be made without this
invasive change to the copy.c routine, because of the need for extensive
buffering and array logic, which requires a substantially different
organization of the processing.

I hope this helps,

- Luke



Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:

> I haven't seen any demand for users to be allowed to specify an escape char
> in text mode (BTW, that's what the docs call what you have called delimited
> mode). Maybe there's a case for it, but I somewhat doubt it.

I ran into users who could not load their data using COPY because their
data included a lot of embedded '\\'... These users would like to choose
another char to be the escape, or maybe even choose to disable escaping
altogether. However, this is probably a separate discussion that we should
think about in the future and see if it's worth introducing.

> As for variable names, choose some variant that seems reasonable. If we need
> to distinguish we should use csv_escapec and text_escapec. But I'm not sure
> we can't use the same variable name for both cases, unless they would have
> conflicting scopes.
Cool. I don't think the scopes are conflicting at the moment.

> What I would like to have is a high level description of
> . how the new text mode code differs from the old text mode code, and
> . which part of the change is responsible for how much performance gain.

Fair enough!

Where the code differs and why it's faster:
-------------------------------------------

The changes in the code only touch the *parsing* parts for *text* format.
All the rest of the COPY process remains the same. More specifically:

CopyFrom() still handles BINARY and CSV, and I extracted TEXT mode to
another function called CopyFromDelimited() (referring to delimited text
format). This function calls the rest of the modified functions below:

CopyGetData() was modified to return the number of bytes actually read
successfully. Rather than asking for a char at a time, the text parsing code
asks for big blocks of data at a time, and needs to know how much data was
accepted into the input buffer (input_buf) for parsing. CopyFrom can still
call CopyGetData and not care about the return value, like it did before.

Why is it faster: instead of calling CopyGetData for every single byte
(which involves a huge number of function calls, conditionals, and getc()),
we call this function once per buffer (64K) and then start parsing the buffer.
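
As a rough illustration of that pattern, here is a standalone sketch with
hypothetical names that reads 64K blocks from a stdio stream instead of the
real COPY data source:

    #include <stdio.h>

    #define INPUT_BUF_SIZE (64 * 1024)      /* one 64K block per read */

    /*
     * Return the number of bytes actually placed in buf, so the caller
     * knows how much of the block is valid for parsing.
     */
    static size_t
    copy_get_block(FILE *src, char *buf, size_t bufsize)
    {
        return fread(buf, 1, bufsize, src);
    }

    int
    main(void)
    {
        static char input_buf[INPUT_BUF_SIZE];
        size_t      nread;

        /* one call per 64K block instead of one call per character */
        while ((nread = copy_get_block(stdin, input_buf,
                                       sizeof(input_buf))) > 0)
        {
            /* parse input_buf[0 .. nread-1]: lines, delimiters, escapes */
            printf("got a block of %zu bytes\n", nread);
        }
        return 0;
    }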

CopyReadLineBuffered is a modified version of CopyReadLine and is used by
CopyFromDelimited to load a line at a time into the line buffer
line_bytebuf. Very similar to CopyReadLine, but:

Why is it faster: besides the reason I explained above (no char-at-a-time
reads), this function copies the line of data in its entirety to the line
buffer, instead of a char at a time with appendStringInfoCharMacro. The gain
here is very large. Also, instead of checking for '\n', '\r', '\\', '.', and
so on for every character, we only look for the first character of the
line-end sequence, and when we find it we look backwards or forwards to see
whether it's escaped, etc.
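
A simplified standalone sketch of that scan (hypothetical names; it assumes
a bare '\n' line end and ignores the escaped-escape case that a real
implementation has to handle):

    #include <stddef.h>
    #include <string.h>

    /*
     * Return the index just past the next unescaped '\n' in buf[0..len-1],
     * or len if no complete line has arrived yet.
     */
    static size_t
    find_line_end(const char *buf, size_t len, char escapec)
    {
        size_t  i;

        for (i = 0; i < len; i++)
        {
            if (buf[i] != '\n')
                continue;               /* only the line-end byte is tested */
            if (i > 0 && buf[i - 1] == escapec)
                continue;               /* escaped newline is data */
            return i + 1;
        }
        return len;
    }

    /* Copy one whole line into line_buf with a single memcpy. */
    static size_t
    read_one_line(const char *buf, size_t len, char *line_buf, char escapec)
    {
        size_t  end = find_line_end(buf, len, escapec);

        memcpy(line_buf, buf, end);
        return end;                     /* bytes consumed from buf */
    }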

CopyReadAllAttrs() is the equivalent of CopyReadAttribute(), except that it
reads ALL the attributes of the line at once. It stores all of them in a
byte buffer called attr_bytebuf, with the offsets to the beginnings of the
attributes in the attr_offsets array. There is a pretty detailed description
in the function header.

Why is it faster: here lies probably the biggest improvement. For one thing,
all the work is focused (all attributes parsed at once) - fewer function
calls and conditions. Further, the null-marker size is determined ahead of
time, to save multiple calls to strlen (you'd be surprised how slow that can
make things on some occasions). Also, again, the attribute data is loaded in
big chunks into the attribute buffer (one chunk if there are no escapes in
the data) instead of a character at a time. This is another major speedup.
So, after all the attributes are parsed in one function, FunctionCall3 will
find them by pointing into the attribute array using the attribute location
offsets in attr_offsets.
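
To make the attr_bytebuf/attr_offsets layout concrete, here is a compact
standalone sketch (hypothetical names; escape handling and null markers are
deliberately left out):

    #include <stddef.h>
    #include <string.h>

    /*
     * Split line[0..linelen-1] on delim.  Attribute values are appended,
     * NUL-terminated, to attr_buf (which must have room for linelen +
     * max_attrs bytes), and attr_offsets[i] records where attribute i
     * starts inside attr_buf.  Returns the number of attributes found.
     */
    static int
    split_attrs(const char *line, size_t linelen, char delim,
                char *attr_buf, int *attr_offsets, int max_attrs)
    {
        size_t  start = 0;
        size_t  out = 0;
        int     nattrs = 0;
        size_t  i;

        for (i = 0; i <= linelen && nattrs < max_attrs; i++)
        {
            if (i < linelen && line[i] != delim)
                continue;                       /* still inside an attribute */

            attr_offsets[nattrs++] = (int) out; /* remember where it starts */
            memcpy(attr_buf + out, line + start, i - start);    /* bulk copy */
            out += i - start;
            attr_buf[out++] = '\0';
            start = i + 1;
        }
        return nattrs;
    }

The conversion step would then hand attr_buf + attr_offsets[i] to each
column's input function, which is what the last sentence above means by
finding the attributes through attr_offsets.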

There is a function (at the bottom of the file) I wrote that is like the C
library strchr(), but it does not stop scanning when the string has embedded
nulls in it; it stops only at end-of-line characters or the end of the input
buffer. It's pretty simple and fast, but there may be a way to optimize it
further.
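
The essence of such a helper is a length-bounded character scan, sketched
here with a hypothetical name:

    #include <stddef.h>

    /*
     * Like strchr(), but bounded by len rather than by a terminating NUL,
     * so embedded zero bytes in the buffer are treated as ordinary data.
     * Returns a pointer to the first occurrence of c, or NULL if it does
     * not appear within len bytes.
     */
    static const char *
    scan_for_char(const char *s, size_t len, char c)
    {
        const char *end = s + len;

        for (; s < end; s++)
        {
            if (*s == c)
                return s;
        }
        return NULL;
    }

(The standard memchr() performs the same bounded scan; the version described
above reportedly also knows how to step over multi-byte characters when the
encoding requires it.)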

WRT encodings: conversion to the server encoding still happens for every
line if necessary. However, checking mblen only needs to occur if our client
encoding is one of the 5 non-server-supported encodings, not in every case
where the encodings are different (although I don't think that previously
harmed performance, this probably helps a bit by avoiding extra per-character
conditionals). In any case, the strchrlen function skips multi-byte chars
appropriately as well.

If I remember more relevant things I'll post them here.

Hope that helps, please ask more if there are unclear things!

Alon.





>
> Maybe I have missed that in previous discussion, but this change is
> sufficiently invasive that I think you owe that to the reviewers.
>
> cheers
>
> andrew
>
> cheers
>
> andrew
>
>
>



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Alon,

> If I remember more relevant things I'll post them here.
>
> Hope that helps, please ask more if there are unclear things!

Good detail, very useful!

- Luke



Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
Just remembered another thing.

A struct "bytebuffer" is used instead of a StringInfoData for storing the
line and attributes. A StringInfoData is actually really cool and useful,
but we don't really need it's formatting capabilities in COPY FROM (as far
as I know), and so the bytebuffer is more straightfoward and faster.

Alon.


On 6/25/05 3:52 PM, "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:

>
>
>> I haven't seen any demand for users to be allowed to specify an escape char
>> in text mode (BTW, that's what the docs call what you have called delimited
>> mode). Maybe there's a case for it, but I somewhat doubt it.
>
> I ran into users who could not load their data using COPY because their
> data included a lot of embedded '\\'... These users would like to choose
> another char to be the escape, or maybe even choose to disable escaping
> altogether. However, this is probably a separate discussion that we should
> think about in the future and see if it's worth introducing.
>
>> As for variable names, choose some variant that seems reasonable. If we need
>> to distinguish we should use csv_escapec and text_escapec. But I'm not sure
>> we can't use the same variable name for both cases, unless they would have
>> conflicting scopes.
> Cool. I don't think the scopes are conflicting at the moment.
>
>> What I would like to have is a high level description of
>> . how the new text mode code differs from the old text mode code, and
>> . which part of the change is responsible for how much performance gain.
>
> Fair enough!
>
> Where the code differs and why it's faster:
> -------------------------------------------
>
> The changes in the code only touch the *parsing* parts for *text* format.
> All the rest of the COPY process remains the same. More specifically:
>
> CopyFrom() still handles BINARY and CSV, and I extracted TEXT mode to
> another function called CopyFromDelimited() (referring to delimited text
> format). This function calls the rest of the modified functions below:
>
> CopyGetData() was modified to return the number of bytes actually read
> successfully. Rather than asking for a char at a time, the text parsing code
> asks for big blocks of data at a time, and needs to know how much data was
> accepted into the input buffer (input_buf) for parsing. CopyFrom still can
> call CopyGetData and not care about the return value, like it did before.
>
> Why is it faster: Instead of a calling the CopyGetData for every single byte
> (involves huge amount of function calls, conditionals, and getc() ) we call
> this function once for every buffer (64K) and then start parsing the buffer.
>
> CopyReadLineBuffered is a modified version of CopyReadLine and is used by
> CopyFromDelimited to load a line at a time into the line buffer
> line_bytebuf. Very similar to CopyReadLine, but:
>
> Why is it faster: beside to reason I explained above (non char-at-a-time
> read) this function copies the line of data in it's entirety to the line
> buffer, instead of a char-at-a-time with AppendStringInfoMacro. The gain
> here is very large. Also, instead of checking for if '\n', if '\r', if '\\',
> if '.'... For every character, we only look for the first character of the
> line end sequence and when finding it we look backwards or forwards to see
> if it's escaped etc..
>
> CopyReadAllAttrs() is the equivalent of CopyReadAttribute(). However it
> reads ALL the attributes of the line at once. It stores all of them in a
> byte buffer called attr_bytebuf and the offsets to beginnings of the
> attributes in attr_offsets array. There is a pretty detailed description in
> the function header.
>
> Why is it faster: Here lies probably the biggest improvement. For one thing
> all the work is focused (all attributes parsed at once) - less function
> calls and conditions. Further, the null mark side is determined ahead of
> time to save from multiple calls to strlen (you'll be surprised how slow it
> can make things in some occasions). Also, again, the attribute data is
> loaded big chunks to the attribute buffer (1 chunk if there are no escapes
> in the data) instead of a character at a time. This is another major
> speedup. So, after parsing all attributes in one function, functioncall3
> will find them by pointing to the attribute array using the attribute
> location pointers in attr_offsets.
>
> There is a function (bottom of the file) I wrote that is like the c library
> strchr() but instead allows you to not stop scanning when the string has
> embedded nulls in it, but stop only at end-of-line characters or end of
> input buffer. It's pretty simple and fast, but there may be a way to
> optimize it further.
>
> WRT encodings. Conversion to server encoding still happens for every line if
> necessary. However checking for mblen only needs to occur if our client
> encoding is one of the 5 non-server-supported encodings, not for every case
> that the encodings are different (although I don't think it previously
> harmed the performance, it probably helped a bit by avoiding extra per
> character conditionals). In any case, the strchrlen function skips mb chars
> appropriately as well.
>
> If I remember more relevant things I'll post them here.
>
> Hope that helps, please ask more if there are unclear things!
>
> Alon.
>
>
>
>
>
>>
>> Maybe I have missed that in previous discussion, but this change is
>> sufficiently invasive that I think you owe that to the reviewers.
>>
>> cheers
>>
>> andrew
>>
>> cheers
>>
>> andrew
>>
>>
>>
>
>
>



Re: COPY FROM performance improvements

From
Tom Lane
Date:
"Michael Paesold" <mpaesold@gmx.at> writes:
> What about this one:
>     case COPY_OLD_FE:
> +    ereport(ERROR,
> +      (errcode(ERRCODE_CONNECTION_FAILURE),
> +       errmsg("Old FE protocal not yet supported in fast COPY processing
> (CopyGetData())")));

> This can't stay like it is, can it?

Certainly not --- that omission will have to be fixed if this is to be
accepted.  However, I'd counsel Luke and Alon not to spend any time on
it until the other questions about the patch have been dealt with.

            regards, tom lane

Re: COPY FROM performance improvements

From
Tom Lane
Date:
"Alon Goldshuv" <agoldshuv@greenplum.com> writes:
> A struct "bytebuffer" is used instead of a StringInfoData for storing the
> line and attributes. A StringInfoData is actually really cool and useful,
> but we don't really need its formatting capabilities in COPY FROM (as far
> as I know), and so the bytebuffer is more straightforward and faster.

Is it really faster than appendStringInfoString or
appendBinaryStringInfo?

            regards, tom lane

Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
Hmm, now that I look back at them I can't remember why I thought it was
slower. Certainly using appendStringInfoCharMacro for every char is very
slow, but I could probably use appendStringInfoString and it should be as
fast as using the bytebuffer; they both do a straightforward memcpy.

Alon.


On 6/25/05 4:27 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Alon Goldshuv" <agoldshuv@greenplum.com> writes:
>> A struct "bytebuffer" is used instead of a StringInfoData for storing the
>> line and attributes. A StringInfoData is actually really cool and useful,
>> but we don't really need its formatting capabilities in COPY FROM (as far
>> as I know), and so the bytebuffer is more straightforward and faster.
>
> Is it really faster than appendStringInfoString or
> appendBinaryStringInfo?
>
> regards, tom lane
>



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Alon,

> Hmm, now that I look back at them I can't remember why I thought it was
> slower. Certainly using appendStringInfoCharMacro for every char is very
> slow, but I could probably use appendStringInfoString and it should be as
> fast as using the bytebuffer; they both do a straightforward memcpy.

This is what it does:

    void
    appendStringInfoString(StringInfo str, const char *s)
    {
        appendBinaryStringInfo(str, s, strlen(s));
    }

Then it does the memcpy within appendBinaryStringInfo().  That is an extra
two function calls, one of them a strlen() over the string being appended,
for every copy into the buffer.  I'd expect this to be slower for some of
the use cases we have.

If the StringInfo API were extended to use lengths...

- Luke



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
> If the StringInfo API were extended to use lengths...

Er, like appendBinaryStringInfo() perhaps? ;-)

Looks like what we need is there; it only lacks the offset adjustment of our
bytebuffer.

We should redo copy.c with appendBinaryStringInfo(), I think.
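
For what it's worth, the call pattern would presumably look something like
this backend-style fragment (a sketch only; append_parsed_chunk and its
arguments are hypothetical, assuming the existing lib/stringinfo.h API):

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /*
     * Append a chunk whose length the parser already knows, so no strlen()
     * is needed (appendStringInfoString(buf, chunk) would measure it).
     */
    static void
    append_parsed_chunk(StringInfo line_buf, const char *chunk, int chunklen)
    {
        appendBinaryStringInfo(line_buf, chunk, chunklen);
    }

An initStringInfo()'d buffer plus calls like that would keep the bulk,
length-known copies while dropping the custom bytebuffer.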

- Luke



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

> Is it really faster than appendStringInfoString or
> appendBinaryStringInfo?

Apparently not, attached patch strips out the other bytebuffer and replaces
it with appendBinaryStringInfo.

- Luke


Attachment

Re: COPY FROM performance improvements

From
Bruce Momjian
Date:
Please change 'if(' to 'if (', and remove parentheses like this:

       for(start = s; (*s != c) && (s < (start + len)) ; s++)

My only other comment is, "Yow, that is a massive patch".

---------------------------------------------------------------------------

Luke Lonergan wrote:
> Tom,
>
> > Is it really faster than appendStringInfoString or
> > appendBinaryStringInfo?
>
> Apparently not, attached patch strips out the other bytebuffer and replaces
> it with appendBinaryStringInfo.
>
> - Luke
>

[ Attachment, skipping... ]


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Attached has spaces between if, for, and foreach and "(", e.g., "if(" is now
"if (".  It definitely looks better to me :-)

Massive patch - agreed.  Less bloated than it was yesterday, though.

What about the Protocol version 2?  Looks like it could be added back
without too much trouble.

- Luke


Attachment

Re: COPY FROM performance improvements

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> Attached has spaces between if, for, and foreach and "(", e.g., "if(" is now
> "if (".  It definitely looks better to me :-)
>
> Massive patch - agreed.  Less bloated than it was yesterday, though.

Good, thanks.

> What about the Protocol version 2?  Looks like it could be added back
> without too much trouble.

Well, there has been no discussion about removing version 2 support, so
it seems it is required.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Bruce,

> Well, there has been no discussion about removing version 2 support, so
> it seems it is required.

This should do it - see attached.

- Luke


Attachment

Re: COPY FROM performance improvements

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> Bruce,
>
> > Well, there has been no discussion about removing version 2 support, so
> > it seems it is required.
>
> This should do it - see attached.

Those parentheses are still there:

      for (start = s; (*s != c) && (s < (start + len)) ; s++)

It should be:

      for (start = s; *s != c && s < start + len; s++)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Bruce,
>
> Those parentheses are still there:
>
>       for (start = s; (*s != c) && (s < (start + len)) ; s++)
>
> It should be:
>
>       for (start = s; *s != c && s < start + len; s++)

Thanks - didn't understand what you'd meant before.  They're gone now.

- Luke


Attachment

Re: COPY FROM performance improvements

From
Alvaro Herrera
Date:
On Sun, Jun 26, 2005 at 09:17:14PM -0700, Luke Lonergan wrote:
> Bruce,
> >
> > Those parentheses are still there:
> >
> >       for (start = s; (*s != c) && (s < (start + len)) ; s++)
> >
> > It should be:
> >
> >       for (start = s; *s != c && s < start + len; s++)
>
> Thanks - didn't understand what you'd meant before.  They're gone now.

Am I the only one annoyed by the fact that the patch is not very nice to
80-columns-wide terminals?  It doesn't need to be a rigid rule but I
think the code looks much better if it's not too wide.  This code is
wide already, but I think we should be making it better, not the other
way around.

Also, your text editor seems to be messing the indentation of comments
when there are ( or other symbols in the comment text.  (Maybe this
doesn't matter a lot because pgindent will fix it, but still -- it makes
it slightly more difficult to read.)

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Alvaro,

> Am I the only one annoyed by the fact that the patch is not very nice to
> 80-columns-wide terminals?  It doesn't need to be a rigid rule but I
> think the code looks much better if it's not too wide.  This code is
> wide already, but I think we should be making it better, not the other
> way around.

Yup - fixed (as well as I can without mucking readability).

> Also, your text editor seems to be messing the indentation of comments
> when there are ( or other symbols in the comment text.  (Maybe this
> doesn't matter a lot because pgindent will fix it, but still -- it makes
> it slightly more difficult to read.)

Yah - I think I fixed several mis-indented comments.  I'm using vim with
tabstop=4.  I personally don't like tabs in text and would prefer them
expanded using spaces, but that's a nice way to make small formatting
changes look huge in a cvs diff.

See attached - only formatting changes included.

- Luke

Attachment

Re: COPY FROM performance improvements

From
Andrew Dunstan
Date:

Luke Lonergan wrote:

>Yah - I think I fixed several mis-indented comments.  I'm using vim with
>tabstop=4.  I personally don't like tabs in text and would prefer them
>expanded using spaces, but that's a nice way to make small formatting
>changes look huge in a cvs diff.
>
>
>

You might like to look at running pgindent (see src/tools/pgindent) over
the file before cutting a patch. Since this is usually run over each
file just before a release, the only badness should be things from
recent patches.

cheers

andrew

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Andrew,

> You might like to look at running pgindent (see src/tools/pgindent) over
> the file before cutting a patch. Since this is usually run over each
> file just before a release, the only badness should be things from
> recent patches.

I've attached two patches, one gained from running pgindent against the
current CVS tip copy.c (:-D) and one gained by running the COPY FROM perf
improvements through the same.  Nifty tool!

Only formatting changes in these.

- Luke


Attachment

Re: COPY FROM performance improvements

From
Andrew Dunstan
Date:

Luke Lonergan wrote:

>Andrew,
>
>
>
>>You might like to look at running pgindent (see src/tools/pgindent) over
>>the file before cutting a patch. Since this is usually run over each
>>file just before a release, the only badness should be things from
>>recent patches.
>>
>>
>
>I've attached two patches, one gained from running pgindent against the
>current CVS tip copy.c (:-D) and one gained by running the COPY FROM perf
>improvements through the same.  Nifty tool!
>
>Only formatting changes in these.
>
>
>
>
Luke,

Something strange has happened. I suspect that you've inadvertently used
GNU indent or an unpatched BSD indent. pgindent needs a special patched
BSD indent to work according to the PG standards - see the README

cheers

andrew

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Andrew,

> Something strange has happened. I suspect that you've inadvertently used
> GNU indent or an unpatched BSD indent. pgindent needs a special patched
> BSD indent to work according to the PG standards - see the README

OK - phew!  I generated new symbols for pgindent and fixed a bug in the awk
scripting within (diff attached) and ran against the CVS tip copy.c and got
only minor changes in formatting that appear to be consistent with the rest
of the code.  I pgindent'ed the COPY FROM performance modded code and it
looks good and tests good.

Only formatting changes to the previous patch for copy.c attached.

Patch to update pgindent with new symbols and fix a bug in an awk section
(extra \\ in front of a ')').

- Luke


Attachment

Re: COPY FROM performance improvements

From
Andrew Dunstan
Date:
Luke, Alon

OK, I'm going to apply the patch to my copy and try to get my head
around it. Meanwhile:
. we should not be describing things as "old" or "new".  The person
reading the code might have no knowledge of the history, and should not
need to.
. we should not have "slow" and "fast" either. We should have "text",
"csv" and "binary".

IOW, the patch comments look slightly as if they were intended for
after-the-fact application rather than incorporation into the main code.
fact application rather than incorporation into the main code.

Are you looking at putting CSV mode into the fast code? Please let me
know if you have questions about that. There are only a few days left to
whip this into shape.

cheers

andrew

Luke Lonergan wrote:

>Andrew,
>
>
>
>>Something strange has happened. I suspect that you've inadvertently used
>>GNU indent or an unpatched BSD indent. pgindent needs a special patched
>>BSD indent to work according to the PG standards - see the README
>>
>>
>
>OK - phew!  I generated new symbols for pgindent and fixed a bug in the awk
>scripting within (diff attached) and ran against the CVS tip copy.c and got
>only minor changes in formatting that appear to be consistent with the rest
>of the code.  I pgindent'ed the COPY FROM performance modded code and it
>looks good and tests good.
>
>Only formatting changes to the previous patch for copy.c attached.
>
>Patch to update pgindent with new symbols and fix a bug in an awk section
>(extra \\ in front of a ')').
>
>- Luke
>
>
>

Re: COPY FROM performance improvements

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> Patch to update pgindent with new symbols and fix a bug in an awk section
> (extra \\ in front of a ')').

Yea, that '\' wasn't needed.  I applied the following patch to use //
instead of "" for patterns, and removed the unneeded backslash.

I will update the typedefs in a separate commit.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/tools/pgindent/pgindent
===================================================================
RCS file: /cvsroot/pgsql/src/tools/pgindent/pgindent,v
retrieving revision 1.73
diff -c -c -r1.73 pgindent
*** src/tools/pgindent/pgindent    7 Oct 2004 14:15:50 -0000    1.73
--- src/tools/pgindent/pgindent    28 Jun 2005 23:12:07 -0000
***************
*** 50,62 ****
               if (NR >= 2)
                  print line1;
              if (NR >= 2 &&
!                 line2 ~ "^{[     ]*$" &&
!                 line1 !~ "^struct" &&
!                 line1 !~ "^enum" &&
!                 line1 !~ "^typedef" &&
!                 line1 !~ "^extern[     ][     ]*\"C\"" &&
!                 line1 !~ "=" &&
!                 line1 ~ "\)")
                  print "int    pgindent_func_no_var_fix;";
              line1 = line2;
          }
--- 50,62 ----
               if (NR >= 2)
                  print line1;
              if (NR >= 2 &&
!                 line2 ~ /^{[     ]*$/ &&
!                 line1 !~ /^struct/ &&
!                 line1 !~ /^enum/ &&
!                 line1 !~ /^typedef/ &&
!                 line1 !~ /^extern[     ][     ]*"C"/ &&
!                 line1 !~ /=/ &&
!                 line1 ~ /)/)
                  print "int    pgindent_func_no_var_fix;";
              line1 = line2;
          }
***************
*** 70,77 ****
              line2 = $0;
              if (skips > 0)
                  skips--;
!             if (line1 ~ "^#ifdef[     ]*__cplusplus" &&
!                 line2 ~ "^extern[     ]*\"C\"[     ]*$")
              {
                  print line1;
                  print line2;
--- 70,77 ----
              line2 = $0;
              if (skips > 0)
                  skips--;
!             if (line1 ~ /^#ifdef[     ]*__cplusplus/ &&
!                 line2 ~ /^extern[     ]*"C"[     ]*$/)
              {
                  print line1;
                  print line2;
***************
*** 81,88 ****
                  line2 = "";
                  skips = 2;
              }
!             else if (line1 ~ "^#ifdef[     ]*__cplusplus" &&
!                 line2 ~ "^}[     ]*$")
              {
                  print line1;
                  print "/* Close extern \"C\" */";
--- 81,88 ----
                  line2 = "";
                  skips = 2;
              }
!             else if (line1 ~ /^#ifdef[     ]*__cplusplus/ &&
!                 line2 ~ /^}[     ]*$/)
              {
                  print line1;
                  print "/* Close extern \"C\" */";
***************
*** 1732,1738 ****
  # work around misindenting of function with no variables defined
      awk '
      {
!         if ($0 ~ "^[     ]*int[     ]*pgindent_func_no_var_fix;")
          {
              if (getline && $0 != "")
                  print $0;
--- 1732,1738 ----
  # work around misindenting of function with no variables defined
      awk '
      {
!         if ($0 ~ /^[     ]*int[     ]*pgindent_func_no_var_fix;/)
          {
              if (getline && $0 != "")
                  print $0;
***************
*** 1751,1759 ****
  #            line3 = $0;
  #            if (skips > 0)
  #                skips--;
! #            if (line1 ~ "        *{$" &&
! #                line2 ~ "        *[^;{}]*;$" &&
! #                line3 ~ "        *}$")
  #            {
  #                print line2;
  #                line2 = "";
--- 1751,1759 ----
  #            line3 = $0;
  #            if (skips > 0)
  #                skips--;
! #            if (line1 ~ /        *{$/ &&
! #                line2 ~ /        *[^;{}]*;$/ &&
! #                line3 ~ /        *}$/)
  #            {
  #                print line2;
  #                line2 = "";
***************
*** 1778,1786 ****
              line3 = $0;
              if (skips > 0)
                  skips--;
!             if (line1 ~ "    *{$" &&
!                 line2 ~ "^$" &&
!                 line3 ~ "        */\\*$")
              {
                  print line1;
                  print line3;
--- 1778,1786 ----
              line3 = $0;
              if (skips > 0)
                  skips--;
!             if (line1 ~ /    *{$/ &&
!                 line2 ~ /^$/ &&
!                 line3 ~ /        *\/\*$/)
              {
                  print line1;
                  print line3;
***************
*** 1819,1826 ****
              line2 = $0;
              if (skips > 0)
                  skips--;
!             if (line1 ~ "^$" &&
!                 line2 ~ "^#endif")
              {
                  print line2;
                  line2 = "";
--- 1819,1826 ----
              line2 = $0;
              if (skips > 0)
                  skips--;
!             if (line1 ~ /^$/ &&
!                 line2 ~ /^#endif/)
              {
                  print line2;
                  line2 = "";
***************
*** 1844,1850 ****
              line1 = line2;
          }
          END {
!             if (NR >= 1 && line2 ~ "^#endif")
                  printf "\n";
              print line1;
          }' |
--- 1844,1850 ----
              line1 = line2;
          }
          END {
!             if (NR >= 1 && line2 ~ /^#endif/)
                  printf "\n";
              print line1;
          }' |
***************
*** 1853,1868 ****
  #  like real functions.
      awk '    BEGIN    {paren_level = 0}
      {
!         if ($0 ~ /^[a-zA-Z_][a-zA-Z_0-9]*[^\(]*$/)
          {
              saved_len = 0;
              saved_lines[++saved_len] = $0;
              if ((getline saved_lines[++saved_len]) == 0)
                  print saved_lines[1];
              else
!             if (saved_lines[saved_len] !~ /^[a-zA-Z_][a-zA-Z_0-9]*\(/ ||
!                 saved_lines[saved_len] ~  /^[a-zA-Z_][a-zA-Z_0-9]*\(.*\)$/ ||
!                 saved_lines[saved_len] ~  /^[a-zA-Z_][a-zA-Z_0-9]*\(.*\);$/)
              {
                  print saved_lines[1];
                  print saved_lines[2];
--- 1853,1868 ----
  #  like real functions.
      awk '    BEGIN    {paren_level = 0}
      {
!         if ($0 ~ /^[a-zA-Z_][a-zA-Z_0-9]*[^(]*$/)
          {
              saved_len = 0;
              saved_lines[++saved_len] = $0;
              if ((getline saved_lines[++saved_len]) == 0)
                  print saved_lines[1];
              else
!             if (saved_lines[saved_len] !~ /^[a-zA-Z_][a-zA-Z_0-9]*(/ ||
!                 saved_lines[saved_len] ~  /^[a-zA-Z_][a-zA-Z_0-9]*(.*)$/ ||
!                 saved_lines[saved_len] ~  /^[a-zA-Z_][a-zA-Z_0-9]*(.*);$/)
              {
                  print saved_lines[1];
                  print saved_lines[2];
***************
*** 1879,1885 ****
                  }
                  for (i=1; i <= saved_len; i++)
                  {
!                     if (i == 1 && saved_lines[saved_len] ~ /\);$/)
                      {
                          printf "%s", saved_lines[i];
                          if (substr(saved_lines[i], length(saved_lines[i]),1) != "*")
--- 1879,1885 ----
                  }
                  for (i=1; i <= saved_len; i++)
                  {
!                     if (i == 1 && saved_lines[saved_len] ~ /);$/)
                      {
                          printf "%s", saved_lines[i];
                          if (substr(saved_lines[i], length(saved_lines[i]),1) != "*")

Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Patch attached.

Greetings,
Alon.



Attachment

Re: COPY FROM performance improvements

From
Andrew Dunstan
Date:

Alon Goldshuv wrote:

>I revisited my patch and removed the code duplications that were there, and
>added support for CSV with buffered input, so CSV now runs faster too
>(although it is not as optimized as the TEXT format parsing). So now
>TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
>
>Patch attached.
>
>
>

I do not have time to review this 2900-line patch analytically, nor to
benchmark it. I have done some functional testing of it on Windows, and
tried to break it in text and CSV modes, with both Unix and Windows style
line endings - I have not observed any breakage.

This does need lots of eyeballs, though.

cheers

andrew

Re: COPY FROM performance improvements

From
Mark Wong
Date:
On Thu, 14 Jul 2005 17:22:18 -0700
"Alon Goldshuv" <agoldshuv@greenplum.com> wrote:

> I revisited my patch and removed the code duplications that were there, and
> added support for CSV with buffered input, so CSV now runs faster too
> (although it is not as optimized as the TEXT format parsing). So now
> TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.

Mark

Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary largely depending on the ratio of parsing -to- converting and
inserting.

Therefore, the speed increase really depends on the nature of your data:

100GB file with
long data rows (lots of parsing)
Small number of columns (small number of attr conversions per row)
less rows (less tuple insertions)

Will show the best performance improvements.

However, same file size 100GB with
Short data rows (minimal parsing)
large number of columns (large number of attr conversions per row)
AND/OR
more rows (more tuple insertions)

Will show improvements but not as significant.
In general I'll estimate 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
etc... This is for TEXT format. As for CSV, it may be faster but not as much
as I specified here. BINARY will stay the same as before.

HTH
Alon.






On 7/19/05 12:54 PM, "Mark Wong" <markw@osdl.org> wrote:

> On Thu, 14 Jul 2005 17:22:18 -0700
> "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>
>> I revisited my patch and removed the code duplications that were there, and
>> added support for CSV with buffered input, so CSV now runs faster too
>> (although it is not as optimized as the TEXT format parsing). So now
>> TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
>
> Hi Alon,
>
> I'm curious, what kind of system are you testing this on?  I'm trying to
> load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
> interested in the results you would expect.
>
> Mark
>



Re: COPY FROM performance improvements

From
Mark Wong
Date:
Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create
tables:

http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
    15000000 customer.tbl
   600037902 lineitem.tbl
          25 nation.tbl
   150000000 orders.tbl
    20000000 part.tbl
    80000000 partsupp.tbl
           5 region.tbl
     1000000 supplier.tbl

Thanks,
Mark

On Tue, 19 Jul 2005 14:05:56 -0700
"Alon Goldshuv" <agoldshuv@greenplum.com> wrote:

> Hi Mark,
>
> I improved the data *parsing* capabilities of COPY, and didn't touch the
> data conversion or data insertion parts of the code. The parsing improvement
> will vary largely depending on the ratio of parsing -to- converting and
> inserting.
>
> Therefore, the speed increase really depends on the nature of your data:
>
> 100GB file with
> long data rows (lots of parsing)
> Small number of columns (small number of attr conversions per row)
> less rows (less tuple insertions)
>
> Will show the best performance improvements.
>
> However, same file size 100GB with
> Short data rows (minimal parsing)
> large number of columns (large number of attr conversions per row)
> AND/OR
> more rows (more tuple insertions)
>
> Will show improvements but not as significant.
> In general I'll estimate 40%-95% improvement in load speed for the 1st case
> and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
> etc... This is for TEXT format. As for CSV, it may be faster but not as much
> as I specified here. BINARY will stay the same as before.
>
> HTH
> Alon.
>
>
>
>
>
>
> On 7/19/05 12:54 PM, "Mark Wong" <markw@osdl.org> wrote:
>
> > On Thu, 14 Jul 2005 17:22:18 -0700
> > "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
> >
> >> I revisited my patch and removed the code duplications that were there, and
> >> added support for CSV with buffered input, so CSV now runs faster too
> >> (although it is not as optimized as the TEXT format parsing). So now
> >> TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
> >
> > Hi Alon,
> >
> > I'm curious, what kind of system are you testing this on?  I'm trying to
> > load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
> > interested in the results you would expect.
> >
> > Mark
> >
>

Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
Mark,

Thanks for the info.

Yes, taking indexes out of the picture is a good idea for this purpose.

I can't really give a guess as to how fast the load rate should be. I don't
know how your system is configured, or all the hardware characteristics
(and even if I knew that info I may not be able to guess...). I am pretty
confident that the load will be faster than before - I'll risk that ;-)
Looking at your TPC-H sizes and metadata, I'll estimate that partsupp,
customer and orders will have the most significant increase in load rate.
You could start with those.

I guess the only way to really know is to try... Load several times with the
existing PG-COPY and then load several times with the patched COPY and
compare. I'll be curious to hear your results.

Thx,
Alon.




On 7/19/05 2:37 PM, "Mark Wong" <markw@osdl.org> wrote:

> Hi Alon,
>
> Yeah, that helps.  I just need to break up my scripts a little to just
> load the data and not build indexes.
>
> Is the following information good enough to give a guess about the data
> I'm loading, if you don't mind? ;)  Here's a link to my script to create
> tables:
> http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f
> 7f23437e432&path=scripts/pgsql/create_tables.sh.in
>
> File sizes:
> -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
> -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
> -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
> -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
> -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
> -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
> -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
> -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
>
> Number of rows:
> # wc -l *.tbl
>     15000000 customer.tbl
>    600037902 lineitem.tbl
>           25 nation.tbl
>    150000000 orders.tbl
>     20000000 part.tbl
>     80000000 partsupp.tbl
>            5 region.tbl
>      1000000 supplier.tbl
>
> Thanks,
> Mark
>
> On Tue, 19 Jul 2005 14:05:56 -0700
> "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>
>> Hi Mark,
>>
>> I improved the data *parsing* capabilities of COPY, and didn't touch the
>> data conversion or data insertion parts of the code. The parsing improvement
>> will vary largely depending on the ratio of parsing -to- converting and
>> inserting.
>>
>> Therefore, the speed increase really depends on the nature of your data:
>>
>> 100GB file with
>> long data rows (lots of parsing)
>> Small number of columns (small number of attr conversions per row)
>> less rows (less tuple insertions)
>>
>> Will show the best performance improvements.
>>
>> However, same file size 100GB with
>> Short data rows (minimal parsing)
>> large number of columns (large number of attr conversions per row)
>> AND/OR
>> more rows (more tuple insertions)
>>
>> Will show improvements but not as significant.
>> In general I'll estimate 40%-95% improvement in load speed for the 1st case
>> and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
>> etc... This is for TEXT format. As for CSV, it may be faster but not as much
>> as I specified here. BINARY will stay the same as before.
>>
>> HTH
>> Alon.
>>
>>
>>
>>
>>
>>
>> On 7/19/05 12:54 PM, "Mark Wong" <markw@osdl.org> wrote:
>>
>>> On Thu, 14 Jul 2005 17:22:18 -0700
>>> "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>>>
>>>> I revisited my patch and removed the code duplications that were there, and
>>>> added support for CSV with buffered input, so CSV now runs faster too
>>>> (although it is not as optimized as the TEXT format parsing). So now
>>>> TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original
>>>> file.
>>>
>>> Hi Alon,
>>>
>>> I'm curious, what kind of system are you testing this on?  I'm trying to
>>> load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
>>> interested in the results you would expect.
>>>
>>> Mark
>>>
>>
>



Re: COPY FROM performance improvements

From
Andrew Dunstan
Date:
Mark,

You should definitely not be doing this sort of thing, I believe:

CREATE TABLE orders (
    o_orderkey INTEGER,
    o_custkey INTEGER,
    o_orderstatus CHAR(1),
    o_totalprice REAL,
    o_orderDATE DATE,
    o_orderpriority CHAR(15),
    o_clerk CHAR(15),
    o_shippriority INTEGER,
    o_comment VARCHAR(79),
    PRIMARY KEY (o_orderkey))

Create the table with no constraints, load the data, then set up primary keys and whatever other constraints you want
using ALTER TABLE. Last time I did a load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup from
deferring constraint creation.


cheers

andrew



Mark Wong wrote:

>Hi Alon,
>
>Yeah, that helps.  I just need to break up my scripts a little to just
>load the data and not build indexes.
>
>Is the following information good enough to give a guess about the data
>I'm loading, if you don't mind? ;)  Here's a link to my script to create
>tables:

>http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in
>
>File sizes:
>-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
>-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
>-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
>-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
>-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
>-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
>-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
>-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
>
>Number of rows:
># wc -l *.tbl
>    15000000 customer.tbl
>   600037902 lineitem.tbl
>          25 nation.tbl
>   150000000 orders.tbl
>    20000000 part.tbl
>    80000000 partsupp.tbl
>           5 region.tbl
>     1000000 supplier.tbl
>
>Thanks,
>Mark
>
>On Tue, 19 Jul 2005 14:05:56 -0700
>"Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>
>
>
>>Hi Mark,
>>
>>I improved the data *parsing* capabilities of COPY, and didn't touch the
>>data conversion or data insertion parts of the code. The parsing improvement
>>will vary largely depending on the ratio of parsing -to- converting and
>>inserting.
>>
>>Therefore, the speed increase really depends on the nature of your data:
>>
>>100GB file with
>>long data rows (lots of parsing)
>>Small number of columns (small number of attr conversions per row)
>>less rows (less tuple insertions)
>>
>>Will show the best performance improvements.
>>
>>However, same file size 100GB with
>>Short data rows (minimal parsing)
>>large number of columns (large number of attr conversions per row)
>>AND/OR
>>more rows (more tuple insertions)
>>
>>Will show improvements but not as significant.
>>In general I'll estimate 40%-95% improvement in load speed for the 1st case
>>and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
>>etc... This is for TEXT format. As for CSV, it may be faster but not as much
>>as I specified here. BINARY will stay the same as before.
>>
>>HTH
>>Alon.
>>
>>
>>
>>
>>
>>
>>On 7/19/05 12:54 PM, "Mark Wong" <markw@osdl.org> wrote:
>>
>>
>>
>>>On Thu, 14 Jul 2005 17:22:18 -0700
>>>"Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>>>
>>>
>>>
>>>>I revisited my patch and removed the code duplications that were there, and
>>>>added support for CSV with buffered input, so CSV now runs faster too
>>>>(although it is not as optimized as the TEXT format parsing). So now
>>>>TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
>>>>
>>>>
>>>Hi Alon,
>>>
>>>I'm curious, what kind of system are you testing this on?  I'm trying to
>>>load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
>>>interested in the results you would expect.
>>>
>>>Mark
>>>
>>>
>>>
>
>
>
>

Re: COPY FROM performance improvements

From
Mark Wong
Date:
Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.

Mark

On Tue, 19 Jul 2005 18:17:52 -0400
Andrew Dunstan <andrew@dunslane.net> wrote:

> Mark,
>
> You should definitely not be doing this sort of thing, I believe:
>
> CREATE TABLE orders (
>     o_orderkey INTEGER,
>     o_custkey INTEGER,
>     o_orderstatus CHAR(1),
>     o_totalprice REAL,
>     o_orderDATE DATE,
>     o_orderpriority CHAR(15),
>     o_clerk CHAR(15),
>     o_shippriority INTEGER,
>     o_comment VARCHAR(79),
>     PRIMARY KEY (o_orderkey))
>
> Create the table with no constraints, load the data, then set up primary keys and whatever other constraints you want
> using ALTER TABLE. Last time I did a load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup from
> deferring constraint creation.
>
>
> cheers
>
> andrew
>
>
>
> Mark Wong wrote:
>
> >Hi Alon,
> >
> >Yeah, that helps.  I just need to break up my scripts a little to just
> >load the data and not build indexes.
> >
> >Is the following information good enough to give a guess about the data
> >I'm loading, if you don't mind? ;)  Here's a link to my script to create
> >tables:
>
>http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in
> >
> >File sizes:
> >-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
> >-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
> >-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
> >-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
> >-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
> >-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
> >-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
> >-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
> >
> >Number of rows:
> ># wc -l *.tbl
> >    15000000 customer.tbl
> >   600037902 lineitem.tbl
> >          25 nation.tbl
> >   150000000 orders.tbl
> >    20000000 part.tbl
> >    80000000 partsupp.tbl
> >           5 region.tbl
> >     1000000 supplier.tbl
> >
> >Thanks,
> >Mark
> >
> >On Tue, 19 Jul 2005 14:05:56 -0700
> >"Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
> >
> >
> >
> >>Hi Mark,
> >>
> >>I improved the data *parsing* capabilities of COPY, and didn't touch the
> >>data conversion or data insertion parts of the code. The parsing improvement
> >>will vary largely depending on the ratio of parsing -to- converting and
> >>inserting.
> >>
> >>Therefore, the speed increase really depends on the nature of your data:
> >>
> >>100GB file with
> >>long data rows (lots of parsing)
> >>Small number of columns (small number of attr conversions per row)
> >>less rows (less tuple insertions)
> >>
> >>Will show the best performance improvements.
> >>
> >>However, same file size 100GB with
> >>Short data rows (minimal parsing)
> >>large number of columns (large number of attr conversions per row)
> >>AND/OR
> >>more rows (more tuple insertions)
> >>
> >>Will show improvements but not as significant.
> >>In general I'll estimate 40%-95% improvement in load speed for the 1st case
> >>and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
> >>etc... This is for TEXT format. As for CSV, it may be faster but not as much
> >>as I specified here. BINARY will stay the same as before.
> >>
> >>HTH
> >>Alon.
> >>
> >>
> >>
> >>
> >>
> >>
> >>On 7/19/05 12:54 PM, "Mark Wong" <markw@osdl.org> wrote:
> >>
> >>
> >>
> >>>On Thu, 14 Jul 2005 17:22:18 -0700
> >>>"Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
> >>>
> >>>
> >>>
> >>>>I revisited my patch and removed the code duplications that were there, and
> >>>>added support for CSV with buffered input, so CSV now runs faster too
> >>>>(although it is not as optimized as the TEXT format parsing). So now
> >>>>TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
> >>>>
> >>>>
> >>>Hi Alon,
> >>>
> >>>I'm curious, what kind of system are you testing this on?  I'm trying to
> >>>load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
> >>>interested in the results you would expect.
> >>>
> >>>Mark
> >>>
> >>>
> >>>
> >

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Good points on all, another element in the performance expectations is the
ratio of CPU speed to I/O subsystem speed, as Alon had hinted earlier.

This patch substantially (500%) improves the efficiency of parsing in the
COPY path, which, on a 3GHz P4 desktop with a commodity disk drive
represents 8 of a total of 30 seconds of processing time.  So, by reducing
the parsing time from 8 seconds to 1.5 seconds, the overall COPY time is
reduced from 30 seconds to 23.5 seconds, or a speedup of about 20%.

On a dual 2.2GHz Opteron machine with a 6-disk SCSI RAID subsystem capable
of 240MB/s sequential read and writes, the ratios change and we see between
35% and 95% increase in COPY performance, with the bottleneck being CPU.
The disk is only running at about 90MB/s during this period.

I'd expect that as your CPUs slow down relative to your I/O speed, and
Itaniums or IT2s are quite slow, you should see an increased effect of the
parsing improvements.

One good way to validate the effect is to watch the I/O bandwidth using
vmstat 1 (on Linux) while the load is progressing.  When you watch that with
the unpatched source and with the patched source, if they are the same, you
should see no benefit from the patch (you are I/O limited).

If you check your underlying sequential write speed, you will be
bottlenecked at roughly half that in performing COPY because of the
write-through the WAL.

- Luke

On 7/19/05 3:51 PM, "Mark Wong" <markw@osdl.org> wrote:

> Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.
>
> Mark
>
> On Tue, 19 Jul 2005 18:17:52 -0400
> Andrew Dunstan <andrew@dunslane.net> wrote:
>
>> Mark,
>>
>> You should definitely not be doing this sort of thing, I believe:
>>
>> CREATE TABLE orders (
>> o_orderkey INTEGER,
>> o_custkey INTEGER,
>> o_orderstatus CHAR(1),
>> o_totalprice REAL,
>> o_orderDATE DATE,
>> o_orderpriority CHAR(15),
>> o_clerk CHAR(15),
>> o_shippriority INTEGER,
>> o_comment VARCHAR(79),
>> PRIMARY KEY (o_orderkey))
>>
>> Create the table with no constraints, load the data, then set up primary keys
>> and whatever other constraints you want using ALTER TABLE. Last time I did a
>> load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup
>> from deferring constraint creation.
>>
>>
>> cheers
>>
>> andrew
>>
>>
>>
>> Mark Wong wrote:
>>
>>> Hi Alon,
>>>
>>> Yeah, that helps.  I just need to break up my scripts a little to just
>>> load the data and not build indexes.
>>>
>>> Is the following information good enough to give a guess about the data
>>> I'm loading, if you don't mind? ;)  Here's a link to my script to create
>>> tables:
>>> http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb4
>>> 4f7f23437e432&path=scripts/pgsql/create_tables.sh.in
>>>
>>> File sizes:
>>> -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
>>> -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
>>> -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
>>> -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
>>> -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
>>> -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
>>> -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
>>> -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
>>>
>>> Number of rows:
>>> # wc -l *.tbl
>>>    15000000 customer.tbl
>>>   600037902 lineitem.tbl
>>>          25 nation.tbl
>>>   150000000 orders.tbl
>>>    20000000 part.tbl
>>>    80000000 partsupp.tbl
>>>           5 region.tbl
>>>     1000000 supplier.tbl
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Tue, 19 Jul 2005 14:05:56 -0700
>>> "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>>>
>>>
>>>
>>>> Hi Mark,
>>>>
>>>> I improved the data *parsing* capabilities of COPY, and didn't touch the
>>>> data conversion or data insertion parts of the code. The parsing
>>>> improvement
>>>> will vary largely depending on the ratio of parsing -to- converting and
>>>> inserting.
>>>>
>>>> Therefore, the speed increase really depends on the nature of your data:
>>>>
>>>> 100GB file with
>>>> long data rows (lots of parsing)
>>>> Small number of columns (small number of attr conversions per row)
>>>> less rows (less tuple insertions)
>>>>
>>>> Will show the best performance improvements.
>>>>
>>>> However, same file size 100GB with
>>>> Short data rows (minimal parsing)
>>>> large number of columns (large number of attr conversions per row)
>>>> AND/OR
>>>> more rows (more tuple insertions)
>>>>
>>>> Will show improvements but not as significant.
>>>> In general I'll estimate 40%-95% improvement in load speed for the 1st case
>>>> and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
>>>> etc... This is for TEXT format. As for CSV, it may be faster but not as
>>>> much
>>>> as I specified here. BINARY will stay the same as before.
>>>>
>>>> HTH
>>>> Alon.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 7/19/05 12:54 PM, "Mark Wong" <markw@osdl.org> wrote:
>>>>
>>>>
>>>>
>>>>> On Thu, 14 Jul 2005 17:22:18 -0700
>>>>> "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> I revisited my patch and removed the code duplications that were there,
>>>>>> and
>>>>>> added support for CSV with buffered input, so CSV now runs faster too
>>>>>> (although it is not as optimized as the TEXT format parsing). So now
>>>>>> TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original
>>>>>> file.
>>>>>>
>>>>>>
>>>>> Hi Alon,
>>>>>
>>>>> I'm curious, what kind of system are you testing this on?  I'm trying to
>>>>> load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
>>>>> interested in the results you would expect.
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>
>
>



Re: COPY FROM performance improvements

From
Mark Wong
Date:
I just ran through a few tests with the v14 patch against 100GB of data
from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
give a few details, I only loaded data and started a COPY in parallel
for each the data files:
    http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/

Here's a visual of my disk layout, for those familiar with the database schema:
    http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4-010-dbt3.html

I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.

Let me know if you have any questions.

Mark

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Cool!

At what rate does your disk setup write sequential data, e.g.:
  time dd if=/dev/zero of=bigfile bs=8k count=500000

(sized for 2x RAM on a system with 2GB)

BTW - the Compaq smartarray controllers are pretty broken on Linux from a
performance standpoint in our experience.  We've had disastrously bad
results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
2.6.10, on the order of 20MB/s.

For comparison, the results on our dual opteron with a single LSI SCSI
controller with software RAID0 on a 2.6.10 kernel:

[llonergan@stinger4 dbfast]$ time dd if=/dev/zero of=bigfile bs=8k
count=500000
500000+0 records in
500000+0 records out

real    0m24.702s
user    0m0.077s
sys     0m8.794s

Which calculates out to about 161MB/s.

- Luke


On 7/21/05 2:55 PM, "Mark Wong" <markw@osdl.org> wrote:

> I just ran through a few tests with the v14 patch against 100GB of data
> from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
> give a few details, I only loaded data and started a COPY in parallel
> for each the data files:
> http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/
>
> Here's a visual of my disk layout, for those familiar with the database
> schema:
> http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4
> -010-dbt3.html
>
> I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
> attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
>
> Let me know if you have any questions.
>
> Mark
>



Re: COPY FROM performance improvements

From
"Joshua D. Drake"
Date:
Luke Lonergan wrote:
> Cool!
>
> At what rate does your disk setup write sequential data, e.g.:
>   time dd if=/dev/zero of=bigfile bs=8k count=500000
>
> (sized for 2x RAM on a system with 2GB)
>
> BTW - the Compaq smartarray controllers are pretty broken on Linux from a
> performance standpoint in our experience.  We've had disastrously bad
> results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
> 2.6.10, on the order of 20MB/s.

O.k. this strikes me as interesting, now we know that Compaq and Dell
are borked for Linux. Is there a name brand server (read Enterprise)
that actually does provide reasonable performance?

>
> For comparison, the results on our dual opteron with a single LSI SCSI
> controller with software RAID0 on a 2.6.10 kernel:
>
> [llonergan@stinger4 dbfast]$ time dd if=/dev/zero of=bigfile bs=8k
> count=500000
> 500000+0 records in
> 500000+0 records out
>
> real    0m24.702s
> user    0m0.077s
> sys     0m8.794s
>
> Which calculates out to about 161MB/s.
>
> - Luke
>
>
> On 7/21/05 2:55 PM, "Mark Wong" <markw@osdl.org> wrote:
>
>
>>I just ran through a few tests with the v14 patch against 100GB of data
>>from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
>>give a few details, I only loaded data and started a COPY in parallel
>>for each the data files:
>>http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/
>>
>>Here's a visual of my disk layout, for those familiar with the database
>>schema:
>>http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4
>>-010-dbt3.html
>>
>>I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
>>attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
>>
>>Let me know if you have any questions.
>>
>>Mark
>>
>
>
>
>


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Joshua,

On 7/21/05 5:08 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> O.k. this strikes me as interesting, now we know that Compaq and Dell
> are borked for Linux. Is there a name brand server (read Enterprise)
> that actually does provide reasonable performance?

I think late model Dell (post the bad chipset problem, circa 2001-2?) and
IBM and Sun servers are fine because they all use simple SCSI adapters from
LSI or Adaptec.

The HP Smartarray is an aberration, they don't have good driver support for
Linux and as a consequence have some pretty bad problems with both
performance and stability.  On Windows they perform quite well.

Also - there are very big issues with some SATA controllers and Linux we've
seen, particularly the Silicon Image, Highpoint and other non-Intel controllers.
Not sure about Nvidia, but the only ones I trust now are 3Ware and the
others mentioned in earlier posts.

- Luke





Re: COPY FROM performance improvements

From
"Joshua D. Drake"
Date:
> I think late model Dell (post the bad chipset problem, circa 2001-2?) and
> IBM and Sun servers are fine because they all use simple SCSI adapters from
> LSI or Adaptec.

Well I know that isn't true at least not with ANY of the Dells my
customers have purchased in the last 18 months. They are still really,
really slow.

> Also - there are very big issues with some SATA controllers and Linux we've
> seen, particularly the Silicon Image, Highpoint and other non-Intel controllers.
> Not sure about Nvidia, but the only ones I trust now are 3Ware and the
> others mentioned in earlier posts.

I have great success with Silicon Image as long as I am running them
with Linux software RAID. The LSI controllers are also really nice.

J



>
> - Luke
>
>
>


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Joshua,

On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
> Well I know that isn't true at least not with ANY of the Dells my
> customers have purchased in the last 18 months. They are still really,
> really slow.

That's too bad, can you cite some model numbers?  SCSI?

> I have great success with Silicon Image as long as I am running them
> with Linux software RAID. The LSI controllers are also really nice.

That's good to hear, I gave up on Silicon Image controllers on Linux about 1
year ago, which kernel are you using with success?  Silicon Image
controllers are the most popular, so it's important to see them supported
well, though I'd rather see more SATA headers than 2 off of the built-in
chipsets.

- Luke



Re: COPY FROM performance improvements

From
"Andrew Dunstan"
Date:
this discussion belongs on -performance

cheers

andrew


Luke Lonergan said:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
>> Well I know that isn't true at least not with ANY of the Dells my
>> customers have purchased in the last 18 months. They are still really,
>> really slow.
>
> That's too bad, can you cite some model numbers?  SCSI?
>
>> I have great success with Silicon Image as long as I am running them
>> with Linux software RAID. The LSI controllers are also really nice.
>
> That's good to hear, I gave up on Silicon Image controllers on Linux
> about 1 year ago, which kernel are you using with success?  Silicon
> Image
> controllers are the most popular, so it's important to see them
> supported well, though I'd rather see more SATA headers than 2 off of
> the built-in chipsets.
>
> - Luke




Re: COPY FROM performance improvements

From
"Joshua D. Drake"
Date:
Luke Lonergan wrote:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
>
>>Well I know that isn't true at least not with ANY of the Dells my
>>customers have purchased in the last 18 months. They are still really,
>>really slow.
>
>
> That's too bad, can you cite some model numbers?  SCSI?

Yeah I will get them and post, but yes they are all SCSI.

>
>
>>I have great success with Silicon Image as long as I am running them
>>with Linux software RAID. The LSI controllers are also really nice.
>
>
> That's good to hear, I gave up on Silicon Image controllers on Linux about 1
> year ago, which kernel are you using with success?

Any of the 2.6 kernels. Also the later 2.4 kernels (2.4.22+, I believe) support it
pretty well.

> Silicon Image
> controllers are the most popular, so it's important to see them supported
> well, though I'd rather see more SATA headers than 2 off of the built-in
> chipsets.
>
> - Luke
>


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: COPY FROM performance improvements

From
Patrick Welche
Date:
On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
> > Well I know that isn't true at least not with ANY of the Dells my
> > customers have purchased in the last 18 months. They are still really,
> > really slow.
>
> That's too bad, can you cite some model numbers?  SCSI?

I would be interested too, given

   http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30531


Cheers,

Patrick

Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
New patch attached. It includes very minor changes. These are changes that
were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
previous patch.

Alon.


Attachment

Re: COPY FROM performance improvements

From
Tom Lane
Date:
"Alon Goldshuv" <agoldshuv@greenplum.com> writes:
> New patch attached. It includes very minor changes. These are changes that
> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
> previous patch.

I've applied this with (rather extensive) revisions.  I didn't like what
you had done with the control structure --- loading the input buffer
only at the outermost loop level was a bad design choice IMHO.  You had
sprinkled the code with an unreasonable number of special cases in order
to try to cope with the effects of that mistake, but there were lots
of problems still left.  Some of the bugs I noticed:

* Broke old-protocol COPY, since that has no provision for stopping at
the EOF marker except by parsing the data carefully to start with.  The
backend would just hang up unless the total data size chanced to be a
multiple of 64K.

* Subtle change in interpretation of \. EOF marker (the existing code
will recognize it even when not at start of line).

* Seems to have thrown away detection of newline format discrepancies.

* Fails for zero-column tables.

* Broke display of column values during error context callback (would
always show the first column contents no matter which one is being
complained of).

* DetectLineEnd mistakenly assumes CR mode if very last character of first
bufferload is CR; need to reserve judgment until next char is available.

* DetectLineEnd fails to account for backslashed control characters,
so it will e.g. accept \ followed by \n as determining the newline
style.

* Fails to apply encoding conversion if first line exceeds copy buf
size, because when DetectLineEnd fails the quick-exit path doesn't do
it.

* There seem to be several bugs associated with the fact that input_buf[]
always has 64K of data in it even when EOF has been reached on the
input.  One example:
    echo -n 123 >zzz1
    psql> create temp table t1(f1 text);
    psql> copy t1 from '/home/tgl/zzz1';
    psql> select * from t1;
hmm ... where'd that 64K of whitespace come from?

I rewrote the patch in a way that retained most of the speedups without
changing the basic control structure (except for replacing multiple
CopyReadAttribute calls with one CopyReadAttributes call per line).

I had some difficulty in generating test cases that weren't largely
I/O-bound, but AFAICT the patch as applied is about the same speed
as what you submitted.

            regards, tom lane

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

Thanks for finding the bugs and reworking things.

> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.

You achieve the important objective of knocking the parsing stage down a
lot, but your parsing code is actually about 20% slower than Alon's.

Before your patch:
  Time: 14205.606 ms

With your patch:
  Time: 10565.374 ms

With Alon's patch:
  Time: 10289.845 ms

The parsing part of the code in your version is slower, but as a percentage
of the total it's hidden. The loss of 0.3 seconds on 143MB means:

- If parsing takes a total of 0.9 seconds, the parsing rate is 160MB/s
(143/0.9)

- If we add another 0.3 seconds to parsing to bring it to 1.2, then the
parsing rate becomes 120MB/s

When we improve the next stages of the processing (attribute conversion,
write-to disk), this will stand out a lot more.  Our objective is to get the
COPY rate *much* faster than the current poky rate of 14MB/s (after this
patch).

- Luke

On 8/6/05 2:04 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Alon Goldshuv" <agoldshuv@greenplum.com> writes:
>> New patch attached. It includes very minor changes. These are changes that
>> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
>> previous patch.
>
> I've applied this with (rather extensive) revisions.  I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO.  You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left.  Some of the bugs I noticed:
>
> * Broke old-protocol COPY, since that has no provision for stopping at
> the EOF marker except by parsing the data carefully to start with.  The
> backend would just hang up unless the total data size chanced to be a
> multiple of 64K.
>
> * Subtle change in interpretation of \. EOF marker (the existing code
> will recognize it even when not at start of line).
>
> * Seems to have thrown away detection of newline format discrepancies.
>
> * Fails for zero-column tables.
>
> * Broke display of column values during error context callback (would
> always show the first column contents no matter which one is being
> complained of).
>
> * DetectLineEnd mistakenly assumes CR mode if very last character of first
> bufferload is CR; need to reserve judgment until next char is available.
>
> * DetectLineEnd fails to account for backslashed control characters,
> so it will e.g. accept \ followed by \n as determining the newline
> style.
>
> * Fails to apply encoding conversion if first line exceeds copy buf
> size, because when DetectLineEnd fails the quick-exit path doesn't do
> it.
>
> * There seem to be several bugs associated with the fact that input_buf[]
> always has 64K of data in it even when EOF has been reached on the
> input.  One example:
> echo -n 123 >zzz1
> psql> create temp table t1(f1 text);
> psql> copy t1 from '/home/tgl/zzz1';
> psql> select * from t1;
> hmm ... where'd that 64K of whitespace come from?
>
> I rewrote the patch in a way that retained most of the speedups without
> changing the basic control structure (except for replacing multiple
> CopyReadAttribute calls with one CopyReadAttributes call per line).
>
> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.
>
> regards, tom lane
>
>



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

The previous timings were for a table with 15 columns of mixed type.  We
also test with 1 column to make the parsing overhead more apparent.  In the
case of 1 text column with 145MB of input data:

Your patch:
  Time: 6612.599 ms

Alon's patch:
  Time: 6119.244 ms


Alon's patch is 7.5% faster here, where it was only 3% faster on the 15
column case.  This is consistent with a large difference in parsing speed
between your approach and Alon's.

I'm pretty sure that the "mistake" you refer to is responsible for the speed
improvement, and was deliberately chosen to minimize memory copies, etc.
Given that we're looking ahead to getting much higher speeds, approaching
current high performance disk speeds, we've been looking more closely at the
parsing speed.  It comes down to a tradeoff between elegant code and speed.

We'll prove it in lab tests soon, where we measure the parsing rate
directly, but these experiments show it clearly, though indirectly.

- Luke



On 8/6/05 2:04 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Alon Goldshuv" <agoldshuv@greenplum.com> writes:
>> New patch attached. It includes very minor changes. These are changes that
>> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
>> previous patch.
>
> I've applied this with (rather extensive) revisions.  I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO.  You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left.  Some of the bugs I noticed:
>
> * Broke old-protocol COPY, since that has no provision for stopping at
> the EOF marker except by parsing the data carefully to start with.  The
> backend would just hang up unless the total data size chanced to be a
> multiple of 64K.
>
> * Subtle change in interpretation of \. EOF marker (the existing code
> will recognize it even when not at start of line).
>
> * Seems to have thrown away detection of newline format discrepancies.
>
> * Fails for zero-column tables.
>
> * Broke display of column values during error context callback (would
> always show the first column contents no matter which one is being
> complained of).
>
> * DetectLineEnd mistakenly assumes CR mode if very last character of first
> bufferload is CR; need to reserve judgment until next char is available.
>
> * DetectLineEnd fails to account for backslashed control characters,
> so it will e.g. accept \ followed by \n as determining the newline
> style.
>
> * Fails to apply encoding conversion if first line exceeds copy buf
> size, because when DetectLineEnd fails the quick-exit path doesn't do
> it.
>
> * There seem to be several bugs associated with the fact that input_buf[]
> always has 64K of data in it even when EOF has been reached on the
> input.  One example:
> echo -n 123 >zzz1
> psql> create temp table t1(f1 text);
> psql> copy t1 from '/home/tgl/zzz1';
> psql> select * from t1;
> hmm ... where'd that 64K of whitespace come from?
>
> I rewrote the patch in a way that retained most of the speedups without
> changing the basic control structure (except for replacing multiple
> CopyReadAttribute calls with one CopyReadAttributes call per line).
>
> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.
>
> regards, tom lane
>
>



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

My direct e-mails to you are apparently blocked, so I'll send this to the
list.

I've attached the case we use for load performance testing, with the data
generator modified to produce a single row version of the dataset.

I do believe that you/we will need to invert the processing loop to get the
maximum parsing speed.  We will be implementing much higher loading speeds
which require it to compete with Oracle, Netezza, Teradata, so we'll have to
work this out for the best interests of our users.

- Luke


Attachment

Re: COPY FROM performance improvements

From
Tom Lane
Date:
"Luke Lonergan" <llonergan@greenplum.com> writes:
>> I had some difficulty in generating test cases that weren't largely
>> I/O-bound, but AFAICT the patch as applied is about the same speed
>> as what you submitted.

> You achieve the important objective of knocking the parsing stage down a
> lot, but your parsing code is actually about 20% slower than Alon's.

I would like to see the exact test case you are using to make this
claim; the tests I did suggested my code is the same speed or faster.
The particular test case I was using was the "tenk1" data from the
regression database, duplicated out to about 600K rows so as to run
long enough to measure with some degree of repeatability.

As best I can tell, my version of CopyReadAttributes is significantly
quicker than Alon's, approximately balancing out the fact that my
version of CopyReadLine is slower.  I did the latter first, and would
now be tempted to rewrite it in the same style as CopyReadAttributes,
ie one pass of memory-to-memory copy using pointers rather than buffer
indexes.
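
[Purely as an illustration of the index-vs-pointer distinction described above -- this is not code from copy.c, and all names are invented:]

/* Toy example: extract one field two ways.  The index style tests and
 * stores a byte at a time; the pointer style finds the delimiter with a
 * single scan and moves the whole run with one memcpy. */
#include <stdio.h>
#include <string.h>

/* index style: per-byte test and per-byte store */
static size_t
get_field_by_index(const char *line, size_t len, char delim, char *out)
{
    size_t  i;

    for (i = 0; i < len && line[i] != delim; i++)
        out[i] = line[i];
    out[i] = '\0';
    return i;
}

/* pointer style: one scan to find the delimiter, one memcpy to move it */
static size_t
get_field_by_pointer(const char *line, size_t len, char delim, char *out)
{
    const char *p = memchr(line, delim, len);
    size_t      n = p ? (size_t) (p - line) : len;

    memcpy(out, line, n);
    out[n] = '\0';
    return n;
}

int
main(void)
{
    char        out1[16], out2[16];
    const char *line = "42\thello";

    get_field_by_index(line, strlen(line), '\t', out1);
    get_field_by_pointer(line, strlen(line), '\t', out2);
    printf("index: \"%s\"  pointer: \"%s\"\n", out1, out2);   /* both print 42 */
    return 0;
}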

BTW, late today I figured out a way to get fairly reproducible
non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
the actual inserts, thus:

create table foo ...
create function noway() returns trigger as
'begin return null; end' language plpgsql;
create trigger noway before insert on foo
  for each row execute procedure noway();
then repeat:
copy foo from '/tmp/foo.data';

If the source file is not too large to fit in kernel disk cache, then
after the first iteration there is no I/O at all.  I got numbers
that were reproducible within less than 1%, as opposed to 5% or more
variation when the thing was partially I/O bound.  Pretty useless in the
real world, of course, but great for timing COPY's data-pushing.

            regards, tom lane

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

On 8/6/05 9:08 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Luke Lonergan" <llonergan@greenplum.com> writes:
>>> I had some difficulty in generating test cases that weren't largely
>>> I/O-bound, but AFAICT the patch as applied is about the same speed
>>> as what you submitted.
>
>> You achieve the important objective of knocking the parsing stage down a
>> lot, but your parsing code is actually about 20% slower than Alon's.
>
> I would like to see the exact test case you are using to make this
> claim; the tests I did suggested my code is the same speed or faster.

I showed mine - you show yours :-)  Apparently our e-mail crossed.

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower.  I did the latter first, and would
> now be tempted to rewrite it in the same style as CopyReadAttributes,
> ie one pass of memory-to-memory copy using pointers rather than buffer
> indexes.

See previous timings - looks like Alon's parsing is substantially faster.
However, I'd like him to confirm by running with the "shunt" placed at
different stages, in this case between parse and attribute conversion (not
attribute parse).

> BTW, late today I figured out a way to get fairly reproducible
> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
> the actual inserts, thus:
>
> create table foo ...
> create function noway() returns trigger as
> 'begin return null; end' language plpgsql;
> create trigger noway before insert on foo
>   for each row execute procedure noway();
> then repeat:
> copy foo from '/tmp/foo.data';

Cool!  That's a better way than hacking code and inserting shunts.

Alon will likely hit this tomorrow.

- Luke



Re: COPY FROM performance improvements

From
"Alon Goldshuv"
Date:
I did some performance checks after the recent code commit.

The good news is that the parsing speed of COPY is now MUCH faster, which is
great. It is about 5 times faster - about 100MB/sec on my machine
(previously 20MB/sec at best, usually less).

The better news is that my original patch parsing speed reaches 120MB/sec,
about 20MB/sec faster than the code that's now in CVS. This can be
significant in the long run and for large data sets. Maybe we
can improve the current code a bit more to reach this number.

I performed those measurements by executing *only the parsing logic* of the
COPY pipeline. All data conversion (functioncall3(string...)) and tuple
handling (form_heaptuple etc...) and insertion were manually disabled. So
the only code measured is reading from disk and parsing to the attribute
level.
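
[The thread doesn't show exactly how the conversion and insertion stages were disabled; as a purely hypothetical illustration of such a "parse-only" shunt (invented names, not Alon's actual change):]

/* Hypothetical parse-only shunt for timing experiments; not the real
 * COPY code.  The idea is simply to bail out after parsing so that
 * attribute conversion and tuple insertion are excluded from the
 * measurement. */
#include <stdbool.h>

#define COPY_TIMING_PARSE_ONLY 1

/* stand-ins for the real COPY stages */
static bool read_and_parse_next_line(void) { return false; }   /* pretend EOF */
static void convert_and_insert_tuple(void)  { }

static void
copy_from_loop(void)
{
    for (;;)
    {
        if (!read_and_parse_next_line())
            break;                      /* end of input */

#if COPY_TIMING_PARSE_ONLY
        continue;                       /* measure read + parse only */
#endif

        convert_and_insert_tuple();     /* normal COPY path */
    }
}

int
main(void)
{
    copy_from_loop();
    return 0;
}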

Cheers,
Alon.





On 8/7/05 1:21 AM, "Luke Lonergan" <llonergan@greenplum.com> wrote:

> Tom,
>
> On 8/6/05 9:08 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
>> "Luke Lonergan" <llonergan@greenplum.com> writes:
>>>> I had some difficulty in generating test cases that weren't largely
>>>> I/O-bound, but AFAICT the patch as applied is about the same speed
>>>> as what you submitted.
>>
>>> You achieve the important objective of knocking the parsing stage down a
>>> lot, but your parsing code is actually about 20% slower than Alon's.
>>
>> I would like to see the exact test case you are using to make this
>> claim; the tests I did suggested my code is the same speed or faster.
>
> I showed mine - you show yours :-)  Apparently our e-mail crossed.
>
>> As best I can tell, my version of CopyReadAttributes is significantly
>> quicker than Alon's, approximately balancing out the fact that my
>> version of CopyReadLine is slower.  I did the latter first, and would
>> now be tempted to rewrite it in the same style as CopyReadAttributes,
>> ie one pass of memory-to-memory copy using pointers rather than buffer
>> indexes.
>
> See previous timings - looks like Alon's parsing is substantially faster.
> However, I'd like him to confirm by running with the "shunt" placed at
> different stages, in this case between parse and attribute conversion (not
> attribute parse).
>
>> BTW, late today I figured out a way to get fairly reproducible
>> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
>> the actual inserts, thus:
>>
>> create table foo ...
>> create function noway() returns trigger as
>> 'begin return null; end' language plpgsql;
>> create trigger noway before insert on foo
>>   for each row execute procedure noway();
>> then repeat:
>> copy foo from '/tmp/foo.data';
>
> Cool!  That's a better way than hacking code and inserting shunts.
>
> Alon will likely hit this tomorrow.
>
> - Luke
>



Re: COPY FROM performance improvements

From
Andrew Dunstan
Date:

Alon Goldshuv wrote:

>I performed those measurements by executing *only the parsing logic* of the
>COPY pipeline. All data conversion (functioncall3(string...)) and tuple
>handling (form_heaptuple etc...) and insertion were manually disabled. So
>the only code measured is reading from disk and parsing to the attribute
>level.
>
>

Arguably this might exaggerate the effect quite significantly.  Users
will want to know the real time effect on a complete COPY. Depending on
how much of the total time the parsing accounts for, your 20% improvement in
parsing might translate into only a small fraction of that in the overall COPY.

Like you, I'm happy we have seen a 5 times improvement in parsing. Is it
possible you can factor out something smallish from your patch that
might make up the balance?

cheers

andrew

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Andrew,

> Arguably this might exaggerate the effect quite significantly.  Users
> will want to know the real time effect on a complete COPY. Depending on
> how much of the total time the parsing accounts for, your 20% improvement in
> parsing might translate into only a small fraction of that in the overall COPY.

Arguably has already been argued.  We knew this because we profiled the
entire COPY process and came up with this approx. breakdown for a specific
case:
Parsing:                25%
Attribute Conversion:   40%
Data Insertion:         35%

Net copy rate: 8 MB/s on a filesystem that does 240 MB/s

So - if we speed up parsing by 500% or 450%, the end result is about a
20-30% speed increase in the overall process.

Note that we're still a *long* way from getting anywhere near the limit of
the I/O subsystem at 12 MB/s.  Oracle can probably get 5-8 times this data
rate, if not better.

The attribute conversion logic is also very slow and needs similar
improvements.

The reason we focused first on Parsing is that our MPP version of Bizgres
will reach data loading rates approaching the parsing speed, so we needed to
improve that part to get it out of the way.

We will continue to improve COPY speed in Bizgres so that we can provide
comparable COPY performance to Oracle and MySQL.

> Like you, I'm happy we have seen a 5 times improvement in parsing. Is it
> possible you can factor out something smallish from your patch that
> might make up the balance?

That parsing was 25% of the workload was traceable to 3 main things:
1) Per character acquisition from source instead of buffering
2) Frequent interruption of the parsing pipeline to handle attribute
conversion
3) Lack of micro-parallelism in the character finding logic

Tom's patch took our contribution from (1) and (2) and his improvements, and
he rejected (3).  The net result is that we lost performance from the lack
of (3) but gained performance from his improvements of (1) and (2).

I believe that re-introducing (3) may bring us from 100 MB/s to 150 MB/s
parsing speed.
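
[An illustrative C sketch of points (1) and (3) above -- per-character fgetc() versus a large buffered read scanned in a tight loop.  This is a toy line counter, not the parser from either patch:]

/* Toy example only: count newlines two ways.  The buffered version reads
 * large blocks and scans each bufferload with memchr(), giving the
 * compiler and libc long, predictable runs of work instead of one libc
 * call per input byte. */
#include <stdio.h>
#include <string.h>

#define COPY_BUF_SIZE 65536

/* (1) per-character acquisition: one fgetc() call per byte */
static long
count_lines_fgetc(FILE *fp)
{
    long    nlines = 0;
    int     c;

    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            nlines++;
    return nlines;
}

/* buffered acquisition plus a branch-light scan over each bufferload */
static long
count_lines_buffered(FILE *fp)
{
    static char buf[COPY_BUF_SIZE];
    long        nlines = 0;
    size_t      nread;

    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        const char *p = buf;
        const char *end = buf + nread;

        while ((p = memchr(p, '\n', end - p)) != NULL)
        {
            nlines++;
            p++;
        }
    }
    return nlines;
}

int
main(int argc, char **argv)
{
    FILE   *fp;

    if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL)
        return 1;
    printf("fgetc:    %ld lines\n", count_lines_fgetc(fp));
    rewind(fp);
    printf("buffered: %ld lines\n", count_lines_buffered(fp));
    fclose(fp);
    return 0;
}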

- Luke



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower.  I did the latter first, and would
> now be tempted to rewrite it in the same style as CopyReadAttributes,
> ie one pass of memory-to-memory copy using pointers rather than buffer
> indexes.

I think you are right, with the exception that Alon's results prove out that
the net result of your patch is 20% slower than his.

I think with your speedup of CopyReadAttributes and some additional work on
CopyReadLine the net result could be 50% faster than Alon's patch.

The key thing that is missing is the lack of micro-parallelism in the
character processing in this version.  By "inverting the loop", or putting
the characters into a buffer on the outside, then doing fast character
scanning inside with special "fix-up" cases, we exposed long runs of
pipeline-able code to the compiler.

I think there is another way to accomplish the same thing and still preserve
the current structure, but it requires "strip mining" the character buffer
into chunks that can be processed with an explicit loop to check for the
different characters.  While it may seem artificial (it is), it will provide
the compiler with the ability to pipeline the character finding logic over
long runs.  The other necessary element will have to avoid pipeline stalls
from the "if" conditions as much as possible.

Anyway, thanks for reviewing this code and improving it - it's important to
bring speed increases to our collective customer base.  With Bizgres, we're
not satisfied with 12 MB/s, we won't stop until we saturate the I/O bus, so
we may get more extreme with the code than seems reasonable for the general
audience.

- Luke



Re: COPY FROM performance improvements

From
Simon Riggs
Date:
On Tue, 2005-08-09 at 21:48 -0700, Luke Lonergan wrote:

> The key thing that is missing is the lack of micro-parallelism in the
> character processing in this version.  By "inverting the loop", or putting
> the characters into a buffer on the outside, then doing fast character
> scanning inside with special "fix-up" cases, we exposed long runs of
> pipeline-able code to the compiler.
>
> I think there is another way to accomplish the same thing and still preserve
> the current structure, but it requires "strip mining" the character buffer
> into chunks that can be processed with an explicit loop to check for the
> different characters.  While it may seem artificial (it is), it will provide
> the compiler with the ability to pipeline the character finding logic over
> long runs.  The other necessary element will have to avoid pipeline stalls
> from the "if" conditions as much as possible.

This is a key point, IMHO.

That part of the code was specifically written to take advantage of
processing pipelines in the hardware, not because the actual theoretical
algorithm for that approach was itself faster.

Nobody's said what compiler/hardware they have been using, so since both
Alon and Tom say their character finding logic is faster, it is likely
to be down to that? Name your platforms gentlemen, please.

My feeling is that we may learn something here that applies more widely
across many parts of the code.

Best Regards, Simon Riggs



Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Simon,

> That part of the code was specifically written to take advantage of
> processing pipelines in the hardware, not because the actual theoretical
> algorithm for that approach was itself faster.

Yup, good point.

> Nobody's said what compiler/hardware they have been using, so since both
> Alon and Tom say their character finding logic is faster, it is likely
> to be down to that? Name your platforms gentlemen, please.

In this case, we've been using gcc (3.2.3 RHEL3 Linux, 3.4.3 Solaris 10) on Opteron and Intel Xeon and Pentium 4.
Alon's performance comparisons for the parse only were done on a HT enabled P4 3.0GHz on RHEL3 with gcc 3.2.3, probably
with optimization -O2, but possibly -O3.

Note that the level of microparallelism on upcoming CPUs is increasing with increasing pipeline depth.  Though there
will be a step back on the Intel line with the introduction of the Centrino-based Xeon cores in 2006/7, other CPUs
continue the trend, and I expect the next generation of multi-core CPUs to possibly introduce threaded
micro-architectures which can also be scheduled as pipelines.  The gcc 4 compiler introduces auto vectorization, which
may enhance the optimality of some loops.

I think the key thing is to make as much parallelism apparent to the compiler as possible, which will generally mean
loops. This means faster code on all modern CPUs and it won't hurt older CPU speeds. 

> My feeling is that we may learn something here that applies more widely
> across many parts of the code.

Yes, I think one thing we've learned is that there are important parts of the code, those that are in the data path
(COPY, sort, spill to disk, etc) that are in dire need of optimization.  For instance, the fgetc() pattern should be
banned everywhere in the data path.

BTW - we are tracking down (in our spare time :-() the extremely slow sort performance.  We're seeing sort rates of
1.7MB/s on our fastest machines, even when the work_mem is equal to the square root of the sort set.  This is a
*serious* problem for us and we aren't getting to it - ideas are welcome.

Optimization here means both the use of good fundamental algorithms and micro-optimization (minimize memory copies,
expose long runs of operations to the compiler, maximize computational intensity by working in cache-resident blocks,
etc).

- Luke




Re: COPY FROM performance improvements

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Nobody's said what compiler/hardware they have been using, so since both
> Alon and Tom say their character finding logic is faster, it is likely
> to be down to that? Name your platforms gentlemen, please.

I tested on HPPA with gcc 2.95.3 and on a Pentium 4 with gcc 3.4.3.
Got pretty much the same results on both.

            regards, tom lane

Re: COPY FROM performance improvements

From
Tom Lane
Date:
"Luke Lonergan" <LLonergan@greenplum.com> writes:
> Yes, I think one thing we've learned is that there are important parts
> of the code, those that are in the data path (COPY, sort, spill to
> disk, etc) that are in dire need of optimization.  For instance, the
> fgetc() pattern should be banned everywhere in the data path.

Luke, I dislike whacking people upside the head, but this discussion
seems to presume that raw speed on Intel platforms is the only thing
that matters.  We have a few other concerns.  Portability, readability,
maintainability, and correctness all trump platform-specific
optimizations.  The COPY patch as presented lost badly on all those
counts, and you are lucky that it didn't get rejected completely.

            regards, tom lane

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Tom,

On 8/10/05 8:37 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Luke, I dislike whacking people upside the head, but this discussion
> seems to presume that raw speed on Intel platforms is the only thing
> that matters.  We have a few other concerns.  Portability, readability,
> maintainability, and correctness all trump platform-specific
> optimizations.  The COPY patch as presented lost badly on all those
> counts, and you are lucky that it didn't get rejected completely.

It's a pleasure working with you too Tom :-)

Until you present a result on a platform that is faster than Alon's in the
code that was modified, our proof still stands that his is 20% faster than
yours.

Also, as we proved the last time the correctness argument was thrown in, we
can fix the bugs and still make it a lot faster - and I would stick to that
whether it's a PA-RISC, DEC Alpha, Intel or AMD or even Ultra Sparc.

PostgreSQL needs major improvement to compete with Oracle and even MySQL on
speed.  No whacking on the head is going to change that.

- Luke



Re: COPY FROM performance improvements

From
Alvaro Herrera
Date:
On Wed, Aug 10, 2005 at 09:16:08AM -0700, Luke Lonergan wrote:

> On 8/10/05 8:37 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> > Luke, I dislike whacking people upside the head, but this discussion
> > seems to presume that raw speed on Intel platforms is the only thing
> > that matters.  We have a few other concerns.  Portability, readability,
> > maintainability, and correctness all trump platform-specific
> > optimizations.  The COPY patch as presented lost badly on all those
> > counts, and you are lucky that it didn't get rejected completely.
>
> It's a pleasure working with you too Tom :-)
>
> Until you present a result on a platform that is faster than Alon's in the
> code that was modified, our proof still stands that his is 20% faster than
> yours.

AFAIR he never claimed otherwise ... his point was that to gain that
additional speedup, the code has to be made considerably "worse" (in
maintainability terms).  Have you (or Alon) tried to port the rest of the
speed improvement to the new code?  Maybe it's possible to have at least
some of it without worsening the maintainability too badly.

Another question that comes to mind is: have you tried another compiler?
I see you are all using GCC at most 3.4; maybe the new optimizing
infrastructure in GCC 4.1 means you can have most of the speedup without
uglifying the code.  What about Intel's compiler?

> PostgreSQL needs major improvement to compete with Oracle and even MySQL on
> speed.  No whacking on the head is going to change that.

Certainly.  I think the point is what cost do we want to pay for the
speedup.  I think we all agree that even if we gain a 200% speedup by
rewriting COPY in assembly, it's simply not acceptable.

Another point may be that Bizgres can have a custom patch for the extra
speedup, without inflicting the maintenance cost on the community.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)

Re: COPY FROM performance improvements

From
"Joshua D. Drake"
Date:
> Also, as we proved the last time the correctness argument was thrown in, we
> can fix the bugs and still make it a lot faster - and I would stick to that
> whether it's a PA-RISC, DEC Alpha, Intel or AMD or even Ultra Sparc.

Luke, this comment doesn't work. Do you have a test case that shows that
on an Ultra Sparc or PPC you are accurate? Both of these CPUs are
pretty big players in the Enterprise space.

> PostgreSQL needs major improvement to compete with Oracle and even MySQL on
> speed.  No whacking on the head is going to change that.

I am going to assume that you forgot to qualify this statement with "IN
BULK LOADING", because if you didn't, I would like to see your test results.

My very real-life experience shows that MySQL cannot keep up with
PostgreSQL under load.

Nobody here disputes that PostgreSQL needs to improve. If it didn't need
to improve, I would be out of business because it would be perfect.


Sincerely,

Joshua D. Drake



>
> - Luke


--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

Re: COPY FROM performance improvements

From
Bruce Momjian
Date:
Luke Lonergan wrote:
> Tom,
>
> On 8/10/05 8:37 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> > Luke, I dislike whacking people upside the head, but this discussion
> > seems to presume that raw speed on Intel platforms is the only thing
> > that matters.  We have a few other concerns.  Portability, readability,
> > maintainability, and correctness all trump platform-specific
> > optimizations.  The COPY patch as presented lost badly on all those
> > counts, and you are lucky that it didn't get rejected completely.
>
> It's a pleasure working with you too Tom :-)
>
> Until you present a result on a platform that is faster than Alon's in the
> code that was modified, our proof still stands that his is 20% faster than
> yours.

Well, we could write it in assembler and make it even faster. :-)

I assume no one is suggesting that, so in such cases we need to weigh
readability against performance.  I have not looked at the patch issues,
but loop unrolling is usually the opposite of readability, so we have to
make a tradeoff.  We have used macros in places where function call
overhead is a major hit, so we can consider loop unrolling in places
that are a major performance hit.  The macros we have used keep the
readability of a function call (unless you look at the macro contents),
so perhaps the optimizations you suggest can be done with a similar eye
to readability.
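
To make that concrete, here is a small sketch of the macro-versus-function
tradeoff; the names scan_is_newline, SCAN_IS_NEWLINE and count_newlines are
hypothetical illustrations, not existing PostgreSQL symbols:

============================
#include <stdio.h>
#include <string.h>

/* Function form: readable at the call site, but a per-character hot loop
 * pays a call for every byte (unless the compiler inlines it). */
int
scan_is_newline(const char *buf, int i)
{
    return buf[i] == '\n';
}

/* Macro form: the call site reads the same, but the body is expanded in
 * place, so there is no call overhead. */
#define SCAN_IS_NEWLINE(buf, i)  ((buf)[(i)] == '\n')

/* Both spellings look identical where they are used: */
int
count_newlines(const char *buf, int len)
{
    int i, n = 0;

    for (i = 0; i < len; i++)
        if (SCAN_IS_NEWLINE(buf, i))    /* or scan_is_newline(buf, i) */
            n++;
    return n;
}

int
main(void)
{
    const char *sample = "one\ntwo\nthree\n";

    printf("%d newlines\n", count_newlines(sample, (int) strlen(sample)));
    return 0;
}
============================

Readability at the call site is preserved either way; the tradeoff only
shows up when you open the macro.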

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY FROM performance improvements

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Another question that comes to mind is: have you tried another compiler?
> I see you are all using GCC at most 3.4; maybe the new optimizing
> infrastructure in GCC 4.1 means you can have most of the speedup without
> uglifying the code.  What about Intel's compiler?

EnterpriseDB mentioned on their web page they use the Intel compiler, so
I assume they saw a speedup by using it:

    http://www.enterprisedb.com/edb-db.do

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: COPY FROM performance improvements

From
"Luke Lonergan"
Date:
Alvaro,

On 8/10/05 9:46 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

> AFAIR he never claimed otherwise ... his point was that to gain that
> additional speedup, the code has to be made considerably "worse" (in
> maintainability terms).  Have you (or Alon) tried to port the rest of the
> speed improvement to the new code?  Maybe it's possible to have at least
> some of it without worsening the maintainability too badly.

As I suggested previously, there is another, more maintainable way to get
more performance from the parsing logic.

It involves replacing something like this:

============================
char c = input_routine();

if (c == '\n') {
    ...
} else if (...) {
    ...
}
============================

With something like this:

============================
char carr[32];
int  nread, i;

nread = Input_routine_new(carr, 32);

for (i = 0; i < nread; i++) {
    if (carr[i] == '\n') {
        ...
    }
}
============================

And this section would run much faster (3x?).

This is what I think could make the overall patch 50% faster than it is now
(on the parsing part).
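
To make the buffered pattern concrete, here is a minimal, self-contained
sketch written as a standalone C program; the PARSE_BUF_SIZE value, the
fill_buffer() helper and the newline-counting body are assumptions for
illustration only, not code from the actual patch:

============================
#include <stdio.h>

#define PARSE_BUF_SIZE 32768

/* Illustrative stand-in for a buffered input routine: one fread() call
 * fills the buffer, instead of one fgetc() call per byte. */
static int
fill_buffer(FILE *fp, char *buf, int bufsize)
{
    return (int) fread(buf, 1, (size_t) bufsize, fp);
}

/* Count line terminators by scanning whole buffers; the per-character
 * test runs in a tight loop with no function call per byte. */
static long
count_lines_buffered(FILE *fp)
{
    char buf[PARSE_BUF_SIZE];
    long lines = 0;
    int  nread, i;

    while ((nread = fill_buffer(fp, buf, PARSE_BUF_SIZE)) > 0)
        for (i = 0; i < nread; i++)
            if (buf[i] == '\n')
                lines++;

    return lines;
}

int
main(int argc, char **argv)
{
    FILE *fp = (argc > 1) ? fopen(argv[1], "rb") : stdin;

    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }
    printf("%ld lines\n", count_lines_buffered(fp));
    if (fp != stdin)
        fclose(fp);
    return 0;
}
============================

The point of the shape is that the per-character work sits in a tight loop
over a buffer, which is also the kind of code a compiler can unroll or
pipeline.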

The issue that I expect we'll hear about is that since the parsing is
already 500% faster, it has vanished in the profile.  That's why Tom's
testing shows little difference between his code and Alon's; we would
actually have to cut the time spent in the other sections to bring the
parsing forward again, which is where the bigger difference shows.

However, what I'm arguing here and elsewhere is that there's still a lot
more of this kind of optimization to be done.  12 MB/s COPY speed is not
enough.  There's 40% of the time in processing left to smack down.

> Another question that comes to mind is: have you tried another compiler?
> I see you are all using GCC at most 3.4; maybe the new optimizing
> infrastructure in GCC 4.1 means you can have most of the speedup without
> uglifying the code.  What about Intel's compiler?

We have routinely distributed PostgreSQL with the Intel compiler, up until
recently.  Interestingly, GCC now beats it handily in our tests on Opteron
and matches it on Xeon, which is too bad - it's my favorite compiler.

The problem with this code is that the character-parsing core has no
loops, so there isn't enough micro-parallelism for the compiler to exploit.
Unless it is given more to work with, the compiler can only do register
optimizations and (poor) branch prediction.

>> PostgreSQL needs major improvement to compete with Oracle and even MySQL on
>> speed.  No whacking on the head is going to change that.
>
> Certainly.  I think the point is what cost do we want to pay for the
> speedup.  I think we all agree that even if we gain a 200% speedup by
> rewriting COPY in assembly, it's simply not acceptable.

Understood, and I totally agree.

> Another point may be that Bizgres can have a custom patch for the extra
> speedup, without inflicting the maintenance cost on the community.

We are committed to making Postgres the best DBMS for Business Intelligence.
Bizgres makes it safe for businesses to rely on open source for their
production uses.  As far as features go, I think the best path for our
customers is to make sure that Bizgres features support the PostgreSQL
core and vice versa.

- Luke



Re: COPY FROM performance improvements

From
Alvaro Herrera
Date:
On Wed, Aug 10, 2005 at 12:57:18PM -0400, Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Another question that comes to mind is: have you tried another compiler?
> > I see you are all using GCC at most 3.4; maybe the new optimizing
> > infrastructure in GCC 4.1 means you can have most of the speedup without
> > uglifying the code.  What about Intel's compiler?
>
> EnterpriseDB mentioned on their web page they use the Intel compiler, so
> I assume they saw a speedup by using it:
>
>     http://www.enterprisedb.com/edb-db.do

Yes, but notice these guys are Greenplum, not EDB.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)

Re: (was COPY FROM) performance improvements

From
Simon Riggs
Date:
On Wed, 2005-08-10 at 11:37 -0400, Tom Lane wrote:
> "Luke Lonergan" <LLonergan@greenplum.com> writes:
> > Yes, I think one thing we've learned is that there are important parts
> > of the code, those that are in the data path (COPY, sort, spill to
> > disk, etc) that are in dire need of optimization.  For instance, the
> > fgetc() pattern should be banned everywhere in the data path.
>
> this discussion
> seems to presume that raw speed on Intel platforms is the only thing
> that matters.  We have a few other concerns.  Portability, readability,
> maintainability, and correctness all trump platform-specific
> optimizations.

I am interested in the general principle here, not this specific case.

As you know, I have raised the need for specific hardware tuning in
certain critical areas on a number of occasions. I very much respect the
need for all of the other aspects of code quality mentioned.

Pipeline parallelism is a feature of all modern CPUs since the Pentium,
not just Intel's. I think judicious exploitation of hardware features
that are common to multiple hardware architectures would be of
considerable benefit to everybody. We do already exploit some common
hardware tuning recommendations, such as buffer word alignment, but not
others such as false sharing avoidance and pipeline parallelism of key
loops. (There may be others...)
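
As an aside for readers unfamiliar with the false-sharing point, the usual
avoidance idiom is sketched below; the 128-byte cache-line figure and the
struct layout are illustrative assumptions, not anything currently in the
tree:

============================
#include <stdio.h>

#define CACHE_LINE_SIZE 128     /* conservative guess; varies by CPU */

/*
 * Two counters updated heavily by different CPUs.  If they sit in the
 * same cache line, each update bounces the line between CPUs ("false
 * sharing") even though the data is logically independent.  Padding
 * each slot out to a full cache line keeps them apart.
 */
typedef struct
{
    volatile long counter;
    char          pad[CACHE_LINE_SIZE - sizeof(long)];
} PaddedCounter;

static PaddedCounter per_cpu_counter[2];

int
main(void)
{
    per_cpu_counter[0].counter++;   /* CPU 0's hot counter */
    per_cpu_counter[1].counter++;   /* CPU 1's, on its own cache line */
    printf("sizeof(PaddedCounter) = %lu\n",
           (unsigned long) sizeof(PaddedCounter));
    return 0;
}
============================

Pipeline parallelism of key loops is a similar restructuring: arranging the
hot loop so several independent operations can be in flight at once.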

I say "judicious" because I do not presume that I am the judge ... but I
hope that judgements in these areas can fall towards the side of greater
performance as often as possible. Hardware and OS do exist, much as I
would prefer the simplicity of life in a layered IT architecture.

Best Regards, Simon Riggs


Re: (was COPY FROM) performance improvements

From
Bruce Momjian
Date:
Simon Riggs wrote:
> As you know, I have raised the need for specific hardware tuning in
> certain critical areas on a number of occasions. I very much respect the
> need for all of the other aspects of code quality mentioned.
>
> Pipeline parallelism is a feature of all modern CPUs since the Pentium,
> not just Intel's. I think judicious exploitation of hardware features
> that are common to multiple hardware architectures would be of
> considerable benefit to everybody. We do already exploit some common
> hardware tuning recommendations, such as buffer word alignment, but not
> others such as false sharing avoidance and pipeline parallelism of key
> loops. (There may be others...)
>
> I say "judicious" because I do not presume that I am the judge ... but I
> hope that judgements in these areas can fall towards the side of greater
> performance as often as possible. Hardware and OS do exist, much as I
> would prefer the simplicity of life in a layered IT architecture.

Right.  We already have per-CPU test-and-set locks, and lots of macros,
so we just need to decide in which places we need these optimizations,
and how to do it cleanly.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073