copy vs. C function - Mailing list pgsql-performance

From Jon Nelson
Subject copy vs. C function
Date
Msg-id CAKuK5J3hB1g+dUtwu68sDF-bED=Xv_TaBfoOoyNRQbUGu9Zs3Q@mail.gmail.com
Whole thread Raw
Responses Re: copy vs. C function  (Craig Ringer <ringerc@ringerc.id.au>)
Re: copy vs. C function  (Sam Gendler <sgendler@ideasculptor.com>)
Re: copy vs. C function  (Merlin Moncure <mmoncure@gmail.com>)
Re: [PERFORM] copy vs. C function  (rajmhn <rajmhn.ram@gmail.com>)
List pgsql-performance
I was experimenting with a few different methods of taking a line of
text, parsing it, into a set of fields, and then getting that info
into a table.

The first method involved writing a C program to parse a file, parse
the lines and output newly-formatted lines in a format that
postgresql's COPY function can use.
End-to-end, this takes 15 seconds for about 250MB (read 250MB, parse,
output new data to new file -- 4 seconds, COPY new file -- 10
seconds).

The next approach I took was to write a C function in postgresql to
parse a single TEXT datum into an array of C strings, and then use
BuildTupleFromCStrings. There are 8 columns involved.
Eliding the time it takes to COPY the (raw) file into a temporary
table, this method took 120 seconds, give or take.

The difference was /quite/ a surprise to me. What is the probability
that I am doing something very, very wrong?

NOTE: the code that does the parsing is actually the same,
line-for-line, the only difference is whether the routine is called by
a postgresql function or by a C program via main, so obviously the
overhead is elsewhere.
NOTE #2: We are talking about approximately 2.6 million lines.

I was testing:

\copy some_table from 'some_file.csv' with csv
vs.
insert into some_table select (some_func(line)).* from some_temp_table;

where some_func had been defined with (one) IN TEXT and (8) OUT params
of varying types.

PostgreSQL 9.1.1 on Linux, x86_64

--
Jon

pgsql-performance by date:

Previous
From: Daniel Cristian Cruz
Date:
Subject: Re: Common slow query reasons - help with a special log
Next
From: Craig Ringer
Date:
Subject: Re: copy vs. C function