Thread: populating a table via the COPY command using C code.

populating a table via the COPY command using C code.

From
"Mak, Jason"
Date:
> hi,
>
> I'm writing an application in C that basically converts binary data into something meaningful.  My first attempt was
toparse the binary and insert directly to the database in one step.  But this proved to be very slow.  So I decided to
gowith a two step process.  The first step is to parse the data and create a flat file with tab delimited fields.  The
secondstep is to load this data using the COPY command.  I don't quite understand how this is done within C.  Can
someoneprovide me with some examples.  I've already done some searches on the internet.  the examples that I found
don'tmatch with I'm trying to do.  Please help! 
>
> thanks,
> jason.
>

Re: populating a table via the COPY command using C code.

From
Michael Fuhr
Date:
On Wed, Apr 27, 2005 at 01:12:42PM -0400, Mak, Jason wrote:
>
> The second step is to load this data using the COPY command.
> I don't quite understand how this is done within C.

Are you writing a client application that uses libpq?  If so, have
you seen "Functions Associated with the COPY Command" in the libpq
chapter of the documentation?

http://www.postgresql.org/docs/8.0/interactive/libpq-copy.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: populating a table via the COPY command using C code.

From
Michael Fuhr
Date:
[Please copy the mailing list on replies so others can contribute
to and learn from the discussion.]

On Wed, Apr 27, 2005 at 02:34:26PM -0400, Mak, Jason wrote:
>
> Yes, my application is a client application that uses libpq api, ie.
> PQexec, etc...  I have looked at the "Functions Associated with the COPY
> Command".  But I still don't understand.  what I really need is an
> example of how those api's(PQputCopyData) are used, other than the
> "simple" example that's provided.

What example are you looking at and what don't you understand about it?

> This "dataload" should be relatively simple.  I already have a flat
> file created.  I should be able to use some api and say here is the
> pointer to my db connection and here is a pointer to the flat file.
> now do your thing.  Perhaps you can explain this to me.

libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database.  As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished.  Add the necessary file I/O
statements and there's your function.

Do you have a reason for using an intermediate file?  Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

Another possibility: if the file resides somewhere the backend can
read, and if you can connect to the database as a superuser, then
you could use COPY tablename FROM 'filename'.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: populating a table via the COPY command using C code.

From
"Mak, Jason"
Date:
> What example are you looking at and what don't you understand about it?

Some of the examples that I looked over are either from the internet or from the Postgres Manual.  The API I'm refering
tois PQputCopyData.  However, with the explanation given below.  I'm starting to understand. 

> libpq provides the primitives that you could use to implement such
> an API: it would be a trivial matter to write a function that opens
> the indicated file, reads its contents, and sends them to the
> database.  As the documentation indicates, you'd use PQexec() or
> its ilk to send a COPY FROM STDIN command (see the COPY documentation
> for the exact syntax), then PQputCopyData() or PQputline() to send
> the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
> to indicate that you're finished.  Add the necessary file I/O
> statements and there's your function.

so basically in C, I would open some file i/o using fopen and in a loop.  Do something like a read line into the buffer
withsome byte count and send that to the database using the PQputCopyData.  Is this correct?? 

> Do you have a reason for using an intermediate file?  Instead of
> writing data to the file and then reading it back, you could use
> PQputCopyData() or PQputline() to send the data directly to the
> database.

For the project I'm working on.  We basically setup a postgres data warehouse.  We have a large set of binary data that
needsto be parsed and translated into something meaningful.  We intend to load this processed data into 3 tables using
thequickest means possible.  I've already tried parsing and doing inserts.  but this proved to be very slow.  So I
figureda 2 step automated process.  The first step would be to parse the data and create 3 separate files.  then load
eachfile into the warehouse.  Never considered using PQputCopyData in realtime.  Not sure how this would work given 3
differenttables that hold differnet data or how fast it's going to be.  but I have tried the last approach.  It works
fairlywell.  The only problem is the lack of insight into where it is during the load processing. 

What's your thoughts??  which approach would be the fastest?
1) 2 step process.
2) realtime PQputCopyData - not sure how this would work with 3 different tables.
3) COPY tablename FROM 'filename'

thanks,
jason.



-----Original Message-----
From: Michael Fuhr [mailto:mike@fuhr.org]
Sent: Wednesday, April 27, 2005 3:46 PM
To: Mak, Jason
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] populating a table via the COPY command using C
code.


[Please copy the mailing list on replies so others can contribute
to and learn from the discussion.]

On Wed, Apr 27, 2005 at 02:34:26PM -0400, Mak, Jason wrote:
>
> Yes, my application is a client application that uses libpq api, ie.
> PQexec, etc...  I have looked at the "Functions Associated with the COPY
> Command".  But I still don't understand.  what I really need is an
> example of how those api's(PQputCopyData) are used, other than the
> "simple" example that's provided.

What example are you looking at and what don't you understand about it?

> This "dataload" should be relatively simple.  I already have a flat
> file created.  I should be able to use some api and say here is the
> pointer to my db connection and here is a pointer to the flat file.
> now do your thing.  Perhaps you can explain this to me.

libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database.  As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished.  Add the necessary file I/O
statements and there's your function.

Do you have a reason for using an intermediate file?  Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

Another possibility: if the file resides somewhere the backend can
read, and if you can connect to the database as a superuser, then
you could use COPY tablename FROM 'filename'.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: populating a table via the COPY command using C code.

From
Sean Davis
Date:
On Apr 27, 2005, at 4:48 PM, Mak, Jason wrote:

> What's your thoughts??  which approach would be the fastest?
> 1) 2 step process.
> 2) realtime PQputCopyData - not sure how this would work with 3
> different tables.
> 3) COPY tablename FROM 'filename'
>
> thanks,
> jason.

COPY tablename FROM 'filename'

is VERY fast.  Generally, I think people generally load the data into
postgres using COPY (perhaps into a "loader" table that isn't in the
same format that the final tables will be in) and then do data
manipulation and cleaning within the database using database tools.
This paradigm may or may not work for you, but it seems to be pretty
general.

Sean