Thread: Load data from a csv file without using COPY

Load data from a csv file without using COPY

From
Ravi Krishna
Date:
In order to test a real-life scenario (and use it for benchmarking) I want to load a large amount of data from csv files.

The requirement is that the load should happen like an application writing to the database (that is, no COPY command).

Is there a tool which can do the job?  Basically, parse the csv file and insert it into the database row by row.

thanks





Re: Load data from a csv file without using COPY

From
Nicolas Paris
Date:
Hi,

AFAIK you can use COPY from a JDBC connection, since COPY allows streaming data (the FROM STDIN version).
However, while faster than INSERT INTO, this might lock the target table during the process.
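
For instance, pgjdbc exposes the COPY protocol through its CopyManager API. A minimal sketch (the connection string, the t1 (f1, f2) table and the file name are placeholders, not from this thread):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyStreamExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret");
             Reader reader = new FileReader("data.csv")) {
            // getCopyAPI() exposes COPY FROM STDIN on a pgjdbc connection
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyIn(
                "COPY t1 (f1, f2) FROM STDIN WITH (FORMAT csv)", reader);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}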

2018-06-19 22:16 GMT+02:00 Ravi Krishna <srkrishna@yahoo.com>:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

thanks





Re: Load data from a csv file without using COPY

From
Hans Schou
Date:
On Tue, Jun 19, 2018 at 10:17 PM Ravi Krishna <srkrishna@yahoo.com> wrote:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command). 

Once you have parsed the data, it is fairly easy to use PostgreSQL's "COPY FROM stdin" format, assuming you have all the data tab-separated. For a simple table (t1) it could look like:

COPY t1 (f1,f2) FROM stdin;
3<tab>Joe
7<tab>Jane
\.

This data can be piped directly to psql, and it will be fast.

Note: NULL should be '\N'; see the manual.

It is the same kind of data you get with pg_dump.
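
If you generate that stream from application code rather than from a dump, the only subtle part is the escaping. A small helper sketch (Java; the class and method names are mine, not from this thread) for rendering one field in COPY's default text format:

final class CopyText {
    // Render one field for COPY's default (tab-separated) text format:
    // NULL becomes \N; backslash, tab, newline and carriage return are escaped.
    static String field(String value) {
        if (value == null) {
            return "\\N";                      // COPY's NULL marker
        }
        return value.replace("\\", "\\\\")     // escape the backslash first
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }
}

Fields are then joined with a tab and rows terminated with a newline, exactly as in the example above.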

./hans 

Re: Load data from a csv file without using COPY

From
"David G. Johnston"
Date:
On Tue, Jun 19, 2018 at 1:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

I'm skeptical that ingesting CSV of any form, even if you intentionally blow things up by converting it into:

BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;

(which is what auto-commit mode looks like)

is going to provide a meaningful benchmark for application-like usage patterns.

But anyway, I'm not familiar with any tools that make doing this particularly simple.  In most situations like this I'll just import the CSV into a spreadsheet and create a formula that builds out the individual SQL commands.  Whether that's useful depends a lot on how often the source CSV is updated.

That said, I have found the following tool to be generally helpful in this area - though I'm thinking it doesn't do what you want here.


David J.

Re: Load data from a csv file without using COPY

From
Steve Atkins
Date:
> On Jun 19, 2018, at 9:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
an appropriate benchmark.

Cheers,
  Steve



Re: Load data from a csv file without using COPY

From
Alban Hertroys
Date:
> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then
pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Load data from a csv file without using COPY

From
Ravi Krishna
Date:
>
> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
> an appropriate benchmark.

Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
two different environments.

Re: Load data from a csv file without using COPY

From
Ravi Krishna
Date:
>
> I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then
> pg_dump that as separate insert statements with pg_dump --inserts.
>

This was my first thought too.  However, as I understand it, pg_dump --inserts basically runs an INSERT INTO ... statement for every row.
In other words, each row is un-prepared and executed individually.  That is also not a real-life scenario.



Re: Load data from a csv file without using COPY

From
Steve Atkins
Date:
> On Jun 19, 2018, at 10:14 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
>>
>> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
>> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
>> an appropriate benchmark.
>
> Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
> two different environments.

That's something I'd only do if I intended to rig a benchmark between an RDBMS with good bulk import
and one without. If that's not your goal, your approach doesn't seem to make sense and is unlikely
to provide performance metrics that are useful or related to your app performance, unless you intend
to hamstring your app in exactly the same way you're running the benchmark.

Maybe use your app, or write ten minutes worth of code that'll interact with the database in much the
same way as your app will?

Cheers,
  Steve



Re: Load data from a csv file without using COPY

From
Rob Sargent
Date:

On 06/19/2018 03:14 PM, Ravi Krishna wrote:
>> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
>> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
>> an appropriate benchmark.
> Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
> two different environments.
If bulk loading is the actual production target, all your RDBMS choices
have their own loaders.  I suggest that is what you ought to compare.


Re: Load data from a csv file without using COPY

From
"David G. Johnston"
Date:
On Tue, Jun 19, 2018 at 2:17 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.
>

This was my first thought too.  However, as I understand it, pg_dump --inserts basically runs an INSERT INTO ... statement for every row.
In other words, each row is un-prepared and executed individually.  That is also not a real-life scenario.

You really need to describe what you consider to be a "real life scenario", and probably give a better idea of how these csv files are created and how many there are, in addition to describing the relevant behavior of the application you are testing.

If you want maximum realism you should probably write integration tests for your application and then execute those at high volume.

Or at minimum give an example of the output you would want from this unknown program...

David J.

Re: Load data from a csv file without using COPY

From
Tim Cross
Date:
Ravi Krishna <srkrishna@yahoo.com> writes:

> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>

Not clear what you mean by 'real world scenario', but you could possibly
use PG's foreign table support to define a csv file as a foreign table
and then have scripts which read from there and do whatever
insert/update etc. you need. However, this has a high level of 'fakery'
going on and is probably not testing what you really want.

There are lots of ways that applications write to the database -
different drivers (e.g. jdbc, odbc, pg etc), different commit
and transaction strategies and even different ways to handle things like
an update or insert process. You can even use streams and copy from an
application.

To get 'real world' equivalence, you really need to use the same
interface as the application you are comparing. Most languages have
support for processing CSV files, so you may be better off writing a
small 'wrapper' app which uses the same drivers and, assuming your
database connectivity has been abstracted into some sort of
module/library/class, writes to the database through the same interface
that the application uses.
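
A rough sketch of that idea over plain JDBC, running in autocommit mode so each row is its own transaction, like a naive application write. The connection string, the t1 (f1, f2) table and the naive comma split are all placeholder assumptions; a real wrapper would call the application's own data-access module instead:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RowByRowLoader {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/testdb", "user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO t1 (f1, f2) VALUES (?, ?)");
             BufferedReader in = Files.newBufferedReader(Paths.get("data.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",", -1);  // naive: no quoted fields
                ps.setString(1, f[0]);
                ps.setString(2, f[1]);
                ps.executeUpdate();  // autocommit: one transaction per row
            }
        }
    }
}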

Tim
-- 
Tim Cross


Re: Load data from a csv file without using COPY

From
Asif Ali
Date:


how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day


From: Tim Cross <theophilusx@gmail.com>
Sent: Wednesday, June 20, 2018 2:59 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

Ravi Krishna <srkrishna@yahoo.com> writes:

> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>

Not clear what you mean by 'real world scenario', but you could possibly
use PG's foreign table support and define a csv file as a foreign table
and then have scripts which read from there and do whatever
insert/update etc you need. However, this has a high level of 'fakery'
going on and probably not testing what you really want.

There are lots of ways that applications write to the database -
different drivers (e.g. jdbc, odbc, pg etc), different commit
and transaction strategies and even different ways to handle things like
an update or insert process. You can even use streams and copy from an
application.

To get 'real world' equivalence, you really need to use the same
interface as the application you are comparing. Most languages have
support for processing CSV files, so you may be better off writing a
small 'wrapper' app which uses the same drivers and assuming your
database connectivity has been abstracted into some sort of
module/library/class, use the same interface to write to the database
that the application uses.

Tim
--
Tim Cross

Re: Load data from a csv file without using COPY

From
Asif Ali
Date:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


Re: Load data from a csv file without using COPY

From
James Keener
Date:
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
Asif Ali
Date:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
"David G. Johnston"
Date:
https://lists.postgresql.org/unsubscribe/


On Tuesday, June 19, 2018, Asif Ali <asif2k@hotmail.com> wrote:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
James Keener
Date:
It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.

On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 03:13 PM, Asif Ali wrote:
> please just tell me the site i will do it right away and i have marked 
> it junked so many times , i will keep spamming it until my email address 
> is removed from the list

https://lists.postgresql.org/unsubscribe/
> 
> Bye
> 
> ------------------------------------------------------------------------
> *From:* James Keener <jim@jimkeener.com>
> *Sent:* Wednesday, June 20, 2018 3:11 AM
> *To:* pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi 
> Krishna
> *Cc:* PG mailing List
> *Subject:* Re: Load data from a csv file without using COPY
> Seriously, stop spamming the list and stop cursing and acting like a 
> petulant child. Go to the site and unsubscribe or use a mail client that 
> understands the standard list headers.
> 
> On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
> 
>     how the fuck i unsubscribe to this mailing list , i get more than
>     100 emails a day
> 
>     Bye
> 
>     ------------------------------------------------------------------------
>     *From:* Alban Hertroys <haramrae@gmail.com>
>     *Sent:* Wednesday, June 20, 2018 2:10 AM
>     *To:* Ravi Krishna
>     *Cc:* PG mailing List
>     *Subject:* Re: Load data from a csv file without using COPY
> 
>     > On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>     > 
>     > In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
>     > The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
>     > Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>     > 
>     > thanks
> 
>     I think an easy approach would be to COPY the CSV files into a
>     separate database using psql's \copy command and then pg_dump that
>     as separate insert statements with pg_dump --inserts.
> 
>     Alban Hertroys
>     --
>     If you can't see the forest for the trees,
>     cut the trees and you'll find there is no forest.
> 
> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Load data from a csv file without using COPY

From
Asif Ali
Date:
just tell me the site , i dont have time to waste on shitty things , i will program a spammer to send email to this list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:16 AM
To: Asif Ali; pgsql-general@lists.postgresql.org; Alban Hertroys; Ravi Krishna
Subject: Re: Load data from a csv file without using COPY
 
It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.

On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 03:18 PM, Asif Ali wrote:
> just tell me the site , i dont have time to waste on shitty things , i 
> will program a spammer to send email to this list

So why subscribe in the first place?

> 
> Bye
> 
> ------------------------------------------------------------------------
> *From:* James Keener <jim@jimkeener.com>
> *Sent:* Wednesday, June 20, 2018 3:16 AM
> *To:* Asif Ali; pgsql-general@lists.postgresql.org; Alban Hertroys; Ravi 
> Krishna
> *Subject:* Re: Load data from a csv file without using COPY
> It's people like you who make spam filters worse for the rest of us to 
> the point they need to be checked daily for false positives. I'm sure 
> you could have found it in less time than it took to spam the list with 
> obscenities.
> 
> On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
> 
>     please just tell me the site i will do it right away and i have
>     marked it junked so many times , i will keep spamming it until my
>     email address is removed from the list
> 
>     Bye
> 
>     ------------------------------------------------------------------------
>     *From:* James Keener <jim@jimkeener.com>
>     *Sent:* Wednesday, June 20, 2018 3:11 AM
>     *To:* pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys;
>     Ravi Krishna
>     *Cc:* PG mailing List
>     *Subject:* Re: Load data from a csv file without using COPY
>     Seriously, stop spamming the list and stop cursing and acting like a
>     petulant child. Go to the site and unsubscribe or use a mail client
>     that understands the standard list headers.
> 
>     On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
> 
>         how the fuck i unsubscribe to this mailing list , i get more
>         than 100 emails a day
> 
>         Bye
> 
>         ------------------------------------------------------------------------
>         *From:* Alban Hertroys <haramrae@gmail.com>
>         *Sent:* Wednesday, June 20, 2018 2:10 AM
>         *To:* Ravi Krishna
>         *Cc:* PG mailing List
>         *Subject:* Re: Load data from a csv file without using COPY
> 
>         > On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>         > 
>         > In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
>         > The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
>         > Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>         > 
>         > thanks
> 
>         I think an easy approach would be to COPY the CSV files into a
>         separate database using psql's \copy command and then pg_dump
>         that as separate insert statements with pg_dump --inserts.
> 
>         Alban Hertroys
>         --
>         If you can't see the forest for the trees,
>         cut the trees and you'll find there is no forest.
> 
> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Load data from a csv file without using COPY

From
Stephen Frost
Date:
Greetings,

* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 06/19/2018 03:18 PM, Asif Ali wrote:
> >just tell me the site , i dont have time to waste on shitty things , i
> >will program a spammer to send email to this list
>
> So why subscribe in the first place?

Thanks for the attempts at helping folks, but it's been addressed.
Please don't reply further on this sub-thread.

Thanks!

Stephen


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 01:16 PM, Ravi Krishna wrote:
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

http://csvkit.readthedocs.io/en/1.0.3/scripts/csvsql.html
> 
> thanks
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Load data from a csv file without using COPY

From
"Ravi Krishna"
Date:
Thanks all for replying.  I see that I did not explain my requirement in detail, so let me
explain it now.

1. Currently we have a legacy app running on DB2/LUW. Applications write to it either via a Java program
    or via custom ETL scripts using a vendor product.
2. We want to migrate it to PG and eliminate the vendor ETL tool.
3. We now have a catch-22 situation.  Should we spend time porting the app to PG without first verifying
    that PG can perform as well as DB2?  In other words, if some sort of testing rules out PG as a good
    replacement for DB2, why even bother to port?  Of course, that does not prove conclusively that if PG
    passes the test the app will work just fine.  But at least a basic test will tell us
    that we are not on the wrong path.
4. What I am planning is:
    4.a Get a set of large tables exported as a pipe delimited text file.
    4.b Load them in both DB2 and PG on a similar h/w
    4.c  Run OLAP queries.

4.b is to test I/O. Our app is sensitive to load times, and some of the tables are really wide.
4.c is to test the maturity of PG in handling complex OLAP SQL. From what I have read, while the PG
     optimizer is very good at handling OLTP, it is not yet as good at OLAP queries.

I just want to keep the testing tool in 4.b the same for both DB2 and PG. If COPY is the only way,
we will use it with something comparable on the DB2 side.

Re: Load data from a csv file without using COPY

From
Michael Paquier
Date:
On Tue, Jun 19, 2018 at 02:32:10PM -0700, David G. Johnston wrote:
> You really need to describe what you consider to be a "real life
> scenario"; and probably give a better idea of creation and number of these
> csv files.  In addition to describing the relevant behavior of the
> application you are testing.
>
> If you want maximum realism you should probably write integration tests for
> your application and then execute those at high volume.
>
> Or at minimum give an example of the output you would want from this
> unknown program...

Hard to say what you are especially looking for that psql's \copy cannot
do, but perhaps you have an interest in pg_bulkload?  Here is a link to
the project:
https://github.com/ossc-db/pg_bulkload/

It has a couple of fancy features as well, like preventing individual row
failures from aborting the load of a large file, etc.
--
Michael


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 04:20 PM, Ravi Krishna wrote:
> Thanks all for replying.  I see that I did not explain my requirement in 
> detail.  So let me
> explain it in detail.
> 
> 1. Currently we have a legacy app running in DB2/LUW. Application writes 
> to it either via Java program
>      or uses a custom ETL scripts using a vendor product.
> 2. We want to migrate it to PG and eliminate the vendor ETL tool.
> 3. We now have a catch-22 situation.  Should we spend time porting the 
> app to PG without first verifying
>      that PG can perform as well as DB2. In other words, if some sort of 
> testing rules out PG as a good
>      replacement for DB2, why even bother to port.  Of course that does 
> not prove conclusively that if PG
>      passes the test, then it would mean that the app will work just as 
> fine.  But at least basic test will tell
>     that we are not on a wrong path.
> 4. What I am planning is:
>      4.a Get a set of large tables exported as a pipe delimited text file.
>      4.b Load them in both DB2 and PG on a similar h/w
>      4.c  Run OLAP queries.
> 
> 4.b is to test i/o. Our app is sensitive to the load times and some of 
> the tables are really wide.
> 4.c is to test maturity of PG in handling complex OLAP SQLs. From what I 
> have read, while PG
>       optimizer is very good in handling OLTP, it is not, as yet, as 
> good in OLAP queries.
> 
> I just want to keep the testing tool same in 4.b for both db2 and pg. If 

To me, 4.b means find the tool that minimizes the load time in each 
database. This is not necessarily the same tool for each.

> COPY is the only way,
> we will use it with something comparable on the DB2 side.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


RE: Load data from a csv file without using COPY

From
Kevin Brannen
Date:
From: Ravi Krishna [mailto:srkrishna@yahoo.com]
Sent: Tuesday, June 19, 2018 4:15 PM
To: Steve Atkins <steve@blighty.com>
Cc: PG mailing List <pgsql-general@lists.postgresql.org>
Subject: Re: Load data from a csv file without using COPY

>
> If performance is relevant then your app should probably be using COPY
> protocol, not line by line inserts. It's supported by most postgresql
> access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.

Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in two different environments.

---

Seems like a ~10 line Perl program could handle this very easily. Use the Text::CSV module to make handling of the
input easier. Prepare your insert statement, then once you've pulled each line of input in, execute the insert. For
slightly better performance, I'd probably use transactions: add a counter and commit every 10,000 rows (or something
like that).

Once you have that working with 1 DB, you can copy your program, change the DBD driver and the connect statement to the
other DB and try the other one. Unless you want to be really clever and make the same program do both and pick the DB by
a command line switch. :)
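
The same pattern sketched over JDBC rather than Perl DBI, in case the Java side is of interest: prepared insert, batched execution, commit every 10,000 rows. The connection string, the t1 (f1, f2) table and the naive comma split are placeholder assumptions:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedCsvLoader {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/testdb", "user", "secret");
             BufferedReader in = Files.newBufferedReader(Paths.get("data.csv"))) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO t1 (f1, f2) VALUES (?, ?)")) {
                String line;
                long count = 0;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",", -1);  // naive: no quoted fields
                    ps.setString(1, f[0]);
                    ps.setString(2, f[1]);
                    ps.addBatch();
                    if (++count % 10_000 == 0) {
                        ps.executeBatch();
                        conn.commit();   // commit every 10,000 rows
                    }
                }
                ps.executeBatch();       // flush the final partial batch
                conn.commit();
            }
        }
    }
}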

HTH,
Kevin


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/25/2018 09:17 AM, Kevin Brannen wrote:
> From: Ravi Krishna [mailto:srkrishna@yahoo.com]
> Sent: Tuesday, June 19, 2018 4:15 PM
> To: Steve Atkins <steve@blighty.com>
> Cc: PG mailing List <pgsql-general@lists.postgresql.org>
> Subject: Re: Load data from a csv file without using COPY
> 
>>
>> If performance is relevant then your app should probably be using COPY
>> protocol, not line by line inserts. It's supported by most postgresql
>> access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.
> 
> Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in two different environments.

That is not clear from this post:

https://www.postgresql.org/message-id/em2345975f-0c51-42dd-a35f-ff88715e8bbb%40ravi-lenovo


"3. We now have a catch-22 situation.  Should we spend time porting the
app to PG without first verifying
      that PG can perform as well as DB2. In other words, if some sort of
testing rules out PG as a good
      replacement for DB2, why even bother to port.  Of course that does
not prove conclusively that if PG
      passes the test, then it would mean that the app will work just as
fine.  But at least basic test will tell
     that we are not on a wrong path.
4. What I am planning is:
      4.a Get a set of large tables exported as a pipe delimited text
file.
      4.b Load them in both DB2 and PG on a similar h/w
      4.c  Run OLAP queries.

4.b is to test i/o. Our app is sensitive to the load times and some of
the tables are really wide.
4.c is to test maturity of PG in handling complex OLAP SQLs. From what I
have read, while PG
       optimizer is very good in handling OLTP, it is not, as yet, as good
in OLAP queries.

"

I read that as: can Postgres replace DB2 without losing performance? That
would, to me, mean using whatever works best to get the job done.

> 
> ---
> 
> Seems like a ~10 line Perl program could handle this very easily. Use the Text::CSV module to make handling of the
> input easier. Prepare your insert statement, then once you've pulled each line of input in, execute the insert. For
> slightly better performance, I'd probably use transactions: add a counter and commit every 10,000 rows (or something
> like that).
> 
> Once you have that working with 1 DB, you can copy your program, change the DBD driver and the connect statement to
> the other DB and try the other one. Unless you want to be really clever and make the same program do both and pick the
> DB by a command line switch. :)
> 
> HTH,
> Kevin


-- 
Adrian Klaver
adrian.klaver@aklaver.com