Thread: Load data from a csv file without using COPY

Load data from a csv file without using COPY

From
Ravi Krishna
Date:
In order to test a real-life scenario (and use it for benchmarking) I want to load a large amount of data from csv files.

The requirement is that the load should happen like an application writing to the database (that is, no COPY command).

Is there a tool which can do the job?  Basically, parse the csv file and insert it into the database row by row.

thanks





Re: Load data from a csv file without using COPY

From
Nicolas Paris
Date:
Hi,

AFAIK you can use COPY from a JDBC connection, since COPY allows streaming data (the FROM STDIN version).
However, while faster than INSERT INTO, this might lock the target table during the process.
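
For instance, pgjdbc exposes the COPY protocol through its CopyManager API. A minimal sketch (the connection string, the t1 (f1, f2) table and the file name are placeholders, not from this thread):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyStreamExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret");
             Reader reader = new FileReader("data.csv")) {
            // getCopyAPI() exposes COPY FROM STDIN on a pgjdbc connection
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyIn(
                "COPY t1 (f1, f2) FROM STDIN WITH (FORMAT csv)", reader);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}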

2018-06-19 22:16 GMT+02:00 Ravi Krishna <srkrishna@yahoo.com>:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

thanks





Re: Load data from a csv file without using COPY

From
Hans Schou
Date:
On Tue, Jun 19, 2018 at 10:17 PM Ravi Krishna <srkrishna@yahoo.com> wrote:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command). 

Once you have parsed the data, it is fairly easy to use PostgreSQL's "COPY FROM stdin" format, assuming you have all the data tab-separated. For a simple table (t1) it could look like:

COPY t1 (f1,f2) FROM stdin;
3<tab>Joe
7<tab>Jane
\.

This data can be piped directly to psql, and it will be fast.

Note: NULL should be '\N'; see the manual.

It is the same kind of data you get with pg_dump.
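
If you generate that stream from application code rather than from a dump, the only subtle part is the escaping. A small helper sketch (Java; the class and method names are mine, not from this thread) for rendering one field in COPY's default text format:

final class CopyText {
    // Render one field for COPY's default (tab-separated) text format:
    // NULL becomes \N; backslash, tab, newline and carriage return are escaped.
    static String field(String value) {
        if (value == null) {
            return "\\N";                      // COPY's NULL marker
        }
        return value.replace("\\", "\\\\")     // escape the backslash first
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }
}

Fields are then joined with a tab and rows terminated with a newline, exactly as in the example above.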

./hans 

Re: Load data from a csv file without using COPY

From
"David G. Johnston"
Date:
On Tue, Jun 19, 2018 at 1:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

I'm skeptical that ingesting CSV of any form, even if you intentionally blow things up by converting it into:

BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;

(which is what auto-commit mode looks like)

is going to provide a meaningful benchmark for application-like usage patterns.

But anyway, I'm not familiar with any tools that make doing this particularly simple.  In most situations like this I'll just import the CSV into a spreadsheet and create a formula that builds out the individual SQL commands.  Whether that's useful depends a lot on how often the source CSV is updated.

That said, I have found the following tool to be generally helpful in this area - though I'm thinking it doesn't do what you want here.


David J.

Re: Load data from a csv file without using COPY

From
Steve Atkins
Date:
> On Jun 19, 2018, at 9:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
an appropriate benchmark.

Cheers,
  Steve



Re: Load data from a csv file without using COPY

From
Alban Hertroys
Date:
> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then
pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Load data from a csv file without using COPY

From
Ravi Krishna
Date:
>
> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
> an appropriate benchmark.

Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
two different environments.

Re: Load data from a csv file without using COPY

From
Ravi Krishna
Date:
>
> I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then
> pg_dump that as separate insert statements with pg_dump --inserts.
>

This was my first thought too.  However, as I understand it, pg_dump --inserts basically runs an INSERT INTO ... statement for every row.
In other words, each row is un-prepared and executed individually.  That is also not a real-life scenario.



Re: Load data from a csv file without using COPY

From
Steve Atkins
Date:
> On Jun 19, 2018, at 10:14 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
>>
>> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
>> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
>> an appropriate benchmark.
>
> Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
> two different environments.

That's something I'd only do if I intended to rig a benchmark between an RDBMS with good bulk import
and one without. If that's not your goal, your approach doesn't seem to make sense and is unlikely
to provide performance metrics that are useful or related to your app performance, unless you intend
to hamstring your app in exactly the same way you're running the benchmark.

Maybe use your app, or write ten minutes worth of code that'll interact with the database in much the
same way as your app will?

Cheers,
  Steve



Re: Load data from a csv file without using COPY

From
Rob Sargent
Date:

On 06/19/2018 03:14 PM, Ravi Krishna wrote:
>> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
>> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
>> an appropriate benchmark.
> Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
> two different environments.
If bulk loading is the actual production target, all your RDBMS choices
have their own loaders.  I suggest that is what you ought to compare.


Re: Load data from a csv file without using COPY

From
"David G. Johnston"
Date:
On Tue, Jun 19, 2018 at 2:17 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.
>

This was my first thought too.  However, as I understand it, pg_dump --inserts basically runs an INSERT INTO ... statement for every row.
In other words, each row is un-prepared and executed individually.  That is also not a real-life scenario.

You really need to describe what you consider to be a "real life scenario", and probably give a better idea of how these csv files are created and how many there are, in addition to describing the relevant behavior of the application you are testing.

If you want maximum realism you should probably write integration tests for your application and then execute those at high volume.

Or at minimum give an example of the output you would want from this unknown program...

David J.

Re: Load data from a csv file without using COPY

From
Tim Cross
Date:
Ravi Krishna <srkrishna@yahoo.com> writes:

> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>

Not clear what you mean by 'real world scenario', but you could possibly
use PG's foreign table support to define a csv file as a foreign table
and then have scripts which read from there and do whatever
insert/update etc. you need. However, this has a high level of 'fakery'
going on and is probably not testing what you really want.

There are lots of ways that applications write to the database -
different drivers (e.g. jdbc, odbc, pg etc), different commit
and transaction strategies and even different ways to handle things like
an update or insert process. You can even use streams and copy from an
application.

To get 'real world' equivalence, you really need to use the same
interface as the application you are comparing. Most languages have
support for processing CSV files, so you may be better off writing a
small 'wrapper' app which uses the same drivers and, assuming your
database connectivity has been abstracted into some sort of
module/library/class, writes to the database through the same interface
that the application uses.
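
A rough sketch of that idea over plain JDBC, running in autocommit mode so each row is its own transaction, like a naive application write. The connection string, the t1 (f1, f2) table and the naive comma split are all placeholder assumptions; a real wrapper would call the application's own data-access module instead:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RowByRowLoader {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/testdb", "user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO t1 (f1, f2) VALUES (?, ?)");
             BufferedReader in = Files.newBufferedReader(Paths.get("data.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",", -1);  // naive: no quoted fields
                ps.setString(1, f[0]);
                ps.setString(2, f[1]);
                ps.executeUpdate();  // autocommit: one transaction per row
            }
        }
    }
}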

Tim
-- 
Tim Cross


Re: Load data from a csv file without using COPY

From
Asif Ali
Date:


how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day


From: Tim Cross <theophilusx@gmail.com>
Sent: Wednesday, June 20, 2018 2:59 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

Ravi Krishna <srkrishna@yahoo.com> writes:

> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>

Not clear what you mean by 'real world scenario', but you could possibly
use PG's foreign table support and define a csv file as a foreign table
and then have scripts which read from there and do whatever
insert/update etc you need. However, this has a high level of 'fakery'
going on and probably not testing what you really want.

There are lots of ways that applications write to the database -
different drivers (e.g. jdbc, odbc, pg etc), different commit
and transaction strategies and even different ways to handle things like
an update or insert process. You can even use streams and copy from an
application.

To get 'real world' equivalence, you really need to use the same
interface as the application you are comparing. Most languages have
support for processing CSV files, so you may be better off writing a
small 'wrapper' app which uses the same drivers and assuming your
database connectivity has been abstracted into some sort of
module/library/class, use the same interface to write to the database
that the application uses.

Tim
--
Tim Cross

Re: Load data from a csv file without using COPY

From
Asif Ali
Date:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


Re: Load data from a csv file without using COPY

From
James Keener
Date:
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
Asif Ali
Date:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
"David G. Johnston"
Date:
https://lists.postgresql.org/unsubscribe/


On Tuesday, June 19, 2018, Asif Ali <asif2k@hotmail.com> wrote:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
James Keener
Date:
It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.

On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 03:13 PM, Asif Ali wrote:
> please just tell me the site i will do it right away and i have marked 
> it junked so many times , i will keep spamming it until my email address 
> is removed from the list

https://lists.postgresql.org/unsubscribe/
> 
> Bye
> 
> ------------------------------------------------------------------------
> *From:* James Keener <jim@jimkeener.com>
> *Sent:* Wednesday, June 20, 2018 3:11 AM
> *To:* pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi 
> Krishna
> *Cc:* PG mailing List
> *Subject:* Re: Load data from a csv file without using COPY
> Seriously, stop spamming the list and stop cursing and acting like a 
> petulant child. Go to the site and unsubscribe or use a mail client that 
> understands the standard list headers.
> 
> On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
> 
>     how the fuck i unsubscribe to this mailing list , i get more than
>     100 emails a day
> 
>     Bye
> 
>     ------------------------------------------------------------------------
>     *From:* Alban Hertroys <haramrae@gmail.com>
>     *Sent:* Wednesday, June 20, 2018 2:10 AM
>     *To:* Ravi Krishna
>     *Cc:* PG mailing List
>     *Subject:* Re: Load data from a csv file without using COPY
> 
>     > On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>     > 
>     > In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
>     > The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
>     > Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>     > 
>     > thanks
> 
>     I think an easy approach would be to COPY the CSV files into a
>     separate database using psql's \copy command and then pg_dump that
>     as separate insert statements with pg_dump --inserts.
> 
>     Alban Hertroys
>     --
>     If you can't see the forest for the trees,
>     cut the trees and you'll find there is no forest.
> 
> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Load data from a csv file without using COPY

From
Asif Ali
Date:
just tell me the site , i dont have time to waste on shitty things , i will program a spammer to send email to this list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:16 AM
To: Asif Ali; pgsql-general@lists.postgresql.org; Alban Hertroys; Ravi Krishna
Subject: Re: Load data from a csv file without using COPY
 
It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.

On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye


From: James Keener <jim@jimkeener.com>
Sent: Wednesday, June 20, 2018 3:11 AM
To: pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys; Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye


From: Alban Hertroys <haramrae@gmail.com>
Sent: Wednesday, June 20, 2018 2:10 AM
To: Ravi Krishna
Cc: PG mailing List
Subject: Re: Load data from a csv file without using COPY
 

> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files. 
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 03:18 PM, Asif Ali wrote:
> just tell me the site , i dont have time to waste on shitty things , i 
> will program a spammer to send email to this list

So why subscribe in the first place?

> 
> Bye
> 
> ------------------------------------------------------------------------
> *From:* James Keener <jim@jimkeener.com>
> *Sent:* Wednesday, June 20, 2018 3:16 AM
> *To:* Asif Ali; pgsql-general@lists.postgresql.org; Alban Hertroys; Ravi 
> Krishna
> *Subject:* Re: Load data from a csv file without using COPY
> It's people like you who make spam filters worse for the rest of us to 
> the point they need to be checked daily for false positives. I'm sure 
> you could have found it in less time than it took to spam the list with 
> obscenities.
> 
> On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
> 
>     please just tell me the site i will do it right away and i have
>     marked it junked so many times , i will keep spamming it until my
>     email address is removed from the list
> 
>     Bye
> 
>     ------------------------------------------------------------------------
>     *From:* James Keener <jim@jimkeener.com>
>     *Sent:* Wednesday, June 20, 2018 3:11 AM
>     *To:* pgsql-general@lists.postgresql.org; Asif Ali; Alban Hertroys;
>     Ravi Krishna
>     *Cc:* PG mailing List
>     *Subject:* Re: Load data from a csv file without using COPY
>     Seriously, stop spamming the list and stop cursing and acting like a
>     petulant child. Go to the site and unsubscribe or use a mail client
>     that understands the standard list headers.
> 
>     On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:
> 
>         how the fuck i unsubscribe to this mailing list , i get more
>         than 100 emails a day
> 
>         Bye
> 
>         ------------------------------------------------------------------------
>         *From:* Alban Hertroys <haramrae@gmail.com>
>         *Sent:* Wednesday, June 20, 2018 2:10 AM
>         *To:* Ravi Krishna
>         *Cc:* PG mailing List
>         *Subject:* Re: Load data from a csv file without using COPY
> 
>         > On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>         > 
>         > In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
>         > The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
>         > Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.
>         > 
>         > thanks
> 
>         I think an easy approach would be to COPY the CSV files into a
>         separate database using psql's \copy command and then pg_dump
>         that as separate insert statements with pg_dump --inserts.
> 
>         Alban Hertroys
>         --
>         If you can't see the forest for the trees,
>         cut the trees and you'll find there is no forest.
> 
> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Load data from a csv file without using COPY

From
Stephen Frost
Date:
Greetings,

* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 06/19/2018 03:18 PM, Asif Ali wrote:
> >just tell me the site , i dont have time to waste on shitty things , i
> >will program a spammer to send email to this list
>
> So why subscribe in the first place?

Thanks for the attempts at helping folks, but it's been addressed.
Please don't reply further on this sub-thread.

Thanks!

Stephen


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 01:16 PM, Ravi Krishna wrote:
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job.  Basically parse the csv file and insert it to the database row by row.

http://csvkit.readthedocs.io/en/1.0.3/scripts/csvsql.html
> 
> thanks
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Load data from a csv file without using COPY

From
"Ravi Krishna"
Date:
Thanks all for replying.  I see that I did not explain my requirement in detail, so let me
explain it now.

1. Currently we have a legacy app running on DB2/LUW. Applications write to it either via a Java program
    or via custom ETL scripts using a vendor product.
2. We want to migrate it to PG and eliminate the vendor ETL tool.
3. We now have a catch-22 situation.  Should we spend time porting the app to PG without first verifying
    that PG can perform as well as DB2?  In other words, if some sort of testing rules out PG as a good
    replacement for DB2, why even bother to port?  Of course, that does not prove conclusively that if PG
    passes the test the app will work just fine.  But at least a basic test will tell us
    that we are not on the wrong path.
4. What I am planning is:
    4.a Get a set of large tables exported as a pipe delimited text file.
    4.b Load them in both DB2 and PG on a similar h/w
    4.c  Run OLAP queries.

4.b is to test I/O. Our app is sensitive to load times, and some of the tables are really wide.
4.c is to test the maturity of PG in handling complex OLAP SQL. From what I have read, while the PG
     optimizer is very good at handling OLTP, it is not yet as good at OLAP queries.

I just want to keep the testing tool in 4.b the same for both DB2 and PG. If COPY is the only way,
we will use it with something comparable on the DB2 side.

Re: Load data from a csv file without using COPY

From
Michael Paquier
Date:
On Tue, Jun 19, 2018 at 02:32:10PM -0700, David G. Johnston wrote:
> You really need to describe what you consider to be a "real life
> scenario"; and probably give a better idea of creation and number of these
> csv files.  In addition to describing the relevant behavior of the
> application you are testing.
>
> If you want maximum realism you should probably write integration tests for
> your application and then execute those at high volume.
>
> Or at minimum give an example of the output you would want from this
> unknown program...

Hard to say what you are especially looking for that psql's \copy cannot
do, but perhaps you have an interest in pg_bulkload?  Here is a link to
the project:
https://github.com/ossc-db/pg_bulkload/

It has a couple of fancy features as well, like preventing individual row
failures from aborting the load of a large file, etc.
--
Michael


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/19/2018 04:20 PM, Ravi Krishna wrote:
> Thanks all for replying.  I see that I did not explain my requirement in 
> detail.  So let me
> explain it in detail.
> 
> 1. Currently we have a legacy app running in DB2/LUW. Application writes 
> to it either via Java program
>      or uses a custom ETL scripts using a vendor product.
> 2. We want to migrate it to PG and eliminate the vendor ETL tool.
> 3. We now have a catch-22 situation.  Should we spend time porting the 
> app to PG without first verifying
>      that PG can perform as well as DB2. In other words, if some sort of 
> testing rules out PG as a good
>      replacement for DB2, why even bother to port.  Of course that does 
> not prove conclusively that if PG
>      passes the test, then it would mean that the app will work just as 
> fine.  But at least basic test will tell
>     that we are not on a wrong path.
> 4. What I am planning is:
>      4.a Get a set of large tables exported as a pipe delimited text file.
>      4.b Load them in both DB2 and PG on a similar h/w
>      4.c  Run OLAP queries.
> 
> 4.b is to test i/o. Our app is sensitive to the load times and some of 
> the tables are really wide.
> 4.c is to test maturity of PG in handling complex OLAP SQLs. From what I 
> have read, while PG
>       optimizer is very good in handling OLTP, it is not, as yet, as 
> good in OLAP queries.
> 
> I just want to keep the testing tool same in 4.b for both db2 and pg. If 

To me, 4.b means find the tool that minimizes the load time in each 
database. This is not necessarily the same tool for each.

> COPY is the only way,
> we will use it with something comparable on the DB2 side.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


RE: Load data from a csv file without using COPY

From
Kevin Brannen
Date:
From: Ravi Krishna [mailto:srkrishna@yahoo.com]
Sent: Tuesday, June 19, 2018 4:15 PM
To: Steve Atkins <steve@blighty.com>
Cc: PG mailing List <pgsql-general@lists.postgresql.org>
Subject: Re: Load data from a csv file without using COPY

>
> If performance is relevant then your app should probably be using COPY
> protocol, not line by line inserts. It's supported by most postgresql
> access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.

Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in two different environments.

---

Seems like a ~10 line Perl program could handle this very easily. Use the Text::CSV module to make handling of the
input easier. Prepare your insert statement, then once you've pulled each line of input in, execute the insert. For
slightly better performance, I'd probably use transactions: add a counter and commit every 10,000 rows (or something
like that).

Once you have that working with 1 DB, you can copy your program, change the DBD driver and the connect statement to the
other DB and try the other one. Unless you want to be really clever and make the same program do both and pick the DB by
a command line switch. :)
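
The same pattern sketched over JDBC rather than Perl DBI, in case the Java side is of interest: prepared insert, batched execution, commit every 10,000 rows. The connection string, the t1 (f1, f2) table and the naive comma split are placeholder assumptions:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedCsvLoader {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/testdb", "user", "secret");
             BufferedReader in = Files.newBufferedReader(Paths.get("data.csv"))) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO t1 (f1, f2) VALUES (?, ?)")) {
                String line;
                long count = 0;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",", -1);  // naive: no quoted fields
                    ps.setString(1, f[0]);
                    ps.setString(2, f[1]);
                    ps.addBatch();
                    if (++count % 10_000 == 0) {
                        ps.executeBatch();
                        conn.commit();   // commit every 10,000 rows
                    }
                }
                ps.executeBatch();       // flush the final partial batch
                conn.commit();
            }
        }
    }
}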

HTH,
Kevin


Re: Load data from a csv file without using COPY

From
Adrian Klaver
Date:
On 06/25/2018 09:17 AM, Kevin Brannen wrote:
> From: Ravi Krishna [mailto:srkrishna@yahoo.com]
> Sent: Tuesday, June 19, 2018 4:15 PM
> To: Steve Atkins <steve@blighty.com>
> Cc: PG mailing List <pgsql-general@lists.postgresql.org>
> Subject: Re: Load data from a csv file without using COPY
> 
>>
>> If performance is relevant then your app should probably be using COPY
>> protocol, not line by line inserts. It's supported by most postgresql
>> access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.
> 
> Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in two different environments.

That is not clear from this post:

https://www.postgresql.org/message-id/em2345975f-0c51-42dd-a35f-ff88715e8bbb%40ravi-lenovo


"3. We now have a catch-22 situation.  Should we spend time porting the
app to PG without first verifying
      that PG can perform as well as DB2. In other words, if some sort of
testing rules out PG as a good
      replacement for DB2, why even bother to port.  Of course that does
not prove conclusively that if PG
      passes the test, then it would mean that the app will work just as
fine.  But at least basic test will tell
     that we are not on a wrong path.
4. What I am planning is:
      4.a Get a set of large tables exported as a pipe delimited text
file.
      4.b Load them in both DB2 and PG on a similar h/w
      4.c  Run OLAP queries.

4.b is to test i/o. Our app is sensitive to the load times and some of
the tables are really wide.
4.c is to test maturity of PG in handling complex OLAP SQLs. From what I
have read, while PG
       optimizer is very good in handling OLTP, it is not, as yet, as good
in OLAP queries.

"

I read that as: can Postgres replace DB2 without losing performance? That
would, to me, mean using whatever works best to get the job done.

> 
> ---
> 
> Seems like a ~10 line Perl program could handle this very easily. Use the Text::CSV module to make handling of the
> input easier. Prepare your insert statement, then once you've pulled each line of input in, execute the insert. For
> slightly better performance, I'd probably use transactions: add a counter and commit every 10,000 rows (or something
> like that).
> 
> Once you have that working with 1 DB, you can copy your program, change the DBD driver and the connect statement to
> the other DB and try the other one. Unless you want to be really clever and make the same program do both and pick the
> DB by a command line switch. :)
> 
> HTH,
> Kevin


-- 
Adrian Klaver
adrian.klaver@aklaver.com