Thread: Load data from a csv file without using COPY
In order to test a real life scenario (and use it for benchmarking) I want to load a large amount of data from csv files. The requirement is that the load should happen like an application writing to the database (that is, no COPY command). Is there a tool which can do the job? Basically, parse the csv file and insert it into the database row by row. Thanks
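(For concreteness, the kind of row-by-row loader being asked about is only a few lines in most client libraries. Here is a rough sketch using Python and psycopg2; the connection string, table t1, columns f1/f2, and file name data.csv are all made-up placeholders.)

import csv
import psycopg2

# Placeholder connection string, table, and column names.
conn = psycopg2.connect("dbname=test user=app")
with conn, conn.cursor() as cur:
    with open("data.csv", newline="") as f:
        for row in csv.reader(f):
            # One INSERT per CSV row, the way an application typically writes.
            cur.execute("INSERT INTO t1 (f1, f2) VALUES (%s, %s)", row)
# The 'with conn' block commits the transaction on successful exit.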
Hi,
AFAIK you can use COPY from JDBC, since COPY allows streaming data (the FROM STDIN version).
However, while faster than INSERT INTO, this might lock the target table during the process.
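(As a rough illustration of that streaming approach, here is what it looks like with psycopg2's copy_expert rather than JDBC's CopyManager; the connection string, table t1, columns, and file name are placeholders.)

import psycopg2

# Placeholder connection string, table, and file name.
conn = psycopg2.connect("dbname=test user=app")
with conn, conn.cursor() as cur:
    with open("data.csv", newline="") as f:
        # Stream the whole file over the COPY protocol instead of issuing row-by-row INSERTs.
        cur.copy_expert("COPY t1 (f1, f2) FROM STDIN WITH (FORMAT csv)", f)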
2018-06-19 22:16 GMT+02:00 Ravi Krishna <srkrishna@yahoo.com>:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.
thanks
On Tue, Jun 19, 2018 at 10:17 PM Ravi Krishna <srkrishna@yahoo.com> wrote:
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Once you have parsed the data, it is fairly easy to use PostgreSQL's "COPY ... FROM stdin" format, provided all the data uses a tab separator. For a simple table (t1) it could look like:
COPY t1 (f1,f2) FROM stdin;
3<tab>Joe
7<tab>Jane
\.
These data can be piped directly to psql and it will be fast.
Note: NULL should be '\N'; see the manual.
It is the same kind of data you get with pg_dump.
./hans
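(A small script can emit exactly that format from a CSV file so it can be piped into psql. This is only a sketch, assuming the input file is data.csv, the target table is t1 with columns f1 and f2, and that values contain no tabs, backslashes, or newlines, which would otherwise need escaping.)

import csv

# Print a COPY ... FROM stdin block on stdout; pipe the output into psql.
print("COPY t1 (f1, f2) FROM stdin;")
with open("data.csv", newline="") as f:
    for row in csv.reader(f):
        # Empty fields become \N (NULL); everything else is written as-is.
        print("\t".join(value if value != "" else r"\N" for value in row))
print(r"\.")

Something like "python gen_copy.py | psql testdb" (file and database names made up) would then load the data in a single COPY, as described above.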
In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.
I'm skeptical that ingesting CSV of any form, even if you intentionally blow things up by converting it into:
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
(which is what auto-commit mode looks like)
is going to provide a meaningful benchmark for application-like usage patterns.
But anyway, I'm not familiar with any tools that make doing this particularly simple. In most situations like this I'll just import the CSV into a spreadsheet and create a formula that builds out the individual SQL commands. Whether that's useful depends a lot on how often the source CSV is updated.
That said, I have found the following tool to be generally helpful in this area, though I'm thinking it doesn't do what you want here.
David J.
> On Jun 19, 2018, at 9:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.

Cheers,
Steve
> On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.
>
> thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
>
> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
> an appropriate benchmark.

Actually the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in two different environments.
> On Jun 19, 2018, at 10:14 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
>
>> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
>> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
>> an appropriate benchmark.
>
> Actually the reluctance to not use COPY is to make the benchmark same across two different RDBMS in
> two diff env.

That's something I'd only do if I intended to rig a benchmark between a RDBMS with good bulk import and one without. If that's not your goal, your approach doesn't seem to make sense and is unlikely to provide performance metrics that are useful or related to your app performance, unless you intend to hamstring your app in exactly the same way you're running the benchmark.

Maybe use your app, or write ten minutes worth of code that'll interact with the database in much the same way as your app will?

Cheers,
Steve
On 06/19/2018 03:14 PM, Ravi Krishna wrote:
>> If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
>> supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
>> an appropriate benchmark.
> Actually the reluctance to not use COPY is to make the benchmark same across two different RDBMS in
> two diff env.

If bulk loading is the actual production target, all your RDBMS choices have their own loaders. I suggest that is what you ought to compare.
>
> I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump —inserts.
>
This was my first thought too. However, as I understand it, pg_dump --inserts basically emits an INSERT INTO ... statement for every row.
In other words, each row is un-prepared and executed individually. That is also not a real-life scenario.
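(If the concern is specifically that each row is un-prepared, a small wrapper can prepare the statement once and execute it per row. A rough sketch with psycopg2, using a made-up table t1, columns f1/f2, and file data.csv:)

import csv
import psycopg2

# Placeholder connection string, table, and column names.
conn = psycopg2.connect("dbname=test user=app")
with conn, conn.cursor() as cur:
    # Prepare the INSERT once on the server...
    cur.execute("PREPARE ins (int, text) AS INSERT INTO t1 (f1, f2) VALUES ($1, $2)")
    with open("data.csv", newline="") as f:
        for row in csv.reader(f):
            # ...then only ship the parameters for each row.
            cur.execute("EXECUTE ins (%s, %s)", row)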
You really need to describe what you consider to be a "real life scenario", and probably give a better idea of the creation and number of these csv files, in addition to describing the relevant behavior of the application you are testing.
If you want maximum realism you should probably write integration tests for your application and then execute those at high volume.
Or at minimum give an example of the output you would want from this unknown program...
David J.
Ravi Krishna <srkrishna@yahoo.com> writes:
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.
>

Not clear what you mean by 'real world scenario', but you could possibly use PG's foreign table support and define a csv file as a foreign table and then have scripts which read from there and do whatever insert/update etc you need. However, this has a high level of 'fakery' going on and probably not testing what you really want.

There are lots of ways that applications write to the database - different drivers (e.g. jdbc, odbc, pg etc), different commit and transaction strategies and even different ways to handle things like an update or insert process. You can even use streams and copy from an application.

To get 'real world' equivalence, you really need to use the same interface as the application you are comparing. Most languages have support for processing CSV files, so you may be better off writing a small 'wrapper' app which uses the same drivers and, assuming your database connectivity has been abstracted into some sort of module/library/class, use the same interface to write to the database that the application uses.

Tim
--
Tim Cross
how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day
Bye
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list
Bye
https://lists.postgresql.org/unsubscribe/
On Tuesday, June 19, 2018, Asif Ali <asif2k@hotmail.com> wrote:
It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 06/19/2018 03:13 PM, Asif Ali wrote:
> please just tell me the site i will do it right away and i have marked
> it junked so many times , i will keep spamming it until my email address
> is removed from the list

https://lists.postgresql.org/unsubscribe/

--
Adrian Klaver
adrian.klaver@aklaver.com
just tell me the site , i dont have time to waste on shitty things , i will program a spammer to send email to this list
Bye
On 06/19/2018 03:18 PM, Asif Ali wrote:
> just tell me the site , i dont have time to waste on shitty things , i
> will program a spammer to send email to this list

So why subscribe in the first place?

--
Adrian Klaver
adrian.klaver@aklaver.com
Greetings,

* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 06/19/2018 03:18 PM, Asif Ali wrote:
> > just tell me the site , i dont have time to waste on shitty things , i
> > will program a spammer to send email to this list
>
> So why subscribe in the first place?

Thanks for the attempts at helping folks, but it's been addressed. Please don't reply further on this sub-thread.

Thanks!

Stephen
On 06/19/2018 01:16 PM, Ravi Krishna wrote:
> In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
> The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
> Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.

http://csvkit.readthedocs.io/en/1.0.3/scripts/csvsql.html

--
Adrian Klaver
adrian.klaver@aklaver.com
Thanks all for replying. I see that I did not explain my requirement in detail, so let me
explain it here.
1. Currently we have a legacy app running on DB2/LUW. The application writes to it either via a Java program
or via custom ETL scripts using a vendor product.
2. We want to migrate it to PG and eliminate the vendor ETL tool.
3. We now have a catch-22 situation. Should we spend time porting the app to PG without first verifying
that PG can perform as well as DB2? In other words, if some sort of testing rules out PG as a good
replacement for DB2, why even bother to port. Of course that does not prove conclusively that if PG
passes the test the app will work just as fine, but at least a basic test will tell us
that we are not on the wrong path.
4. What I am planning is:
4.a Get a set of large tables exported as pipe-delimited text files.
4.b Load them into both DB2 and PG on similar h/w.
4.c Run OLAP queries.
4.b is to test i/o. Our app is sensitive to the load times and some of the tables are really wide.
4.c is to test the maturity of PG in handling complex OLAP SQL. From what I have read, while the PG
optimizer is very good at handling OLTP, it is not, as yet, as good at OLAP queries.
I just want to keep the testing tool the same in 4.b for both DB2 and PG. If COPY is the only way,
we will use it with something comparable on the DB2 side.
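(If COPY does end up being the loader on the PG side of 4.b, a pipe-delimited export can still be streamed from a regular driver and timed. A rough psycopg2 sketch; the connection string, export file name, and table name are placeholders:)

import time
import psycopg2

# Placeholder connection string, export file, and table name.
conn = psycopg2.connect("dbname=test user=app")
start = time.perf_counter()
with conn, conn.cursor() as cur:
    with open("big_table.txt", newline="") as f:
        # Pipe-delimited text streamed over the COPY protocol.
        cur.copy_expert(
            "COPY big_table FROM STDIN WITH (FORMAT csv, DELIMITER '|')", f
        )
print(f"loaded in {time.perf_counter() - start:.1f}s")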
On Tue, Jun 19, 2018 at 02:32:10PM -0700, David G. Johnston wrote:
> You really need to describe what you consider to be a "real life
> scenario"; and probably give a better idea of creation and number of these
> csv files. In addition to describing the relevant behavior of the
> application you are testing.
>
> If you want maximum realism you should probably write integration tests for
> your application and then execute those at high volume.
>
> Or at minimum give an example of the output you would want from this
> unknown program...

Hard to say what you are especially looking for that psql's \copy cannot do, but perhaps you have an interest in pg_bulkload? Here is a link to the project:

https://github.com/ossc-db/pg_bulkload/

It has a couple of fancy features as well, like preventing failures of rows if loading a large file, etc.

--
Michael
On 06/19/2018 04:20 PM, Ravi Krishna wrote:
> 4. What I am planning is:
> 4.a Get a set of large tables exported as a pipe delimited text file.
> 4.b Load them in both DB2 and PG on a similar h/w
> 4.c Run OLAP queries.
>
> I just want to keep the testing tool same in 4.b for both db2 and pg. If
> COPY is the only way,
> we will use it with something comparable on the DB2 side.

To me, 4.b means find the tool that minimizes the load time in each database. This is not necessarily the same tool for each.

--
Adrian Klaver
adrian.klaver@aklaver.com
From: Ravi Krishna [mailto:srkrishna@yahoo.com]
Sent: Tuesday, June 19, 2018 4:15 PM
To: Steve Atkins <steve@blighty.com>
Cc: PG mailing List <pgsql-general@lists.postgresql.org>
Subject: Re: Load data from a csv file without using COPY

> If performance is relevant then your app should probably be using COPY
> protocol, not line by line inserts. It's supported by most postgresql
> access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.

Actually the reluctance to not use COPY is to make the benchmark same across two different RDBMS in two diff env.

---

Seems like a ~10 line Perl program could handle this very easily. Use the Text::CSV module to make handling of the input easier. Prepare your insert statement, then once you've pulled each line of input in, execute the insert. For slightly better performance, I'd probably use transactions, add a counter, and commit every 10,000 rows (or something like that).

Once you have that working with 1 DB, you can copy your program, change the DBD driver and the connect statement to the other DB and try the other one. Unless you want to be really clever and make the same program do both and pick the DB by a command line switch. :)

HTH,
Kevin
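(The same batched-commit idea, sketched here in Python with psycopg2 rather than Perl/DBI; the batch size, connection string, table, column, and file names are all placeholders.)

import csv
import psycopg2

BATCH = 10_000  # commit every N rows, as suggested above

# Placeholder connection string, table, and column names.
conn = psycopg2.connect("dbname=test user=app")
cur = conn.cursor()
with open("data.csv", newline="") as f:
    for i, row in enumerate(csv.reader(f), start=1):
        cur.execute("INSERT INTO t1 (f1, f2) VALUES (%s, %s)", row)
        if i % BATCH == 0:
            conn.commit()  # flush one batch
conn.commit()  # commit the final partial batch
conn.close()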
On 06/25/2018 09:17 AM, Kevin Brannen wrote:
> Actually the reluctance to not use COPY is to make the benchmark same across two different RDBMS in two diff env.

That is not clear from this post:

https://www.postgresql.org/message-id/em2345975f-0c51-42dd-a35f-ff88715e8bbb%40ravi-lenovo

"3. We now have a catch-22 situation. Should we spend time porting the app to PG without first verifying
that PG can perform as well as DB2. In other words, if some sort of testing rules out PG as a good
replacement for DB2, why even bother to port. Of course that does not prove conclusively that if PG
passes the test, then it would mean that the app will work just as fine. But at least basic test will tell
that we are not on a wrong path.

4. What I am planning is:
4.a Get a set of large tables exported as a pipe delimited text file.
4.b Load them in both DB2 and PG on a similar h/w
4.c Run OLAP queries.

4.b is to test i/o. Our app is sensitive to the load times and some of the tables are really wide.
4.c is to test maturity of PG in handling complex OLAP SQLs. From what I have read, while PG
optimizer is very good in handling OLTP, it is not, as yet, as good in OLAP queries."

I read that as: can Postgres replace DB2 and not lose performance? That would, to me, mean use whatever works best to get the job done.

--
Adrian Klaver
adrian.klaver@aklaver.com