Thread: create a script which imports csv data

create a script which imports csv data

From
Robert Buckley
Date:
Hi,

I have to create a script which imports csv data into postgresql ...and have a few questions about the best way to do it.

The csv data is automatically created from an external database so I have no influence over which columns etc are downloaded.

The csv comes without an fid field and has therefore no unique identifier.

How can I best create a table for the import?  

Would I first create a table without an fid and then after the import create a sequence and add the sequence to the table, then somehow update the fid field?

could anyone show me the best way to do this?


at the moment I am doing this...which makes postgresql throw an error because obviously the 'fid' field is missing from the csv data!

CREATE SEQUENCE fid_seq INCREMENT 1 START 1;

CREATE TABLE Anlagenregister_Aktuell_2011 (
fid INTEGER NOT NULL DEFAULT nextval('fid_seq'),
Firma TEXT,
Anlagenschluessel TEXT,
Anschrift TEXT,
PLZ TEXT,
Ort TEXT,
Bundesland TEXT,
Energietraeger TEXT,
Inbetriebnahmedatum DATE,
Netzzugangsdatum DATE,
Netzabgangsdatum DATE,
Ausserbetriebnahmedatum DATE,
Installierte_Leistung_kW numeric(11,4),
Lastgangmessung TEXT,
Einspeisemanagement TEXT,
Spannungsebene TEXT,
Zaehlpunktbezeichnung TEXT,
Anlagentyp TEXT,
Geographische_Koordinaten TEXT,
Schalloptimierung TEXT,
Biomasse_KWK_Bonus TEXT,
Biomasse_Technologie_Bonus TEXT,
PRIMARY KEY (fid)
);

copy Anlagenregister_Aktuell_2011 FROM '/var/www/Anlagenregister_Aktuell_2011.csv' DELIMITERS ';' CSV;


thanks,

Rob

Re: create a script which imports csv data

From
Raymond O'Donnell
Date:
On 28/06/2012 12:53, Robert Buckley wrote:
> Hi,
>
> I have to create a script which imports csv data into postgresql ...and
> have a few questions about the best way to do it.
>
> The csv data is automatically created from an external database so I
> have no influence over which columns etc are downloaded.
>
> The csv comes without an fid field and has therefore no unique identifier.
>
> How can I best create a table for the import?
>
> Would I first create a table without an fid and then after the import
> create a sequence and add the sequence to the table, then somehow update
> the fid field?
>
> could anyone show me the best way to do this?

Yes, you can do that - create the table initially without a primary key,
import the data, then do something like this:

alter table Anlagenregister_Aktuell_2011 add column fid serial;
update Anlagenregister_Aktuell_2011 set fid =
nextval('Anlagenregister_Aktuell_2011_fid_seq');

As an aside, note that the upper-case letters in the table name get
folded automatically to lower--case unless you double-quote it:
"Anlagenregister_Aktuell_2011".

Ray.



--

Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie



Re: create a script which imports csv data

From
"Marc Mamin"
Date:
>
>
> From: pgsql-general-owner@postgresql.org  On Behalf Of Robert Buckley
> Sent: Donnerstag, 28. Juni 2012 13:53
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] create a script which imports csv data
>
> Hi,
>
> I have to create a script which imports csv data into postgresql
...and have a few questions about the best way to do it.
>
> The csv data is automatically created from an external database so I
have no influence over which columns etc are downloaded.
>
> The csv comes without an fid field and has therefore no unique
identifier.
>
> How can I best create a table for the import?
>
> Would I first create a table without an fid and then after the import
create a sequence and add the sequence to the table, then somehow update
the fid field?
>
> could anyone show me the best way to do this?
>
>
> at the moment I am doing this...which makes postgresql throw an error
because obviously the 'fid' field is missing from the csv data!
>
> CREATE SEQUENCE fid_seq INCREMENT 1 START 1;
>
> CREATE TABLE Anlagenregister_Aktuell_2011 (
> fid INTEGER NOT NULL DEFAULT nextval('fid_seq'),
> Firma TEXT,
> Anlagenschluessel TEXT,
> Anschrift TEXT,
> PLZ TEXT,
> Ort TEXT,
> Bundesland TEXT,
> Energietraeger TEXT,
> Inbetriebnahmedatum DATE,
> Netzzugangsdatum DATE,
> Netzabgangsdatum DATE,
> Ausserbetriebnahmedatum DATE,
> Installierte_Leistung_kW numeric(11,4),
> Lastgangmessung TEXT,
> Einspeisemanagement TEXT,
> Spannungsebene TEXT,
> Zaehlpunktbezeichnung TEXT,
> Anlagentyp TEXT,
> Geographische_Koordinaten TEXT,
> Schalloptimierung TEXT,
> Biomasse_KWK_Bonus TEXT,
> Biomasse_Technologie_Bonus TEXT,
> PRIMARY KEY (fid)
> );
>
> copy Anlagenregister_Aktuell_2011 FROM
'/var/www/Anlagenregister_Aktuell_2011.csv' DELIMITERS ';' CSV;

Hello,

1st:

you don't need to create the sequence.
Just use

CREATE TABLE Anlagenregister_Aktuell_2011 (
fid SERIAL NOT NULL ,
...

with serial as data type, a sequence will be built for you in the
backgroud.


2nd:

in COPY you can list the the columns of the csv content:


COPY Anlagenregister_Aktuell_2011 (firma, anlagenschluessel, ...) FROM

As fid is not listed here, it will be filled by its default value
(sequence).


best regards,

Marc Mamin


Re: create a script which imports csv data

From
Berend Tober
Date:
Robert Buckley wrote:
> I have to create a script which imports csv data into postgresql
> The csv data is automatically created from an external database
> so I have no influence over which columns etc are downloaded.
>
> How can I best create a table for the import?
>

This is what I do:

1) I have a table defined for import which reflects the CSV
structure of the data to be imported, i.e., since I know what the
fields are on each line of the CSV, this template table has
columns defined to accommodate each known field. This table never
actually gets data written to it.

2) To import data, my script creates a temporary import table
LIKE the import template table.

3) The script then transfers and transform the data from the
temporary import table to another permanent table that has the
structure, including a primary key, that is more useful for my
purposes. (It omits some of the columns which I do not really
need from the CSV, uses a different name for one column, and adds
some reference information. You could do calculations here as well.)

4) The temporary import table is deleted at the end of the import
session.

Here is a sanitized (names changed to protect the innocent)
version of the script (the script parameter '$1' is the name of
the CSV file):

#!/bin/bash

# This script imports a CSV file of transactions from Discover.

#!/bin/bash

# This script imports a CSV file of transactions from Discover.

psql mydb <<-_END-OF-SCRIPT_
CREATE LOCAL TEMPORARY TABLE i (LIKE
my_financial_schema.import_discover_card);
COPY i
(transaction_date, post_date, description, amount, category,
share, net, type, paid_date)
FROM '$1'
WITH (FORMAT CSV, DELIMITER ',', QUOTE '"');
INSERT INTO my_financial_schema.my_permanent_record_table(
         transaction_date,
         paid_date,
         reference,
         category,
         amount,
         description
         )
     SELECT
       transaction_date,
       paid_date,
       'Discover Card',
       type,
       net,
       description
       FROM i;
DROP TABLE i;
_END-OF-SCRIPT_


Re: create a script which imports csv data

From
Arjen Nienhuis
Date:
On Thu, Jun 28, 2012 at 2:12 PM, Berend Tober <btober@broadstripe.net> wrote:
> Robert Buckley wrote:
>>
>> I have to create a script which imports csv data into postgresql
>> The csv data is automatically created from an external database
>> so I have no influence over which columns etc are downloaded.
>>
>> How can I best create a table for the import?
>>
>
> This is what I do:
>
> 1) I have a table defined for import which reflects the CSV structure of the
> data to be imported, i.e., since I know what the fields are on each line of
> the CSV, this template table has columns defined to accommodate each known
> field. This table never actually gets data written to it.
>
> 2) To import data, my script creates a temporary import table LIKE the
> import template table.
>
> 3) The script then transfers and transform the data from the temporary
> import table to another permanent table that has the structure, including a
> primary key, that is more useful for my purposes. (It omits some of the
> columns which I do not really need from the CSV, uses a different name for
> one column, and adds some reference information. You could do calculations
> here as well.)
>
> 4) The temporary import table is deleted at the end of the import session.
>

With the file_fdw you can skip some steps:

http://www.postgresql.org/docs/9.1/static/file-fdw.html

You can SELECT from the CSV file.

Re: create a script which imports csv data

From
Craig Ringer
Date:
On 06/28/2012 07:53 PM, Robert Buckley wrote:
Hi,

I have to create a script which imports csv data into postgresql ...and have a few questions about the best way to do it.

The advice already given is pretty good. Remember you can always create a clean new table then INSERT INTO ... SELECT to populate it from a scratch table you loaded your CSV into, so you don't have to do your cleanups/transformations to the CSV or during the COPY its self.

If it's a big job, it's going to be regular, you're going to have to merge it with more imports later, etc, consider an ETL tool like Penatho. http://kettle.pentaho.com/

For very very fast loading of bulk data, consider pg_bulkload http://pgbulkload.projects.postgresql.org/ . It's only worth the hassle if your load will take many, many hours without it.

--
Craig Ringer