Thanks for the reply Steve. These suggestions are new to me, so I'd like to rephrase them back to you in order to make sure I understand the bits and details.
I suppose you could use a trigger to check each record before inserting but that is likely to be inefficient for bulk loads. A quick bash loop is probably your best bet. Something along the lines of:
for inputfile in /infiledirectory/*.csv
do
    cat "$inputfile" | psql [connection-params] -c '\copy rawinput from stdin csv header...'
done
I get this… except for the "header…" bit, which I'm not sure about. Here is my interpretation of your code, including my specific connection parameters.
#!/bin/sh
for inputfile in '/Volumes/disk7/b4warmed3/export/60min2/*.txt'
do
    cat "$inputfile" | psql -p 54321 -h localhost -c "\copy rawinput FROM stdin WITH CSV HEADER DELIMITER AS ',' NULL AS 'NA'"
done
I added single quotes around the path to the input files. Is that correct?
This imports everything into a "staging" table (I called it rawinput). From there you can create your final table with SELECT DISTINCT…
This bit needs to be a separate step, right? (rowid is the primary key.)
SELECT DISTINCT ON (rowid) *
FROM rawinput;
From here do I need another COPY FROM or some kind of INSERT statement?
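If it helps, here is my best guess at what that step might look like, assuming the final table is called observations (a name I am just making up here) and gets its columns from the staging table:

CREATE TABLE observations AS
SELECT DISTINCT ON (rowid) *
FROM rawinput
ORDER BY rowid;

Is that roughly what you had in mind, or should it be an INSERT INTO ... SELECT instead?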
For speed make sure that you create your staging table as "unlogged".
I understand that I need to create the rawinput table first, but I am unfamiliar with the "UNLOGGED" option. I assume it makes things faster… Does it go something like:
CREATE UNLOGGED TABLE rawinput;
Do I need to define all the columns (including types) in rawinput as well? If so, then I assume I do not want rowid to have a primary key… or else I would be back where I started. My guess at what that would look like is below.
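Putting that together, something like the following, where the column names and types are only placeholders for my actual variables, and rowid deliberately has no PRIMARY KEY so the duplicate rows can still load:

CREATE UNLOGGED TABLE rawinput (
    rowid     text,       -- no PRIMARY KEY, so duplicates are allowed in staging
    station   text,
    obs_time  timestamp,
    temp_c    numeric
    -- ...plus the rest of the columns that appear in the .txt files
);

Does that look right?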