Thread: loading data from flat text file
I am porting my database from MySQL to PostgreSQL 7.0. One of the first problems I ran into is the lack of a data inload (bulk load) command. I would like to insert data from a 15 million record flat file into a table. How would I do that in PGSQL?
COPY command.

> I am porting my database from MySQL to PostgreSQL 7.0 One of the first
> problems I ran into is the lack of data inload command. I would like to
> insert data from a 15 million record flat file into a table. How would I do
> that in PGSQL?

--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
So I am almost there, except my data is formatted as follows:
"chunk1","chunk2","chunk3"
How would I tell COPY that my fields are enclosed in " and separated by , ? Furthermore, I could not find a manual page for the COPY command. Anyone?
-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Monday, May 22, 2000 6:23 PM
To: Voitenko, Denis
Cc: 'pgsql-general@postgresql.org'
Subject: Re: [GENERAL] loading data from flat text file
COPY command.
> I am porting my database from MySQL to PostgreSQL 7.0 One of the first
> problems I ran into is the lack of data inload command. I would like to
> insert data from a 15 million record flat file into a table. How would I do
> that in PGSQL?
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> "Voitenko, Denis" wrote:
>
> So I am almost there, except my data is formated as follows:
>
> "chunk1","chunk2","chunk3"
>
> how would I tell COPY that my data is encapsulated in " and separated
> by , ? Furthermore, I did not find a manual on the COPY command.
> Anyone?

In /usr/local/pgsql/doc (assuming that's where you installed PostgreSQL)
you will find a good deal of HTML-format documentation. From the docs:

COPY [ BINARY ] table [ WITH OIDS ]
    FROM { 'filename' | stdin }
    [ [USING] DELIMITERS 'delimiter' ]
    [ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
    TO { 'filename' | stdout }
    [ [USING] DELIMITERS 'delimiter' ]
    [ WITH NULL AS 'null string' ]

Basically, you just need to specify the delimiters.

Bruce Momjian is also in the process of writing a book on PostgreSQL. You
can download a snapshot of its current state from
http://www.postgresql.org/docs/awbook.html.

-Ron-
On Tue, May 23, 2000 at 10:10:41AM -0400, Ron Peterson wrote:
> > "Voitenko, Denis" wrote:
> >
> > So I am almost there, except my data is formated as follows:
> >
> > "chunk1","chunk2","chunk3"
> >
> > how would I tell COPY that my data is encapsulated in " and separated
> > by , ? Furthermore, I did not find a manual on the COPY command.
> > Anyone?
>
> In /usr/local/pgsql/doc (assuming that's where you installed PostgreSQL)
> you will find a good deal of html format documentation. From the docs:
>
> COPY [ BINARY ] table [ WITH OIDS ]
>     FROM { 'filename' | stdin }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
> COPY [ BINARY ] table [ WITH OIDS ]
>     TO { 'filename' | stdout }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
>
> Basically, you just need to specify the delimiters.

However, the quotes might give you problems. PostgreSQL does not treat
them specially: it expects delimited files, not quote-separated files.
This leads to two problems. First, text fields with internal commas will
split at the internal comma. Second, the quotes will be stored with your
data and, depending on the column type, may not convert at all: i.e.
trying to store "12" in an int column won't work.

The way a PostgreSQL COPY delimited file handles embedded delimiters is
to quote them with a backslash, like so:

chunk one,another\, different\, chunk,third chunk

So you'll need to preprocess your flat file some. A simple sed should do
it. My usual trick for this is to find some character sequence that
_isn't_ in the dataset, like '|||', and do a global replace on "," with
the pipes, then delete the leading and trailing quotes, backslash-escape
the remaining commas, and substitute the pipes back to commas. This
converts a separated file into a delimited one:

sed -e 's/","/|||/g' -e 's/^"//' -e 's/"$//' -e 's/,/\\,/g' \
    -e 's/|||/,/g' my_file > newfile

(Note the doubled backslash: 's/,/\,/g' would be a no-op, since sed
treats \, in the replacement as a plain comma.) Sort of ugly, but it
should work.
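For reference, here is that preprocessing run end to end on a hypothetical sample row (the filenames and the row itself are made up for illustration; the format matches the one quoted above):

```shell
# Hypothetical sample row in the quoted, comma-separated format from the thread,
# including a field with embedded commas to show the escaping.
printf '%s\n' '"chunk one","another, different, chunk","third chunk"' > sample.csv

# Swap the "," field boundaries for a sentinel (|||) that does not occur in
# the data, strip the outer quotes, backslash-escape any commas left inside
# fields, then turn the sentinels back into plain comma delimiters.
sed -e 's/","/|||/g' \
    -e 's/^"//' \
    -e 's/"$//' \
    -e 's/,/\\,/g' \
    -e 's/|||/,/g' sample.csv > converted.txt

cat converted.txt
# chunk one,another\, different\, chunk,third chunk
```

The output line matches the backslash-escaped form COPY expects, so it can then be loaded with DELIMITERS ','.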
If you can get your other DB to dump in a delimited format instead of a
quoted CSV format, everything will work much more easily. COPY wasn't
really designed for importing and transforming data, but as a reliable
means of doing bulk dump and restore of tables.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
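A variation on the preprocessing idea (a sketch, assuming the fields contain no tabs and no embedded or escaped double quotes) is to rewrite the quoted CSV as tab-delimited text, since tab is COPY's default delimiter and then no DELIMITERS clause is needed at all. The sample filenames and row are hypothetical:

```shell
# Hypothetical sample row; assumes fields contain no tabs or embedded quotes.
printf '%s\n' '"chunk1","chunk2","chunk3"' > quoted.csv

# Split on the "," sequences between fields, strip the outer quotes from the
# first and last fields, and rejoin with tabs (awk rebuilds the record with
# OFS once a field is modified).
awk -F'","' 'BEGIN { OFS = "\t" } { sub(/^"/, "", $1); sub(/"$/, "", $NF); print }' \
    quoted.csv > tabbed.txt
```

Commas inside fields then need no escaping at all, since they are no longer the delimiter.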