importing a messy text file - Mailing list pgsql-general

From Willy-Bas Loos
Subject importing a messy text file
Date
Msg-id CAHnozTiOp45ur4=6kqUuPBadptFjM=hDgGa1fbZUmF1Z-UZijA@mail.gmail.com
Responses Re: importing a messy text file  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Re: importing a messy text file  (Alberto Cabello Sánchez <alberto@unex.es>)
Re: importing a messy text file  (bricklen <bricklen@gmail.com>)
List pgsql-general
Hi,

I have a 56GB text file that I want to import into postgres.
The file is tab-delimited and not quoted.
I deleted the header with the column names (using sed) so that I could use COPY with the non-CSV text format (because some of the text values contain quotes).

I had some minor trouble with the file, which I managed to fix, but now I have a problem that I can't think of a solution for, even though it seems so simple.

The problem is this:
There is a tab after the last column in many, but not all, records.
When I ran into the extra tab I added a dummy column to the destination table, but now COPY throws an error because the data for the dummy column is missing (on record ~275K of about 150M).

The file is too big to edit by hand, and anyway it would probably not be feasible to manually add a tab to every record that is missing one, although I don't know how many there are.
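
A minimal sketch of one way to fix this without editing by hand: stream the file through a small filter that drops the extra trailing tab. The function names and the expected_columns parameter are hypothetical, and the sketch assumes Unix line endings and that the real column count is known. It only drops the last field when a record has exactly one field too many and that field is empty, so a record whose real last column is empty is left alone. Records with any other field count go to a separate rejects stream instead of being guessed at.

```python
def normalize_record(line, expected_columns):
    """Return the record with exactly expected_columns fields, or None if it cannot be fixed."""
    fields = line.rstrip("\n").split("\t")
    # One field too many and the extra one is empty: this is the stray trailing tab.
    if len(fields) == expected_columns + 1 and fields[-1] == "":
        fields.pop()
    if len(fields) != expected_columns:
        return None
    return "\t".join(fields) + "\n"


def normalize_stream(src, dst, rejects, expected_columns):
    """Copy fixable records from src to dst, unfixable ones to rejects.

    src, dst and rejects are text file objects, so the file is processed
    one line at a time and never held in memory as a whole.
    Returns (kept, rejected) record counts.
    """
    kept = rejected = 0
    for line in src:
        fixed = normalize_record(line, expected_columns)
        if fixed is None:
            rejects.write(line)
            rejected += 1
        else:
            dst.write(fixed)
            kept += 1
    return kept, rejected
```

The cleaned output could then be loaded with COPY into the table without the dummy column, and the rejects file inspected separately.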

I realize that there could be other showstoppers in the file, like missing or extra tabs in the middle of a record, but I would like to try to get this fixed first.
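
To get an idea of how many such showstoppers there are before attempting the import, the fields per record can be counted in a single pass. This is only a sketch: the function name is hypothetical, and it assumes one record per line with Unix line endings, so it cannot tell a missing tab from an embedded newline.

```python
from collections import Counter


def field_count_profile(lines):
    """Count how many records have each number of tab-separated fields.

    Returns (counts, first_seen): counts maps a field count to the number of
    records with that count, and first_seen maps a field count to the
    1-based line number where it first occurs.
    """
    counts = Counter()
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        n_fields = line.rstrip("\n").count("\t") + 1
        counts[n_fields] += 1
        first_seen.setdefault(n_fields, lineno)
    return counts, first_seen
```

Passing an open file object as lines keeps memory use constant. If only two field counts show up (N and N+1), the trailing tab is probably the only structural problem.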

Maybe it would be feasible to load every record as a single value and then split those into columns using postgres text processing.
Or maybe there is an (undocumented?) option in COPY or \copy to ignore extra columns.
Or maybe there is some NoSQL software where I can import this and then structure the data before I pass it to postgres.

Do you have any tips, please?

Cheers,


--
Willy-Bas Loos
