Re: Ragged CSV import - Mailing list pgsql-hackers

From	Hannu Krosing
Subject	Re: Ragged CSV import
Date	September 9, 2009 17:56:22
Msg-id	1252529761.4080.23.camel@hvost1700
In response to	Re: Ragged CSV import (Alvaro Herrera <alvherre@commandprompt.com>)
List	pgsql-hackers

On Wed, 2009-09-09 at 16:34 -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Andrew Dunstan <andrew@dunslane.net> writes:
> > >> I have received a requirement for the ability to import ragged CSV 
> > >> files, i.e. files that contain variable numbers of columns per row.
> > 
> > BTW, one other thought about this: I think the historical reason for
> > COPY being strict about the number of incoming columns was that it
> > provided a useful cross-check that the parsing hadn't gone off into
> > the weeds.  We have certainly seen enough examples where the reported
> > manifestation of, say, an escaping mistake was that COPY saw the row
> > as having too many or too few columns.  So being permissive about it
> > would lose some error detection capability.  I am not clear about
> > whether CSV format is sufficiently more robust than the traditional
> > COPY format to render this an acceptable loss.  Comments?
> 
> I think accepting less columns and filling with nulls should be
> protected enough for this not to be a problem; if the parser goes nuts,
> it will die eventually.  Silently dropping excessive trailing columns
> does not seem acceptable though; you could lose entire rows and not
> notice.

Maybe we could put a catch-all "text" or even "text[]" column as the
last one of the table and gather all extra columns there?
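
To make the idea concrete, here is a rough client-side sketch in Python of what such an import could look like. It is not COPY's behaviour and not a proposed patch: the function name normalize_ragged_rows and the n_columns parameter are made up for illustration. Short rows are padded with None (standing in for NULL), and anything past the declared columns is collected into a trailing list, which plays the role of the catch-all "text[]" column.

```python
import csv
import io


def normalize_ragged_rows(lines, n_columns):
    """Yield rows with exactly n_columns regular fields plus one catch-all.

    Hypothetical helper: n_columns is the number of "real" table columns.
    Missing trailing fields become None; surplus fields are gathered into
    a list appended as the last element of each row.
    """
    for row in csv.reader(lines):
        fixed = row[:n_columns]
        # Pad rows that have too few columns.
        fixed += [None] * (n_columns - len(fixed))
        # Everything beyond the declared columns goes to the catch-all.
        extras = row[n_columns:]
        yield fixed + [extras]


if __name__ == "__main__":
    sample = io.StringIO("a,b,c\n1,2\n1,2,3,4,5\n")
    for fixed_row in normalize_ragged_rows(sample, 3):
        print(fixed_row)
```

Since the extra fields are kept rather than dropped, a parse that has gone off into the weeds would still show up as unexpectedly long catch-all lists, so some of the cross-check Tom mentions is preserved.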

> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training
