Thread: Issue with copying data from a text file.

Issue with copying data from a text file.

From

Paul Lambert

Date:

19 March 2007, 23:26:16

I have a procedure in place that copies data from a caret delimited text 
file into a table storing some information.

One of the fields in the table contains an item description which may 
contain item dimensions such as - 17" alloy wheels

The problem I am getting when I do my load is I believe due to the 
presence of the double quotation marks giving the copy the impression 
that it is to include the information following as a single text string 
until it gets to the next set of double quotes. As a result, I get the 
following:

AutoDRS=#       COPY deal_lines_temp_load FROM 
'c:/temp/autodrs_deal_lines.txt'
WITH DELIMITER AS '^' CSV HEADER;
ERROR:  value too long for type character varying(30)
CONTEXT:  COPY deal_lines_temp_load, line 87, column order_desc: "17 5 
spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^

The column as you can see is defined as a 30 character field, the load 
contains in this column ^17" 5 spoke alloy wheels.^

I note an option in the COPY command to specify the quote character, 
defaulting to double quote. The problem being a single quote will also 
be used in the data, as will other characters. Is there any way to get a 
copy to have no quote character? I.e. read the file and put whatever is 
between the caret characters straight into the appropriate field exactly 
as is.

TIA,
Paul.

-- 
Paul Lambert
Database Administrator
AutoLedgers

Re: Issue with copying data from a text file.

From

Phillip Smith

Date:

19 March 2007, 23:37:34

Can you not export the source file with escape characters? ie, <br /><br /> ^17\" Alloy Wheels^<br /><br /> ~p<br
/><br/> On Tue, 2007-03-20 at 11:25 +0900, Paul Lambert wrote: <blockquote type="CITE"><pre> 
<font color="#000000">I have a procedure in place that copies data from a caret delimited text </font>
<font color="#000000">file into a table storing some information.</font>

<font color="#000000">One of the fields in the table contains an item description which may </font>
<font color="#000000">contain item dimensions such as - 17" alloy wheels</font>

<font color="#000000">The problem I am getting when I do my load is I believe due to the </font>
<font color="#000000">presence of the double quotation marks giving the copy the impression </font>
<font color="#000000">that it is to include the information following as a single text string </font>
<font color="#000000">until it gets to the next set of double quotes. As a result, I get the </font>
<font color="#000000">following:</font>

<font color="#000000">AutoDRS=#       COPY deal_lines_temp_load FROM </font>
<font color="#000000">'c:/temp/autodrs_deal_lines.txt'</font>
<font color="#000000">WITH DELIMITER AS '^' CSV HEADER;</font>
<font color="#000000">ERROR:  value too long for type character varying(30)</font>
<font color="#000000">CONTEXT:  COPY deal_lines_temp_load, line 87, column order_desc: "17 5 </font>
<font color="#000000">spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^</font>

<font color="#000000">The column as you can see is defined as a 30 character field, the load </font>
<font color="#000000">contains in this column ^17" 5 spoke alloy wheels.^</font>

<font color="#000000">I note an option in the COPY command to specify the quote character, </font>
<font color="#000000">defaulting to double quote. The problem being a single quote will also </font>
<font color="#000000">be used in the data, as will other characters. Is there any way to get a </font>
<font color="#000000">copy to have no quote character? I.e. read the file and put whatever is </font>
<font color="#000000">between the caret characters straight into the appropriate field exactly </font>
<font color="#000000">as is.</font>

<font color="#000000">TIA,</font>
<font color="#000000">Paul.</font>

</pre></blockquote><table cellpadding="0" cellspacing="0" width="100%"><tr><td> Phillip Smith<br /> IT Coordinator<br
/>Weatherbeeta P/L<br /> 8 Moncrief Rd<br /> Nunawading, Vic, 3131<br /> AUSTRALIA<br /><br /> P. +613 9845 0600<br />
F.+613 9845 0655<br /> E. <a
href="mailto:phillip.smith@weatherbeeta.com.au">phillip.smith@weatherbeeta.com.au</a></td></tr></table>

Re: Issue with copying data from a text file.

From

Paul Lambert

Date:

19 March 2007, 23:45:25

Phillip Smith wrote:
> Can you not export the source file with escape characters? ie,
> 
> ^17\" Alloy Wheels^
> 
> ~p
> 

The source file comes from extracts on our main application which sits 
inside an in-house pretending-to-be-a-dbms file system. The content of 
these extracts would be difficult to change - the extract program would 
need to parse the data looking for quotes and preceed them with the 
necessary escape character.

Not being a proper database dump it's not a simple matter of flicking a 
switch to get it to include the escape character. The way the extracts 
are written would require a few dozen lines of code to each extract, and 
theres about 40ish extracts.

Plus I don't maintain that side of our code, and those that do can be a 
bit lazy and I'd likely be waiting months to get it done - if they even 
decide to do it.

-- 
Paul Lambert
Database Administrator
AutoLedgers

Re: Issue with copying data from a text file.

From

"Andrej Ricnik-Bay"

Date:

19 March 2007, 23:53:17

On 3/20/07, Paul Lambert <paul.lambert@autoledgers.com.au> wrote:

> The source file comes from extracts on our main application which sits
> inside an in-house pretending-to-be-a-dbms file system. The content of
> these extracts would be difficult to change - the extract program would
> need to parse the data looking for quotes and preceed them with the
> necessary escape character.
>
> Not being a proper database dump it's not a simple matter of flicking a
> switch to get it to include the escape character. The way the extracts
> are written would require a few dozen lines of code to each extract, and
> theres about 40ish extracts.
>
> Plus I don't maintain that side of our code, and those that do can be a
> bit lazy and I'd likely be waiting months to get it done - if they even
> decide to do it.
Pipe it through sed and replace the Carets with TABS?
sed 's/^/\t/g' c:/temp/autodrs_deal_lines.txt > c:/temp/autodrs_deal_lines.tab

Then use copy like so:
\copy table from 'c:/temp/autodrs_deal_lines.tab' delimiter E'\t' null ''


Cheers,
Andrej

Re: Issue with copying data from a text file.

From

Paul Lambert

Date:

19 March 2007, 23:59:42

Andrej Ricnik-Bay wrote:
> On 3/20/07, Paul Lambert <paul.lambert@autoledgers.com.au> wrote:
> 
>> The source file comes from extracts on our main application which sits
>> inside an in-house pretending-to-be-a-dbms file system. The content of
>> these extracts would be difficult to change - the extract program would
>> need to parse the data looking for quotes and preceed them with the
>> necessary escape character.
>>
>> Not being a proper database dump it's not a simple matter of flicking a
>> switch to get it to include the escape character. The way the extracts
>> are written would require a few dozen lines of code to each extract, and
>> theres about 40ish extracts.
>>
>> Plus I don't maintain that side of our code, and those that do can be a
>> bit lazy and I'd likely be waiting months to get it done - if they even
>> decide to do it.
> Pipe it through sed and replace the Carets with TABS?
> sed 's/^/\t/g' c:/temp/autodrs_deal_lines.txt > 
> c:/temp/autodrs_deal_lines.tab
> 
> Then use copy like so:
> \copy table from 'c:/temp/autodrs_deal_lines.tab' delimiter E'\t' null ''
> 
> 
> Cheers,
> Andrej
> 
> 

The data contains tabs... don't ask why... I don't have a clue :P

I'll do something along the lines of sed... but not with sed, I'll use 
the command line interpreter on the OpenVMS systems where the extracts 
run. I just thought there might have been a quicker way to switch it off 
in the copy command, i.e. specifying "quote none" as one of the 
parameters to the command. I guess not...

Thanks for the pointers.

P.

-- 
Paul Lambert
Database Administrator
AutoLedgers

Re: Issue with copying data from a text file.

From

Phillip Smith

Date:

20 March 2007, 00:05:43

Maybe use char 254 or 253 or something similar... Anything that isn't going to be found in the file.<br /><br /> Have
youtried using a string as a delimiter? QUOTE 'THIS.STRING.ISNT.IN.THE.TEXT.FILE'<br /><br /> ~p<br /><br /> On Tue,
2007-03-20at 11:59 +0900, Paul Lambert wrote: <blockquote type="CITE"><pre> 
<font color="#000000">Andrej Ricnik-Bay wrote:</font>
<font color="#000000">> On 3/20/07, Paul Lambert <<a
href="mailto:paul.lambert@autoledgers.com.au">paul.lambert@autoledgers.com.au</a>>wrote:</font> 
<font color="#000000">> </font>
<font color="#000000">>> The source file comes from extracts on our main application which sits</font>
<font color="#000000">>> inside an in-house pretending-to-be-a-dbms file system. The content of</font>
<font color="#000000">>> these extracts would be difficult to change - the extract program would</font>
<font color="#000000">>> need to parse the data looking for quotes and preceed them with the</font>
<font color="#000000">>> necessary escape character.</font>
<font color="#000000">>></font>
<font color="#000000">>> Not being a proper database dump it's not a simple matter of flicking a</font>
<font color="#000000">>> switch to get it to include the escape character. The way the extracts</font>
<font color="#000000">>> are written would require a few dozen lines of code to each extract, and</font>
<font color="#000000">>> theres about 40ish extracts.</font>
<font color="#000000">>></font>
<font color="#000000">>> Plus I don't maintain that side of our code, and those that do can be a</font>
<font color="#000000">>> bit lazy and I'd likely be waiting months to get it done - if they even</font>
<font color="#000000">>> decide to do it.</font>
<font color="#000000">> Pipe it through sed and replace the Carets with TABS?</font>
<font color="#000000">> sed 's/^/\t/g' c:/temp/autodrs_deal_lines.txt > </font>
<font color="#000000">> c:/temp/autodrs_deal_lines.tab</font>
<font color="#000000">> </font>
<font color="#000000">> Then use copy like so:</font>
<font color="#000000">> \copy table from 'c:/temp/autodrs_deal_lines.tab' delimiter E'\t' null ''</font>
<font color="#000000">> </font>
<font color="#000000">> </font>
<font color="#000000">> Cheers,</font>
<font color="#000000">> Andrej</font>
<font color="#000000">> </font>
<font color="#000000">> </font>

<font color="#000000">The data contains tabs... don't ask why... I don't have a clue :P</font>

<font color="#000000">I'll do something along the lines of sed... but not with sed, I'll use </font>
<font color="#000000">the command line interpreter on the OpenVMS systems where the extracts </font>
<font color="#000000">run. I just thought there might have been a quicker way to switch it off </font>
<font color="#000000">in the copy command, i.e. specifying "quote none" as one of the </font>
<font color="#000000">parameters to the command. I guess not...</font>

<font color="#000000">Thanks for the pointers.</font>

<font color="#000000">P.</font>

</pre></blockquote><table cellpadding="0" cellspacing="0" width="100%"><tr><td> Phillip Smith<br /> IT Coordinator<br
/>Weatherbeeta P/L<br /> 8 Moncrief Rd<br /> Nunawading, Vic, 3131<br /> AUSTRALIA<br /><br /> P. +613 9845 0600<br />
F.+613 9845 0655<br /> E. <a
href="mailto:phillip.smith@weatherbeeta.com.au">phillip.smith@weatherbeeta.com.au</a></td></tr></table>

Re: Issue with copying data from a text file.

From

Michael Fuhr

Date:

20 March 2007, 00:17:31

On Tue, Mar 20, 2007 at 11:25:38AM +0900, Paul Lambert wrote:
> I note an option in the COPY command to specify the quote character, 
> defaulting to double quote. The problem being a single quote will also 
> be used in the data, as will other characters. Is there any way to get a 
> copy to have no quote character?

Are there any control characters that won't appear in the data?

-- 
Michael Fuhr

Re: Issue with copying data from a text file.

From

Paul Lambert

Date:

20 March 2007, 00:22:19

Phillip Smith wrote:
> Maybe use char 254 or 253 or something similar... Anything that isn't 
> going to be found in the file.
> 
> Have you tried using a string as a delimiter? QUOTE 
> 'THIS.STRING.ISNT.IN.THE.TEXT.FILE'
> 
> ~p
> 

I changed it to QUOTE '\f' to set the quote character to form-feed which 
is not going to appear in the file and that appears to do the trick 
without any changes to the source file.

Thanks,
P.

-- 
Paul Lambert
Database Administrator
AutoLedgers

Re: Issue with copying data from a text file.

From

Steve Midgley

Date:

21 March 2007, 14:01:54

Hi,

I think I had the exact same problem as you do a while back and I 
solved it by removing the header row and the "CSV HEADER" clause of the 
statement. For the large files I had, it was easier (for me) to remove 
the header row than it was to escape out all the quotes (or regen the 
file):

COPY deal_lines_temp_load FROM
'c:/temp/autodrs_deal_lines.txt'
WITH DELIMITER AS '^';

I think the parser doesn't look for nor generate quoted rows except 
when CSV is specified.. It would be nice if there was a way to specify 
a "HEADER" row without invoking CSV parsing rules (friendly hint to 
core devs!) :)

Let us all know if that works!

Steve

At 03:14 AM 3/20/2007, you wrote:
>Date: Tue, 20 Mar 2007 11:25:38 +0900
>From: Paul Lambert <paul.lambert@autoledgers.com.au>
>To: pgsql-sql@postgresql.org
>Subject: Issue with copying data from a text file.
>Message-ID: <45FF4622.2010404@autoledgers.com.au>
>
>I have a procedure in place that copies data from a caret delimited 
>text
>file into a table storing some information.
>
>One of the fields in the table contains an item description which may
>contain item dimensions such as - 17" alloy wheels
>
>The problem I am getting when I do my load is I believe due to the
>presence of the double quotation marks giving the copy the impression
>that it is to include the information following as a single text 
>string
>until it gets to the next set of double quotes. As a result, I get the 
>
>following:
>
>AutoDRS=#       COPY deal_lines_temp_load FROM
>'c:/temp/autodrs_deal_lines.txt'
>WITH DELIMITER AS '^' CSV HEADER;
>ERROR:  value too long for type character varying(30)
>CONTEXT:  COPY deal_lines_temp_load, line 87, column order_desc: "17 5 
>
>spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^
>
>The column as you can see is defined as a 30 character field, the load 
>
>contains in this column ^17" 5 spoke alloy wheels.^
>
>I note an option in the COPY command to specify the quote character,
>defaulting to double quote. The problem being a single quote will also 
>
>be used in the data, as will other characters. Is there any way to get 
>a
>copy to have no quote character? I.e. read the file and put whatever 
>is
>between the caret characters straight into the appropriate field 
>exactly
>as is.
>
>TIA,
>Paul.
>
>--
>Paul Lambert
>Database Administrator
>AutoLedgers