Thread: Issue with copying data from a text file.
I have a procedure in place that copies data from a caret delimited text file into a table storing some information. One of the fields in the table contains an item description which may contain item dimensions such as - 17" alloy wheels The problem I am getting when I do my load is I believe due to the presence of the double quotation marks giving the copy the impression that it is to include the information following as a single text string until it gets to the next set of double quotes. As a result, I get the following: AutoDRS=# COPY deal_lines_temp_load FROM 'c:/temp/autodrs_deal_lines.txt' WITH DELIMITER AS '^' CSV HEADER; ERROR: value too long for type character varying(30) CONTEXT: COPY deal_lines_temp_load, line 87, column order_desc: "17 5 spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^ The column as you can see is defined as a 30 character field, the load contains in this column ^17" 5 spoke alloy wheels.^ I note an option in the COPY command to specify the quote character, defaulting to double quote. The problem being a single quote will also be used in the data, as will other characters. Is there any way to get a copy to have no quote character? I.e. read the file and put whatever is between the caret characters straight into the appropriate field exactly as is. TIA, Paul. -- Paul Lambert Database Administrator AutoLedgers
Can you not export the source file with escape characters? ie, <br /><br /> ^17\" Alloy Wheels^<br /><br /> ~p<br /><br/> On Tue, 2007-03-20 at 11:25 +0900, Paul Lambert wrote: <blockquote type="CITE"><pre> <font color="#000000">I have a procedure in place that copies data from a caret delimited text </font> <font color="#000000">file into a table storing some information.</font> <font color="#000000">One of the fields in the table contains an item description which may </font> <font color="#000000">contain item dimensions such as - 17" alloy wheels</font> <font color="#000000">The problem I am getting when I do my load is I believe due to the </font> <font color="#000000">presence of the double quotation marks giving the copy the impression </font> <font color="#000000">that it is to include the information following as a single text string </font> <font color="#000000">until it gets to the next set of double quotes. As a result, I get the </font> <font color="#000000">following:</font> <font color="#000000">AutoDRS=# COPY deal_lines_temp_load FROM </font> <font color="#000000">'c:/temp/autodrs_deal_lines.txt'</font> <font color="#000000">WITH DELIMITER AS '^' CSV HEADER;</font> <font color="#000000">ERROR: value too long for type character varying(30)</font> <font color="#000000">CONTEXT: COPY deal_lines_temp_load, line 87, column order_desc: "17 5 </font> <font color="#000000">spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^</font> <font color="#000000">The column as you can see is defined as a 30 character field, the load </font> <font color="#000000">contains in this column ^17" 5 spoke alloy wheels.^</font> <font color="#000000">I note an option in the COPY command to specify the quote character, </font> <font color="#000000">defaulting to double quote. The problem being a single quote will also </font> <font color="#000000">be used in the data, as will other characters. Is there any way to get a </font> <font color="#000000">copy to have no quote character? I.e. read the file and put whatever is </font> <font color="#000000">between the caret characters straight into the appropriate field exactly </font> <font color="#000000">as is.</font> <font color="#000000">TIA,</font> <font color="#000000">Paul.</font> </pre></blockquote><table cellpadding="0" cellspacing="0" width="100%"><tr><td> Phillip Smith<br /> IT Coordinator<br />Weatherbeeta P/L<br /> 8 Moncrief Rd<br /> Nunawading, Vic, 3131<br /> AUSTRALIA<br /><br /> P. +613 9845 0600<br /> F.+613 9845 0655<br /> E. <a href="mailto:phillip.smith@weatherbeeta.com.au">phillip.smith@weatherbeeta.com.au</a></td></tr></table>
Phillip Smith wrote: > Can you not export the source file with escape characters? ie, > > ^17\" Alloy Wheels^ > > ~p > The source file comes from extracts on our main application which sits inside an in-house pretending-to-be-a-dbms file system. The content of these extracts would be difficult to change - the extract program would need to parse the data looking for quotes and preceed them with the necessary escape character. Not being a proper database dump it's not a simple matter of flicking a switch to get it to include the escape character. The way the extracts are written would require a few dozen lines of code to each extract, and theres about 40ish extracts. Plus I don't maintain that side of our code, and those that do can be a bit lazy and I'd likely be waiting months to get it done - if they even decide to do it. -- Paul Lambert Database Administrator AutoLedgers
On 3/20/07, Paul Lambert <paul.lambert@autoledgers.com.au> wrote: > The source file comes from extracts on our main application which sits > inside an in-house pretending-to-be-a-dbms file system. The content of > these extracts would be difficult to change - the extract program would > need to parse the data looking for quotes and preceed them with the > necessary escape character. > > Not being a proper database dump it's not a simple matter of flicking a > switch to get it to include the escape character. The way the extracts > are written would require a few dozen lines of code to each extract, and > theres about 40ish extracts. > > Plus I don't maintain that side of our code, and those that do can be a > bit lazy and I'd likely be waiting months to get it done - if they even > decide to do it. Pipe it through sed and replace the Carets with TABS? sed 's/^/\t/g' c:/temp/autodrs_deal_lines.txt > c:/temp/autodrs_deal_lines.tab Then use copy like so: \copy table from 'c:/temp/autodrs_deal_lines.tab' delimiter E'\t' null '' Cheers, Andrej
Andrej Ricnik-Bay wrote: > On 3/20/07, Paul Lambert <paul.lambert@autoledgers.com.au> wrote: > >> The source file comes from extracts on our main application which sits >> inside an in-house pretending-to-be-a-dbms file system. The content of >> these extracts would be difficult to change - the extract program would >> need to parse the data looking for quotes and preceed them with the >> necessary escape character. >> >> Not being a proper database dump it's not a simple matter of flicking a >> switch to get it to include the escape character. The way the extracts >> are written would require a few dozen lines of code to each extract, and >> theres about 40ish extracts. >> >> Plus I don't maintain that side of our code, and those that do can be a >> bit lazy and I'd likely be waiting months to get it done - if they even >> decide to do it. > Pipe it through sed and replace the Carets with TABS? > sed 's/^/\t/g' c:/temp/autodrs_deal_lines.txt > > c:/temp/autodrs_deal_lines.tab > > Then use copy like so: > \copy table from 'c:/temp/autodrs_deal_lines.tab' delimiter E'\t' null '' > > > Cheers, > Andrej > > The data contains tabs... don't ask why... I don't have a clue :P I'll do something along the lines of sed... but not with sed, I'll use the command line interpreter on the OpenVMS systems where the extracts run. I just thought there might have been a quicker way to switch it off in the copy command, i.e. specifying "quote none" as one of the parameters to the command. I guess not... Thanks for the pointers. P. -- Paul Lambert Database Administrator AutoLedgers
Maybe use char 254 or 253 or something similar... Anything that isn't going to be found in the file.<br /><br /> Have youtried using a string as a delimiter? QUOTE 'THIS.STRING.ISNT.IN.THE.TEXT.FILE'<br /><br /> ~p<br /><br /> On Tue, 2007-03-20at 11:59 +0900, Paul Lambert wrote: <blockquote type="CITE"><pre> <font color="#000000">Andrej Ricnik-Bay wrote:</font> <font color="#000000">> On 3/20/07, Paul Lambert <<a href="mailto:paul.lambert@autoledgers.com.au">paul.lambert@autoledgers.com.au</a>>wrote:</font> <font color="#000000">> </font> <font color="#000000">>> The source file comes from extracts on our main application which sits</font> <font color="#000000">>> inside an in-house pretending-to-be-a-dbms file system. The content of</font> <font color="#000000">>> these extracts would be difficult to change - the extract program would</font> <font color="#000000">>> need to parse the data looking for quotes and preceed them with the</font> <font color="#000000">>> necessary escape character.</font> <font color="#000000">>></font> <font color="#000000">>> Not being a proper database dump it's not a simple matter of flicking a</font> <font color="#000000">>> switch to get it to include the escape character. The way the extracts</font> <font color="#000000">>> are written would require a few dozen lines of code to each extract, and</font> <font color="#000000">>> theres about 40ish extracts.</font> <font color="#000000">>></font> <font color="#000000">>> Plus I don't maintain that side of our code, and those that do can be a</font> <font color="#000000">>> bit lazy and I'd likely be waiting months to get it done - if they even</font> <font color="#000000">>> decide to do it.</font> <font color="#000000">> Pipe it through sed and replace the Carets with TABS?</font> <font color="#000000">> sed 's/^/\t/g' c:/temp/autodrs_deal_lines.txt > </font> <font color="#000000">> c:/temp/autodrs_deal_lines.tab</font> <font color="#000000">> </font> <font color="#000000">> Then use copy like so:</font> <font color="#000000">> \copy table from 'c:/temp/autodrs_deal_lines.tab' delimiter E'\t' null ''</font> <font color="#000000">> </font> <font color="#000000">> </font> <font color="#000000">> Cheers,</font> <font color="#000000">> Andrej</font> <font color="#000000">> </font> <font color="#000000">> </font> <font color="#000000">The data contains tabs... don't ask why... I don't have a clue :P</font> <font color="#000000">I'll do something along the lines of sed... but not with sed, I'll use </font> <font color="#000000">the command line interpreter on the OpenVMS systems where the extracts </font> <font color="#000000">run. I just thought there might have been a quicker way to switch it off </font> <font color="#000000">in the copy command, i.e. specifying "quote none" as one of the </font> <font color="#000000">parameters to the command. I guess not...</font> <font color="#000000">Thanks for the pointers.</font> <font color="#000000">P.</font> </pre></blockquote><table cellpadding="0" cellspacing="0" width="100%"><tr><td> Phillip Smith<br /> IT Coordinator<br />Weatherbeeta P/L<br /> 8 Moncrief Rd<br /> Nunawading, Vic, 3131<br /> AUSTRALIA<br /><br /> P. +613 9845 0600<br /> F.+613 9845 0655<br /> E. <a href="mailto:phillip.smith@weatherbeeta.com.au">phillip.smith@weatherbeeta.com.au</a></td></tr></table>
On Tue, Mar 20, 2007 at 11:25:38AM +0900, Paul Lambert wrote: > I note an option in the COPY command to specify the quote character, > defaulting to double quote. The problem being a single quote will also > be used in the data, as will other characters. Is there any way to get a > copy to have no quote character? Are there any control characters that won't appear in the data? -- Michael Fuhr
Phillip Smith wrote: > Maybe use char 254 or 253 or something similar... Anything that isn't > going to be found in the file. > > Have you tried using a string as a delimiter? QUOTE > 'THIS.STRING.ISNT.IN.THE.TEXT.FILE' > > ~p > I changed it to QUOTE '\f' to set the quote character to form-feed which is not going to appear in the file and that appears to do the trick without any changes to the source file. Thanks, P. -- Paul Lambert Database Administrator AutoLedgers
Hi, I think I had the exact same problem as you do a while back and I solved it by removing the header row and the "CSV HEADER" clause of the statement. For the large files I had, it was easier (for me) to remove the header row than it was to escape out all the quotes (or regen the file): COPY deal_lines_temp_load FROM 'c:/temp/autodrs_deal_lines.txt' WITH DELIMITER AS '^'; I think the parser doesn't look for nor generate quoted rows except when CSV is specified.. It would be nice if there was a way to specify a "HEADER" row without invoking CSV parsing rules (friendly hint to core devs!) :) Let us all know if that works! Steve At 03:14 AM 3/20/2007, you wrote: >Date: Tue, 20 Mar 2007 11:25:38 +0900 >From: Paul Lambert <paul.lambert@autoledgers.com.au> >To: pgsql-sql@postgresql.org >Subject: Issue with copying data from a text file. >Message-ID: <45FF4622.2010404@autoledgers.com.au> > >I have a procedure in place that copies data from a caret delimited >text >file into a table storing some information. > >One of the fields in the table contains an item description which may >contain item dimensions such as - 17" alloy wheels > >The problem I am getting when I do my load is I believe due to the >presence of the double quotation marks giving the copy the impression >that it is to include the information following as a single text >string >until it gets to the next set of double quotes. As a result, I get the > >following: > >AutoDRS=# COPY deal_lines_temp_load FROM >'c:/temp/autodrs_deal_lines.txt' >WITH DELIMITER AS '^' CSV HEADER; >ERROR: value too long for type character varying(30) >CONTEXT: COPY deal_lines_temp_load, line 87, column order_desc: "17 5 > >spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^ > >The column as you can see is defined as a 30 character field, the load > >contains in this column ^17" 5 spoke alloy wheels.^ > >I note an option in the COPY command to specify the quote character, >defaulting to double quote. The problem being a single quote will also > >be used in the data, as will other characters. Is there any way to get >a >copy to have no quote character? I.e. read the file and put whatever >is >between the caret characters straight into the appropriate field >exactly >as is. > >TIA, >Paul. > >-- >Paul Lambert >Database Administrator >AutoLedgers