Thread: Export tab delimited from mysql to postgres.
Has anyone been able to export and import a tab delimited file from mysql to postgres successfully? This is my dilemma, which I have posted on mysql sites with no response; however postgres users may have come across the same issue:

I am trying to export a tab delimited file with the following command, however it does not appear to strip out the CR/LF within a varchar column.

    select * into outfile '/tmp/users.txt' FIELDS ESCAPED BY '\\' LINES TERMINATED BY '\r\n' from users;

However, when I use the command below to export as insert statements, it escapes all the CR/LF within each column.

    mysqldump --database mydatabase --password --user myusername --verbose --complete-insert --tables users > myfile.sql

Does anyone have any insight into why the above select does not escape CR/LF within varchar columns?

I really want to get the export as tab delimited working to import into postgres, which is much faster than insert statements.

Theo

______________________________________________________________________
This email, including attachments, is intended only for the addressee
and may be confidential, privileged and subject to copyright. If you
have received this email in error, please advise the sender and delete
it. If you are not the intended recipient of this email, you must not
use, copy or disclose its content to anyone. You must not copy or
communicate to others content that is confidential or subject to
copyright, unless you have the consent of the content owner.
In article <82E30406384FFB44AFD1012BAB230B5505F40D3A@shiva.au.lpint.net>,
Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> writes:
> Has anyone been able to export and import a tab delimited file from mysql to
> postgres successfully?
> Trying to export a tab delimited file with the following command, however it
> does not appear to strip out the CR/LF within a varchar column.
> select * into outfile '/tmp/users.txt' FIELDS ESCAPED BY '\\'
> LINES TERMINATED BY '\r\n' from users;

It's a MySQL bug/misfeature that it doesn't translate CRs into '\r'. Nevertheless, you can import a MySQL tab delimited file into PostgreSQL - just pipe it through a small sed script.
Could you provide an example of how to do this?

I actually ended up exporting the data as insert statements, which strips out the CR/LF within varchars. However, it takes an eternity to import 200,000 records... 24 hours, in fact. Is this normal?

Theo

-----Original Message-----
From: Harald Fuchs [mailto:hf0722x@protecting.net]
Sent: Monday, 11 October 2004 7:34 PM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] Export tab delimited from mysql to postgres.

> It's a MySQL bug/misfeature that it doesn't translate CRs into '\r'.
> Nevertheless, you can import a MySQL tab delimited file into PostgreSQL -
> just pipe it through a small sed script.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
On Tue, Oct 12, 2004 at 09:23:41 +1000, Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> wrote:
> Could you provide an example of how to do this?
>
> I actually ended up exporting the data as insert statements, which
> strips out the CR/LF within varchars. However, it takes an eternity to
> import 200,000 records... 24 hours, in fact. Is this normal?

Are you doing the load in one transaction?
Quoth Theo.Galanakis@lonelyplanet.com.au (Theo Galanakis):
> Could you provide an example of how to do this?
>
> I actually ended up exporting the data as insert statements,
> which strips out the CR/LF within varchars. However, it takes an eternity
> to import 200,000 records... 24 hours, in fact. Is this normal?

I expect that this results from each INSERT being a separate transaction.

If you put a BEGIN at the start and a COMMIT at the end, you'd doubtless see an ENORMOUS improvement.

That's not even the _big_ improvement, either. The _big_ improvement would involve reformatting the data so that you could use the COPY statement, which is _way_ faster than a bunch of INSERTs. Take a look at the documentation to see the formatting that is needed:

http://techdocs.postgresql.org/techdocs/usingcopy.php
http://www.faqs.org/docs/ppbook/x5504.htm
http://www.postgresql.org/docs/7.4/static/sql-copy.html
--
output = ("cbbrowne" "@" "ntlug.org") http://www3.sympatico.ca/cbbrowne/lsf.html
Question: How many surrealists does it take to change a light bulb?
Answer: Two, one to hold the giraffe, and the other to fill the bathtub
        with brightly colored machine tools.
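[Editor's note: a minimal sketch of the BEGIN/COMMIT wrapping suggested above. The file name myfile.sql comes from the mysqldump command earlier in the thread; the sample insert written here is only a stand-in so the script is runnable.]

```python
# Create a stand-in for the mysqldump output (one insert per row).
with open('myfile.sql', 'w') as f:
    f.write("INSERT INTO users (id, body) VALUES (1, 'What a day!');\n")

# Wrap the whole dump in a single transaction, so 200,000 inserts commit
# once instead of once per row.  Load the result with:
#   psql mydatabase -f myfile_tx.sql
with open('myfile.sql') as src, open('myfile_tx.sql', 'w') as dst:
    dst.write('BEGIN;\n')
    dst.write(src.read())
    dst.write('COMMIT;\n')
```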
Thanks for all your comments,

I have been trying the insert within a transaction block, however it does not seem to reduce the time it takes to process each record. Mind you, there are 80 columns and the insert statement explicitly defines the columns to insert into.

I need any tip I can get to help me transform the text file into a format the postgres copy command will successfully read.

Here is a sample of the current format of a mysql tab delimited dump:

    columnA  columnB
    1        What a day!
    2        What a week it has
    been!
    3        What the!

As you can see, row 2 has a value that holds a CR value, which ends up wrapping around onto the third line. The postgres copy command does not like this, and mysql is unable to replace the value with another type of delimiter, like a \r.

So I gather I have to somehow manually replace the carriage return with something postgres understands, \r...

    columnA  columnB
    1        What a day!
    2        What a week it has \r been!
    3        What the!

How do I do this without getting a text file that looks like this?

    1        What a day!\r\n2        What a week it has \r been!\r\n3        What the!\r\n

Any help would be appreciated.

Theo

-----Original Message-----
From: Christopher Browne [mailto:cbbrowne@acm.org]
Sent: Tuesday, 12 October 2004 10:46 AM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] Export tab delimited from mysql to postgres.

> I expect that this results from each INSERT being a separate transaction.
> If you put a BEGIN at the start and a COMMIT at the end, you'd doubtless
> see an ENORMOUS improvement. [...]
A tested example in Python:

Basically it counts the \t and accumulates the lines until it has enough, and then prints the line.

Note: as an exercise you could add a test so that there are exactly (columns-1) delimiters and not >= (columns-1).

    def grouplines(in_stream, columns, delimiter):
        num_delimiters = columns - 1
        accum = ''
        for line in in_stream:
            accum += line
            if accum.count(delimiter) >= num_delimiters:
                print(accum.replace("\n", "\\n"))
                accum = ''
        if accum:
            print("Last line unterminated.")

    grouplines(open('data.in'), 3, "\t")

Input data (I added a column over your example):

    1    What a day!               A
    2    What a week it has
    been!    B
    3    What the!                 C

Output:

    1    What a day!                   A\n
    2    What a week it has\nbeen!     B\n
    3    What the!                     C

Have fun with your copy!

On Tue, 12 Oct 2004 15:33:46 +1000, Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> wrote:
> Thanks for all your comments,
>
> I have been trying the insert within a transaction block, however it
> does not seem to reduce the time it takes to process each record. [...]
Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> writes:
> As you can see, row 2 has a value that holds a CR value which ends up
> wrapping around onto the third line. Postgres copy command does not like
> this and mysql is unable to replace the value with another type of
> delimiter, like a \r.
> So I gather I have to somehow manually replace the carriage return with
> something postgres understands, \r...
>
> columnA  columnB
> 1        What a day!
> 2        What a week it has \r been!
> 3        What the!
>
> How do I do this without getting a text file that looks like this
> 1        What a day!\r\n2        What a week it has \r been!\r\n3        What the!\r\n

Looks like a simple sed problem to me:

    cat myfile | sed 's/\r$//' | sed 's/\r/\\r/g' >newfile

(Most likely you can do both steps in one sed process, but this is easy to follow.)

This assumes that you only have newlines (\n) at the real ends of lines, else you need to think harder about how to tell the difference between data and formatting.

			regards, tom lane
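[Editor's note: the same transformation the sed pipeline above performs, sketched in Python with made-up sample data. Like the sed version, it assumes real rows end in \n and that embedded breaks inside a field are bare CRs; note the CRLF replacement must run first, or line-ending CRs would also get escaped.]

```python
def clean(text):
    # Drop the CR of every CRLF line ending first, then escape any CR
    # left inside a field as the two characters \r, which COPY understands.
    return text.replace('\r\n', '\n').replace('\r', '\\r')

# Hypothetical two-row dump: row 2 contains an embedded bare CR.
sample = '1\tWhat a day!\r\n2\tWhat a week it has\rbeen!\r\n'
print(clean(sample), end='')
```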
Thank you all for your advice,

Looking further into the issue, I have realised that "data and formatting" lines are denoted by the same CR/LF symbols. So if I were to replace all CR/LF symbols, it would create a file with one line... I have replaced the CR/LF for reading purposes below with \r\n, or alternatively 0D 0A (hex):

    columnA  columnB
    1        What a day!\r\n
    2        What a week it has\r\n
    been!\r\n
    3        What the!\r\n

What this means is that it is impossible to move line three back onto line two to look something like this:

    columnA  columnB
    1        What a day!\r\n
    2        What a week it has\rbeen!\r\n
    3        What the!\r\n

I was thinking of writing code that checked each line to see if it has the correct number of tab delimited elements (in this case 2 elements); if this was not the case, replace the \r\n with \r.

Theo

-----Original Message-----
From: Harald Fuchs [mailto:hf0722x@protecting.net]
Sent: Wednesday, 13 October 2004 1:38 AM
To: Theo Galanakis
Subject: Re: Export tab delimited from mysql to postgres.

> Could you provide an example of how to do this?

Depends on what exactly your dump file contains. Probably something like

    sed 's/\\\r/\\r/g'

This means: replace all carriage returns preceded by a backslash with '\r'.
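[Editor's note: a sketch of the line-joining idea described in the thread; the function name and the \r stand-in are assumptions, not from the posts. A physical line with fewer than the expected number of tab delimiters must have been split by an embedded line break, so it is glued to the next line with an escaped \r. As noted earlier in the thread, counting delimiters cannot work when the wrapped value sits in the last column of a two-column row, since such a line already has its full quota of tabs before the break; the sample therefore uses three columns.]

```python
def join_wrapped_rows(lines, columns, delimiter='\t'):
    rows = []
    buf = ''
    for line in lines:
        buf += line.rstrip('\r\n')
        if buf.count(delimiter) >= columns - 1:
            rows.append(buf)   # logical row complete
            buf = ''
        else:
            buf += '\\r'       # mark the embedded break for COPY
    return rows

sample = ['1\tWhat a day!\tA\r\n',
          '2\tWhat a week it has\r\n',  # wrapped row, continued below
          'been!\tB\r\n',
          '3\tWhat the!\tC\r\n']
for row in join_wrapped_rows(sample, 3):
    print(row)
```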