Re: Export tab delimited from mysql to postgres. - Mailing list pgsql-sql
From | Pierre-Frédéric Caillaud |
---|---|
Subject | Re: Export tab delimited from mysql to postgres. |
Date | |
Msg-id | opsfqxb7stcq72hf@musicbox Whole thread Raw |
In response to | Re: Export tab delimited from mysql to postgres. (Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au>) |
List | pgsql-sql |
A tested example in Python : Basically it counts the \t and accumulates the lines until it has enough and then prints the line. Note : as an exercise you could add a test so that there are exactly (columns-1) delimiters and not >=(columns-1). def grouplines( in_stream, columns, delimiter ):num_delimiters = columns - 1accum = ''for line in in_stream: accum +=line if accum.count( delimiter ) >= num_delimiters: print accum.replace( "\n", "\\n" ) accum = '' if accum: print "Last line unterminated." grouplines( open( 'data.in' ), 3, "\t" ): Input data (I added a column over your example): 1 What a day! A 2 What a week it has been! B 3 What the! C Output : 1 What a day! A\n 2 What a week it has\nbeen! B\n 3 What the! C Have fun with your copy ! On Tue, 12 Oct 2004 15:33:46 +1000, Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> wrote: > > Thanks for all your comments, > > I have beent trying the insert within a transaction block, however > it does not seem to reduce the time it takes to process each records. > Mind > you there are 80 column and the insert statement explicitly defines the > column to insert into. > > I need any tip I can get help me transform the text file into a > format postgres copy will successfully read. > > > Here is sample of the current format of a mysql tab delimited dump.. > > columnA columnB > > 1 What a day! > 2 What a week it has > been! > 3 What the! > > As you can see row 2 has a value that holds a CR value which ends up > wrapping around onto the third line. Postgres copy command does not like > this and mysql is unable to replace the value with another type of > delimiter, like a \r. > > So I gather I have to some how manually replace the carriage return with > something postgres understand \r... > > > columnA columnB > > 1 What a day! > 2 What a week it has \r been! > 3 What the! > > How do I do this without getting a text file that looks like this > > 1 What a day! \r\n2 What a week it has \r been!\r\n3 > What the!\r\n > > Any help would be appreciated. > > Theo > > -----Original Message----- > From: Christopher Browne [mailto:cbbrowne@acm.org] > > Sent: Tuesday, 12 October 2004 10:46 AM > To: pgsql-sql@postgresql.org > Subject: Re: [SQL] Export tab delimited from mysql to postgres. > > > Quoth Theo.Galanakis@lonelyplanet.com.au (Theo Galanakis): >> Could you provide a example of how to do this? >> >> I actually ended up exporting the data as Insert statements, >> which strips out cf/lf within varchars. However it takes an eternity > >> to import 200,000 records... 24 hours infact???? Is this normal? > > I expect that this results from each INSERT being a separate transaction. > > If you put a BEGIN at the start and a COMMIT at the end, you'd doubtless > see > an ENORMOUS improvement. > > That's not even the _big_ improvement, either. The _big_ improvement > would > involve reformatting the data so that you could use the COPY statement, > which is _way_ faster than a bunch of INSERTs. Take a look at the > documentation to see the formatting that is needed: > > http://techdocs.postgresql.org/techdocs/usingcopy.php > http://www.faqs.org/docs/ppbook/x5504.htm > http://www.postgresql.org/docs/7.4/static/sql-copy.html > -- > > output = ("cbbrowne" "@" "ntlug.org") > http://www3.sympatico.ca/cbbrowne/lsf.html > Question: How many surrealists does it take to change a light bulb? > > Answer: Two, one to hold the giraffe, and the other to fill the bathtub > with brightly colored machine tools. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > > > ______________________________________________________________________ > This email, including attachments, is intended only for the addressee > and may be confidential, privileged and subject to copyright. If you > have received this email in error, please advise the sender and delete > it. If you are not the intended recipient of this email, you must not > use, copy or disclose its content to anyone. You must not copy or > > communicate to others content that is confidential or subject to > > copyright, unless you have the consent of the content owner.