Thread: Trying to load MySQL data
Hello,

I am working on a project where we are converting from MySQL to Postgres. I figured the easiest way would be to export the MySQL data as CSV.

I'm having a problem importing some of the data. What I have done is export the MySQL data and then modify it so that all single quotes (the ' character) are doubled, and NULL values are replaced with an empty value in the CSV.

Our data, for example, looks like this:

2628,'Poster,5,'255.255.18.138',,'[quote=gz]I''m curious, why not allow users, then to freely edit topics?[/quote] We could either go with "it''s a minor annoyance that is one measure that helps keep DB load lower," or "we do it just to piss you off." You choose.',0,1124498148,,,376,0,0,,

This row causes an error:

The value "why not allow users" is not valid for column 'x'.

Column x is the first column after the long section of text, with a value of '0'.

It appears that the quote in "I'm", which has been doubled to '', is not being properly skipped over; COPY thinks that the next comma (after "curious") starts a new column and tries to insert data there. At least that's what I have come up with.

Why is this happening? I've used this method before without any trouble, when loading a bunch of Wikipedia test data (which has all manner of quotes, commas, and apostrophes in it).

This is line 39150 in the file; all previous lines import fine, but this one kills the COPY process, and all of the previous inserts are rolled back. I need to get this data loaded intact.

My copy command is:

COPY posts FROM '/tmp/posts.txt' CSV QUOTE $$'$$;
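(Aside: the quoting scheme can be sanity-checked outside of COPY with Python's csv module, whose quotechar/doublequote options mirror CSV QUOTE $$'$$ behavior. The row below is an abbreviated, hypothetical reconstruction of the problem line, with a closing quote after 'Poster' restored; it is a sketch, not the actual data.)

```python
import csv
import io

# Hypothetical, abbreviated reconstruction of the problem row, with the
# closing quote after 'Poster' present.  The long text field is shortened.
row = ("2628,'Poster',5,'255.255.18.138',,"
       "'[quote=gz]I''m curious, why not allow users, then to freely edit "
       "topics?[/quote] You choose.',0,1124498148,,,376,0,0,,")

# quotechar="'" with doublequote=True mirrors COPY ... CSV QUOTE $$'$$:
# a doubled '' inside a quoted field reads back as a single literal '.
fields = next(csv.reader(io.StringIO(row), quotechar="'", doublequote=True))
print(len(fields))   # 15 columns
print(fields[5])     # the long text field, with '' collapsed back to '

# Remove the closing quote after Poster and the field boundaries shift,
# which is the symptom COPY reported (values landing in the wrong column).
broken = row.replace("'Poster'", "'Poster", 1)
broken_fields = next(csv.reader(io.StringIO(broken), quotechar="'",
                                doublequote=True))
print(len(broken_fields))  # no longer 15
```

If a properly quoted line still fails under COPY, comparing its field count this way is a quick way to spot which quote is unbalanced.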
On Tue, 2007-01-09 at 19:54, garrettmoore@gmail.com wrote:
> Our data, for example, looks like this:
>
> 2628,'Poster,5,'255.255.18.138',,'[quote=gz]I''m curious, why not allow
> users, then to freely edit topics?[/quote]
> We could either go with "it''s a minor annoyance that is one measure
> that helps keep DB load lower," or "we do it just to piss you off."
> You choose.',0,1124498148,,,376,0,0,,

If that is indeed the line, then this part:

2628,'Poster,5,'255.255.18.138',

is malformed. I'm assuming you really need:

2628,'Poster',5,'255.255.18.138',
garrettmoore@gmail.com wrote:
> Our data, for example, looks like this:
>
> 2628,'Poster,5,'255.255.18.138',,'[quote=gz]I''m curious, why not allow
> users, then to freely edit topics?[/quote]
> [...]
>
> My copy command is
> COPY posts FROM '/tmp/posts.txt' CSV QUOTE $$'$$;

It appears that you're missing a quote after the word 'Poster':

> 2628,'Poster,5,'255.255.18.138',

brian
garrettmoore@gmail.com wrote:
> Hello,
>
> I am working on a project where we are converting from MySQL to
> Postgres. I figured the easiest way would be to export the MySQL data
> as CSV.

I dunno, but unless you don't really care about your data, I'd use a separator that has no chance of appearing in your data. Things like | and ~ are pretty standard field separators.

So you'd use something like FIELDS TERMINATED BY '|' in your MySQL export, and WITH DELIMITER '|' NULL '' in your PostgreSQL import.

Also remember that MySQL NULLs are nothing like PostgreSQL NULLs.

--
Walter
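(Aside: the delimiter approach Walter describes can be sketched in a few lines. The sample line below is hypothetical; the caveat, raised later in this thread, is that free-form message-board text can contain any separator you pick.)

```python
# Sketch of loading pipe-delimited output, as in:
#   MySQL export:      ... FIELDS TERMINATED BY '|'
#   Postgres import:   COPY posts FROM '/tmp/posts.txt' WITH DELIMITER '|' NULL '';
# An empty field stands in for SQL NULL.  This only works if '|' can
# never occur inside a field value.
line = "2628|Poster|5|255.255.18.138||376"
fields = [f if f != "" else None for f in line.split("|")]
print(fields)  # ['2628', 'Poster', '5', '255.255.18.138', None, '376']
```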
Hi,

Le mercredi 10 janvier 2007 02:54, garrettmoore@gmail.com a écrit :
> I am working on a project where we are converting from MySQL to
> Postgres. I figured the easiest way would be to export the MySQL data
> as CSV.

You could also give pgloader a try. It uses COPY but lets you load the good data even in the presence of errors, and produces a reject file containing the erroneous data lines so you can replay the insertions later.

It even allows you to reorder data to match your column definitions, but as of now it suffers from a psycopg2 limitation: you have to provide all table columns in your data file.

http://pgfoundry.org/projects/pgloader/
http://debian.dalibo.org/unstable/
http://debian.dalibo.org/unstable/pgloader_2.0.2.tar.gz

Hope this helps,
--
Dimitri Fontaine
http://www.dalibo.com/
On 1/10/07, Dimitri Fontaine <dim@dalibo.com> wrote:
> Le mercredi 10 janvier 2007 02:54, garrettmoore@gmail.com a écrit :
> > I am working on a project where we are converting from MySQL to
> > Postgres. I figured the easiest way would be to export the MySQL data
> > as CSV.

If you are using pg 8.2+, I've had good luck with the following:

1. Create the pgsql schema by hand or by some other method, so it matches the mysql schema.
2. mysqldump --compatible=postgresql [yadda] | grep ^INSERT | psql [yadda]

This will work for most data types. As of 8.2, PostgreSQL supports multi-record inserts, which, while not as fast as COPY, are pretty close. If mysqldump is dumping single-line inserts, change it to multiple with the -e switch, IIRC.

merlin
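(Aside: the filtering step in Merlin's pipeline can be illustrated without running mysqldump at all. The dump text below is hypothetical and abbreviated; the point is simply that only lines starting with INSERT get forwarded to psql.)

```python
# Rough Python equivalent of the filter step in
#   mysqldump --compatible=postgresql ... | grep ^INSERT | psql ...
# keep only the INSERT statements, discarding mysqldump's DDL and comments.
dump = """\
-- MySQL dump (hypothetical, abbreviated)
DROP TABLE IF EXISTS "posts";
CREATE TABLE "posts" (id int, body text);
INSERT INTO "posts" VALUES (1,'a'),(2,'b');
INSERT INTO "posts" VALUES (3,'c');
"""
inserts = [line for line in dump.splitlines() if line.startswith("INSERT")]
print(len(inserts))  # 2 -- these are the lines that would be piped to psql
```

The DDL is filtered out precisely because, as noted below, the compatibility mode does not translate table schemas; those have to be created separately.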
Merlin Moncure wrote:
> If you are using pg 8.2+, I've had good luck with the following:
>
> 1. create pgsql schema by hand or using some method, so they match mysql
> 2. mysqldump --compatible=postgresql [yadda] | grep ^INSERT | psql [yadda]

Wow, mysqldump has a postgresql compatibility mode? Interesting.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On 1/11/07, Bruce Momjian <bruce@momjian.us> wrote:
> Wow, mysqldump has a postgresql compatibility mode? Interesting.

It does (it's had it for years), but it doesn't do very much... it fixes the quotes and a couple of other things. In particular, I know of no easy way to convert the table schemas without the use of external tools.

merlin
The missing quote after Poster is a mistake I made when sanitizing the data for posting here. That error is NOT present in the actual data; there is a quote where needed. So, with that in mind, why am I still getting the error?

Also, there is no symbol we can expect to never appear in the data. This data is from several sources, including a message board, and there could be tildes, pipes, or any other symbol in the discussion fields.

Also, since it's CSV, we just have NULL represented by the lack of any value between two commas, so a,b,,d represents a row with values a, b, NULL, d. This works fine in general.

Walter Vaughan wrote:
> I dunno, but unless you don't really care about your data, I'd use something
> that you have no chance of in your data. Things like | and ~ are pretty
> standard field separators.
>
> So you'd use things like FIELDS TERMINATED BY '|' in your MySQL export
> and WITH DELIMITER '|' NULL '' in your postgresql import
>
> Also remember that mySQL nulls are nothing like postgresql nulls.
>
> --
> Walter
On Wed, 2007-01-10 at 16:51, garrettmoore@gmail.com wrote:
> The missing quote after Poster is a mistake I made when sanitizing the
> data for posting here. That error is NOT present in the actual data;
> there is a quote where needed. So, with that in mind, why am I still
> getting the error?
>
> Also, since it's CSV, we just have NULL represented by the lack of any
> value between two commas, so a,b,,d represents a row with values a, b,
> NULL, d. This works fine in general.

Can you make a sanitized test case, complete unto itself, and post that? The characters to look for are generally \ and '.
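(Aside: one rough way to hunt for the characters Scott mentions is to scan the file for lines containing a backslash, or an unbalanced number of single quotes. The heuristic and the sample lines below are hypothetical; an odd quote count is only suspicious if every field keeps its quoting on one line.)

```python
# Flag lines that are likely to trip COPY ... CSV QUOTE $$'$$:
# a backslash, or an odd number of ' characters (unbalanced quoting).
lines = [
    "2628,'Poster',5",        # balanced quotes: fine
    "bad \\ line",            # contains a backslash: suspect
    "1,'it''s fine',2",       # doubled quote, still balanced: fine
    "1,'odd ' quote',2",      # stray unescaped quote: suspect
]
suspect = []
for n, line in enumerate(lines, start=1):
    if "\\" in line or line.count("'") % 2 == 1:
        suspect.append(n)
print(suspect)  # [2, 4] -- line numbers worth inspecting by hand
```

Running something like this over the 39,000-line file would narrow the search to the handful of rows worth eyeballing before re-running COPY.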