Thread: Getting MySQL data into Postgres: least painful methods?
I'm wondering if anyone can point me towards a good method for moving MySQL data into Postgres? I've done some web searching and found documentation from various years, but it's not clear what's current and what works best. Much of what I found seems to be flame-war material (why Postgres is better), or is both old and seemingly involved and complex.
Here's the fuller description of what I'm trying to do. I've got a dataset (a UMLS Metathesaurus subset) that I need to get into a Postgres database. It's all reference data, and so will be read-only. There are no functions or logic involved. I anticipate having to update it at least quarterly, so I'd like to get to a well-grooved import process.
The data as distributed can be had in Oracle or MySQL formats. (I already gave them my two cents to include Postgres.) I did see some information about modifying the MySQL distribution files to make them Postgres-compatible, but I thought (perhaps foolishly) it would be easier to bring them into MySQL, and from there export them to Postgres.
A recurring idea seemed to be to use:
mysqldump -v --compatible=postgresql umls_test > dumpfile.sql
followed by
sed -i "s/\\\'/\'\'/g" dumpfile.sql
but that didn't bring me much success. I figure this has to be a fairly common need, and hopefully by 2013 there's an easy solution. Thanks in advance!
Ken
--
AGENCY Software
A data system that puts you in control
(253) 245-3801
On 01/11/2013 03:54 PM, Ken Tanzer wrote:

> but that didn't bring me much success. I figure this has to be a fairly
> common need, and hopefully by 2013 there's an easy solution. Thanks in
> advance!

Have you looked at Foreign Data Wrappers (FDW)?
http://www.postgresql.org/docs/9.1/static/sql-createforeigndatawrapper.html

If you use Python there is Multicorn:
http://www.postgresql.org/docs/9.1/static/sql-createforeigndatawrapper.html

There is also mysql_fdw:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers#mysql_fdw

--
Adrian Klaver
adrian.klaver@gmail.com
You can search on Google: https://www.google.com/search?q=mysql2pg&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
On Fri, 11 Jan 2013, Ken Tanzer wrote:

> I'm wondering if anyone can point me towards a good method for moving
> mysql data into Postgres?

I had to do this last year with the ITIS (Integrated Taxonomic Information System) maintained by the US Geological Survey. Some MySQL keywords were immediately recognized, and I used Emacs' global search-and-replace to change them to their Postgres equivalents. Then I tried reading individual tables into a newly created database and redirected errors to a disk file. I fixed the errors Postgres identified, dropped the table, and repeated until there were no errors. Took a bit of time but worked just fine.

Then I sent the USGS database maintainer a dump of the Postgres database, because he wanted to migrate from mysql to postgres there. I think of it as a public service. :-)

Rich
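(A rough sketch of that load-and-review loop; the database name and table file below are placeholders, and the exact invocation is illustrative only:)

# Load one converted table file and capture the errors Postgres reports
psql -d itis -f converted_table.sql 2> converted_table.err
# Review the .err file, fix the SQL, DROP TABLE, and repeat until it loads cleanly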
On 1/11/2013 3:54 PM, Ken Tanzer wrote:
> Here's the fuller description of what I'm trying to do. I've got a dataset (a UMLS Metathesaurus subset) that I need to get into a Postgres database. It's all reference data, and so will be read-only. There's no functions or logic involved. I anticipate having to update it at least quarterly, so I'd like to get to a well-grooved import process.
How many tables? If it's just one or a couple of tables, can you get the data as CSV? Then it would be trivial to import into Postgres using the COPY command (or \copy from psql)...
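(To make that concrete, a minimal sketch with hypothetical table, file, and database names, assuming a comma-delimited file with a header row; \copy reads the file on the client side, while COPY reads it on the server:)

# Client-side load through psql's \copy meta-command
psql -d umls -c "\copy my_table from 'my_table.csv' csv header"
# Server-side load, if the file is readable by the Postgres server process
psql -d umls -c "COPY my_table FROM '/path/to/my_table.csv' CSV HEADER"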
Another alternative: investigate "ETL" tools. These are general-purpose data manglers that can connect to a source database (usually any of about 20 supported), extract data, transform it if needed, and load it into a destination database (again from a list of 20 or so typically supported).
Thanks for all the responses. I would totally go with the FDW, were I not stuck using (someone else's) 8.4.
In this case it looks like 24 tables, with CSV-like import files totaling 7G.
Since there didn't seem to be a clean, simple and automated path from mysql to postgres, I'm back to skipping mysql entirely and just trying to modify the mysql files to feed directly into postgres.
To that end, they have to be transformed a bit, which I've written a bit of script to accomplish. I'm wondering if there's a way to avoid creating another 7G of slightly-modified import files before feeding them to postgres. Specifically, is there a way to do something like
\copy my_table FROM '`cat my_import_file | my_transform_script`'
My 2 goals here are to be somewhat efficient (by not duplicating the input files), and to keep this all within a transaction. I could have the script transform each file separately and pipe it to postgres:
(echo 'copy mytable from stdin...' ; cat my_import_file | my_transform_script ) | psql
but I'm thinking that there's no way to group those all into a transaction.
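Unless maybe something like this would work? Here's a rough sketch (table and file names hypothetical), generating one SQL stream for a single psql session so that every COPY runs inside the same explicit transaction, and assuming the transform script emits rows in COPY's text format:

#!/bin/sh
# Single psql session, single transaction; stop at the first error so COMMIT never runs
(
  echo 'BEGIN;'
  for t in table1 table2 table3; do       # hypothetical table names
    echo "COPY $t FROM STDIN;"
    my_transform_script < "$t.mysql"      # hypothetical input files
    printf '%s\n' '\.'                    # end-of-data marker for COPY FROM STDIN
  done
  echo 'COMMIT;'
) | psql -v ON_ERROR_STOP=1 mydb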
Hopefully this makes sense, and any suggestions welcome. Thanks.
Ken
On Fri, Jan 11, 2013 at 7:13 PM, John R Pierce <pierce@hogranch.com> wrote:
> How many tables? If it's just one or a couple of tables, can you get the data as CSV? Then it would be trivial to import into Postgres using the COPY command (or \copy from psql)...
--
AGENCY Software
A data system that puts you in control
(253) 245-3801
On 01/15/2013 04:09 PM, Ken Tanzer wrote:

> Thanks for all the responses. I would totally go with the FDW, were I
> not stuck using (someone else's) 8.4.

8.4 supports FDW so I will assume you do not have the permissions to create one.

> To that end, they have to be transformed a bit, which I've written a bit
> of script to accomplish. I'm wondering if there's a way to avoid
> creating another 7G of slightly-modified import files before feeding
> them to postgres. Specifically, is there a way to do something like
>
> \copy my_table FROM '`cat my_import_file | my_transform_script`'

The way I have done this is to create a script using Python that follows this flow:

MySQL --> MySQLdb module --> Data transform --> psycopg2 --> Postgres
OR
csv --> csv module ------^

In the script you can set up the transactions as you like, per row, batches, or everything in one transaction.

--
Adrian Klaver
adrian.klaver@gmail.com
> 8.4 supports FDW so I will assume you do not have the permissions to create one.
Thanks but I'm confused--the doc page you mentioned says the mysql FDW isn't supported until 9.1.
> The way I have done this is to create a script using Python that follows this flow:
> MySQL --> MySQLdb module --> Data transform --> psycopg2 --> Postgres
> OR
> csv --> csv module ------^
>
> In the script you can set up the transactions as you like, per row, batches, or everything in one transaction.
I'm also not sure about this. Are we both talking about a process that bypasses mysql itself, and transforms its input files on the fly? I want to write a script that will...
#!/bin/sh
... my script stuff...
... create tables...
psql \copy from (transformed mysql file 1)
psql \copy from (transformed mysql file 2)
psql \copy from (transformed mysql file 3)
... more script stuff...
without re-writing the mysql files, and within one transaction. I can't tell if your answer was getting at that, or something else.
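In case it helps, here's a rough sketch of the kind of thing I mean (all names hypothetical): each \copy reads from a named pipe that the transform script feeds on the fly, and an explicit BEGIN/COMMIT keeps everything in one transaction:

#!/bin/sh
# Rough sketch: no 7G intermediate files; each \copy reads a FIFO written on the fly
mkfifo /tmp/t1.fifo /tmp/t2.fifo
my_transform_script < mysql_file_1 > /tmp/t1.fifo &   # writers block until psql reads
my_transform_script < mysql_file_2 > /tmp/t2.fifo &
psql -v ON_ERROR_STOP=1 mydb <<'EOF'
BEGIN;
\copy table1 from '/tmp/t1.fifo'
\copy table2 from '/tmp/t2.fifo'
COMMIT;
EOF
rm -f /tmp/t1.fifo /tmp/t2.fifo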
Ken
--
AGENCY Software
A data system that puts you in control
(253) 245-3801
On 01/15/2013 04:41 PM, Ken Tanzer wrote:

>> 8.4 supports FDW so I will assume you do not have the permissions to
>> create one.
>
> Thanks but I'm confused--the doc page you mentioned says the mysql FDW
> isn't supported until 9.1.

My apologies. FDW appeared in 8.4; the extension mechanism, which the mysql FDW and Multicorn use, exists only in 9.1+.

--
Adrian Klaver
adrian.klaver@gmail.com