Thread: Getting Mysql data into Postgres: least painful methods?

Getting Mysql data into Postgres: least painful methods?

From
Ken Tanzer
Date:
I'm wondering if anyone can point me towards a good method for moving mysql data into Postgres?  I've done some web searching, and found documentation from various years, but it's not clear what's current and what works best.  Much of what I found seems to be flame war material (why Postgres is better), or is both old and seemingly involved and complex.

Here's the fuller description of what I'm trying to do.  I've got a dataset (a UMLS Metathesaurus subset) that I need to get into a Postgres database.  It's all reference data, and so will be read-only.  There's no functions or logic involved. I anticipate having to update it at least quarterly, so I'd like to get to a well-grooved import process.

The data as distributed can be had in Oracle or Mysql formats.  (I already gave them my two cents to include Postgres.)  I did see some information about modifying the Mysql distribution files to make them Postgres-compatible, but I thought (perhaps foolishly) it would be easier to bring them into Mysql, and from there export them to Postgres.

A recurring idea seemed to be to use:

mysqldump -v --compatible=postgresql umls_test > dumpfile.sql

followed by

sed -i "s/\\\'/\'\'/g" dumpfile.sql

but that didn't bring me much success.  I figure this has to be a fairly common need, and hopefully by 2013 there's an easy solution.  Thanks in advance!

Ken

--
AGENCY Software  
A data system that puts you in control
(253) 245-3801

Re: Getting Mysql data into Postgres: least painful methods?

From
Adrian Klaver
Date:
On 01/11/2013 03:54 PM, Ken Tanzer wrote:
>
>
>
> but that didn't bring me much success.  I figure this has to be a fairly
> common need, and hopefully by 2013 there's an easy solution.  Thanks in
> advance!

Have you looked at Foreign Data Wrappers(FDW):

http://www.postgresql.org/docs/9.1/static/sql-createforeigndatawrapper.html

If you use Python there is Multicorn:
http://www.postgresql.org/docs/9.1/static/sql-createforeigndatawrapper.html

There is also mysql_fdw:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers#mysql_fdw

>
> Ken
>
> --


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Getting Mysql data into Postgres: least painful methods?

From
wd
Date:


On Sat, Jan 12, 2013 at 7:54 AM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
I'm wondering if anyone can point me towards a good method for moving mysql data into Postgres?  I've done some web searching, and found documentation from various years, but it's not clear what's current and what works best.  Much of what I found seems to be flame war material (why Postgres is better), or is both old and seemingly involved and complex.

Here's the fuller description of what I'm trying to do.  I've got a dataset (a UMLS Metathesaurus subset) that I need to get into a Postgres database.  It's all reference data, and so will be read-only.  There's no functions or logic involved. I anticipate having to update it at least quarterly, so I'd like to get to a well-grooved import process.

The data as distributed can be had in Oracle or Mysql formats.  (I already gave them my two cents to include Postgres.)  I did see some information about modifying the Mysql distribution files to make them Postgres-compatible, but I thought (perhaps foolishly) it would be easier to bring them into Mysql, and from there export them to Postgres.

A recurring idea seemed to be to use:

mysqldump -v --compatible=postgresql umls_test > dumpfile.sql

followed by

sed -i "s/\\\'/\'\'/g" dumpfile.sql

but that didn't bring me much success.  I figure this has to be a fairly common need, and hopefully by 2013 there's an easy solution.  Thanks in advance!

Ken

--
AGENCY Software  
A data system that puts you in control


Re: Getting Mysql data into Postgres: least painful methods?

From
Rich Shepard
Date:
On Fri, 11 Jan 2013, Ken Tanzer wrote:

> I'm wondering if anyone can point me towards a good method for moving
> mysql data into Postgres?

   I had to do this last year with the ITIS (Integrated Taxonomic Information
System) maintained by the US Geological Survey.

   Some MySQL key words were immediately recognized and I used emac's
global-search-and-replace to change them to postgres words. Then I tried
reading in individual tables to a newly created database and redirected
errors to a disk file. I fixed the errors postgres identified, dropped the
table, and repeated until there were no errors. Took a bit of time but
worked just fine.

   Then I sent the USGS database maintainer a dump of the postgres database
because he wanted to migrate from mysql to postgres there. I think of it as
a public service. :-)

Rich



Re: Getting Mysql data into Postgres: least painful methods?

From
John R Pierce
Date:
On 1/11/2013 3:54 PM, Ken Tanzer wrote:
Here's the fuller description of what I'm trying to do.  I've got a dataset (a UMLS Metathesaurus subset) that I need to get into a Postgres database.  It's all reference data, and so will be read-only.  There's no functions or logic involved. I anticipate having to update it at least quarterly, so I'd like to get to a well-grooved import process.


how many tables?  if its just one or a couple tables, can you get the data as CSV?   then it would be trivial to import into postgres, using the COPY command (or, \c from psql)...

another alternative, investigate "ETL" tools, these are general purpose data manglers that can connect to a source database (usually any of about 20 supported), extract data, transform it if needed, and load it into a destination database (from a list of 20 or so typically supported)



Re: Getting Mysql data into Postgres: least painful methods?

From
Ken Tanzer
Date:
Thanks for all the responses.  I would totally go with the FDW, were I not stuck using (someone else's) 8.4.

In this case it looks like 24 tables, with CSV-like import files totaling 7G.

Since there didn't seem to be a clean, simple and automated path from mysql to postgres, I'm back to skipping mysql entirely and just trying to modify the mysql files to feed directly into postgres.

To that end, they have to be transformed a bit, which I've written a bit of script to accomplish.  I'm wondering if there's a way to avoid creating another 7G of slightly-modified import files before feeding them to postgres.  Specifically, is there a way to do something like

\copy my_table FROM '`cat my_import_file | my_transform_script`'

My 2 goals here are to be somewhat efficient (by not duplicating the input files), and to keep this all within a transaction.  I could have the script transform each file separately and pipe it to postgres:

(echo 'copy mytable from stdin...' ; cat my_import_file | my_transform_script ) | psql

but I'm thinking that there's no way to group those all into a transaction.

Hopefully this makes sense, and any suggestions welcome.  Thanks.

Ken




On Fri, Jan 11, 2013 at 7:13 PM, John R Pierce <pierce@hogranch.com> wrote:
On 1/11/2013 3:54 PM, Ken Tanzer wrote:
Here's the fuller description of what I'm trying to do.  I've got a dataset (a UMLS Metathesaurus subset) that I need to get into a Postgres database.  It's all reference data, and so will be read-only.  There's no functions or logic involved. I anticipate having to update it at least quarterly, so I'd like to get to a well-grooved import process.


how many tables?  if its just one or a couple tables, can you get the data as CSV?   then it would be trivial to import into postgres, using the COPY command (or, \c from psql)...

another alternative, investigate "ETL" tools, these are general purpose data manglers that can connect to a source database (usually any of about 20 supported), extract data, transform it if needed, and load it into a destination database (from a list of 20 or so typically supported)






--
AGENCY Software  
A data system that puts you in control
(253) 245-3801

Re: Getting Mysql data into Postgres: least painful methods?

From
Adrian Klaver
Date:
On 01/15/2013 04:09 PM, Ken Tanzer wrote:
> Thanks for all the responses.  I would totally go with the FDW, were I
> not stuck using (someone else's) 8.4.

8.4 supports FDW so I will assume you do not have the permissions to
create one.

>
> In this case it looks like 24 tables, with CSV-like import files
> totaling 7G.
>
> Since there didn't seem to be a clean, simple and automated path from
> mysql to postgres, I'm back to skipping mysql entirely and just trying
> to modify the mysql files to feed directly into postgres.
>
> To that end, they have to be transformed a bit, which I've written a bit
> of script to accomplish.  I'm wondering if there's a way to avoid
> creating another 7G of slightly-modified import files before feeding
> them to postgres.  Specifically, is there a way to do something like
>
> \copy my_table FROM '`cat my_import_file | my_transform_script`'


The way I have done this is to create a script using Python that follows
this flow:

MySQL --> MySQLdb module --> Data transform --> psycopg2 --> Postgres
                             OR
csv --> csv module ------^


In the script you can set up the transactions as you like, per row,
batches, or everything in one transaction.

>
> My 2 goals here are to be somewhat efficient (by not duplicating the
> input files), and to keep this all within a transaction.  I could have
> the script transform each file separately and pipe it to postgres:
>
> (echo 'copy mytable from stdin...' ; cat my_import_file |
> my_transform_script ) | psql
>
> but I'm thinking that there's no way to group those all into a transaction.
>
> Hopefully this makes sense, and any suggestions welcome.  Thanks.
>
> Ken



--
Adrian Klaver
adrian.klaver@gmail.com


Re: Getting Mysql data into Postgres: least painful methods?

From
Ken Tanzer
Date:
8.4 supports FDW so I will assume you do not have the permissions to create one.

Thanks but I'm confused--the doc page you mentioned says the mysql FDW isn't supported until 9.1.







In this case it looks like 24 tables, with CSV-like import files
totaling 7G.

Since there didn't seem to be a clean, simple and automated path from
mysql to postgres, I'm back to skipping mysql entirely and just trying
to modify the mysql files to feed directly into postgres.

To that end, they have to be transformed a bit, which I've written a bit
of script to accomplish.  I'm wondering if there's a way to avoid
creating another 7G of slightly-modified import files before feeding
them to postgres.  Specifically, is there a way to do something like

\copy my_table FROM '`cat my_import_file | my_transform_script`'


The way I have done this is to create a script using Python that follows this flow:

MySQL --> MySQLdb module --> Data transform --> psycopg2 --> Postgres
                            OR
csv --> csv module ------^


In the script you can set up the transactions as you like, per row, batches, or everything in one transaction.


I'm also not sure about this.  Are we both talking about a process that bypasses mysql itself, and transforms its input files on the fly?  I want to write a script that will...

#!/bin/sh
... my script stuff...
... create tables...
psql \copy from (transformed mysql file 1)
psql \copy from (transformed mysql file 2)
psql \copy from (transformed mysql file 3)
... more script stuff...

without re-writing the mysql files, and within one transaction.  I can't tell if your answer was getting at that, or something else.

Ken

 

My 2 goals here are to be somewhat efficient (by not duplicating the
input files), and to keep this all within a transaction.  I could have
the script transform each file separately and pipe it to postgres:

(echo 'copy mytable from stdin...' ; cat my_import_file |
my_transform_script ) | psql

but I'm thinking that there's no way to group those all into a transaction.

Hopefully this makes sense, and any suggestions welcome.  Thanks.

Ken



--
Adrian Klaver
adrian.klaver@gmail.com



--
AGENCY Software  
A data system that puts you in control
(253) 245-3801

Re: Getting Mysql data into Postgres: least painful methods?

From
Adrian Klaver
Date:
On 01/15/2013 04:41 PM, Ken Tanzer wrote:
>     8.4 supports FDW so I will assume you do not have the permissions to
>     create one.
>
>
> Thanks but I'm confused--the doc page you mentioned says the mysql FDW
> isn't supported until 9.1.

My apologies. FDW appeared in 8.4, the extension mechanism which the
mysql FDW and Multicorn use exists only in 9.1+

>
>


--
Adrian Klaver
adrian.klaver@gmail.com