Thread: MySQL -> Postgres migration tools?
Hello,

Given that the EU has approved Oracle's acquisition of Sun (and MySQL with it), it's fairly likely that a number of people and companies will be looking to move from MySQL to Postgres in the coming months. Does anyone know of any good, current migration tools out there? A search for mysql on pgFoundry only turns up a few projects that haven't seen any recent updates...

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
Hi,

I googled for 'migrate mysql to postgresql'; the first 5 results seem very useful. The most useful link I found was http://www.raditha.com/mysql/mysql2pgsql.php

HTH

Regards,

Serge Fonville

On Fri, Jan 22, 2010 at 8:15 PM, Erik Jones <ejones@engineyard.com> wrote:
> Hello,
>
> Given that the EU has approved Oracle's acquisition of Sun (and MySQL with it), it's fairly likely that a number of people and companies will be looking to move from MySQL to Postgres in the coming months. Does anyone know of any good, current migration tools out there? A search for mysql on pgFoundry only turns up a few projects that haven't seen any recent updates...
>
> Erik Jones, Database Administrator
> Engine Yard
> Support, Scalability, Reliability
> 866.518.9273 x 260
> Location: US/Pacific
> IRC: mage2k

--
http://www.sergefonville.nl

Convince Google!! They need to support Adsense over SSL
https://www.google.com/adsense/support/bin/answer.py?hl=en&answer=10528
http://www.google.com/support/forum/p/AdSense/thread?tid=1884bc9310d9f923&hl=en
On Fri, Jan 22, 2010 at 7:15 PM, Erik Jones <ejones@engineyard.com> wrote:
> Hello,
>
> Given that the EU has approved Oracle's acquisition of Sun (and MySQL with it), it's fairly likely that a number of people and companies will be looking to move from MySQL to Postgres in the coming months. Does anyone know of any good, current migration tools out there? A search for mysql on pgFoundry only turns up a few projects that haven't seen any recent updates...
>

The problem there is not that there are no such tools, but that the database might need a bit of refinement when you move away from as simple a database engine as MySQL. Most often - either through lack of knowledge, or sometimes because you are forced to - the MySQL database won't use even basic features of PostgreSQL (sequences would be one grand example). On the app side, queries will need refinement. Some of the code can be moved out of the app and into PL/pgSQL. Tables can be normalised better, since you can successfully run somewhat more complicated queries on the database side. Etc.

So to summarise, it is really a two-stage process. The first stage - migration - should be followed by database and code refinement before even the first 'after migration' release. And I believe there are numerous sources out there to help people with that. Plus, we are always trying to be helpful on the list.

--
GJ
On 23/01/2010 3:31 PM, Grzegorz Jaśkiewicz wrote:
> On Fri, Jan 22, 2010 at 7:15 PM, Erik Jones <ejones@engineyard.com> wrote:
>> Hello,
>>
>> Given that the EU has approved Oracle's acquisition of Sun (and MySQL with it), it's fairly likely that a number of people and companies will be looking to move from MySQL to Postgres in the coming months. Does anyone know of any good, current migration tools out there? A search for mysql on pgFoundry only turns up a few projects that haven't seen any recent updates...
>>
>
> The problem there is not that there are no such tools, but that the
> database might need a bit of refinement when you move away from as
> simple a database engine as MySQL. Most often - either through lack of
> knowledge, or sometimes because you are forced to - the MySQL database
> won't use even basic features of PostgreSQL (sequences would be one
> grand example).

An InnoDB `AUTO_INCREMENT' column under MySQL 5.x is pretty similar to a sequence.

I increasingly think it's pretty safe to just
's/AUTO_INCREMENT/SERIAL/g'
in DDL. Apps that use MyISAM aren't going to be fussy about how it works, and apps that use InnoDB get Pg-like behaviour anyway.

Modern (5.x-something-late) MySQL with the InnoDB storage engine is a very different beast from MySQL 3.x/4.x or MySQL 5.x with MyISAM. It still has some scary data-conversion habits, and you need to turn on several different kinds of strict and behave-in-standard-ways modes to get it to play nice, but if you have done that and you're moving over to Pg you shouldn't actually have too rough a time.

(Unless you use some of the MySQL features that just have no equivalent in Pg, like its upsert commands REPLACE and INSERT ... ON DUPLICATE KEY UPDATE .... Or you're relying on scary MySQL-isms like 0000-00-00 dates.)

> So to summarise, it is really a two-stage process. The first stage -
> migration - should be followed by database and code refinement before
> even the first 'after migration' release. And I believe there are
> numerous sources out there to help people with that. Plus, we are
> always trying to be helpful on the list.

That I tend to agree with. It's not just about converting your data and schema. You *will* need to adjust apps with any non-trivial queries. Even if you can hack it to work, you can often learn things as part of the porting process that can improve your results on *both* databases.

--
Craig Ringer
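To make that substitution concrete, here is a minimal sketch of what the 's/AUTO_INCREMENT/SERIAL/g' rewrite amounts to; the table and column names are hypothetical, purely for illustration:

```sql
-- MySQL/InnoDB original (hypothetical):
--   CREATE TABLE users (
--       id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
--       name VARCHAR(100) NOT NULL
--   ) ENGINE=InnoDB;

-- Rough PostgreSQL equivalent: SERIAL creates a sequence (users_id_seq),
-- makes it the column default, and ties its lifetime to the column.
CREATE TABLE users (
    id   serial PRIMARY KEY,
    name varchar(100) NOT NULL
);

-- The generated key behaves much like an InnoDB AUTO_INCREMENT value:
INSERT INTO users (name) VALUES ('alice');
SELECT currval('users_id_seq');   -- the id just assigned in this session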
2010/1/23 Craig Ringer <craig@postnewspapers.com.au>:
>
> An InnoDB `AUTO_INCREMENT' column under MySQL 5.x is pretty similar to a
> sequence.
>
> I increasingly think it's pretty safe to just
> 's/AUTO_INCREMENT/SERIAL/g'
> in DDL. Apps that use MyISAM aren't going to be fussy about how it works,
> and apps that use InnoDB get Pg-like behaviour anyway.

Yes, but I have seen it many times where people needed a sequence that drives more than just one column/table. You can't do that with the simplistic MySQL approach.

> Modern (5.x-something-late) MySQL with the InnoDB storage engine is a very
> different beast from MySQL 3.x/4.x or MySQL 5.x with MyISAM. It still has
> some scary data-conversion habits, and you need to turn on several different
> kinds of strict and behave-in-standard-ways modes to get it to play nice,
> but if you have done that and you're moving over to Pg you shouldn't actually
> have too rough a time.
>
> (Unless you use some of the MySQL features that just have no equivalent in Pg,
> like its upsert commands REPLACE and INSERT ... ON DUPLICATE KEY UPDATE ....
> Or you're relying on scary MySQL-isms like 0000-00-00 dates.)

That is easily done with triggers, and again is more powerful. MySQL was designed to be easy to grasp for someone with limited time/skill in SQL. And, like everything that is meant to be simple, it falls short quickly if you need to use it for something more realistic/complete/insert your word here.

Let's be honest: people usually use MySQL with the default backend because InnoDB is just too slow. I dumped MySQL a long time ago, but I still hear from people that it suffers from the same problems. Hence, I still recommend that people switch to PostgreSQL.

Btw, if you want to help, please ask as many hosting providers as you can about PostgreSQL support, mentioning that you would probably buy it from them if they offered PostgreSQL at the same price as MySQL (read: for free, included in the cheapest package, usually). This is so far the biggest problem to overcome, and the major reason people choose MySQL in the first place.

--
GJ
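Two rough sketches of the points made above, with hypothetical table, column and trigger names throughout: first a single sequence feeding keys into more than one table, then a plpgsql BEFORE INSERT trigger standing in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The naive trigger version ignores concurrency; under concurrent load you would add retry logic or locking.

```sql
-- (1) One sequence driving keys in more than one table, so ids never
--     collide across them (not possible with a per-table AUTO_INCREMENT):
CREATE SEQUENCE global_id_seq;

CREATE TABLE invoices (
    id    bigint PRIMARY KEY DEFAULT nextval('global_id_seq'),
    total numeric(12,2) NOT NULL
);

CREATE TABLE credit_notes (
    id    bigint PRIMARY KEY DEFAULT nextval('global_id_seq'),
    total numeric(12,2) NOT NULL
);

-- (2) Emulating INSERT ... ON DUPLICATE KEY UPDATE with a trigger:
--     if the key already exists, update the row and suppress the INSERT.
CREATE TABLE counters (
    name text PRIMARY KEY,
    hits integer NOT NULL
);

CREATE FUNCTION counters_upsert() RETURNS trigger AS $$
BEGIN
    UPDATE counters SET hits = hits + NEW.hits WHERE name = NEW.name;
    IF FOUND THEN
        RETURN NULL;   -- existing row updated; skip the insert
    END IF;
    RETURN NEW;        -- no existing row; let the insert proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER counters_upsert_trg
    BEFORE INSERT ON counters
    FOR EACH ROW EXECUTE PROCEDURE counters_upsert();

-- A plain INSERT now behaves as an upsert:
INSERT INTO counters VALUES ('page_view', 1);  -- inserts the row
INSERT INTO counters VALUES ('page_view', 1);  -- updates hits to 2
```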
On Fri, 2010-01-22 at 11:15 -0800, Erik Jones wrote:
> Does anyone know of any good, current migration tools out there?

There is an open source tool developed by EnterpriseDB: MigrationWizard

http://www.enterprisedb.com/openDownloads.do?productId=407&redirectReason=true&productVersion=otherDownload

If you are using RHEL/CentOS 5 or Fedora 11+, you can find RPM packages here:

http://yum.pgsqlrpms.org/8.4/redhat/rhel-5Server-i386/repoview/migrationwizard.html
http://yum.pgsqlrpms.org/8.4/fedora/fedora-11-i386/repoview/migrationwizard.html
http://yum.pgsqlrpms.org/8.4/fedora/fedora-12-i386/repoview/migrationwizard.html

Regards,
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
Hello, I have a piece of software that uses Postgres 8.4.2 on Windows. I have a database with data split into schemas, so that every schema replicates the same set of tables.

One of the tables is called "code": it has 16 columns, almost all numeric except for a character varying(1024) and two text fields. It usually holds a few thousand records at most, so the file size of the table is usually around a few hundred kilobytes.

In only one case so far, the "code" table with 442 records has a size of 18MB. If I run a VACUUM FULL and a REINDEX it shrinks to less than 100KB. If I use the software to delete the rows and reinsert the same records it explodes again to 18MB.

I have backed up the table (from a Windows 7 instance running on a virtual machine) and restored it into another database (running on a "real" Windows Vista) and tried the same things with the same results. Then I have dropped the table, recreated it and reinserted the records: the anomaly has disappeared.

I really don't know what to look for. I cannot ignore the problem because schemas are created by the software at the users' request, and it could reappear in any schema. I really don't have a clue: I would be happy to study the problem further, but I don't know which direction to go in. Can someone point me in some (hopefully good) direction?

--
==================================================
dott. Ivano Mario Luberti
Archimede Informatica societa' cooperativa a r. l.
Sede Operativa
Via Gereschi 36 - 56126 - Pisa
tel.: +39-050-580959
tel/fax: +39-050-9711344
web: www.archicoop.it
==================================================
Sorry to post this again, but I have seen no response at all and that is strange for this list. Maybe I did not submit my question properly?

I also wish to add another data point: the size problem is usually accompanied by the following log messages:

2010-02-02 00:00:14 GMT LOG:  checkpoints are occurring too frequently (15 seconds apart)
2010-02-02 00:00:14 GMT HINT:  Consider increasing the configuration parameter "checkpoint_segments".

where the number of seconds apart of course changes.

---------------------

Hello, I have a piece of software that uses Postgres 8.4.2 on Windows. I have a database with data split into schemas, so that every schema replicates the same set of tables.

One of the tables is called "code": it has 16 columns, almost all numeric except for a character varying(1024) and two text fields. It usually holds a few thousand records at most, so the file size of the table is usually around a few hundred kilobytes.

In only one case so far, the "code" table with 442 records has a size of 18MB. If I run a VACUUM FULL and a REINDEX it shrinks to less than 100KB. If I use the software to delete the rows and reinsert the same records it explodes again to 18MB.

I have backed up the table (from a Windows 7 instance running on a virtual machine) and restored it into another database (running on a "real" Windows Vista) and tried the same things with the same results. Then I have dropped the table, recreated it and reinserted the records: the anomaly has disappeared.

I really don't know what to look for. I cannot ignore the problem because schemas are created by the software at the users' request, and it could reappear in any schema. I really don't have a clue: I would be happy to study the problem further, but I don't know which direction to go in. Can someone point me in some (hopefully good) direction?

--
==================================================
dott. Ivano Mario Luberti
Archimede Informatica societa' cooperativa a r. l.
Sede Operativa
Via Gereschi 36 - 56126 - Pisa
tel.: +39-050-580959
tel/fax: +39-050-9711344
web: www.archicoop.it
==================================================
On 02/02/10 14:46, Ivano Luberti wrote:
> Sorry to post this again, but I have seen no response at all and that is
> strange for this list. Maybe I did not submit my question properly?

You've replied to an existing question, which means your message is hidden amidst the replies to that.

> I also wish to add another data point: the size problem is usually
> accompanied by the following log messages:
>
> 2010-02-02 00:00:14 GMT LOG:  checkpoints are occurring too frequently (15 seconds apart)
> 2010-02-02 00:00:14 GMT HINT:  Consider increasing the configuration parameter "checkpoint_segments".
>
> where the number of seconds apart of course changes.

Not directly related, although you might want to do as it says.

> Hello, I have a piece of software that uses Postgres 8.4.2 on Windows.
> I have a database with data split into schemas, so that every schema
> replicates the same set of tables.
> One of the tables is called "code": it has 16 columns, almost all numeric
> except for a character varying(1024) and two text fields. It usually
> holds a few thousand records at most, so the file size of the table is
> usually around a few hundred kilobytes.
>
> In only one case so far, the "code" table with 442 records has a size of
> 18MB. If I run a VACUUM FULL and a REINDEX it shrinks to less than 100KB.
> If I use the software to delete the rows and reinsert the same records
> it explodes again to 18MB.

That suggests the autovacuum system isn't checking the table often enough. Or, perhaps, that you have a long-lived transaction that is preventing it from reclaiming space.

Autovacuum is discussed at the bottom of this page:
http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html
The "storage parameters" link has details on setting vacuum parameters for a single table.

If your application is sitting there holding open a transaction without doing anything, stop doing that. It means the system can't be sure it's safe to reclaim the space used by old versions of rows.

--
Richard Huxton
Archonet Ltd
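For reference, on 8.4 the per-table vacuum settings that page describes are set as storage parameters on the table. A minimal sketch for the table in question; the specific values below are only illustrative, not a recommendation:

```sql
-- Ask autovacuum to process this table much sooner than the default
-- (which waits until roughly 20% of the rows are dead).
ALTER TABLE code SET (
    autovacuum_vacuum_scale_factor  = 0.02,  -- vacuum after ~2% dead rows
    autovacuum_vacuum_threshold     = 50,    -- ...plus a small fixed floor
    autovacuum_analyze_scale_factor = 0.02
);
```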
Richard Huxton ha scritto:
> On 02/02/10 14:46, Ivano Luberti wrote:
>> Sorry to post this again, but I have seen no response at all and that is
>> strange for this list. Maybe I did not submit my question properly?
>
> You've replied to an existing question, which means your message is
> hidden amidst the replies to that.

I don't want to bore the list with this, but I don't understand: I posted a new message to the list, I didn't reply to anything... or at least I didn't mean to.

>> In only one case so far, the "code" table with 442 records has a size of
>> 18MB. If I run a VACUUM FULL and a REINDEX it shrinks to less than 100KB.
>> If I use the software to delete the rows and reinsert the same records
>> it explodes again to 18MB.
>
> That suggests the autovacuum system isn't checking the table often
> enough. Or, perhaps, that you have a long-lived transaction that is
> preventing it from reclaiming space.
>
> Autovacuum is discussed at the bottom of this page:
> http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html
> The "storage parameters" link has details on setting vacuum parameters
> for a single table.
>
> If your application is sitting there holding open a transaction without
> doing anything, stop doing that. It means the system can't be sure it's
> safe to reclaim the space used by old versions of rows.

No, the application is doing what it is supposed to do: inserting records. But when the size of the table is so large, inserts become really slow, so indeed autovacuum has been cancelled a few times. Moreover, when autovacuum runs while the application is idle, it is able to run but not to reclaim space.

Then if I run VACUUM manually with the full, freeze and analyze options checked, and also run REINDEX, everything returns to normal.

What really worries and puzzles me is that the size of the table is not consistent with other copies of the same table that have a similar number of records.

--
==================================================
dott. Ivano Mario Luberti
Archimede Informatica societa' cooperativa a r. l.
Sede Operativa
Via Gereschi 36 - 56126 - Pisa
tel.: +39-050-580959
tel/fax: +39-050-9711344
web: www.archicoop.it
==================================================
In response to Ivano Luberti <luberti@archicoop.it>:
>
> >> In only one case so far, the "code" table with 442 records has a size of
> >> 18MB. If I run a VACUUM FULL and a REINDEX it shrinks to less than 100KB.
> >> If I use the software to delete the rows and reinsert the same records
> >> it explodes again to 18MB.
> >
> > That suggests the autovacuum system isn't checking the table often
> > enough. Or, perhaps, that you have a long-lived transaction that is
> > preventing it from reclaiming space.
> >
> > Autovacuum is discussed at the bottom of this page:
> > http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html
> > The "storage parameters" link has details on setting vacuum parameters
> > for a single table.
> >
> > If your application is sitting there holding open a transaction without
> > doing anything, stop doing that. It means the system can't be sure it's
> > safe to reclaim the space used by old versions of rows.
>
> No, the application is doing what it is supposed to do: inserting records.
> But when the size of the table is so large, inserts become really slow, so
> indeed autovacuum has been cancelled a few times.

This is another strong indicator that your autovacuum is not configured properly. You should probably make the settings more aggressive.

> Moreover, when autovacuum runs while the application is idle, it is able
> to run but not to reclaim space.

I don't understand what that comment means.

> Then if I run VACUUM manually with the full, freeze and analyze options
> checked, and also run REINDEX, everything returns to normal.

Again, this indicates that autovacuum is not configured to do what you want it to do.

> What really worries and puzzles me is that the size of the table is not
> consistent with other copies of the same table that have a similar number
> of records.

Have you read the section on how MVCC and autovacuum work? The size of the table is not a function of the # of records. It's the # of records, plus the number of "dead rows" (which are records that have been deleted or updated and thus replaced with new rows).

If this table sees frequent updates, then it will _NEVER_ be the size that you expect it to be if you just multiply #rows * size of row, because it will _always_ have some dead rows in the table. This is OK, it's how the system is designed to work, and frequent VACUUM FULL is just wasting time as the table will just enlarge again.

The key is that autovacuum runs often enough that the size stabilizes based on usage. For example, if you have a table that usually has 500 rows in it and it's common for all the rows to be deleted and replaced, then you can expect the table to _always_ be big enough to hold 1000 rows. However, if all the rows are updated 5 times between each vacuum run, the table will be big enough to hold 2500 rows most of the time. Again, this is typical, it's how the system is designed to run.

I'm guessing (although a lot of the original email has been trimmed) that the actual problem you're having is that autovacuum is taking too long, and is slowing down modifications to table data.

If autovacuum is taking too long and is slowing down other operations, you have a few options (you can do one or many of these):

* Run autovacuum more frequently.
* Modify your application so it modifies fewer rows (yes, sometimes this is possible and the best solution. The fact that the table blows up to 180x the minimum size is a hint that you may be updating very inefficiently.)
* Tune PostgreSQL to make more efficient use of RAM for caching (which will speed everything up)
* Get faster hardware
* Upgrade to a newer version of PostgreSQL that has more efficient vacuum code (if you mentioned which version you are using, it was lost when the message was trimmed)

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
OK, that is definitely food for thought, and it is what I was looking for, as stated in my first message. Thanks to Richard and Bill for that. But there are a few things that I still don't understand and that I think I had not outlined well enough.

>> What really worries and puzzles me is that the size of the table is not
>> consistent with other copies of the same table that have a similar number
>> of records.
>
> Have you read the section on how MVCC and autovacuum work? The size of
> the table is not a function of the # of records. It's the # of records,
> plus the number of "dead rows" (which are records that have been deleted
> or updated and thus replaced with new rows).

I was not assuming a proportion between size and number of records. I was simply comparing the size of the table that troubles me with the size of other identical tables that have similar usage but live in other schemas inside the same DB.

> The key is that autovacuum runs often enough that the size stabilizes
> based on usage. For example, if you have a table that usually has 500
> rows in it and it's common for all the rows to be deleted and replaced,
> then you can expect the table to _always_ be big enough to hold 1000
> rows. However, if all the rows are updated 5 times between each vacuum
> run, the table will be big enough to hold 2500 rows most of the time.
> Again, this is typical, it's how the system is designed to run.

In this table and all her sister tables in the other schemas, records are only inserted and deleted. No updates.

> I'm guessing (although a lot of the original email has been trimmed) that
> the actual problem you're having is that autovacuum is taking too long,
> and is slowing down modifications to table data.

This is what I was thinking, but today I was able to look at the running processes while a client was doing a bunch of inserts. There was no autovacuum running, and every insert was taking many seconds to be executed. Moreover, every insert seemed to increase the size of the table a lot - much more than the size of a record plus the related index entries. Then, after the inserts, autovacuum started and it took a lot of time to complete.

So the following suggestions are certainly of interest to me. By the way, we are running 8.4.2 on Windows 2003 Server, but the same problem has occurred on Windows (S)vista and Windows 7. So no chance to upgrade :-)

> If autovacuum is taking too long and is slowing down other operations,
> you have a few options (you can do one or many of these):
>
> * Run autovacuum more frequently.
> * Modify your application so it modifies fewer rows (yes, sometimes this
>   is possible and the best solution. The fact that the table blows up to
>   180x the minimum size is a hint that you may be updating very
>   inefficiently.)
> * Tune PostgreSQL to make more efficient use of RAM for caching (which
>   will speed everything up)
> * Get faster hardware
> * Upgrade to a newer version of PostgreSQL that has more efficient
>   vacuum code (if you mentioned which version you are using, it was
>   lost when the message was trimmed)

--
==================================================
dott. Ivano Mario Luberti
Archimede Informatica societa' cooperativa a r. l.
Sede Operativa
Via Gereschi 36 - 56126 - Pisa
tel.: +39-050-580959
tel/fax: +39-050-9711344
web: www.archicoop.it
==================================================
On Tue, Feb 2, 2010 at 1:45 PM, Ivano Luberti <luberti@archicoop.it> wrote:
> In this table and all her sister tables in the other schemas, records are
> only inserted and deleted. No updates.

Same difference. In pgsql an update equals a delete plus an insert; deleted rows use up space just like the former versions of updated rows do. If you're deleting a lot of rows then you'll have a lot of bloat.
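A rough way to watch that bloat accumulate is to check the dead-row count in the statistics view for the table; a minimal sketch for the table from this thread:

```sql
-- Dead rows pile up after DELETEs exactly as after UPDATEs; the space is
-- not reusable until (auto)vacuum has processed the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'code';
```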
Ivano Luberti wrote:
> This is what I was thinking, but today I was able to look at the running
> processes while a client was doing a bunch of inserts. There was no
> autovacuum running, and every insert was taking many seconds to be
> executed.

Have you done any basic tuning of the database parameters? From your earlier message, it sounds like you might have checkpoint_segments at its default, which can contribute to these inserts taking so long. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for a guide to where to start.

Also, if you want to get a better idea of what's actually going on with your data, you should be looking at "select * from pg_stat_user_tables"; that will give you information about things like how many dead rows there are in the table, when autovacuum last did some work, etc. It should be possible to figure out what's different about the usage pattern of this table compared to the ones you say work as expected by analyzing that data.

P.S. To clean up after one of these messes you might try CLUSTER instead of VACUUM FULL.

--
Greg Smith    2ndQuadrant US    Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
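As a sketch of that P.S., assuming the table has an index to order by (the index name below, code_pkey, is hypothetical):

```sql
-- CLUSTER rewrites the table in the order of the given index and rebuilds
-- the indexes, returning the bloated space to the operating system in one
-- pass. Like VACUUM FULL, it takes an exclusive lock while it runs.
CLUSTER code USING code_pkey;

-- Subsequent runs can reuse the remembered clustering index:
CLUSTER code;
```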