Thread: Database conversion woes...
Hello everyone,

I am doing a massive database conversion from MySQL to PostgreSQL for a company I am working for. It has a few quirks that I haven't been able to nail down from reading and searching through previous list info.

For starters, I am moving roughly 50 separate databases, each of which represents one of our clients and is roughly 500 megs to 3 gigs in size. Currently we are using MySQL replication, so I am looking at Mammoth's replicator for this one. However, I have seen that it only allows one DB to be replicated at a time. Given the size of each single db, I don't know how I could put them all together under one roof, and if I were going to, what are the maximums that Postgres can handle for tables in one db? We track over 2 million new points of data (records) a day, and are moving to 5 million in the next year.

Second, what about the physical database size? What are the limits there? I have seen that it was 4 gig on Linux in a message from 2000, but what about now? Have we found ways past that?

Thanks in advance. I will give more detail; just looking for some open directions and maybe some kicks to fuel my thought in other areas.

Thanks,
--
kevin@valenfor.com
On Tue, 03 Feb 2004 11:42:59 -0500 "Kevin Carpenter" <kevin@valenfor.com> wrote:

> For starters, I am moving roughly 50 separate databases, each of which
> represents one of our clients and is roughly 500 megs to 3 gigs in size.
> Currently we are using MySQL replication, so I am looking at Mammoth's
> replicator for this one. However, I have seen that it only allows one DB
> to be replicated at a time. Given the size of each single db, I

Not knowing too much about Mammoth's, but going by how the others work, you should be able to run a replicator for each db. (Or hack up a shell script to run the replicator for each db; either way each db will be replicated independently of the others.)

> don't know how I could put them all together under one roof, and if I
> were going to, what are the maximums that Postgres can handle for
> tables in one db? We track over 2 million new points of data
> (records) a day, and are moving to 5 million in the next year.

From the docs:

  Maximum size for a database:           unlimited (4 TB databases exist)
  Maximum size for a table:              16 TB on all operating systems
  Maximum size for a row:                1.6 TB
  Maximum size for a field:              1 GB
  Maximum number of rows in a table:     unlimited
  Maximum number of columns in a table:  250 - 1600, depending on column types
  Maximum number of indexes on a table:  unlimited

My largest PG db is 50GB. My busiest PG db runs about 50 update|delete|inserts per second, sustained throughout the day (it bursts up to 150 now and then), plus about 40 selects per second. And the machine it is running on is typically 95% idle. (Quad 2GHz Xeon)

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
On Tue, 3 Feb 2004, Kevin Carpenter wrote:

> Hello everyone,
>
> I am doing a massive database conversion from MySQL to PostgreSQL for a
> company I am working for. It has a few quirks that I haven't been able
> to nail down from reading and searching through previous list info.
>
> For starters, I am moving roughly 50 separate databases, each of which
> represents one of our clients and is roughly 500 megs to 3 gigs in size.
> Currently we are using MySQL replication, so I am looking at Mammoth's
> replicator for this one. However, I have seen that it only allows one
> DB to be replicated at a time.

Look into importing all those separate databases into separate schemas in one PostgreSQL database.

> Given the size of each single db, I don't
> know how I could put them all together under one roof,

There's no functional difference to PostgreSQL whether you have 1 huge database or 50 smaller ones that add up to the same size.

> and if I were
> going to, what are the maximums that Postgres can handle for tables in
> one db?

None. Also see:
http://www.postgresql.org/docs/faqs/FAQ.html#4.5

> We track over 2 million new points of data (records) a day, and
> are moving to 5 million in the next year.

That's quite a bit. PostgreSQL can handle it.

> Second, what about the physical database size? What are the limits there?

None.

> I have seen that it was 4 gig on Linux in a message from 2000, but what
> about now? Have we found ways past that?

It has never been 4 gig. It was once, a long time ago, 2 gig for a table, I believe. That was fixed years ago.

> Thanks in advance. I will give more detail; just looking for some open
> directions and maybe some kicks to fuel my thought in other areas.

Import in bulk, either using COPY or by wrapping a few thousand INSERTs inside BEGIN/END pairs.
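For illustration, here is a minimal sketch of the schema-per-client layout and the bulk-load options described above. All schema, table, column, and file names are invented placeholders, not anything from Kevin's actual setup:

    -- One database; one schema per client (names are placeholders).
    CREATE SCHEMA client_acme;

    CREATE TABLE client_acme.data_points (
        id          serial PRIMARY KEY,
        recorded_at timestamp NOT NULL,
        value       numeric
    );

    -- Fastest bulk load: COPY straight from a tab-separated dump file.
    COPY client_acme.data_points (recorded_at, value) FROM '/tmp/acme.tsv';

    -- If you have to use INSERTs, batch a few thousand per transaction
    -- rather than paying for a commit on every row.
    BEGIN;
    INSERT INTO client_acme.data_points (recorded_at, value)
        VALUES ('2004-02-03 11:42:59', 42);
    -- ...a few thousand more inserts...
    COMMIT;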
Kevin,

> Given the size of each single db, I don't
> know how I could put them all together under one roof, and if I were
> going to, what are the maximums that Postgres can handle for tables in
> one db? We track over 2 million new points of data (records) a day, and
> are moving to 5 million in the next year.

Use schemas, per Scott's suggestion. This will also ease the sharing of data between "databases".

> Second, what about the physical database size? What are the limits there?
> I have seen that it was 4 gig on Linux in a message from 2000, but what
> about now? Have we found ways past that?

The biggest database I've ever worked with was 175G, but I've seen reports of 2TB databases out there. We don't know what the limit is; so far it's always been hardware.

> Thanks in advance. I will give more detail; just looking for some open
> directions and maybe some kicks to fuel my thought in other areas.

Come back to this list for help tuning your system! You'll need it; you've got an unusual set-up.

--
Josh Berkus
Aglio Database Solutions
San Francisco
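To make the "sharing" point concrete, a rough sketch reusing the invented schema and table names from the earlier example:

    -- Tables in different schemas live in the same database, so they can be
    -- joined directly by qualifying their names.
    SELECT a.recorded_at, a.value AS acme_value, b.value AS beta_value
      FROM client_acme.data_points a
      JOIN client_beta.data_points b ON a.recorded_at = b.recorded_at;

    -- Or point unqualified names at one client via the search path.
    SET search_path TO client_acme, public;
    SELECT count(*) FROM data_points;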
On Tuesday 03 February 2004 16:42, Kevin Carpenter wrote:
>
> Thanks in advance. I will give more detail; just looking for some open
> directions and maybe some kicks to fuel my thought in other areas.

I've taken to doing a lot of my data manipulation (version conversions etc.) in PG even if the final resting place is MySQL. It's generally not too difficult to transfer data, but you will have problems with MySQL's more "relaxed attitude" to data types (things like all-zero timestamps). I tend to write a script to tidy the data before export, and repeatedly restore from backup until the script corrects all problems. Not sure how convenient that'll be with dozens of gigs of data. It might be practical to start with the smaller databases and let your script grow in capabilities before importing the larger ones.

--
Richard Huxton
Archonet Ltd
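As a concrete example of that kind of tidy-up, to be run on the MySQL side before dumping; the table and column names here are made up for illustration:

    -- MySQL happily stores all-zero timestamps that PostgreSQL will reject,
    -- so convert them to NULL before exporting.
    UPDATE events
       SET created_at = NULL
     WHERE created_at = '0000-00-00 00:00:00';

    -- Same idea for zero dates.
    UPDATE events
       SET event_date = NULL
     WHERE event_date = '0000-00-00';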
kevin@valenfor.com ("Kevin Carpenter") writes:

> I am doing a massive database conversion from MySQL to PostgreSQL for a
> company I am working for. It has a few quirks that I haven't been able
> to nail down from reading and searching through previous list info.
>
> For starters, I am moving roughly 50 separate databases, each of which
> represents one of our clients and is roughly 500 megs to 3 gigs in
> size. Currently we are using MySQL replication, so I am looking at
> Mammoth's replicator for this one. However, I have seen that it only
> allows one DB to be replicated at a time. Given the size of each single
> db, I don't know how I could put them all together under one roof, and
> if I were going to, what are the maximums that Postgres can handle for
> tables in one db? We track over 2 million new points of data (records)
> a day, and are moving to 5 million in the next year.

I'll be evasive about replication, because the answers are pretty painful :-(, but as for the rest of it, nothing about this sounds challenging.

There is room for debate here as to whether you should have:

 a) One postmaster and many database instances,
 2) One postmaster, one (or a few) database instances, and do the client "split" via schemas, or
 iii) Several postmasters, many database instances.

Replication will tend to work best with scenario 2), which minimizes the number of connections needed to manage replication; that's definitely a factor worth considering.

It can also be worthwhile to spread vastly differing kinds of activity across different backends so that they can have separate buffer caches. If all the activity is shared across one postmaster, it is all shared across one buffer cache, and there are pathological situations occasionally observed in practice where one process "thrashes" the shared cache, thereby injuring performance for all other processes using that backend. In such a case, it may be best to give the ill-behaved process its own database instance with a small cache that it can thrash without inconveniencing others.

Jan Wieck is working on some improvements to buffer management in 7.5 that may improve the situation, but that is certainly not something ready to deploy in production just yet.

> Second, what about the physical database size? What are the limits
> there? I have seen that it was 4 gig on Linux in a message from 2000,
> but what about now? Have we found ways past that?

There's NO problem with having enormous databases now. Each table is represented as one or more files (if you break a size barrier, often configured as 1GB, it creates an "extent" and extends into another file), and for there to be problems with this, they would have to be _really crippling_ OS problems.

--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"We come to bury DOS, not to praise it." -- Paul Vojta <vojta@math.berkeley.edu>, paraphrasing a quote of Shakespeare
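A small illustration of that on-disk layout, using the same invented table name as in the earlier sketches:

    -- Each table maps to a file named after pg_class.relfilenode, found under
    -- $PGDATA/base/<database-oid>/.  Once a file reaches the segment size
    -- (commonly 1GB), PostgreSQL continues in <relfilenode>.1, <relfilenode>.2, ...
    SELECT relname, relfilenode
      FROM pg_class
     WHERE relname = 'data_points';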
First, just wanted to say thank you all for the quick and helpful answers. With all the input I know I am on the right track.

With that in mind, I created a Perl script to do my migrations, working on the basis of moving from a db name to a schema name. I had done a lot of reading on converting around the mismatch of data types that MySQL likes to use. I must say it is VERY nice having an intelligent system that, say, won't let a date of '0000-00-00' be entered. Luckily I didn't have to deal with any enumerations.

So the conversion goes on. I will definitely be back and forth in here as I get the new queries written and start migrating all I can back into the pg backend, using plpgsql or C for the stored procedures where required.

The Mammoth replicator has been working well. I had tried pgsql-r and had limited success with it, and dbmirror was just taking too long, having to do 4 db transactions just to mirror one command. I have eserv, but was never really a Java kind of guy.

Alright then, back to my code. Again, thanks for the help and info.

Kevin