Thread: Database conversion woes...

Database conversion woes...

From: "Kevin Carpenter"
Date: Tue, 03 Feb 2004 11:42:59 -0500
Hello everyone,

I am doing a massive database conversion from MySQL to PostgreSQL for a
company I am working for.  This has a few quirks to it, and I haven't
been able to nail down the answers I need from reading and searching
through previous list info.

For starters, I am moving roughly 50 separate databases, each of which
represents one of our clients and is roughly 500 megs to 3 gigs in size.
Currently we are using MySQL replication, so I am looking at Mammoth's
replicator for this one.  However, I have seen that it only allows one
DB to be replicated at a time.  With the size of each single db, I don't
know how I could put them all together under one roof, and if I was
going to, what are the maximums that Postgres can handle for tables in
one db?  We track over 2 million new points of data (records) a day, and
are moving to 5 million in the next year.

Second, what about the physical database size?  What are the limits
there?  I have seen that it was 4 gigs on Linux from a 2000 message, but
what about now?  Have we found ways past that?

Thanks in advance; I will give more detail as needed - just looking for
some open directions and maybe some kicks to fuel my thinking in other
areas.

Thanks,

--
kevin@valenfor.com

Re: Database conversion woes...

From: Jeff
On Tue, 03 Feb 2004 11:42:59 -0500
"Kevin Carpenter" <kevin@valenfor.com> wrote:

> For starters, I am moving roughly 50 separate databases, each of which
> represents one of our clients and is roughly 500 megs to 3 gigs in
> size.  Currently we are using MySQL replication, so I am looking at
> Mammoth's replicator for this one.  However, I have seen that it only
> allows one DB to be replicated at a time.  With the size of each
> single db, I

I don't know too much about Mammoth's, but judging by how the others
work, you should be able to run a replicator for each db.  (Or hack up a
shell script to run the replicator for each db... either way, each db
will be replicated independently of the others.)
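
Something like this, say (untested sketch; "replicate-db" stands in for
whatever command actually starts a single Mammoth replication job, and
the db names are made up):

#!/bin/sh
# start one replication job per client database
for db in client_01 client_02 client_03; do
    replicate-db "$db" &   # hypothetical per-db replicator command
done
wait   # stay alive while the replicators run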

> don't know how I could put them all together under one roof, and if I
> was going to, what are the maximums that Postgres can handle for
> tables in one db?  We track over 2 million new points of data
> (records) a day, and are moving to 5 million in the next year.
>

From the docs:

Maximum size for a database            unlimited (4 TB databases exist)
Maximum size for a table               16 TB on all operating systems
Maximum size for a row                 1.6 TB
Maximum size for a field               1 GB
Maximum number of rows in a table      unlimited
Maximum number of columns in a table   250 - 1600 depending on column types
Maximum number of indexes on a table   unlimited

...

My largest PG db is 50GB.

My busiest PG db runs about 50 updates/deletes/inserts per second,
sustained throughout the day (it bursts up to 150 now and then), plus
about 40 selects per second.  And the machine it is running on is
typically 95% idle.  (Quad 2GHz Xeon)

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

Re: Database conversion woes...

From: "scott.marlowe"
On Tue, 3 Feb 2004, Kevin Carpenter wrote:

> Hello everyone,
>
> I am doing a massive database conversion from MySQL to PostgreSQL for a
> company I am working for.  This has a few quirks to it, and I haven't
> been able to nail down the answers I need from reading and searching
> through previous list info.
>
> For starters, I am moving roughly 50 separate databases, each of which
> represents one of our clients and is roughly 500 megs to 3 gigs in size.
> Currently we are using MySQL replication, so I am looking at Mammoth's
> replicator for this one.  However, I have seen that it only allows one
> DB to be replicated at a time.

Look into importing all those separate databases into separate schemas
in one PostgreSQL database.
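
For example, the shape of it (all names invented; schemas need
PostgreSQL 7.3 or later):

#!/bin/sh
# one schema per former MySQL database, all in a single PG database
psql -d clients <<'SQL'
CREATE SCHEMA client_0001;
-- illustrative table layout only
CREATE TABLE client_0001.data_points (
    id    serial PRIMARY KEY,
    taken timestamp NOT NULL,
    value numeric
);
-- unqualified names now resolve in this client's schema
SET search_path TO client_0001;
SQL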

> With the size of each single db, I don't
> know how I could put them all together under one roof,

There's no functional difference to PostgreSQL between 1 huge database
and 50 smaller ones that add up to the same size.

> and if I was
> going to, what are the maximums that Postgres can handle for tables in
> one db?

None.  Also see:

http://www.postgresql.org/docs/faqs/FAQ.html#4.5

> We track over 2 million new points of data (records) a day, and
> are moving to 5 million in the next year.

That's quite a bit.  Postgresql can handle it.

> Second, what about the physical database size?  What are the limits there?

None.

>  I have seen that it was 4 gigs on Linux from a 2000 message, but what
> about now?  Have we found ways past that?

It has never been 4 gigs.  It was once, a long time ago, 2 gigs for a
table, I believe.  That was fixed years ago.

> Thanks in advance; I will give more detail as needed - just looking for
> some open directions and maybe some kicks to fuel my thinking in other
> areas.

Import in bulk, either using COPY or by wrapping a few thousand INSERTs
inside BEGIN; END; pairs.
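
Roughly like this (table and file names are placeholders, carried over
from the schema sketch above):

#!/bin/sh
# fast path: \copy loads a whole client-side file in one command
psql -d clients -c "\copy client_0001.data_points from '/tmp/data_points.txt'"

# alternative: batch plain INSERTs inside one transaction, so the
# commit cost is paid once per batch instead of once per row
psql -d clients <<'SQL'
BEGIN;
INSERT INTO client_0001.data_points (taken, value) VALUES (now(), 1);
INSERT INTO client_0001.data_points (taken, value) VALUES (now(), 2);
-- ...a few thousand more...
COMMIT;
SQL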


Re: Database conversion woes...

From: Josh Berkus
Kevin,

>  With the size of each single db, I don't
> know how I could put them all together under one roof, and if I was
> going to, what are the maximums that Postgres can handle for tables in
> one db?  We track over 2 million new points of data (records) a day, and
> are moving to 5 million in the next year.

Use schemas per Scott's suggestion.  This will also ease the sharing of
data between "databases".

> Second, what about the physical database size?  What are the limits
> there?  I have seen that it was 4 gigs on Linux from a 2000 message, but
> what about now?  Have we found ways past that?

The biggest database I've ever worked with was 175G, but I've seen reports of
2TB databases out there.  We don't know what the limit is; so far it's always
been hardware.

> Thanks in advance; I will give more detail as needed - just looking for
> some open directions and maybe some kicks to fuel my thinking in other
> areas.

Come back to this list for help tuning your system!  You'll need it;
you've got an unusual set-up.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Database conversion woes...

From: Richard Huxton
On Tuesday 03 February 2004 16:42, Kevin Carpenter wrote:
>
> Thanks in advance; I will give more detail as needed - just looking for
> some open directions and maybe some kicks to fuel my thinking in other
> areas.

I've taken to doing a lot of my data manipulation (version conversions etc) in
PG even if the final resting place is MySQL.

It's generally not too difficult to transfer the data, but you will have
problems with MySQL's more "relaxed attitude" to data types (things like
all-zero timestamps).  I tend to write a script to tidy the data before
export, and repeatedly restore from backup until the script corrects all
problems.  Not sure how convenient that'll be with dozens of gigs of
data.  It might be practical to start with the smaller databases and let
your script grow in capabilities before importing the larger ones.
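
A trivial first cut at such a tidy pass, just to show the shape (the
file names and the two rules are only examples; a real script collects
rules as you trip over more problems):

#!/bin/sh
# scrub MySQL-isms out of a text dump before loading it into PG
sed -e "s/'0000-00-00 00:00:00'/NULL/g" \
    -e "s/'0000-00-00'/NULL/g" \
    mysql_dump.sql > clean_dump.sql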

--
  Richard Huxton
  Archonet Ltd

Re: Database conversion woes...

From: Christopher Browne
kevin@valenfor.com ("Kevin Carpenter") writes:
> I am doing a massive database conversion from MySQL to PostgreSQL for a
> company I am working for.  This has a few quirks to it, and I haven't
> been able to nail down the answers I need from reading and searching
> through previous list info.
>
> For starters, I am moving roughly 50 separate databases, each of
> which represents one of our clients and is roughly 500 megs to 3 gigs
> in size.  Currently we are using MySQL replication, so I am looking
> at Mammoth's replicator for this one.  However, I have seen that it
> only allows one DB to be replicated at a time.  With the size of each
> single db, I don't know how I could put them all together under one
> roof, and if I was going to, what are the maximums that Postgres can
> handle for tables in one db?  We track over 2 million new points of
> data (records) a day, and are moving to 5 million in the next year.

I'll be evasive about replication, because the answers are pretty
painful :-(, but as for the rest of it, nothing about this sounds
challenging.

There is room for debate here as to whether you should have:
 a) One postmaster and many database instances,
 b) One postmaster, one (or a few) database instances, and do the
    client 'split' via schemas, or
 c) Several postmasters, many database instances.

Replication will tend to work best with scenario b), which minimizes
the number of connections that are needed to manage replication;
that's definitely a factor worth considering.

It is also possible for it to be worthwhile to spread vastly differing
kinds of activity across different backends so that they can have
separate buffer caches.  If all the activity is shared across one
postmaster, that means it is all shared across one buffer cache, and
there are pathological situations that are occasionally observed in
practice where one process will be "thrashing" the shared cache,
thereby injuring performance for all other processes using that
backend.  In such a case, it may be best to give the "ill-behaved"
process its own database instance with a small cache that it can
thrash on without inconveniencing others.
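
The separate-cache arrangement of scenario c) amounts to independent
data directories on different ports, along these lines (paths and ports
are made up):

#!/bin/sh
# two postmasters on one box, each with its own shared buffer cache;
# point the cache-thrashing batch work at the second one
pg_ctl start -D /var/lib/pgsql/main  -o "-p 5432"
pg_ctl start -D /var/lib/pgsql/batch -o "-p 5433"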

Jan Wieck is working on some improvements for buffer management in 7.5
that may improve the situation vis-a-vis buffering, but that is
certainly not something ready to deploy in production just yet.

> Second, what about the physical database size?  What are the limits
> there?  I have seen that it was 4 gigs on Linux from a 2000 message,
> but what about now?  Have we found ways past that?

There's NO problem with having enormous databases now; each table is
represented as one or more files (if you break a size barrier, often
configured as 1GB, it creates an "extent" and extends into another
file), and for there to be problems with this, they would have to be
_really crippling_ OS problems.
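
You can see the segmenting on disk; a big table's heap file grows
numbered siblings, one per chunk (the OIDs and path here are purely
illustrative):

#!/bin/sh
# list the segment files backing one large table
cd "$PGDATA"/base/17001 && ls 18642*
# -> 18642  18642.1  18642.2
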
--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"We come to bury DOS, not to praise it."
-- Paul Vojta <vojta@math.berkeley.edu>, paraphrasing a quote of
Shakespeare

Re: Database conversion woes...

From: Kevin Carpenter
First, I just wanted to say thank you all for the quick and helpful
answers.  With all the input I know I am on the right track.  With that
in mind, I created a perl script to do my migrations, working from each
db name to a matching schema name.  I had done a lot of the reading on
converting around the mismatch of data types that MySQL likes to use.
I must say it is VERY nice having an intelligent system that, say,
won't let a date of '0000-00-00' be entered.  Luckily I didn't have to
deal with any enumerations.
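
For the archives, the rough shape of that migration in shell rather
than perl (every name here is invented, "tidy_dump.sh" stands in for
the data-cleanup filter, and it assumes the tables already exist in
each schema):

#!/bin/sh
# each former MySQL database becomes one schema in a single PG database
for db in client_01 client_02 client_03; do
    psql -d clients -c "CREATE SCHEMA $db"
    mysqldump --no-create-info "$db" \
        | ./tidy_dump.sh \
        | PGOPTIONS="-c search_path=$db" psql -d clients
done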

So the conversion goes on.  I will definitely be back and forth in here
as I get the new queries written and start migrating all I can back into
the PG backend, using plpgsql or C for the stored procedures where
required.  The Mammoth replicator has been working well.  I had tried
pgsql-r and had limited success with it, and dbmirror was just taking
too long, having to do 4 db transactions just to mirror one command.  I
have eserv but was never really a Java kind of guy.

Alright then - back to my code.  Again thanks for the help and info.

Kevin