Thread: No flamefest please, MySQL vs. PostgreSQL AGAIN

No flamefest please, MySQL vs. PostgreSQL AGAIN

From
timeless postgres
Date:
I've just accepted a position at a company where MySQL is the database
of choice. They are running ~sixty MySQL instances, and I am beginning
to learn MySQL. My general inclination, being as I've been using
PostgreSQL for a number of years, is to recommend switching to
PostgreSQL (and they are at a point where switching is a viable option),
but I don't want to start recommending if:
 1. MySQL really is best for them, or
 2. PostgreSQL is better, but don't know why.
For whatever reason, the various MSvs.PG comparisons I've found have
fallen quite short in accuracy, timeliness, or objectiveness. I've made
my own updateable version at:
  http://faemalia.org/wiki/view/Technical/PostgreSQLvsMySQL.
I wouldn't mind having calm, rational individuals go change that page
all around. It is flawed and incomplete, but is read/write.

I want to have a (flame-free as possible) discussion of relative merits
to both systems. I know I'm not the only one interested in these
comparisons.

The key issues I wonder about are:

1. Replication -- Supposedly Postgres-R was to be merged into 7.2?
   Did this happen? Is the pgsql.com offering still the only game
   in town? (pgsql.com was down at the time I wrote this)
2. Read/write backups -- Supposedly MySQL locks tables into read-only
   mode while backing them up?
3. Non-logged bulk inserts -- How much logging does COPY table FROM
   do? Is it comparable to a MySQL MyISAM table?
4. Point-in-time recovery -- Was this supposed to go into CVS sometime
   this month?

--
Tim Ellis
Senior Database Architect and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Tom Lane
Date:
timeless postgres <pvspam-postgres@hacklab.net> writes:
> 1. Replication -- Supposedly Postgres-R was to be merged into 7.2?
>    Did this happen? Is the pgsql.com offering still the only game
>    in town? (pgsql.com was down at the time I wrote this)

Postgres-R hasn't been merged, and I see no prospect that it will appear
in 7.4 either.  Possibly 7.5.  In the meantime, third-party solutions
are still your only option, and PostgreSQL Inc's one is probably the
best.

> 2. Read/write backups -- Supposedly MySQL locks tables into read-only
>    mode while backing them up?

You'd have to ask them.  Ours doesn't though.

> 3. Non-logged bulk inserts -- How much logging does COPY table FROM
>    do? Is it comparable to a MySQL MyISAM table?

I cannot imagine why you'd want non-logged inserts, unless you don't
actually care about your data.  It should be noted though that as of
7.3, operations on TEMP tables don't do WAL logging; perhaps that
would be of use to you.

> 4. Point-in-time recovery -- Was this supposed to go into CVS sometime
>    this month?

It might be there in 7.4 ... it ain't there today ...

            regards, tom lane


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Robert Treat
Date:
On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
> timeless postgres <pvspam-postgres@hacklab.net> writes:
> > 1. Replication -- Supposedly Postgres-R was to be merged into 7.2?
> >    Did this happen? Is the pgsql.com offering still the only game
> >    in town? (pgsql.com was down at the time I wrote this)
>
> Postgres-R hasn't been merged, and I see no prospect that it will appear
> in 7.4 either.  Possibly 7.5.  In the meantime, third-party solutions
> are still your only option, and PostgreSQL Inc's one is probably the
> best.

I wouldn't say they are your only options. there is the rserv code in
contrib which I've seen people post they have gotten working. There is
also the usogres stuff that I have heard of a few people using. While
none of these are considered "ready for prime time" by the core group, I
don't think they should be ignored. If more people tried using them and
submitted some patches, we might get a solid replication solution that
much sooner.


I also feel I should point out that in a lot of the cases I have seen
mysql replication used because they couldn't get a single mysql instance
to scale up enough.  Given that postgresql scales so well, it cuts down
on the need to have a replication solution, which is probably part of
the reason why we have gone so long without one.

Robert Treat


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Naomi Walker
Date:
>.
>
>I also feel I should point out that in a lot of the cases I have seen
>mysql replication used because they couldn't get a single mysql instance
>to scale up enough.  Given that postgresql scales so well, it cuts down
>on the need to have a replication solution, which is probably part of
>the reason why we have gone so long without one.

We would be interested in replication, so reporting could be done against a
different server than production.


>Robert Treat
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo@postgresql.org so that your
>message can get through to the mailing list cleanly


-------------------------------------------------------------------------------------------------------------------------
Naomi Walker                         Chief Information Officer
                                               Eldorado Computing, Inc.
nwalker@eldocomp.com           602-604-3100

-------------------------------------------------------------------------------------------------------------------------
Don't throw away the old bucket until you know whether the new one holds water.
Swedish Proverb

-------------------------------------------------------------------------------------------------------------------------



Eldorado Test Disclaimer...please ignore.


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
"David F. Skoll"
Date:
On Mon, 12 May 2003, Naomi Walker wrote:

> We would be interested in replication, so reporting could be done against a
> different server than production.

And I'm interested in replication for failover purposes.  Automatic
hot-failover isn't really required for my application, but a "warm"
failover that can have a mostly-up-to-date data set and be activated
within a few minutes would be very nice.

--
David.


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
timeless postgres
Date:
> > 3. Non-logged bulk inserts -- How much logging does COPY table FROM
> >    do? Is it comparable to a MySQL MyISAM table?
> I cannot imagine why you'd want non-logged inserts, unless you don't
> actually care about your data.

More or less, we don't. If we try to insert 1,000 entries, and our table
has now 1,000 entries more, then we're satisfied. If our database goes
down, we simply need to re-load. It's a data warehouse, after all, and
its source is the L bit of the ETL (extract/transform/load) process. All
the data we truly care about transactions for are on the OLTP databases.

To wit: atomicity? We don't care. Consistency? We don't care. Isolation?
We don't care. Durability? We care, but if it goes dead-mode before
buffers get flushed to disk, we've got all the data ready to load again.

Maybe I don't understand how to ask the question. I want to know how to
insert (say) 10M rows into a table quickly. I mean... VERY quickly.

Obviously the following 10M transactions are going to be slow:

        insert into tab values (1);
        insert into tab values (2);
        insert into tab values (3);
         . . .
        insert into tab values (10000000);

Would it be faster if I put a single transaction around that? Would it
be faster to do the following?

        copy tab (col) from stdin;
        1
        2
        3
         . . .
        10000000
        \.

> It should be noted though that as of
> 7.3, operations on TEMP tables don't do WAL logging; perhaps that
> would be of use to you.

It does sound useful, on new tables, to load into a temp table, then
rename the temp table as permanent table.

> [point-in-time recovery] might be there in 7.4 ... it ain't there today

Would it be better to hang out on hackers to find out about this?
Looking at archives, I see messages all the way back to July of last
year talking about it. I only recall seeing one message (from Bruce)
about it on this list.

--
Tim Ellis
Senior Database Architect and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Oliver Elphick
Date:
On Tue, 2003-05-13 at 00:02, timeless postgres wrote:
...
> Maybe I don't understand how to ask the question. I want to know how to
> insert (say) 10M rows into a table quickly. I mean... VERY quickly.
>
> Obviously the following 10M transactions are going to be slow:
>
>         insert into tab values (1);
>         insert into tab values (2);
>         insert into tab values (3);
>          . . .
>         insert into tab values (10000000);
>
> Would it be faster if I put a single transaction around that?

10M transactions are going to be slow. 10M inserts in one transaction or
in a few transactions will be a lot faster.

> Would it
> be faster to do the following?
>
>         copy tab (col) from stdin;
>         1
>         2
>         3
>          . . .
>         10000000
>         \.

Yes, that will be the fastest of all.

> > It should be noted though that as of
> > 7.3, operations on TEMP tables don't do WAL logging; perhaps that
> > would be of use to you.
>
> It does sound useful, on new tables, to load into a temp table, then
> rename the temp table as permanent table.

I don't think that is possible.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "Watch ye and pray, lest ye enter into temptation. The
      spirit truly is ready, but the flesh is weak."
                           Mark 14:38


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Andrew Sullivan
Date:
On Mon, May 12, 2003 at 02:21:21PM -0400, Robert Treat wrote:
> On Mon, 2003-05-12 at 10:32, Tom Lane wrote:

> > in 7.4 either.  Possibly 7.5.  In the meantime, third-party solutions
> > are still your only option, and PostgreSQL Inc's one is probably the
> > best.
>
> I wouldn't say they are your only options. there is the rserv code in
> contrib which I've seen people post they have gotten working. There is

I think what Tom was saying is that the PostgreSQL Inc version is
production-ready-ish right now.  It's sort of expensive, and it's a
pain in the neck to administer (and has some real annoying behaviour
under a couple of conditions), but if you're dealing with any volume,
it's what you should use today.

That isn't to say you should use it forever.  There was some mighty
interesting work being done on dbmirror (several related questions
showed up on -hackers), and if you want to be sure that you replay
_every_ transaction to your slave, I gather it's the only way to go.

The contrib/rserv code does indeed work for some people, and it is
useful.  It is nowhere close to handling large volumes, but for fewer
than a few thousand writes an hour, it seems to be good.

I haven't tried the other systems that are out there.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Andrew Sullivan
Date:
On Mon, May 12, 2003 at 04:02:42PM -0700, timeless postgres wrote:
> Maybe I don't understand how to ask the question. I want to know how to
> insert (say) 10M rows into a table quickly. I mean... VERY quickly.

You probably want a different word from "insert", since it has a
specific meaning.

> Would it be faster if I put a single transaction around that? Would it
> be faster to do the following?

Yes, you want COPY.  It's the fastest.

> > [point-in-time recovery] might be there in 7.4 ... it ain't there today
>
> Would it be better to hang out on hackers to find out about this?

Probably.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Murthy Kambhampaty
Date:
>-----Original Message-----
>From: David F. Skoll [mailto:dfs@roaringpenguin.com]
>Sent: Monday, May 12, 2003 16:08
>To: pgsql-admin@postgresql.org
>Subject: Re: [ADMIN] No flamefest please, MySQL vs. PostgreSQL AGAIN
>
>
>On Mon, 12 May 2003, Naomi Walker wrote:
>
>> We would be interested in replication, so reporting could be
>done against a
>> different server than production.
>
>And I'm interested in replication for failover purposes.  Automatic
>hot-failover isn't really required for my application, but a "warm"
>failover that can have a mostly-up-to-date data set and be activated
>within a few minutes would be very nice.

I think you'll find the discussion on "LVM snapshots"
http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=LVM+snapshots&q=b
relevant. If you don't need the log roll-forward coming with postgresql
PITR, and your reporting function is run infrequently, you can get "really
lazy replication" with rsync, LVM (or EVMS or hardware) snapshots, and a
journaling filesystem (an example script is included in the discussion).
This strategy has eliminated a big problem we had with postgresql dumps
taking too long on large datasets, but it also facilitates the applications
David and Naomi are interested in, under the right circumstances.

    Murthy

(With XFS as the underlying filesystem, snapshot creation sometimes gets
stuck in D state, and so does the $PGDATA filesystem. I have an updated
example script with a workaround for this, which I will post in a day or
two; an updated version of XFS with a permanent fix is in testing.)


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Will LaShell
Date:
On Tue, 2003-05-13 at 05:08, Andrew Sullivan wrote:
> On Mon, May 12, 2003 at 02:21:21PM -0400, Robert Treat wrote:
> > On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
>
> > > in 7.4 either.  Possibly 7.5.  In the meantime, third-party solutions
> > > are still your only option, and PostgreSQL Inc's one is probably the
> > > best.
> >
> > I wouldn't say they are your only options. there is the rserv code in
> > contrib which I've seen people post they have gotten working. There is
>
> I think what Tom was saying is that the PostgreSQL Inc version is
> production-ready-ish right now.  It's sort of expensive, and it's a
> pain in the neck to administer (and has some real annoying behaviour
> under a couple of conditions), but if you're dealing with any volume,
> it's what you should use today.
>
> That isn't to say you should use it forever.  There was some mighty
> interesting work being done on dbmirror (several related questions
> showed up on -hackers), and if you want to be sure that you replay
> _every_ transaction to your slave, I gather it's the only way to go.
>
> The contrib/rserv code does indeed work for some people, and it is
> useful.  It is nowhere close to handling large volumes, but for fewer
> than a few thousand writes an hour, it seems to be good.

I'd like to put my two cents in on this one.  We use rserv on our
cluster here, and we do a few hundred writes every 2 minutes. The
biggest thing that will cause slowdowns with rserv is not indexing the
replication id field.  If there is an index for that it should work
fine.

I do have some rudimentary documentation on how we did it all, and I
suppose I should really get that sent in.

Sincerely,

Will


> I haven't tried the other systems that are out there.
>
> A
>
> --
> ----
> Andrew Sullivan                         204-4141 Yonge Street
> Liberty RMS                           Toronto, Ontario Canada
> <andrew@libertyrms.info>                              M2P 2A8
>                                          +1 416 646 3304 x110


Attachment

Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Michael A Nachbaur
Date:
On Tuesday 13 May 2003 11:36 am, Will LaShell wrote:
> On Tue, 2003-05-13 at 05:08, Andrew Sullivan wrote:
> > On Mon, May 12, 2003 at 02:21:21PM -0400, Robert Treat wrote:
> > > On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
> > > > in 7.4 either.  Possibly 7.5.  In the meantime, third-party solutions
> > > > are still your only option, and PostgreSQL Inc's one is probably the
> > > > best.
> > > I wouldn't say they are your only options. there is the rserv code in
> > > contrib which I've seen people post they have gotten working. There is
> > The contrib/rserv code does indeed work for some people, and it is
> > useful.  It is nowhere close to handling large volumes, but for fewer
> > than a few thousand writes an hour, it seems to be good.
>
> I'd like to put my two cents in on this one.  We use rserv on our
> cluster here, and we do a few hundred writes every 2 minutes. The
> biggest thing that will cause slowdowns with rserv is not indexing the
> replication id field.  If there is an index for that it should work
> fine.

I tried rserv with a database that has over 1000 inserts per minute, and it
would just sit there for days at the "Preparing snapshot" (on a
Dual-Xeon/2GHz).  I hadn't tried indexing the id column though.

> I do have some rudimentary documentation on how we did it all, and I
> suppose I should really get that sent in.

Yes, Please, the documentation out there could definitely be improved with
some other case studies.


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Will LaShell
Date:
On Tue, 2003-05-13 at 12:32, Michael A Nachbaur wrote:
> On Tuesday 13 May 2003 11:36 am, Will LaShell wrote:
> > On Tue, 2003-05-13 at 05:08, Andrew Sullivan wrote:
> > > On Mon, May 12, 2003 at 02:21:21PM -0400, Robert Treat wrote:
> > > > On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
> > > > > in 7.4 either.  Possibly 7.5.  In the meantime, third-party solutions
> > > > > are still your only option, and PostgreSQL Inc's one is probably the
> > > > > best.
> > > > I wouldn't say they are your only options. there is the rserv code in
> > > > contrib which I've seen people post they have gotten working. There is
> > > The contrib/rserv code does indeed work for some people, and it is
> > > useful.  It is nowhere close to handling large volumes, but for fewer
> > > than a few thousand writes an hour, it seems to be good.
> >
> > I'd like to put my two cents in on this one.  We use rserv on our
> > cluster here, and we do a few hundred writes every 2 minutes. The
> > biggest thing that will cause slowdowns with rserv is not indexing the
> > replication id field.  If there is an index for that it should work
> > fine.
>
> I tried rserv with a database that has over 1000 inserts per minute, and it
> would just sit there for days at the "Preparing snapshot" (on a
> Dual-Xeon/2GHz).  I hadn't tried indexing the id column though.

heh, yea,  we had a similar problem.  you should index the replication
id, and make sure the _rserv_log_ table has appropriate indexs on it.
You can enable the logging functions of rserv in one of the perl
modules. Look for DEBUG if I recall correctly. This will happily spam
you with information on what it is doing.

The other thing that should be remembered ( your email didn't mention it
which is why I bring it up ) is that when doing replication your master
server can potentially have double the read access on it during a cycle.
If you don't have a strong disk subsystem you'll send your server and
postgresql into a death spiral.

> > I do have some rudimentary documentation on how we did it all, and I
> > suppose I should really get that sent in.
>
> Yes, Please, the documentation out there could definitely be improved with
> some other case studies.

Indeed!  One of the important things we learned when we set up rserv is
that bigint's suck when it comes to indexes! Heh, after we modified the
rserv code to cast appropriately the world worked spectacularly.

Sincerely,

Will


Attachment

Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
Andrew Sullivan
Date:
On Tue, May 13, 2003 at 12:32:14PM -0700, Michael A Nachbaur wrote:
>
> I tried rserv with a database that has over 1000 inserts per minute, and it
> would just sit there for days at the "Preparing snapshot" (on a
> Dual-Xeon/2GHz).  I hadn't tried indexing the id column though.

You not only need an index on it; you need a unique index on it.  (If
you have duplicates, replication breaks.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: No flamefest please, MySQL vs. PostgreSQL AGAIN

From
timeless postgres
Date:
> > I do have some rudimentary documentation on how we did it all, and I
> > suppose I should really get that sent in.
>
> Yes, Please, the documentation out there could definitely be improved with
> some other case studies.

Will, I'm with Michael. Yes, please. I'd be very keen on reading how you
got rserv setup/working. With your permission, I'd post said
documentation on my database knowledge repository wiki (to which anyone
can get write access).

--
Tim Ellis
Senior Database Architect and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero