Thread: No flamefest please, MySQL vs. PostgreSQL AGAIN
I've just accepted a position at a company where MySQL is the database of
choice. They are running ~sixty MySQL instances, and I am beginning to learn
MySQL. Since I've been using PostgreSQL for a number of years, my general
inclination is to recommend switching to PostgreSQL (and they are at a point
where switching is a viable option), but I don't want to start recommending it
if:

1. MySQL really is best for them, or
2. PostgreSQL is better, but I don't know why.

For whatever reason, the various MySQL-vs-PostgreSQL comparisons I've found
have fallen quite short in accuracy, timeliness, or objectivity. I've made my
own updateable version at:

  http://faemalia.org/wiki/view/Technical/PostgreSQLvsMySQL

I wouldn't mind having calm, rational individuals go change that page all
around. It is flawed and incomplete, but it is read/write.

I want to have a discussion (as flame-free as possible) of the relative merits
of both systems. I know I'm not the only one interested in these comparisons.
The key issues I wonder about are:

1. Replication -- Supposedly Postgres-R was to be merged into 7.2? Did this
   happen? Is the pgsql.com offering still the only game in town? (pgsql.com
   was down at the time I wrote this.)

2. Read/write backups -- Supposedly MySQL locks tables into read-only mode
   while backing them up?

3. Non-logged bulk inserts -- How much logging does COPY table FROM do? Is it
   comparable to a MySQL MyISAM table?

4. Point-in-time recovery -- Was this supposed to go into CVS sometime this
   month?

--
Tim Ellis
Senior Database Architect
and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero
timeless postgres <pvspam-postgres@hacklab.net> writes:
> 1. Replication -- Supposedly Postgres-R was to be merged into 7.2? Did this
>    happen? Is the pgsql.com offering still the only game in town? (pgsql.com
>    was down at the time I wrote this.)

Postgres-R hasn't been merged, and I see no prospect that it will appear in
7.4 either. Possibly 7.5. In the meantime, third-party solutions are still
your only option, and PostgreSQL Inc's one is probably the best.

> 2. Read/write backups -- Supposedly MySQL locks tables into read-only mode
>    while backing them up?

You'd have to ask them. Ours doesn't, though.

> 3. Non-logged bulk inserts -- How much logging does COPY table FROM do? Is
>    it comparable to a MySQL MyISAM table?

I cannot imagine why you'd want non-logged inserts, unless you don't actually
care about your data. It should be noted, though, that as of 7.3 operations
on TEMP tables don't do WAL logging; perhaps that would be of use to you.

> 4. Point-in-time recovery -- Was this supposed to go into CVS sometime this
>    month?

It might be there in 7.4 ... it ain't there today.

			regards, tom lane
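A minimal sketch of the TEMP-table approach mentioned above, for reference
(the table and file names here are hypothetical, and only the operations on
the TEMP table itself skip WAL; moving the rows into a permanent table
afterwards is logged as usual):

  begin;
  create temp table staging (id integer);        -- 7.3+: TEMP table operations are not WAL-logged
  copy staging from '/tmp/bulk.dat';             -- bulk load into the TEMP table, no WAL here
  insert into real_table select * from staging;  -- this step is WAL-logged again
  commit;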
On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
> timeless postgres <pvspam-postgres@hacklab.net> writes:
> > 1. Replication -- Supposedly Postgres-R was to be merged into 7.2? Did this
> >    happen? Is the pgsql.com offering still the only game in town? (pgsql.com
> >    was down at the time I wrote this.)
>
> Postgres-R hasn't been merged, and I see no prospect that it will appear in
> 7.4 either. Possibly 7.5. In the meantime, third-party solutions are still
> your only option, and PostgreSQL Inc's one is probably the best.

I wouldn't say they are your only options. There is the rserv code in contrib,
which I've seen people report they have gotten working. There is also the
usogres project, which I have heard of a few people using. While none of these
are considered "ready for prime time" by the core group, I don't think they
should be ignored. If more people tried using them and submitted some patches,
we might get a solid replication solution that much sooner.

I also feel I should point out that in a lot of the cases I have seen, MySQL
replication is used because they couldn't get a single MySQL instance to scale
up enough. Given that PostgreSQL scales so well, it cuts down on the need for
a replication solution, which is probably part of the reason why we have gone
so long without one.

Robert Treat
>I also feel I should point out that in a lot of the cases I have seen,
>MySQL replication is used because they couldn't get a single MySQL instance
>to scale up enough. Given that PostgreSQL scales so well, it cuts down on
>the need for a replication solution, which is probably part of the reason
>why we have gone so long without one.

We would be interested in replication, so reporting could be done against a
different server than production.

>Robert Treat

----------------------------------------------------------------------
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
nwalker@eldocomp.com
602-604-3100
----------------------------------------------------------------------
Don't throw away the old bucket until you know whether the new one holds
water.  -- Swedish Proverb
On Mon, 12 May 2003, Naomi Walker wrote:

> We would be interested in replication, so reporting could be done against a
> different server than production.

And I'm interested in replication for failover purposes. Automatic
hot-failover isn't really required for my application, but a "warm" failover
that can have a mostly-up-to-date data set and be activated within a few
minutes would be very nice.

--
David.
> > 3. Non-logged bulk inserts -- How much logging does COPY table FROM do? Is
> >    it comparable to a MySQL MyISAM table?
>
> I cannot imagine why you'd want non-logged inserts, unless you don't
> actually care about your data.

More or less, we don't. If we try to insert 1,000 entries, and our table now
has 1,000 more entries, then we're satisfied. If our database goes down, we
simply need to re-load. It's a data warehouse, after all, and its source is
the L bit of the ETL (extract/transform/load) process. All the data we truly
care about transactions for is on the OLTP databases.

To wit: atomicity? We don't care. Consistency? We don't care. Isolation? We
don't care. Durability? We care, but if it goes dead-mode before the buffers
get flushed to disk, we've got all the data ready to load again.

Maybe I don't understand how to ask the question. I want to know how to insert
(say) 10M rows into a table quickly. I mean... VERY quickly.

Obviously the following 10M transactions are going to be slow:

  insert into tab values (1);
  insert into tab values (2);
  insert into tab values (3);
  . . .
  insert into tab values (10000000);

Would it be faster if I put a single transaction around that? Would it be
faster to do the following?

  copy tab (col) from stdin;
  1
  2
  3
  . . .
  10000000
  \.

> It should be noted, though, that as of 7.3 operations on TEMP tables don't
> do WAL logging; perhaps that would be of use to you.

It does sound useful, on new tables, to load into a temp table, then rename
the temp table as a permanent table.

> [point-in-time recovery] might be there in 7.4 ... it ain't there today

Would it be better to hang out on -hackers to find out about this? Looking at
the archives, I see messages all the way back to July of last year talking
about it. I only recall seeing one message (from Bruce) about it on this list.

--
Tim Ellis
Senior Database Architect
and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero
On Tue, 2003-05-13 at 00:02, timeless postgres wrote:
...
> Maybe I don't understand how to ask the question. I want to know how to
> insert (say) 10M rows into a table quickly. I mean... VERY quickly.
>
> Obviously the following 10M transactions are going to be slow:
>
>   insert into tab values (1);
>   insert into tab values (2);
>   insert into tab values (3);
>   . . .
>   insert into tab values (10000000);
>
> Would it be faster if I put a single transaction around that?

10M separate transactions are going to be slow. 10M inserts in one
transaction, or in a few transactions, will be a lot faster.

> Would it be faster to do the following?
>
>   copy tab (col) from stdin;
>   1
>   2
>   3
>   . . .
>   10000000
>   \.

Yes, that will be the fastest of all.

> > It should be noted, though, that as of 7.3 operations on TEMP tables
> > don't do WAL logging; perhaps that would be of use to you.
>
> It does sound useful, on new tables, to load into a temp table, then
> rename the temp table as a permanent table.

I don't think that is possible.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
     ========================================
"Watch ye and pray, lest ye enter into temptation. The spirit truly is
 ready, but the flesh is weak."                          Mark 14:38
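For reference, a minimal sketch of the two faster variants discussed above
(the table name "tab" comes from the example in the question; the data file
name is hypothetical):

  -- one transaction around the inserts: far faster than 10M autocommitted statements
  begin;
  insert into tab values (1);
  insert into tab values (2);
  -- ... and so on ...
  commit;

  -- fastest of all: a single COPY; from psql, the client-side form is
  -- \copy tab from data.txt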
On Mon, May 12, 2003 at 02:21:21PM -0400, Robert Treat wrote:
> On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
> > in 7.4 either. Possibly 7.5. In the meantime, third-party solutions are
> > still your only option, and PostgreSQL Inc's one is probably the best.
>
> I wouldn't say they are your only options. There is the rserv code in
> contrib, which I've seen people report they have gotten working. There is

I think what Tom was saying is that the PostgreSQL Inc version is
production-ready-ish right now. It's sort of expensive, and it's a pain in
the neck to administer (and has some really annoying behaviour under a couple
of conditions), but if you're dealing with any volume, it's what you should
use today.

That isn't to say you should use it forever. There was some mighty
interesting work being done on dbmirror (several related questions showed up
on -hackers), and if you want to be sure that you replay _every_ transaction
to your slave, I gather it's the only way to go.

The contrib/rserv code does indeed work for some people, and it is useful. It
is nowhere close to handling large volumes, but for fewer than a few thousand
writes an hour, it seems to be good.

I haven't tried the other systems that are out there.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
On Mon, May 12, 2003 at 04:02:42PM -0700, timeless postgres wrote:
> Maybe I don't understand how to ask the question. I want to know how to
> insert (say) 10M rows into a table quickly. I mean... VERY quickly.

You probably want a different word from "insert", since it has a specific
meaning.

> Would it be faster if I put a single transaction around that? Would it be
> faster to do the following?

Yes, you want COPY. It's the fastest.

> > [point-in-time recovery] might be there in 7.4 ... it ain't there today
>
> Would it be better to hang out on -hackers to find out about this?

Probably.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
>-----Original Message-----
>From: David F. Skoll [mailto:dfs@roaringpenguin.com]
>Sent: Monday, May 12, 2003 16:08
>To: pgsql-admin@postgresql.org
>Subject: Re: [ADMIN] No flamefest please, MySQL vs. PostgreSQL AGAIN
>
>On Mon, 12 May 2003, Naomi Walker wrote:
>
>> We would be interested in replication, so reporting could be done against a
>> different server than production.
>
>And I'm interested in replication for failover purposes. Automatic
>hot-failover isn't really required for my application, but a "warm" failover
>that can have a mostly-up-to-date data set and be activated within a few
>minutes would be very nice.

I think you'll find the discussion on "LVM snapshots"

  http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=LVM+snapshots&q=b

relevant. If you don't need the log roll-forward that comes with PostgreSQL
PITR, and your reporting function is run infrequently, you can get "really
lazy replication" with rsync, LVM (or EVMS or hardware) snapshots, and a
journaling filesystem (an example script is included in that discussion).

This strategy has eliminated a big problem we had with PostgreSQL dumps taking
too long on large datasets, but it also facilitates the applications David and
Naomi are interested in, under the right circumstances.

Murthy

(With XFS as the underlying filesystem, snapshot creation sometimes gets stuck
in D state, and so does the $PGDATA filesystem. I have an updated example
script with a workaround for this, which I will post in a day or two; an
updated version of XFS with a permanent fix is in testing.)
On Tue, 2003-05-13 at 05:08, Andrew Sullivan wrote:
> On Mon, May 12, 2003 at 02:21:21PM -0400, Robert Treat wrote:
> > On Mon, 2003-05-12 at 10:32, Tom Lane wrote:
> > > in 7.4 either. Possibly 7.5. In the meantime, third-party solutions are
> > > still your only option, and PostgreSQL Inc's one is probably the best.
> >
> > I wouldn't say they are your only options. There is the rserv code in
> > contrib, which I've seen people report they have gotten working. There is
>
> The contrib/rserv code does indeed work for some people, and it is
> useful. It is nowhere close to handling large volumes, but for fewer
> than a few thousand writes an hour, it seems to be good.

I'd like to put my two cents in on this one. We use rserv on our cluster
here, and we do a few hundred writes every 2 minutes. The biggest thing that
will cause slowdowns with rserv is not indexing the replication id field. If
there is an index for that, it should work fine.

I do have some rudimentary documentation on how we did it all, and I suppose
I should really get that sent in.

Sincerely,

Will
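For reference, a minimal sketch of the index Will describes (the table name
"orders" and key column "id" are placeholders, not rserv's actual schema; use
whatever key column your rserv configuration tracks):

  -- hypothetical replicated table "orders" whose replication key column is "id"
  create index orders_id_idx on orders (id);

  -- verify the index is actually chosen for lookups on that column, e.g.:
  explain select * from orders where id = 42;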
On Tuesday 13 May 2003 11:36 am, Will LaShell wrote:
> I'd like to put my two cents in on this one. We use rserv on our cluster
> here, and we do a few hundred writes every 2 minutes. The biggest thing
> that will cause slowdowns with rserv is not indexing the replication id
> field. If there is an index for that, it should work fine.

I tried rserv with a database that has over 1000 inserts per minute, and it
would just sit there for days at "Preparing snapshot" (on a Dual-Xeon/2GHz).
I hadn't tried indexing the id column, though.

> I do have some rudimentary documentation on how we did it all, and I
> suppose I should really get that sent in.

Yes, please. The documentation out there could definitely be improved with
some other case studies.
On Tue, 2003-05-13 at 12:32, Michael A Nachbaur wrote:
> I tried rserv with a database that has over 1000 inserts per minute, and it
> would just sit there for days at "Preparing snapshot" (on a Dual-Xeon/2GHz).
> I hadn't tried indexing the id column, though.

Heh, yeah, we had a similar problem. You should index the replication id, and
make sure the _rserv_log_ table has appropriate indexes on it as well.

You can enable the logging functions of rserv in one of the Perl modules;
look for DEBUG, if I recall correctly. This will happily spam you with
information on what it is doing.

The other thing that should be remembered (your email didn't mention it,
which is why I bring it up) is that when doing replication, your master
server can potentially see double the read load during a replication cycle.
If you don't have a strong disk subsystem, you'll send your server and
PostgreSQL into a death spiral.

> > I do have some rudimentary documentation on how we did it all, and I
> > suppose I should really get that sent in.
>
> Yes, please. The documentation out there could definitely be improved with
> some other case studies.

Indeed! One of the important things we learned when we set up rserv is that
bigints suck when it comes to indexes! Heh, after we modified the rserv code
to cast appropriately, the world worked spectacularly.

Sincerely,

Will
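For reference, two minimal sketches of the points above (the column name
"logid" is a guess rather than rserv's actual schema; check contrib/rserv for
the real column names):

  -- index the rserv bookkeeping table on its key column
  create index _rserv_log_logid_idx on _rserv_log_ (logid);

  -- the bigint quirk: the 7.x planner won't use an index on a bigint column
  -- when the comparison value is an untyped integer literal, so cast explicitly
  select * from _rserv_log_ where logid = 12345;            -- may fall back to a seq scan
  select * from _rserv_log_ where logid = 12345::bigint;    -- can use the index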
On Tue, May 13, 2003 at 12:32:14PM -0700, Michael A Nachbaur wrote:
> I tried rserv with a database that has over 1000 inserts per minute, and it
> would just sit there for days at "Preparing snapshot" (on a Dual-Xeon/2GHz).
> I hadn't tried indexing the id column, though.

You not only need an index on it; you need a unique index on it. (If you have
duplicates, replication breaks.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
> > I do have some rudimentary documentation on how we did it all, and I
> > suppose I should really get that sent in.
>
> Yes, please. The documentation out there could definitely be improved with
> some other case studies.

Will, I'm with Michael. Yes, please. I'd be very keen on reading how you got
rserv set up and working. With your permission, I'd post said documentation
on my database knowledge repository wiki (to which anyone can get write
access).

--
Tim Ellis
Senior Database Architect
and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero