Thread: More grist for the PostgreSQL vs MySQL mill
I have a MySQL table on our public website that is populated from a similar table on our internal site, which runs PostgreSQL.
Recently I was trying to enhance one of our website queries and ran across an interesting phenomenon:
The following query runs very quickly in both PostgreSQL (8.1.3) and MySQL (5.0.21)
select plr_rated_memid from tnmt_plr where plr_eventid in ('200607163681');
(tnmt_plr has around 3.5 million rows in it, and plr_eventid is an indexed field.)
Both databases return the correct number of rows (74) in less than a second.
However, when I then try to use that query as a subquery to select rows from another table, things change:
select count(*) from memmast where memid in (select plr_rated_memid from tnmt_plr where plr_eventid in ('200607163681');
(memid is also an indexed field in memmast, a table which has about 650,000 rows in it.)
This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES on MySQL!
--
Mike Nolan
Recently I was trying to enhance one of our website queries and ran across an interesting phenomenon:
The following query runs very quickly in both PostgreSQL (8.1.3) and MySQL (5.0.21)
select plr_rated_memid from tnmt_plr where plr_eventid in ('200607163681');
(tnmt_plr has around 3.5 million rows in it, and plr_eventid is an indexed field.)
Both databases return the correct number of rows (74) in less than a second.
However, when I then try to use that query as a subquery to select rows from another table, things change:
select count(*) from memmast where memid in (select plr_rated_memid from tnmt_plr where plr_eventid in ('200607163681');
(memid is also an indexed field in memmast, a table which has about 650,000 rows in it.)
This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES on MySQL!
--
Mike Nolan
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/20/07 16:52, Michael Nolan wrote: > I have a MySQL table on our public website that is populated from a similar > table on our internal site, which runs PostgreSQL. > > Recently I was trying to enhance one of our website queries and ran across > an interesting phenomenon: > > The following query runs very quickly in both PostgreSQL (8.1.3) and > MySQL ( > 5.0.21) > > select plr_rated_memid from tnmt_plr where plr_eventid in ('200607163681'); Is this query created by an application? I.e, there might be a list of PLR_EVENTIDs? If so, I understand why it is like it is. Otherwise, why not make it a direct equality? > (tnmt_plr has around 3.5 million rows in it, and plr_eventid is an indexed > field.) > > Both databases return the correct number of rows (74) in less than a > second. > > However, when I then try to use that query as a subquery to select rows > from > another table, things change: > > select count(*) from memmast where memid in (select plr_rated_memid from > tnmt_plr where plr_eventid in ('200607163681'); > > (memid is also an indexed field in memmast, a table which has about 650,000 > rows in it.) > > This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES > on MySQL! > -- > Mike Nolan > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD4DBQFFsp7iS9HxQb37XmcRAkx6AJdnxYd9yxYLHRBf1jCu51y+9WDQAJ4qfvGD Axhu0LZJXH9HgHDDazFWIA== =hcs5 -----END PGP SIGNATURE-----
This is a generated query in a web form where there could be a series of 12 digit event IDs input by the user, hence using the 'in' form. This is slightly lazy programming on my part, but it makes little difference in either PostgreSQL or MySQL whether I use = or 'in'.
--
Mike Nolan
--
Mike Nolan
On 1/20/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
Is this query created by an application? I.e, there might be a list
of PLR_EVENTIDs?
If so, I understand why it is like it is. Otherwise, why not make
it a direct equality?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 It's exactly what I would do... On 01/20/07 17:09, Michael Nolan wrote: > This is a generated query in a web form where there could be a series of 12 > digit event IDs input by the user, hence using the 'in' form. This is > slightly lazy programming on my part, but it makes little difference in > either PostgreSQL or MySQL whether I use = or 'in'. > -- > Mike Nolan > > On 1/20/07, Ron Johnson <ron.l.johnson@cox.net> wrote: >> >> >> >> Is this query created by an application? I.e, there might be a list >> of PLR_EVENTIDs? >> >> If so, I understand why it is like it is. Otherwise, why not make >> it a direct equality? >> >> > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFsqLuS9HxQb37XmcRAuaeAKCBXTAg7gfhWk/p4sv4OG2Rdtsc5gCeI71c Nw4vY1tttyY3+5JRY4t89UE= =jkEY -----END PGP SIGNATURE-----
Ron Johnson <ron.l.johnson@cox.net> writes: > On 01/20/07 16:52, Michael Nolan wrote: >> select plr_rated_memid from tnmt_plr where plr_eventid in ('200607163681'); > Is this query created by an application? I.e, there might be a list > of PLR_EVENTIDs? > If so, I understand why it is like it is. Otherwise, why not make > it a direct equality? It wouldn't make any measurable difference, I think. The parser folds it to a simple equality as soon as it notices there's only one ... regards, tom lane
"Michael Nolan" <htfoot@gmail.com> writes: > select count(*) from memmast where memid in (select plr_rated_memid from > tnmt_plr where plr_eventid in ('200607163681'); > This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES > on MySQL! Yeah, and we probably would have sucked about as badly before 7.4 or so. There's a long way from "having subselects" to being able to optimize them decently. AFAIK mysql is still at the "we've got subselects!" stage ... regards, tom lane
What I think bothers me is this whole concept that if PostgreSQL is to flourish, MySQL has to be beaten down. Folks, both products are free, both can be used in the same shop (maybe not on the same computer if your running them in production). Putting down MySQL will not make PostgreSQL any better, or vice versa. Tom Lane wrote: > "Michael Nolan" <htfoot@gmail.com> writes: >> select count(*) from memmast where memid in (select plr_rated_memid from >> tnmt_plr where plr_eventid in ('200607163681'); > >> This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES >> on MySQL! > > Yeah, and we probably would have sucked about as badly before 7.4 or so. > There's a long way from "having subselects" to being able to optimize > them decently. AFAIK mysql is still at the "we've got subselects!" > stage ... > > regards, tom lane
John Meyer wrote: > What I think bothers me is this whole concept that if PostgreSQL is to > flourish, MySQL has to be beaten down. Folks, both products are free, > both can be used in the same shop (maybe not on the same computer if > your running them in production). Putting down MySQL will not make > PostgreSQL any better, or vice versa. It isn't that simple. There are many on this list that feel that MySQL does it *wrong* and frankly uses their marketing prowess to make themselves out to be something they are not. The point below is extremely valid. MySQL sucks at subselects. So what is the problem? Joshua D. Drake > > Tom Lane wrote: >> "Michael Nolan" <htfoot@gmail.com> writes: >>> select count(*) from memmast where memid in (select plr_rated_memid from >>> tnmt_plr where plr_eventid in ('200607163681'); >>> This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES >>> on MySQL! >> Yeah, and we probably would have sucked about as badly before 7.4 or so. >> There's a long way from "having subselects" to being able to optimize >> them decently. AFAIK mysql is still at the "we've got subselects!" >> stage ... >> >> regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
I'd say fine, but why discuss the flaws of MySQL on a PostgreSQL list? If you want to correct it, why not put that flaw on a MySQL list. And yes, I agree, there is a difference between pointing out a legitimate flaw and simply bashing for bashing's sake. Joshua D. Drake wrote: > John Meyer wrote: >> What I think bothers me is this whole concept that if PostgreSQL is to >> flourish, MySQL has to be beaten down. Folks, both products are free, >> both can be used in the same shop (maybe not on the same computer if >> your running them in production). Putting down MySQL will not make >> PostgreSQL any better, or vice versa. > > It isn't that simple. There are many on this list that feel that MySQL > does it *wrong* and frankly uses their marketing prowess to make > themselves out to be something they are not. The point below is > extremely valid. MySQL sucks at subselects. > > So what is the problem? > > Joshua D. Drake > > >> Tom Lane wrote: >>> "Michael Nolan" <htfoot@gmail.com> writes: >>>> select count(*) from memmast where memid in (select plr_rated_memid from >>>> tnmt_plr where plr_eventid in ('200607163681'); >>>> This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES >>>> on MySQL! >>> Yeah, and we probably would have sucked about as badly before 7.4 or so. >>> There's a long way from "having subselects" to being able to optimize >>> them decently. AFAIK mysql is still at the "we've got subselects!" >>> stage ... >>> >>> regards, tom lane >> ---------------------------(end of broadcast)--------------------------- >> TIP 1: if posting/reading through Usenet, please send an appropriate >> subscribe-nomail command to majordomo@postgresql.org so that your >> message can get through to the mailing list cleanly >> > >
Re: More grist for the PostgreSQL vs MySQL mill
From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "John" == John Meyer <john.l.meyer@gmail.com> writes: John> I'd say fine, but why discuss the flaws of MySQL on a PostgreSQL list? John> If you want to correct it, why not put that flaw on a MySQL list. And John> yes, I agree, there is a difference between pointing out a legitimate John> flaw and simply bashing for bashing's sake. It's a valid discussion here (although better on -advocacy), because it helps me have the right facts to present to clients about whether they should stay with a legacy database in MySQL vs upgrading to a modern PostgreSQL. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
> It's a valid discussion here (although better on -advocacy), because it helps > me have the right facts to present to clients about whether they should stay > with a legacy database in MySQL vs upgrading to a modern PostgreSQL. For all its flaws, MySQL is catching on quick and has a very active community of developments that several of us find rather handy - http://forge.mysql.com/ Is there something similar for Pgsql? (Disclaimer: new to this group because I'm unimpressed with MySQL beyond 30 million rows in a table that require advanced subqueries). Shanx
2007/1/21, Shashank Tripathi <shashank.tripathi@gmail.com>: > For all its flaws, MySQL is catching on quick and has a very active > community of developments that several of us find rather handy - > http://forge.mysql.com/ > > Is there something similar for Pgsql? <url:http://pgfoundry.org/> greetings, Nicolas -- Nicolas Barbier http://www.gnu.org/philosophy/no-word-attachments.html
On 1/20/07, John Meyer <john.l.meyer@gmail.com> wrote:
One thing to point out here is that MySQL's business strategy seems to be changing, and I believe the "community edition" is starting to fall apart (or struggling to gain momentum). It seems MySQL just dropped the ball on the free version of their product, and it is taking time for people to run with it. Their attention is fully on the "enterprise edition" packages. Take a look at the 5.0 GA releases, you can't even download binary releases of 5.0.33. You have to build from source, it kind of indicates the lack of effort that MySQL is putting forward to the community, and sending a signal "buy our enterprise edition".
Additionally, they feel that Oracle is such a threat that they have dumped BDB (I believe this move was after Oracle acquired Sleepycat) and now they announced they are dumping InnoDB (again after Oracle acquired it). Now they are off building their own engine to compete against InnoDB.
I know right now they are promoting SolidDB, but that won't last long ("Falcon" is their new storage engine). And frankly, SolidDB+MySQL just doesn't work right. I gave it a shot and there was tons of problems, hardly production worthy. There also seems to be an increase of chatter about people coming to PostgreSQL because of the actions that MySQL has taken.
Just my 2 cents.
--
Chad
http://www.postgresqlforums.com/
What I think bothers me is this whole concept that if PostgreSQL is to
flourish, MySQL has to be beaten down. Folks, both products are free,
both can be used in the same shop (maybe not on the same computer if
your running them in production). Putting down MySQL will not make
PostgreSQL any better, or vice versa.
One thing to point out here is that MySQL's business strategy seems to be changing, and I believe the "community edition" is starting to fall apart (or struggling to gain momentum). It seems MySQL just dropped the ball on the free version of their product, and it is taking time for people to run with it. Their attention is fully on the "enterprise edition" packages. Take a look at the 5.0 GA releases, you can't even download binary releases of 5.0.33. You have to build from source, it kind of indicates the lack of effort that MySQL is putting forward to the community, and sending a signal "buy our enterprise edition".
Additionally, they feel that Oracle is such a threat that they have dumped BDB (I believe this move was after Oracle acquired Sleepycat) and now they announced they are dumping InnoDB (again after Oracle acquired it). Now they are off building their own engine to compete against InnoDB.
I know right now they are promoting SolidDB, but that won't last long ("Falcon" is their new storage engine). And frankly, SolidDB+MySQL just doesn't work right. I gave it a shot and there was tons of problems, hardly production worthy. There also seems to be an increase of chatter about people coming to PostgreSQL because of the actions that MySQL has taken.
Just my 2 cents.
--
Chad
http://www.postgresqlforums.com/
Shashank Tripathi wrote: >> It's a valid discussion here (although better on -advocacy), because >> it helps >> me have the right facts to present to clients about whether they >> should stay >> with a legacy database in MySQL vs upgrading to a modern PostgreSQL. > > > For all its flaws, MySQL is catching on quick and has a very active > community of developments that several of us find rather handy - > http://forge.mysql.com/ > > Is there something similar for Pgsql? (Disclaimer: new to this group > because I'm unimpressed with MySQL beyond 30 million rows in a table > that require advanced subqueries). www.pgfoundry.org > > Shanx > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
> It seems MySQL just dropped the ball on > the free version of their product, and it Not sure what you mean. I can download their latest versions without any trouble. > Additionally, they feel that Oracle is such a threat that they have dumped > BDB (I believe this move was after Oracle acquired Sleepycat) and now they > announced they are dumping InnoDB (again after Oracle acquired it). Where is this announcement? They don't need to drop either engine, as both are GPL. MySQL as a group was never too hot with BDB. As for InnoDB, if Oracle acts up, the GPL allows MySQL or any of its community members to fork out a separate version. SolidDB and Falcon are just storage engines, which is quite a smart architecture for MySQL to follow. There's an interesting discussion about ReiserFS vs MySQL (directions of filesystems versus databases) if you google for it. I think the more storage engines that come as default in a database, the more useful it will be to different audiences. Meanwhile, it is unclear what the goofs at Oracle have in mind for their two acquisitions. Lest I sound like a MySQL devotee here are some recent stats- http://spyced.blogspot.com/2006/12/benchmark-postgresql-beats-stuffing.html :) Shanx
Shashank wrote: >> It seems MySQL just dropped the ball on >> the free version of their product, and it > > Not sure what you mean. I can download their latest versions without > any trouble. > In contrast to the MySQL Enterprise Server, which receives both monthly rapid updates and quarterly service pack releases, there is no specific schedule for when a new version of the MySQL Community Server is released. While every bug fix that has been applied to the Enterprise Server will also be available in the subsequent Community Server release, there will be source-only releases in between full (source and binary) Community builds. So while the latest published community sources will always be available from the Source Downloads Section, the binaries listed on this page may be from a previous release. In any case, full binaries for all our supported operating systems are and will remain conveniently available from this page. > > > Where is this announcement? They don't need to drop either engine, as > both are GPL. MySQL as a group was never too hot with BDB. http://www.linux.com/article.pl?sid=06/08/30/2151251 And no BDB (at least last I checked is not GPL) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On 21 Jan 2007 08:01:57 -0800, Shashank <shashank.tripathi@gmail.com> wrote:
The point was they are not going to the effort to roll binary releases, if you can find binaries for 5.0.33 on their community download page then point it out.
http://dev.mysql.com/downloads/mysql/5.0.html
They are not there, just source downloads. In other words they dropped the ball, in my opinion.
Your correct, I misspoke about an announcement regarding InnoDB. It was actually speculation from the community.
I don't actually agree that it is a smart architecture. BDB, InnoDB, SolidDB, etc all require separate shared memory areas for caching. It just isn't efficient use of memory.
Not sure why you think anything is up their sleeve, other than they would like to be more competitive in the embedded marketplace and offer a larger product portfolio. The problem is Oracle Database is trying to serve a much different market than TimesTen, BDB, and InnoDB. Oracle Database is trying to serve high availability & fault tolerant enterprise markets, and they do it very well in my book. TimesTen is trying to serve a high-performance market, BDB is a light-weight (small device) embedded market, and InnoDB is more of an larger device (PC-based, perhaps) embedded market.
I know that since Oracle has acquired BDB, they have added multi-versioning. I was never really impressed with BDB embedded in MySQL, but who knows if that is how it was implemented or what. BDB in general seems to perform well.
--
Chad
http://www.postgresqlforums.com/
> It seems MySQL just dropped the ball on
> the free version of their product, and it
Not sure what you mean. I can download their latest versions without
any trouble.
The point was they are not going to the effort to roll binary releases, if you can find binaries for 5.0.33 on their community download page then point it out.
http://dev.mysql.com/downloads/mysql/5.0.html
They are not there, just source downloads. In other words they dropped the ball, in my opinion.
Where is this announcement? They don't need to drop either engine, as
both are GPL. MySQL as a group was never too hot with BDB. As for
InnoDB, if Oracle acts up, the GPL allows MySQL or any of its community
Your correct, I misspoke about an announcement regarding InnoDB. It was actually speculation from the community.
members to fork out a separate version. SolidDB and Falcon are just
storage engines, which is quite a smart architecture for MySQL to
follow. There's an interesting discussion about ReiserFS vs MySQL
I don't actually agree that it is a smart architecture. BDB, InnoDB, SolidDB, etc all require separate shared memory areas for caching. It just isn't efficient use of memory.
more useful it will be to different audiences. Meanwhile, it is unclear
what the goofs at Oracle have in mind for their two acquisitions.
Not sure why you think anything is up their sleeve, other than they would like to be more competitive in the embedded marketplace and offer a larger product portfolio. The problem is Oracle Database is trying to serve a much different market than TimesTen, BDB, and InnoDB. Oracle Database is trying to serve high availability & fault tolerant enterprise markets, and they do it very well in my book. TimesTen is trying to serve a high-performance market, BDB is a light-weight (small device) embedded market, and InnoDB is more of an larger device (PC-based, perhaps) embedded market.
I know that since Oracle has acquired BDB, they have added multi-versioning. I was never really impressed with BDB embedded in MySQL, but who knows if that is how it was implemented or what. BDB in general seems to perform well.
Chad
http://www.postgresqlforums.com/
"Joshua D. Drake" <jd@commandprompt.com> writes: > Shashank wrote: >>> It seems MySQL just dropped the ball on >>> the free version of their product, and it >> >> Not sure what you mean. I can download their latest versions without >> any trouble. > In contrast to the MySQL Enterprise Server, which receives both monthly > rapid updates and quarterly service pack releases, there is no specific > schedule for when a new version of the MySQL Community Server is > released. Not only that, but mysql has repeatedly issued community updates that were actively *broken* --- something like four of the last seven minor 5.0.x updates were not pushed out by Red Hat because they broke ABI compatibility with the prior version of libmysqlclient. I get the strong impression that those guys are in over their heads in terms of their ability to manage their software, or at least their ability to manage two separate release streams. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/21/07 10:20, Joshua D. Drake wrote: > Shashank wrote: [snip] >> Where is this announcement? They don't need to drop either >> engine, as both are GPL. MySQL as a group was never too hot >> with BDB. > > > http://www.linux.com/article.pl?sid=06/08/30/2151251 > > And no BDB (at least last I checked is not GPL) It's BSD (for obvious reasons), no? But still, it's just as easy to fork a BSD-licensed project as a GPL product. Just as NetBSD & OpenBSD. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFs8RRS9HxQb37XmcRAg5SAKDZHKlEnrBwXkgyQzebd+U3NJFRKACg3DvK LIpfiTJBGNYWTeOhXaNVdLg= =Mzin -----END PGP SIGNATURE-----
On 2007-01-21, Ron Johnson <ron.l.johnson@cox.net> wrote: >> And no BDB (at least last I checked is not GPL) > > It's BSD (for obvious reasons), no? No, Sleepycat's licence is _NOT_ BSD. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Back on topic, I can confirm that MySQL does indeed have various problems with optimizing sub-selects.
There are times where doing two seperate selects is orders of magnitude faster than doing a single with a sub-select due to index selection decisions.
Namely:
select * from table where table_id in (select something from othertable);
is much much slower than:
select something from othertable;
select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...)
MySQL are aware of this optimization problem but it does not seem to be of very high priority.
Bad optimization of complex queries is why I am looking at pg.
On 21/01/07, Andrew - Supernews <andrew+nonews@supernews.com> wrote:
On 2007-01-21, Ron Johnson <ron.l.johnson@cox.net> wrote:
>> And no BDB (at least last I checked is not GPL)
>
> It's BSD (for obvious reasons), no?
No, Sleepycat's licence is _NOT_ BSD.
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
> select something from othertable; > select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...) This is what MySQL's CEO Martin said in an interview on Slashdot. If we can manage two queries as above through, say, a PHP application, with each executing in 0.004 seconds, then an optimized subquery needs to be beat the 0.008 mark to be a viable alternative. This works for most people who're not looking for elegance of database design. I am assuming MYISAM engine with indexes on the WHERE columns. Granted, this is neither future nor fool proof, but in high traffic environments, it is all worth the drop in elegance. The problem is when the number of rows exceeds 30 million, MySQL performance degrades substantially. For most people, this is not an issue. PG is solid with huge databases, but in my experience, even the most optimized subselect on PG will not return a value in 0.008 seconds on 10 million rows -- I'd appreciate other experiences. Shanx
"Shashank Tripathi" <shashank.tripathi@gmail.com> writes: >> select something from othertable; >> select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...) > This is what MySQL's CEO Martin said in an interview on Slashdot. If > we can manage two queries as above through, say, a PHP application, > with each executing in 0.004 seconds, then an optimized subquery needs > to be beat the 0.008 mark to be a viable alternative. That certainly *should* be possible, because the subquery approach requires one less network round-trip, and less data marshaling/ transmission/demarshalling/back-the-other-way. If MySQL is seriously making that argument as a reason why they need not put work into subselects, you should be hearing strong echoes of their former positions about "you don't need foreign keys" and so on. regards, tom lane
Shashank Tripathi wrote: >> select something from othertable; >> select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...) > > > This is what MySQL's CEO Martin said in an interview on Slashdot. If > we can manage two queries as above through, say, a PHP application, > with each executing in 0.004 seconds, then an optimized subquery needs > to be beat the 0.008 mark to be a viable alternative. Not really. If you have too many values, you have problems.. eg the "select something from table" returns 100+ records (for example, don't have a concrete number), you'll run into this problem: http://dev.mysql.com/doc/refman/4.1/en/packet-too-large.html when you try to put them all in the 'in' clause in the 2nd query. But as you say not usually a problem in most cases but something you need to be aware of (and you're only aware of it once you've been bitten by it heh). -- Postgresql & php tutorials http://www.designmagick.com/
Right,
You also have to realize that your first query might return zero results, and MySQL (and maybe this is correct SQL behavior) balks at an empty value set "where table_id in ()".
I would expect that giving the DBMS the whole picture of what you want to do, should allow it to make better decisions on how to retrieve the data.
MySQL Inefficiencies like this seem to hit the performance of a highly normalized database design hard.
On 22/01/07, Chris <dmagick@gmail.com> wrote:
Shashank Tripathi wrote:
>> select something from othertable;
>> select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...)
>
>
> This is what MySQL's CEO Martin said in an interview on Slashdot. If
> we can manage two queries as above through, say, a PHP application,
> with each executing in 0.004 seconds, then an optimized subquery needs
> to be beat the 0.008 mark to be a viable alternative.
Not really.
If you have too many values, you have problems.. eg the "select
something from table" returns 100+ records (for example, don't have a
concrete number), you'll run into this problem:
http://dev.mysql.com/doc/refman/4.1/en/packet-too-large.html
when you try to put them all in the 'in' clause in the 2nd query.
But as you say not usually a problem in most cases but something you
need to be aware of (and you're only aware of it once you've been bitten
by it heh).
--
Postgresql & php tutorials
http://www.designmagick.com/
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org/
>> select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...)
I usually try to rewrite this kind of queries to
select whatever from table t1 join
(select table_id from xxxxx where xxxxx) t2 using (table_id)
And 3 out of 4 this performs better on Oracle and PostgreSQL.
Would be curious why it does , but usually I am happy that it does:)
Harald
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
On 1/22/07, Harald Armin Massa <haraldarminmassa@gmail.com> wrote:
>> select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...)
I usually try to rewrite this kind of queries to
select whatever from table t1 join
(select table_id from xxxxx where xxxxx) t2 using (table_id)
And 3 out of 4 this performs better on Oracle and PostgreSQL.
Would be curious why it does , but usually I am happy that it does:)
Because the results would be different than a subselect, less work = faster. One thing to point out is that a query of the form:
select ...
from foo
where id in (select id from bar where n=27)
Would normally result in a SORT UNIQUE for the "select id from bar where n=27" part. Where as:
select ...
from foo f1, (select id from bar where n=27) f2
where f1.id = f2.id
is the same as...
select ...
from foo f1, bar f2
where f2.n=27
and f1.id=f2.id
which would not result in a sort unique. In order to obtain the same results as a subselect you would need to group or distinct, and I would imagine the results would be the same as the IN..SUBSELECT
--
Chad
http://www.postgresqlforums.com/
Chad,
aaah, you are right. My rewriting only works when the "id" column is a primary key in the subqueried table; that way guaranteed to be unique, so that
select distinct id from whatever where whateverelse
yields the same results as
select id from whatever where whateverelse
thanks for pointing it out,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
>> select * from table where table_id in (?, ?, ?, ?, ?, ?, ?, ...)
I usually try to rewrite this kind of queries to
select whatever from table t1 join
(select table_id from xxxxx where xxxxx) t2 using (table_id)
Because the results would be different than a subselect, less work = faster. One thing to point out is that a query of the form:
Would normally result in a SORT UNIQUE for the "select id from bar where n=27" part. Where as:
select ...
from foo f1, (select id from bar where n=27) f2
where f1.id = f2.id
is the same as...
select ...
from foo f1, bar f2
where f2.n=27
and f1.id=f2.id
which would not result in a sort unique. In order to obtain the same results as a subselect you would need to group or distinct, and I would imagine the results would be the same as the IN..SUBSELECT
aaah, you are right. My rewriting only works when the "id" column is a primary key in the subqueried table; that way guaranteed to be unique, so that
select distinct id from whatever where whateverelse
yields the same results as
select id from whatever where whateverelse
thanks for pointing it out,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/22/07 05:49, Peter Rosenthal wrote: > Right, > > You also have to realize that your first query might return zero results, > and MySQL (and maybe this is correct SQL behavior) balks at an empty value > set "where table_id in ()". > > I would expect that giving the DBMS the whole picture of what you want to > do, should allow it to make better decisions on how to retrieve the data. > > MySQL Inefficiencies like this seem to hit the performance of a highly > normalized database design hard. <SNARKY_REMARK> Do MySQL "designers" even know what data normalization is? </SNARKY_REMARK> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFtMt4S9HxQb37XmcRAn3LAKDQo5NK1Htd/QrQ6h15iBS8xZU2NgCfW/Ak HBliUtTkRP0yH8MtuabNCjE= =b8aT -----END PGP SIGNATURE-----
On Sun, 2007-01-21 at 10:01, Shashank wrote: > > It seems MySQL just dropped the ball on > > the free version of their product, and it > > Not sure what you mean. I can download their latest versions without > any trouble. > > > > Additionally, they feel that Oracle is such a threat that they have dumped > > BDB (I believe this move was after Oracle acquired Sleepycat) and now they > > announced they are dumping InnoDB (again after Oracle acquired it). > > > Where is this announcement? They don't need to drop either engine, as > both are GPL. MySQL as a group was never too hot with BDB. As for > InnoDB, if Oracle acts up, the GPL allows MySQL or any of its community > members to fork out a separate version. It has been hinted at by a few mysql guys here and there. Not sure if they'll do it. Here's the problem with GPL InnoDB for MySQL in a nutshell: MySQL makes money by selling a commercially licensed (i.e. not GPLd) version of its database. All the code in MySQL is either solely under the copyright of MySQL AB or licensed from other parties who hold sole copyright to the code (InnoDB). So, if MySQL forks InnoDB, they cannot include it in their commercial offering without GPLing the whole commercial offering. Which blows their business model apart. So, if Oracle shows up at the end of the innodb contract and goes "That'll be $10,000 a copy for innodb commercial licenses" then MySQL either pays them that for each copy of MySQL they sell with a commercial license, or they yank out the code, or they stop selling commercial licenses and go GPL only.
On Mon, 2007-01-22 at 10:30 +0800, Shashank Tripathi wrote: > The problem is when the number of rows exceeds 30 million, MySQL > performance degrades substantially. For most people, this is not an > issue. PG is solid with huge databases, but in my experience, even the > most optimized subselect on PG will not return a value in 0.008 > seconds on 10 million rows -- I'd appreciate other experiences. > A quick test of an IN query against 2 10M record tables gets a result in less than 1/10th of the 8ms you're asking for. Regards, Jeff Davis Output from psql (with some editing): db02=> CREATE TABLE a(x int unique, y int); NOTICE: CREATE TABLE / UNIQUE will create implicit index "a_x_key" for table "a" CREATE TABLE Time: 26.679 ms db02=> CREATE TABLE b(z int unique, x int); NOTICE: CREATE TABLE / UNIQUE will create implicit index "b_z_key" for table "b" CREATE TABLE Time: 6.419 ms db02=> insert into a select generate_series, generate_series+1 from generate_series(1,10000000); drop table foo; INSERT 0 10000000 Time: 74483.276 ms db02=> insert into b select generate_series, generate_series+1 from generate_series(1,10000000); INSERT 0 10000000 Time: 75848.143 ms db02=> analyze a; ANALYZE Time: 119.965 ms db02=> analyze c; ERROR: relation "c" does not exist db02=> analyze b; ANALYZE Time: 902.873 ms db02=> select y from a where x in (select x from b where z in (1000000,5000000)); y --------- 1000002 5000002 (2 rows) Time: 0.384 ms
John, you may not like it but we are in a competitive marketing environment with MySQL (even if both products are open source), and also with Oracle, SQL*Server, etc.
The MySQL folks will take every chance they get to point out instances where PostgreSQL performance is inferior to MySQL (and that's not unreasonable for them to do that), what's wrong with us pointing out their weaknesses as well? How many different threads have we had here or on -advocacy asking for real-world benchmarks between database products? I can think of several just in the last few months.
I know that as a result of this test when I rewrite this particular web app later this year I will be asking my ISP to make PostgreSQL 8.2 available, because this particular app requires a more robust database search engine than MySQL currently has.
--
Mike Nolan
The MySQL folks will take every chance they get to point out instances where PostgreSQL performance is inferior to MySQL (and that's not unreasonable for them to do that), what's wrong with us pointing out their weaknesses as well? How many different threads have we had here or on -advocacy asking for real-world benchmarks between database products? I can think of several just in the last few months.
I know that as a result of this test when I rewrite this particular web app later this year I will be asking my ISP to make PostgreSQL 8.2 available, because this particular app requires a more robust database search engine than MySQL currently has.
--
Mike Nolan
On 1/20/07, John Meyer <john.l.meyer@gmail.com> wrote:
What I think bothers me is this whole concept that if PostgreSQL is to
flourish, MySQL has to be beaten down. Folks, both products are free,
both can be used in the same shop (maybe not on the same computer if
your running them in production). Putting down MySQL will not make
PostgreSQL any better, or vice versa.
Tom Lane wrote:
> "Michael Nolan" <htfoot@gmail.com> writes:
>> select count(*) from memmast where memid in (select plr_rated_memid from
>> tnmt_plr where plr_eventid in ('200607163681');
>
>> This query takes about a second on PostgreSQL but takes OVER SEVEN MINUTES
>> on MySQL!
>
> Yeah, and we probably would have sucked about as badly before 7.4 or so.
> There's a long way from "having subselects" to being able to optimize
> them decently. AFAIK mysql is still at the "we've got subselects!"
> stage ...
>
> regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
I think any comparison between mysql and postgresql is faulty. I have used mysql for a very long time. As my skills matured and I was entrusted with larger projects, I could no longer make an intelligent case to use mysql over postgresql. I needed more from my database.
Most arguments in favor of mysql, are based off ones lack of experience and knowledge about postgresql; lazy coder too comfortable with what they know. Who wants to hear that something better is available after they have invested years into a mysql based project? Who wants to migrate (big pain in the a**)?
Anyone making an argument that mysql is better in any area, looses my respect as an engineer. They simply have not done their homework. If the project requires maintaining an existing mysql database(s), this is a legitimate argument. However, if this is the case one should say so instead of disguising their requirements in some silly debate about mysql being better.
"better" is quite a loose term, BTW, and its meaning changes with a DBA's skill level.
my 2 pennies
skye
Shashank <shashank.tripathi@gmail.com> wrote:
TV dinner still cooling?
Check out "Tonight's Picks" on Yahoo! TV.
Most arguments in favor of mysql, are based off ones lack of experience and knowledge about postgresql; lazy coder too comfortable with what they know. Who wants to hear that something better is available after they have invested years into a mysql based project? Who wants to migrate (big pain in the a**)?
Anyone making an argument that mysql is better in any area, looses my respect as an engineer. They simply have not done their homework. If the project requires maintaining an existing mysql database(s), this is a legitimate argument. However, if this is the case one should say so instead of disguising their requirements in some silly debate about mysql being better.
"better" is quite a loose term, BTW, and its meaning changes with a DBA's skill level.
my 2 pennies
skye
Shashank <shashank.tripathi@gmail.com> wrote:
> It seems MySQL just dropped the ball on
> the free version of their product, and it
Not sure what you mean. I can download their latest versions without
any trouble.
> Additionally, they feel that Oracle is such a threat that they have dumped
> BDB (I believe this move was after Oracle acquired Sleepycat) and now they
> announced they are dumping InnoDB (again after Oracle acquired it).
Where is this announcement? They don't need to drop either engine, as
both are GPL. MySQL as a group was never too hot with BDB. As for
InnoDB, if Oracle acts up, the GPL allows MySQL or any of its community
members to fork out a separate version. SolidDB and Falcon are just
storage engines, which is quite a smart architecture for MySQL to
follow. There's an interesting discussion about ReiserFS vs MySQL
(directions of filesystems versus databases) if you google for it. I
think the more storage engines that come as default in a database, the
more useful it will be to different audiences. Meanwhile, it is unclear
what the goofs at Oracle have in mind for their two acquisitions.
Lest I sound like a MySQL devotee here are some recent stats-
http://spyced.blogspot.com/2006/12/benchmark-postgresql-beats-stuffing.html
:)
Shanx
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
TV dinner still cooling?
Check out "Tonight's Picks" on Yahoo! TV.