Thread: PostgreSQL TPC-H test result?
I read something from http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html saying that PostgreSQL can't give the correct result of the some TPC-H queries, I wonder is there any official statements about this, because it will affect our plane of using PostgreSQL as an alternative because it's usability. BTW I don't think PostgreSQL performances worse because the default configuration usually can't use enough resources of the computer, as as memory.
On Tue, Sep 09, 2008 at 07:59:49PM +0800, Amber wrote: > I read something from > http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html Given that the point of that "study" is to prove something about performance, one should be leery of any claims based on an "out of the box" comparison. Particularly since the "box" their own product comes out of is "compiled from CVS checkout". Their argument seems to be that people can learn how to drive CVS and to compile software under active development, but can't read the manual that comes with Postgres (and a release of Postgres well over a year old, at that). I didn't get any further in reading the claims, because it's obviously nothing more than a marketing effort using the principle that deriding everyone else will make them look better. Whether they have a good product is another question entirely. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
Yes, we don't care about the performance results, but we do care about the point that PostgreSQL can't give the correct resultsof TPC-H queries. -------------------------------------------------- From: "Andrew Sullivan" <ajs@commandprompt.com> Sent: Tuesday, September 09, 2008 8:39 PM To: <pgsql-general@postgresql.org> Subject: Re: [GENERAL] PostgreSQL TPC-H test result? > On Tue, Sep 09, 2008 at 07:59:49PM +0800, Amber wrote: > >> I read something from >> http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html > > Given that the point of that "study" is to prove something about > performance, one should be leery of any claims based on an "out of the > box" comparison. Particularly since the "box" their own product comes > out of is "compiled from CVS checkout". Their argument seems to be > that people can learn how to drive CVS and to compile software under > active development, but can't read the manual that comes with Postgres > (and a release of Postgres well over a year old, at that). > > I didn't get any further in reading the claims, because it's obviously > nothing more than a marketing effort using the principle that deriding > everyone else will make them look better. Whether they have a good > product is another question entirely. > > A > -- > Andrew Sullivan > ajs@commandprompt.com > +1 503 667 4564 x104 > http://www.commandprompt.com/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
On Tue, Sep 9, 2008 at 7:06 AM, Amber <guxiaobo1982@hotmail.com> wrote: > Yes, we don't care about the performance results, but we do care about the point that PostgreSQL can't give the correctresults of TPC-H queries. It would be nice to know about the data, queries, and the expected results of their tests just so we could see this for ourselves. -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug
On Tue, Sep 9, 2008 at 10:06 AM, Amber <guxiaobo1982@hotmail.com> wrote: > Yes, we don't care about the performance results, but we do care about the point that PostgreSQL can't give the correctresults of TPC-H queries. PostgreSQL, at least in terms of the open source databases, is probably your best bet if you are all concerned about correctness. Do not give any credence to a vendor published benchmark unless the test is published and can be independently verifed. merlin
On Tue, Sep 09, 2008 at 10:06:01PM +0800, Amber wrote: > Yes, we don't care about the performance results, but we do care about the point that PostgreSQL can't give the correctresults of TPC-H queries. > I have never heard a reputable source claim this. I have grave doubts about their claim: they don't specify what implementation of TPC-H they use. They don't actually have the right, AIUI, to claim they tested under TPC-H, since their results aren't listed anywhere on http://www.tpc.org/tpch/results/tpch_results.asp?orderby=dbms. It could well be that they made up something that kinda does something like TPC-H, tailored to how their database works, and then claimed others can't do the job. That's nice marketing material, but it's not a meaningful test result. Without access to the methodology, you should be wary of accepting any of the conclusions. There is, I understand, an implementation of something like TPC-H that you could use to test it yourself. http://osdldbt.sourceforge.net/. DBT-3 is supposed to be that workload. Please note that the license does not allow you to publish competitive tests for marketing reasons. but you could see for yourself whether the claim is true that way. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Tuesday 09 September 2008 10:06:01 Amber wrote: > From: "Andrew Sullivan" <ajs@commandprompt.com> > Sent: Tuesday, September 09, 2008 8:39 PM > To: <pgsql-general@postgresql.org> > Subject: Re: [GENERAL] PostgreSQL TPC-H test result? > > > On Tue, Sep 09, 2008 at 07:59:49PM +0800, Amber wrote: > >> I read something from > >> http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html > > > > Given that the point of that "study" is to prove something about > > performance, one should be leery of any claims based on an "out of the > > box" comparison. Particularly since the "box" their own product comes > > out of is "compiled from CVS checkout". Their argument seems to be > > that people can learn how to drive CVS and to compile software under > > active development, but can't read the manual that comes with Postgres > > (and a release of Postgres well over a year old, at that). > > > > I didn't get any further in reading the claims, because it's obviously > > nothing more than a marketing effort using the principle that deriding > > everyone else will make them look better. Whether they have a good > > product is another question entirely. > > > > >Yes, we don't care about the performance results, but we do care > > >about the > > > point that PostgreSQL can't give the correct results of TPC-H queries. Given the point of those benchmarks is to make other systems look bad, I think you have to take them with a grain of salt. Since we don't know what the errors/results were, and no information is giving, we are left to wonder if this is a problem with the software or the tester. The site would have us believe the former, but I think I would lean toward the latter... case in point, I did a quick google and turned up this link: http://www.it.iitb.ac.in/~chetanv/personal/acads/db/report_html/node10.html. It isn't terribly informative, but it doesindicate one thing, someone else was able to run query #6 correctly, while the above site claims it returns an error. Now when I look at query#6 from that site, I notice it shows the following syntax: interval '1' year. when I saw that, it jumped out at me as something that could be an issue, and it is: pagila=# select now() - interval '1' year, now() - interval '1 year'; ?column? | ?column? -------------------------------+------------------------------- 2008-09-09 11:28:46.938209-04 | 2007-09-09 11:28:46.938209-04 (1 row) Now, I'm not sure if there is an issue that monet supports the first syntax and so when they ran thier test on postgres this query produced wrong results, but that seems possible. In this case I would wonder if the first syntax is sql compliant, but it doesn't really matter, the tpc-h allows for changes to queries to support syntax variations between databases; I'm pretty sure I could make suttle changes to "break" other databases as well. Incidentally, I poked Mark Wong, who used to work at the OSDL (big linux kernel hacking shop), and he noted he has successfully run the tpc-h tests before on postgres. In the end, I can't speak to what the issues are wrt monet and postgres and thier tpc-h benchmarks, but personally I don't think they are worth worring about. -- Robert Treat http://www.omniti.com Database: Scalability: Consulting:
My 02c, Pg does itself no favours by sticking with such pessimistic defaults, and a novice user wanting to try it out will find tweakingthe pg configuration files for performance quite complicated. Given the general increase in typical hardware specs these days, perhaps the default pg specs could be set for higher specsystems? Or perhaps the standard install could come with 2 or 3 versions of the config files, & the user can simply rename/invokethe one that fits their system best? I figure (somewhat simplistically) that most settings are more relatedto available memory than anything else, so perhaps config files for typical 1Gb, 4Gb & 8Gb systems could be providedout of the box to make initial installs simpler? Cheers, Brent Wood Brent Wood DBA/GIS consultant NIWA, Wellington New Zealand >>> Andrew Sullivan <ajs@commandprompt.com> 09/10/08 3:47 AM >>> On Tue, Sep 09, 2008 at 07:59:49PM +0800, Amber wrote: > I read something from > http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html Given that the point of that "study" is to prove something about performance, one should be leery of any claims based on an "out of the box" comparison. Particularly since the "box" their own product comes out of is "compiled from CVS checkout". Their argument seems to be that people can learn how to drive CVS and to compile software under active development, but can't read the manual that comes with Postgres (and a release of Postgres well over a year old, at that). I didn't get any further in reading the claims, because it's obviously nothing more than a marketing effort using the principle that deriding everyone else will make them look better. Whether they have a good product is another question entirely. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On Wed, Sep 10, 2008 at 07:16:35AM +1200, Brent Wood wrote: > Given the general increase in typical hardware specs these days, > perhaps the default pg specs could be set for higher spec systems? Given the default shmem configuration on operating systems these days, upping the default will likely cause postgresql to not run at all. For some reason the shmem defaults in OSes have not been increased in line with the hardware specs, not sure what can be done about that. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
On Wed, Sep 10, 2008 at 07:16:35AM +1200, Brent Wood wrote: > Pg does itself no favours by sticking with such pessimistic > defaults, and a novice user wanting to try it out will find tweaking > the pg configuration files for performance quite complicated. You do know that at install time, Pg does some elementary investigation of the system to see what it can set its defaults to, right? In addition, every time this comes up I find it perplexing. The idea seems to be that "novices" in databases should be excused from learning about their system and should expect a nearly-optimally tuned system out of the box. But there are so many variables involved in database tuning as to make such a claim hard to swallow. For instance . . . > fits their system best? I figure (somewhat simplistically) that most > settings are more related to available memory than anything else, so . . . your figuring here is indeed simplistic. Every day I see requests for help from people who have followed the rule of thumb "1/4 of memory for shared_buffers", except that they're also running apache+jakarta, MySQL, and a mail server on the same box. They wonder why the stock advice is so wrong. It's wrong because a general-purpose tool is almost never going to come pre-set for every possible workload you might want to throw at it. So even "how much memory" there is on the machine is a question that is harder to answer than it might seem. Disk layout, data access patterns, even the filesystem you choose can make significant differences in how the system performs. Finally, part of the reason people make these claims is because they tend to hold Postgres up against toy systems that are _not_ designed to scale up. A certain well known database product, for instance, has been struggling for the last several years to turn itself into a full-featured, high-volume, safe transactional system. But the seams keep showing, because it just wasn't designed for this workload in the first place. But it sure is fast out of the box on a single-user system! A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
Robert Treat <robert@omniti.com> writes: > http://www.it.iitb.ac.in/~chetanv/personal/acads/db/report_html/node10.html. > It isn't terribly informative, but it doesindicate one thing, someone else > was able to run query #6 correctly, while the above site claims it returns an > error. Now when I look at query#6 from that site, I notice it shows the > following syntax: > interval '1' year. > when I saw that, it jumped out at me as something that could be an issue, and > it is: Yeah. This is SQL spec syntax, but it's not fully implemented in Postgres: the grammar supports it but the info doesn't get propagated to interval_in, and interval_in wouldn't know what to do even if it did have the information that there was a YEAR qualifier after the literal. That's probably not good because it *looks* like we support the syntax, but in fact produce non-spec-compliant results. I think it might be better if we threw an error. Or someone could try to make it work, but given that no one has taken the slightest interest since Tom Lockhart left the project, I wouldn't hold my breath waiting for that. regards, tom lane
On Tue, Sep 9, 2008 at 2:07 PM, Andrew Sullivan <ajs@commandprompt.com> wrote: > > . . . your figuring here is indeed simplistic. Every day I see > requests for help from people who have followed the rule of thumb "1/4 > of memory for shared_buffers", except that they're also running > apache+jakarta, MySQL, and a mail server on the same box. They wonder > why the stock advice is so wrong. It's wrong because a > general-purpose tool is almost never going to come pre-set for every > possible workload you might want to throw at it. So even "how much > memory" there is on the machine is a question that is harder to answer > than it might seem. Disk layout, data access patterns, even the > filesystem you choose can make significant differences in how the > system performs. Just as common is the beginner showing up with an 8 core opteron server with 64 Gigs of ram trying to get fast write transactions on a single 7200 rpm 500G sata drive. > Finally, part of the reason people make these claims is because they > tend to hold Postgres up against toy systems that are _not_ designed > to scale up. A certain well known database product, for instance, has > been struggling for the last several years to turn itself into a > full-featured, high-volume, safe transactional system. But the seams > keep showing, because it just wasn't designed for this workload in the > first place. But it sure is fast out of the box on a single-user > system! reference tweakers.net http://tweakers.net/reviews/649/8/database-test-sun-ultrasparc-t1-vs-punt-amd-opteron-pagina-8.html http://tweakers.net/reviews/661/6/database-test-intel-xeon-clovertown-x5355-pagina-6.html
On Tue, 9 Sep 2008, Amber wrote: > I read something from > http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html > saying that PostgreSQL can't give the correct result of the some TPC-H > queries Jignesh Shah at Sun ran into that same problem. It's mentioned briefly in his presentation at http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_postgresql on pages 26 and 27. 5 of the 22 reference TCP-H queries (4, 5, 6, 10, 14) returned zero rows immediately for his tests. Looks like the MonetDB crew is saying it does that on queries 4,5,6,10,12,14,15 and that 20 takes too long to run to generate a result. Maybe 12/15/20 were fixed by changes in 8.3, or perhaps there were subtle errors there that Jignesh didn't catch--it's not like he did a formal submission run, was just kicking the tires. I suspect the difference on 20 was that his hardware and tuning was much better, so it probably did execute fast enough. While some of the MonetDB bashing in this thread was unwarranted, it is quite inappropriate that they published performance results here. Would be nice if someone in the community were to grab ahold of the TPC-H problems and try to shake them out. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith wrote: > On Tue, 9 Sep 2008, Amber wrote: > >> I read something from >> http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html >> saying that PostgreSQL can't give the correct result of the some >> TPC-H queries > > Jignesh Shah at Sun ran into that same problem. It's mentioned > briefly in his presentation at > http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_postgresql > on pages 26 and 27. 5 of the 22 reference TCP-H queries (4, 5, 6, 10, > 14) returned zero rows immediately for his tests. Looks like the > MonetDB crew is saying it does that on queries 4,5,6,10,12,14,15 and > that 20 takes too long to run to generate a result. Maybe 12/15/20 > were fixed by changes in 8.3, or perhaps there were subtle errors > there that Jignesh didn't catch--it's not like he did a formal > submission run, was just kicking the tires. I suspect the difference > on 20 was that his hardware and tuning was much better, so it probably > did execute fast enough. > > While some of the MonetDB bashing in this thread was unwarranted, it > is quite inappropriate that they published performance results here. > Would be nice if someone in the community were to grab ahold of the > TPC-H problems and try to shake them out. > Hmm This is the second time MonetDB has published PostgreSQL numbers. I think I will try to spend few days on TPC-H again on a much smaller scale (to match what MonetDB used) and start discussions on solving the problem.. Keep tuned. Regards, Jignesh
On Tue, Sep 9, 2008 at 3:37 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Wed, Sep 10, 2008 at 07:16:35AM +1200, Brent Wood wrote: >> Given the general increase in typical hardware specs these days, >> perhaps the default pg specs could be set for higher spec systems? > > Given the default shmem configuration on operating systems these days, > upping the default will likely cause postgresql to not run at all. And it wont change the results much either. Changing shared_buffers is very nuanced and can help or hurt performance, but it isn't tuning in the sense it's a level you can pull to make the database 'go faster' like magic. A lot of the obsessing about shared_buffers resolves around the fact that remarkably few people understand how memory works in modern operating systems. The 'big ticket' .conf items are those that affect syncing in write heavy enviroments (fsync, etc) and planner affecting settings (work_mem, effective_cache_size). That said, PostgreSQL requires very little tuning outside of the obvious tradeoffs between speed and safety. This is an ongoing process...in the old days I had to agonize about dealing with the stats collector...in modern terms there is much less to 'trade off' (hopefully less still with 8.4). merlin
On Tue, 2008-09-09 at 16:26 -0400, Tom Lane wrote: > That's probably not good because it *looks* like we support the syntax, > but in fact produce non-spec-compliant results. I think it might be > better if we threw an error. Definitely. If we accept SQL Standard syntax like this but then not do what we should, it is clearly an ERROR. Our reputation will be damaged if we don't, since people will think that we are blase about standards compliance and about query correctness. Please lets move swiftly to plug this hole, as if it were a data loss bug (it is, if it causes wrong answers to queries for unsuspecting users). -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Tue, Sep 09, 2008 at 05:42:50PM -0400, Greg Smith wrote: > > While some of the MonetDB bashing in this thread was unwarranted, What bashing? I didn't see any bashing of them. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
Tom Lane wrote: >> interval '1' year. > > ...is SQL spec syntax, but it's not fully implemented in Postgres... > > Or someone could try to make it work, but given that no one has taken > the slightest interest since Tom Lockhart left the project, I wouldn't > hold my breath waiting for that. I have interest. For 5 years I've been maintaining a patch for a client that allows the input of ISO-8601 intervals (like 'P1YT1M') rather than the nonstandard shorthand ('1Y1M') that postgresql supports[1]. I'd be interested in working on this. Especially if supporting SQL standard interval syntax could improve the chances of getting my ISO-8601-interval-syntax replacing nonstandard-postgres-shorthand-intervals patch accepted again, I'd be quite happy work on it. Tom in 2003 said my code looked cleaner than the current code[2], and the patch was accepted[3] for a while before being rejected - I believe because Peter said he'd like to see the SQL standard intervals first. I see it's still a TODO, though. > the grammar supports it but the info doesn't get propagated to > interval_in, and interval_in wouldn't know what to do even if it did > have the information that there was a YEAR qualifier after the literal. Any hints on how best to propagate the needed info from the grammar? Or should it be obvious to me from reading the code? [1] http://archives.postgresql.org/pgsql-patches/2003-09/msg00119.php [2] http://archives.postgresql.org/pgsql-patches/2003-09/msg00121.php [3] http://archives.postgresql.org/pgsql-patches/2003-12/msg00253.php Ron Mayer (formerly ron@intervideo.com who posted those ISO-8601 interval patches)
Ron Mayer wrote: > Tom Lane wrote: >> Or someone could try to make it work, but given that no one has taken >> the slightest interest since Tom Lockhart left the project, I wouldn't >> hold my breath waiting for that. > > I have interest. For 5 years I've been maintaining a patch for a client Doh. Now that I catch up on emails I see Tom has a patch in a different thread. I'll follow up there...
Moving this thread to Performance alias as it might make more sense for folks searching on this topic: Greg Smith wrote: > On Tue, 9 Sep 2008, Amber wrote: > >> I read something from >> http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html >> saying that PostgreSQL can't give the correct result of the some >> TPC-H queries > > Jignesh Shah at Sun ran into that same problem. It's mentioned > briefly in his presentation at > http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_postgresql > on pages 26 and 27. 5 of the 22 reference TCP-H queries (4, 5, 6, 10, > 14) returned zero rows immediately for his tests. Looks like the > MonetDB crew is saying it does that on queries 4,5,6,10,12,14,15 and > that 20 takes too long to run to generate a result. Maybe 12/15/20 > were fixed by changes in 8.3, or perhaps there were subtle errors > there that Jignesh didn't catch--it's not like he did a formal > submission run, was just kicking the tires. I suspect the difference > on 20 was that his hardware and tuning was much better, so it probably > did execute fast enough. I redid a quick test with the same workload on one of my systems with SF 10 which is about 10GB (I hope it comes out properly displayed) Jignesh From Monet (8.3T/8.2.9) Q Time PG8.3.3 Time PG8.2.9 Ratio 1 429.01 510 0.84 2 3.65 54 0.07 3 33.49 798 0.04 4 6.53 Empty 35 (E) 0.19 5 8.45 Empty 5.5(E) 1.54 6 32.84 Empty 172 (E) 0.19 7 477.95 439 1.09 8 58.55 251 0.23 9 781.96 2240 0.35 10 9.03 Empty 6.1(E) 1.48 11 3.57 Empty 25 0.14 12 56.11 Empty 179 (E) 0.31 13 61.01 140 0.44 14 30.69 Empty 169 (E) 0.18 15 32.81 Empty 168 (E) 0.2 16 23.98 115 0.21 17 Did not finish Did not finish 18 58.93 882 0.07 19 71.55 218 0.33 20 Did not finish Did not finish 21 550.51 477 1.15 22 6.21 Did not finish All time is in seconds (sub seconds where availabe) Ratio > 1 means 8.3.3 is slower and <1 means 8.3.3 is faster My take on the results: * I had to tweak the statement of Q1 in order to execute it. (TPC-H kit does not directly support POSTGRESQL statements) * Timings with 8.3.3 and bit of tuning gives much better time overall This was expected (Some queries finish in 7% of the time than what MonetDB reported. From the queries that worked only Q7 & Q21 seem to have regressed) * However Empty rows results is occuring consistently (Infact Q11 also returned empty for me while it worked in their test) Queries: 4,5,6,10,11,12,14,15 (ACTION ITEM: I will start separate threads for each of those queries in HACKERS alias to figure out the problem since it looks like Functional problem to me and should be interesting to hackers alias) * Two queries 17,20 looks like will not finish (I let Q17 to run for 18 hrs and yet it had not completed. As for Q20 I killed it as it was approaching an hour.) (ACTION ITEM: Not sure whether debugging for these queries will go in hackers or perform alias but I will start a thread on them too.) * Looks like bit of tuning is required for Q1, Q7, Q9, Q21 to improve their overall time. Specially understanding if PostgreSQL is missing a more efficient plan for them. (ACTION ITEM: I will start separate threads on performance alias to dig into those queries) I hope to start separate threads for each queries so we can track them easier. I hope to provide explain analyze outputs for each one of them and lets see if there are any problems. Feedback welcome on what you want to see for each threads. Regards, Jignesh -- Jignesh Shah http://blogs.sun.com/jkshah Sun Microsystems,Inc http://sun.com/postgresql