Re: Benchmark Data requested - Mailing list pgsql-performance
From | Jignesh K. Shah |
---|---|
Subject | Re: Benchmark Data requested |
Date | |
Msg-id | 47A792BE.80804@sun.com Whole thread Raw |
In response to | Benchmark Data requested (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Benchmark Data requested
Re: Benchmark Data requested Re: Benchmark Data requested |
List | pgsql-performance |
Hi Simon, I have some insight into TPC-H on how it works. First of all I think it is a violation of TPC rules to publish numbers without auditing them first. So even if I do the test to show the better performance of PostgreSQL 8.3, I cannot post it here or any public forum without doing going through the "process". (Even though it is partial benchmark as they are just doing the equivalent of the PowerRun of TPCH) Maybe the PR of PostgreSQL team should email info@tpc.org about them and see what they have to say about that comparison. On the technical side: Remember all TPC-H queries when run sequentially on PostgreSQL uses only 1 core or virtual CPU so it is a very bad for system to use it with PostgreSQL (same for MySQL too). Also very important unless you are running the UPDATE FUNCTIONS which are separate queries, all these Q1-Q22 Queries are pure "READ-ONLY" queries. Traditionally I think PostgreSQL does lack "READ-SPEED"s specially since it is bottlenecked by the size of the reads it does (BLOCKSIZE). Major database provides multi-block parameters to do multiple of reads/writes in terms of blocksizes to reduce IOPS and also for read only they also have READ-AHEAD or prefetch sizes which is generally bigger than multi-block or extent sizes to aid reads. Scale factor is in terms of gigs and hence using max scale of 5 (5G) is pretty useless since most of the rows could be cached in modern day systems. And comparing with 0.01 is what 10MB? Size of recent L2 cache of Intel is probably bigger than that size. If you are doing tuning for TPC-H Queries focus on few of them: For example Query 1 is very Join intensive and if your CPU is not 100% used then you have a problem in your IO to solve before tuning it. Another example is Query 16 is literally IO scan speed, many people use it to see if the database can scan at "line speeds" of the storage, ending up with 100% CPU means the database cannot process that many rows (just to bring it in). In essence each query does some combination of system features to highlight the performance. However since it is an old benchmark, database companies end up "re-engineering" their technologies to gain advantage in this benchmark (Hence its time for a successor in work called TPC-DS which will have more than 100 such queries) Few of the technologies that have really helped gain ground in TPC-H world * Hash and/or Range Partitioning of tables ( PostgreSQL 8.3 can do that but the setup cost of writing schema is great specially since data has to be loaded in separate tables) * Automated Aggregated Views - used by optmiziers - database technology to update more frequently used aggregations in a smaller views * Cube views Index - like bitmap but multidimensional (I think ..but not sure) That said, is it useful to be used in "Regression testing in PostgreSQL farms. I would think yes.. specially Q16 Hope this helps. Regards, Jignesh Simon Riggs wrote: > Can I ask for some help with benchmarking? > > There are some results here that show PostgreSQL is slower in some cases > than Monet and MySQL. Of course these results were published immediately > prior to 8.2 being released, plus run out-of-the-box, so without even > basic performance tuning. > > Would anybody like to repeat these tests with the latest production > versions of these databases (i.e. with PGSQL 8.3), and with some > sensible tuning settings for the hardware used? It will be useful to get > some blind tests with more sensible settings. > > http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/ > > Multiple runs from different people/different hardware is useful since > they help to iron-out differences in hardware and test methodology. So > don't worry if you see somebody else doing this also. > > Thanks, > >
pgsql-performance by date: