Re: Benchmark Data requested - Mailing list pgsql-performance

From Jignesh K. Shah
Subject Re: Benchmark Data requested
Date
Msg-id 47A792BE.80804@sun.com
Whole thread Raw
In response to Benchmark Data requested  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Benchmark Data requested
Re: Benchmark Data requested
Re: Benchmark Data requested
List pgsql-performance
Hi Simon,

I have some insight into TPC-H on how it works.

First of all I think it is a violation of TPC rules to publish numbers
without auditing them first.  So even if I do the test to show the
better performance of PostgreSQL 8.3, I cannot post it here or any
public forum without doing going through the "process". (Even though it
is partial benchmark as they are just doing the equivalent of the
PowerRun of TPCH) Maybe the PR of PostgreSQL team should email
info@tpc.org about them and see what they have to say about that comparison.

On the technical side:

Remember all TPC-H queries when run sequentially on PostgreSQL uses only
1 core or virtual CPU so it is a very bad for system to use it with
PostgreSQL (same for MySQL too).

Also very important unless you are running the UPDATE FUNCTIONS which
are separate queries, all these Q1-Q22 Queries are pure "READ-ONLY"
queries. Traditionally I think PostgreSQL does lack "READ-SPEED"s
specially since it is bottlenecked by the size of the reads it does
(BLOCKSIZE). Major database provides multi-block parameters to do
multiple of reads/writes in terms of blocksizes to reduce IOPS and also
for read only they also have READ-AHEAD or prefetch sizes which is
generally bigger than multi-block or extent sizes to aid reads.

Scale factor is in terms of gigs and hence using max scale of 5 (5G) is
pretty useless since most of the rows could be cached in modern day
systems. And comparing with 0.01 is what 10MB? Size of recent L2 cache
of Intel is probably bigger than that size.

If you are doing tuning for TPC-H Queries  focus on few of them:
For example Query 1 is very Join intensive  and if your CPU is not 100%
used then you have a problem in your IO to solve before tuning it.

Another example is Query 16 is literally IO scan speed, many people use
it to see if the database can scan at "line speeds" of the storage,
ending up with 100% CPU means the database cannot process that many rows
(just to bring it in).

In essence  each query does some combination of system features to
highlight the performance. However since it is an old benchmark,
database companies end up "re-engineering" their technologies to gain
advantage in this benchmark (Hence its time for a successor in work
called TPC-DS which will have more than 100 such queries)

Few of the technologies that have really helped gain ground in TPC-H world
* Hash and/or Range Partitioning of tables  ( PostgreSQL 8.3 can do that
but the setup cost of writing schema is great specially since data has
to be loaded in separate tables)
* Automated Aggregated Views  - used by optmiziers - database technology
to update more frequently used aggregations in a smaller views
* Cube views Index - like bitmap but multidimensional (I think ..but not
sure)

That said, is it useful to be used in "Regression testing in PostgreSQL
farms.  I would think yes.. specially Q16

Hope this helps.
Regards,
Jignesh




Simon Riggs wrote:
> Can I ask for some help with benchmarking?
>
> There are some results here that show PostgreSQL is slower in some cases
> than Monet and MySQL. Of course these results were published immediately
> prior to 8.2 being released, plus run out-of-the-box, so without even
> basic performance tuning.
>
> Would anybody like to repeat these tests with the latest production
> versions of these databases (i.e. with PGSQL 8.3), and with some
> sensible tuning settings for the hardware used? It will be useful to get
> some blind tests with more sensible settings.
>
> http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/
>
> Multiple runs from different people/different hardware is useful since
> they help to iron-out differences in hardware and test methodology. So
> don't worry if you see somebody else doing this also.
>
> Thanks,
>
>

pgsql-performance by date:

Previous
From: Luke Lonergan
Date:
Subject: Re: Benchmark Data requested
Next
From: Greg Smith
Date:
Subject: Re: Benchmark Data requested