Re: TPC-R benchmarks - Mailing list pgsql-performance

From Oleg Lebedev
Subject Re: TPC-R benchmarks
Date
Msg-id 993DBE5B4D02194382EC8DF8554A52731E782B@postoffice.waterford.org
Whole thread Raw
In response to TPC-R benchmarks  (Oleg Lebedev <oleg.lebedev@waterford.org>)
List pgsql-performance
It took 10 hours to compute the query without the index on
lineitem.l_partkey.
Once I created the index on lineitem.l_partkey, it took only 32 secs to
run the same query.
After VACUUM ANALYZE it took 72 secs to run the query.
All the subsequent runs took under 3 seconds!

That's quite amazing!

I just checked

-----Original Message-----
From: Mary Edie Meredith [mailto:maryedie@osdl.org]
Sent: Monday, September 29, 2003 10:04 AM
To: Oleg Lebedev
Cc: Tom Lane; Jenny Zhang; pgsql-performance
Subject: RE: [PERFORM] TPC-R benchmarks


On Mon, 2003-09-29 at 07:35, Oleg Lebedev wrote:
> I left my TPC-R query #17 working over the weekend and it took 3988
> mins ~ 10 hours to complete. And this is considering that I am using a

> TPC-R database created with a scale factor of 1, which corresponds to
> ~1 GB of data. I am running RedHat 8.0 on a dual 1 GHz processor, 512
> MB RAM.

Was this run with or without the l_partkey index that Jenny suggested?

>
> Here is an excerpt from my postgresql.conf file (the rest of the
> settings are commented out):
>
> #
> #    Shared Memory Size
> #
> shared_buffers = 16384        # 2*max_connections, min 16,
typically
> 8KB each
>
> #
> #    Non-shared Memory Sizes
> #
> sort_mem = 32768
>
> #
> #    Optimizer Parameters
> #
> effective_cache_size = 32000    # typically 8KB each
>
> Any suggestions on how to optimize these settings?
>
> I agree with Jenny that declaring additional indexes on the TPC-R
> tables may alter the validity of the benchmarks. Are there any
> official TPC benchmarks submitted by PostgreSQL?

Actually, for the TPC-R you _are allowed to declare additional indexes.
With TPC-H you are restricted to a specific set listed in the spec (an
index on l_partkey is allowed for both).

What you cannot do for either TPC-R or TPC-H is rewrite the SQL of the
query for the purposes of making the query run faster.

Sorry if I was unclear.

Valid TPC-R benchmark results are on the TPC web site:
http://www.tpc.org/tpcr/default.asp

I do not see one for PostgreSQL.


Regards,

Mary

--
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab

>
> Thanks.
>
> Oleg
>
> -----Original Message-----
> From: Mary Edie Meredith [mailto:maryedie@osdl.org]
> Sent: Friday, September 26, 2003 10:12 AM
> To: Tom Lane
> Cc: Oleg Lebedev; Jenny Zhang; pgsql-performance
> Subject: Re: [PERFORM] TPC-R benchmarks
>
>
> The TPC-H/R rules allow only minor changes to the SQL that are
> necessary due to SQL implementation differences. They do not allow
> changes made to improve performance.  It is their way to test
> optimizer's ability to recognize an inefficient SQL statement and do
> the rewrite.
>
> The rule makes sense for the TPC-H, which is supposed to represent
> ad-Hoc query.  One might argue that for TPC-R, which is suppose to
> represent "Reporting" with pre-knowledge of the query, that re-write
> should be allowed. However, that is currently not the case. Since the
> RDBMS's represented on the TPC council are competing with TPC-H, their

> optimizers already do the re-write, so (IMHO) there is no motivation
> to relax the rules for the TPC-R.
>
>
> On Thu, 2003-09-25 at 21:28, Tom Lane wrote:
> > Oleg Lebedev <oleg.lebedev@waterford.org> writes:
> > > Seems like in your case postgres uses an i_l_partkey index on
> > > lineitem table. I have a foreign key constraint defined between
the
> > > lineitem and part table, but didn't create an special indexes.
Here
> > > is my query plan:
> >
> > The planner is obviously unhappy with this plan (note the large cost
> > numbers), but it can't find a way to do better.  An index on
> > lineitem.l_partkey would help, I think.
> >
> > The whole query seems like it's written in a very inefficient
> > fashion;
>
> > couldn't the estimation of '0.2 * avg(l_quantity)' be amortized
> > across
>
> > multiple join rows?  But I dunno whether the TPC rules allow for
> > significant manual rewriting of the given query.
> >
> >             regards, tom lane
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> >       subscribe-nomail command to majordomo@postgresql.org so that
> your
> >       message can get through to the mailing list cleanly

*************************************

This e-mail may contain privileged or confidential material intended for the named recipient only.
If you are not the named recipient, delete this message and all attachments.
Unauthorized reviewing, copying, printing, disclosing, or otherwise using information in this e-mail is prohibited.
We reserve the right to monitor e-mail sent through our network.

*************************************

pgsql-performance by date:

Previous
From: Andrew Rawnsley
Date:
Subject: Re: Performance: BigInt vs Decimal(19,0)
Next
From: Oleg Lebedev
Date:
Subject: Re: TPC-R benchmarks