Re: TPC-R benchmarks - Mailing list pgsql-performance

From Oleg Lebedev
Subject Re: TPC-R benchmarks
Msg-id 993DBE5B4D02194382EC8DF8554A52731E783E@postoffice.waterford.org
In response to TPC-R benchmarks  (Oleg Lebedev <oleg.lebedev@waterford.org>)
Responses Re: TPC-R benchmarks
List pgsql-performance
Thanks everyone for the help.

I have another question. How do I optimize my indexes for a query that
contains many ORed blocks, each of which is a group of ANDed
expressions? Every ORed block has the same structure; only the
right-hand-side values vary.
The first expression of each AND-block is a join condition. However,
postgres uses a sequential scan on both tables while applying the
ORed blocks of ANDed expressions, so the cost of the plan is
around 700,000,000,000.

Here is an example:
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#24'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 4 and l_quantity <= 4 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#22'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 18 and l_quantity <= 18 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#33'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 24 and l_quantity <= 24 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );
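
One rewrite that may be worth trying, offered as a sketch and not as a
confirmed fix: the join condition and the two l_ship* predicates are
identical in every ORed block, so they can be pulled out to the top level
of the WHERE clause. This is logically equivalent, since
(A and B1) or (A and B2) or (A and B3) equals A and (B1 or B2 or B3).
Whether the planner then chooses a better join depends on the PostgreSQL
version and on the statistics available, which is an assumption to verify
with explain.

```sql
-- Sketch: same predicates as the query above, with the conditions shared
-- by all three OR branches factored out so the join clause is top-level.
select
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
    and (
        (
            p_brand = 'Brand#24'
            and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
            and l_quantity >= 4 and l_quantity <= 4 + 10
            and p_size between 1 and 5
        )
        or
        (
            p_brand = 'Brand#22'
            and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
            and l_quantity >= 18 and l_quantity <= 18 + 10
            and p_size between 1 and 10
        )
        or
        (
            p_brand = 'Brand#33'
            and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
            and l_quantity >= 24 and l_quantity <= 24 + 10
            and p_size between 1 and 15
        )
    );
```

Running explain on both forms and comparing the estimated costs would
show whether the factored version actually changes the plan.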

-----Original Message-----
From: scott.marlowe [mailto:scott.marlowe@ihs.com]
Sent: Thursday, October 02, 2003 1:44 PM
To: Oleg Lebedev
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] TPC-R benchmarks


On Thu, 2 Oct 2003, Oleg Lebedev wrote:

> I was trying to get the pg_stats information to Josh and decided to
> recreate the indexes on all my tables. After that I ran vacuum full
> analyze, re-enabled nestloop and ran explain analyze on the query. It
> ran in about 2 minutes. I attached the new query plan. I am not sure
> what did the trick, but 2 minutes is much better than 2 hours. But
> then again, I can't take long lunches anymore :)
> Is there any way to make this query run even faster without increasing
> the memory dedicated to postgres?
> Thanks.

As long as the estimated row counts and real ones match up, and
postgresql seems to be picking the right plan, there's probably not a
lot to be done. You might want to look at increasing sort_mem a bit,
but don't go crazy, as setting it too high can result in swap storms
under load, which are a very bad thing.

I'd check for index growth.  You may have been reloading your data over
and over and had an index growth problem.  Next time, instead of
recreating the indexes completely, you might wanna try reindex
indexname.

Also, 7.4 mostly fixes the index growth issue, especially as it applies
to truncating/reloading a table over and over, so moving to 7.4 beta3/4
and testing might be a good idea (if you aren't there already).

What you want to avoid is having postgresql switch back to that nestloop
join on you in the middle of the day, and to prevent that you might need
to have higher statistics targets so the planner gets the right number
all the time.
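
The suggestions above could look roughly like this in a psql session.
This is only a sketch: the index name lineitem_l_partkey_idx, the
statistics target of 100, and the sort_mem value are illustrative
assumptions, not values taken from this thread.

```sql
-- Rebuild one index in place instead of dropping and recreating it.
-- (lineitem_l_partkey_idx is a hypothetical index name.)
reindex index lineitem_l_partkey_idx;

-- Raise the statistics target for a join column, then re-gather
-- statistics so the planner sees the more detailed numbers.
-- (The column choice and the target of 100 are assumptions.)
alter table lineitem alter column l_partkey set statistics 100;
analyze lineitem;

-- Raise sort memory for the current session only. The value is in
-- kilobytes; 16384 is an arbitrary example, so size it to your RAM.
set sort_mem = 16384;
```

Changing sort_mem with set affects only the current session, which makes
it a low-risk way to test a larger value before editing postgresql.conf.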

