Re: TPC-R benchmarks - Mailing list pgsql-performance
From | Oleg Lebedev |
---|---|
Subject | Re: TPC-R benchmarks |
Date | |
Msg-id | 993DBE5B4D02194382EC8DF8554A52731E783E@postoffice.waterford.org Whole thread Raw |
In response to | TPC-R benchmarks (Oleg Lebedev <oleg.lebedev@waterford.org>) |
Responses |
Re: TPC-R benchmarks
|
List | pgsql-performance |
Thanks everyone for the help. I have another question. How do I optimize my indexes for the query that contains a lot of ORed blocks, each of which contains a bunch of ANDed expressions? The structure of each ORed block is the same except the right-hand-side values vary. The first expression of each AND-block is a join condition. However, postgres tries to use a sequential scan on both of the tables applying the OR-ed blocks of ANDed expressions. So, the cost of the plan is around 700,000,000,000. Here is an example: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 4 and l_quantity <= 4 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#22' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 18 and l_quantity <= 18 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#33' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); -----Original Message----- From: scott.marlowe [mailto:scott.marlowe@ihs.com] Sent: Thursday, October 02, 2003 1:44 PM To: Oleg Lebedev Cc: Josh Berkus; pgsql-performance@postgresql.org Subject: RE: [PERFORM] TPC-R benchmarks On Thu, 2 Oct 2003, Oleg Lebedev wrote: > I was trying to get the pg_stats information to Josh and decided to > recreate the indexes on all my tables. After that I ran vacuum full > analyze, re-enabled nestloop and ran explain analyze on the query. It > ran in about 2 minutes. I attached the new query plan. I am not sure > what did the trick, but 2 minutes is much better than 2 hours. But > then again, I can't take long lunches anymore :) > Is there any way to make this query run even faster without increasing > the memory dedicated to postgres? > Thanks. As long as the estimated row counts and real ones match up, and postgresql seems to be picking the right plan, there's probably not a lot to be done. You might want to look at increasing sort_mem a bit, but don't go crazy, as being too high can result in swap storms under load, which are a very bad thing. I'd check for index growth. You may have been reloading your data over and over and had an index growth problem. Next time instead of recreating the indexed completely, you might wanna try reindex indexname. Also, 7.4 mostly fixes the index growth issue, especially as it applies to truncating/reloading a table over and over, so moving to 7.4 beta3/4 and testing might be a good idea (if you aren't there already). What you want to avoid is having postgresql switch back to that nestloop join on you in the middle of the day, and to prevent that you might need to have higher statistics targets so the planner gets the right number all the time. ************************************* This e-mail may contain privileged or confidential material intended for the named recipient only. If you are not the named recipient, delete this message and all attachments. Unauthorized reviewing, copying, printing, disclosing, or otherwise using information in this e-mail is prohibited. We reserve the right to monitor e-mail sent through our network. *************************************
pgsql-performance by date: