Re: Hash join on int takes 8..114 seconds - Mailing list pgsql-performance
From | Andrus |
---|---|
Subject | Re: Hash join on int takes 8..114 seconds |
Date | |
Msg-id | E3AFB54F617D47F4BE652E85ECC67153@andrusnotebook Whole thread Raw |
In response to | Re: Hash join on int takes 8..114 seconds (PFC <lists@peufeu.com>) |
Responses |
Re: Hash join on int takes 8..114 seconds
|
List | pgsql-performance |
> You could perhaps run a little check on the performance of the RAID, is > it better than linux software RAID ? > Does it leverage NCQ appropriately when running queries in parallel ? I was told that this RAID is software RAID. I have no experience what to check. This HP server was installed 3 years ago and in this time it was not high perfomance server. >>> explain analyze >>> SELECT sum(1) >>> FROM dok JOIN rid USING (dokumnr) >>> JOIN toode USING (toode) >>> LEFT JOIN artliik using(grupp,liik) >>> WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > > By the way, note that the presence of the toode table in the query above > is not required at all, unless you use columns of toode in your > aggregates. In real query, SELECT column list contains data form sales table dok (sale date and time) and sales detail table rid (quantity, price) WHERE clause may contain additional filters from product table (product category, supplier). > Let's play with that, after all, it's friday night. Thank you very much for great sample. I tried to create testcase from this to match production db: 1.2 million orders 3.5 million order details 13400 products with char(20) as primary keys containing ean-13 codes mostly 3 last year data every order has usually 1..3 detail lines same product can appear multiple times in order products are queried by start of code This sample does not distribute products randomly between orders. How to change this so that every order contains 3 (or 1..6 ) random products? I tried to use random row sample from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i but in this case constant product is returned always. It seems than query containing randon() is executed only once. Andrus. begin; CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL); CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name char(70) NOT NULL, quantity numeric(12,2) default 1); CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id CHAR(20),padding1 char(70), id serial, price numeric(12,2) default 1 ); INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20), 'product number ' || n::TEXT FROM generate_series(0,13410) AS n; INSERT INTO orders SELECT n,'2005-01-01'::date + (4000.0 * n/3500000.0 * '1 DAY'::interval) FROM generate_series(0,3500000/3) AS n; SET work_mem TO 2097151; -- 1048576; INSERT INTO orders_products SELECT generate_series/3 as order_id, ( ((generate_series/3500000.0)*13410.0)::int*power( 10,13))::INT8::CHAR(20) FROM generate_series(1,3500000) where generate_series/3>0; ALTER TABLE orders ADD PRIMARY KEY (order_id); ALTER TABLE products ADD PRIMARY KEY (product_id); ALTER TABLE orders_products ADD PRIMARY KEY (id); ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products(product_id); ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE; CREATE INDEX orders_date ON orders( order_date ); CREATE INDEX order_product_pattern_idx ON orders_products( product_id bpchar_pattern_ops ); COMMIT; SET work_mem TO DEFAULT; ANALYZE; SELECT sum(quantity*price) FROM orders JOIN orders_products USING (order_id) JOIN products USING (product_id) WHERE orders.order_date>='2008-01-17' and orders_products.product_id like '130%'
pgsql-performance by date: