Re: Hash join on int takes 8..114 seconds - Mailing list pgsql-performance

From Andrus
Subject Re: Hash join on int takes 8..114 seconds
Date
Msg-id E3AFB54F617D47F4BE652E85ECC67153@andrusnotebook
Whole thread Raw
In response to Re: Hash join on int takes 8..114 seconds  (PFC <lists@peufeu.com>)
Responses Re: Hash join on int takes 8..114 seconds
List pgsql-performance
> You could perhaps run a little check on the performance of the RAID, is
> it better than linux software RAID ?
> Does it leverage NCQ appropriately when running queries in parallel ?

I was told that this RAID is software RAID.
I have no experience what to check.
This HP server was installed 3 years ago and in this time it was not
high perfomance server.

>>> explain analyze
>>> SELECT sum(1)
>>>   FROM dok JOIN rid USING (dokumnr)
>>>   JOIN toode USING (toode)
>>>   LEFT JOIN artliik using(grupp,liik)
>>>   WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
>
> By the way, note that the presence of the toode table in the query above
> is not required at all, unless you use columns of toode in your
> aggregates.

In real query, SELECT column list contains data form sales table dok (sale
date and time)
and sales detail table rid (quantity, price)
WHERE clause may contain additional filters from product table (product
category, supplier).

> Let's play with that, after all, it's friday night.

Thank you very much for great sample.
I tried to create testcase from this to match production db:

1.2 million orders
3.5 million order details
13400 products with char(20) as primary keys containing ean-13 codes mostly
3 last year data
every order has usually 1..3 detail lines
same product can appear multiple times in order
products are queried by start of code

This sample does not distribute products randomly between orders.
How to change this so that every order contains 3 (or 1..6 ) random
products?
I tried to use random row sample from
 http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i

but in this case constant product is returned always. It seems than query
containing randon() is executed only once.


Andrus.

begin;
CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT
NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id
CHAR(20),padding1 char(70),
  id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
       'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n,'2005-01-01'::date + (4000.0 * n/3500000.0 * '1 DAY'::interval)
FROM generate_series(0,3500000/3) AS n;

SET work_mem TO 2097151; -- 1048576;

INSERT INTO orders_products SELECT
   generate_series/3 as  order_id,
   ( ((generate_series/3500000.0)*13410.0)::int*power(
10,13))::INT8::CHAR(20)
FROM generate_series(1,3500000)
where generate_series/3>0;

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES
orders(order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id
bpchar_pattern_ops );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;

  SELECT sum(quantity*price)
 FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>='2008-01-17'
and orders_products.product_id like '130%'


pgsql-performance by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: Perc 3 DC
Next
From: "Andrus"
Date:
Subject: seq scan over 3.3 million rows instead of single key index access