Home > mailing lists

Re: Talking about optimizer, my long dream - Mailing list pgsql-performance

From	Віталій Тимчишин
Subject	Re: Talking about optimizer, my long dream
Date	February 27, 2011 18:21:03
Msg-id	AANLkTin8z=cyBJJGa79wEXOFMZ7kYn7wxCF4ZsCkgpYw@mail.gmail.com Whole thread
In response to	Re: Talking about optimizer, my long dream (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Talking about optimizer, my long dream
List	pgsql-performance

Tree view

27 лютого 2011 р. 19:59 Robert Haas <robertmhaas@gmail.com> написав:

2011/2/4 Віталій Тимчишин <tivv00@gmail.com>:
> Hi, all.
> All this optimizer vs hint thread reminded me about crazy idea that got to
> my head some time ago.
> I currently has two problems with postgresql optimizer
> 1) Dictionary tables. Very usual thing is something like "select * from
> big_table where distionary_id = (select id from dictionary where
> name=value)". This works awful if dictionary_id distribution is not uniform.

Does it work better if you write it as a join?

SELECT b.* FROM big_table b, dictionary d WHERE b.dictionary_id = d.id
AND d.name = 'value'

I would like to see a concrete example of this not working well,
because I've been writing queries like this (with MANY tables) for
years and it's usually worked very well for me.

Here you are:

PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

create table a(dict int4, val int4);

create table b(dict int4, name text);

create index c on a(dict);

insert into b values (1, 'small'), (2, 'large');

insert into a values (1,1);

insert into a select 2,generate_series(1,10000);

analyze a;

analyze b;

test=# explain analyze select * from a where dict=1;

QUERY PLAN

-----------------------------------------------------------------------------------------------------

Index Scan using c on a (cost=0.00..8.27 rows=1 width=8) (actual time=0.014..0.016 rows=1 loops=1)

Index Cond: (dict = 1)

Total runtime: 0.041 ms

(3 rows)

test=# explain analyze select * from a where dict=2;

QUERY PLAN

-----------------------------------------------------------------------------------------------------

Seq Scan on a (cost=0.00..170.01 rows=10000 width=8) (actual time=0.014..6.876 rows=10000 loops=1)

Filter: (dict = 2)

Total runtime: 13.419 ms

(3 rows)

test=# explain analyze select * from a,b where a.dict=b.dict and b.name='small';

QUERY PLAN

-----------------------------------------------------------------------------------------------------------

Hash Join (cost=1.04..233.55 rows=5000 width=18) (actual time=0.047..13.159 rows=1 loops=1)

Hash Cond: (a.dict = b.dict)

-> Seq Scan on a (cost=0.00..145.01 rows=10001 width=8) (actual time=0.009..6.633 rows=10001 loops=1)

-> Hash (cost=1.02..1.02 rows=1 width=10) (actual time=0.011..0.011 rows=1 loops=1)

-> Seq Scan on b (cost=0.00..1.02 rows=1 width=10) (actual time=0.006..0.008 rows=1 loops=1)

Filter: (name = 'small'::text)

Total runtime: 13.197 ms

(7 rows)

test=# explain analyze select * from a,b where a.dict=b.dict and b.name='large';

QUERY PLAN

-----------------------------------------------------------------------------------------------------------

Hash Join (cost=1.04..233.55 rows=5000 width=18) (actual time=0.074..21.476 rows=10000 loops=1)

Hash Cond: (a.dict = b.dict)

-> Seq Scan on a (cost=0.00..145.01 rows=10001 width=8) (actual time=0.012..7.085 rows=10001 loops=1)

-> Hash (cost=1.02..1.02 rows=1 width=10) (actual time=0.021..0.021 rows=1 loops=1)

-> Seq Scan on b (cost=0.00..1.02 rows=1 width=10) (actual time=0.015..0.016 rows=1 loops=1)

Filter: (name = 'large'::text)

Total runtime: 28.293 ms

(7 rows)

It simply don't know that small=1 and large=2, so it never uses nested loop + iindex scan:

test=# set enable_hashjoin=false;

SET

test=# explain analyze select * from a,b where a.dict=b.dict and b.name='small';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.00..253.28 rows=5000 width=18) (actual time=0.041..0.047 rows=1 loops=1)

-> Seq Scan on b (cost=0.00..1.02 rows=1 width=10) (actual time=0.010..0.012 rows=1 loops=1)

Filter: (name = 'small'::text)

-> Index Scan using c on a (cost=0.00..189.75 rows=5000 width=8) (actual time=0.021..0.023 rows=1 loops=1)

Index Cond: (a.dict = b.dict)

Total runtime: 0.089 ms

(6 rows)

--
Best regards,
Vitalii Tymchyshyn

pgsql-performance by date:

From: Tom Lane
Date: 27 February 2011, 14:40:09
Subject: Re: Bad query plan when the wrong data type is used

From: Selva manickaraja
Date: 27 February 2011, 23:26:47
Subject: Performance Test for PostgreSQL9

Re: Talking about optimizer, my long dream - Mailing list pgsql-performance

Previous

Next