Optimizer's issue - Mailing list pgsql-performance

From Vlad Arkhipov
Subject Optimizer's issue
Date
Msg-id 480FDF0E.3000909@dc.baikal.ru
Whole thread Raw
Responses Re: Optimizer's issue
Re: Optimizer's issue
List pgsql-performance
I found strange issue in very simple query. Statistics for all columns
is on the level 1000 but I also tried other levels.

create table g (
  id bigint primary key,
  isgroup boolean not null);

create table a (
  groupid bigint references g(id),
  id bigint,
  unique(id, groupid));

analyze g;
analyze a;

select count(*) from a
294

select count(*) from g
320

explain analyze
select *
from g
  join a on a.groupid = g.id
where g.isgroup

Hash Join  (cost=5.35..11.50 rows=11 width=25) (actual time=0.261..1.755
rows=294 loops=1)
  Hash Cond: (a.groupid = g.id)
  ->  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual
time=0.047..0.482 rows=294 loops=1)
  ->  Hash  (cost=5.20..5.20 rows=12 width=9) (actual time=0.164..0.164
rows=12 loops=1)
        ->  Seq Scan on g  (cost=0.00..5.20 rows=12 width=9) (actual
time=0.042..0.136 rows=12 loops=1)
              Filter: isgroup
Total runtime: 2.225 ms

And this is more interesting:
explain analyze
select *
from g
  join a on a.groupid = g.id
where not g.isgroup

Hash Join  (cost=9.05..17.92 rows=283 width=25) (actual
time=2.038..2.038 rows=0 loops=1)
  Hash Cond: (a.groupid = g.id)
  ->  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual
time=0.046..0.478 rows=294 loops=1)
  ->  Hash  (cost=5.20..5.20 rows=308 width=9) (actual time=1.090..1.090
rows=308 loops=1)
        ->  Seq Scan on g  (cost=0.00..5.20 rows=308 width=9) (actual
time=0.038..0.557 rows=308 loops=1)
              Filter: (NOT isgroup)
Total runtime: 2.126 ms

PostgreSQL 8.3
These queries are part of big query and optimizer put them on the leaf
of query tree, so rows miscount causes a real problem.

Statistics for table a:
id
--
histogram_bounds: {1,40,73,111,143,174,204,484,683,715,753}
correlation: 0.796828

groupid
-------
n_distinct: 12
most_common_vals: {96,98,21,82,114,131,48,44,173,682,752}
most_common_freqs:
{0.265306,0.166667,0.163265,0.136054,0.0884354,0.0782313,0.0714286,0.00680272,0.00680272,0.00680272,0.00680272}
correlation: 0.366704

for table g:
id
--
histogram_bounds: {1,32,64,101,134,166,199,451,677,714,753}
correlation: 1

isgroup
-------
n_distinct: 2
most_common_freqs: {0.9625,0.0375}
correlation: 0.904198



pgsql-performance by date:

Previous
From: "Jeffrey Baker"
Date:
Subject: Re: SELECT 'DBD::Pg ping test'
Next
From: "Albe Laurenz"
Date:
Subject: Re: Optimizer's issue