Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join - Mailing list pgsql-performance

From Ehrenreich, Sigrid
Subject Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join
Date
Msg-id AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com
Responses Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-performance

Hi Performance Guys,

 

I hope you can help me. I am joining two tables that have a foreign key relationship, so I expect the optimizer to estimate the number of resulting rows to be the same as the number of rows returned from one of the tables. But the estimate is way too low.

 

I have built a test case where the problem is easy to see.

 

Testcase:
-- create a large table with one column that has only 3 possible values; the other columns are only there to increase the selectivity

create table fact (low_card integer, anydata1 integer, anydata2 integer);

insert into fact (low_card, anydata1, anydata2) select floor(random()*3+1),floor(random()*1000+1),floor(random()*100+1) from generate_series(1,10000);

 

-- create a smaller table with only unique values to be referenced by foreign key

create table dim as (select distinct low_card, anydata1, anydata2 from fact);

create unique index on dim (low_card, anydata1, anydata2);

alter table fact add constraint fk foreign key (low_card, anydata1, anydata2) references dim (low_card, anydata1, anydata2);

 

analyze fact;

analyze dim;

 

And here comes the query:

explain analyze

select count(*) from fact inner join dim on (fact.low_card=dim.low_card and fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2)

where fact.low_card=1;

 

Aggregate  (cost=424.11..424.12 rows=1 width=8) (actual time=7.899..7.903 rows=1 loops=1)

  ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1)   <=========== With the FK, the estimation should be 3344, but it is 115 rows

        Hash Cond: ((fact.anydata1 = dim.anydata1) AND (fact.anydata2 = dim.anydata2))                                 

        ->  Seq Scan on fact  (cost=0.00..180.00 rows=3344 width=12) (actual time=0.025..2.289 rows=3344 loops=1)

              Filter: (low_card = 1)

              Rows Removed by Filter: 6656

        ->  Hash  (cost=176.89..176.89 rows=3292 width=12) (actual time=3.105..3.107 rows=3292 loops=1)

              Buckets: 4096  Batches: 1  Memory Usage: 174kB

              ->  Seq Scan on dim  (cost=0.00..176.89 rows=3292 width=12) (actual time=0.014..2.103 rows=3292 loops=1)

                    Filter: (low_card = 1)

                    Rows Removed by Filter: 6539

Planning Time: 0.619 ms

Execution Time: 7.973 ms

 

 

My problem is that in reality I am joining many more tables, and since the row estimates are so low, the optimizer goes for nested loops, leading to unacceptable execution times.
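(As a blunt session-level stopgap, not a fix for the estimates themselves, nested loops can be discouraged via a standard planner setting; this only masks the symptom, and turning it off globally is generally discouraged:)

```sql
-- Discourage nested loop joins for the current session only.
-- This works around the plan choice, not the underlying misestimate.
SET enable_nestloop = off;
```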

 

Question: How can I get the optimizer to use the information about the foreign key relationship and get accurate estimates?
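(One avenue I have seen suggested for correlated-column misestimates, sketched here as an untested idea rather than a confirmed fix for this FK case, is multi-column extended statistics, available since PostgreSQL 10; the statistics names below are made up for illustration:)

```sql
-- Hypothetical sketch: teach the planner about dependencies between
-- the three columns on each side of the join (names are illustrative).
CREATE STATISTICS fact_dep (dependencies) ON low_card, anydata1, anydata2 FROM fact;
CREATE STATISTICS dim_dep  (dependencies) ON low_card, anydata1, anydata2 FROM dim;
ANALYZE fact;
ANALYZE dim;
-- Then re-run the EXPLAIN ANALYZE above to see whether the join estimate improves.
```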

 

Sigrid Ehrenreich

 
