Thread: Wrong estimate in query plan
In a complex query I noticed that the planner made a bad estimate for a join between two tables (I made sure statistics were up to date).
The join is on a single column. The first table has about 985 rows; all rows except one have a NULL value in the join column, and the one non-NULL value is 1. The other table has 237240 rows and every row has a 1 in the join column; that column cannot contain NULL values. There is a foreign key constraint between the join columns.
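For anyone who wants to reproduce this, here is a minimal sketch of the setup, using the table and column names that appear in the plan further down (relatie.filiaal and kassabon.filiaal_id). The data types and exact DDL are my assumptions, not taken from the real schema.

-- Sketch only: the real schema is not shown in this thread.
CREATE TABLE relatie (
    id      serial PRIMARY KEY,
    filiaal integer UNIQUE                -- NULL in all rows but one; that one row holds 1
);

CREATE TABLE kassabon (
    id         serial PRIMARY KEY,
    filiaal_id integer NOT NULL REFERENCES relatie (filiaal)   -- always 1
);

-- Populate roughly as described above: ~985 rows with one non-NULL value,
-- and 237240 referencing rows.
INSERT INTO relatie (filiaal) SELECT NULL FROM generate_series(1, 984);
INSERT INTO relatie (filiaal) VALUES (1);
INSERT INTO kassabon (filiaal_id) SELECT 1 FROM generate_series(1, 237240);

ANALYZE relatie;
ANALYZE kassabon;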
In pgAdmin I had a look at the statistics for the two columns (the same numbers can be pulled from pg_stats; see the query after the listings).
Column in first table
Null Fraction 0.998985
Average Width 4
Distinct Values -1
Most Common Values
Most Common Frequencies
Histogram Bounds
Correlation
What I noticed is that there are no most common values mentioned. Of course the value 1 only occurs once in the column, but as all other values are NULL you could argue it is a common value.
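A quick way to double-check that distribution (again using the names from the plan, so adjust for the real schema):

SELECT filiaal, count(*)
FROM relatie
GROUP BY filiaal
ORDER BY count(*) DESC;
-- expected here: NULL -> 984 rows, 1 -> 1 row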
Column in second table:
Null Fraction 0
Average Width 4
Distinct Values 1
Most Common Values {1}
Most Common Frequencies {1}
Histogram Bounds
Correlation 1
Looks fine :)
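For reference, the figures pgAdmin shows can be read straight from the pg_stats view (table/column names taken from the plan below; adjust if the real schema differs):

SELECT tablename, attname, null_frac, avg_width, n_distinct,
       most_common_vals, most_common_freqs, histogram_bounds, correlation
FROM pg_stats
WHERE (tablename, attname) IN (('relatie', 'filiaal'), ('kassabon', 'filiaal_id'));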
Relevant part of the EXPLAIN ANALYZE output:
->  Hash Join  (cost=40.16..6471.62 rows=241 width=58) (actual time=0.486..102.979 rows=237240 loops=1)
      Hash Cond: (kb.filiaal_id = fil.filiaal)
      ->  Seq Scan on kassabon kb  (cost=0.00..5539.40 rows=237240 width=42) (actual time=0.036..28.562 rows=237240 loops=1)
      ->  Hash  (cost=27.85..27.85 rows=985 width=20) (actual time=0.434..0.434 rows=1 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 1kB
            ->  Seq Scan on relatie fil  (cost=0.00..27.85 rows=985 width=20) (actual time=0.003..0.382 rows=985 loops=1)
Notice how it expects 241 rows but gets 237240.
I wondered if this should be reported as a bug? I think it goes wrong because the statistics of the column in the first table give no information about the values present, other than the NULL fraction.
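For what it's worth, the 241 estimate can be reproduced from the statistics above if you assume the planner fell back to its generic equality-join selectivity, roughly rows_outer * rows_inner * (1 - nullfrac1) * (1 - nullfrac2) / max(ndistinct1, ndistinct2), and read the n_distinct of -1 as "about as many distinct values as rows", i.e. ~985. This is just my back-of-the-envelope reading, not a claim about the exact code path the planner took:

SELECT round(
         237240              -- rows in kassabon
       * 985                 -- rows in relatie
       * (1 - 0.998985)      -- non-NULL fraction of relatie.filiaal
       * (1 - 0)             -- non-NULL fraction of kassabon.filiaal_id
       / greatest(985, 1)    -- n_distinct = -1 read as ~985 distinct values
       ) AS estimated_rows;  -- gives 241, matching the plan

That would also explain why the estimate is so far off: the statistics say "985 distinct values, almost all NULL", while in reality there is exactly one joinable value and it matches every row in kassabon.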
Another thought I had was that the planner could have known there was a matching row in the first table for each row in the second table because there is a foreign key constraint between the two.
Regards, Eelke
Eelke Klein wrote:
> What I noticed is that there are no most common values mentioned. Of course
> the value 1 only occurs once in the column, but as all other values are NULL
> you could argue it is a common value.

A random sampling is unlikely to choose a record that only appears in 0.1 percent of the table.

Two sequential scans plus a hash seems like a good plan. The smaller table is so small that a sequential scan is fast. The larger table is expected to have all records read, so it too should be scanned. Combining them with a hash seems sound.

The fact that the cross-column estimate is way off isn't that big a deal, though I'd be curious to hear Tom's opinion on why this is so, for educational purposes.

David J.