Re: Selectivity for lopsided foreign key columns - Mailing list pgsql-performance

From Mikkel Lauritsen
Subject Re: Selectivity for lopsided foreign key columns
Date
Msg-id e904acb8fb9dcefe076bd68ab7547763@webmail.tala.dk
Whole thread Raw
In response to Re: Selectivity for lopsided foreign key columns  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 2015-12-17 16:23, Tom Lane wrote:
> Mikkel Lauritsen <renard@tala.dk> writes:
>> The schema contains two tables, t1 and t2.
>> t2 has two fields, an id and a tag, and it contains 146 rows that are
>> unique.
>> t1 has two fields, a value and a foreign key referring to t2.id, and
>> it
>> contains 266177 rows.
>> The application retrieves the rows in t1 that match a specific tag in
>> t2, and it turned out that the contents of t1 were distributed in a
>> very
>> lopsided way, where more than 90% of the rows refer to one of two tags
>> from t2:
>> ...
>> The estimate for the number of rows in the result (1824) is way too
>> low,
>> and that leads to bad plans and queries involving more joins on the
>> tables that run about 1000x slower than they should.
>
>> I have currently rewritten the application code to do two queries; one
>> to retrieve the id from t2 that matches the given tag and one to
>> retrieve the rows from t1, and that's a usable workaround but not
>> something we really like doing as a permanent solution. Fiddling with
>> the various statistics related knobs seems to make no difference, but
>> is
>> there be some other way I can make Postgres assume high selectivity
>> for
>> certain tag values? Am I just SOL with the given schema?
>
> You're pretty much SOL.  Lacking cross-column statistics, the planner
> has
> no idea which t2.id goes with the given tag, so it can't see that the
> selected id is the one that is most common in t1.  You're getting a
> join size estimate that is basically size of t1 divided by number of
> possible values (146), which is about the best we can do without
> knowing
> which id is selected.

--- snip --

Thanks - I thought as much, but it's really nice to have it confirmed
from
people who are way more knowledgeable.

Best regards and thanks again,
   Mikkel Lauritsen


pgsql-performance by date:

Previous
From: Adam Brusselback
Date:
Subject: Re: Terrible plan choice for view with distinct on clause
Next
From: Mathieu VINCENT
Date:
Subject: Re: Estimation row error