Re: plan problem - Mailing list pgsql-performance

From Tom Lane
Subject Re: plan problem
Msg-id 26675.1081529435@sss.pgh.pa.us
In response to Re: plan problem  (Ken Geis <kgeis@speakeasy.org>)
List pgsql-performance
Ken Geis <kgeis@speakeasy.org> writes:
> Does anyone think that the planner issue has merit to address?  Can
> someone help me figure out what code I would look at?

The planner doesn't currently attempt to "drill down" into a sub-select-
in-FROM to find statistics about the variables emitted by the sub-select.
So it's just falling back to a default estimate of the number of
distinct values coming out of the sub-select.

The "drilling down" part is not hard; the difficulty comes from trying
to figure out whether and how the stats from the underlying column would
need to be adjusted for the behavior of the sub-select itself.  As an
example, the result of (SELECT DISTINCT foo FROM bar) would usually have
much different stats from the raw bar.foo column.  In your example, the
LIMIT clause potentially affects the stats by reducing the number of
distinct values.
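
To make the adjustment problem concrete, here is a small Python sketch. It is not PostgreSQL code. The table contents, the function names, and the uniform-sampling formula are all assumptions chosen for illustration. It only shows why the column's raw statistics cannot be reused unchanged for the output of a sub-select that applies DISTINCT or LIMIT.

```python
import random
from collections import Counter


def most_common_fraction(values):
    """Fraction of rows taken up by the single most frequent value."""
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)


def expected_distinct_after_limit(n_distinct, limit):
    """Hypothetical estimate: expected number of distinct values among
    `limit` rows, ASSUMING each row is drawn uniformly and independently
    from `n_distinct` possible values. Real data is rarely this tidy."""
    return n_distinct * (1.0 - (1.0 - 1.0 / n_distinct) ** limit)


random.seed(0)

# Stand-in for the raw column bar.foo: 100,000 rows, at most 1,000 values.
raw = [random.randrange(1000) for _ in range(100_000)]

# Stand-in for (SELECT DISTINCT foo FROM bar): every output row is unique,
# so the frequency profile differs completely from the raw column.
distinct_out = list(set(raw))

# Stand-in for (SELECT foo FROM bar LIMIT 50): at most 50 distinct values
# can come out, however many the raw column holds.
limited_out = raw[:50]

raw_ndistinct = len(set(raw))
limited_ndistinct = len(set(limited_out))
estimate = expected_distinct_after_limit(raw_ndistinct, 50)

print("raw distinct values:       ", raw_ndistinct)
print("distinct after LIMIT 50:   ", limited_ndistinct)
print("uniform-model estimate:    ", round(estimate, 1))
print("top-value share, raw:      ", most_common_fraction(raw))
print("top-value share, DISTINCT: ", most_common_fraction(distinct_out))
```

The LIMIT case caps the distinct count at the row limit, and the DISTINCT case flattens the frequency distribution so that each value appears exactly once. Any scheme for borrowing the underlying column's stats would need a correction of this kind for each construct the sub-select uses.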

Now in most situations where the sub-select wouldn't change the stats,
there's no issue anyway because the planner will flatten the sub-select
into the main query.  So we really have to figure out the adjustment
part before we can think about doing much here.

            regards, tom lane
