Re: LIMIT causes SEQSCAN in subselect - Mailing list pgsql-performance

From Tom Lane
Subject Re: LIMIT causes SEQSCAN in subselect
Date
Msg-id 17528.1102788383@sss.pgh.pa.us
Whole thread Raw
In response to Re: LIMIT causes SEQSCAN in subselect  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
> The fact that the estimator knows that the LIMIT is pointless because there
> are less rows in the subselect than the LIMIT will return is not something we
> want to count on; sometimes the estimator has innaccurate information.

However, when the estimator is basing that estimate on the existence of
a unique index for the column, the estimate could be trusted.  There are
a couple of reasons that we don't perform that optimization at present,
though:

1. If the finished query plan doesn't actually *use* the index in
question, then dropping the index would not directly invalidate the
query plan, but nonetheless the query would be broken.  You could
subsequently get silently-wrong answers.

2. For the particular point at hand, there's an implementation problem,
which is that decisions about whether to flatten subqueries are taken
before we do any rowcount estimation.  So even if we discarded the LIMIT
clause once we realized it was redundant, it'd be too late to get the
optimal overall plan.

Point #1 is something I would like to fix whenever we get around to
implementing proper invalidation of cached plans.  There would need to
be a way to list "indirect" as well as direct dependencies of a plan.

            regards, tom lane

pgsql-performance by date:

Previous
From: Alvaro Nunes Melo
Date:
Subject: Very different index usage on similar tables
Next
From: tomas@nocrew.org (Tomas Skäre)
Date:
Subject: Re: [GENERAL] Query is not using index when it should