On Fri, Jun 15, 2012 at 3:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> However, as Ants points out, we could make it work better for the
> special case where we're not actually doing any aggregation, because
> in that case we can emit the row for each group when the group is
> created, rather than waiting until end-of-input. This is only going
> to help when there is a LIMIT, though. Moreover, if there happens to
> be an ORDER BY, then the data will have to be pre-sorted, in which
> case you may as well use a sorted aggregate. So the use case for this
> optimization is basically DISTINCT plus LIMIT but not ORDER BY.
Exactly. I think the first question for this patch should be whether
this use-case is worth the complexity of the patch. I can't imagine
any really compelling use cases that need an arbitrary distinct subset
of results. The original complaint on -performance [1], didn't specify
a real world use case, but it seemed to be a case of an ORM generating
suboptimal queries. On the other hand, the patch itself is in my
opinion rather simple, so it might be worth it.
It has one outstanding issue, query_planner chooses the cheapest path
based on total cost. This can be suboptimal when that path happens to
have high startup cost. It seems to me that enabling the query_planner
to find the cheapest unsorted path returning a limited amount of
tuples would require some major surgery to the planner. To be clear,
this is only a case of missed optimization, not a regression.
It won't help set returning functions because the tuplestore for those
is fully materialized when the first row is fetched.
[1] http://archives.postgresql.org/message-id/16737833.463.1332881676120.JavaMail.geo-discussion-forums%40pbcpw7
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de