On Tue, Aug 23, 2016 at 5:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 08/22/2016 07:42 PM, Alvaro Herrera wrote:
>> Also, if we patch it this way and somebody has a slow query because of a
>> lot of duplicate values, it's easy to solve the problem by
>> de-duplicating. But with the current code, people that have the
>> opposite problem have no way to work around it.
> I certainly agree it's better when a smart user can fix his query plan
> by deduplicating the values than when we end up generating a poor plan
> due to assuming some users are somewhat dumb.
Well, that seems to be the majority opinion, so attached is a patch to do it. Do we want to sneak this into 9.6, or wait?

> I wonder how expensive would it be to actually count the number of
> distinct values - there certainly are complex data types where the
> comparisons are fairly expensive, but I would not expect those to be
> used in explicit VALUES lists.
You'd have to sort the values before de-duping, and deal with VALUES expressions that aren't simple literals. And you'd have to do it a lot --- by my count, get_variable_numdistinct is invoked six times on the Var representing the VALUES output in Jeff's example query. Maybe you could cache the results somewhere, but that adds even more complication. Given the lack of prior complaints about this, it's hard to believe it's worth the trouble.
This patch still applies, and I think the argument for it is still valid, so I'm going to make a commitfest entry for it. Is there additional evidence we should gather?