Re: distinct estimate of a hard-coded VALUES list - Mailing list pgsql-hackers

From Robert Haas
Subject Re: distinct estimate of a hard-coded VALUES list
Date
Msg-id CA+TgmoZVPBfUgcjwnC+HM+dAM6CcEw4vUL3x-EwqnY-oTa_69g@mail.gmail.com
In response to Re: distinct estimate of a hard-coded VALUES list  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: distinct estimate of a hard-coded VALUES list  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: distinct estimate of a hard-coded VALUES list  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: distinct estimate of a hard-coded VALUES list  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> It does know it, what it doesn't know is how many duplicates there are.
>
>> Does it know whether the count comes from a parsed query-string list/array,
>> rather than being an estimate from something else?  If it came from a join,
>> I can see why it would be dangerous to assume they are mostly distinct.
>> But if someone throws 6000 things into a query string and only 200 distinct
>> values among them, they have no one to blame but themselves when it makes
>> bad choices off of that.
>
> I am not exactly sold on this assumption that applications have
> de-duplicated the contents of a VALUES or IN list.  They haven't been
> asked to do that in the past, so why do you think they are doing it?

It's hard to know, but my intuition is that most people would
deduplicate.  I mean, nobody is going to want their query generator
to send X IN (1, 1, <repeat a zillion more times>) to the server if it
could have just sent X IN (1).
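
For illustration only, a minimal sketch of what deduplicating on the
client side might look like in a query generator.  The helper name
build_in_clause and the "%s" placeholder style are assumptions, not
something taken from any particular driver or application, and the
column name is assumed to come from trusted code rather than user input.

```python
def build_in_clause(column, values):
    """Return (sql_fragment, params) for `column IN (...)` with duplicates removed.

    Hypothetical helper: `column` is assumed to be a trusted identifier,
    and "%s" is an assumed placeholder style.
    """
    # dict.fromkeys drops repeated values while keeping first-seen order.
    unique = list(dict.fromkeys(values))
    if not unique:
        # An empty IN list is not valid SQL, so emit a predicate that matches nothing.
        return "FALSE", []
    placeholders = ", ".join(["%s"] * len(unique))
    return f"{column} IN ({placeholders})", unique


if __name__ == "__main__":
    fragment, params = build_in_clause("x", [1, 1, 1, 2])
    print(fragment, params)
```

With something like this in place, the length of the list the server
sees is the number of distinct values, which is the case the planner
estimate under discussion would be assuming.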

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


