Re: [PERFORM] Number of characters in column preventing index usage - Mailing list pgsql-performance

From Hustler DBA
Subject Re: [PERFORM] Number of characters in column preventing index usage
Date
Msg-id CAM00CHFeSv-01S31WKNTkeyNJFf7buZMgrFqnU-4mbmdBKZbjQ@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] Number of characters in column preventing index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Thanks you guys are correct... the size of the table caused the optimizer to do a seq scan instead of using the index. I tried it on a  24 MB and 1 GB table and the expected index was used. 



On Fri, Feb 17, 2017 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Fri, Feb 17, 2017 at 3:49 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
> wrote:
>> That may seem a bit strange, but I'd bet it finds the short value in some
>> statistic (MCV, histogram) ans so can provide very accurate estimate.

> ​I'm not seeing how any of the statistic columns would capture a value that
> doesn't actually appear in the table...(actual ... row=0)​

I think it's the other way around.  It found
'01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea' in the stats, concluded
(accurately) that there would be five matches, and on the strength of that
decided that a seqscan over this very tiny table would be faster than an
indexscan.  In the other case, the short string exists neither in the
table nor the stats, and the default estimate is turning out to be that
there's a single match, for which it likes the indexscan solution.  This
is all pretty unsurprising if '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'
is in the most-common-values list.  Anything that's *not* in that list
is going to get a smaller rowcount estimate.  (I don't think that the
string length, per se, has anything to do with it.)

I'm not sure what performance problem the OP was looking to solve,
but expecting experiments on toy-sized tables to give the same plans
as you get on large tables is a standard mistake when learning to work
with the PG planner.

Also, if toy-sized tables are all you've got, meaning the whole database
can be expected to stay RAM-resident at all times, it'd be a good idea
to reduce random_page_cost to reflect that.  The default planner cost
settings are meant for data that's mostly on spinning rust.

                        regards, tom lane


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PERFORM] Number of characters in column preventing index usage
Next
From: Mike Beaton
Date:
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres