Re: [PERFORM] Number of characters in column preventing index usage - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: [PERFORM] Number of characters in column preventing index usage
Date
Msg-id bdba910d-f3f5-7eaa-f66d-02ff29361cec@2ndquadrant.com
Whole thread Raw
In response to [PERFORM] Number of characters in column preventing index usage  (Hustler DBA <hustlerdba@gmail.com>)
Responses Re: [PERFORM] Number of characters in column preventing index usage
List pgsql-performance
Hi,

On 02/17/2017 11:19 PM, Hustler DBA wrote:
> I am seeing this strange behavior, I don't know if this is by design by
> Postgres.
>
> I have an index on a column which is defined as "character
> varying(255)". When the value I am searching for is of a certain length,
> the optimizer uses the index but when the value is long, the optimizer
> doesn't use the index but does a seq scan on the table. Is this by
> design? How can I make the optimizer use the index no matter what the
> size/length of the value being searched for?
>

AFAIK there are no such checks, i.e. the optimizer does not consider the
length of the value when deciding between scan types.

>
> PostgreSQL version: 9.4
>

That's good to know, but we also need information about the table
involved in your queries. I'd bet the table is tiny (it seems to be just
12 pages, so ~100kB), making the indexes rather useless.

> my_db=# explain (analyze, buffers) select count(*) from tab where ID =
> '01625cfa-2bf8-45cf' ;
>                                                               QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.048..0.048
> rows=1 loops=1)
>    Buffers: shared read=2
>    ->  Index Only Scan using tab_idx1 on tab  (cost=0.27..8.29 rows=1
> width=0) (actual time=0.043..0.043 rows=0 loops=1)
>          Index Cond: (ID = '01625cfa-2bf8-45cf'::text)
>          Heap Fetches: 0
>          Buffers: shared read=2
>  Planning time: 0.250 ms
>  Execution time: 0.096 ms
> (8 rows)
>
> my_db=# explain (analyze, buffers) select count(*) from tab where ID =
> '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea' ;
>                                                 QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=14.80..14.81 rows=1 width=0) (actual time=0.115..0.115
> rows=1 loops=1)
>    Buffers: shared hit=12
>    ->  Seq Scan on tab  (cost=0.00..14.79 rows=5 width=0) (actual
> time=0.031..0.108 rows=5 loops=1)
>          Filter: ((ID)::text = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'::text)
>          Rows Removed by Filter: 218
>          Buffers: shared hit=12
>  Planning time: 0.122 ms
>  Execution time: 0.180 ms
> (8 rows)

The only difference I see is that for the long value the planner expects
5 rows, while for the short one it expects 1 row. That may seem a bit
strange, but I'd bet it finds the short value in some statistic (MCV,
histogram) ans so can provide very accurate estimate. While for the
longer one, it ends up using some default (0.5% for equality IIRC) or
value deduced from ndistinct. Or something like that.

The differences between the two plans are rather negligible, both in
terms of costs (8.3 vs. 14.81) and runtime (0.1 vs 0.2 ms). The choice
of a sequential scan seems perfectly reasonable for such tiny tables.

FWIW it's impossible to draw conclusions based on two EXPLAIN ANALYZE
executions. The timing instrumentation from EXPLAIN ANALYZE may have
significant impact impact (different for each plan!). You also need to
testing with more values and longer runs, not just a single execution
(there are caching effects etc.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-performance by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: [PERFORM] Number of characters in column preventing index usage
Next
From: Alvaro Herrera
Date:
Subject: Re: [PERFORM] Number of characters in column preventing index usage