Query uses incorrect index - Mailing list pgsql-performance

From pasman pasmański
Subject Query uses incorrect index
Date
Msg-id AANLkTi=OeYtT=zEOP7aZg7vbvZD+so=Tk_W+UoOezJGt@mail.gmail.com
Whole thread Raw
Responses Re: Query uses incorrect index  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
hello.

I ve the table NumeryA with 3 indices. Query below uses incorrect index.


SELECT
  A."NKA",
  A."NTA",
  Min("PołączeniaMin") || ',' || Max("PołączeniaMax") AS "Biling",
  Sum("Ile")::text AS "Ilość CDR",
  R."LP"::text AS "Sprawa",
  R."Osoba weryfikująca" AS "Osoba",
  to_char(min("Wartość"),'FM9999990D00') AS "Wartość po kontroli",
  max(R."Kontrola po naprawie w Serat - CDR")::text AS "CDR po kontroli",
  min(A."KodBłędu")::text AS KodBłędu,
  Max(to_char(R."Data kontroli",'YYYY-MM-DD')) AS "Ostatnia Kontrola"
, max("Skutek wprowadzenia błednej ewidencji w Serat") as "Skutek"
FROM
  ONLY "NumeryA" A
LEFT JOIN
  (select * from "Rejestr stacji do naprawy" where "Data weryfikacji"
>= current_date-3*30) R
ON
  A."NKA" = R."Numer kierunkowy"
  and substr(A."NTA",1,5) = substr(R."Numer stacji",1,5)
  and A."NTA" like R."Numer stacji"
  and A."KodBłędu" = R."Kod Błędu"
WHERE
  A."DataPliku" >= current_date-3*30
  and A."KodBłędu" similar to '74'
GROUP
  BY R."Osoba weryfikująca",R."LP",A."NKA", A."NTA"
ORDER
  BY Sum("Ile") DESC
LIMIT 4000

This query has plan:

----------------------------------------------------------------
Limit  (cost=9656.43..9666.43 rows=4000 width=96) (actual
time=2149.383..2174.363 rows=4000 loops=1)
  ->  Sort  (cost=9656.43..9716.86 rows=24175 width=96) (actual
time=2149.373..2158.355 rows=4000 loops=1)
        Sort Key: (sum(a.Ile"))"
        Sort Method:  top-N heapsort  Memory: 1028kB
        ->  HashAggregate  (cost=6711.21..8089.19 rows=24175 width=96)
(actual time=2040.721..2110.075 rows=9080 loops=1)
              ->  Merge Left Join  (cost=5338.65..5925.53 rows=24175
width=96) (actual time=1180.490..1717.727 rows=33597 loops=1)
                    Merge Cond: (((a.NKA")::text = ("Rejestr stacji do
naprawy"."Numer kierunkowy")::text) AND ((substr((a."NTA")::text, 1,
5)) = (substr(("Rejestr stacji do naprawy"."Numer stacji")::text, 1,
5))) AND ((a."KodBłędu")::text = ("Rejestr stacji do naprawy"."Kod
Błędu")::text))"
                    Join Filter: ((a.NTA")::text ~~ ("Rejestr stacji
do naprawy"."Numer stacji")::text)"
                    ->  Sort  (cost=3565.16..3625.60 rows=24175
width=42) (actual time=819.034..900.141 rows=33597 loops=1)
                          Sort Key: a.NKA", (substr((a."NTA")::text,
1, 5)), a."KodBłędu""
                          Sort Method:  quicksort  Memory: 5487kB
                          ->  Index Scan using dp_kb on NumeryA" a
(cost=0.01..1805.07 rows=24175 width=42) (actual time=0.295..197.627
rows=33597 loops=1)"
                                Index Cond: (DataPliku" >=
(('now'::text)::date - 90))"
                                Filter: ((KodBłędu")::text ~
'***:^(?:74)$'::text)"
                    ->  Sort  (cost=1773.49..1811.23 rows=15096
width=67) (actual time=361.430..434.675 rows=32948 loops=1)
                          Sort Key: Rejestr stacji do naprawy"."Numer
kierunkowy", (substr(("Rejestr stacji do naprawy"."Numer
stacji")::text, 1, 5)), "Rejestr stacji do naprawy"."Kod Błędu""
                          Sort Method:  quicksort  Memory: 2234kB
                          ->  Bitmap Heap Scan on Rejestr stacji do
naprawy"  (cost=141.75..725.68 rows=15096 width=67) (actual
time=2.604..51.567 rows=14893 loops=1)"
                                Recheck Cond: (Data weryfikacji" >=
(('now'::text)::date - 90))"
                                ->  Bitmap Index Scan on Data
weryfikacji_Kod Błędu"  (cost=0.00..137.98 rows=15096 width=0) (actual
time=2.463..2.463 rows=15462 loops=1)"
                                      Index Cond: (Data weryfikacji"
>= (('now'::text)::date - 90))"
Total runtime: 2186.011 ms


When i delete index dp_kb, query runs faster:

-------------------------------------------------------------------------
Limit  (cost=15221.69..15231.69 rows=4000 width=96) (actual
time=1296.896..1322.144 rows=4000 loops=1)
  ->  Sort  (cost=15221.69..15282.13 rows=24175 width=96) (actual
time=1296.887..1305.993 rows=4000 loops=1)
        Sort Key: (sum(a.Ile"))"
        Sort Method:  top-N heapsort  Memory: 1028kB
        ->  HashAggregate  (cost=12276.48..13654.45 rows=24175
width=96) (actual time=1188.706..1257.669 rows=9080 loops=1)
              ->  Merge Left Join  (cost=0.01..11490.79 rows=24175
width=96) (actual time=0.220..840.102 rows=33597 loops=1)
                    Merge Cond: (((a.NKA")::text = ("Rejestr stacji do
naprawy"."Numer kierunkowy")::text) AND (substr((a."NTA")::text, 1, 5)
= substr(("Rejestr stacji do naprawy"."Numer stacji")::text, 1, 5))
AND ((a."KodBłędu")::text = ("Rejestr stacji do naprawy"."Kod
Błędu")::text))"
                    Join Filter: ((a.NTA")::text ~~ ("Rejestr stacji
do naprawy"."Numer stacji")::text)"
                    ->  Index Scan using NTA_5" on "NumeryA" a
(cost=0.01..10016.75 rows=24175 width=42) (actual time=0.132..308.018
rows=33597 loops=1)"
                          Index Cond: (((KodBłędu")::text =
'74'::text) AND ("DataPliku" >= (('now'::text)::date - 90)))"
                          Filter: ((KodBłędu")::text ~ '***:^(?:74)$'::text)"
                    ->  Index Scan using 3" on "Rejestr stacji do
naprawy"  (cost=0.01..1002.73 rows=15096 width=67) (actual
time=0.047..129.840 rows=32948 loops=1)"
                          Index Cond: (Rejestr stacji do
naprawy"."Data weryfikacji" >= (('now'::text)::date - 90))"
Total runtime: 1333.347 ms


How to tune settings to use good index ?
Include definitions of indexes:

CREATE TABLE "NumeryA"
(
  "Plik" character varying(254) NOT NULL,
  "DataPliku" date,
  "KodBłędu" character varying(254) NOT NULL,
  "NKA" character varying(254) NOT NULL,
  "NTA" character varying(254) NOT NULL,
  "Ile" integer,
  "PołączeniaMin" character varying,
  "PołączeniaMax" character varying,
  "Wycofane" "char",
  "Data" character varying[],
  "ID Kobat" character varying[],
  "NRB" character varying[],
  "LP" integer,
  CONSTRAINT "NumeryA_1_pkey" PRIMARY KEY ("NTA", "NKA", "KodBłędu", "Plik")
)
WITH (
  OIDS=FALSE
);


CREATE INDEX "NTA_5"
  ON "NumeryA"
  USING btree
  ("NKA", substr("NTA"::text, 1, 5), "KodBłędu", "DataPliku");

CREATE INDEX dp_kb
  ON "NumeryA"
  USING btree
  ("DataPliku");

CREATE INDEX nka_nta
  ON "NumeryA"
  USING btree
  ("NKA", "NTA");


Here my planner settings:

-----------------------------------------------------------
# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

seq_page_cost = 0.3            # measured on an arbitrary scale
random_page_cost = 0.5            # same scale as above
cpu_tuple_cost = 0.007            # same scale as above
#cpu_index_tuple_cost = 0.005        # same scale as above
#cpu_operator_cost = 0.0025        # same scale as above
#effective_cache_size = 128MB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5            # range 1-10
#geqo_pool_size = 0            # selects default based on effort
#geqo_generations = 0            # selects default based on effort
#geqo_selection_bias = 2.0        # range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 25        # range 1-10000
constraint_exclusion = partition    # on, off, or partition
cursor_tuple_fraction = 0.05        # range 0.0-1.0
from_collapse_limit = 8
join_collapse_limit = 8            # 1 disables collapsing of explicit
                    # JOIN clauses



------------
pasman

pgsql-performance by date:

Previous
From: Mladen Gogala
Date:
Subject: Re: CPU bound
Next
From: "Kevin Grittner"
Date:
Subject: Re: Query uses incorrect index