Optimize sort before groupping - Mailing list pgsql-general

From pasman pasmański
Subject Optimize sort before groupping
Date
Msg-id CAOWY8=Z6ho+jLi2QgVfBDz8MjCmfrGOtJ3JLkYixK4NZT7ebkw@mail.gmail.com
Whole thread Raw
List pgsql-general
Hi.

Sometimes order of rows readed from index allows to skip sort node.
But sometimes planner don't deduce it. In example below order from index
"NumerStacji_NumerKierunkowy_KodBłędu_LP"
is preserved in merge join and can be used in groupping node, but planner
don't see it.

First query and explain:

SELECT
  "NKA","NTA",
  count(nullif(b."Pierwszy zwrot"=b."DataPliku",false)) as "CDR",
  min(b."Data") || ',' || max(b."Data") as "Biling",
  b."KodBłędu",
  sum((b."Pierwszy zwrot"=b."DataPliku" and not b."Darmowe")::integer)
as "Odpłatne",
  max(r."LP")
FROM
  "Bladpol2" b left join "Rejestr stacji do naprawy" r
  on (
    b."NTA" = r."Numer stacji"
    and b."NKA" = r."Numer kierunkowy"
    and b."KodBłędu" = r."Kod Błędu"
    and replace(b."Data",':','.')::cube && r."Zakres"
  )
WHERE
  b."KodBłędu" similar to '74|80|81' and r."Wartość" is null
GROUP BY
  b."NTA",b."NKA",b."KodBłędu",r."LP"
HAVING
  not bool_and(b."Darmowe")
ORDER BY
  max(b."Data") desc
LIMIT 4000;

QUERY PLAN
Limit  (cost=191422.67..191432.67 rows=4000 width=42) (actual
time=57136.554..57161.084 rows=4000 loops=1)
  ->  Sort  (cost=191422.67..192855.26 rows=573034 width=42) (actual
time=57136.546..57145.420 rows=4000 loops=1)
"        Sort Key: (max((b.""Data"")::text))"
        Sort Method: top-N heapsort  Memory: 660kB
        ->  GroupAggregate  (cost=122756.63..154273.50 rows=573034
width=42) (actual time=49821.500..56670.665 rows=64064 loops=1)
"              Filter: (NOT bool_and(b.""Darmowe""))"
              ->  Sort  (cost=122756.63..124189.21 rows=573034
width=42) (actual time=49821.318..51902.438 rows=865978 loops=1)
"                    Sort Key: b.""NTA"", b.""NKA"", b.""KodBłędu"", r.""LP"""
                    Sort Method: external sort  Disk: 42824kB
                    ->  Merge Left Join  (cost=19.16..60017.63
rows=573034 width=42) (actual time=0.337..26655.744 rows=865978
loops=1)
"                          Merge Cond: (((b.""NTA"")::text =
(r.""Numer stacji"")::text) AND ((b.""NKA"")::text = (r.""Numer
kierunkowy"")::text) AND ((b.""KodBłędu"")::text = (r.""Kod
Błędu"")::text))"
"                          Join Filter: ((replace((b.""Data"")::text,
':'::text, '.'::text))::cube && r.""Zakres"")"
"                          Filter: (r.""Wartość"" IS NULL)"
"                          ->  Index Scan using
""Bladpol2_nta_nka_kod_błędu_btree"" on ""Bladpol2"" b
(cost=0.00..46593.65 rows=1452312 width=38) (actual
time=0.152..8513.305 rows=1439383 loops=1)"
"                                Filter: ((""KodBłędu"")::text ~
'^(?:74|80|81)$'::text)"
"                          ->  Index Scan using
""NumerStacji_NumerKierunkowy_KodBłędu_LP"" on ""Rejestr stacji do
naprawy"" r  (cost=0.00..1405.83 rows=70476 width=47) (actual
time=0.014..3146.595 rows=1045687 loops=1)"
Total runtime: 57183.476 ms




In second query planner skip sort before groupping:

SELECT
  "NKA","NTA",
  count(nullif(b."Pierwszy zwrot"=b."DataPliku",false)) as "CDR",
  min(b."Data") || ',' || max(b."Data") as "Biling",
  b."KodBłędu",
  sum((b."Pierwszy zwrot"=b."DataPliku" and not b."Darmowe")::integer)
as "Odpłatne",
  max(r."LP")
FROM
  "Bladpol2" b left join "Rejestr stacji do naprawy" r
  on (
    b."NTA" = r."Numer stacji"
    and b."NKA" = r."Numer kierunkowy"
    and b."KodBłędu" = r."Kod Błędu"
    and replace(b."Data",':','.')::cube && r."Zakres"
  )
WHERE
  b."KodBłędu" similar to '74|80|81' and r."Wartość" is null
GROUP BY
  b."NTA",b."NKA",b."KodBłędu"--,r."LP"
HAVING
  not bool_and(b."Darmowe")
ORDER BY
  max(b."Data") desc
LIMIT 4000;

QUERY PLAN
Limit  (cost=91667.54..91677.54 rows=4000 width=42) (actual
time=32004.992..32029.539 rows=4000 loops=1)
  ->  Sort  (cost=91667.54..92030.62 rows=145232 width=42) (actual
time=32004.983..32013.844 rows=4000 loops=1)
"        Sort Key: (max((b.""Data"")::text))"
        Sort Method: top-N heapsort  Memory: 660kB
        ->  GroupAggregate  (cost=19.16..82252.30 rows=145232
width=42) (actual time=1.954..31534.246 rows=63759 loops=1)
"              Filter: (NOT bool_and(b.""Darmowe""))"
              ->  Merge Left Join  (cost=19.16..60017.63 rows=573034
width=42) (actual time=0.339..26669.766 rows=865978 loops=1)
"                    Merge Cond: (((b.""NTA"")::text = (r.""Numer
stacji"")::text) AND ((b.""NKA"")::text = (r.""Numer
kierunkowy"")::text) AND ((b.""KodBłędu"")::text = (r.""Kod
Błędu"")::text))"
"                    Join Filter: ((replace((b.""Data"")::text,
':'::text, '.'::text))::cube && r.""Zakres"")"
"                    Filter: (r.""Wartość"" IS NULL)"
"                    ->  Index Scan using
""Bladpol2_nta_nka_kod_błędu_btree"" on ""Bladpol2"" b
(cost=0.00..46593.65 rows=1452312 width=38) (actual
time=0.145..8622.003 rows=1439383 loops=1)"
"                          Filter: ((""KodBłędu"")::text ~
'^(?:74|80|81)$'::text)"
"                    ->  Index Scan using
""NumerStacji_NumerKierunkowy_KodBłędu_LP"" on ""Rejestr stacji do
naprawy"" r  (cost=0.00..1405.83 rows=70476 width=47) (actual
time=0.014..3113.398 rows=1045687 loops=1)"
Total runtime: 32045.317 ms





--
------------
pasman

pgsql-general by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: Unable to execute \copy from Client Application
Next
From: Jan Otto
Date:
Subject: Re: Dump functions alone