Re: Query that took a lot of time in Postgresql when not using trim in order by - Mailing list pgsql-performance

From Peter J. Holzer
Subject Re: Query that took a lot of time in Postgresql when not using trim in order by
Date
Msg-id 20151129132323.GA29654@hjp.at
Whole thread Raw
In response to Re: Query that took a lot of time in Postgresql when not using trim in order by  (Evgeniy Shishkin <itparanoia@gmail.com>)
Responses Re: Query that took a lot of time in Postgresql when not using trim in order by
List pgsql-performance
On 2015-11-25 19:35:15 +0300, Evgeniy Shishkin wrote:
> Fast:
>
>  Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,(btrim((dim_cliente.tipocliente)::text)) 
>    Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
>    Sort Method: quicksort  Memory: 13121kB
>    ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615
loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) 
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357
loops=1)
>
>
> Slow:
>
>  Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>    ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>          Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>          Sort Method: quicksort  Memory: 265592kB
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357
loops=1)
>
>
> The difference is in the top of plans.
> As we see, hashjoin time is practically the same.
> But fast plan uses hashagg first and only 43k rows require sorting.
> Slow plan dominated by sorting 900k rows.
>
> I wonder if increasing cpu_tuple_cost will help.
> As cost difference between two plans is negligible now.

Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
in-memory sort of 900k rows seems slow to me. I tested this on my PC at
home (1.8 GHz Core2 Dual, so a rather old and slow box) and I could sort
1E6 rows of 128 random bytes in 5.6 seconds. Even if I kept the first 96
bytes constant (so only the last 32 were random), it took only 21
seconds. Either this CPU is really slow or the data is heavily skewed -
is it possible that all dimensions except dim_vendedor.a3_nome have only
one or very few values? In that case changing the sort order might help.

    hp


--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Attachment

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Index scan cost calculation
Next
From: Tom Lane
Date:
Subject: Re: Query that took a lot of time in Postgresql when not using trim in order by