Home > mailing lists

Thread: Query that took a lot of time in Postgresql when not using trim in order by

Query that took a lot of time in Postgresql when not using trim in order by

From

Blas Pico

Date:

25 November 2015, 14:15:43

I have a query that produce a different query plan if I put a trim in one of the columns in the order by.

When i put the trim in any column it use hashaggregate and took 3 seconds against 30 when not.

Is wear because the columns is clean not need to be trimmed, I have check that.

The problem is that I can't change the query because is generate by the mondrian.

I do research and found in postgres list that I need to crank work_mem up high but don't work for me.

My postgresql.conf

# Add settings for extensions here

default_statistics_target = 50 # pgtune wizard 2014-06-04

maintenance_work_mem = 1GB # pgtune wizard 2014-06-04

constraint_exclusion = on # pgtune wizard 2014-06-04

checkpoint_completion_target = 0.9 # pgtune wizard 2014-06-04

effective_cache_size = 44GB # pgtune wizard 2014-06-04

work_mem = 1536MB # pgtune wizard 2014-06-04

#work_mem = 16GB # I have try this but don't work

wal_buffers = 32MB # pgtune wizard 2014-06-04

checkpoint_segments = 16 # pgtune wizard 2014-06-04

shared_buffers = 15GB # pgtune wizard 2014-06-04

max_connections = 20 # pgtune wizard 2014-06-04

___________________________________________________

Query with trim

SELECT "dim_cliente"."tipocliente" AS "c0",

"dim_cliente"."a1_ibge" AS "c1",

"dim_cliente"."a1_cod" AS "c2",

"dim_cliente"."a1_nome" AS "c3",

"dim_vendedor"."a3_nome" AS "c4"

FROM "public"."dim_cliente" AS "dim_cliente",

"public"."fato_ventas_productos" AS "fato_ventas_productos",

"public"."dim_vendedor" AS "dim_vendedor"

WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"

AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"

GROUP

BY "dim_cliente"."tipocliente" ,

"dim_cliente"."a1_ibge",

"dim_cliente"."a1_cod",

"dim_cliente"."a1_nome",

"dim_vendedor"."a3_nome"

ORDER

BY trim("dim_cliente"."tipocliente") ASC NULLS LAST,

"dim_cliente"."a1_ibge" ASC NULLS LAST, -- the same result if I put the trim here

"dim_cliente"."a1_cod" ASC NULLS LAST, -- or here

"dim_cliente"."a1_nome" ASC NULLS LAST; -- or here

-- this query took 3845.895 ms

___________________________________________________

Query Plan when using trim

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))

Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome

Sort Method: quicksort Memory: 13121kB

-> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)

-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)

-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=28.746..1183.691 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor

Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)

-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..699.779 rows=907357 loops=1)

Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor

-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=28.699..28.699 rows=16890 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

Buckets: 2048 Batches: 1 Memory Usage: 1980kB

-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)

Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

Buckets: 1024 Batches: 1 Memory Usage: 56kB

-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)

Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

Total runtime: 3845.895 ms

(25 filas)

___________________________________________________

Query without trim

SELECT "dim_cliente"."tipocliente" AS "c0",

"dim_cliente"."a1_ibge" AS "c1",

"dim_cliente"."a1_cod" AS "c2",

"dim_cliente"."a1_nome" AS "c3",

"dim_vendedor"."a3_nome" AS "c4"

FROM "public"."dim_cliente" AS "dim_cliente",

"public"."fato_ventas_productos" AS "fato_ventas_productos",

"public"."dim_vendedor" AS "dim_vendedor"

WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"

AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"

GROUP

BY "dim_cliente"."tipocliente" ,

"dim_cliente"."a1_ibge",

"dim_cliente"."a1_cod",

"dim_cliente"."a1_nome",

"dim_vendedor"."a3_nome"

ORDER

BY "dim_cliente"."tipocliente" ASC NULLS LAST,

"dim_cliente"."a1_ibge" ASC NULLS LAST,

"dim_cliente"."a1_cod" ASC NULLS LAST,

"dim_cliente"."a1_nome" ASC NULLS LAST;

-- this query took 37249.268 ms

___________________________________________________

Query Plan when not using trim

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------

Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

-> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

Sort Method: quicksort Memory: 265592kB

-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)

-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor

Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)

-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)

Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor

-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

Buckets: 2048 Batches: 1 Memory Usage: 1980kB

-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)

Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

Buckets: 1024 Batches: 1 Memory Usage: 56kB

-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)

Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

Total runtime: 37249.268 ms

(25 filas)

___________________________________________________

Is anything that I can do to solve this problem, is that a bug or a config problem?

Here the link with a dump of the tables

https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing

I appreciate your help

Re: Query that took a lot of time in Postgresql when not using trim in order by

From

Alex Ignatov

Date:

25 November 2015, 16:22:35

On 25.11.2015 17:15, Blas Pico wrote:

I have a query that produce a different query plan if I put a trim in one of the columns in the order by.

When i put the trim in any column it use hashaggregate and took 3 seconds against 30 when not.

Is wear because the columns is clean not need to be trimmed, I have check that.

The problem is that I can't change the query because is generate by the mondrian.

I do research and found in postgres list that I need to crank work_mem up high but don't work for me.

My postgresql.conf

# Add settings for extensions here

default_statistics_target = 50 # pgtune wizard 2014-06-04

maintenance_work_mem = 1GB # pgtune wizard 2014-06-04

constraint_exclusion = on # pgtune wizard 2014-06-04

checkpoint_completion_target = 0.9 # pgtune wizard 2014-06-04

effective_cache_size = 44GB # pgtune wizard 2014-06-04

work_mem = 1536MB # pgtune wizard 2014-06-04

#work_mem = 16GB # I have try this but don't work

wal_buffers = 32MB # pgtune wizard 2014-06-04

checkpoint_segments = 16 # pgtune wizard 2014-06-04

shared_buffers = 15GB # pgtune wizard 2014-06-04

max_connections = 20 # pgtune wizard 2014-06-04

___________________________________________________

Query with trim

SELECT "dim_cliente"."tipocliente" AS "c0",

"dim_cliente"."a1_ibge" AS "c1",

"dim_cliente"."a1_cod" AS "c2",

"dim_cliente"."a1_nome" AS "c3",

"dim_vendedor"."a3_nome" AS "c4"

FROM "public"."dim_cliente" AS "dim_cliente",

"public"."fato_ventas_productos" AS "fato_ventas_productos",

"public"."dim_vendedor" AS "dim_vendedor"

WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"

AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"

GROUP

BY "dim_cliente"."tipocliente" ,

"dim_cliente"."a1_ibge",

"dim_cliente"."a1_cod",

"dim_cliente"."a1_nome",

"dim_vendedor"."a3_nome"

ORDER

BY trim("dim_cliente"."tipocliente") ASC NULLS LAST,

"dim_cliente"."a1_ibge" ASC NULLS LAST, -- the same result if I put the trim here

"dim_cliente"."a1_cod" ASC NULLS LAST, -- or here

"dim_cliente"."a1_nome" ASC NULLS LAST; -- or here

-- this query took 3845.895 ms

___________________________________________________

Query Plan when using trim

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))

Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome

Sort Method: quicksort Memory: 13121kB

-> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)

-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)

-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=28.746..1183.691 rows=907357 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor

Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)

-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..699.779 rows=907357 loops=1)

Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor

-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=28.699..28.699 rows=16890 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

Buckets: 2048 Batches: 1 Memory Usage: 1980kB

-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)

Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)

Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

Buckets: 1024 Batches: 1 Memory Usage: 56kB

-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)

Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

Total runtime: 3845.895 ms

(25 filas)

___________________________________________________

Query without trim
SELECT "dim_cliente"."tipocliente" AS "c0",
"dim_cliente"."a1_ibge" AS "c1",
"dim_cliente"."a1_cod" AS "c2",
"dim_cliente"."a1_nome" AS "c3",
"dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
"public"."fato_ventas_productos" AS "fato_ventas_productos",
"public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP
BY "dim_cliente"."tipocliente" ,
"dim_cliente"."a1_ibge",
"dim_cliente"."a1_cod",
"dim_cliente"."a1_nome",
"dim_vendedor"."a3_nome"
ORDER
BY "dim_cliente"."tipocliente" ASC NULLS LAST,
"dim_cliente"."a1_ibge" ASC NULLS LAST,
"dim_cliente"."a1_cod" ASC NULLS LAST,
"dim_cliente"."a1_nome" ASC NULLS LAST;
-- this query took 37249.268 ms

___________________________________________________

Query Plan when not using trim
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
-> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Method: quicksort Memory: 265592kB
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)
-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor
Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)
-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)
Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor
-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
Buckets: 2048 Batches: 1 Memory Usage: 1980kB
-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Buckets: 1024 Batches: 1 Memory Usage: 56kB
-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Total runtime: 37249.268 ms
(25 filas)

___________________________________________________

Is anything that I can do to solve this problem, is that a bug or a config problem?

Here the link with a dump of the tables
https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing

I appreciate your help

Hello!
What is your Postgres version?
Do you have correct statistics on this tables?
Please show yours execution plans with buffers i.e. explain (analyze,buffers) ...

-- 
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Query that took a lot of time in Postgresql when not using trim in order by

From

Evgeniy Shishkin

Date:

25 November 2015, 16:35:26

> What is your Postgres version?
> Do you have correct statistics on this tables?
> Please show  yours execution plans with buffers i.e. explain (analyze,buffers) ...
>


Fast:

 Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome,
(btrim((dim_cliente.tipocliente)::text))
   Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
   Sort Method: quicksort  Memory: 13121kB
   ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615
loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) 
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357
loops=1)


Slow:

 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
         Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
         Sort Method: quicksort  Memory: 265592kB
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357
loops=1)


The difference is in the top of plans.
As we see, hashjoin time is practically the same.
But fast plan uses hashagg first and only 43k rows require sorting.
Slow plan dominated by sorting 900k rows.

I wonder if increasing cpu_tuple_cost will help.
As cost difference between two plans is negligible now.

Re: Query that took a lot of time in Postgresql when not using trim in order by

From

Blas Pico

Date:

25 November 2015, 18:01:14

My database version is 9.3 but I have test with 9.4 too with the same result, and I have test changing that parameter without success.

I want to know what does have to do the trim with the different query plans?

2015-11-25 13:35 GMT-03:00 Evgeniy Shishkin <itparanoia@gmail.com>:

> What is your Postgres version?
> Do you have correct statistics on this tables?
> Please show yours execution plans with buffers i.e. explain (analyze,buffers) ...
>

Fast:

Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
Sort Method: quicksort Memory: 13121kB
-> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)

Slow:

Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
-> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Method: quicksort Memory: 265592kB
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)

The difference is in the top of plans.
As we see, hashjoin time is practically the same.
But fast plan uses hashagg first and only 43k rows require sorting.
Slow plan dominated by sorting 900k rows.

I wonder if increasing cpu_tuple_cost will help.
As cost difference between two plans is negligible now.

Re: Query that took a lot of time in Postgresql when not using trim in order by

From

"Peter J. Holzer"

Date:

29 November 2015, 13:23:32

On 2015-11-25 19:35:15 +0300, Evgeniy Shishkin wrote:
> Fast:
>
>  Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,(btrim((dim_cliente.tipocliente)::text)) 
>    Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
>    Sort Method: quicksort  Memory: 13121kB
>    ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615
loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) 
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357
loops=1)
>
>
> Slow:
>
>  Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>    ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>          Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>          Sort Method: quicksort  Memory: 265592kB
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357
loops=1)
>
>
> The difference is in the top of plans.
> As we see, hashjoin time is practically the same.
> But fast plan uses hashagg first and only 43k rows require sorting.
> Slow plan dominated by sorting 900k rows.
>
> I wonder if increasing cpu_tuple_cost will help.
> As cost difference between two plans is negligible now.

Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
in-memory sort of 900k rows seems slow to me. I tested this on my PC at
home (1.8 GHz Core2 Dual, so a rather old and slow box) and I could sort
1E6 rows of 128 random bytes in 5.6 seconds. Even if I kept the first 96
bytes constant (so only the last 32 were random), it took only 21
seconds. Either this CPU is really slow or the data is heavily skewed -
is it possible that all dimensions except dim_vendedor.a3_nome have only
one or very few values? In that case changing the sort order might help.

    hp


--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Attachment

signature.asc

Re: Query that took a lot of time in Postgresql when not using trim in order by

From

Tom Lane

Date:

29 November 2015, 17:58:45

"Peter J. Holzer" <hjp-pgsql@hjp.at> writes:
> Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
> in-memory sort of 900k rows seems slow to me.

I'm wondering if it's textual data in some locale whose strcoll() behavior
is exceptionally slow.

            regards, tom lane