Thread: Query that took a lot of time in Postgresql when not using trim in order by

From:
Blas Pico
Date:

I have a query that produce a different query plan if I put a trim in one of the columns in the order by.


When i put the trim in any column it use hashaggregate and took 3 seconds against 30 when not.


Is wear because the columns is clean not need to be trimmed, I have check that.


The problem is that I can't change the query because is generate by the mondrian.


I do research and found in postgres list that I need to crank work_mem up high but don't work for me.



My postgresql.conf


# Add settings for extensions here


default_statistics_target = 50 # pgtune wizard 2014-06-04


maintenance_work_mem = 1GB # pgtune wizard 2014-06-04


constraint_exclusion = on # pgtune wizard 2014-06-04


checkpoint_completion_target = 0.9 # pgtune wizard 2014-06-04


effective_cache_size = 44GB # pgtune wizard 2014-06-04


work_mem = 1536MB # pgtune wizard 2014-06-04


#work_mem = 16GB # I have try this but don't work


wal_buffers = 32MB # pgtune wizard 2014-06-04


checkpoint_segments = 16 # pgtune wizard 2014-06-04


shared_buffers = 15GB # pgtune wizard 2014-06-04


max_connections = 20 # pgtune wizard 2014-06-04




___________________________________________________




Query with trim


SELECT "dim_cliente"."tipocliente" AS "c0", 


       "dim_cliente"."a1_ibge" AS "c1", 


       "dim_cliente"."a1_cod" AS "c2", 


       "dim_cliente"."a1_nome" AS "c3", 


       "dim_vendedor"."a3_nome" AS "c4" 


  FROM "public"."dim_cliente" AS "dim_cliente", 


       "public"."fato_ventas_productos" AS "fato_ventas_productos", 


       "public"."dim_vendedor" AS "dim_vendedor" 


 WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente" 


   AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor" 


 GROUP 


    BY "dim_cliente"."tipocliente" , 


       "dim_cliente"."a1_ibge", 


       "dim_cliente"."a1_cod", 


       "dim_cliente"."a1_nome", 


       "dim_vendedor"."a3_nome" 


 ORDER 


    BY trim("dim_cliente"."tipocliente") ASC NULLS LAST, 


       "dim_cliente"."a1_ibge" ASC NULLS LAST, -- the same result if I put the trim here


       "dim_cliente"."a1_cod" ASC NULLS LAST, -- or here


       "dim_cliente"."a1_nome" ASC NULLS LAST; -- or here


-- this query took 3845.895 ms




___________________________________________________




Query Plan when using trim


                                                                             QUERY PLAN                                                                              


---------------------------------------------------------------------------------------------------------------------------------------------------------------------


 Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)


   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))


   Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome


   Sort Method: quicksort  Memory: 13121kB


   ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)


         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)


         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)


               Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome


               Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)


               ->  Hash Join  (cost=830.02..68126.13 rows=907357 width=86) (actual time=28.746..1183.691 rows=907357 loops=1)


                     Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor


                     Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)


                     ->  Seq Scan on public.fato_ventas_productos  (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..699.779 rows=907357 loops=1)


                           Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor


                     ->  Hash  (cost=618.90..618.90 rows=16890 width=86) (actual time=28.699..28.699 rows=16890 loops=1)


                           Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente


                           Buckets: 2048  Batches: 1  Memory Usage: 1980kB


                           ->  Seq Scan on public.dim_cliente  (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)


                                 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente


               ->  Hash  (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)


                     Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor


                     Buckets: 1024  Batches: 1  Memory Usage: 56kB


                     ->  Seq Scan on public.dim_vendedor  (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)


                           Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor


 Total runtime: 3845.895 ms


(25 filas)



___________________________________________________


Query without trim

SELECT "dim_cliente"."tipocliente" AS "c0", 

       "dim_cliente"."a1_ibge" AS "c1", 

       "dim_cliente"."a1_cod" AS "c2", 

       "dim_cliente"."a1_nome" AS "c3", 

       "dim_vendedor"."a3_nome" AS "c4" 

  FROM "public"."dim_cliente" AS "dim_cliente", 

       "public"."fato_ventas_productos" AS "fato_ventas_productos", 

       "public"."dim_vendedor" AS "dim_vendedor" 

 WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente" 

   AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor" 

 GROUP 

    BY "dim_cliente"."tipocliente" , 

       "dim_cliente"."a1_ibge", 

       "dim_cliente"."a1_cod", 

       "dim_cliente"."a1_nome", 

       "dim_vendedor"."a3_nome" 

 ORDER 

    BY "dim_cliente"."tipocliente" ASC NULLS LAST, 

       "dim_cliente"."a1_ibge" ASC NULLS LAST, 

       "dim_cliente"."a1_cod" ASC NULLS LAST, 

       "dim_cliente"."a1_nome" ASC NULLS LAST;

-- this query took 37249.268 ms


___________________________________________________


Query Plan when not using trim

                                                                          QUERY PLAN                                                                           

---------------------------------------------------------------------------------------------------------------------------------------------------------------

 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)

   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)

         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

         Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

         Sort Method: quicksort  Memory: 265592kB

         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)

               Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

               Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)

               ->  Hash Join  (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)

                     Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor

                     Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)

                     ->  Seq Scan on public.fato_ventas_productos  (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)

                           Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor

                     ->  Hash  (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)

                           Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

                           Buckets: 2048  Batches: 1  Memory Usage: 1980kB

                           ->  Seq Scan on public.dim_cliente  (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)

                                 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

               ->  Hash  (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)

                     Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

                     Buckets: 1024  Batches: 1  Memory Usage: 56kB

                     ->  Seq Scan on public.dim_vendedor  (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)

                           Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

 Total runtime: 37249.268 ms

(25 filas)


___________________________________________________


Is anything that I can do to solve this problem, is that a bug or a config problem?


Here the link with a dump of the tables 

https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing


I appreciate your help

From:
Alex Ignatov
Date:

On 25.11.2015 17:15, Blas Pico wrote:

I have a query that produce a different query plan if I put a trim in one of the columns in the order by.


When i put the trim in any column it use hashaggregate and took 3 seconds against 30 when not.


Is wear because the columns is clean not need to be trimmed, I have check that.


The problem is that I can't change the query because is generate by the mondrian.


I do research and found in postgres list that I need to crank work_mem up high but don't work for me.



My postgresql.conf


# Add settings for extensions here


default_statistics_target = 50 # pgtune wizard 2014-06-04


maintenance_work_mem = 1GB # pgtune wizard 2014-06-04


constraint_exclusion = on # pgtune wizard 2014-06-04


checkpoint_completion_target = 0.9 # pgtune wizard 2014-06-04


effective_cache_size = 44GB # pgtune wizard 2014-06-04


work_mem = 1536MB # pgtune wizard 2014-06-04


#work_mem = 16GB # I have try this but don't work


wal_buffers = 32MB # pgtune wizard 2014-06-04


checkpoint_segments = 16 # pgtune wizard 2014-06-04


shared_buffers = 15GB # pgtune wizard 2014-06-04


max_connections = 20 # pgtune wizard 2014-06-04




___________________________________________________




Query with trim


SELECT "dim_cliente"."tipocliente" AS "c0", 


       "dim_cliente"."a1_ibge" AS "c1", 


       "dim_cliente"."a1_cod" AS "c2", 


       "dim_cliente"."a1_nome" AS "c3", 


       "dim_vendedor"."a3_nome" AS "c4" 


  FROM "public"."dim_cliente" AS "dim_cliente", 


       "public"."fato_ventas_productos" AS "fato_ventas_productos", 


       "public"."dim_vendedor" AS "dim_vendedor" 


 WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente" 


   AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor" 


 GROUP 


    BY "dim_cliente"."tipocliente" , 


       "dim_cliente"."a1_ibge", 


       "dim_cliente"."a1_cod", 


       "dim_cliente"."a1_nome", 


       "dim_vendedor"."a3_nome" 


 ORDER 


    BY trim("dim_cliente"."tipocliente") ASC NULLS LAST, 


       "dim_cliente"."a1_ibge" ASC NULLS LAST, -- the same result if I put the trim here


       "dim_cliente"."a1_cod" ASC NULLS LAST, -- or here


       "dim_cliente"."a1_nome" ASC NULLS LAST; -- or here


-- this query took 3845.895 ms




___________________________________________________




Query Plan when using trim


                                                                             QUERY PLAN                                                                              


---------------------------------------------------------------------------------------------------------------------------------------------------------------------


 Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)


   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))


   Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome


   Sort Method: quicksort  Memory: 13121kB


   ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)


         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)


         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)


               Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome


               Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)


               ->  Hash Join  (cost=830.02..68126.13 rows=907357 width=86) (actual time=28.746..1183.691 rows=907357 loops=1)


                     Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor


                     Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)


                     ->  Seq Scan on public.fato_ventas_productos  (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..699.779 rows=907357 loops=1)


                           Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor


                     ->  Hash  (cost=618.90..618.90 rows=16890 width=86) (actual time=28.699..28.699 rows=16890 loops=1)


                           Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente


                           Buckets: 2048  Batches: 1  Memory Usage: 1980kB


                           ->  Seq Scan on public.dim_cliente  (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)


                                 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente


               ->  Hash  (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)


                     Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor


                     Buckets: 1024  Batches: 1  Memory Usage: 56kB


                     ->  Seq Scan on public.dim_vendedor  (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)


                           Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor


 Total runtime: 3845.895 ms


(25 filas)



___________________________________________________


Query without trim

SELECT "dim_cliente"."tipocliente" AS "c0", 

       "dim_cliente"."a1_ibge" AS "c1", 

       "dim_cliente"."a1_cod" AS "c2", 

       "dim_cliente"."a1_nome" AS "c3", 

       "dim_vendedor"."a3_nome" AS "c4" 

  FROM "public"."dim_cliente" AS "dim_cliente", 

       "public"."fato_ventas_productos" AS "fato_ventas_productos", 

       "public"."dim_vendedor" AS "dim_vendedor" 

 WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente" 

   AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor" 

 GROUP 

    BY "dim_cliente"."tipocliente" , 

       "dim_cliente"."a1_ibge", 

       "dim_cliente"."a1_cod", 

       "dim_cliente"."a1_nome", 

       "dim_vendedor"."a3_nome" 

 ORDER 

    BY "dim_cliente"."tipocliente" ASC NULLS LAST, 

       "dim_cliente"."a1_ibge" ASC NULLS LAST, 

       "dim_cliente"."a1_cod" ASC NULLS LAST, 

       "dim_cliente"."a1_nome" ASC NULLS LAST;

-- this query took 37249.268 ms


___________________________________________________


Query Plan when not using trim

                                                                          QUERY PLAN                                                                           

---------------------------------------------------------------------------------------------------------------------------------------------------------------

 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)

   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)

         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

         Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

         Sort Method: quicksort  Memory: 265592kB

         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)

               Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome

               Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)

               ->  Hash Join  (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)

                     Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor

                     Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)

                     ->  Seq Scan on public.fato_ventas_productos  (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)

                           Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor

                     ->  Hash  (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)

                           Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

                           Buckets: 2048  Batches: 1  Memory Usage: 1980kB

                           ->  Seq Scan on public.dim_cliente  (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)

                                 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente

               ->  Hash  (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)

                     Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

                     Buckets: 1024  Batches: 1  Memory Usage: 56kB

                     ->  Seq Scan on public.dim_vendedor  (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)

                           Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor

 Total runtime: 37249.268 ms

(25 filas)


___________________________________________________


Is anything that I can do to solve this problem, is that a bug or a config problem?


Here the link with a dump of the tables 

https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing


I appreciate your help

Hello!
What is your Postgres version?
Do you have correct statistics on this tables?
Please show  yours execution plans with buffers i.e. explain (analyze,buffers) ...

-- 
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:
Evgeniy Shishkin
Date:

> What is your Postgres version?
> Do you have correct statistics on this tables?
> Please show  yours execution plans with buffers i.e. explain (analyze,buffers) ...
>


Fast:

 Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome,
(btrim((dim_cliente.tipocliente)::text))
   Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
   Sort Method: quicksort  Memory: 13121kB
   ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615
loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) 
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357
loops=1)


Slow:

 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
         Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
         Sort Method: quicksort  Memory: 265592kB
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357
loops=1)


The difference is in the top of plans.
As we see, hashjoin time is practically the same.
But fast plan uses hashagg first and only 43k rows require sorting.
Slow plan dominated by sorting 900k rows.

I wonder if increasing cpu_tuple_cost will help.
As cost difference between two plans is negligible now.


From:
Blas Pico
Date:

My database version is 9.3 but I have test with 9.4 too with the same result, and I have test changing that parameter without success.
I want to know what does have to do the trim with the different query plans?

2015-11-25 13:35 GMT-03:00 Evgeniy Shishkin <>:
> What is your Postgres version?
> Do you have correct statistics on this tables?
> Please show  yours execution plans with buffers i.e. explain (analyze,buffers) ...
>


Fast:

 Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
   Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
   Sort Method: quicksort  Memory: 13121kB
   ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)


Slow:

 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
         Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
         Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
         Sort Method: quicksort  Memory: 265592kB
         ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)


The difference is in the top of plans.
As we see, hashjoin time is practically the same.
But fast plan uses hashagg first and only 43k rows require sorting.
Slow plan dominated by sorting 900k rows.

I wonder if increasing cpu_tuple_cost will help.
As cost difference between two plans is negligible now.
   

From:
"Peter J. Holzer"
Date:

On 2015-11-25 19:35:15 +0300, Evgeniy Shishkin wrote:
> Fast:
>
>  Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,(btrim((dim_cliente.tipocliente)::text)) 
>    Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
>    Sort Method: quicksort  Memory: 13121kB
>    ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615
loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) 
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357
loops=1)
>
>
> Slow:
>
>  Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>    ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>          Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome,
dim_vendedor.a3_nome
>          Sort Method: quicksort  Memory: 265592kB
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357
loops=1)
>
>
> The difference is in the top of plans.
> As we see, hashjoin time is practically the same.
> But fast plan uses hashagg first and only 43k rows require sorting.
> Slow plan dominated by sorting 900k rows.
>
> I wonder if increasing cpu_tuple_cost will help.
> As cost difference between two plans is negligible now.

Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
in-memory sort of 900k rows seems slow to me. I tested this on my PC at
home (1.8 GHz Core2 Dual, so a rather old and slow box) and I could sort
1E6 rows of 128 random bytes in 5.6 seconds. Even if I kept the first 96
bytes constant (so only the last 32 were random), it took only 21
seconds. Either this CPU is really slow or the data is heavily skewed -
is it possible that all dimensions except dim_vendedor.a3_nome have only
one or very few values? In that case changing the sort order might help.

    hp


--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   |          | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

From:
Tom Lane
Date:

"Peter J. Holzer" <> writes:
> Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
> in-memory sort of 900k rows seems slow to me.

I'm wondering if it's textual data in some locale whose strcoll() behavior
is exceptionally slow.

            regards, tom lane