Re: Caching (was Re: choosing the right platform) - Mailing list pgsql-performance

From Ron Mayer
Subject Re: Caching (was Re: choosing the right platform)
Date
Msg-id POEDIPIPKGJJLDNIEMBEOEAACJAA.ron@intervideo.com
Whole thread Raw
In response to Re: Caching (was Re: choosing the right platform)  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Caching (was Re: choosing the right platform)
Re: Caching (was Re: choosing the right platform)
List pgsql-performance
Short summary...

  I think sort_mem matters quite a bit (20-40%) on
  my data-warehousing applications.

  Am I doing something wrong to need so much sort_mem?

Josh wrote:
>> ** It's better to not use huge amount of sort-mem...
>
>...However, I have never seen a database use more than 32mb sort mem
>in a single process, so I don't think the 2GB limit will hurt you much ...

Do you think this is true in data warehousing applications as well?

During the ETL part of data warehousing, large sorts are often
used to get the "new" values that need to be inserted
into "dimension" tables, like this:
  INSERT INTO dimension_val (id,val)
       SELECT nextval('val_seq'),val
         FROM (SELECT DISTINCT val FROM import_table
               EXCEPT
               SELECT          val FROM dimension_val) as a;
As far as I can tell, this query typically does two sorts,
one for the distinct, and one for the except.


In a data warehouse we have here, we load about 3 million rows
each week; load time improved from about 9 to 7 hours
by breaking up such queries into expressions that only require
one sort at a time, and surrounding the expressions with
"set sort_mem=something_big" statements to give it enough
space to not hit the disk.

  SET SORT_MEM=300000;
  CREATE TEMPORARY TABLE potential_new_values AS
     SELECT DISTINCT val FROM import_table;
  ...
  SET SORT_MEM=1000;

Anyone else have similar experience, or am I doing something
wrong to need so much SORT_MEM?


    Ron



PS:

Below is an example of another real-world query from the same
reporting system that benefits from a sort_mem over 32M.
Explain analyze (below) shows a 40% improvement by having
the sort fit in memory.

10Meg and 32Meg take over 22 seconds.  100Meg takes 14.

====================================================================================================
logs2=#
logs2=#
logs2=# set sort_mem=10000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=71612.82..72838.69 rows=49035 width=17) (actual time=20315.47..22457.21 rows=2914 loops=1)
  ->  Sort  (cost=71612.82..71612.82 rows=490348 width=17) (actual time=20315.46..21351.42 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.08..2932.72
rows=511368
loops=1)
Total runtime: 22475.63 msec

EXPLAIN
logs2=# set sort_mem=32000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=60442.82..61668.69 rows=49035 width=17) (actual time=22657.31..24794.19 rows=2914 loops=1)
  ->  Sort  (cost=60442.82..60442.82 rows=490348 width=17) (actual time=22657.30..23714.43 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.07..3020.83
rows=511368
loops=1)
Total runtime: 24811.65 msec

EXPLAIN
logs2=# set sort_mem=100000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=60442.82..61668.69 rows=49035 width=17) (actual time=12205.19..14012.57 rows=2914 loops=1)
  ->  Sort  (cost=60442.82..60442.82 rows=490348 width=17) (actual time=12205.18..12710.16 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.08..3001.05
rows=511368
loops=1)
Total runtime: 14187.96 msec

EXPLAIN
logs2=#


pgsql-performance by date:

Previous
From: Jean-Luc Lachance
Date:
Subject: Re: Caching (was Re: choosing the right platform)
Next
From: Josh Berkus
Date:
Subject: Re: Help analyzing 7.2.4 EXPLAIN