Re: Caching (was Re: choosing the right platform) - Mailing list pgsql-performance
From: Ron Mayer
Subject: Re: Caching (was Re: choosing the right platform)
Date:
Msg-id: POEDIPIPKGJJLDNIEMBEOEAACJAA.ron@intervideo.com
In response to: Re: Caching (was Re: choosing the right platform) (Josh Berkus <josh@agliodbs.com>)
Responses:
  Re: Caching (was Re: choosing the right platform)
  Re: Caching (was Re: choosing the right platform)
List: pgsql-performance
Short summary...

  I think sort_mem matters quite a bit (20-40%) on my data-warehousing
  applications.  Am I doing something wrong to need so much sort_mem?

Josh wrote:
>> ** It's better to not use huge amount of sort-mem...
>
> ...However, I have never seen a database use more than 32mb sort mem
> in a single process, so I don't think the 2GB limit will hurt you much

Do you think this is true in data warehousing applications as well?

During the ETL part of data warehousing, large sorts are often used to
get the "new" values that need to be inserted into "dimension" tables,
like this:

  INSERT INTO dimension_val (id, val)
  SELECT nextval('val_seq'), val
    FROM (SELECT DISTINCT val FROM import_table
          EXCEPT
          SELECT val FROM dimension_val) AS a;

As far as I can tell, this query typically does two sorts: one for the
DISTINCT and one for the EXCEPT.

In a data warehouse we have here, we load about 3 million rows each week;
load time improved from about 9 hours to 7 by breaking such queries up
into expressions that only require one sort at a time, and surrounding
the expressions with "set sort_mem=something_big" statements to give
each sort enough space to not hit the disk (a sketch of the rest of the
broken-up version is at the end of this message):

  SET SORT_MEM=300000;
  CREATE TEMPORARY TABLE potential_new_values AS
    SELECT DISTINCT val FROM import_table;
  ...
  SET SORT_MEM=1000;

Anyone else have similar experience, or am I doing something wrong to
need so much SORT_MEM?

   Ron

PS: Below is an example of another real-world query from the same
reporting system that benefits from a sort_mem over 32MB.  EXPLAIN
ANALYZE (below) shows a 40% improvement when the sort fits in memory:
with 10MB or 32MB of sort_mem the query takes over 22 seconds; with
100MB it takes 14.

====================================================================

logs2=# set sort_mem=10000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=71612.82..72838.69 rows=49035 width=17) (actual time=20315.47..22457.21 rows=2914 loops=1)
  ->  Sort  (cost=71612.82..71612.82 rows=490348 width=17) (actual time=20315.46..21351.42 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.08..2932.72 rows=511368 loops=1)
Total runtime: 22475.63 msec

EXPLAIN
logs2=# set sort_mem=32000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=60442.82..61668.69 rows=49035 width=17) (actual time=22657.31..24794.19 rows=2914 loops=1)
  ->  Sort  (cost=60442.82..60442.82 rows=490348 width=17) (actual time=22657.30..23714.43 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.07..3020.83 rows=511368 loops=1)
Total runtime: 24811.65 msec

EXPLAIN
logs2=# set sort_mem=100000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=60442.82..61668.69 rows=49035 width=17) (actual time=12205.19..14012.57 rows=2914 loops=1)
  ->  Sort  (cost=60442.82..60442.82 rows=490348 width=17) (actual time=12205.18..12710.16 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.08..3001.05 rows=511368 loops=1)
Total runtime: 14187.96 msec

EXPLAIN
logs2=#
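PPS: For completeness, here is roughly what the broken-up version of the
dimension load looks like when written out in full.  The first two
statements are the ones shown above; the new_values temp table and the
final INSERT are only an illustrative sketch (not copied verbatim from
our load script), but the point is that each statement needs just one
big sort while sort_mem is raised:

  SET SORT_MEM=300000;

  -- first big sort: the DISTINCT over the weekly import
  CREATE TEMPORARY TABLE potential_new_values AS
    SELECT DISTINCT val FROM import_table;

  -- second big sort: the EXCEPT against the existing dimension table,
  -- now with no DISTINCT nested inside it
  CREATE TEMPORARY TABLE new_values AS
    SELECT val FROM potential_new_values
    EXCEPT
    SELECT val FROM dimension_val;

  -- no large sort needed here; just assign ids and insert
  INSERT INTO dimension_val (id, val)
    SELECT nextval('val_seq'), val FROM new_values;

  SET SORT_MEM=1000;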