Re: [GENERAL] Tuning queries on large database - Mailing list pgsql-performance
From:      Valerie Schneider DSI/DEV
Subject:   Re: [GENERAL] Tuning queries on large database
Date:
Msg-id:    200408050816.i758GxO21560@mu.meteo.fr
Responses: Re: [GENERAL] Tuning queries on large database
           Re: [GENERAL] Tuning queries on large database
           Re: [GENERAL] Tuning queries on large database
List:      pgsql-performance
Hi,

I've decreased sort_mem from 50000 to 5000. I also recreated my table using
integer and real types instead of numeric; the disk usage is much improved:

 schema | relfilenode |  table  |   index    |  reltuples  |   size
--------+-------------+---------+------------+-------------+----------
 public |   253442696 | data    |            | 1.25113e+08 | 29760016
 public |   378639579 | data    | i_data_dat | 1.25113e+08 |  2744400
 public |   378555698 | data    | pk_data    | 1.25113e+08 |  3295584

So it takes about 28 GB instead of 68 GB!

My queries are faster now, but still slower than Oracle:

       Oracle   PG yesterday (numeric)   PG today (integer/real)
  Q1   <1s      <1s                      <1s
  Q2   3s       8s                       4s
  Q3   8s       1m20s                    27s
  Q4   28s      17m20s                   6m47s

Results of EXPLAIN ANALYZE:

Q1 :

bench=> explain analyze select 'Q1',min(td),max(u) from data
        where num_poste=1000
        and dat between (date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '2000 days'))::timestamp
        and (date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '2000 days') + interval '1 month' - interval '1 hour')::timestamp;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=2466.47..2466.47 rows=1 width=8) (actual time=261.777..261.778 rows=1 loops=1)
   ->  Index Scan using pk_data on data  (cost=0.00..2463.41 rows=611 width=8) (actual time=20.106..259.924 rows=744 loops=1)
         Index Cond: ((num_poste = 1000) AND (dat >= (date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - '2000 days'::interval)))::timestamp without time zone) AND (dat <= (((date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - '2000 days'::interval)) + '1 mon'::interval) - '01:00:00'::interval))::timestamp without time zone))
 Total runtime: 262.145 ms
(4 rows)

Q2 :

bench=> explain analyze select 'Q2',count(*) from data
        where num_poste between 100 and 100+10
        and dat between (date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '3000 days'))::timestamp
        and (date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '3000 days') + interval '1 month' - interval '1 hour')::timestamp;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=24777.68..24777.68 rows=1 width=0) (actual time=4253.977..4253.978 rows=1 loops=1)
   ->  Index Scan using pk_data on data  (cost=0.00..24762.34 rows=6138 width=0) (actual time=46.602..4244.984 rows=7920 loops=1)
         Index Cond: ((num_poste >= 100) AND (num_poste <= 110) AND (dat >= (date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - '3000 days'::interval)))::timestamp without time zone) AND (dat <= (((date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - '3000 days'::interval)) + '1 mon'::interval) - '01:00:00'::interval))::timestamp without time zone))
 Total runtime: 4254.233 ms
(4 rows)

Q3 :

bench=> explain analyze select 'Q3',sum(rr1),count(ff) from data
        where num_poste in (50,50+2);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=963455.87..963455.87 rows=1 width=8) (actual time=27668.666..27668.667 rows=1 loops=1)
   ->  Index Scan using pk_data, pk_data on data  (cost=0.00..962236.31 rows=243910 width=8) (actual time=16.251..27275.468 rows=250226 loops=1)
         Index Cond: ((num_poste = 50) OR (num_poste = 52))
 Total runtime: 27673.837 ms
(4 rows)

Q4 :

bench=> explain analyze select 'Q4',count(*) from data
        where num_poste between 600 and 625;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=14086174.57..14086174.57 rows=1 width=0) (actual time=428235.024..428235.025 rows=1 loops=1)
   ->  Index Scan using pk_data on data  (cost=0.00..14076910.99 rows=3705431 width=0) (actual time=45.283..424634.826 rows=3252938 loops=1)
         Index Cond: ((num_poste >= 600) AND (num_poste <= 625))
 Total runtime: 428235.224 ms
(4 rows)

Thanks for all,
Valerie.
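For reference, a sketch of what the rebuilt table definition might look like after the type change described above. This is illustrative only: the actual DDL is not shown in this thread, the real table has more columns, and only those referenced by Q1-Q4 appear here.

```sql
-- Hypothetical sketch: column names are taken from the queries above;
-- types follow the integer/real-instead-of-numeric change described
-- in the message. The actual table definition is not shown in the thread.
CREATE TABLE data (
    num_poste integer   NOT NULL,  -- was numeric(9,0)
    dat       timestamp NOT NULL,
    td        real,                -- measurement columns, previously numeric
    u         real,
    rr1       real,
    ff        real,
    CONSTRAINT pk_data PRIMARY KEY (num_poste, dat)
);
CREATE INDEX i_data_dat ON data (dat);
```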
>Date: Wed, 4 Aug 2004 21:21:51 +0800 (WST)
>From: Christopher Kings-Lynne <chriskl@familyhealth.com.au>
>To: Valerie Schneider DSI/DEV <Valerie.Schneider@meteo.fr>
>Cc: pgsql-performance@postgresql.org, <pgsql-general@postgresql.org>
>Subject: Re: [GENERAL] [PERFORM] Tuning queries on large database
>
>> sort_mem = 50000
>
>That is way, way too large. Try more like 5000 or lower.
>
>> num_poste | numeric(9,0) | not null
>
>For starters, numerics are really, really slow compared to integers. Why
>aren't you using an integer for this field, since you have 0 decimal
>places?
>
>> schema | relfilenode |  table  |   index    |  reltuples  |   size
>> --------+-------------+---------+------------+-------------+----------
>> public |   125615917 | data    |            | 1.25113e+08 | 72312040
>> public |   251139049 | data    | i_data_dat | 1.25113e+08 |  2744400
>> public |   250870177 | data    | pk_data    | 1.25113e+08 |  4395480
>>
>> My first remark is that the table takes a lot of space on disk, about
>> 70 GB, instead of 35 GB with Oracle.
>
>Integers will take a lot less space than numerics.
>
>> The different queries of the bench are "simple" queries (no join,
>> sub-query, ...) and are using indexes (I "explained" each one to
>> be sure) :
>> Q1 select_court : access to about 700 rows : 1 "num_poste" and 1 month
>>    (using PK : num_poste=p1 and dat between p2 and p3)
>> Q2 select_moy : access to about 7000 rows : 10 "num_poste" and 1 month
>>    (using PK : num_poste between p1 and p1+10 and dat between p2 and p3)
>> Q3 select_long : about 250 000 rows : 2 "num_poste"
>>    (using PK : num_poste in (p1,p1+2))
>> Q4 select_tres_long : about 3 million rows : 25 "num_poste"
>>    (using PK : num_poste between p1 and p1+25)
>>
>> The result is that for "short queries" (Q1 and Q2) it runs in a few
>> seconds on both Oracle and PG. The difference becomes important with
>> Q3 : 8 seconds with Oracle, 80 sec with PG,
>> and too much with Q4 : 28 s with Oracle, 17m20s with PG!
>>
>> Of course when I run 100 or 1000 parallel queries such as Q3 or Q4,
>> it becomes a disaster!
>
>Please reply with the EXPLAIN ANALYZE output of these queries so we can
>have some idea of how to help you.
>
>Chris

********************************************************************
* The views expressed are strictly personal and do not engage     *
* the responsibility of METEO-FRANCE.                             *
********************************************************************
* Valerie SCHNEIDER            Tel : +33 (0)5 61 07 81 91         *
* METEO-FRANCE / DSI/DEV       Fax : +33 (0)5 61 07 81 09         *
* 42, avenue G. Coriolis       Email : Valerie.Schneider@meteo.fr *
* 31057 TOULOUSE Cedex - FRANCE        http://www.meteo.fr        *
********************************************************************
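As a side note on the numeric-versus-integer point in the quoted reply, the per-value storage difference can be checked directly on later PostgreSQL versions (pg_column_size() appeared in 8.1, so this would not have run on the 2004-era server discussed here; it is offered as a hedged illustration, not part of the original thread):

```sql
-- Compare the on-disk size of one value in each type.
SELECT pg_column_size(123456789::numeric(9,0)) AS numeric_bytes,
       pg_column_size(123456789::integer)      AS integer_bytes;
-- integer is a fixed 4 bytes; numeric is a variable-length value that is
-- both larger on disk and slower to compare and aggregate.
```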