Tuning queries on large database - Mailing list pgsql-general
From | Valerie Schneider DSI/DEV |
---|---|
Subject | Tuning queries on large database |
Date | |
Msg-id | 200408041244.i74CihO19344@mu.meteo.fr Whole thread Raw |
Responses |
Re: [PERFORM] Tuning queries on large database
|
List | pgsql-general |
Hi, I have some problem of performance on a PG database, and I don't know how to improve. I Have two questions : one about the storage of data, one about tuning queries. If possible ! My job is to compare Oracle and Postgres. All our operational databases have been running under Oracle for about fifteen years. Now I try to replace Oracle by Postgres. I have a test platform under linux (Dell server, 4 Gb RAM, bi-processor, Linux Red Hat 9 (2.4.20-31.9)) with 2 databases, 1 with Oracle (V8i or V9i it's quite the same), 1 with PG (7.4.2). Both databases have the same structure, same content, about 100 Gb each. I developped some benches, representative of our use of databases. My problem is that I have tables (relations) with more than 100 millions rows, and each row has about 160 fields and an average size 256 bytes. For Oracle I have a SGA size of 500 Mb. For PG I have a postgresql.conf as : max_connections = 1500 shared_buffers = 30000 sort_mem = 50000 effective_cache_size = 200000 and default value for other parameters. I have a table named "data" which looks like this : bench=> \d data Table "public.data" Column | Type | Modifiers ------------+-----------------------------+----------- num_poste | numeric(9,0) | not null dat | timestamp without time zone | not null datrecu | timestamp without time zone | not null rr1 | numeric(5,1) | qrr1 | numeric(2,0) | ... ... all numeric fields ... Indexes: "pk_data" primary key, btree (num_poste, dat) "i_data_dat" btree (dat) It contains 1000 different values of "num_poste" and for each one 125000 different values of "dat" (1 row per hour, 15 years). I run a vacuum analyze of the table. bench=> select * from tailledb ; schema | relfilenode | table | index | reltuples | size --------+-------------+------------------+------------+-------------+---------- public | 125615917 | data | | 1.25113e+08 | 72312040 public | 251139049 | data | i_data_dat | 1.25113e+08 | 2744400 public | 250870177 | data | pk_data | 1.25113e+08 | 4395480 My first remark is that the table takes a lot of place on disk, about 70 Gb, instead of 35 Gb with oracle. 125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea not so bad for oracle. What about for PG ? How data is stored ? The different queries of the bench are "simple" queries (no join, sub-query, ...) and are using indexes (I "explained" each one to be sure) : Q1 select_court : access to about 700 rows : 1 "num_poste" and 1 month (using PK : num_poste=p1 and dat between p2 and p3) Q2 select_moy : access to about 7000 rows : 10 "num_poste" and 1 month (using PK : num_poste between p1 and p1+10 and dat between p2 and p3) Q3 select_long : about 250 000 rows : 2 "num_poste" (using PK : num_poste in (p1,p1+2)) Q4 select_tres_long : about 3 millions rows : 25 "num_poste" (using PK : num_poste between p1 and p1 + 25) The result is that for "short queries" (Q1 and Q2) it runs in a few seconds on both Oracle and PG. The difference becomes important with Q3 : 8 seconds with oracle 80 sec with PG and too much with Q4 : 28s with oracle 17m20s with PG ! Of course when I run 100 or 1000 parallel queries such as Q3 or Q4, it becomes a disaster ! I can't understand these results. The way to execute queries is the same I think. I've read recommended articles on the PG site. I tried with a table containing 30 millions rows, results are similar. What can I do ? Thanks for your help ! ******************************************************************** * Les points de vue exprimes sont strictement personnels et * * n'engagent pas la responsabilite de METEO-FRANCE. * ******************************************************************** * Valerie SCHNEIDER Tel : +33 (0)5 61 07 81 91 * * METEO-FRANCE / DSI/DEV Fax : +33 (0)5 61 07 81 09 * * 42, avenue G. Coriolis Email : Valerie.Schneider@meteo.fr * * 31057 TOULOUSE Cedex - FRANCE http://www.meteo.fr * ********************************************************************
pgsql-general by date: