Re: Speeding up query, Joining 55mil and 43mil records. - Mailing list pgsql-performance
From: Sven Geisler
Subject: Re: Speeding up query, Joining 55mil and 43mil records.
Msg-id: 449967A3.3030801@aeccom.com
In response to: Speeding up query, Joining 55mil and 43mil records. (nicky <nicky@valuecare.nl>)
List: pgsql-performance
Hi Nicky,

I guess you should try to upgrade the memory settings of PostgreSQL first.

work_mem = 65536

is a bit low for such large joins. Did you get a chance to watch the directory <PGDATA>/base/<DBOID>/pgsql_tmp to see how large the temporary files get during this query? I'm sure there is a large file in there. Anyhow, you can raise 'work_mem' to 1000000, which is about 1 GB. Please note that work_mem is allocated per backend process; you will get problems with multiple large queries running at the same time.

You may also move (link) the directory 'pgsql_tmp' to a very fast file system if you still see large files in this directory.

You can also try to increase these settings:

checkpoint_segments = 256
checkpoint_timeout = 3600        # range 30-3600, in seconds
checkpoint_warning = 0           # 0 is off

Please read the PostgreSQL documentation about the drawbacks of these settings, as well as of your setting 'fsync = off'.

Cheers
Sven.

nicky wrote:
> Hello People,
>
> I'm trying to solve what I feel is a performance/configuration/query
> error on my side. I'm fairly new to configuring PostgreSQL, so I might
> be completely wrong with my configuration.
>
> My database consists of 44 tables, about 20GB. Two of those tables are
> 'big/huge'. Table src.src_faktuur_verricht contains 43 million records
> (9GB) and table src.src_faktuur_verrsec contains 55 million records (6GB).
>
> Below is the 'slow' query.
>
> INSERT INTO rpt.rpt_verrichting
>    (verrichting_id
>    ,verrichting_secid
>    ,fout_status
>    ,patientnr
>    ,verrichtingsdatum
>    ,locatie_code
>    ,afdeling_code
>    ,uitvoerder_code
>    ,aanvrager_code
>    ,verrichting_code
>    ,dbcnr
>    ,aantal_uitgevoerd
>    ,kostenplaats_code
>    ,vc_patientnr
>    ,vc_verrichting_code
>    ,vc_dbcnr
>    )
> SELECT t1.id
>      , t0.secid
>      , t1.status
>      , t1.patientnr
>      , t1.datum
>      , t1.locatie
>      , t1.afdeling
>      , t1.uitvoerder
>      , t1.aanvrager
>      , t0.code
>      , t1.casenr
>      , t0.aantal
>      , t0.kostplaats
>      , null
>      , null
>      , null
>   FROM src.src_faktuur_verrsec t0
>   JOIN src.src_faktuur_verricht t1 ON t0.id = t1.id
>  WHERE substr(t0.code,1,2) not in ('14','15','16','17')
>    AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
>    AND EXTRACT(YEAR from t1.datum) > 2004;
>
> Output from explain:
>
> Hash Join  (cost=1328360.12..6167462.76 rows=7197568 width=118)
>   Hash Cond: (("outer".id)::text = ("inner".id)::text)
>   ->  Seq Scan on src_faktuur_verrsec t0  (cost=0.00..2773789.90 rows=40902852 width=52)
>         Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND
>           (substr((code)::text, 1, 2) <> '15'::text) AND
>           (substr((code)::text, 1, 2) <> '16'::text) AND
>           (substr((code)::text, 1, 2) <> '17'::text) AND
>           ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
>   ->  Hash  (cost=1188102.97..1188102.97 rows=8942863 width=80)
>         ->  Bitmap Heap Scan on src_faktuur_verricht t1  (cost=62392.02..1188102.97 rows=8942863 width=80)
>               Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
>               ->  Bitmap Index Scan on src_faktuur_verricht_idx1  (cost=0.00..62392.02 rows=8942863 width=0)
>                     Index Cond: (date_part('year'::text, datum) > 2004::double precision)
>
> The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable, with 2GB of RAM.
> It contains two SATA150 disks; one holds PostgreSQL and the rest of
> the operating system, and the other disk holds the pg_xlog directory.
>
> Changed lines from my postgresql.conf file:
>
> shared_buffers = 8192
> temp_buffers = 4096
> work_mem = 65536
> maintenance_work_mem = 1048576
> max_fsm_pages = 40000
> fsync = off
> wal_buffers = 64
> effective_cache_size = 174848
>
> The query above takes around 42 minutes.
>
> However, I also have a wimpy desktop machine with 1GB of RAM, running
> Windows with MSSQL 2000 (default installation): same database structure,
> same indexes, same query, etc., and there it takes 17 minutes. The big
> difference makes me think that I've made an error with my PostgreSQL
> configuration. I just can't seem to figure it out.
>
> Could someone perhaps give me some pointers, advice?
>
> Thanks in advance.
>
> Nicky
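Sven's suggestion above to watch <PGDATA>/base/<DBOID>/pgsql_tmp can be sketched as a small shell snippet. The PGDATA path and database OID below are placeholder assumptions, not values from this thread; you can look up your own database's OID with `SELECT oid FROM pg_database WHERE datname = current_database();`.

```shell
# Sketch: report temporary-file usage for one database during a query.
# PGDATA and DBOID are assumed placeholders -- substitute your own values.
PGDATA="${PGDATA:-/usr/local/pgsql/data}"   # hypothetical data directory
DBOID="${DBOID:-16384}"                     # hypothetical database OID
TMPDIR="$PGDATA/base/$DBOID/pgsql_tmp"

# du reports the total size of the temp files; the directory only exists
# once a query has actually spilled to disk.
if [ -d "$TMPDIR" ]; then
    du -sh "$TMPDIR"
else
    echo "no temp files yet: $TMPDIR"
fi
```

Running this in a loop (or under `watch`) while the big join executes shows how far the sort/hash work exceeds work_mem.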