Speeding up query, Joining 55mil and 43mil records. - Mailing list pgsql-performance
From | nicky |
---|---|
Subject | Speeding up query, Joining 55mil and 43mil records. |
Date | |
Msg-id | 44994DE7.5070801@valuecare.nl Whole thread Raw |
Responses |
Re: Speeding up query, Joining 55mil and 43mil records.
Re: Speeding up query, Joining 55mil and 43mil records. Re: Speeding up query, Joining 55mil and 43mil records. Re: Speeding up query, Joining 55mil and 43mil records. |
List | pgsql-performance |
Hello People,
I'm trying to solve a 'what i feel is a' performance/configuration/query error on my side. I'm fairly new to configuring PostgreSQL so, i might be completely wrong with my configuration.
My database consists of 44 tables, about 20GB. Two of those tables are 'big/huge'. Table src.src_faktuur_verricht contains 43million records (9GB) and table src.src_faktuur_verrsec contains 55million records (6GB).
Below is the 'slow' query.
INSERT INTO rpt.rpt_verrichting
(verrichting_id
,verrichting_secid
,fout_status
,patientnr
,verrichtingsdatum
,locatie_code
,afdeling_code
,uitvoerder_code
,aanvrager_code
,verrichting_code
,dbcnr
,aantal_uitgevoerd
,kostenplaats_code
,vc_patientnr
,vc_verrichting_code
,vc_dbcnr
)
SELECT t1.id
, t0.secid
, t1.status
, t1.patientnr
, t1.datum
, t1.locatie
, t1.afdeling
, t1.uitvoerder
, t1.aanvrager
, t0.code
, t1.casenr
, t0.aantal
, t0.kostplaats
, null
, null
, null
FROM src.src_faktuur_verrsec t0 JOIN
src.src_faktuur_verricht t1 ON
t0.id = t1.id
WHERE substr(t0.code,1,2) not in ('14','15','16','17')
AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
AND EXTRACT(YEAR from t1.datum) > 2004;
Output from explain
Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118)
Hash Cond: (("outer".id)::text = ("inner".id)::text)
-> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52)
Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
-> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80)
-> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80)
Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
-> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0)
Index Cond: (date_part('year'::text, datum) > 2004::double precision)
The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable. 2GB of RAM.
It contains two SATA150 disks, one contains PostgreSQL and the rest of the operating system and the other disk holds the pg_xlog directory.
Changed lines from my postgresql.conf file
shared_buffers = 8192
temp_buffers = 4096
work_mem = 65536
maintenance_work_mem = 1048576
max_fsm_pages = 40000
fsync = off
wal_buffers = 64
effective_cache_size = 174848
The query above takes around 42 minutes.
However, i also have a wimpy desktop machine with 1gb ram. Windows with MSSQL 2000 (default installation), same database structure, same indexes, same query, etc and it takes 17 minutes. The big difference makes me think that i've made an error with my PostgreSQL configuration. I just can't seem to figure it out.
Could someone perhaps give me some pointers, advice?
Thanks in advance.
Nicky
I'm trying to solve a 'what i feel is a' performance/configuration/query error on my side. I'm fairly new to configuring PostgreSQL so, i might be completely wrong with my configuration.
My database consists of 44 tables, about 20GB. Two of those tables are 'big/huge'. Table src.src_faktuur_verricht contains 43million records (9GB) and table src.src_faktuur_verrsec contains 55million records (6GB).
Below is the 'slow' query.
INSERT INTO rpt.rpt_verrichting
(verrichting_id
,verrichting_secid
,fout_status
,patientnr
,verrichtingsdatum
,locatie_code
,afdeling_code
,uitvoerder_code
,aanvrager_code
,verrichting_code
,dbcnr
,aantal_uitgevoerd
,kostenplaats_code
,vc_patientnr
,vc_verrichting_code
,vc_dbcnr
)
SELECT t1.id
, t0.secid
, t1.status
, t1.patientnr
, t1.datum
, t1.locatie
, t1.afdeling
, t1.uitvoerder
, t1.aanvrager
, t0.code
, t1.casenr
, t0.aantal
, t0.kostplaats
, null
, null
, null
FROM src.src_faktuur_verrsec t0 JOIN
src.src_faktuur_verricht t1 ON
t0.id = t1.id
WHERE substr(t0.code,1,2) not in ('14','15','16','17')
AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
AND EXTRACT(YEAR from t1.datum) > 2004;
Output from explain
Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118)
Hash Cond: (("outer".id)::text = ("inner".id)::text)
-> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52)
Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
-> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80)
-> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80)
Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
-> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0)
Index Cond: (date_part('year'::text, datum) > 2004::double precision)
The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable. 2GB of RAM.
It contains two SATA150 disks, one contains PostgreSQL and the rest of the operating system and the other disk holds the pg_xlog directory.
Changed lines from my postgresql.conf file
shared_buffers = 8192
temp_buffers = 4096
work_mem = 65536
maintenance_work_mem = 1048576
max_fsm_pages = 40000
fsync = off
wal_buffers = 64
effective_cache_size = 174848
The query above takes around 42 minutes.
However, i also have a wimpy desktop machine with 1gb ram. Windows with MSSQL 2000 (default installation), same database structure, same indexes, same query, etc and it takes 17 minutes. The big difference makes me think that i've made an error with my PostgreSQL configuration. I just can't seem to figure it out.
Could someone perhaps give me some pointers, advice?
Thanks in advance.
Nicky
pgsql-performance by date: