Thread: Speeding up query, Joining 55mil and 43mil records.
Hello People,
I'm trying to solve what I feel is a performance/configuration/query error on my side. I'm fairly new to configuring PostgreSQL, so I might be completely wrong with my configuration.
My database consists of 44 tables, about 20GB in total. Two of those tables are big/huge: src.src_faktuur_verricht contains 43 million records (9GB) and src.src_faktuur_verrsec contains 55 million records (6GB).
Below is the slow query.
INSERT INTO rpt.rpt_verrichting
(verrichting_id
,verrichting_secid
,fout_status
,patientnr
,verrichtingsdatum
,locatie_code
,afdeling_code
,uitvoerder_code
,aanvrager_code
,verrichting_code
,dbcnr
,aantal_uitgevoerd
,kostenplaats_code
,vc_patientnr
,vc_verrichting_code
,vc_dbcnr
)
SELECT t1.id
, t0.secid
, t1.status
, t1.patientnr
, t1.datum
, t1.locatie
, t1.afdeling
, t1.uitvoerder
, t1.aanvrager
, t0.code
, t1.casenr
, t0.aantal
, t0.kostplaats
, null
, null
, null
FROM src.src_faktuur_verrsec t0 JOIN
src.src_faktuur_verricht t1 ON
t0.id = t1.id
WHERE substr(t0.code,1,2) not in ('14','15','16','17')
AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
AND EXTRACT(YEAR from t1.datum) > 2004;
Output from EXPLAIN:
Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118)
Hash Cond: (("outer".id)::text = ("inner".id)::text)
-> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52)
Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
-> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80)
-> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80)
Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
-> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0)
Index Cond: (date_part('year'::text, datum) > 2004::double precision)
The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-STABLE, with 2GB of RAM.
It has two SATA150 disks: one holds PostgreSQL and the rest of the operating system, the other holds the pg_xlog directory.
Changed lines from my postgresql.conf file:
shared_buffers = 8192
temp_buffers = 4096
work_mem = 65536
maintenance_work_mem = 1048576
max_fsm_pages = 40000
fsync = off
wal_buffers = 64
effective_cache_size = 174848
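A quick conversion sketch of the settings above (my own arithmetic, assuming the default 8 kB block size): in 8.1, shared_buffers, temp_buffers and effective_cache_size are counted in 8 kB pages, while the *_mem settings are in kB.

```python
# Convert the 8.1-era postgresql.conf values above to megabytes.
# shared_buffers, temp_buffers and effective_cache_size are in
# 8 kB pages; work_mem and maintenance_work_mem are in kB.
PAGE_KB = 8

settings_pages = {
    "shared_buffers": 8192,
    "temp_buffers": 4096,
    "effective_cache_size": 174848,
}
settings_kb = {
    "work_mem": 65536,
    "maintenance_work_mem": 1048576,
}

for name, pages in settings_pages.items():
    print(f"{name} = {pages * PAGE_KB / 1024:.0f} MB")
for name, kb in settings_kb.items():
    print(f"{name} = {kb / 1024:.0f} MB")
```

So shared_buffers works out to only 64 MB on a 2 GB machine, which is worth keeping in mind when reading the replies below.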
The query above takes around 42 minutes.
However, I also have a wimpy desktop machine with 1GB of RAM, running Windows with MSSQL 2000 (default installation): same database structure, same indexes, same query, etc., and it takes 17 minutes. The big difference makes me think I've made an error in my PostgreSQL configuration; I just can't seem to figure it out.
Could someone perhaps give me some pointers or advice?
Thanks in advance.
Nicky
Could you post an EXPLAIN ANALYZE of the query? Just FYI: if you do an EXPLAIN ANALYZE of the INSERT statement, it will actually do the insert. If you don't want that, just post an EXPLAIN ANALYZE of the SELECT part.
To me it would be interesting to compare just the SELECT parts of the query between Postgres and MSSQL. That way you would know whether your Postgres install is slower at the query or slower at the insert.
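A minimal way to time just the SELECT side (same join and filters as the original statement, with COUNT(*) so no rows are shipped to the client or inserted) would be:

```sql
-- Times only the SELECT side of the slow statement, skipping the INSERT.
SELECT count(*)
FROM src.src_faktuur_verrsec t0
JOIN src.src_faktuur_verricht t1 ON t0.id = t1.id
WHERE substr(t0.code, 1, 2) NOT IN ('14', '15', '16', '17')
  AND (substr(t0.correctie, 4, 1) <> '1' OR t0.correctie IS NULL)
  AND EXTRACT(YEAR FROM t1.datum) > 2004;
```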
-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of nicky
Sent: Wednesday, June 21, 2006 8:47 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Speeding up query, Joining 55mil and 43mil records.
Hi Nicky,

I guess you should try to upgrade the memory settings of PostgreSQL first.

work_mem = 65536

is a bit low for such large joins. Did you get a chance to watch the directory <PGDATA>/base/<DBOID>/pgsql_tmp to see how large the temporary file is during this query? I'm sure that there is a large file. Anyhow, you can upgrade work_mem to 1000000, which is about 1 GB. Please note that work_mem is per backend process; you will get problems with multiple large queries at the same time. You may also move (link) the directory pgsql_tmp to a very fast file system if you still get large files in this directory.

You can also try to increase these settings:

checkpoint_segments = 256
checkpoint_timeout = 3600    # range 30-3600, in seconds
checkpoint_warning = 0       # 0 is off

Please read the PostgreSQL documentation about the drawbacks of these settings, as well as of your setting fsync = off.

Cheers
Sven.

nicky schrieb:
> Hello People,
>
> I'm trying to solve a 'what i feel is a' performance/configuration/query
> error on my side.
<snip snip snip>
On Wed, 2006-06-21 at 08:47, nicky wrote:
> Hello People,

SNIPPAGE

> The query above takes around 42 minutes.
>
> However, i also have a wimpy desktop machine with 1gb ram. Windows
> with MSSQL 2000 (default installation), same database structure, same
> indexes, same query, etc and it takes 17 minutes. The big difference
> makes me think that i've made an error with my PostgreSQL
> configuration. I just can't seem to figure it out.

What is the difference between the two plans (i.e. run EXPLAIN on both boxes and compare)?
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> What is the difference between the two plans (i.e. explain on both boxes
> and compare)

Even more to the point, let's see EXPLAIN ANALYZE output from both boxes...

regards, tom lane
On Wed, Jun 21, 2006 at 03:47:19PM +0200, nicky wrote:
> WHERE substr(t0.code,1,2) not in ('14','15','16','17')
> AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
> AND EXTRACT(YEAR from t1.datum) > 2004;

How much data do you expect to be getting back from that where clause? Unless you plan on inserting most of the table, some well-placed indexes would probably help, and fixing the datum portion might as well (depending on how far back the data goes). Specifically:

CREATE INDEX t0_code_partial ON t0(substr(code,1,2));

(yeah, I know t0 is an alias, but I already snipped the table name)

and

AND t1.datum >= '1/1/2005'

(might need to cast that to a date or whatever).

--
Jim C. Nasby, Sr. Engineering Consultant    jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
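Spelled out against the actual table names (an untested sketch; the index names here are made up, and `datum` is assumed to be a date or timestamp column):

```sql
-- Functional index on the substr() expression used in the WHERE clause:
CREATE INDEX src_faktuur_verrsec_code_pfx
    ON src.src_faktuur_verrsec (substr(code, 1, 2));

-- Plain index on datum; a range predicate on the raw column
-- can use an ordinary btree, unlike EXTRACT(YEAR FROM datum):
CREATE INDEX src_faktuur_verricht_datum
    ON src.src_faktuur_verricht (datum);

-- In the query, replace
--     EXTRACT(YEAR FROM t1.datum) > 2004
-- with
--     t1.datum >= DATE '2005-01-01'
```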
Hello again,

thanks for all the quick replies.

It seems I wasn't entirely correct in my previous post; I'd mixed up some times/numbers. Below are the correct numbers:

MSSQL:      SELECT COUNT(*) from JOIN (without insert)  17 minutes
PostgreSQL: SELECT COUNT(*) from JOIN (without insert)  33 minutes
PostgreSQL: complete query                              55 minutes

The part I'm really troubled by is the difference in performance on the select part, which takes twice as long on PostgreSQL even though it has a better server than MSSQL.

Changes I've made to postgresql.conf:

work_mem = 524288  (1GB results in an out of memory error)
checkpoint_segments = 256
checkpoint_timeout = 3600
checkpoint_warning = 0

I've run the complete EXPLAIN ANALYZE query twice: first with pgsql_tmp on the same disk, then again with pgsql_tmp on a separate disk.

**** (PostgreSQL) (pgsql_tmp on same disk)

Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118) (actual time=327982.425..1903423.769 rows=7551616 loops=1)
  Hash Cond: (("outer".id)::text = ("inner".id)::text)
  -> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52) (actual time=8.935..613455.204 rows=37368390 loops=1)
       Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
  -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80) (actual time=327819.698..327819.698 rows=8761024 loops=1)
       -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80) (actual time=75911.336..295510.647 rows=8761024 loops=1)
            Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
            -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0) (actual time=75082.080..75082.080 rows=8761024 loops=1)
                 Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 3355696.015 ms

**** (PostgreSQL) (pgsql_tmp on separate disk)

Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118) (actual time=172797.736..919869.708 rows=7551616 loops=1)
  Hash Cond: (("outer".id)::text = ("inner".id)::text)
  -> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52) (actual time=0.015..362154.822 rows=37368390 loops=1)
       Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
  -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80) (actual time=172759.255..172759.255 rows=8761024 loops=1)
       -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80) (actual time=4244.840..142144.606 rows=8761024 loops=1)
            Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
            -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0) (actual time=3431.361..3431.361 rows=8761024 loops=1)
                 Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 2608316.714 ms

A lot of difference in performance: 55 minutes down to 42 minutes. I've also run the SELECT COUNT(*) from the JOIN to see the difference on that part.

**** (PostgreSQL) EXPLAIN ANALYZE of the SELECT COUNT(*) from the JOIN (pgsql_tmp on separate disk)

Aggregate (cost=5632244.93..5632244.94 rows=1 width=0) (actual time=631993.425..631993.427 rows=1 loops=1)
  -> Hash Join (cost=1258493.12..5614251.00 rows=7197568 width=0) (actual time=237999.277..620018.706 rows=7551616 loops=1)
       Hash Cond: (("outer".id)::text = ("inner".id)::text)
       -> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=14) (actual time=23.449..200532.422 rows=37368390 loops=1)
            Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::tex (..)
       -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=14) (actual time=237939.262..237939.262 rows=8761024 loops=1)
            -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=14) (actual time=74713.092..216206.478 rows=8761024 loops=1)
                 Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
                 -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0) (actual time=73892.153..73892.153 rows=8761024 loops=1)
                      Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 631994.172 ms

A lot of improvement also in the select count: 33 minutes vs 10 minutes.

To us, these speeds are good. We're very happy with the performance increase on that select with join, since 90% of the queries are SELECT based.

The query results in 7551616 records, so that's about 4500 inserts per second. I'm not sure if that is fast or not. Any further tips would be welcome.

Thanks everyone.

Nicky
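As a quick sanity check on the minute figures quoted above (my own arithmetic, not from the thread), the Total runtime values in the plans are milliseconds:

```python
# Convert the reported "Total runtime" figures to minutes.
runtimes_ms = {
    "insert, pgsql_tmp on same disk": 3_355_696,
    "insert, pgsql_tmp on separate disk": 2_608_316,
    "select count(*), separate disk": 631_994,
}
for label, ms in runtimes_ms.items():
    print(f"{label}: {ms / 60_000:.1f} min")
```

That gives roughly 56, 43 and 10.5 minutes, consistent with the rounded numbers in the message.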
Hi Nicky,

Did you try to create an index to avoid the sequential scans?

Seq Scan on src_faktuur_verrsec t0...

I think you should try

CREATE INDEX src_faktuur_verrsec_codesubstr
    ON src.src_faktuur_verrsec (substr(code, 1, 2));

Cheers
Sven.

nicky schrieb:
> Below the correct numbers
>
> MSSQL: SELECT COUNT(*) from JOIN (without insert) 17 minutes
> PostgreSQL: SELECT COUNT(*) from JOIN (without insert) 33 minutes
> PostgreSQL: complete query 55 minutes
<snip snip snip>
Hello Sven,

We have the following indexes on src_faktuur_verrsec:

CREATE INDEX src_faktuur_verrsec_idx0
  ON src.src_faktuur_verrsec
  USING btree
  (id);

CREATE INDEX src_faktuur_verrsec_idx1
  ON src.src_faktuur_verrsec
  USING btree
  (substr(code::text, 1, 2));

CREATE INDEX src_faktuur_verrsec_idx2
  ON src.src_faktuur_verrsec
  USING btree
  (substr(correctie::text, 4, 1));

and another two on src_faktuur_verricht:

CREATE INDEX src_faktuur_verricht_idx0
  ON src.src_faktuur_verricht
  USING btree
  (id);

CREATE INDEX src_faktuur_verricht_idx1
  ON src.src_faktuur_verricht
  USING btree
  (date_part('year'::text, datum))
  TABLESPACE src_index;

PostgreSQL elects not to use them. I assume because it most likely needs to traverse the entire table anyway.

If I change

    substr(t0.code,1,2) not in ('14','15','16','17')

to (removing the NOT)

    substr(t0.code,1,2) in ('14','15','16','17')

it uses the index, but then it's no longer the query that needs to be run.

Greetings,
Nick

Sven Geisler wrote:
> Did you tried to create an index to avoid the sequential scans?
>
> Seq Scan on src_faktuur_verrsec t0...
<snip snip snip>
> PostgreSQL elects not to use them. I assume, because it most
> likely needs to traverse the entire table anyway.
>
> if i change: substr(t0.code,1,2) not in ('14','15','16','17')
> to (removing the NOT): substr(t0.code,1,2) in ('14','15','16','17')
>
> it uses the index, but it's not the query that needs to be
> run anymore.

If this is the only query that you're having problems with, you might be helped by a partial index, depending on how much 14-17 really filters. Try something like (note: the index predicate references the table's own columns, not the query aliases):

CREATE INDEX foo ON src.src_faktuur_verrsec (id)
 WHERE substr(code,1,2) not in ('14','15','16','17')
   AND (substr(correctie,4,1) <> '1' OR correctie is null);

That index should be usable for the JOIN while filtering out all the unnecessary rows before you even get there. In the same way, if it filters a lot of rows, you might want to try:

CREATE INDEX foo ON src.src_faktuur_verricht (id)
 WHERE EXTRACT(YEAR from datum) > 2004;

But this kind of requires that the partial indexes actually drop significant amounts of the table. If not, they'll be of no help.

//Magnus
Hi Nick,

I'm not that good at advising how to get PostgreSQL to use an index to get your results faster. Did you try "not (substr(t0.code,1,2) in ('14','15','16','17'))"?

Cheers
Sven.

nicky schrieb:
> PostgreSQL elects not to use them. I assume, because it most likely
> needs to traverse the entire table anyway.
>
> if i change: substr(t0.code,1,2) not in ('14','15','16','17')
> to (removing the NOT): substr(t0.code,1,2) in ('14','15','16','17')
>
> it uses the index, but it's not the query that needs to be run anymore.