Thread: Query runs too long for indexed tables
I have relatively small tables (toode and rid) in fast server. Both tables are indexed on toode field. Following query takes long time to run. toode field type is char(20). It is difficult to change this field type. Any idea how to speed up this query ? UPDATE firma1.rid SET toode=NULL WHERE toode IS NOT NULL AND toode NOT IN (SELECT TOODE FROM firma1.TOODE); Query returned successfully: 0 rows affected, 594813 ms execution time. explain window shows: Seq Scan on rid (cost=2581.07..20862553.77 rows=51848 width=1207) Filter: ((toode IS NOT NULL) AND (NOT (subplan))) SubPlan -> Materialize (cost=2581.07..2944.41 rows=14734 width=84) -> Seq Scan on toode (cost=0.00..2350.34 rows=14734 width=84) Andrus.
On Tue, 2006-04-04 at 14:37, Andrus wrote: > I have relatively small tables (toode and rid) in fast server. > Both tables are indexed on toode field. > > Following query takes long time to run. > toode field type is char(20). It is difficult to change this field type. > > Any idea how to speed up this query ? > > UPDATE firma1.rid SET toode=NULL > WHERE toode IS NOT NULL AND > toode NOT IN (SELECT TOODE FROM firma1.TOODE); > > Query returned successfully: 0 rows affected, 594813 ms execution time. > > explain window shows: > > Seq Scan on rid (cost=2581.07..20862553.77 rows=51848 width=1207) > Filter: ((toode IS NOT NULL) AND (NOT (subplan))) > SubPlan > -> Materialize (cost=2581.07..2944.41 rows=14734 width=84) > -> Seq Scan on toode (cost=0.00..2350.34 rows=14734 width=84) Let me guess, you've updated it a lot and aren't familiar with Vacuum? run a vacuum full on your database. schedule a vacuum (plain one) to run every so often (hours or days are a good interval for most folks) If that's NOT your problem, then please, let us know.
Wondering if Update firma1.rid set toode=null where toode is not null and not exists(select 1 from firma1.toode where toode=rid.toode); Would be faster... Problem appears to be the seqscan of seqscan... No? > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of > Scott Marlowe > Sent: Tuesday, April 04, 2006 3:49 PM > To: Andrus > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Query runs too long for indexed tables > > On Tue, 2006-04-04 at 14:37, Andrus wrote: > > I have relatively small tables (toode and rid) in fast server. > > Both tables are indexed on toode field. > > > > Following query takes long time to run. > > toode field type is char(20). It is difficult to change > this field type. > > > > Any idea how to speed up this query ? > > > > UPDATE firma1.rid SET toode=NULL > > WHERE toode IS NOT NULL AND > > toode NOT IN (SELECT TOODE FROM firma1.TOODE); > > > > Query returned successfully: 0 rows affected, 594813 ms > execution time. > > > > explain window shows: > > > > Seq Scan on rid (cost=2581.07..20862553.77 rows=51848 width=1207) > > Filter: ((toode IS NOT NULL) AND (NOT (subplan))) > > SubPlan > > -> Materialize (cost=2581.07..2944.41 rows=14734 width=84) > > -> Seq Scan on toode (cost=0.00..2350.34 rows=14734 > > width=84) > > Let me guess, you've updated it a lot and aren't familiar with Vacuum? > > run a vacuum full on your database. schedule a vacuum (plain > one) to run every so often (hours or days are a good interval > for most folks) > > If that's NOT your problem, then please, let us know. > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
Explain analyze would be nice ;-) > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Andrus > Sent: Tuesday, April 04, 2006 3:37 PM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Query runs too long for indexed tables > > I have relatively small tables (toode and rid) in fast server. > Both tables are indexed on toode field. > > Following query takes long time to run. > toode field type is char(20). It is difficult to change this > field type. > > Any idea how to speed up this query ? > > UPDATE firma1.rid SET toode=NULL > WHERE toode IS NOT NULL AND > toode NOT IN (SELECT TOODE FROM firma1.TOODE); > > Query returned successfully: 0 rows affected, 594813 ms > execution time. > > explain window shows: > > Seq Scan on rid (cost=2581.07..20862553.77 rows=51848 width=1207) > Filter: ((toode IS NOT NULL) AND (NOT (subplan))) > SubPlan > -> Materialize (cost=2581.07..2944.41 rows=14734 width=84) > -> Seq Scan on toode (cost=0.00..2350.34 > rows=14734 width=84) > > > Andrus. > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
> Let me guess, you've updated it a lot and aren't familiar with Vacuum? > > run a vacuum full on your database. schedule a vacuum (plain one) to > run every so often (hours or days are a good interval for most folks) > > If that's NOT your problem, then please, let us know. Scot, thank you. Excellent. If database is created and VACUUM ANALYZE is issued, this query runs fast. However, I need to speed up it during running script. This is a database creation script. Script does the following: 1. CREATE DATABASE foo; 2. START TRANSACTION; 3. Create 145 tables with primary keys. Add data to those tables. 4. Create some additional indexes 5. ANALYZE 6. Clear bad bad foreign keys fields using commands like UPDATE firma1.rid SET toode=NULL WHERE toode IS NOT NULL AND toode NOT IN (SELECT TOODE FROM firma1.TOODE); 7. Create foreign key references 8. COMMIT This script runs about 1 hour in modern server with fsync off. Largest table has 100000 records, few other tables have 15000 records and remaining have fewer records. How to speed this up ? Is'nt running ANALYZE sufficient to speed up foreign key clearing ? It seems that ANALYZE does'nt work. Should I isse COMMIT before running ANALYZE or issue more commits? Server has 4 GB RAM postgres.conf file is default from 8.1.3 window zip file except the following settings are added to end: fsync=off shared_buffers = 30000 redirect_stderr = on log_min_error_statement = error autovacuum = on ... also 2 stats settings from aurtovacuur max_fsm_pages = 30000 Andrus.
> UPDATE firma1.rid SET toode=NULL > WHERE toode IS NOT NULL AND > toode NOT IN (SELECT TOODE FROM firma1.TOODE); Why not use a LEFT JOIN for this ?
> Why not use a LEFT JOIN for this ? UPDATE firma1.rid SET rid.toode=NULL LEFT join firma1.toode using(toode) WHERE rid.toode IS NOT NULL AND toode.toode IS NULL; Causes: ERROR: syntax error at or near "LEFT" at character 41 outer joins are not supported in Postgres UPDATE command. Andrus.
> outer joins are not supported in Postgres UPDATE command. True (and sad). You can try the following script to play with the various options : DROP TABLE one; DROP TABLE two; CREATE TABLE one (a SERIAL PRIMARY KEY, b INT NULL); CREATE TABLE two (b INT NOT NULL PRIMARY KEY); INSERT INTO two (b) SELECT x*2 FROM generate_series( 1, 50000 ) AS x; INSERT INTO one (b) SELECT x FROM generate_series( 1, 100000 ) AS x; EXPLAIN ANALYZE SELECT count(*) FROM one LEFT JOIN two ON one.b=two.b WHERE two.b IS NULL; --Try with and without... --CREATE INDEX one_b ON one(b); VACUUM ANALYZE one; VACUUM ANALYZE two; EXPLAIN ANALYZE SELECT count(*) FROM one LEFT JOIN two ON one.b=two.b WHERE two.b IS NULL; BEGIN; EXPLAIN ANALYZE UPDATE one SET b=NULL WHERE b NOT IN (SELECT b FROM two ); SELECT * FROM one ORDER BY a LIMIT 5; ROLLBACK; VACUUM one; BEGIN; EXPLAIN ANALYZE UPDATE one SET b=NULL WHERE b IN (SELECT one.b FROM one LEFT JOIN two ON one.b=two.b WHERE two.b IS NULL); SELECT * FROM one ORDER BY a LIMIT 5; ROLLBACK; VACUUM one; BEGIN; EXPLAIN ANALYZE UPDATE one SET b=NULL FROM one x LEFT JOIN two ON x.b=two.b WHERE two.b IS NULL AND one.a=x.a; SELECT * FROM one ORDER BY a LIMIT 5; ROLLBACK; VACUUM one; BEGIN; EXPLAIN ANALYZE UPDATE one SET b=NULL FROM one x LEFT JOIN two ON x.b=two.b WHERE two.b IS NULL AND one.b=x.b; SELECT * FROM one ORDER BY a LIMIT 5; ROLLBACK; VACUUM one; BEGIN; EXPLAIN ANALYZE UPDATE one SET b=(SELECT two.b FROM two WHERE two.b=one.b); SELECT * FROM one ORDER BY a LIMIT 5; ROLLBACK; VACUUM one; BEGIN; EXPLAIN ANALYZE UPDATE one SET b=NULL WHERE NOT EXISTS (SELECT 1 FROM two WHERE two.b = one.b); SELECT * FROM one ORDER BY a LIMIT 5; ROLLBACK; VACUUM one; BEGIN; CREATE TABLE tmp AS SELECT one.a, two.b FROM one LEFT JOIN two ON one.b=two.b; SELECT * FROM tmp ORDER BY a LIMIT 5; DROP TABLE one; ALTER TABLE tmp RENAME TO one; ROLLBACK;
"Andrus" <eetasoft@online.ee> writes: > UPDATE firma1.rid SET toode=NULL > WHERE toode IS NOT NULL AND > toode NOT IN (SELECT TOODE FROM firma1.TOODE); > How to speed this up ? Increasing work_mem to the point where you get a hashed NOT-IN would help, probably. Have you tried using EXPLAIN to see what the plan is for the UPDATEs? regards, tom lane