Thread: Query runs too long for indexed tables

Query runs too long for indexed tables

From
"Andrus"
Date:
I have relatively small tables (toode and rid) in fast server.
Both tables are indexed on toode field.

Following query takes long time to run.
toode field type is char(20). It is difficult to change this field type.

Any idea how to speed up this query ?

UPDATE firma1.rid SET toode=NULL
       WHERE toode IS NOT NULL AND
       toode NOT IN (SELECT TOODE      FROM firma1.TOODE);

Query returned successfully: 0 rows affected, 594813 ms execution time.

explain window shows:

Seq Scan on rid  (cost=2581.07..20862553.77 rows=51848 width=1207)
  Filter: ((toode IS NOT NULL) AND (NOT (subplan)))
  SubPlan
    ->  Materialize  (cost=2581.07..2944.41 rows=14734 width=84)
          ->  Seq Scan on toode  (cost=0.00..2350.34 rows=14734 width=84)


Andrus.



Re: Query runs too long for indexed tables

From
Scott Marlowe
Date:
On Tue, 2006-04-04 at 14:37, Andrus wrote:
> I have relatively small tables (toode and rid) in fast server.
> Both tables are indexed on toode field.
>
> Following query takes long time to run.
> toode field type is char(20). It is difficult to change this field type.
>
> Any idea how to speed up this query ?
>
> UPDATE firma1.rid SET toode=NULL
>        WHERE toode IS NOT NULL AND
>        toode NOT IN (SELECT TOODE      FROM firma1.TOODE);
>
> Query returned successfully: 0 rows affected, 594813 ms execution time.
>
> explain window shows:
>
> Seq Scan on rid  (cost=2581.07..20862553.77 rows=51848 width=1207)
>   Filter: ((toode IS NOT NULL) AND (NOT (subplan)))
>   SubPlan
>     ->  Materialize  (cost=2581.07..2944.41 rows=14734 width=84)
>           ->  Seq Scan on toode  (cost=0.00..2350.34 rows=14734 width=84)

Let me guess, you've updated it a lot and aren't familiar with Vacuum?

run a vacuum full on your database.  schedule a vacuum (plain one) to
run every so often (hours or days are a good interval for most folks)

If that's NOT your problem, then please, let us know.

Re: Query runs too long for indexed tables

From
"Marc Morin"
Date:
Wondering if

Update firma1.rid set toode=null where toode is not null and not
exists(select 1 from firma1.toode where toode=rid.toode);

Would be faster... Problem appears to be the seqscan of seqscan... No?

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Scott Marlowe
> Sent: Tuesday, April 04, 2006 3:49 PM
> To: Andrus
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Query runs too long for indexed tables
>
> On Tue, 2006-04-04 at 14:37, Andrus wrote:
> > I have relatively small tables (toode and rid) in fast server.
> > Both tables are indexed on toode field.
> >
> > Following query takes long time to run.
> > toode field type is char(20). It is difficult to change
> this field type.
> >
> > Any idea how to speed up this query ?
> >
> > UPDATE firma1.rid SET toode=NULL
> >        WHERE toode IS NOT NULL AND
> >        toode NOT IN (SELECT TOODE      FROM firma1.TOODE);
> >
> > Query returned successfully: 0 rows affected, 594813 ms
> execution time.
> >
> > explain window shows:
> >
> > Seq Scan on rid  (cost=2581.07..20862553.77 rows=51848 width=1207)
> >   Filter: ((toode IS NOT NULL) AND (NOT (subplan)))
> >   SubPlan
> >     ->  Materialize  (cost=2581.07..2944.41 rows=14734 width=84)
> >           ->  Seq Scan on toode  (cost=0.00..2350.34 rows=14734
> > width=84)
>
> Let me guess, you've updated it a lot and aren't familiar with Vacuum?
>
> run a vacuum full on your database.  schedule a vacuum (plain
> one) to run every so often (hours or days are a good interval
> for most folks)
>
> If that's NOT your problem, then please, let us know.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

Re: Query runs too long for indexed tables

From
"Marc Morin"
Date:
Explain analyze would be nice ;-)

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Andrus
> Sent: Tuesday, April 04, 2006 3:37 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Query runs too long for indexed tables
>
> I have relatively small tables (toode and rid) in fast server.
> Both tables are indexed on toode field.
>
> Following query takes long time to run.
> toode field type is char(20). It is difficult to change this
> field type.
>
> Any idea how to speed up this query ?
>
> UPDATE firma1.rid SET toode=NULL
>        WHERE toode IS NOT NULL AND
>        toode NOT IN (SELECT TOODE      FROM firma1.TOODE);
>
> Query returned successfully: 0 rows affected, 594813 ms
> execution time.
>
> explain window shows:
>
> Seq Scan on rid  (cost=2581.07..20862553.77 rows=51848 width=1207)
>   Filter: ((toode IS NOT NULL) AND (NOT (subplan)))
>   SubPlan
>     ->  Materialize  (cost=2581.07..2944.41 rows=14734 width=84)
>           ->  Seq Scan on toode  (cost=0.00..2350.34
> rows=14734 width=84)
>
>
> Andrus.
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: Query runs too long for indexed tables

From
"Andrus"
Date:
> Let me guess, you've updated it a lot and aren't familiar with Vacuum?
>
> run a vacuum full on your database.  schedule a vacuum (plain one) to
> run every so often (hours or days are a good interval for most folks)
>
> If that's NOT your problem, then please, let us know.

Scot, thank you. Excellent. If database is created and VACUUM ANALYZE is
issued, this query runs fast.
However, I need to speed up it during running script.

This is a database creation script. Script does the following:

1. CREATE DATABASE foo;
2. START TRANSACTION;
3. Create 145 tables with primary keys. Add data to those tables.
4. Create some additional indexes
5. ANALYZE
6. Clear bad bad foreign keys fields using commands like

UPDATE firma1.rid SET toode=NULL
        WHERE toode IS NOT NULL AND
        toode NOT IN (SELECT TOODE      FROM firma1.TOODE);

7. Create foreign key references
8. COMMIT

This script runs about 1 hour  in modern server with  fsync off.
Largest table has 100000 records, few other tables have 15000 records and
remaining have fewer records.

How to speed this up ?
Is'nt running ANALYZE sufficient to speed up foreign key clearing ?

It seems that ANALYZE does'nt work. Should I isse COMMIT before running
ANALYZE or issue more commits?

Server has 4 GB RAM

postgres.conf file is default from 8.1.3 window zip file  except the
following settings are added to end:

fsync=off
shared_buffers = 30000
redirect_stderr = on
log_min_error_statement = error
autovacuum = on
... also 2 stats settings from aurtovacuur
max_fsm_pages = 30000

Andrus.



Re: Query runs too long for indexed tables

From
PFC
Date:

> UPDATE firma1.rid SET toode=NULL
>         WHERE toode IS NOT NULL AND
>         toode NOT IN (SELECT TOODE      FROM firma1.TOODE);

    Why not use a LEFT JOIN for this ?

Re: Query runs too long for indexed tables

From
"Andrus"
Date:
> Why not use a LEFT JOIN for this ?

UPDATE firma1.rid SET rid.toode=NULL
LEFT join firma1.toode using(toode)
         WHERE rid.toode IS NOT NULL AND toode.toode IS NULL;

Causes:

ERROR:  syntax error at or near "LEFT" at character 41

outer joins are not supported in Postgres UPDATE command.

Andrus.



Re: Query runs too long for indexed tables

From
PFC
Date:
> outer joins are not supported in Postgres UPDATE command.

    True (and sad).

    You can try the following script to play with the various options :


DROP TABLE one;
DROP TABLE two;

CREATE TABLE one (a SERIAL PRIMARY KEY, b INT NULL);
CREATE TABLE two (b INT NOT NULL PRIMARY KEY);
INSERT INTO two (b) SELECT x*2 FROM generate_series( 1, 50000 ) AS x;
INSERT INTO one (b) SELECT x FROM generate_series( 1, 100000 ) AS x;

EXPLAIN ANALYZE SELECT count(*) FROM one LEFT JOIN two ON one.b=two.b
WHERE two.b IS NULL;

--Try with and without...
--CREATE INDEX one_b ON one(b);

VACUUM ANALYZE one;
VACUUM ANALYZE two;

EXPLAIN ANALYZE SELECT count(*) FROM one LEFT JOIN two ON one.b=two.b
WHERE two.b IS NULL;

BEGIN;
EXPLAIN ANALYZE UPDATE one SET b=NULL WHERE b NOT IN (SELECT b FROM two );
SELECT * FROM one ORDER BY a LIMIT 5;
ROLLBACK;
VACUUM one;

BEGIN;
EXPLAIN ANALYZE UPDATE one SET b=NULL WHERE b IN (SELECT one.b FROM one
LEFT JOIN two ON one.b=two.b WHERE two.b IS NULL);
SELECT * FROM one ORDER BY a LIMIT 5;
ROLLBACK;
VACUUM one;

BEGIN;
EXPLAIN ANALYZE UPDATE one SET b=NULL FROM one x LEFT JOIN two ON
x.b=two.b WHERE two.b IS NULL AND one.a=x.a;
SELECT * FROM one ORDER BY a LIMIT 5;
ROLLBACK;
VACUUM one;

BEGIN;
EXPLAIN ANALYZE UPDATE one SET b=NULL FROM one x LEFT JOIN two ON
x.b=two.b WHERE two.b IS NULL AND one.b=x.b;
SELECT * FROM one ORDER BY a LIMIT 5;
ROLLBACK;
VACUUM one;

BEGIN;
EXPLAIN ANALYZE UPDATE one SET b=(SELECT two.b FROM two WHERE two.b=one.b);
SELECT * FROM one ORDER BY a LIMIT 5;
ROLLBACK;
VACUUM one;

BEGIN;
EXPLAIN ANALYZE UPDATE one SET b=NULL WHERE NOT EXISTS (SELECT 1 FROM two
WHERE two.b = one.b);
SELECT * FROM one ORDER BY a LIMIT 5;
ROLLBACK;
VACUUM one;

BEGIN;
CREATE TABLE tmp AS SELECT one.a, two.b FROM one LEFT JOIN two ON
one.b=two.b;
SELECT * FROM tmp ORDER BY a LIMIT 5;
DROP TABLE one;
ALTER TABLE tmp RENAME TO one;
ROLLBACK;



Re: Query runs too long for indexed tables

From
Tom Lane
Date:
"Andrus" <eetasoft@online.ee> writes:
> UPDATE firma1.rid SET toode=NULL
>         WHERE toode IS NOT NULL AND
>         toode NOT IN (SELECT TOODE      FROM firma1.TOODE);

> How to speed this up ?

Increasing work_mem to the point where you get a hashed NOT-IN would
help, probably.  Have you tried using EXPLAIN to see what the plan is
for the UPDATEs?

            regards, tom lane