Re: how to speed up query - Mailing list pgsql-general

From Andrus
Subject Re: how to speed up query
Date
Msg-id f4oqhc$29ta$1@news.hub.org
Whole thread Raw
In response to Re: how to speed up query  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: how to speed up query  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: how to speed up query  (Erwin Brandstetter <brsaweda@gmail.com>)
List pgsql-general
> >  delete from firma1.rid where dokumnr not in (select dokumnr from
> >firma1.dok)

>For future reference, I beleive the problem is the NOT IN. It has this
>"feature" where if any of the rows it searches has a NULL, it will
>return FALSE for *all* rows. So the whole table has to be scanned to
>check that there arn't any NULLs, before a single row can be returned.
>This is why it can't be converted to a join.

Thank you.
As I understand, only way to optimize the statement

delete from firma1.rid where dokumnr not in (select dokumnr from
firma1.dok);

assuming that  firma1.dok.dokumnr does not contain null values is to change
it to

CREATE TEMP TABLE mydel AS
 SELECT r.dokumnr
 FROM rid r
 LEFT JOIN dok d USING (dokumnr)
 WHERE d.dokumnr IS NULL;
DELETE FROM rid USING mydel WHERE rid.dokumnr =mydel.dokumnr;
drop table mydel;


I run the following commands (first number of minutes from script start) in
my script:

18 Duration 2,9 minutes: ALTER TABLE dok ADD PRIMARY KEY (dokumnr)
...
81 Duration 9,6 minutes:  CREATE INDEX rid_dokumnr_idx ON rid (dokumnr)
...
101 Duration 10,5 minutes:  analyze
...
113  Duration 11 minutes: CREATE TEMP TABLE mydel AS
 SELECT r.dokumnr
 FROM rid r
 LEFT JOIN dok d USING (dokumnr)
 WHERE d.dokumnr IS NULL

122 Duration 9,6 minutes:  DELETE FROM rid USING mydel WHERE rid.dokumnr
=mydel.dokumnr

133 Duration 11 minutes:  ALTER TABLE rid ADD FOREIGN KEY (dokumnr)
REFERENCES dok
  ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE

When I run command

CREATE TEMP TABLE mydel AS
 SELECT r.dokumnr
 FROM rid r
 LEFT JOIN dok d USING (dokumnr)
 WHERE d.dokumnr IS NULL

from pgAdmin, it takes 1 second.
When I run this command from script it takes 11 minutes!

Any idea why running this command from script takes 11 minutes?

I have created
indexes on dok and rid and ran analyze before using CREATE TEMP TABLE

So I expect that CREATE TEMP TABLE command must take same time to run from
script and from pgAdmin.

My script in running in single transaction.
Should I use commit after index creation or after ANALYZE command?


In pgAdmin

explain analyze  SELECT r.dokumnr
 FROM rid r
 LEFT JOIN dok d USING (dokumnr)
 WHERE d.dokumnr IS NULL

returns

"Hash Left Join  (cost=7760.27..31738.02 rows=1 width=4) (actual
time=2520.904..2520.904 rows=0 loops=1)"
"  Hash Cond: (r.dokumnr = d.dokumnr)"
"  Filter: (d.dokumnr IS NULL)"
"  ->  Seq Scan on rid r  (cost=0.00..17424.24 rows=202424 width=4) (actual
time=0.032..352.225 rows=202421 loops=1)"
"  ->  Hash  (cost=6785.01..6785.01 rows=56101 width=4) (actual
time=211.150..211.150 rows=56079 loops=1)"
"        ->  Seq Scan on dok d  (cost=0.00..6785.01 rows=56101 width=4)
(actual time=0.021..147.805 rows=56079 loops=1)"
"Total runtime: 2521.091 ms"

Andrus.


pgsql-general by date:

Previous
From: Johannes Konert
Date:
Subject: Re: pg_xlog - files are guaranteed to be sequentialy named?
Next
From: Greg Smith
Date:
Subject: Re: pg_xlog - files are guaranteed to be sequentialy named?