Re: how to speed up query - Mailing list pgsql-general

From Andrus
Subject Re: how to speed up query
Date
Msg-id f56ia8$2ins$1@news.hub.org
Whole thread Raw
In response to Re: how to speed up query  (Erwin Brandstetter <brsaweda@gmail.com>)
List pgsql-general
>> CREATE TEMP TABLE mydel AS
>>  SELECT r.dokumnr
>>  FROM rid r
>>  LEFT JOIN dok d USING (dokumnr)
>>  WHERE d.dokumnr IS NULL;
>> DELETE FROM rid USING mydel WHERE rid.dokumnr =mydel.dokumnr;
>> drop table mydel;
>
> As I mentioned when I proposed it, the temp table may not even be
> necessary. The important part is the LEFT JOIN instead of the NOT IN
> (as Martijn has explained).
> You could try the direct approach ...
>
> DELETE FROM rid
> USING ( SELECT r.dokumnr
> FROM rid r
> LEFT JOIN dok d USING (dokumnr)
> WHERE d.dokumnr IS NULL) x
> WHERE rid.dokumnr = x.dokumnr;
> ... and see which runs faster. Probably it does not make much of a
> difference.


Thank you.
I changed my DELETE commands to use internal table. This works fast.

I tried to change my update commands also to use internal table.
However, this causes update command to run 310 minutes:

update bilkaib SET cr4objekt=NULL
       FROM ( SELECT r.cr4objekt as key
  FROM bilkaib r
  LEFT JOIN yksus4 d ON d.YKSUS     =r.cr4objekt
     WHERE d.YKSUS      IS NULL) mydel
    WHERE cr4objekt IS NOT NULL AND
bilkaib.cr4objekt= mydel.key;

No idea why this does not work fast like in DELETE command. cr4objekt type
is CHAR(10)  maybe this makes internal table slow.

So I changed my script to

DROP TABLE if exists mydel;

CREATE TEMP TABLE mydel AS
  SELECT r.<<cchildkey>> as key
  FROM <<m.cChildtable>> r
  LEFT JOIN <<cmaintable>> d ON d.<<mainkey>>=r.<<cchildkey>>
     WHERE d.<<mainkey>> IS NULL;

update <<m.cChildtable>> SET <<cchildkey>>=NULL
       FROM mydel
    WHERE <<cchildkey>> IS NOT NULL AND
<<m.cChildtable>>.<<cchildkey>>= mydel.key;

Hope this will run fast (will test tomorrow).

My original skript

UPDATE <<m.cChildtable>> SET <<cchildkey>>=NULL
        WHERE <<cchildkey>> IS NOT NULL AND
        <<cchildkey>> NOT IN (SELECT <<mainkey>> FROM <<cmaintable>>);

runs 27 minutes in some cases.


> If the temp table works for you, you might be interested in a new
> feature of 8.2: CREATE TEMP TABLE AS ...ON COMMIT DROP;
> http://www.postgresql.org/docs/current/static/sql-createtableas.html

Per Tom remart , I removed transactions. Now every statement runs in
separate transaction.
In this case ON COMMIT DROP is useless.
ON COMMIT DROP exists in 8.1 also.
8.2 adds DROP IF EXISTS.

>> explain analyze  SELECT r.dokumnr
>>  FROM rid r
>>  LEFT JOIN dok d USING (dokumnr)
>>  WHERE d.dokumnr IS NULL
>>
>> returns
>>
>> "Hash Left Join  (cost=7760.27..31738.02 rows=1 width=4) (actual
>> time=2520.904..2520.904 rows=0 loops=1)"
>> "  Hash Cond: (r.dokumnr = d.dokumnr)"
>> "  Filter: (d.dokumnr IS NULL)"
>> "  ->  Seq Scan on rid r  (cost=0.00..17424.24 rows=202424 width=4)
>> (actual
>> time=0.032..352.225 rows=202421 loops=1)"
>> "  ->  Hash  (cost=6785.01..6785.01 rows=56101 width=4) (actual
>> time=211.150..211.150 rows=56079 loops=1)"
>> "        ->  Seq Scan on dok d  (cost=0.00..6785.01 rows=56101 width=4)
>> (actual time=0.021..147.805 rows=56079 loops=1)"
>> "Total runtime: 2521.091 ms"
>
> If the indices are present (and visible) at the time of execution, as
> you described it, we should be seeing index scans on dok_dokumnr_idx
> and rid_dokumnr_idx instead of sequential scans.
>
> That's what I get on a similar query in one of my databases:
> EXPLAIN ANALYZE SELECT a.adr_id FROM cp.adr a LEFT JOIN cp.kontakt k
> USING (adr_id) WHERE k.adr_id IS NULL;
>
> Merge Left Join  (cost=0.00..1356.31 rows=10261 width=4) (actual
> time=0.096..56.759 rows=3868 loops=1)
>  Merge Cond: ("outer".adr_id = "inner".adr_id)
>  Filter: ("inner".adr_id IS NULL)
>  ->  Index Scan using adr_pkey on adr a  (cost=0.00..947.54
> rows=10261 width=4) (actual time=0.012..23.118 rows=10261 loops=1)
>  ->  Index Scan using kontakt_adr_id_idx on kontakt k
> (cost=0.00..295.47 rows=7011 width=4) (actual time=0.007..13.299
> rows=7011 loops=1)
> Total runtime: 58.510 ms

I have no idea why my query plan shows hash and your plan show merge.
My primary key (dokumnr is of type integer). Maybe this selects hash plan.

For my big database I got the following plan:

 explain analyze  SELECT r.dokumnr
 FROM rid r
 LEFT JOIN dok d USING (dokumnr)
 WHERE d.dokumnr IS NULL

      1  Hash Left Join  (cost=7759.44..31738.44 rows=1
         width=4) (actual time=112.572..761.121 rows=3
         loops=1)
      2    Hash Cond: (r.dokumnr = d.dokumnr)
      3    Filter: (d.dokumnr IS NULL)
      4    ->  Seq Scan on rid r  (cost=0.00..17424.64
         rows=202464 width=4) (actual time=0.007..175.538
         rows=202424 loops=1)
      5    ->  Hash  (cost=6784.64..6784.64 rows=56064
         width=4) (actual time=111.296..111.296 rows=56079
         loops=1)
      6          ->  Seq Scan on dok d  (cost=0.00..6784.64
         rows=56064 width=4) (actual time=0.005..58.686
         rows=56079 loops=1)
      7  Total runtime: 761.311 ms

Since there are a lot of rows (202424 swown), this select command must use
indexes.
Without indexes it is not possible toobtain  speed  of 0.7 seconds.

Andrus.


pgsql-general by date:

Previous
From: Ranieri Mazili
Date:
Subject: Setting variable
Next
From: lawpoop@gmail.com
Date:
Subject: Re: persistent db connections in PHP