Automatic analyze on select into - Mailing list pgsql-performance

From Jim C. Nasby
Subject Automatic analyze on select into
Date
Msg-id 20030426101537.F66185@flake.decibel.org
Whole thread Raw
Responses Re: Automatic analyze on select into  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
BEGIN;
    SET LOCAL enable_seqscan = off;
    SELECT id, team_id, sum(work_units) AS work_units
        INTO TEMP email_contrib_summary
        FROM email_contrib
        WHERE project_id = :ProjectID
        GROUP by id, team_id
    ;
COMMIT;

inserts 29000 rows...

UPDATE email_contrib_summary
    SET id = sp.retire_to
    FROM stats_participant sp
    WHERE sp.id = email_contrib_summary.id
        AND sp.retire_to >= 0
        AND (sp.retire_date >= (SELECT ps.last_date FROM project_statsrun ps WHERE ps.project_id = :ProjectID)
                OR sp.retire_date IS NULL)
;
 Nested Loop  (cost=0.00..5475.20 rows=982 width=54) (actual time=25.54..2173363.11 rows=29181 loops=1)
   InitPlan
     ->  Seq Scan on project_statsrun ps  (cost=0.00..1.06 rows=1 width=4) (actual time=0.06..0.07 rows=1 loops=1)
           Filter: (project_id = 8)
   ->  Seq Scan on email_contrib_summary  (cost=0.00..20.00 rows=1000 width=46) (actual time=25.11..1263.26 rows=29753
loops=1)
   ->  Index Scan using stats_participant__participantretire_id on stats_participant sp  (cost=0.00..5.44 rows=1
width=8)(actual time=2.16..72.93 rows=1 loops=29753) 
         Index Cond: ((sp.retire_to >= 0) AND (sp.id = "outer".id))
         Filter: ((retire_date >= $0) OR (retire_date IS NULL))
 Total runtime: 2174315.61 msec

GAH! 45 minutes to update 29k rows! BUT, if I do

 Hash Join  (cost=41104.03..42410.14 rows=29166 width=38) (actual time=8391.81..10925.07 rows=29181 loops=1)
   Hash Cond: ("outer".id = "inner".id)
   InitPlan
     ->  Seq Scan on project_statsrun ps  (cost=0.00..1.06 rows=1 width=4) (actual time=0.05..0.06 rows=1 loops=1)
           Filter: (project_id = 8)
   ->  Seq Scan on email_contrib_summary  (cost=0.00..496.01 rows=29701 width=30) (actual time=0.20..387.95 rows=29753
loops=1)
   ->  Hash  (cost=13939.69..13939.69 rows=394217 width=8) (actual time=8390.72..8390.72 rows=0 loops=1)
         ->  Seq Scan on stats_participant sp  (cost=0.00..13939.69 rows=394217 width=8) (actual time=0.22..5325.38
rows=389115loops=1) 
               Filter: ((retire_to >= 0) AND ((retire_date >= $0) OR (retire_date IS NULL)))
 Total runtime: 11584.09 msec


Ahhh... soothing relief...

So, question is, would it make sense to automatically do an analyze
after/during a SELECT INTO? Would it be very expensive to analyze the
data as it's being inserted? I think it's pretty well understood that
you want to vacuum/vacuum analyze the entire database regularly, but
that obviously wouldn't help temporary tables... maybe it makes the most
sense to automatically analyze temporary tables only. For that matter,
since temp tables only have one operation performed on them at a time,
maybe it makes sense to keep stats for them up-to-date as part of every
operation?
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-performance by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: Indexes with different datatypes:Correction
Next
From: Tom Lane
Date:
Subject: Re: Automatic analyze on select into