Re: feeding big script to psql - Mailing list pgsql-general

From: Peter Wilson
Subject: Re: feeding big script to psql
Date:
Msg-id: dcokg3$31d5$1@news.hub.org
In response to: Re: feeding big script to psql (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: feeding big script to psql (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
Tom Lane wrote:
 > Peter Wilson <petew@yellowhawk.co.uk> writes:
 >> I found a while ago that after inserting a lot of rows into a clean
 >> Postgres table it would take several minutes just to analyse a command,
 >> not even starting the execution.
 >
 > Oh?  Could you provide a test case for this?  I can certainly believe
 > that the planner might choose a bad plan if it has no statistics, but
 > it shouldn't take a long time to do it.
 >
 >             regards, tom lane
 >

Hi Tom,
I've spent half an hour restoring the database from the backup I took. This was back in December '04. The database dump is about 95 MB; none of the tables are particularly big.

On investigation, the problem occurs on 'EXPLAIN ANALYZE', which is what pgAdmin III does when you press the explain button. According to the manual, this actually executes the command and reports actual times. (I think pgAdmin is wrong here: if you ask it to explain a command, you want the plan it's going to use, not to wait for it to finish!)
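
For clarity, since the distinction matters here: plain EXPLAIN only estimates the plan without running the query, while EXPLAIN ANALYZE executes it and reports actual times. A trivial illustration against the contacts table described below:

    -- Shows the estimated plan only; the query is not executed
    EXPLAIN SELECT * FROM contacts WHERE client_id = 'gadget';

    -- Runs the query and adds actual row counts and timings to the plan
    EXPLAIN ANALYZE SELECT * FROM contacts WHERE client_id = 'gadget';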

That said - I'll include the details here, at least partly because the tables are small in database terms, and I don't understand how this query, even with lots of scanning, can take 100% CPU for over 10 minutes (654 seconds for explain/analyze, 651 seconds for execute).

OK - the query is:

SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname FROM contacts c
    WHERE c.client_id = 'gadget'
      AND c.instance = '0'
      AND (    c.type = 'COMMUNITY'
            OR c.type = 'OU'
            OR c.type = 'INDIVIDUAL'
          )
      AND c.contact_id in (
             SELECT subb.community_id  FROM contact_att subb
                WHERE subb.client_id = 'gadget'
                  AND subb.instance = '0'
                  AND subb.contact_id = 3854.000000
         ) ;

By itself the sub-select executes in 235ms and yields a set that does not vary depending on the outer select - it's a constant set in effect. It would seem that in the worst case, assuming no index or size information, the planner should spot the invariance of the sub-select, and a sequential scan of the 'contacts' table would be the worst result I would expect.
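
For reference, the sub-select I timed on its own (same literals as in the full query) was:

    SELECT subb.community_id
      FROM contact_att subb
     WHERE subb.client_id = 'gadget'
       AND subb.instance = '0'
       AND subb.contact_id = 3854.000000;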

There are two tables involved in this query.

    'contacts' contains 3728 records.
    'contact_att' contains 19217 records.

The query plan yields :

Unique  (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
   ->  Sort  (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
         Sort Key: c.client_id, c.instance, c.contact_id, c.uname
         ->  Nested Loop IN Join  (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
               Join Filter: ("outer".contact_id = "inner".community_id)
               ->  Index Scan using cos_uk on contacts c  (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
                     Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                     Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
               ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
                     Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                     Filter: ((contact_id)::numeric = 3854.000000)
Total runtime: 654492.320 ms
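
A couple of things I notice in that plan, for what they're worth. The inner index scan on contact_att runs once per outer row (loops=2791) at roughly 234 ms per pass, and 2791 x 0.234 s is about 653 s, which accounts for almost all of the total runtime. Also, contact_id appears only in the Filter, cast to numeric to match the 3854.000000 literal, rather than in the Index Cond. Purely as a guess - assuming that cast is what keeps contact_id out of the index condition - writing the literal as a plain integer would avoid it:

    SELECT subb.community_id
      FROM contact_att subb
     WHERE subb.client_id = 'gadget'
       AND subb.instance = '0'
       AND subb.contact_id = 3854;  -- integer literal, so no ::numeric cast on the column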

Definitions for the two relevant tables are:

create table contacts (
     INSTANCE CHARACTER (1)  NOT NULL ,
     client_id varchar (50) not null ,
     contact_id bigint default nextval('contacts_contact_id_seq'::text),
     UNAME VARCHAR (32)  NOT NULL ,
     TYPE VARCHAR (20)  NOT NULL DEFAULT 'INDIVIDUAL',
     parent bigint,
     NAME VARCHAR (240) ,
     PHONE VARCHAR (32) ,
     FAX VARCHAR (32) ,
     EMAIL VARCHAR (240) ,
     BASIC_PW VARCHAR (128) ,
     DESCRIPTION VARCHAR (240),
     custom_data varchar(8192),
CONSTRAINT COS_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID)
);

create table contact_att(
     instance character(1),
     client_id varchar(50) not null,
     contact_id bigint,
     community_id bigint,
     inherited smallint,
CONSTRAINT CA_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID,COMMUNITY_ID)
);

CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);
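
One more note: the estimates in the plan above are all rows=1 where the actual counts are in the hundreds or thousands, so I suspect this freshly restored copy has no statistics yet. If that turns out to matter, refreshing them is just:

    ANALYZE contacts;
    ANALYZE contact_att;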

--
Hope you find that useful! If there is anything else I can provide, please do let me know.

Pete


------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk


