Re: planner question.. - Mailing list pgsql-sql

From Rajesh Kumar Mallah
Subject Re: planner question..
Date
Msg-id 200304172040.06654.mallah@trade-india.com
Whole thread Raw
In response to Re: planner question..  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: planner question..
List pgsql-sql
On Thursday 17 Apr 2003 7:56 pm, you wrote:
> Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> > For a distribution of data like below why does the planner
> > choses to do an index scan by default for source = 'REGIS' when > 50%
> > of the rows are having source='REGIS'.
>
> Are there a huge number of dead rows in the table?  ("vacuum verbose"
> would give some info)
>

Yes as you predicted after a vacuum verbose the planner did
switch to seq scan and was only marginally slower than index
scan. (do u think we shud live with it?)

so whats the moral ?
ANALYZE itself is not sufficient in updating pg_statistic,
we must do VACUUM ANALYZE always ,
if so why would a seperate ANALYZE command exist ?

here is the info (in the same order as commands were executed)

tradein_clients=# explain analyze SELECT count(*) from  general.profile_master where source='REGIS';

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                           QUERY PLAN
                                        |
 

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Aggregate  (cost=18225.48..18225.48 rows=1 width=0) (actual time=6295.33..6295.34 rows=1 loops=1)
                                        |
 
|   ->  Index Scan using profile_master_index_source on profile_master  (cost=0.00..18213.31 rows=4871 width=0) (actual
time=14.54..6132.61rows=102090 loops=1) |
 
|         Index Cond: (source = 'REGIS'::character varying)
                                        |
 
| Total runtime: 6295.41 msec
                                        |
 

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
(4 rows)



tradein_clients=# VACUUM VERBOSE general.profile_master;
INFO:  --Relation general.profile_master--
INFO:  Index profile_master_profile_id_pkey: Pages 426; Tuples 178031: Deleted 11802.       CPU 0.02s/0.12u sec elapsed
0.24sec.
 
INFO:  Index profile_master_index_userid: Pages 431; Tuples 178031: Deleted 8940.       CPU 0.00s/0.13u sec elapsed
0.14sec.
 
INFO:  Index profile_master_index_creation_date: Pages 420; Tuples 178031: Deleted 8940.       CPU 0.02s/0.12u sec
elapsed0.20 sec.
 
INFO:  Index profile_master_index_company_id: Pages 419; Tuples 178031: Deleted 8940.       CPU 0.01s/0.10u sec elapsed
0.15sec.
 
INFO:  Index profile_master_index_eyp_list_id: Pages 419; Tuples 178031: Deleted 8940.       CPU 0.01s/0.11u sec
elapsed0.45 sec.
 
INFO:  Index profile_master_index_iid_list_id: Pages 420; Tuples 178031: Deleted 8940.       CPU 0.01s/0.10u sec
elapsed0.46 sec.
 
INFO:  Index profile_master_index_ip_list_id: Pages 421; Tuples 178031: Deleted 8940.       CPU 0.00s/0.11u sec elapsed
0.12sec.
 
INFO:  Index profile_master_index_catalog_company_id: Pages 419; Tuples 178031: Deleted 8940.       CPU 0.00s/0.12u sec
elapsed0.46 sec.
 
INFO:  Index profile_master_index_source: Pages 619; Tuples 178031: Deleted 8940.       CPU 0.02s/0.09u sec elapsed
0.29sec.
 
INFO:  Removed 11802 tuples in 495 pages.       CPU 0.01s/0.03u sec elapsed 0.14 sec.
INFO:  Pages 23923: Changed 36, Empty 0; Tup 178031: Vac 11802, Keep 0, UnUsed 325029.       Total CPU 0.96s/1.19u sec
elapsed11.97 sec.
 
INFO:  --Relation pg_toast.pg_toast_122045388--
INFO:  Pages 31: Changed 0, Empty 0; Tup 189: Vac 0, Keep 0, UnUsed 0.       Total CPU 0.00s/0.00u sec elapsed 0.03
sec.
VACUUM

tradein_clients=# explain analyze SELECT count(*) from  general.profile_master where source='REGIS';

+-----------------------------------------------------------------------------------------------------------------------------+
|                                                         QUERY PLAN
     |
 

+-----------------------------------------------------------------------------------------------------------------------------+
| Aggregate  (cost=26398.37..26398.37 rows=1 width=0) (actual time=8355.76..8355.76 rows=1 loops=1)
     |
 
|   ->  Seq Scan on profile_master  (cost=0.00..26148.39 rows=99994 width=0) (actual time=16.23..8237.53 rows=102090
loops=1)|
 
|         Filter: (source = 'REGIS'::character varying)
     |
 
| Total runtime: 8355.88 msec
     |
 

+-----------------------------------------------------------------------------------------------------------------------------+
(4 rows)


> The given result seems suspect; an indexscan couldn't possibly read >50%
> of the rows in less than a quarter of the time for a seqscan.  Unless
> (a) the table contains vast amounts of empty space that the seqscan has to
> slog through, or 

Sorry i dont understand the vacuum verbose output throughly , does it looks like
though ?

(b) your second measurement is bogus due to caching
> performed by the first measurement.

I am not sure this time but i have the habit of running 
EXPLAIN ANALYZE thrice and post the middle one ;-)



>
> Also, might the table be in order by the "source" column?  A
> sufficiently high correlation might have persuaded the planner to try an
> indexscan even if point (a) isn't true.

Yes the data is loaded from one source completely before loading
from another source. So they were in order but i did a lot of updates
and deletes.

regds
mallah.

>
>             regards, tom lane
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.



pgsql-sql by date:

Previous
From: Rajesh Kumar Mallah
Date:
Subject: Re: analyse question..
Next
From: Rajesh Kumar Mallah
Date:
Subject: Re: planner question..