Thread: help with getting index scan

help with getting index scan

From
"Thomas T. Thai"
Date:
i can't seem to get index scan to work on table phone_cat_address. can
anyone suggest a way to get index scan to work? here are my schemas:

CREATE TABLE "phone_address" (
        "id" integer DEFAULT nextval('"phone_address_id_seq"'::text) NOT
NULL,
        "aid" bigint,
        "name" character varying(96),
        "address" character varying(60),
  ...
        "nameftx" txtidx
);
CREATE UNIQUE INDEX phone_address_id_key
  ON phone_address USING btree (id);
CREATE UNIQUE INDEX phone_address_aid_key
  ON phone_address USING btree (aid);
CREATE INDEX phone_address_name_idx
  ON phone_address USING btree (lower(name));
CREATE INDEX phone_address_nameftx_idx
  ON phone_address USING gist (nameftx);

CREATE TABLE "phone_cat" (
        "id" integer DEFAULT nextval('"phone_cat_id_seq"'::text) NOT NULL,
        "cid" integer,
        "name" character varying(96),
        "popular" character(1) DEFAULT 'N',
        "nameftx" txtidx
);

CREATE UNIQUE INDEX phone_cat_id_key ON phone_cat USING btree (id);
CREATE UNIQUE INDEX phone_cat_cid_key ON phone_cat USING btree (cid);
CREATE INDEX phone_cat_name_idx ON phone_cat USING btree (lower(name));
CREATE INDEX phone_cat_nameftx_idx ON phone_cat USING gist (nameftx);

CREATE TABLE "phone_cat_address" (
        "cid" integer NOT NULL,
        "aid" bigint NOT NULL
);

CREATE UNIQUE INDEX phone_cat_address_cid_key
  ON phone_cat_address USING btree (cid, aid);

----

here is the explain:

yellowpages=# explain SELECT p.name,p.address,p.city,p.state
yellowpages-# FROM phone_address AS p, phone_cat AS pFROM phone_address AS
p, phone_cat AS pc, ph
one_cat_address AS pca
yellowpages-# WHERE pc.nameftx ## 'automobile&repair' AND pc.cid=pca.cid
AND pca.aid=p.aid
yellowpages-# ;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=44.12..9272.76 rows=337 width=83)
  ->  Hash Join  (cost=44.12..7243.86 rows=337 width=16)
        ->  Seq Scan on phone_cat_address pca  (cost=0.00..5512.02
rows=336702 width=12)
        ->  Hash  (cost=44.09..44.09 rows=11 width=4)
              ->  Index Scan using phone_cat_nameftx_idx on phone_cat pc
(cost=0.00..44.09 rows=
11 width=4)
  ->  Index Scan using phone_address_aid_key on phone_address p
(cost=0.00..6.01 rows=1 width=67
)

NOTICE:  QUERY PLAN:

Nested Loop  (cost=44.12..9272.76 rows=337 width=83)
  ->  Hash Join  (cost=44.12..7243.86 rows=337 width=16)
        ->  Seq Scan on phone_cat_address pca  (cost=0.00..5512.02
rows=336702 width=12)
        ->  Hash  (cost=44.09..44.09 rows=11 width=4)
              ->  Index Scan using phone_cat_nameftx_idx on phone_cat pc
(cost=0.00..44.09 rows=
11 width=4)
  ->  Index Scan using phone_address_aid_key on phone_address p
(cost=0.00..6.01 rows=1 width=67
)

EXPLAIN


Re: help with getting index scan

From
Doug McNaught
Date:
"Thomas T. Thai" <tom@minnesota.com> writes:

> > > i can't seem to get index scan to work on table phone_cat_address. here
> > > are my schemas:
> >
> > Standard question: have you run VACUUM ANALYZE?
>
> yes :) before each test.

Excellent.  ;)

>
> > Also, the estimate of rows returned from the phone_cat_address scan is
> > pretty large--how large is the table itself?  Sequential scan is
> > actually faster if you're going to end up returning most of the rows
> > in the table...
>
> yellowpages=# select count(*) from phone_cat_address;
>  count
> --------
>  336702
> (1 row)
>
> type typical results should be a tiny fraction of that number.

Well, EXPLAIN is indicating (unless I misread it) that the estimate of
rows returned is 336702, so it's not surprising that it opts for a
sequential scan.  Is this under 7.1 or 7.2?  The latter keeps much
better statistics about table populations...

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: help with getting index scan

From
Doug McNaught
Date:
"Thomas T. Thai" <tom@minnesota.com> writes:

> On 25 Feb 2002, Doug McNaught wrote:
> > Well, EXPLAIN is indicating (unless I misread it) that the estimate of
> > rows returned is 336702, so it's not surprising that it opts for a
> > sequential scan.  Is this under 7.1 or 7.2?  The latter keeps much
> > better statistics about table populations...
>
> this is under 7.2. is there away to force it to use index scan? cause
> right now when i'm searching using a cat reference, it's taking a few
> seconds.

I'm still suspicious that something is wrong, but you can do

SET enable_seqscan TO off;

before your query and see if it helps your performance.  If it makes a
significant difference let us know--Tom may be interested in trying to
improve the statistics.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: help with getting index scan

From
Doug McNaught
Date:
"Thomas T. Thai" <tom@minnesota.com> writes:

> i just tried turning off seq scan and the query still takes up to 8
> seconds which is 7 seconds too long. btw, how do you clean the cache
> from the last query?

I took a detailed look at your schema and I'm not enough of a guru to
offer any more advice (I've never used gist indexes or anything like
that).  Maybe Tom or someone with more knowledge than I will weigh
in.

The only way I know of to "clear the cache" is to restart the
database.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: help with getting index scan

From
"Thomas T. Thai"
Date:
On 25 Feb 2002, Doug McNaught wrote:

> > > Also, the estimate of rows returned from the phone_cat_address scan is
> > > pretty large--how large is the table itself?  Sequential scan is
> > > actually faster if you're going to end up returning most of the rows
> > > in the table...
> >
> > yellowpages=# select count(*) from phone_cat_address;
> >  count
> > --------
> >  336702
> > (1 row)
> >
> > type typical results should be a tiny fraction of that number.
>
> Well, EXPLAIN is indicating (unless I misread it) that the estimate of
> rows returned is 336702, so it's not surprising that it opts for a
> sequential scan.  Is this under 7.1 or 7.2?  The latter keeps much
> better statistics about table populations...

this is under 7.2. is there away to force it to use index scan? cause
right now when i'm searching using a cat reference, it's taking a few
seconds.

---
Thomas T. Thai | Minnesota.com | tom@minnesota.com | 612.220.6220
Visit http://www.minnesota.com/




Re: help with getting index scan

From
"Thomas T. Thai"
Date:
On Mon, 25 Feb 2002, Tom Lane wrote:

> "Thomas T. Thai" <tom@minnesota.com> writes:
> > On Mon, 25 Feb 2002, Tom Lane wrote:
> >> How many distinct cid values do you have?  Also, which PG version is
> >> this?
>
> >   5139
>
> Hmm, seems like that ought to be selective enough.  What does pg_stats
> show for phone_cat_address?  (And phone_cat, for that matter.)

sorry tom, i'm still new to PostgreSQL. what is pg_stats and how do i use
it in the way you've asked?

> If you set enable_seqscan to off, do you get a plan you like better?
> If so, what is it?

with seqscan off, the query still takes about the same about of time
(around 8 secs). i'd like to get it down to 1 if possible.



Re: help with getting index scan

From
"Thomas T. Thai"
Date:
On 25 Feb 2002, Doug McNaught wrote:

> "Thomas T. Thai" <tom@minnesota.com> writes:
>
> > i just tried turning off seq scan and the query still takes up to 8
> > seconds which is 7 seconds too long. btw, how do you clean the cache
> > from the last query?
>
> I took a detailed look at your schema and I'm not enough of a guru to
> offer any more advice (I've never used gist indexes or anything like
> that).  Maybe Tom or someone with more knowledge than I will weigh
> in.

oh don't worry about the gist index. that is actually really fast.

> The only way I know of to "clear the cache" is to restart the
> database.

ouch. lol. cause when i try to time the same query again, the cached time
isn't what i want.

thanks though doug.


Re: help with getting index scan

From
Tom Lane
Date:
"Thomas T. Thai" <tom@minnesota.com> writes:
> sorry tom, i'm still new to PostgreSQL. what is pg_stats and how do i use
> it in the way you've asked?

select * from pg_stats where tablename = 'foo';

>> If you set enable_seqscan to off, do you get a plan you like better?
>> If so, what is it?

> with seqscan off, the query still takes about the same about of time
> (around 8 secs). i'd like to get it down to 1 if possible.

I wanted to know what the plan and cost estimates are.  Also, it'd be
good to show EXPLAIN ANALYZE results, so that we can compare reality
to planner cost estimates ...

            regards, tom lane

Re: help with getting index scan

From
"Thomas T. Thai"
Date:
On 25 Feb 2002, Doug McNaught wrote:

> "Thomas T. Thai" <tom@minnesota.com> writes:
>
> > On 25 Feb 2002, Doug McNaught wrote:
> > > Well, EXPLAIN is indicating (unless I misread it) that the estimate of
> > > rows returned is 336702, so it's not surprising that it opts for a
> > > sequential scan.  Is this under 7.1 or 7.2?  The latter keeps much
> > > better statistics about table populations...
> >
> > this is under 7.2. is there away to force it to use index scan? cause
> > right now when i'm searching using a cat reference, it's taking a few
> > seconds.
>
> I'm still suspicious that something is wrong, but you can do
>
> SET enable_seqscan TO off;
>
> before your query and see if it helps your performance.  If it makes a
> significant difference let us know--Tom may be interested in trying to
> improve the statistics.

i just tried turning off seq scan and the query still takes up to 8
seconds which is 7 seconds too long. btw, how do you clean the cache
from the last query?



Re: help with getting index scan

From
"Thomas T. Thai"
Date:
On Mon, 25 Feb 2002, Tom Lane wrote:

This is getting rather long with the explains etc. should i continue to
cc: to the list?

> "Thomas T. Thai" <tom@minnesota.com> writes:
> > sorry tom, i'm still new to PostgreSQL. what is pg_stats and how do i use
> > it in the way you've asked?
>
> select * from pg_stats where tablename = 'foo';

yellowpages=# select * from pg_stats where tablename =
'phone_cat_address';
     tablename     | attname | null_frac | avg_width | n_distinct |
   most_common_vals                                  |
          most_common_freqs
|
                       histogram_bounds
| correlation
-------------------+---------+-----------+-----------+------------+------------------------------
-----------------------------------------------------+-------------------------------------------
--------------------------------------------------------------------------------+----------------
-------------------------------------------------------------------------------+-------------
 phone_cat_address | cid     |         0 |         4 |       1176 |
{2,10,3,12,11,16,6,56,18293,7
5}                                                   |
{0.035,0.0283333,0.021,0.0206667,0.0133333
,0.0123333,0.00933333,0.00933333,0.009,0.00833333}
| {4,43,98,177,43
2,1603,2076,11212,15979,18262,18775}
|           1
 phone_cat_address | aid     |         0 |         8 |  -0.526973 |
{873766,1468933,3316338,39146
78,4955422,6073155,6369270,8431670,9012026,10684620} |
{0.000666667,0.000666667,0.000666667,0.000
666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
| {10755,1321415,
2288476,3469515,4445287,5650291,7029439,8344730,9662520,11016908,100000851228}
|    0.011702
(2 rows)

yellowpages=# select * from pg_stats where tablename = 'phone_cat';
 tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals | most_common_freqs
|
                                                     histogram_bounds

                        | correlation
-----------+---------+-----------+-----------+------------+------------------+-------------------
+------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
------------------------+-------------
 phone_cat | id      |         0 |         4 |         -1 |
|
| {2,543,1050,1594,2106,2641,3892,5566,7244,8979,10673}


                        |    0.861298
 phone_cat | cid     |         0 |         4 |         -1 |
|
| {3,1189,2182,10960,12145,13215,14410,15649,16799,18058,19247}


                        |  0.00397399
 phone_cat | name    |         0 |        29 |         -1 |
|
| {"ABRASIVE CUTTING",Auditors,"Boat Covers Tops &
Upholstery-Wholesale","Cash Registers & Suppli
es (Wholesale)","Communication Equipment-Manufacturers","Decoration
Supplies-Wholesale","Framing
Contractors-Buildings","Livestock Commission",Plants-Horticultural,"State
Government-General Offi
ces",Zippers-Repairing} |    0.860659
 phone_cat | popular |         0 |         5 |          2 | {N,Y}
| {0.988,0.012}
|


                        |    0.996101
(4 rows)

>
> >> If you set enable_seqscan to off, do you get a plan you like better?
> >> If so, what is it?
>
> > with seqscan off, the query still takes about the same about of time
> > (around 8 secs). i'd like to get it down to 1 if possible.
>
> I wanted to know what the plan and cost estimates are.  Also, it'd be
> good to show EXPLAIN ANALYZE results, so that we can compare reality
> to planner cost estimates ...

yellowpages=# set enable_seqscan to on;
SET VARIABLE
yellowpages=# explain analyze SELECT *
yellowpages-# FROM
yellowpages-#   (SELECT p.name,p.address,p.city,p.state,
yellowpages(#
geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat))
as dist
yellowpages(#    FROM phone_address AS p, phone_cat AS pc,
phone_cat_address AS
pca, zipcodes AS z
yellowpages(#    WHERE z.zip_code='55404'
yellowpages(#      AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
pca.aid=p.a
id)
yellowpages(#   ) AS ss
yellowpages-# WHERE dist < 35
yellowpages-# ORDER BY dist LIMIT 20;ORDER BY dist LIMIT 20;
NOTICE:  QUERY PLAN:

Limit  (cost=9286.33..9286.33 rows=20 width=115) (actual
time=6748.11..6748.17 r
ows=20 loops=1)
  ->  Sort  (cost=9286.33..9286.33 rows=112 width=115) (actual
time=6748.10..674
8.12 rows=21 loops=1)
        ->  Nested Loop  (cost=44.12..9282.51 rows=112 width=115) (actual
time=6
505.91..6701.65 rows=1745 loops=1)
              ->  Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.
01 rows=1 width=16) (actual time=0.97..0.97 rows=1 loops=1)
              ->  Materialize  (cost=9272.76..9272.76 rows=337 width=99)
(actual
 time=6504.94..6558.83 rows=4217 loops=1)
                    ->  Nested Loop  (cost=44.12..9272.76 rows=337
width=99) (ac
tual time=555.67..6398.70 rows=4217 loops=1)
                          ->  Hash Join  (cost=44.12..7243.86 rows=337
width=16)
 (actual time=555.67..5440.17 rows=4217 loops=1)
                                ->  Seq Scan on phone_cat_address pca
(cost=0.0
0..5512.02 rows=336702 width=12) (actual time=0.00..3376.45 rows=336702
loops=1)
                                ->  Hash  (cost=44.09..44.09 rows=11
width=4) (a
ctual time=5.86..5.86 rows=0 loops=1)
                                      ->  Index Scan using
phone_cat_nameftx_idx
 on phone_cat pc  (cost=0.00..44.09 rows=11 width=4) (actual
time=2.93..5.85 row
s=8 loops=1)
                          ->  Index Scan using phone_address_aid_key on
phone_ad
dress p  (cost=0.00..6.01 rows=1 width=83) (actual time=0.16..0.18 rows=1
loops=
4217)
Total runtime: 6786.19 msec

EXPLAIN
yellowpages=#

----
yellowpages=# set enable_seqscan to off;
SET VARIABLE
yellowpages=# explain analyze                           ->  Index Scan
using pho
ne_address_aid_key on phone_ad
yellowpages-# dress p  (cost=0.00..6.01 rows=1 width=83) (actual
time=0.16..0.18
 rows=1 loops=
yellowpages(# 4217)
yellowpages-# Total runtime: 6786.19 msec
yellowpages-#
yellowpages-# EXPLAIN
yellowpages-# yellowpages=#
yellowpages-# ;
ERROR:  parser: parse error at or near "->"
yellowpages=# set enable_seqscan to off;
SET VARIABLE
yellowpages=# explain analyze SELECT *
yellowpages-# FROM
yellowpages-#   (SELECT p.name,p.address,p.city,p.state,
yellowpages(#
geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat))
as dist
yellowpages(#    FROM phone_address AS p, phone_cat AS pc,
phone_cat_address AS
pca, zipcodes AS z
yellowpages(#    WHERE z.zip_code='55404'
yellowpages(#      AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
pca.aid=p.a
id)
yellowpages(#   ) AS ss
yellowpages-# WHERE dist < 35
yellowpages-# ORDER BY dist LIMIT 20;
NOTICE:  QUERY PLAN:

Limit  (cost=10799.67..10799.67 rows=20 width=115) (actual
time=1564.47..1564.53
 rows=20 loops=1)
  ->  Sort  (cost=10799.67..10799.67 rows=112 width=115) (actual
time=1564.47..1
564.49 rows=21 loops=1)
        ->  Nested Loop  (cost=0.00..10795.85 rows=112 width=115) (actual
time=1
323.26..1517.14 rows=1745 loops=1)
              ->  Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.
01 rows=1 width=16) (actual time=0.00..0.00 rows=1 loops=1)
              ->  Materialize  (cost=10786.10..10786.10 rows=337 width=99)
(actu
al time=1322.28..1372.30 rows=4217 loops=1)
                    ->  Nested Loop  (cost=0.00..10786.10 rows=337
width=99) (ac
tual time=3.90..1209.10 rows=4217 loops=1)
                          ->  Nested Loop  (cost=0.00..8757.20 rows=337
width=16
) (actual time=2.93..245.72 rows=4217 loops=1)
                                ->  Index Scan using phone_cat_nameftx_idx
on ph
one_cat pc  (cost=0.00..44.09 rows=11 width=4) (actual time=2.92..5.79
rows=8 lo
ops=1)
                                ->  Index Scan using
phone_cat_address_cid_key o
n phone_cat_address pca  (cost=0.00..812.56 rows=286 width=12) (actual
time=0.60
..21.64 rows=527 loops=8)
                          ->  Index Scan using phone_address_aid_key on
phone_ad
dress p  (cost=0.00..6.01 rows=1 width=83) (actual time=0.17..0.19 rows=1
loops=
4217)
Total runtime: 1596.69 msec

EXPLAIN