An Analyze question - Mailing list pgsql-admin

From Nick Fankhauser
Subject An Analyze question
Date
Msg-id NEBBLAAHGLEEPCGOBHDGIEDKENAA.nickf@ontko.com
Whole thread Raw
Responses Re: An Analyze question  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
Hello-

Does analyze consider the entire key when creating statistics for a table,
or a substring composed of the leading chars?

Some background:

I've just modified *all* of the keys on my database by prefixing them with
the same source_id that is 5 chars long. This is in preparation for merging
together data from several different sources where I know that the keys are
unique to the source, but not necessarily between sources.

So on every primary & foreign key, I have executed this update:

update table set key = '18105'||key;

Now, a few queries that used to be swift are very slow, and on further
investigation, I found that the planner is making different decisions-
essentially it looks like the statistics now indicate that each key is much
less selective & hence a poor candidate for an index scan.

I did an analyze on the whole database, and then did an analyze specifically
on the tables involved, and checked against an original copy of the database
to make sure the indexes are identical in  both.

If the keys are taken in their entirety, nothing has changed- they are just
as selective as ever. However, if only the leading chars are considered, or
if the leading chars have a higher weighting, they would certainly appear
much less selective.

Any thoughts on what happened here?

Thanks-

-Nick

PS: The before & after explains are pasted in below:

Before:

monroe=# explain select * from actor_cases where actor_id = '18105A7313 53';
NOTICE:  QUERY PLAN:

Merge Join  (cost=27748.94..27807.92 rows=145 width=192)
  ->  Sort  (cost=27713.16..27713.16 rows=3410 width=144)
        ->  Nested Loop  (cost=0.00..27372.75 rows=3410 width=144)
              ->  Index Scan using actor_case_assignment_both on
actor_case_assignment  (cost=0.00..11766.67 rows=3410 width=24)
              ->  Index Scan using case_data_case_id on case_data
(cost=0.00..4.56 rows=1 width=120)
  ->  Sort  (cost=35.78..35.78 rows=522 width=48)
        ->  Seq Scan on local_case_type  (cost=0.00..12.22 rows=522
width=48)



After:

develop=#  explain select * from actor_cases where actor_id = '18105A7313
53';
NOTICE:  QUERY PLAN:

Hash Join  (cost=27801.99..53031.15 rows=306 width=192)
  ->  Hash Join  (cost=27788.47..51957.43 rows=11377 width=144)
        ->  Seq Scan on case_data  (cost=0.00..6932.35 rows=226535
width=120)
        ->  Hash  (cost=27693.03..27693.03 rows=11377 width=24)
              ->  Seq Scan on actor_case_assignment  (cost=0.00..27693.03
rows=11377 width=24)
  ->  Hash  (cost=12.22..12.22 rows=522 width=48)
        ->  Seq Scan on local_case_type  (cost=0.00..12.22 rows=522
width=48)




--------------------------------------------------------------------------
Nick Fankhauser  nickf@ontko.com  Phone 1.765.935.4283  Fax 1.765.962.9788
Ray Ontko & Co.     Software Consulting Services     http://www.ontko.com/


pgsql-admin by date:

Previous
From: Ferdinand Smit
Date:
Subject: Re: Connection problem
Next
From: Tom Lane
Date:
Subject: Re: An Analyze question