Thread: index usage

index usage

From
brad-pgperf@duttonbros.com
Date:
Hi,

I have a query which I think should be using an index all of the time but
postgres only uses the index part of the time.  The index
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
first followed by the selected column (support_person_id).  Wouldn't the
most efficient plan be to scan the index each time because the only columns
needed are in the index?  Below is the table, 2 queries showing the
difference in plans, followed by the record distribution of ticket_crm_map.
I first did a 'vacuum analyze' to update the statistics.

Thanks,
Brad


athenapost=> \d ticket_crm_map
                                  Table "public.ticket_crm_map"
        Column         |            Type             |
Modifiers
 ------------------------+-----------------------------+---------------------
 -----------------------
tcrm_map_id            | integer                     | not null
ticket_id              | integer                     | not null
crm_id                 | integer                     | not null
support_person_id      | integer                     | not null
escalated_to_person_id | integer                     | not null
status                 | character varying(50)       | not null default
'Open'::character varying
close_date             | timestamp without time zone |
updated_date           | timestamp without time zone |
updated_by             | character varying(255)      |
created_date           | timestamp without time zone |
created_by             | character varying(255)      |
additional_info        | text                        |
subject                | character varying(255)      |
Indexes:
   "ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
   "ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
   "ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
   "ticket_crm_map_status" btree (status)
   "ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
   "$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
   "$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
   "$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
   "$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
   "$5" FOREIGN KEY (status) REFERENCES ticket_status(status)

athenapost=> explain analyze select distinct support_person_id from
ticket_crm_map where crm_id = 7;

QUERY PLAN
 ----------------------------------------------------------------------------
 ----------------------------------------------------------------------------
 ----------
Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245
rows=20 loops=1)
  ->  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual
time=15.332..16.605 rows=2275 loops=1)
        Sort Key: support_person_id
        ->  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map
(cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275
loops=1)
              Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows)

Time: 20.598 ms
athenapost=> explain analyze select distinct support_person_id from
ticket_crm_map where crm_id = 1;
                                                          QUERY PLAN
 ----------------------------------------------------------------------------
 -----------------------------------------------------
Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual
time=659.102..791.517 rows=24 loops=1)
  ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual
time=659.090..713.285 rows=93889 loops=1)
        Sort Key: support_person_id
        ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628
width=4) (actual time=0.027..359.299 rows=93889 loops=1)
              Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows)

Time: 817.095 ms
athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
 -------+--------
 2554 |     63
  129 |     25
   17 |     24
  110 |     23
   74 |     22
   69 |     21
    2 |     20
   53 |     82
   10 |     17
   16 |     81
46637 |     16
   14 |     80
    2 |     15
 1062 |     79
   87 |     78
   93 |     77
   60 |     44
  363 |     76
  225 |     10
    4 |     74
   83 |      9
   27 |     73
  182 |      8
 2275 |      7
   15 |     71
  554 |      6
   44 |     70
  631 |      5
   37 |      4
  190 |      3
  112 |      2
93889 |      1
(32 rows)

Time: 436.697 ms

Re: index usage

From
brad-pgperf@duttonbros.com
Date:
When checking an index in postgres the original table has to be checked for
each result to find if the index entry is still valid?  In which case you
can't blindly scan the whole index and assume the data is good. I was used
to Oracle behavior where the index is up to date so it can do the scan
without hitting the original table.

Does this sound correct to anyone?

Thanks,
Brad


Stephan Szabo writes:
> On Fri, 23 Apr 2004 brad-pgperf@duttonbros.com wrote:
>
>> I have a query which I think should be using an index all of the time but
>> postgres only uses the index part of the time.  The index
>> (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
>> first followed by the selected column (support_person_id).  Wouldn't the
>> most efficient plan be to scan the index each time because the only columns
>> needed are in the index?  Below is the table, 2 queries showing the
>
> Not necessarily.  The rows in the actual file still need to be checked to
> see if they're visible to the select and if it's expected that the entire
> file (or a reasonable % of the pages anyway) will need to be loaded using
> the index isn't necessarily a win.
>



Re: index usage

From
Stephan Szabo
Date:
On Fri, 23 Apr 2004 brad-pgperf@duttonbros.com wrote:

> I have a query which I think should be using an index all of the time but
> postgres only uses the index part of the time.  The index
> (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
> first followed by the selected column (support_person_id).  Wouldn't the
> most efficient plan be to scan the index each time because the only columns
> needed are in the index?  Below is the table, 2 queries showing the

Not necessarily.  The rows in the actual file still need to be checked to
see if they're visible to the select and if it's expected that the entire
file (or a reasonable % of the pages anyway) will need to be loaded using
the index isn't necessarily a win.

> athenapost=> explain analyze select distinct support_person_id from
> ticket_crm_map where crm_id = 1;
>                                                           QUERY PLAN
>  ----------------------------------------------------------------------------
>  -----------------------------------------------------
> Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual
> time=659.102..791.517 rows=24 loops=1)
>   ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual
> time=659.090..713.285 rows=93889 loops=1)
>         Sort Key: support_person_id
>         ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628
> width=4) (actual time=0.027..359.299 rows=93889 loops=1)
>               Filter: (crm_id = 1)
> Total runtime: 814.601 ms

How far off is this from the index scan version in time?  Try doing
set enable_seqscan=off; and then explain analyzing again.
It's possible that you may wish to lower random_page_cost to change the
estimated effect of how much more expensive random reads are compared to
sequential ones.

Re: index usage

From
"scott.marlowe"
Date:
On Mon, 26 Apr 2004, Stephan Szabo wrote:

>
> On Fri, 23 Apr 2004 brad-pgperf@duttonbros.com wrote:
>
> > I have a query which I think should be using an index all of the time but
> > postgres only uses the index part of the time.  The index
> > (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
> > first followed by the selected column (support_person_id).  Wouldn't the
> > most efficient plan be to scan the index each time because the only columns
> > needed are in the index?  Below is the table, 2 queries showing the
>
> Not necessarily.  The rows in the actual file still need to be checked to
> see if they're visible to the select and if it's expected that the entire
> file (or a reasonable % of the pages anyway) will need to be loaded using
> the index isn't necessarily a win.

While those of us familiar with PostgreSQL are well aware of the fact that
indexes can't be used directly to garner information, but only as a lookup
to a tuple in the table, it seems this misconception is quite common among
those coming to postgreSQL from other databases.

Is there any information that directly reflects this issue in the docs?
There are tons of hints that it works this way in how they're written, but
nothing that just comes out and says that with pgsql's mvcc
implementation, an index scan still has to hit the pages that contain the
tuples, so often in pgsql a seq scan is a win where in other databases and
index scan would have been a win?

If not, where would I add it if I were going to write something up for the
docs?  Just wondering...


Re: index usage

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> There are tons of hints that it works this way in how they're written, but
> nothing that just comes out and says that with pgsql's mvcc
> implementation, an index scan still has to hit the pages that contain the
> tuples, so often in pgsql a seq scan is a win where in other databases and
> index scan would have been a win?

> If not, where would I add it if I were going to write something up for the
> docs?  Just wondering...

AFAIR the only place in the docs that mentions seqscan or indexscan at
all is the discussion of EXPLAIN in "Performance Tips".  Perhaps a
suitably-enlarged version of that section could cover this.

            regards, tom lane