: PostgreSQL Index behavior - Mailing list pgsql-performance

From	Venkat Balaji
Subject	: PostgreSQL Index behavior
Date	September 10, 2012 12:39:50
Msg-id	CAFrxt0go0hb7bJqPjxicZc6G6gJeRHmYM7rJowE1vzZd7t3BkQ@mail.gmail.com
Responses	Re: : PostgreSQL Index behavior
List	pgsql-performance

Hello Community,

I would like to better understand PostgreSQL index behavior in a SELECT statement.

We have a situation in which the index on a unique column is not being used as expected when that column appears in the WHERE clause together with other, non-unique columns combined with the AND operator.

explain SELECT tv.short_code, tv.chn as pkg_subscription_chn,
       tv.vert as pkg_vert, ubs.campaign_id as campaign, 'none'::varchar as referer,
       CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn, count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
FROM campaign_base ubs
JOIN tab_current_day_v2 tv
ON ubs.ubs_seq_id = tv.ubs_seq_id
AND tv.dt = CAST('2012-09-08' AS DATE)
GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=77754.57..77754.58 rows=1 width=38)
   ->  Nested Loop  (cost=0.00..77754.56 rows=1 width=38)
         ->  Seq Scan on tab_current_day_v2 tv  (cost=0.00..77746.26 rows=1 width=39)
               Filter: (dt = '2012-09-08'::date)
         ->  Index Scan using cb_ubs_id_idx on campaign_base ubs  (cost=0.00..8.28 rows=1 width=15)
               Index Cond: (ubs.ubs_seq_id = tv.ubs_seq_id)
(6 rows)

The above plan shows a sequential scan on the tab_current_day_v2 table, even though there is an index on the "ubs_seq_id" column, which holds unique values.

Can anyone please help us understand why the PostgreSQL optimizer is not prioritizing the unique column and using the ubs_seq_id_idx index here?
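One way to narrow this down would be to compare how the planner treats ubs_seq_id when it is compared with a constant and when it appears only in the join condition, as it does in the query above. The statements below are a minimal sketch: the value 12345 is a hypothetical placeholder, and the shortened SELECT lists are only for illustration. Note that EXPLAIN ANALYZE actually executes the query.

```sql
-- Refresh planner statistics first so that row estimates reflect current data.
ANALYZE tab_current_day_v2;

-- (1) ubs_seq_id compared with a constant (12345 is a hypothetical value).
--     Here the planner has a direct restriction on the indexed column.
EXPLAIN
SELECT tv.short_code
FROM tab_current_day_v2 tv
WHERE tv.ubs_seq_id = 12345
  AND tv.dt = CAST('2012-09-08' AS DATE);

-- (2) ubs_seq_id used only as a join key, as in the original query.
--     EXPLAIN ANALYZE prints estimated and actual row counts side by side,
--     which shows whether the rows=1 estimate for the dt filter is realistic.
EXPLAIN ANALYZE
SELECT count(*)
FROM campaign_base ubs
JOIN tab_current_day_v2 tv
  ON ubs.ubs_seq_id = tv.ubs_seq_id
 AND tv.dt = CAST('2012-09-08' AS DATE);
```

If the estimated and actual row counts for the scan on tab_current_day_v2 differ widely in (2), the statistics rather than the index definitions may be what drives the plan choice.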

Later, we created a composite index on "dt" (one distinct value) and "ubs_seq_id" (no duplicate values), and that index was picked up.

Below is the scenario in which the plan for the same query uses the composite index.

prod-db=# create index concurrently tab_dt_ubs_seq_id_idx on tab_current_day_v2(dt,ubs_seq_id);
CREATE INDEX

prod-db=# explain SELECT tv.short_code, tv.chn as pkg_subscription_chn,
       tv.vert as pkg_vert, ubs.campaign_id as campaign, 'none'::varchar as referer,
       CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn, count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
FROM campaign_base ubs
JOIN tab_current_day_v2 tv
ON ubs.ubs_seq_id = tv.ubs_seq_id
AND tv.dt = CAST('2012-09-08' AS DATE)
GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=16.88..16.89 rows=1 width=38)
   ->  Nested Loop  (cost=0.00..16.86 rows=1 width=38)
         ->  Index Scan using tab_dt_ubs_seq_id_idx on tab_current_day_v2 tv  (cost=0.00..8.57 rows=1 width=39)
               Index Cond: (dt = '2012-09-08'::date)
         ->  Index Scan using cb_ubs_id_idx on campaign_base ubs  (cost=0.00..8.28 rows=1 width=15)
               Index Cond: (ubs.ubs_seq_id = tv.ubs_seq_id)
(6 rows)

I was expecting the above behavior without a composite index. My assumption was that the index on the column with the most distinct values would be chosen when multiple columns are combined with the AND operator in the WHERE clause. Any thoughts?

prod-db# \d tab_current_day_v2
            Table "public.tab_current_day_v2"
      Column       |           Type           | Modifiers
-------------------+--------------------------+-----------
 dt                | date                     |
 chn               | character varying(10)    |
 vert              | character varying(20)    |
 isdn              | character varying        |
 bc                | character varying(40)    |
 status            | text                     |
 is_rene           | boolean                  |
 age_in_sys        | integer                  |
 age_in_grace      | integer                  |
 has_prof          | boolean                  |
 short_code        | character varying        |
 sub_vert          | character varying(30)    |
 mode              | character varying        |
 ubs_seq_id        | bigint                   |
 pkg_name          | character varying(200)   |
 pkg_id            | integer                  |
 subs_charge       | money                    |
 subs_time         | timestamp with time zone |
 ulq_seq_id        | bigint                   |
 valid_till_time   | timestamp with time zone |
 valid_from_time   | timestamp with time zone |
 latest_ube_seq_id | bigint                   |
 latest_pkg_id     | integer                  |
 price             | integer                  |
Indexes:
    "tab_dt_ubs_seq_id_idx" btree (dt, ubs_seq_id)
    "tab_isdn_idx" btree (msisdn)
    "tab_status_idx" btree (status)
    "ubs_seq_id_idx" btree (ubs_seq_id)

The table structure is shown above; the uniqueness of each column, as reported by pg_stats, is below.

airtel_user_data_oltp=# select attname, n_distinct from pg_stats where tablename='tab_current_day_v2';
      attname      | n_distinct
-------------------+------------
 dt                |          1
 chn               |          7
 vert              |         94
 isdn              |  -0.727331
 bc                |          4
 status            |          3
 is_rene           |          2
 age_in_sys        |       1018
 age_in_grace      |        369
 has_prof          |          2
 short_code        |         23
 sub_vert          |          5
 mode              |          0
 ubs_seq_id        |         -1
 pkg_name          |        461
 pkg_id            |        461
 subs_charge       |          7
 subs_time         |         -1
 ulq_seq_id        |     122887
 valid_till_time   |  -0.966585
 valid_from_time   |  -0.962563
 latest_ube_seq_id |         -1
 latest_pkg_id     |        475
 price             |         18
(24 rows)
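A note on reading n_distinct: per the pg_stats documentation, a positive value is an estimated count of distinct values, while a negative value is the negative of the number of distinct values divided by the number of rows, so -1 means every row is estimated to hold a different value. The query below is a sketch that converts both forms into an estimated count. It assumes pg_class.reltuples is reasonably current (it is itself only an estimate maintained by VACUUM and ANALYZE), and the output column name est_distinct_values is made up for illustration.

```sql
-- Convert n_distinct into an estimated number of distinct values per column.
-- Negative n_distinct is a fraction of the table's row count (reltuples).
SELECT s.attname,
       s.n_distinct,
       CASE WHEN s.n_distinct < 0
            THEN round((-s.n_distinct * c.reltuples)::numeric)
            ELSE s.n_distinct::numeric
       END AS est_distinct_values   -- hypothetical column alias
FROM pg_stats s
JOIN pg_class c
  ON c.relname = s.tablename
JOIN pg_namespace n
  ON n.oid = c.relnamespace
 AND n.nspname = s.schemaname
WHERE s.schemaname = 'public'
  AND s.tablename = 'tab_current_day_v2'
ORDER BY s.attname;
```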

This is not a problem for us as such, but we would like to understand how the PostgreSQL optimizer chooses indexes in SELECT queries.

In another scenario, we used 4 columns in the WHERE clause with the AND operator, with an index on the column with the most unique values, and that index was picked up.

Looking forward to your help!

Regards,

