Thread: Index Analysis: Filters

Index Analysis: Filters

From
Thomas F.O'Connell
Date:
I'm interested to know a little bit more about the postgres
implementation of indexes. I'm specifically wondering what it means in
the output of EXPLAIN when a filter is applied.

I'm trying to decide whether it makes sense to use indexes on
expressions rather than relying on a left-anchored LIKE for date
filtering.

Here's what I've got:

WHERE some_date LIKE '<year>-<month>%' *

And what I'm wondering is whether it would be faster to add indexes on
expressions for something like:

WHERE EXTRACT( year from some_date ) = '<year>'
AND EXTRACT( month from some_date ) = '<month>'

In practice, the point in the implementation has other parameters, so
it ends up looking something like:

SELECT COUNT( * )
FROM some_table AS st, some_other_table AS sot
WHERE st.id = '<id>'
AND st.some_date LIKE '<year>-<month>%'
AND st.other_id = sot.other_id

Here's what I get from an EXPLAIN:

                                                                QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------
  Aggregate  (cost=4749.19..4749.19 rows=1 width=0)
    ->  Nested Loop  (cost=0.00..4749.17 rows=9 width=0)
          ->  Index Scan using st_id_idx on emma_mailings em
(cost=0.00..147.56 rows=1 width=8)
                Index Cond: (emma_account_id = 1::bigint)
                Filter: ((some_date)::text ~~ '2004-06%'::text)
          ->  Index Scan using sot_other_id_idx on some_other_table sot
(cost=0.00..3164.42 rows=114975 width=8)
                Index Cond: ("outer".other_id = sot.other_id)


Then I try:

SELECT COUNT( * )
FROM some_table AS st, some_other_table AS sot
WHERE st.id = '<id>'
AND EXTRACT( year FROM st.some_date ) = '<year>'
AND EXTRACT( month FROM st.some_date  ) = '<month>'
AND st.other_id = sot.other_id

When I first added indexes on the EXTRACT expressions on
some_table.some_date, I basically (the numbers are fudged because now
the new indexes are working) saw:

                                                                QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------
  Aggregate  (cost=4749.19..4749.19 rows=1 width=0)
    ->  Nested Loop  (cost=0.00..4749.17 rows=9 width=0)
          ->  Index Scan using st_id_idx on emma_mailings em
(cost=0.00..147.56 rows=1 width=8)
                Index Cond: (emma_account_id = 1::bigint)
                Filter: ((date_part('year'::text, emma_mailing_start_ts)
= 2004::double precision) AND (date_part('year'::text,
emma_mailing_start_ts) = 6::double precision))
          ->  Index Scan using sot_other_id_idx on some_other_table sot
(cost=0.00..3164.42 rows=114975 width=8)
                Index Cond: ("outer".other_id = sot.other_id)

Now, though, it seems to be using the expression indexes. I guess my
main question is what it means to apply a Filter to an Index Scan, and
whether the LIKE filter would be slower than the EXTRACT filter. In the
queries I've tested, EXPLAIN ANALYZE resulted in almost identical
runtimes until the indexes on expressions kicked in. Once that
happened, the new indexes were much faster.

A related question might be what might've happened between the times I
tested when the indexes weren't working and when they were. I've been
testing on a low-traffic development server, and I ANALYZED after
adding the indexes. It was in a new session, though, that the indexes
on EXTRACT actually kicked in.

-tfo

* <> = generic pseudocode placeholder for variable/constant data


Re: Index Analysis: Filters

From
Tom Lane
Date:
"Thomas F.O'Connell" <tfo@sitening.com> writes:
> I'm interested to know a little bit more about the postgres
> implementation of indexes. I'm specifically wondering what it means in
> the output of EXPLAIN when a filter is applied.

The index itself is using the condition(s) indicated as "Index Cond" ---
that is, the index scan will pull all rows satisfying "Index Cond" from
the table.  The "Filter" condition, if any, is then evaluated at each
such row to decide whether to return it up to the next plan level.
Basically the filter is whatever conditions apply to the table but can't
be implemented directly with the chosen index.

> Here's what I've got:
> WHERE some_date LIKE '<year>-<month>%' *

Seems like you'd be better off turning this into a range query.  A
textual LIKE is just about the most inefficient way of testing a date
range that I can think of.  How about

WHERE some_date >= 'year-month-01'::date AND some_date <
('year-month-01'::date + '1 month'::interval)::date

(adjust as appropriate if it's really a timestamp column).

            regards, tom lane

Re: Index Analysis: Filters

From
Thomas F.O'Connell
Date:
Yeah, I suppose this would be faster than the EXTRACT technique, too,
eh? Because it requires only a single index and is not an index on an
expression, which are generally more expensive?

-tfo

On Sep 29, 2004, at 6:59 PM, Tom Lane wrote:

> "Thomas F.O'Connell" <tfo@sitening.com> writes:
>> Here's what I've got:
>> WHERE some_date LIKE '<year>-<month>%' *
>
> Seems like you'd be better off turning this into a range query.  A
> textual LIKE is just about the most inefficient way of testing a date
> range that I can think of.  How about
>
> WHERE some_date >= 'year-month-01'::date AND some_date <
> ('year-month-01'::date + '1 month'::interval)::date
>
> (adjust as appropriate if it's really a timestamp column).
>
>             regards, tom lane