Thread: Way to avoid expensive Recheck Cond in index lookup?

Way to avoid expensive Recheck Cond in index lookup?

From
"Matt Magoffin"
Date:
Hello, I'm trying to find a way to use a text[] index lookup using an
xpath() function in 8.3, but I suspect this situation is not specific to
8.3 or this exact query style. The query plan looks like

 Bitmap Heap Scan on lead  (cost=37.39..7365.22 rows=2206 width=8)
   Recheck Cond:
((xpath('/als:auto-lead-service/als:meta[@key="com.autoleadservice.TypeFlag"]/text()'::text,
xml, '{{als,http://autoleadservice.com/xml/als}}'::text[]))::text[] &&
'{foo,bar}'::text[])
   ->  Bitmap Index Scan on lead_type_flag_gin_idx  (cost=0.00..36.83
rows=2206 width=0)
         Index Cond:
((xpath('/als:auto-lead-service/als:meta[@key="com.autoleadservice.TypeFlag"]/text()'::text,
xml,
'{{als,http://autoleadservice.com/xml/als}}'::text[]))::text[] &&
'{foo,bar}'::text[])

The problem for me is, the Recheck Cond is then on the xpath() function
used by the function-based index. My understanding is that then the
database must actually call the xpath() function again on all matches from
the index lookup. Are there ways to re-write the query so the recheck
condition is not necessary? Or a way to define the index differently so
that I might be able to still compare text[] values from the index without
needing the recheck?

-- m@

Re: Way to avoid expensive Recheck Cond in index lookup?

From
Tom Lane
Date:
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> The problem for me is, the Recheck Cond is then on the xpath() function
> used by the function-based index. My understanding is that then the
> database must actually call the xpath() function again on all matches from
> the index lookup.

This is mistaken.  It only happens if there are so many hits that the
bitmap becomes lossy (which you can control to some extent anyway by
adjusting work_mem).

            regards, tom lane

Re: Way to avoid expensive Recheck Cond in index lookup?

From
"Matt Magoffin"
Date:
>> The problem for me is, the Recheck Cond is then on the xpath() function
>> used by the function-based index. My understanding is that then the
>> database must actually call the xpath() function again on all matches
>> from
>> the index lookup.
>
> This is mistaken.  It only happens if there are so many hits that the
> bitmap becomes lossy (which you can control to some extent anyway by
> adjusting work_mem).

Ah, great. Thanks for clarifying.

-- m@

Re: Way to avoid expensive Recheck Cond in index lookup?

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Matt Magoffin" <postgresql.org@msqr.us> writes:
>> The problem for me is, the Recheck Cond is then on the xpath() function
>> used by the function-based index. My understanding is that then the
>> database must actually call the xpath() function again on all matches from
>> the index lookup.
>
> This is mistaken.  It only happens if there are so many hits that the
> bitmap becomes lossy (which you can control to some extent anyway by
> adjusting work_mem).

But it's true that it's possible for a slow expression to make the recheck
very expensive. The planner doesn't have a very good understanding of how to
tell whether the expression is likely to be slow.

The case I ran into is thing like "WHERE x = ANY $1::integer[]" which become
very slow for very large arrays. So I'm sure xpath() could possibly trigger
the same case.

But the number of matching pages would have to be quite large. And in that
case the alternative (regular index scans) is going to suck too.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: Way to avoid expensive Recheck Cond in index lookup?

From
"Matt Magoffin"
Date:
> But it's true that it's possible for a slow expression to make the recheck
> very expensive. The planner doesn't have a very good understanding of how
> to
> tell whether the expression is likely to be slow.
>
> The case I ran into is thing like "WHERE x = ANY $1::integer[]" which
> become
> very slow for very large arrays. So I'm sure xpath() could possibly
> trigger
> the same case.
>
> But the number of matching pages would have to be quite large. And in that
> case the alternative (regular index scans) is going to suck too.

So the actual index function expression is _only_ evaluated in the
re-check for some (or all?) matches, if there are more matching pages than
can fit into the memory allocated by work_mem?

I also seemed to notice that after running a query that did return a large
number of results where the query plan did use the text[] index, running
the same query, or a similar one, would stop using the index lookup and
just do a full table scan. Would that be the optimizer changing plans
because of the statistics it gathered when it ran the query initially with
the index lookup but found the re-check condition took such a long time to
execute?

What I was trying to accomplish was to define a text[] index created from
the results of an xpath() expression, for the purposes of being able to do
fast index lookups using the && operator. But I'm finding that even when
the index is used, the query is very slow and I was assuming it was coming
from the re-check condition, which is defined as that xpath() call. So I'm
finding that this approach isn't working out as I had hoped.

-- m@

Re: Way to avoid expensive Recheck Cond in index lookup?

From
Tom Lane
Date:
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> I also seemed to notice that after running a query that did return a large
> number of results where the query plan did use the text[] index, running
> the same query, or a similar one, would stop using the index lookup and
> just do a full table scan. Would that be the optimizer changing plans
> because of the statistics it gathered when it ran the query initially with
> the index lookup but found the re-check condition took such a long time to
> execute?

No, there is not any feed-forward from previous query executions to new
plans.  (There's been discussion of that, but nothing done as yet.
Personally I'm worried that it's hard enough to understand what's
happening without any such effect.)

If you saw the plan changing for apparently no reason, maybe you have
autovacuum enabled?  A background autovac could update the stored table
statistics and thereby cause a plan change.

> What I was trying to accomplish was to define a text[] index created from
> the results of an xpath() expression, for the purposes of being able to do
> fast index lookups using the && operator. But I'm finding that even when
> the index is used, the query is very slow and I was assuming it was coming
> from the re-check condition, which is defined as that xpath() call. So I'm
> finding that this approach isn't working out as I had hoped.

I'm not sure that anyone's done any performance analysis on xpath as
yet.  Do you want to try oprofile or gprof or some other tool to see
where the time is going?

            regards, tom lane