Thread: Loose Index Scans by Planner?

Loose Index Scans by Planner?

From
Shaun Thomas
Date:
Maybe I should post this in Hackers instead, but I figured I'd start
here to avoid cluttering up that list.

So, we know we have a way of doing a loose index scan with CTEs:

http://wiki.postgresql.org/wiki/Loose_indexscan

But that got me wondering. The planner knows from pg_stats that col1
could have low cardinality. If that's the case, and a WHERE clause uses
a two column index, and col2 is specified, why can't it walk each
individual bucket in the two-column index, and use col2? So I forced
such a beast with a CTE:

WITH RECURSIVE t AS (
   SELECT min(col1) AS col1
     FROM tablename
   UNION ALL
   SELECT (SELECT min(col1)
             FROM tablename
            WHERE col1 > t.col1)
     FROM t
    WHERE t.col1 IS NOT NULL
)
SELECT p.*
   FROM t
   JOIN tablename p USING (col1)
  where p.col2 = 12345

I ask, because while the long-term fix would be to re-order the index to
(col2, col1), this seems like a situation the planner could easily
detect and compensate for. In our particular example, execution time
went from 160ms to 2ms with the CTE rewrite. This is a contrived
example, but it seems like loose index scans would be useful in other
ways. Heck, this:

SELECT DISTINCT col1
   FROM tablename;

Has terrible performance because it always seems to revert to a sequence
scan, but it's something people do *all the time*. I can't reasonably
expect all of my devs to switch to that admittedly gross CTE to get a
faster effect, so I'm just thinking out loud.

Until PG puts in something to fix this, I plan on writing a stored
procedure that writes a dynamic CTE and returns a corresponding result
set. It's not ideal, but it would solve our particular itch. Really,
this should be possible with any indexed column, so I might abstract it.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Loose Index Scans by Planner?

From
"Kevin Grittner"
Date:
Shaun Thomas <sthomas@optionshouse.com> wrote:

> So, we know we have a way of doing a loose index scan with CTEs:
>
> http://wiki.postgresql.org/wiki/Loose_indexscan

I tried this on a table in production with 23 million rows for a
column with 45 distinct values which is the high-order column of a
four-column index.  This ran in 445 ms first time and 2 ms on the
second and subsequent tries.  The equivalent SELECT DISTINCT ran in
30 seconds first time, and got down to 11.5 seconds after a few
runs.  So roughly two orders of magnitude faster with a cold cache
and three orders of magnitude faster with a warm cache.

That sure would be a nice optimization to have in the planner.

> But that got me wondering. The planner knows from pg_stats that
> col1 could have low cardinality. If that's the case, and a WHERE
> clause uses a two column index, and col2 is specified, why can't
> it walk each individual bucket in the two-column index, and use
> col2? So I forced such a beast with a CTE:
>
> WITH RECURSIVE t AS (
>    SELECT min(col1) AS col1
>      FROM tablename
>    UNION ALL
>    SELECT (SELECT min(col1)
>              FROM tablename
>             WHERE col1 > t.col1)
>      FROM t
>     WHERE t.col1 IS NOT NULL
> )
> SELECT p.*
>    FROM t
>    JOIN tablename p USING (col1)
>   where p.col2 = 12345
>
> I ask, because while the long-term fix would be to re-order the
> index to (col2, col1), this seems like a situation the planner
> could easily detect and compensate for. In our particular example,
> execution time went from 160ms to 2ms with the CTE rewrite.

Well, that'd be the icing on the cake.  I'd be overjoyed to get the
cake.  :-)

-Kevin


Re: Loose Index Scans by Planner?

From
Shaun Thomas
Date:
On 08/24/2012 02:40 PM, Kevin Grittner wrote:

> Well, that'd be the icing on the cake.  I'd be overjoyed to get the
> cake.  :-)

Yes indeed. The "cake" would fix the DISTINCT case, which I see way more
often in the wild than my index column-skip.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Loose Index Scans by Planner?

From
Jeff Janes
Date:
On Fri, Aug 24, 2012 at 9:20 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> Maybe I should post this in Hackers instead, but I figured I'd start here to
> avoid cluttering up that list.
>
> So, we know we have a way of doing a loose index scan with CTEs:
>
> http://wiki.postgresql.org/wiki/Loose_indexscan
>
> But that got me wondering. The planner knows from pg_stats that col1 could
> have low cardinality. If that's the case, and a WHERE clause uses a two
> column index, and col2 is specified, why can't it walk each individual
> bucket in the two-column index, and use col2? So I forced such a beast with
> a CTE:
>
> WITH RECURSIVE t AS (
>   SELECT min(col1) AS col1
>     FROM tablename
>   UNION ALL
>   SELECT (SELECT min(col1)
>             FROM tablename
>            WHERE col1 > t.col1)
>     FROM t
>    WHERE t.col1 IS NOT NULL
> )
> SELECT p.*
>   FROM t
>   JOIN tablename p USING (col1)
>  where p.col2 = 12345

That is awesome.  I had never though of trying to do it that way.

> I ask, because while the long-term fix would be to re-order the index to
> (col2, col1),

Not always.  The case for having (col1,col2) might be very compelling.
 And having to maintain both orderings when just maintaining one would
be "good enough" would kind of suck.  Having the planner do the best
it can given the index it has is a good thing.

I would also note that having this feature (called "skip scan" in some
other products) would mimic what happens when you need to do a query
specifying col2 but not col1 on a table family which is list
partitioned on col1.  Getting some of the benefits of partitioning
without having to actually do the partitioning would be a good thing.

> this seems like a situation the planner could easily detect
> and compensate for.

Yes, it is just a Small Matter Of Programming :)

And one I've wanted for a while.

If only someone else would offer to do it for me....

Cheers,

Jeff


Re: Loose Index Scans by Planner?

From
Shaun Thomas
Date:
> Not always.  The case for having (col1,col2) might be very
> compelling.

True. But in our case, the table has like 8M rows, so and col1 is some kind of job identifier, so it's evenly
distributed.Col2 on the other hand is a customer id, so it has much higher cardinality. Previous DBA missed it during
creation,and it was never loud enough in the logs for me to notice it until recently. Looks like I need to do a
column-orderingaudit. :) 

> If only someone else would offer to do it for me....

Don't look at me. My C is rustier than a 50-year-old bike chain. :)



______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email