Thread: cluster on conditional index?

cluster on conditional index?

From

Doug Hunley

Date:

14 August 2012, 15:27:22

According to the docs on cluster:
if you tend to access some data more than others, and there is an
index that groups them together, you will benefit from using CLUSTER

however, this doesn't address the situation where you have a
conditional index. For example, we have certain large tables that have
a column called 'is_deleted'. It's a boolean, to indicate whether the
record is 'deleted' as far as the app is concerned. Since the app only
ever shows data where is_deleted is false, I created an index:
create index foo on bar where is_deleted is false;
and now I'm wondering if clustering on this index will bring the
benefits noted above or if I should rebuild my index w/o the where
clause to obtain the best 'improvement' from cluster.

Anyone know?

--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web:
douglasjhunley.com
G+: http://goo.gl/sajR3

Re: cluster on conditional index?

From

Jeff Janes

Date:

14 August 2012, 17:10:55

On Tue, Aug 14, 2012 at 8:27 AM, Doug Hunley <doug.hunley@gmail.com> wrote:
> According to the docs on cluster:
> if you tend to access some data more than others, and there is an
> index that groups them together, you will benefit from using CLUSTER
>
> however, this doesn't address the situation where you have a
> conditional index.

It seems like it is not allowed.

jjanes=# create index on pgbench_accounts (aid) where bid=33;
jjanes=# cluster pgbench_accounts USING pgbench_accounts_aid_idx ;
ERROR:  cannot cluster on partial index "pgbench_accounts_aid_idx"

But I don't see a fundamental reason it can't be allowed, maybe
implementing that should be on the to-do list.

Cheers,

Jeff

Re: cluster on conditional index?

From

"ktm@rice.edu"

Date:

14 August 2012, 17:29:17

On Tue, Aug 14, 2012 at 10:10:47AM -0700, Jeff Janes wrote:
> On Tue, Aug 14, 2012 at 8:27 AM, Doug Hunley <doug.hunley@gmail.com> wrote:
> > According to the docs on cluster:
> > if you tend to access some data more than others, and there is an
> > index that groups them together, you will benefit from using CLUSTER
> >
> > however, this doesn't address the situation where you have a
> > conditional index.
>
> It seems like it is not allowed.
>
> jjanes=# create index on pgbench_accounts (aid) where bid=33;
> jjanes=# cluster pgbench_accounts USING pgbench_accounts_aid_idx ;
> ERROR:  cannot cluster on partial index "pgbench_accounts_aid_idx"
>
> But I don't see a fundamental reason it can't be allowed, maybe
> implementing that should be on the to-do list.
>
> Cheers,
>
> Jeff
>

It probably has to do with the fact that a conditional index, does
not include every possible row in the table. Although, a "cluster" of
the matching rows and then leave the rest in place, should work. How
is that for hand-waving.  :)

Regards,
Ken

Re: cluster on conditional index?

From

Doug Hunley

Date:

15 August 2012, 13:43:10

On Tue, Aug 14, 2012 at 1:29 PM, ktm@rice.edu <ktm@rice.edu> wrote:
>
> It probably has to do with the fact that a conditional index, does
> not include every possible row in the table. Although, a "cluster" of
> the matching rows and then leave the rest in place, should work. How
> is that for hand-waving.  :)
>

That actually makes sense to me. Cluster the rows covered by that
index, let the rest fall where they may. I'm typically only accessing
the rows covered by that index, so I'd get the benefit of the cluster
command but wouldn't have to spend cycles doing the cluster for rows I
don't care about.

--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web:
douglasjhunley.com
G+: http://goo.gl/sajR3

Re: cluster on conditional index?

From

Josh Berkus

Date:

15 August 2012, 21:05:12

> That actually makes sense to me. Cluster the rows covered by that
> index, let the rest fall where they may. I'm typically only accessing
> the rows covered by that index, so I'd get the benefit of the cluster
> command but wouldn't have to spend cycles doing the cluster for rows I
> don't care about.

Sure, that's a feature request though.  And thinking about it, I'm
willing to bet that it's far harder to implement than it sounds.

In the meantime, you could ad-hoc this by splitting the table into two
partitions and clustering one of the two partitions.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: cluster on conditional index?

From

Bosco Rama

Date:

15 August 2012, 21:19:14

On 08/15/12 14:05, Josh Berkus wrote:
>
>> That actually makes sense to me. Cluster the rows covered by that
>> index, let the rest fall where they may. I'm typically only accessing
>> the rows covered by that index, so I'd get the benefit of the cluster
>> command but wouldn't have to spend cycles doing the cluster for rows I
>> don't care about.
>
> Sure, that's a feature request though.  And thinking about it, I'm
> willing to bet that it's far harder to implement than it sounds.
>
> In the meantime, you could ad-hoc this by splitting the table into two
> partitions and clustering one of the two partitions.

Wouldn't creating a second index on the boolean itself and then clustering
on that be much easier?

Bosco.

Re: cluster on conditional index?

From

Doug Hunley

Date:

16 August 2012, 15:25:55

On Wed, Aug 15, 2012 at 5:19 PM, Bosco Rama <postgres@boscorama.com> wrote:
> On 08/15/12 14:05, Josh Berkus wrote:
>>
>>> That actually makes sense to me. Cluster the rows covered by that
>>> index, let the rest fall where they may. I'm typically only accessing
>>> the rows covered by that index, so I'd get the benefit of the cluster
>>> command but wouldn't have to spend cycles doing the cluster for rows I
>>> don't care about.
>>
>> Sure, that's a feature request though.  And thinking about it, I'm
>> willing to bet that it's far harder to implement than it sounds.

How/where does file feature requests?

>>
>> In the meantime, you could ad-hoc this by splitting the table into two
>> partitions and clustering one of the two partitions.
>
> Wouldn't creating a second index on the boolean itself and then clustering
> on that be much easier?

that's what I was looking into doing actuallly

--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd                                               Web:
douglasjhunley.com
G+: http://goo.gl/sajR3

Re: cluster on conditional index?

From

Jeff Janes

Date:

16 August 2012, 16:04:17

On Wed, Aug 15, 2012 at 6:43 AM, Doug Hunley <doug.hunley@gmail.com> wrote:
> On Tue, Aug 14, 2012 at 1:29 PM, ktm@rice.edu <ktm@rice.edu> wrote:
>>
>> It probably has to do with the fact that a conditional index, does
>> not include every possible row in the table. Although, a "cluster" of
>> the matching rows and then leave the rest in place, should work. How
>> is that for hand-waving.  :)
>>
>
> That actually makes sense to me. Cluster the rows covered by that
> index, let the rest fall where they may. I'm typically only accessing
> the rows covered by that index, so I'd get the benefit of the cluster
> command but wouldn't have to spend cycles doing the cluster for rows I
> don't care about.

IIRC, there isn't currently an in-place version of CLUSTER, it always
rewrites the entire table.  So it would still have to do something
with those rows, so that they show up in the new table.  But it could
just treat them all as equal to one another and have them be in
whatever order they happen to fall in.

Cheers,

Jeff

Re: cluster on conditional index?

From

Jeff Janes

Date:

16 August 2012, 16:16:48

On Wed, Aug 15, 2012 at 2:19 PM, Bosco Rama <postgres@boscorama.com> wrote:
> On 08/15/12 14:05, Josh Berkus wrote:
>>
>>> That actually makes sense to me. Cluster the rows covered by that
>>> index, let the rest fall where they may. I'm typically only accessing
>>> the rows covered by that index, so I'd get the benefit of the cluster
>>> command but wouldn't have to spend cycles doing the cluster for rows I
>>> don't care about.
>>
>> Sure, that's a feature request though.  And thinking about it, I'm
>> willing to bet that it's far harder to implement than it sounds.
>>
>> In the meantime, you could ad-hoc this by splitting the table into two
>> partitions and clustering one of the two partitions.
>
> Wouldn't creating a second index on the boolean itself and then clustering
> on that be much easier?

I would take an existing useful index, and build a new one on the same
columns but with is_deleted prepended.

That way, since you are going through the effort to rewrite the whole
table anyway, the ties are broken in a way that might be of further
use.

Once the CLUSTER is done, the index might even be useful enough to
keep around for use with queries, or even replace the original index
altogether.

Cheers,

Jeff