Thread: cluster on conditional index?
According to the docs on cluster: if you tend to access some data more than others, and there is an index that groups them together, you will benefit from using CLUSTER however, this doesn't address the situation where you have a conditional index. For example, we have certain large tables that have a column called 'is_deleted'. It's a boolean, to indicate whether the record is 'deleted' as far as the app is concerned. Since the app only ever shows data where is_deleted is false, I created an index: create index foo on bar where is_deleted is false; and now I'm wondering if clustering on this index will bring the benefits noted above or if I should rebuild my index w/o the where clause to obtain the best 'improvement' from cluster. Anyone know? -- Douglas J Hunley (doug.hunley@gmail.com) Twitter: @hunleyd Web: douglasjhunley.com G+: http://goo.gl/sajR3
On Tue, Aug 14, 2012 at 8:27 AM, Doug Hunley <doug.hunley@gmail.com> wrote: > According to the docs on cluster: > if you tend to access some data more than others, and there is an > index that groups them together, you will benefit from using CLUSTER > > however, this doesn't address the situation where you have a > conditional index. It seems like it is not allowed. jjanes=# create index on pgbench_accounts (aid) where bid=33; jjanes=# cluster pgbench_accounts USING pgbench_accounts_aid_idx ; ERROR: cannot cluster on partial index "pgbench_accounts_aid_idx" But I don't see a fundamental reason it can't be allowed, maybe implementing that should be on the to-do list. Cheers, Jeff
On Tue, Aug 14, 2012 at 10:10:47AM -0700, Jeff Janes wrote: > On Tue, Aug 14, 2012 at 8:27 AM, Doug Hunley <doug.hunley@gmail.com> wrote: > > According to the docs on cluster: > > if you tend to access some data more than others, and there is an > > index that groups them together, you will benefit from using CLUSTER > > > > however, this doesn't address the situation where you have a > > conditional index. > > It seems like it is not allowed. > > jjanes=# create index on pgbench_accounts (aid) where bid=33; > jjanes=# cluster pgbench_accounts USING pgbench_accounts_aid_idx ; > ERROR: cannot cluster on partial index "pgbench_accounts_aid_idx" > > But I don't see a fundamental reason it can't be allowed, maybe > implementing that should be on the to-do list. > > Cheers, > > Jeff > It probably has to do with the fact that a conditional index, does not include every possible row in the table. Although, a "cluster" of the matching rows and then leave the rest in place, should work. How is that for hand-waving. :) Regards, Ken
On Tue, Aug 14, 2012 at 1:29 PM, ktm@rice.edu <ktm@rice.edu> wrote: > > It probably has to do with the fact that a conditional index, does > not include every possible row in the table. Although, a "cluster" of > the matching rows and then leave the rest in place, should work. How > is that for hand-waving. :) > That actually makes sense to me. Cluster the rows covered by that index, let the rest fall where they may. I'm typically only accessing the rows covered by that index, so I'd get the benefit of the cluster command but wouldn't have to spend cycles doing the cluster for rows I don't care about. -- Douglas J Hunley (doug.hunley@gmail.com) Twitter: @hunleyd Web: douglasjhunley.com G+: http://goo.gl/sajR3
> That actually makes sense to me. Cluster the rows covered by that > index, let the rest fall where they may. I'm typically only accessing > the rows covered by that index, so I'd get the benefit of the cluster > command but wouldn't have to spend cycles doing the cluster for rows I > don't care about. Sure, that's a feature request though. And thinking about it, I'm willing to bet that it's far harder to implement than it sounds. In the meantime, you could ad-hoc this by splitting the table into two partitions and clustering one of the two partitions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 08/15/12 14:05, Josh Berkus wrote: > >> That actually makes sense to me. Cluster the rows covered by that >> index, let the rest fall where they may. I'm typically only accessing >> the rows covered by that index, so I'd get the benefit of the cluster >> command but wouldn't have to spend cycles doing the cluster for rows I >> don't care about. > > Sure, that's a feature request though. And thinking about it, I'm > willing to bet that it's far harder to implement than it sounds. > > In the meantime, you could ad-hoc this by splitting the table into two > partitions and clustering one of the two partitions. Wouldn't creating a second index on the boolean itself and then clustering on that be much easier? Bosco.
On Wed, Aug 15, 2012 at 5:19 PM, Bosco Rama <postgres@boscorama.com> wrote: > On 08/15/12 14:05, Josh Berkus wrote: >> >>> That actually makes sense to me. Cluster the rows covered by that >>> index, let the rest fall where they may. I'm typically only accessing >>> the rows covered by that index, so I'd get the benefit of the cluster >>> command but wouldn't have to spend cycles doing the cluster for rows I >>> don't care about. >> >> Sure, that's a feature request though. And thinking about it, I'm >> willing to bet that it's far harder to implement than it sounds. How/where does file feature requests? >> >> In the meantime, you could ad-hoc this by splitting the table into two >> partitions and clustering one of the two partitions. > > Wouldn't creating a second index on the boolean itself and then clustering > on that be much easier? that's what I was looking into doing actuallly -- Douglas J Hunley (doug.hunley@gmail.com) Twitter: @hunleyd Web: douglasjhunley.com G+: http://goo.gl/sajR3
On Wed, Aug 15, 2012 at 6:43 AM, Doug Hunley <doug.hunley@gmail.com> wrote: > On Tue, Aug 14, 2012 at 1:29 PM, ktm@rice.edu <ktm@rice.edu> wrote: >> >> It probably has to do with the fact that a conditional index, does >> not include every possible row in the table. Although, a "cluster" of >> the matching rows and then leave the rest in place, should work. How >> is that for hand-waving. :) >> > > That actually makes sense to me. Cluster the rows covered by that > index, let the rest fall where they may. I'm typically only accessing > the rows covered by that index, so I'd get the benefit of the cluster > command but wouldn't have to spend cycles doing the cluster for rows I > don't care about. IIRC, there isn't currently an in-place version of CLUSTER, it always rewrites the entire table. So it would still have to do something with those rows, so that they show up in the new table. But it could just treat them all as equal to one another and have them be in whatever order they happen to fall in. Cheers, Jeff
On Wed, Aug 15, 2012 at 2:19 PM, Bosco Rama <postgres@boscorama.com> wrote: > On 08/15/12 14:05, Josh Berkus wrote: >> >>> That actually makes sense to me. Cluster the rows covered by that >>> index, let the rest fall where they may. I'm typically only accessing >>> the rows covered by that index, so I'd get the benefit of the cluster >>> command but wouldn't have to spend cycles doing the cluster for rows I >>> don't care about. >> >> Sure, that's a feature request though. And thinking about it, I'm >> willing to bet that it's far harder to implement than it sounds. >> >> In the meantime, you could ad-hoc this by splitting the table into two >> partitions and clustering one of the two partitions. > > Wouldn't creating a second index on the boolean itself and then clustering > on that be much easier? I would take an existing useful index, and build a new one on the same columns but with is_deleted prepended. That way, since you are going through the effort to rewrite the whole table anyway, the ties are broken in a way that might be of further use. Once the CLUSTER is done, the index might even be useful enough to keep around for use with queries, or even replace the original index altogether. Cheers, Jeff