Thread: Re: [HACKERS] WARM and indirect indexes

Re: [HACKERS] WARM and indirect indexes

From
Pavan Deolasee
Date:


On Wed, Jan 11, 2017 at 7:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jan 10, 2017 at 04:24:42PM -0300, Alvaro Herrera wrote:
> > Two options are on the table to attack the problem of updates causing
> > write amplification: WARM and indirect indexes.  They are completely
> > different approaches but have overlapping effects on what scenarios are
> > improved.  Here's a recap of both features, with the intent that we make
> > a well-considered decision about each.
> >
> > The main effect of both features is that an updated tuple doesn't
> > require updating indexes that are on unmodified columns.  Indirect
> > indexes are a completely new server feature which may enable other
> > refinements later on; WARM is a targeted optimization on top of the HOT
> > optimization.
> >
> > The big advantage of WARM is that it works automatically, like HOT: the
> > user doesn't need to do anything different than today to get the
> > benefit.  With indirect indexes, the user needs to create the index as
> > indirect explicitly.
>
> Thank you for the summary.  I think we have to consider two things with
> indirect indexes:
>
> 1.  What percentage speedup is the _average_ user going to get?  You
> have to consider people who will use indirect indexes who get no benefit
> or a net slowdown, and users who will get a benefit.
>
> 2.  What percentage of users are going to use indirect indexes?

That could also be seen as an advantage to indirect indexes. While I haven't seen the code, I believe the indirect index code will only be hit if someone actually uses them. So there won't be any overhead for other users who do not wish to use the feature. WARM, on the other hand, will be an "always on" feature, even for system tables. That clearly has advantages, both from a usability perspective and because the code will be heavily tested. But if there are cases which get adversely affected by WARM, they will have to pay the price for the larger benefit.

To me, a better strategy is probably to focus on one of the patches, get that in and then evaluate the second patch, both from complexity as well as performance given that the first patch may have narrowed the gaps.

I was going to ask if we could implement indirect indexes as a separate IndexAM. But I re-read this thread and found that you'd in fact done it that way in the first version but then discarded it for performance reasons. Is there any merit in evaluating that path for indirect indexes again?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] WARM and indirect indexes

From
Alvaro Herrera
Date:
Pavan Deolasee wrote:

> I was going to ask if we could implement indirect indexes as a separate
> IndexAM. But I re-read this thread and found that you'd in fact done it
> that way in the first version but then discarded it for performance
> reasons. Is there any merit in evaluating that path for indirect indexes
> again?

Yeah, that was my first approach, and I got it to work to some extent,
but the design felt wrong.  What I wrote was "ibtree", an indirect
version of the btree AM.  The performance wasn't any better than the
current one (though neither has been optimized at all), and the code
felt very ugly, probably because it was poking holes into abstraction
layers.  I also had to duplicate all pg_amop/pg_amproc catalog entries,
etc.

Doing it as a new capability on top of an existing index AM feels much
more natural and seems to lead to a more reasonable model, all things
considered.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WARM and indirect indexes

From
Bruce Momjian
Date:
On Wed, Jan 11, 2017 at 12:36:24PM +0530, Pavan Deolasee wrote:
> That could also be seen as an advantage to indirect indexes. While I haven't
> seen the code, I believe indirect index code will only be hit if someone
> actually uses them. So there won't be any overhead for other users who do not
> wish to use the feature. WARM, on the other hand, will be an "always on" feature,
> even for system tables. That clearly has advantages, both from a usability
> perspective and because the code will be heavily tested. But if
> there are cases which get adversely affected by WARM, they will have to pay the
> price for the larger benefit.
> 
> To me, a better strategy is probably to focus on one of the patches, get that
> in and then evaluate the second patch, both from complexity as well as
> performance given that the first patch may have narrowed the gaps.

Yes, that is exactly what I suggested in a post I just sent to this
thread.  With WARM always-on, it seems like the best thing to do first
because everyone will use it silently, and we can then decide if a user
controlled feature is warranted, and under what circumstances we should
recommend the use of the feature.

However, I am concerned that doing the two features serially (not in
parallel) might mean that the second feature doesn't make it into
Postgres 10, but considering we will live with this feature probably
forever, I think it is the best course.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +