Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Date
Msg-id CA+TgmoZCxSV_NVjkkcchH9fJZkFU82z5zGALQdH_LkFDdbNnxQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Tue, Apr 11, 2017 at 1:20 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> I don't know why you say that regressions are not addressed. Here are a few
> things I did to address the regressions/reviews/concerns, apart from fixing
> all the bugs discovered, but please let me know if there are things I've not
> addressed.

I'm making statements based on my perception of the discussion on the
thread.  Perhaps you did some work which you either didn't mention or
I missed you mentioning it, but it sure didn't feel like all of the
things reported got addressed.

> 1. Improved the interesting attrs patch that Alvaro wrote to address the
> regression discovered in fetching more heap attributes. The patch that got
> committed in fact improved certain synthetic workloads over then master.

Yep, though it was not clear that all of the regressing cases were
actually addressed, at least not to me.

> 2. Based on Petr and your feedback, disabled WARM on toasted attributes to
> reduce overhead of fetching/decompressing the attributes.

But that's not necessarily the right fix, as per
http://postgr.es/m/CA+TgmoYUfxy1LseDzsw8uuuLUJHH0r8NCD-Up-HZMC1fYDPH3Q@mail.gmail.com
and subsequent discussion.  It's not clear to me from that discussion
that we've got to a place where the method used to identify whether a
WARM update happened during a scan is exactly identical to the method
used to decide whether to perform one in the first place.

> 3. Added code to avoid doing second index scan when the index does not
> contain any WARM pointers. This should address the situation Amit brought up
> where only one of the indexes receive WARM inserts
> 4. Added code to kill wrong index pointers to do online cleanup.

Good changes.

> 5. Added code to set a CLEAR pointer to a WARM pointer when we know that the
> entire chain is WARM. This should address the workload Dilip ran and found
> regression (I don't think he got chance to confirm that)

Which is clearly a thing that should happen before commit, and really,
you ought to be leading the effort to confirm that, not him.  It's
good for him to verify that your fix worked, but you should test it
first.

> 6. Enhanced stats collector to collect information about candidate WARM
> chains and added mechanism to control WARM cleanup at the heap as well as
> index level, based on configurable parameters. This gives user better
> control over the additional work that is required for WARM cleanup.

I haven't seen previous discussion of this; therefore I doubt whether
we have agreement on these parameters.

> 7. Added table level option to disable WARM if nothing else works.

-1 from me.

> 8. Added mechanism to disable WARM when more than 50% indexes are being
> updated. I ran some benchmarks with different percentage of indexes getting
> updated and thought this is a good threshold.

+1 from me.

> I may have missed something, but there is no intention to ignore known
> regressions/reviews. Of course, I don't think that every regression will be
> solvable, like if you run a CPU-bound workload, setting it up in a way such
> that you repeatedly exercise the area where WARM is doing additional work,
> without providing any benefit, may be you can still find regression. I am
> willing to fix them as long as they are fixable and we are comfortable with
> the additional code complexity. IMHO certain trade-offs are good, but I
> understand that not everybody will agree with my views and that's ok.

The point here is that we can't make intelligent decisions about
whether to commit this feature unless we know which situations get
better and which get worse and by how much.  I don't accept as a
general principle the idea that CPU-bound workloads don't matter.
Obviously, I/O-bound workloads matter too, but we can't throw
CPU-bound workloads under the bus.  Now, avoiding index bloat does
also save CPU, so it is easy to imagine that WARM could come out ahead
even if each update consumes slightly more CPU when actually updating,
so we might not actually regress.  If we do, I guess I'd want to know
why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Some thoughts about SCRAM implementation
Next
From: Álvaro Hernández Tortosa
Date:
Subject: Re: [HACKERS] Some thoughts about SCRAM implementation