Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Date
Msg-id CABOikdNKE4HmAha-wQqAnduRgh1tLhSX8ON7Vt2OmKW8SYe1vA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Wed, Apr 12, 2017 at 10:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 11, 2017 at 1:20 PM, Pavan Deolasee
 
> 5. Added code to set a CLEAR pointer to a WARM pointer when we know that the
> entire chain is WARM. This should address the workload Dilip ran and found
> regression (I don't think he got chance to confirm that)

Which is clearly a thing that should happen before commit, and really,
you ought to be leading the effort to confirm that, not him.  It's
good for him to verify that your fix worked, but you should test it
first.

Not sure why you think I did not do the tests. I did and reported that it helps reduce the regression. Last para here:  https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com
 
I understand it might have got lost in the conversation and I possibly did a poor job of explaining it. From my perspective, I did not want say that everything is hunky-dory based on my own tests because 1. I probably do not have access to the same kind of machine Dilip has and 2. It's better to get it confirmed by someone who initially reported it. Again, I fully respect that he would be busy with other things and I do not expect him or anyone else to test/review my patch on priority. The only point I am trying to make is that I did my own tests and made sure that it helps.

(Having said that, I am not sure if changing pointer state from CLEAR to WARM is indeed a good change. Having thought more about it and after looking at the page-split code, I now think that this might just confuse the WARM cleanup code and make algorithm that much harder to prove)


> 6. Enhanced stats collector to collect information about candidate WARM
> chains and added mechanism to control WARM cleanup at the heap as well as
> index level, based on configurable parameters. This gives user better
> control over the additional work that is required for WARM cleanup.

I haven't seen previous discussion of this; therefore I doubt whether
we have agreement on these parameters.

Sure. I will bring these up in a more structured manner for everyone to comment. 
 

> 7. Added table level option to disable WARM if nothing else works.

-1 from me.

Ok. It's kinda last resort for me too. But at some point, we might want to make that call if we find an important use case that regresses because of WARM and we see no way to fix that or at least not without a whole lot of complexity.
 


> I may have missed something, but there is no intention to ignore known
> regressions/reviews. Of course, I don't think that every regression will be
> solvable, like if you run a CPU-bound workload, setting it up in a way such
> that you repeatedly exercise the area where WARM is doing additional work,
> without providing any benefit, may be you can still find regression. I am
> willing to fix them as long as they are fixable and we are comfortable with
> the additional code complexity. IMHO certain trade-offs are good, but I
> understand that not everybody will agree with my views and that's ok.

The point here is that we can't make intelligent decisions about
whether to commit this feature unless we know which situations get
better and which get worse and by how much.

Sure.
 
  I don't accept as a
general principle the idea that CPU-bound workloads don't matter.
Obviously, I/O-bound workloads matter too, but we can't throw
CPU-bound workloads under the bus.

Yeah, definitely not suggesting that. 
 
  Now, avoiding index bloat does
also save CPU, so it is easy to imagine that WARM could come out ahead
even if each update consumes slightly more CPU when actually updating,
so we might not actually regress.  If we do, I guess I'd want to know
why.

Well the kind of tests we did to look for regression were worst case scenarios. For example, in the test where we found 10-15% regression, we used a wide index (so recheck cost is high), WARM updated all rows, disabled auto-vacuum (so no chain conversion) and then repeatedly selected the rows from the index, thus incurring recheck overhead and in fact, measuring only that. 

When I measured WARM on tables with small scale factor so that everything fits in memory, I found a modest 20% improvement in tps. So, you're right, WARM might also help in-memory workloads. But that will show up only if we measure UPDATEs and SELECTs both. If we measure only SELECTs and that too in a state where we are paying all price for having done a WARM update, obviously we will only see regression, if any. Not saying we should ignore that. We should in fact measure all possible loads, and try to fix as many as we can, especially if they resemble to a real-world use case,  but there will be a trade-off to make. So I highly appreciate Amit and Dilip's help with coming up additional tests. At least it gives us opportunity to think how to fix them, even if we can't fix all of them.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Next
From: Andrew Borodin
Date:
Subject: Re: [HACKERS] Merge join for GiST