Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) |
Date | |
Msg-id | CA+TgmoZCxSV_NVjkkcchH9fJZkFU82z5zGALQdH_LkFDdbNnxQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses |
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) |
List | pgsql-hackers |
On Tue, Apr 11, 2017 at 1:20 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > I don't know why you say that regressions are not addressed. Here are a few > things I did to address the regressions/reviews/concerns, apart from fixing > all the bugs discovered, but please let me know if there are things I've not > addressed. I'm making statements based on my perception of the discussion on the thread. Perhaps you did some work which you either didn't mention or I missed you mentioning it, but it sure didn't feel like all of the things reported got addressed. > 1. Improved the interesting attrs patch that Alvaro wrote to address the > regression discovered in fetching more heap attributes. The patch that got > committed in fact improved certain synthetic workloads over then master. Yep, though it was not clear that all of the regressing cases were actually addressed, at least not to me. > 2. Based on Petr and your feedback, disabled WARM on toasted attributes to > reduce overhead of fetching/decompressing the attributes. But that's not necessarily the right fix, as per http://postgr.es/m/CA+TgmoYUfxy1LseDzsw8uuuLUJHH0r8NCD-Up-HZMC1fYDPH3Q@mail.gmail.com and subsequent discussion. It's not clear to me from that discussion that we've got to a place where the method used to identify whether a WARM update happened during a scan is exactly identical to the method used to decide whether to perform one in the first place. > 3. Added code to avoid doing second index scan when the index does not > contain any WARM pointers. This should address the situation Amit brought up > where only one of the indexes receive WARM inserts > 4. Added code to kill wrong index pointers to do online cleanup. Good changes. > 5. Added code to set a CLEAR pointer to a WARM pointer when we know that the > entire chain is WARM. This should address the workload Dilip ran and found > regression (I don't think he got chance to confirm that) Which is clearly a thing that should happen before commit, and really, you ought to be leading the effort to confirm that, not him. It's good for him to verify that your fix worked, but you should test it first. > 6. Enhanced stats collector to collect information about candidate WARM > chains and added mechanism to control WARM cleanup at the heap as well as > index level, based on configurable parameters. This gives user better > control over the additional work that is required for WARM cleanup. I haven't seen previous discussion of this; therefore I doubt whether we have agreement on these parameters. > 7. Added table level option to disable WARM if nothing else works. -1 from me. > 8. Added mechanism to disable WARM when more than 50% indexes are being > updated. I ran some benchmarks with different percentage of indexes getting > updated and thought this is a good threshold. +1 from me. > I may have missed something, but there is no intention to ignore known > regressions/reviews. Of course, I don't think that every regression will be > solvable, like if you run a CPU-bound workload, setting it up in a way such > that you repeatedly exercise the area where WARM is doing additional work, > without providing any benefit, may be you can still find regression. I am > willing to fix them as long as they are fixable and we are comfortable with > the additional code complexity. IMHO certain trade-offs are good, but I > understand that not everybody will agree with my views and that's ok. The point here is that we can't make intelligent decisions about whether to commit this feature unless we know which situations get better and which get worse and by how much. I don't accept as a general principle the idea that CPU-bound workloads don't matter. Obviously, I/O-bound workloads matter too, but we can't throw CPU-bound workloads under the bus. Now, avoiding index bloat does also save CPU, so it is easy to imagine that WARM could come out ahead even if each update consumes slightly more CPU when actually updating, so we might not actually regress. If we do, I guess I'd want to know why. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: