Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers
From: Pavan Deolasee
Subject: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Msg-id: CABOikdNDO3zhCgFWwgBCUs=xhctdXNbNpfdeA9uNSz_CeOFmsA@mail.gmail.com
In response to: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Thu, Apr 13, 2017 at 2:04 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Apr 12, 2017 at 10:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I may have missed something, but there is no intention to ignore known
>> regressions/reviews. Of course, I don't think that every regression will be
>> solvable, like if you run a CPU-bound workload, setting it up in a way such
>> that you repeatedly exercise the area where WARM is doing additional work,
>> without providing any benefit, may be you can still find regression. I am
>> willing to fix them as long as they are fixable and we are comfortable with
>> the additional code complexity. IMHO certain trade-offs are good, but I
>> understand that not everybody will agree with my views and that's ok.
>
> The point here is that we can't make intelligent decisions about
> whether to commit this feature unless we know which situations get
> better and which get worse and by how much. I don't accept as a
> general principle the idea that CPU-bound workloads don't matter.
> Obviously, I/O-bound workloads matter too, but we can't throw
> CPU-bound workloads under the bus. Now, avoiding index bloat does
> also save CPU, so it is easy to imagine that WARM could come out ahead
> even if each update consumes slightly more CPU when actually updating,
> so we might not actually regress. If we do, I guess I'd want to know
> why.
> I myself wonder if this CPU overhead is at all related to LP_DEAD
> recycling during page splits.
With respect to the tests that I, Dilip, and others did for WARM, I think we were exercising a worst-case scenario. In one case, we created a table with a 40% fill factor and an index on a large text column, WARM-updated all rows in the table, turned off autovacuum so that chain conversion would not take place, and then repeatedly ran a select query on those rows using the index which did not receive the WARM insert.

IOW, we were measuring only the overhead of doing the recheck: constructing an index tuple from the heap tuple and comparing it against the existing index tuple. We did find a regression, which is not entirely surprising, because that code path obviously does extra work when it needs to do the recheck. And we were measuring only that overhead, without taking into account the benefits of WARM to the system in general. The counter-argument is that such a workload may exist somewhere and would be regressed.
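To make the setup concrete, a rough reconstruction of that worst-case benchmark might look like the following. This is only a sketch: the table and column names are illustrative, and the actual test scripts are not in this mail. The point is that the update touches only `val`, so `warm_test_val_idx` gets a new (WARM) entry while `warm_test_payload_idx` does not, and subsequent scans via the payload index must take the recheck path.

```sql
-- Illustrative reconstruction, not the actual test script.
CREATE TABLE warm_test (
    id      int PRIMARY KEY,
    payload text,
    val     int
) WITH (fillfactor = 40);              -- 40% fill factor, as in the test

CREATE INDEX warm_test_payload_idx ON warm_test (payload);  -- large text column
CREATE INDEX warm_test_val_idx ON warm_test (val);

-- Prevent chain conversion by vacuum, as in the test.
ALTER TABLE warm_test SET (autovacuum_enabled = off);

INSERT INTO warm_test
SELECT g, repeat('x', 1000) || g, g
FROM generate_series(1, 100000) g;

-- A WARM update: only val changes, so warm_test_payload_idx receives
-- no new entry, but its existing pointers now lead to WARM chains.
UPDATE warm_test SET val = val + 1;

-- Repeated scans via the untouched index exercise the recheck path
-- (rebuild the index key from the heap tuple and compare it).
SELECT count(*) FROM warm_test WHERE payload = repeat('x', 1000) || '42';
```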
> I have my suspicions that the recycling
> has some relationship to locality, which leads me to want to
> investigate how Claudio Freire's patch to consistently treat heap TID
> as part of the B-Tree sort order could help, both in general, and for
> WARM.
It could be, especially if we redesign the recheck to be based solely on the index-pointer state and the heap-tuple state. That could be more performant for selects and could also be more robust, but it would require index inserts to get hold of the old index pointer (based on the root TID), compare it against the new index tuple, and either skip the insert (if everything matches) or set a PREWARM flag on the old pointer and insert the new tuple with a POSTWARM flag.

Searching for the old index pointer would be a non-starter for non-unique indexes unless they are also sorted by TID, which is something Claudio's patch does. What I am not sure about is whether that patch will stand on its own performance-wise, because it increases the index tuple width (and probably the index maintenance cost too).
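To illustrate why TID ordering matters here, a minimal Python sketch (not PostgreSQL code) of the proposed insert-time logic follows. The `TidOrderedIndex` class and its methods are hypothetical; only the PREWARM/POSTWARM flag names come from the proposal above. Because entries are sorted by (key, root TID), the old pointer for a given root TID can be found by binary search even in a non-unique index.

```python
import bisect

PREWARM, POSTWARM = "PREWARM", "POSTWARM"

class TidOrderedIndex:
    """Hypothetical index whose entries are sorted by (key, root_tid),
    as with Claudio Freire's TID-ordered B-Tree idea."""

    def __init__(self):
        # Each entry is [key, root_tid, flag]; the list is kept sorted.
        self.entries = []

    def insert(self, key, root_tid, flag=None):
        bisect.insort(self.entries, [key, root_tid, flag])

    def _find(self, key, root_tid):
        # Binary search works only because entries with equal keys are
        # ordered by TID; without that, locating the old pointer in a
        # non-unique index would mean scanning all duplicates.
        i = bisect.bisect_left(self.entries, [key, root_tid])
        if i < len(self.entries) and self.entries[i][:2] == [key, root_tid]:
            return i
        return None

    def warm_update(self, old_key, new_key, root_tid):
        """Index-side handling of a WARM update of the row at root_tid."""
        if new_key == old_key:
            # Index column unchanged: skip the insert entirely.
            return "skipped"
        i = self._find(old_key, root_tid)
        if i is not None:
            self.entries[i][2] = PREWARM       # flag the old pointer
        self.insert(new_key, root_tid, POSTWARM)  # insert the new one
        return "inserted"
```

In this toy model, a select-time recheck would only need to look at the pointer flags and the heap tuple, rather than reconstructing and comparing index tuples; the cost is pushed to update time, plus the wider index tuples that TID ordering requires.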
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services