From: Pavan Deolasee
Subject: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Msg-id: CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com
In response to: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Jan 26, 2017 at 2:38 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Looking at your 0002 patch now. It no longer applies, but the conflicts
> are trivial to fix. Please rebase and resubmit.
Please see rebased and updated patches attached.
> I think having the "recheck" index methods create an ExecutorState looks
> out of place. How difficult is it to pass the estate from the calling
> code?
I couldn't find a good way to pass the estate from the calling code; it would require changes to many other APIs. All the other callers that need to form index keys create their own ExecutorState too. But please suggest if there is a better way.
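To illustrate the pattern under discussion, the recheck routine does roughly the following (a sketch only -- the function name and signature here are illustrative, not the actual patch code):

static bool
amrecheck_example(Relation indexRel, Relation heapRel, HeapTuple heapTuple)
{
    /* No estate is passed in, so build one locally */
    EState     *estate = CreateExecutorState();
    IndexInfo  *indexInfo = BuildIndexInfo(indexRel);
    TupleTableSlot *slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel));
    Datum       values[INDEX_MAX_KEYS];
    bool        isnull[INDEX_MAX_KEYS];
    bool        result = true;

    ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
    GetPerTupleExprContext(estate)->ecxt_scantuple = slot;

    /* Form the index key values for this heap tuple ... */
    FormIndexDatum(indexInfo, slot, estate, values, isnull);

    /* ... and compare them against the index tuple's keys here */

    ExecDropSingleTupleTableSlot(slot);
    FreeExecutorState(estate);
    return result;
}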
> OffsetNumber
> heap_get_root_tuple(Page page, OffsetNumber target_offnum)
> {
>     OffsetNumber off;
>
>     heap_get_root_tuples_internal(page, target_offnum, &off);
>     return off;
> }
Ok. Changed this way. Definitely looks better.
> The simple_heap_update + CatalogUpdateIndexes pattern is getting
> obnoxious. How about creating something like catalog_heap_update which
> does both things at once, and stop bothering each callsite with the WARM
> stuff?
What I realised is that there are really two patterns:
1. simple_heap_insert followed by CatalogUpdateIndexes
2. simple_heap_update followed by CatalogUpdateIndexes
There are only a couple of places where we already have the indexes open or have more than one tuple to update, and those call CatalogIndexInsert directly. What I ended up doing in the attached patch is to add two new APIs, each combining the two steps of one of these patterns. It seems much cleaner to me and also less error-prone for future callers. I hope I am not missing a reason not to combine these steps.
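For concreteness, the combined helpers look roughly like this (a sketch; the actual names and signatures in the patch may differ):

static Oid
CatalogTupleInsert(Relation heapRel, HeapTuple tup)
{
    Oid     oid;

    /* Do the heap insert, then update the catalog's indexes */
    oid = simple_heap_insert(heapRel, tup);
    CatalogUpdateIndexes(heapRel, tup);
    return oid;
}

static void
CatalogTupleUpdate(Relation heapRel, ItemPointer otid, HeapTuple tup)
{
    /* Do the heap update, then update the catalog's indexes */
    simple_heap_update(heapRel, otid, tup);
    CatalogUpdateIndexes(heapRel, tup);
}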
> I'm not real sure about the interface between index AM and executor,
> namely IndexScanDesc->xs_tuple_recheck. For example, this pattern:
>
> if (!scan->xs_recheck)
>     scan->xs_tuple_recheck = false;
> else
>     scan->xs_tuple_recheck = true;
>
> can become simply
>
> scan->xs_tuple_recheck = scan->xs_recheck;
>
> which looks odd. I can't pinpoint exactly what's the problem, though.
> I'll continue looking at this one.
Fixed. What we do is: if the index scan is marked to do recheck, we do it for each tuple anyway; otherwise a recheck is required only if the tuple comes from a WARM chain.
> I wonder if heap_hot_search_buffer() and heap_hot_search() should return
> a tri-valued enum instead of boolean; that idea looks reasonable in
> theory but callers have to do more work afterwards, so maybe not.
I did not do anything with this yet, but I agree with you that we need to make it better/simpler. Will continue to work on that.
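For illustration, the tri-valued result could look something like this (the names are purely illustrative):

typedef enum
{
    HEAP_TUPLE_NOT_FOUND,       /* no visible tuple in the chain */
    HEAP_TUPLE_FOUND,           /* visible tuple found, no recheck needed */
    HEAP_TUPLE_FOUND_WARM       /* visible tuple from a WARM chain; index
                                 * quals must be rechecked */
} HeapSearchResult;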
I've addressed other review comments on the 0001 patch, except this one.
> +/*
> + * Get TID of next tuple in the update chain. Caller should have checked that
> + * we are not already at the end of the chain because in that case t_ctid may
> + * actually store the root line pointer of the HOT chain whose member this
> + * tuple is.
> + */
> +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> +do { \
> +    AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> +    ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> +} while (0)
>
> Actually, I think this macro could just return the TID so that it can be
> used as struct assignment, just like ItemPointerCopy does internally --
> callers can do
>
> ctid = HeapTupleHeaderGetNextTid(tup);
While I agree with your proposal, I wonder why we have ItemPointerCopy() in the first place, given that we freely copy TIDs by struct assignment elsewhere. Is there a reason for that? And if there is, does it affect this specific case?
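For reference, the return-by-value variant would look roughly like this; a static inline function is the portable way to get struct-assignment semantics (this assumes the HEAP_LATEST_TUPLE flag from the patch):

static inline ItemPointerData
HeapTupleHeaderGetNextTid(HeapTupleHeader tup)
{
    /* Caller must not already be at the end of the chain */
    Assert(!(tup->t_infomask2 & HEAP_LATEST_TUPLE));

    /* Returning the struct lets callers write: ctid = HeapTupleHeaderGetNextTid(tup); */
    return tup->t_ctid;
}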
Other than the review comments, there were a couple of bugs that I discovered while running the stress tests, notably around visibility map handling. The patch has fixes for those. I also ripped out the kludge to record WARM-ness in the line pointer, because that is no longer needed after I reworked the code a few versions back.
The other critical bug I found, which unfortunately exists in master too, is the index corruption during CIC. The patch includes the same fix that I've proposed on the other thread. With these changes, the WARM stress test has been running fine for the last 24 hours on a decently powerful box. Multiple CREATE/DROP INDEX cycles and updates via different indexed columns, with a mix of FOR SHARE/UPDATE and rollbacks, did not produce any consistency issues.

A side note: while performance measurement wasn't a goal of the stress tests, WARM has done about 67% more transactions than master over the 24-hour period (95M in master vs 156M in WARM, to be precise, on a 30GB table including indexes). I believe the numbers would be far better had the test not been dropping and recreating the indexes, which effectively cleans up all index bloat. Also, the table is small enough to fit in shared buffers. I'll rerun these tests with a much larger scale factor and without dropping indexes.
Of course, make check-world, including all TAP tests, passes too.
CREATE INDEX CONCURRENTLY now works. The way we handle this is by ensuring that no broken WARM chains are created while the initial index build is happening: we check the list of attributes of indexes currently in progress (i.e. not yet ready for inserts), and if any of those attributes are being modified, we don't do a WARM update. This is enough to address the CIC issue, and all other mechanisms remain the same as for HOT. I've updated the README to include the CIC algorithm.
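Conceptually, the check is along these lines (a sketch only; warm_update_is_safe is an illustrative name, not the patch's actual function, and locking/caching details are elided):

static bool
warm_update_is_safe(Relation rel, Bitmapset *modified_attrs)
{
    ListCell   *lc;

    foreach(lc, RelationGetIndexList(rel))
    {
        Oid         indexoid = lfirst_oid(lc);
        Relation    indexRel = index_open(indexoid, AccessShareLock);
        Form_pg_index index = indexRel->rd_index;
        int         i;

        /* Only in-progress indexes (not yet ready for inserts) matter here;
         * ready indexes are handled by the usual HOT rules. */
        if (!index->indisready)
        {
            for (i = 0; i < index->indnatts; i++)
            {
                AttrNumber  attno = index->indkey.values[i];

                /* modified_attrs uses the usual attnum offset convention;
                 * expression/predicate columns would need extra handling */
                if (bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
                                  modified_attrs))
                {
                    index_close(indexRel, AccessShareLock);
                    return false;   /* do a regular, non-WARM update instead */
                }
            }
        }
        index_close(indexRel, AccessShareLock);
    }
    return true;
}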
There is one issue that bothers me. The current implementation lacks the ability to convert WARM chains back into HOT chains. README.WARM has a proposal for doing that, but it requires an additional free bit in the tuple header (which we don't have) and, of course, it needs to be vetted and implemented. If the heap ends up with many WARM tuples, index-only scans will become ineffective, because an index-only scan cannot skip a heap page if it contains a WARM tuple. Alternate ideas/suggestions and review of the design are welcome!
Thanks,
Pavan
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services