Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes - Mailing list pgsql-hackers
From:           Andres Freund
Subject:        Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Date:
Msg-id:         201206270213.18367.andres@2ndquadrant.com
In response to: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes (Steve Singer <steve@ssinger.info>)
Responses:      Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
List:           pgsql-hackers
Hi Steve,

On Tuesday, June 26, 2012 02:14:22 AM Steve Singer wrote:
> > I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If
> > it has been reached for one transaction all existing changes are spilled
> > to disk. New changes again can be kept in memory till it's reached again.
>
> Do you want max_changes_per_in_memory_txn or do you want to put a limit on
> the total amount of memory that the cache is able to use? How are you
> going to tell a DBA to tune max_changes_in_memory_per_txn? They know how
> much memory their system has and how much of it they can devote to the
> apply cache versus other things. Giving them guidance on estimating how
> many open transactions they might have at a point in time, and how many
> WAL change records each transaction generates, seems like a step backwards
> from the progress we've been making in getting PostgreSQL to be easier to
> tune. The maximum number of transactions that can be open at a time is
> governed by max_connections on the master at the time the WAL was
> generated, so I don't even see how the machine processing the WAL records
> could autotune/guess that.
It can even be significantly higher than max_connections, because
subtransactions are only recognizable as part of their parent transaction
upon commit.

I think max_changes_in_memory_per_txn will be measured in number of changes
for now. Making memory-based accounting across multiple concurrent
transactions work efficiently and correctly isn't easy.

> > We need to support serializing the cache for crash recovery + shutdown
> > of the receiving side as well. Depending on how we do the wal decoding
> > we will need it more frequently...
>
> Have you described your thoughts on crash recovery on another thread?
I think I have somewhere, but given how much in flux our thoughts on decoding
are, I think it's not that important yet.

> I am thinking that this module would have to serialize some state every
> time it calls cache->commit() to ensure that consumers don't get invoked
> twice on the same transaction.
In one of the other patches I implemented it by adding the (origin_id,
origin_lsn) pair to replicated commits. During recovery the startup process
restores in shared memory up to which point we already applied. If you then
every now and then perform a 'logical checkpoint', writing down what the
begin lsn of the longest-running in-progress transaction is, you can fully
recover from that point on.
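
To make that a bit more concrete, the state such a 'logical checkpoint' would
have to persist is roughly the following. This is only a sketch to illustrate
the idea -- none of it exists yet, and the names and types are made up:

typedef struct LogicalCheckpointData
{
	uint32		origin_id;		/* upstream node the applied changes came from */
	XLogRecPtr	last_applied;	/* origin_lsn of the last commit we applied;
								 * transactions committing at or before this
								 * point must not be handed to consumers again
								 * after a restart */
	XLogRecPtr	restart_lsn;	/* begin lsn of the oldest transaction still
								 * in progress at checkpoint time; decoding
								 * restarts reading WAL from here */
} LogicalCheckpointData;

The startup process would put origin_id/last_applied back into shared memory
during recovery, decoding would start reading WAL again at restart_lsn, and
any transaction whose commit is at or before last_applied would simply be
skipped, so consumers never see the same transaction twice.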
> If the apply module is making changes to the same backend that the apply
> cache serializes to, then both the state for the apply cache and the
> changes that committed transactions make will be persisted (or not
> persisted) together. What if I am replicating from x86 to x86_64 via an
> apply module that does textout conversions?
>
>     x86                 Proxy                        x86_64
>     ----WAL------>      apply
>                         cache
>                           | (proxy catalog)
>                         apply module
>                         textout ---------------------> SQL statements
>
> How do we ensure that the commits are all visible (or not visible) on the
> catalog of the proxy instance used for decoding WAL, on the destination
> database, and in the state + spill files of the apply cache, so that they
> stay consistent in the event of a crash of either the proxy or the target?
> I don't think you can (unless we consider two-phase commit, and I'd rather
> we didn't). Can we come up with a way of avoiding the need for them to be
> consistent with each other?
That's discussed in the "Catalog/Metadata consistency during changeset
extraction from wal" thread, and we haven't yet determined which solution is
the best ;)

> Code Review
> ===========
>
> applycache.h
> ------------
> +typedef struct ApplyCacheTupleBuf
> +{
> +	/* position in preallocated list */
> +	ilist_s_node node;
> +
> +	HeapTupleData tuple;
> +	HeapTupleHeaderData header;
> +	char data[MaxHeapTupleSize];
> +} ApplyCacheTupleBuf;
>
> Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big
> the data in the transaction is? Wouldn't workloads with inserts of lots
> of small rows in a transaction eat up lots of memory that is allocated
> but storing nothing? The only alternative I can think of is dynamically
> allocating these, and I don't know what the cost/benefit of that overhead
> will be versus spilling to disk sooner.
Dynamically allocating them totally destroys performance, I tried that. I
think at some point we should have four or so lists of preallocated tuple
bufs of different sizes and then use the smallest possible one, but I think
this solution is ok in the very first version.

If you allocate dynamically you also get a noticeable performance drop once
you let the decoding run for a while, because of fragmentation inside the
memory allocator.

> +* FIXME: better name
> + */
> +ApplyCacheChange*
> +ApplyCacheGetChange(ApplyCache*);
>
> How about:
>
> ApplyCacheReserveChangeStruct(..)
> ApplyCacheReserveChange(...)
> ApplyCacheAllocateChange(...)
>
> as ideas?
>
> +/*
> + * Return an unused ApplyCacheChange struct
> +*/
> +void
> +ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);
>
> ApplyCacheReleaseChange(...)? I keep thinking of 'Return' as us returning
> the data somewhere, not the memory.
Hm. Reserve/Release doesn't sound bad. Acquire/Release is possibly even
better, because Reserve could be understood as a preparatory step?

> applycache.c:
> -------------
>
> I've taken a quick look through this file and I don't see any issues other
> than the many FIXMEs and other issues you've identified already, which I
> don't expect you to address in this CF.
Thanks for the review so far!

Greetings,

Andres

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
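
PS: On the tuple buffer size classes above, roughly what I am thinking of is
the following. Again, just a sketch and not code from the patch; the function
name and the concrete sizes are invented:

/*
 * Instead of a single MaxHeapTupleSize-sized class, keep a handful of
 * preallocated buffer classes and hand out the smallest one the tuple
 * fits into.
 */
static const Size TupleBufSizes[] = {256, 1024, 4096, MaxHeapTupleSize};

#define N_TUPLEBUF_CLASSES lengthof(TupleBufSizes)

static int
ApplyCacheTupleBufSizeClass(Size tuple_len)
{
	int		i;

	/* pick the smallest class the tuple fits into */
	for (i = 0; i < N_TUPLEBUF_CLASSES; i++)
	{
		if (tuple_len <= TupleBufSizes[i])
			return i;
	}

	elog(ERROR, "tuple too large for any preallocated buffer class");
	return -1;					/* keep the compiler quiet */
}

The Get/Return functions for tuple bufs would then pop from and push to a
per-class freelist instead of a single one; the freelist plumbing is left
out here.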