From: Andres Freund
Subject: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Msg-id: 201206270213.18367.andres@2ndquadrant.com
In response to: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes (Steve Singer <steve@ssinger.info>)
List: pgsql-hackers
Hi Steve,

On Tuesday, June 26, 2012 02:14:22 AM Steve Singer wrote:
> > I planned to have some cutoff 'max_changes_in_memory_per_txn' value.
> > If it has been reached for one transaction all existing changes are
> > spilled to disk. New changes can again be kept in memory till it's
> > reached again.
> Do you want max_changes_in_memory_per_txn or do you want to put a limit
> on the total amount of memory that the cache is able to use? How are you
> going to tell a DBA to tune max_changes_in_memory_per_txn? They know how
> much memory their system has and how much they can devote to the apply
> cache versus other things. Giving them guidance that requires estimating
> how many open transactions they might have at a point in time and how
> many WAL change records each transaction generates seems like a step
> backwards from the progress we've been making in getting PostgreSQL to
> be easier to tune. The maximum number of transactions that could be
> open at a time is governed by max_connections on the master at the
> time the WAL was generated, so I don't even see how the machine
> processing the WAL records could autotune/guess that.
It can even be significantly higher than max_connections because 
subtransactions are only recognizable as part of their parent transaction 
upon commit.

I think max_changes_in_memory_per_txn will count the number of changes for now. 
Making memory-based accounting across multiple concurrent transactions work 
efficiently and correctly isn't easy.
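To illustrate the idea (the names and the constant below are made up for
illustration, not the patch's actual API), the per-transaction spill logic
would look roughly like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: once a transaction has buffered
 * MAX_CHANGES_IN_MEMORY_PER_TXN changes, spill everything it has to disk
 * and start buffering in memory again from zero. */
#define MAX_CHANGES_IN_MEMORY_PER_TXN 4096

typedef struct TxnState
{
    size_t nr_changes_in_memory;  /* changes currently buffered in memory */
    int    spilled;               /* has this txn ever spilled to disk? */
} TxnState;

/* Called for every decoded change belonging to one transaction. */
static void
txn_add_change(TxnState *txn)
{
    txn->nr_changes_in_memory++;

    if (txn->nr_changes_in_memory >= MAX_CHANGES_IN_MEMORY_PER_TXN)
    {
        /* write all buffered changes of this txn out; new changes are
         * again collected in memory until the limit is hit once more */
        txn->spilled = 1;
        txn->nr_changes_in_memory = 0;
    }
}
```

Note that the counter is per transaction, so total memory use still scales
with the number of concurrent (top-level plus not-yet-assigned sub-)
transactions.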


> > We need to support serializing the cache for crash recovery + shutdown of
> > the receiving side as well. Depending on how we do the wal decoding we
> > will need it more frequently...
> Have you described your thoughts on crash recovery on another thread?
I think I have somewhere, but given how much in flux our thoughts on decoding 
are I think it's not that important yet.

> I am thinking that this module would have to serialize some state
> every time it calls cache->commit() to ensure that consumers don't get
> invoked twice on the same transaction.
In one of the other patches I implemented it by adding the (origin_id, 
origin_lsn) pair to replicated commits. During recovery the startup process 
sets up in shared memory the point up to which we have applied.
If you then every now and then perform a 'logical checkpoint', writing down 
what the beginning LSN of the longest-running in-progress transaction is, you 
can fully recover from that point on.
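Roughly, with made-up types and names rather than the patch's actual code,
the duplicate-apply check after recovery would be:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* simplified stand-in for an LSN */

/* Hypothetical per-origin progress record, restored into shared memory
 * by the startup process during recovery. */
typedef struct ReplicationProgress
{
    uint16_t   origin_id;
    XLogRecPtr last_applied_origin_lsn;
} ReplicationProgress;

/* Should this incoming commit be applied, or was it already durably
 * applied before the crash?  Commits at or below the recorded LSN for
 * the same origin are skipped instead of being applied twice. */
static int
should_apply_commit(const ReplicationProgress *progress,
                    uint16_t origin_id, XLogRecPtr origin_lsn)
{
    if (origin_id == progress->origin_id &&
        origin_lsn <= progress->last_applied_origin_lsn)
        return 0;               /* already applied, skip */
    return 1;
}
```

Because the pair is written as part of the replicated commit record itself,
the progress information commits or rolls back atomically with the applied
changes.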

> If the apply module is making changes to the same backend that the apply
> cache serializes to, then both the state for the apply cache and the
> changes that committed transactions make will be persisted (or
> not persisted) together.  What if I am replicating from x86 to x86_64
> via an apply module that does textout conversions?
> 
> x86           Proxy                        x86_64
> ----WAL-----> apply cache
>                 |   (proxy catalog)
>               apply module
>                 textout ------------------->
>                          SQL statements
> 
> 
> How do we ensure that the commits are all visible (or not visible) on
> the catalog on the proxy instance used for decoding WAL, the destination
> database, and the state + spill files of the apply cache stay consistent
> in the event of a crash of either the proxy or the target?
> I don't think you can (unless we consider two-phase commit, and I'd
> rather we didn't).  Can we come up with a way of avoiding the need for
> them to be consistent with each other?
That's discussed in the "Catalog/Metadata consistency during changeset 
extraction from wal" thread and we haven't yet determined which solution is 
the best ;)

> Code Review
> =========
> 
> applycache.h
> -----------------------
> +typedef struct ApplyCacheTupleBuf
> +{
> +    /* position in preallocated list */
> +    ilist_s_node node;
> +
> +    HeapTupleData tuple;
> +    HeapTupleHeaderData header;
> +    char data[MaxHeapTupleSize];
> +} ApplyCacheTupleBuf;
> 
> Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big
> the data in the transaction is? Wouldn't workloads with inserts of lots
> of small rows in a transaction eat up lots of memory that is allocated
> but storing nothing?  The only alternative I can think of is dynamically
> allocating these, and I don't know what the cost/benefit of that overhead
> will be versus spilling to disk sooner.
Dynamically allocating them totally destroys performance, I tried that. I 
think at some point we should have 4 or so lists of preallocated tuple bufs of 
different sizes and then use the smallest possible one. But I think this 
solution is ok in the very first version.

If you allocate dynamically you also get a noticeable performance drop when 
you let the decoding run for a while because of fragmentation inside the 
memory allocator.
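As a sketch of the size-class idea (the classes, array and function names
here are invented for illustration, not anything from the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes for preallocated tuple buffers; a tuple is
 * served from the smallest class it fits into, so small rows no longer
 * each pin a full MaxHeapTupleSize buffer. */
#define NR_SIZE_CLASSES 4
static const size_t size_classes[NR_SIZE_CLASSES] = {256, 1024, 4096, 8192};

/* Return the index of the smallest size class that can hold a tuple of
 * 'len' bytes, or -1 if it exceeds the largest class. */
static int
tuplebuf_size_class(size_t len)
{
    int i;

    for (i = 0; i < NR_SIZE_CLASSES; i++)
        if (len <= size_classes[i])
            return i;
    return -1;
}
```

Each class would keep its own freelist of preallocated buffers, so the
allocator is never hit in the hot path and fragmentation stays bounded.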

> +* FIXME: better name
> + */
> +ApplyCacheChange*
> +ApplyCacheGetChange(ApplyCache*);
> 
> How about:
> 
> ApplyCacheReserveChangeStruct(..)
> ApplyCacheReserveChange(...)
> ApplyCacheAllocateChange(...)
> 
> as ideas?
> +/*
> + * Return an unused ApplyCacheChange struct
> + */
> +void
> +ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);
> 
> ApplyCacheReleaseChange(...) ?  I keep thinking of 'Return' as us
> returning the data somewhere not the memory.
Hm. Reserve/Release doesn't sound bad. Acquire/Release is possibly even better 
because reserve could be understood as a preparatory step?

> applycache.c:
> -------------------
> 
> I've taken a quick look through this file and I don't see any issues
> other than the many FIXME's and other issues you've identified already,
> which I don't expect you to address in this CF.
Thanks for the review so far!

Greetings,

Andres

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

