Re: Some questions about mammoth replication - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: Some questions about mammoth replication |
Date | |
Msg-id | 1192186064.16408.7.camel@hannu-laptop |
In response to | Re: Some questions about mammoth replication (Alexey Klyukin <alexk@commandprompt.com>) |
Responses | Re: Some questions about mammoth replication |
List | pgsql-hackers |
On Fri, 2007-10-12 at 12:39, Alexey Klyukin wrote:
> Hannu Krosing wrote:
>
> > > We have hooks in the executor calling our own collecting functions, so we
> > > don't need the trigger machinery to launch replication.
> >
> > But where do you store the collected info - in your own replication_log
> > table, or do you reuse the data in WAL, extracting it on the master before
> > replication to the slave (or on the slave after moving the WAL)?
>
> We don't use either a log table in the database or WAL. The data to
> replicate is stored in disk files, one per transaction.

Clever :)

How well does it scale? That is, at what transaction rate can your
replication keep up with the database?

> As Joshua said, the WAL is used to ensure that only those transactions
> that are recorded as committed in WAL are sent to slaves.

How do you force the correct commit order when applying the transactions?

> >
> > > > Do you make use of snapshot data to make sure which parts of the WAL log
> > > > are worth migrating to slaves, or do you just apply everything in WAL
> > > > in separate transactions and abort if you find out that the original
> > > > transaction aborted?
> > >
> > > We check if a data transaction is recorded in WAL before sending
> > > it to a slave. For an aborted transaction we just discard all data
> > > collected from that transaction.
> >
> > Do you duplicate PostgreSQL's MVCC code for that, or will this happen
> > automatically via using MVCC itself for the collected data?
>
> Every transaction command that changes data in a replicated relation is
> stored on disk. PostgreSQL MVCC code is used on a slave in a natural way
> when transaction commands are replayed there.

Do you replay several transaction files in the same transaction on the
slave? Can you replay several transaction files in parallel?

> > How do you handle really large inserts/updates/deletes, which change
> > say 10M rows in one transaction?
>
> We produce really large disk files ;). When a transaction commits, a
> special queue lock is acquired and the transaction is enqueued to a
> sending queue. Since the locking mode for that lock is exclusive, a
> commit of a very large transaction would delay commits of other
> transactions while the lock is held. We are working on minimizing the
> time of holding this lock in the new version of Replicator.

Why does it take longer to queue a large file? Do you copy the data from
one file to another?

> > > > Do you extract / generate full SQL DML queries from the data in WAL
> > > > logs, or do you apply the changes at some lower level?
> > >
> > > We replicate the binary data along with a command type. Only the data
> > > necessary to replay the command on a slave are replicated.
> >
> > Do you replay it as SQL insert/update/delete commands, or directly on
> > heap/indexes?
>
> We replay the commands directly using heap/index functions on a slave.

Does that mean that the table structures will be exactly the same on both
master and slave? That is, do you replicate a physical table image (maybe
not including transaction ids on the master)? Or do you just use
lower-level versions of INSERT/UPDATE/DELETE?

---------------------
Hannu
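For readers following the commit-time discussion above, here is a minimal sketch of the mechanism as described in the thread: changes are collected into one spill file per transaction, and at commit the file is appended to a sending queue under an exclusive lock, which is why a very large transaction can delay other commits. This is a hypothetical, simplified model, not Mammoth Replicator source; all names (collect_change, enqueue_on_commit, the fixed-size queue) are invented for illustration.

    /* Simplified model of the commit path discussed above (hypothetical,
     * not Mammoth Replicator code).  Each transaction writes its changes
     * to a private spill file; at commit the file name is appended to a
     * shared sending queue under an exclusive lock, so a huge enqueue
     * step delays other committers. */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_QUEUE 1024

    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static char sending_queue[MAX_QUEUE][64];
    static int  queue_len = 0;

    /* Called while the transaction runs: collected data goes to a
     * per-transaction file, one file per transaction. */
    static void
    collect_change(const char *txn_file, const char *tuple_data)
    {
        FILE *f = fopen(txn_file, "a");
        if (f == NULL)
            return;
        fprintf(f, "%s\n", tuple_data);
        fclose(f);
    }

    /* Called at commit: enqueue the spill file for the sender process.
     * The exclusive lock is the serialisation point the thread discusses. */
    static void
    enqueue_on_commit(const char *txn_file)
    {
        pthread_mutex_lock(&queue_lock);
        if (queue_len < MAX_QUEUE)
        {
            strncpy(sending_queue[queue_len], txn_file,
                    sizeof(sending_queue[0]) - 1);
            queue_len++;
        }
        pthread_mutex_unlock(&queue_lock);
    }

    int
    main(void)
    {
        collect_change("txn_1001.dat", "INSERT INTO t VALUES (1)");
        enqueue_on_commit("txn_1001.dat");
        printf("queued %d transaction file(s)\n", queue_len);
        return 0;
    }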
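The thread also describes checking WAL so that only transactions recorded as committed are sent to slaves, while files of aborted transactions are discarded. The sketch below models only that filtering idea under the assumption that per-transaction files are keyed by xid; the commit-status lookup is a hard-coded stand-in for the real WAL check, and none of these names come from the actual product.

    /* Hypothetical model (not Mammoth source) of the commit filter
     * described above: a per-transaction spill file is sent to the slave
     * only if its transaction is known to be committed; otherwise it is
     * discarded.  In the real system the status would come from WAL. */
    #include <stdbool.h>
    #include <stdio.h>

    struct txn_file
    {
        unsigned int xid;   /* transaction id the file belongs to */
        const char  *path;  /* per-transaction spill file */
    };

    /* Stand-in for "is this xid recorded as committed in WAL?" */
    static bool
    xid_committed(unsigned int xid)
    {
        static const unsigned int committed[] = {1001, 1003};
        for (size_t i = 0; i < sizeof(committed) / sizeof(committed[0]); i++)
            if (committed[i] == xid)
                return true;
        return false;
    }

    int
    main(void)
    {
        struct txn_file files[] = {
            {1001, "txn_1001.dat"},
            {1002, "txn_1002.dat"},   /* aborted: will be discarded */
            {1003, "txn_1003.dat"},
        };

        for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
        {
            if (xid_committed(files[i].xid))
                printf("send %s to slave\n", files[i].path);
            else
                printf("discard %s (transaction aborted)\n", files[i].path);
        }
        return 0;
    }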