Thread: Some questions about Mammoth replication
> > btw, can you publicly discuss how CommandPrompt's WAL-based
> > replication works ?
>
> It's my company, of course I am ;)... but not in this thread. If you
> are interested feel free to email me directly or start a new thread.

Good :) Here come my questions:

From looking at http://www.commandprompt.com/images/MR_components.jpg it
seems that you don't do replication just from WAL logs, but also collect
some extra info inside the PostgreSQL server. Is this so?

If it is, then in what way does it differ from simple trigger-based
change logging?

Do you make use of snapshot data to make sure what parts of the WAL log
are worth migrating to slaves, or do you just apply everything in WAL in
separate transactions and abort if you find out that the original
transaction aborted?

Are your slaves a) standby, b) read-only or c) read-write?

Do you extract / generate full SQL DML queries from the data in WAL
logs, or do you apply the changes at some lower level?

For what use cases do you think your WAL-based approach is better than
the Slony/Skytools trigger-based one?

--------------
Hannu
Hello,

Hannu Krosing wrote:
>
> Here come my questions:
>
> From looking at http://www.commandprompt.com/images/MR_components.jpg
> it seems that you don't do replication just from WAL logs, but also
> collect some extra info inside the PostgreSQL server. Is this so?
>
> If it is, then in what way does it differ from simple trigger-based
> change logging?

We have hooks in the executor calling our own collecting functions, so
we don't need the trigger machinery to launch replication.

> Do you make use of snapshot data to make sure what parts of the WAL
> log are worth migrating to slaves, or do you just apply everything in
> WAL in separate transactions and abort if you find out that the
> original transaction aborted?

We check that a data transaction is recorded in WAL before sending it to
a slave. For an aborted transaction we just discard all data collected
from that transaction.

> Are your slaves a) standby, b) read-only or c) read-write?

Replicated relations are read-only on slaves.

> Do you extract / generate full SQL DML queries from the data in WAL
> logs, or do you apply the changes at some lower level?

We replicate the binary data along with a command type. Only the data
necessary to replay the command on a slave is replicated.

> For what use cases do you think your WAL-based approach is better than
> the Slony/Skytools trigger-based one?

A pure trigger-based approach can only replicate data for the commands
which fire triggers. AFAIK Slony is unable to replicate the TRUNCATE
command (I don't know if Skytools can). Replicator doesn't have this
limitation.

Regards,
--
Alexey Klyukin
http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
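The collect-then-filter scheme described above can be sketched roughly
as follows. This is a hypothetical illustration, not Replicator's actual
code: the class and function names are invented. The idea is that each
backend collects changes via executor hooks (no user-level triggers),
and collected data is only forwarded once the transaction is known to
have committed; an aborted transaction's data is simply discarded.

```python
class ReplicationCollector:
    """Per-backend change collector, invoked from executor hooks
    (hypothetical sketch, not Replicator's real implementation)."""

    def __init__(self):
        # xid -> list of (command_type, binary_row_data)
        self.pending = {}

    def collect(self, xid, command_type, row_data):
        # Called on every INSERT/UPDATE/DELETE via an executor hook,
        # instead of relying on the trigger machinery.
        self.pending.setdefault(xid, []).append((command_type, row_data))

    def on_commit(self, xid, send_to_slave):
        # Only transactions recorded as committed are forwarded.
        for change in self.pending.pop(xid, []):
            send_to_slave(change)

    def on_abort(self, xid):
        # Aborted transactions leave no trace: drop the collected data.
        self.pending.pop(xid, None)


collector = ReplicationCollector()
collector.collect(100, "INSERT", b"\x01row")
collector.collect(101, "UPDATE", b"\x02row")
sent = []
collector.on_commit(100, sent.append)  # xid 100 committed: forwarded
collector.on_abort(101)                # xid 101 aborted: discarded
# sent now holds only the committed transaction's changes
```

The essential point this illustrates is that no MVCC bookkeeping is
needed on the master for aborted transactions: the collected data never
reaches a slave at all.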
Alexey Klyukin wrote:
>> For what use cases do you think your WAL-based approach is better
>> than the Slony/Skytools trigger-based one?
>
> A pure trigger-based approach can only replicate data for the commands
> which fire triggers. AFAIK Slony is unable to replicate the TRUNCATE
> command

It could be wrapped in a ddl_script, which obviously isn't transparent
to the application, but I guess a table that's truncated regularly won't
be a typical candidate for (async) replication either.

Regards,
Andreas
On 10/11/07, Alexey Klyukin <alexk@commandprompt.com> wrote:
> Hannu Krosing wrote:
> > For what use cases do you think your WAL-based approach is better
> > than the Slony/Skytools trigger-based one?
>
> A pure trigger-based approach can only replicate data for the commands
> which fire triggers. AFAIK Slony is unable to replicate the TRUNCATE
> command (I don't know if Skytools can). Replicator doesn't have this
> limitation.

No, Skytools is the same as Slony here.

Can you also replicate changes to system tables?

--
marko
Marko Kreen wrote:
> On 10/11/07, Alexey Klyukin <alexk@commandprompt.com> wrote:
> > A pure trigger-based approach can only replicate data for the
> > commands which fire triggers. AFAIK Slony is unable to replicate the
> > TRUNCATE command (I don't know if Skytools can). Replicator doesn't
> > have this limitation.
>
> No, Skytools is the same as Slony here.
>
> Can you also replicate changes to system tables?

No, we need a table to have a primary key to be able to replicate it.
On the other hand, replicating a system relation can be dangerous: what
if you replicate the contents of pg_class without the corresponding
relations on the slave? That's why we explicitly forbid enabling
replication for relations from the pg_catalog namespace.

Regards,
--
Alexey Klyukin
http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
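The eligibility rules stated above (a primary key is required, and
pg_catalog relations are refused outright) might look like this in
outline. The function name and the dict-based table representation are
invented purely for illustration:

```python
def can_replicate(table):
    """Hypothetical sketch of the eligibility check described above:
    refuse system relations, require a primary key."""
    if table["namespace"] == "pg_catalog":
        # Replicating e.g. pg_class without the corresponding
        # relations existing on the slave would be dangerous.
        raise ValueError("system relations cannot be replicated")
    if not table["primary_key"]:
        # Rows must be identifiable on the slave to replay changes.
        raise ValueError("table has no primary key")
    return True


ok = can_replicate({"namespace": "public", "primary_key": ["id"]})
```

A call like `can_replicate({"namespace": "pg_catalog", ...})` would
raise rather than silently enabling replication, matching the "forbid"
behavior described in the message.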
On Thu, 11 Oct 2007 19:10:18 +0300
Alexey Klyukin <alexk@commandprompt.com> wrote:

> Marko Kreen wrote:
> > Can you also replicate changes to system tables?
>
> No, we need a table to have a primary key to be able to replicate it.
> On the other hand, replicating a system relation can be dangerous:
> what if you replicate the contents of pg_class without the
> corresponding relations on the slave? That's why we explicitly forbid
> enabling replication for relations from the pg_catalog namespace.

We can, however, replicate large objects and specific system commands
like GRANT/REVOKE and CREATE/DROP USER, the latter being done via
triggers.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
On Thursday, 2007-10-11 at 18:25, Alexey Klyukin wrote:
> Hannu Krosing wrote:
> > If it is, then in what way does it differ from simple trigger-based
> > change logging?
>
> We have hooks in the executor calling our own collecting functions, so
> we don't need the trigger machinery to launch replication.

But where do you store the collected info: in your own replication_log
table, or do you reuse the data in WAL, extracting it on the master
before replication to the slave (or on the slave after moving the WAL)?

> > Do you make use of snapshot data to make sure what parts of the WAL
> > log are worth migrating to slaves, or do you just apply everything
> > in WAL in separate transactions and abort if you find out that the
> > original transaction aborted?
>
> We check that a data transaction is recorded in WAL before sending it
> to a slave. For an aborted transaction we just discard all data
> collected from that transaction.

Do you duplicate PostgreSQL's MVCC code for that, or does this happen
automatically via using MVCC itself for the collected data?

How do you handle really large inserts/updates/deletes, which change
say 10M rows in one transaction?

> > Are your slaves a) standby, b) read-only or c) read-write?
>
> Replicated relations are read-only on slaves.
>
> > Do you extract / generate full SQL DML queries from the data in WAL
> > logs, or do you apply the changes at some lower level?
>
> We replicate the binary data along with a command type. Only the data
> necessary to replay the command on a slave is replicated.

Do you replay it as SQL insert/update/delete commands, or directly on
the heap/indexes?

------------
Hannu
On Thu, 11 Oct 2007 21:58:45 +0300
Hannu Krosing <hannu@skype.net> wrote:

> > We have hooks in the executor calling our own collecting functions,
> > so we don't need the trigger machinery to launch replication.
>
> But where do you store the collected info: in your own replication_log
> table,

No, we have our own transaction log outside the database.

> or do you reuse the data in WAL, extracting it on the master before
> replication to the slave (or on the slave after moving the WAL)?

We currently make zero use of WAL except to verify that the replication
data can be written.

> > We check that a data transaction is recorded in WAL before sending
> > it to a slave. For an aborted transaction we just discard all data
> > collected from that transaction.
>
> Do you duplicate PostgreSQL's MVCC code for that, or does this happen
> automatically via using MVCC itself for the collected data?

Gonna have to wait for Alexey for the rest.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Hannu Krosing wrote:
> > We have hooks in the executor calling our own collecting functions,
> > so we don't need the trigger machinery to launch replication.
>
> But where do you store the collected info: in your own replication_log
> table, or do you reuse the data in WAL, extracting it on the master
> before replication to the slave (or on the slave after moving the
> WAL)?

We use neither a log table in the database nor WAL. The data to
replicate is stored in disk files, one per transaction. As Joshua said,
WAL is used to ensure that only those transactions that are recorded as
committed in WAL are sent to slaves.

> Do you duplicate PostgreSQL's MVCC code for that, or does this happen
> automatically via using MVCC itself for the collected data?

Every transaction command that changes data in a replicated relation is
stored on disk. PostgreSQL's MVCC code is used on a slave in the natural
way when the transaction commands are replayed there.

> How do you handle really large inserts/updates/deletes, which change
> say 10M rows in one transaction?

We produce really large disk files ;). When a transaction commits, a
special queue lock is acquired and the transaction is enqueued into a
sending queue. Since the locking mode for that lock is exclusive, a
commit of a very large transaction delays commits of other transactions
while the lock is held. We are working on minimizing the time this lock
is held in the new version of Replicator.

> > > Do you extract / generate full SQL DML queries from the data in
> > > WAL logs, or do you apply the changes at some lower level?
> >
> > We replicate the binary data along with a command type. Only the
> > data necessary to replay the command on a slave is replicated.
>
> Do you replay it as SQL insert/update/delete commands, or directly on
> the heap/indexes?

We replay the commands directly using heap/index functions on a slave.

Regards,
--
Alexey Klyukin
http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
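The spooling scheme described above can be sketched as follows. All
names here are invented for illustration; this is not Replicator's
code. Each transaction's changes go to its own disk file, and at commit
time an exclusive queue lock is taken while the transaction's data is
moved into the sending queue, which is exactly why a very large
transaction can delay other commits:

```python
import os
import tempfile
import threading

queue_lock = threading.Lock()   # the exclusive "queue lock"
send_queue = []                 # transactions ready to ship to slaves
spool_dir = tempfile.mkdtemp()  # stand-in for the replication spool


def spool_path(xid):
    return os.path.join(spool_dir, "txn-%d.data" % xid)


def collect(xid, record):
    # One disk file per transaction, appended to as commands execute.
    with open(spool_path(xid), "ab") as f:
        f.write(record)


def commit(xid):
    # Under the exclusive lock, the transaction file is copied into the
    # queue; the larger the file, the longer other commits must wait.
    with queue_lock:
        with open(spool_path(xid), "rb") as f:
            send_queue.append((xid, f.read()))
    os.unlink(spool_path(xid))


def abort(xid):
    # An aborted transaction's spool file is simply discarded.
    os.unlink(spool_path(xid))


collect(1, b"INSERT ...")
collect(2, b"UPDATE ...")
commit(1)   # xid 1 is queued for the slaves
abort(2)    # xid 2 leaves no trace
```

Holding the lock across the file copy is the bottleneck the message
mentions; the follow-up in this thread describes shrinking the locked
section to just a commit record.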
On Friday, 2007-10-12 at 12:39, Alexey Klyukin wrote:
> Hannu Krosing wrote:
> > But where do you store the collected info: in your own
> > replication_log table, or do you reuse the data in WAL, extracting
> > it on the master before replication to the slave (or on the slave
> > after moving the WAL)?
>
> We use neither a log table in the database nor WAL. The data to
> replicate is stored in disk files, one per transaction.

Clever :)

How well does it scale? That is, at what transaction rate can your
replication keep up with the database?

> As Joshua said, WAL is used to ensure that only those transactions
> that are recorded as committed in WAL are sent to slaves.

How do you force the correct commit order when applying the
transactions?

> > Do you duplicate PostgreSQL's MVCC code for that, or does this
> > happen automatically via using MVCC itself for the collected data?
>
> Every transaction command that changes data in a replicated relation
> is stored on disk. PostgreSQL's MVCC code is used on a slave in the
> natural way when the transaction commands are replayed there.

Do you replay several transaction files in the same transaction on the
slave? Can you replay several transaction files in parallel?

> > How do you handle really large inserts/updates/deletes, which change
> > say 10M rows in one transaction?
>
> We produce really large disk files ;). When a transaction commits, a
> special queue lock is acquired and the transaction is enqueued into a
> sending queue. Since the locking mode for that lock is exclusive, a
> commit of a very large transaction delays commits of other
> transactions while the lock is held. We are working on minimizing the
> time this lock is held in the new version of Replicator.

Why does it take longer to queue a large file? Do you copy data from
one file to another?

> > Do you replay it as SQL insert/update/delete commands, or directly
> > on the heap/indexes?
>
> We replay the commands directly using heap/index functions on a slave.

Does that mean that the table structures will be exactly the same on
both master and slave? That is, do you replicate a physical table image
(maybe not including the transaction ids from the master)? Or do you
just use lower-level versions of INSERT/UPDATE/DELETE?

---------------------
Hannu
Hannu Krosing wrote:
> > We use neither a log table in the database nor WAL. The data to
> > replicate is stored in disk files, one per transaction.
>
> Clever :)
>
> How well does it scale? That is, at what transaction rate can your
> replication keep up with the database?

This depends on the number of concurrent transactions (the data is
collected by every backend process), the maximum transaction size, etc.
I don't have numbers here, sorry.

> > As Joshua said, WAL is used to ensure that only those transactions
> > that are recorded as committed in WAL are sent to slaves.
>
> How do you force the correct commit order when applying the
> transactions?

The first transaction that is committed in PostgreSQL is the first
transaction placed into the queue, and the first that is restored by
the slave.

> > Every transaction command that changes data in a replicated relation
> > is stored on disk. PostgreSQL's MVCC code is used on a slave in the
> > natural way when the transaction commands are replayed there.
>
> Do you replay several transaction files in the same transaction on the
> slave? Can you replay several transaction files in parallel?

No. We have plans for concurrent restore of replicated data, but
currently a single slave process is responsible for restoring the data
received from the MCP.

> > > How do you handle really large inserts/updates/deletes, which
> > > change say 10M rows in one transaction?
> >
> > We produce really large disk files ;). When a transaction commits, a
> > special queue lock is acquired and the transaction is enqueued into
> > a sending queue. Since the locking mode for that lock is exclusive,
> > a commit of a very large transaction delays commits of other
> > transactions while the lock is held. We are working on minimizing
> > the time this lock is held in the new version of Replicator.
>
> Why does it take longer to queue a large file? Do you copy data from
> one file to another?

Yes, currently the data is copied from the transaction files into the
queue (this doesn't apply to dump transactions). However, we have
recently changed this: the new code will acquire the queue lock only to
record the transaction as committed in the replication log, without
moving the data.

> > Do you replay it as SQL insert/update/delete commands, or directly
> > on the heap/indexes?
>
> We replay the commands directly using heap/index functions on a slave.
>
> Does that mean that the table structures will be exactly the same on
> both master and slave?

Yes, the table structure on the slaves should match the table structure
on the master.

> That is, do you replicate a physical table image (maybe not including
> the transaction ids from the master)?

Yes, we call this a 'full dump', and it is performed automatically for
every replicated table. We replicate only the data, however, not the
DDL commands to create/alter a table or sequence.

> Or do you just use lower-level versions of INSERT/UPDATE/DELETE?

Regards,
--
Alexey Klyukin
http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
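The improved queueing described above can be sketched roughly as
follows, with invented names only. At commit time the exclusive lock is
held just long enough to append a small commit record to the
replication log; the bulky per-transaction data is not moved. The slave
then restores transactions strictly in the order of those commit
records, which is how first-committed-first-queued ordering is
preserved:

```python
import threading

queue_lock = threading.Lock()
replication_log = []  # ordered commit records: xids in commit order
txn_files = {}        # xid -> collected data (stand-in for disk files)


def commit(xid):
    # Only the commit record is written under the lock; the transaction
    # data stays in its file, so even a huge transaction holds the lock
    # for a tiny, constant amount of time.
    with queue_lock:
        replication_log.append(xid)


def slave_restore():
    # A single slave process replays transactions in commit order,
    # reading each transaction's data only when its turn comes.
    for xid in replication_log:
        yield (xid, txn_files[xid])


txn_files[7] = b"very large transaction data"
txn_files[8] = b"small transaction"
commit(7)
commit(8)
restored = list(slave_restore())
# restore order matches commit order: xid 7, then xid 8
```

The contrast with the older scheme is only in what happens under the
lock: a full file copy versus a constant-size log append.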