Thread: draft RFC: concept for partial, wal-based replication
draft RFC: concept for partial, wal-based replication
From: Hans-Juergen Schoenig -- PostgreSQL
hello ...

as my day has worked out quite nicely up to now, i thought to f... it up and post a new concept which has been requested by a customer. the goal is to collect some feedback, ideas and so on (not to be mixed up with "flames"). we have funding for this and we are trying to sort out how to do it the best way. comments are welcome ... note, this is a first draft i want to refine based on some comments. here we go ...

Partial WAL Replication for PostgreSQL:
---------------------------------------

As of now the PostgreSQL community has provided patches and functionality which allow full WAL-based replication as well as hot standby. To extend this functionality and to make PostgreSQL even more suitable for "enterprise" computing than it is today, we have the commitment of a sponsor to fund partial replication for PostgreSQL 8.5 / 8.6.

This is the first draft of a proposal to make partial WAL-based replication work and to provide an additional set of fancy features to the community, which has been waiting for real in-core replication for a decade or more.

Why partial replication?
------------------------

In some cases people have master servers which contain enormous amounts of data (XX TB or so). If more than just one replica of this data is needed, it might happen that different slaves are used for different purposes. This implies that not all data will be used by all machines.

An example: Consider a server at a phone company collecting phone calls, billing data, and maybe network routing data. The data is used by different departments, and one machine is not enough to serve all three departments. With the new functionality proposed here we could make 3 replicas, each holding just a group of tables for specific tasks, thus allowing people to buy cheaper hardware for slaves and use more machines instead.

Current status:
---------------

Hot standby and streaming replication have been a huge leap forward for the community, and what is proposed here will be an extension to those patches and functionalities. This concept is NOT aimed to replace anything - it is mainly an addon.

Nodes and replication filters:
------------------------------

As of 8.4, standby systems are set up by creating an archive_command along with a base backup. Although it is easy to do, some users still reported difficulties due to a total misunderstanding of PITR.

The idea is to add functionality to add slaves like this:

    CREATE REPLICA node_name
        CONNECT FROM SLAVE 'connect_string'
        TRANSFER COMMAND 'command'
        [ USING replication_filter ];

'command' would be any shell script copying data from the local master to the new database node called node_name. Replication filters can be used to make X replicas contain the same tables. Filter sets can be created like this:

    CREATE REPLICATION FILTER filter_name
        [ EMPTY | FULL ] [ INCLUDE | EXCLUDE CHANGES ];

Replication filters can be modified ...

    ALTER REPLICATION FILTER filter_name RENAME TO new_filtername;
    ALTER REPLICATION FILTER filter_name
        { ADD | REMOVE } { TABLE | INDEX | SEQUENCE } object;

Filter sets can be dropped like this ...

    DROP REPLICATION FILTER filter_name;
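To make the proposed syntax a bit more tangible, here is a hypothetical session for the phone-company example above, using only the commands sketched in this draft. The filter name, node name, table names, connect string, transfer command and its %f placeholder are all invented for illustration - the draft does not define any of them:

    -- filter holding only the objects the billing department needs (names invented)
    CREATE REPLICATION FILTER billing_filter EMPTY INCLUDE CHANGES;
    ALTER REPLICATION FILTER billing_filter ADD TABLE phone_calls;
    ALTER REPLICATION FILTER billing_filter ADD TABLE billing_data;
    ALTER REPLICATION FILTER billing_filter ADD INDEX phone_calls_pkey;

    -- create the partial replica; the transfer command is any shell command
    -- (the %f placeholder is made up here, the draft does not specify one)
    CREATE REPLICA billing_node
        CONNECT FROM SLAVE 'host=master.example.com port=5432 user=repl'
        TRANSFER COMMAND 'rsync -a %f billing-slave:%f'
        USING billing_filter;

    -- later, tables can be added to or removed from the running replica
    ALTER REPLICATION FILTER billing_filter ADD TABLE network_routing;
    ALTER REPLICATION FILTER billing_filter REMOVE TABLE billing_data;

    -- and the whole thing can be torn down again
    DROP REPLICA billing_node;
    DROP REPLICATION FILTER billing_filter;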
Internally, CREATE REPLICA would initiate a base backup to the new slave server, just like we would do manually otherwise. The server would automatically use the user-defined 'command' to copy one file after the other to the slave box. The idea is basically stolen from archive_command and friends. At this stage we either copy the entire instance, as we would with a normal base backup, or just what is needed (defined by the replication filter). Users would automatically copy only the data to a slave which is really needed there and which matches their filter config. Once the copy is done, we can register the new node inside a system table and commit the transaction. Also, we can automatically create a useful recovery.conf setup - we know how to connect from the slave to the master (we can use 'CONNECT FROM SLAVE [ USING ]' to write a proper recovery.conf file).

Tables can easily be added to or removed from a replication filter with ALTER REPLICATION FILTER.

Replicas can be removed easily:

    DROP REPLICA node_name;

Why SQL to add a node? We are convinced that this is the most simplistic and most intuitive way of doing things. We believe it gives users a real feeling of simplicity. The current way of doing base backups should stay in place as it is - it has proven to be nice for countless tasks. However, it is not suitable for managing 10 or more replicas easily, especially not when they are not full-blown copies of the master.

Technical ideas:
----------------

System tables:

We suggest always replicating the entire system catalog. It would be a total disaster to try some other implementation. The same applies for other tables - we always replicate entire tables; no WHERE clauses are allowed when it comes to replicating any table.

How can a query on the slave figure out whether a table is around? The slave just needs to know "who it is". Then it can easily look up, in the replication filter it is using, whether a table is actually physically in place or not. If a table is not in place, we can easily error out.

Remove a table from the slave:

This is not too hard; the master receives the command to remove a table from the slave. We will send a request to remove all storage files related to the table and adjust the replication filter to make sure that the slave will not replay content of this table anymore.

Add a table to a slave:

This is slightly more tricky. We start collecting WAL for the table, stop shipping WAL, use the TRANSFER COMMAND to copy the files related to the added table, and resume recovery / sending once the storage file is on the slave.

Additional stuff:

Of course there are many more consistency considerations here. We cannot replicate an index if the table is not present, etc.

many thanks,

hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
Hi,

we tried to discuss on a lower level what should be needed for a partial replication based on streaming replication:

a. transferring base data after a relation/index/... got added to a slave (and initial setup)
b. communicating to the slaves which relations they currently should have available
c. splitting wal into different replication sets
d. some configuration frontend, possibly directly via sql or via functions

I don't think it's reasonable trying to discuss and implement this all in one huge patch, so I propose implementing at least 1) as a separate patch. a) is very useful outside the context of this specific feature and kind of a prerequisite, so i suggest tackling that first.

Plan:
    M:  checkpoint, full page writes, access share lock on the relation
    S:  stop receiving wal
    SM: using $method, transfer required base data for every required segment/fork
    M:  normal writes
    S:  restart wal replay

Do you see any fundamental problem with this?

Now there unfortunately are two different schools of thought on how to implement transferring base data:

1. archive_command-like transfer command taking a source path/host and target path/host
   - very flexible (think e.g. not transferring the data for multiple slaves over the whole country)
   - harder to set up
   - more in style of classical wal archiving
2. add the capability to the WAL Streaming patch's libpq-based protocol
   - no additional configuration needed
   - inflexible
   - makes usage from non-streaming replication impossible

I favor 1. but only lightly so.

Detail Questions:

- How to deal with multiple transfer requests at the same time? There would be a need for multiple full backup requests for individual tables by several clients at once. Currently pg_start_backup() isn't allowed from two clients in parallel, the second one gets an error. We thought that pg_start_backup() and pg_stop_backup() can turn into simple reference counts. IIRC, WALs are still generated and _shipped to slaves_ during a full backup, they are simply not yet applied to base table files. So, in this case a pg_stop_backup() issued from a slave decreases the refcount of the base backups and the slave can simply resume applying its newly received WALs to base files.

- Keep track of the current number of transfers: we would also need a way to query the refcount of the base backup, so if a slave dies, the master can be recovered manually and can also resume applying leftover WALs.

b) The slave needs to know whether a relation got added to itself in order to request a base backup of the relevant files. I would suggest adding a new wal record type for this:
- DROP_RELATION_SLAVE(node_id, relation)
- ADD_RELATION_SLAVE(node_id, relation)

We thought about two ways of administering the replication set:
- slaves with full replication, optionally and explicitly excluding relations
- slaves with minimal replication, explicitly included relations

The above WAL record types will be needed in both cases, but we also need two new catalog tables:
- the slave nodes, indicating the type of the slave above
- explicit table indications (treated depending on the type of the slave)

Any out-of-band communication has severe problems with crashes/unavailability of the slave, or with not allowing classic, non-streaming wal replication.

Questions:
- How to deal with the fact that the slave may be unavailable while adding something to its replication set?
  - Possibly forbid all DDL on the table until the slave got the update

c) What to filter:

Every slave gets a node_id which is assigned in a system catalog on the master:
    pg_node(nodeoid)                                              (per cluster)
and a catalog contains all replicated relations:
    pg_replication_set(nodeoid, classoid, acknowledged_on_slave)  (per database)

How to filter: Heap2, Heap, Btree, Gin, Gist, Storage and Sequence records need to be filtered by database/relation. I am by far not yet familiar enough with the relevant code to see if it is feasible and worthwhile at all to filter clog, transaction and multixact per database.

Where to filter: I propose doing so in the walsender. While this would prohibit using classical wal-based standby, I do not see a big problem in that. If done via wal streaming it would be a simple addition of a node_id in PQstartXLogStreaming. This id obviously should not be resettable to something else...

Questions:
- How to deal with access to the different database-wide catalogs?
  - Storing that data cluster-wide seems really ugly.
  - read the code...

d) I do not have any strong or even moderate opinions about this. I think it's sensible to get something prototypish and function-based done before deciding about the real interface and getting into syntax wars.

Steps:

1:
- Transfer of relations during runtime
  - needed to use wal splitting
  - internally:
    - Possibility 1:
      - transfer_command = ... %filename%
      - should not error out if data changes beneath it
      - called for every file (i.e. every fork and segment)
      - how to deal with new segments?
        - lock out writes after transferring the last segment and recheck?
    - Possibility 2:
      - own networking protocol
      - easier to set up
      - less flexible
      - more work
  - take heavy lock from the beginning
    - sounds annoying but could easily be dealt with later
  - How to transfer:
    - stop wal replay on the requesting side
    - checkpoint on master
    - start streaming wal for relation
    - pg_start_backup() equiv
    - transfer relation data
    - pg_stop_backup() equiv.
    - restart wal replay on relevant nodes

2:
- wal splitting for Heap2, Heap, Btree, Hash, Gin, Gist
  - all others either have to be replicated anyway or are semantically a bit more complex (sequences)
  - lower level does not care about dependencies?
- Additional system catalogs:
    pg_node(nodeoid)
    pg_replication_set(nodeoid, classoid)
- Using a hook in walsender - correct place, right?
- Extending the streaming protocol to transfer the node id from the slave
  - 'slave_node_id' in PQstartXLogStreaming
- A slave errors out if a relation gets accessed which is not replicated to it
- There is some need for more communication for some of the actions:
  - remove table/index/... from replication set
  - add table/index/... to replication set
  - both require action from the slave: stop replaying wal, request base data of *
  - both require action from the master: lock relation shared, checkpoint, stop wal reusage, full_page_writes
    - could be started from the slave
  - How should master/slave communicate?
    - extra wal type
      - crash safe
    - PQgetXLogData returns -3, user requests actions from master
      - how to handle crashes? possibly an 'acknowledged' flag in pg_replication_set?
    - both are not exactly nice and race free
  - Add to replication set, drop table before slave has caught up:
    - normal locking does not work (shutdown)
    - using something around prepared transactions seems ugly
    - synchronous operation? alter/drop table is only possible after acknowledgement from the client
    - wal type seems nicer - possibly allowing to do it via "classical" wal replay

We would not build any bells and whistles around this for now.
They would work as function calls. Later one could build a SQL-ish interface.
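A purely illustrative sketch of what such a function-based frontend over the proposed pg_node / pg_replication_set catalogs might look like; every function name and argument below is invented for illustration, nothing like this exists:

    -- register a slave node and assign tables to its replication set
    -- (hypothetical functions, not part of any patch)
    SELECT pg_replication_add_node('billing_node');
    SELECT pg_replication_set_add('billing_node', 'phone_calls'::regclass);
    SELECT pg_replication_set_add('billing_node', 'billing_data'::regclass);

    -- later: remove a table again, or drop the node altogether
    SELECT pg_replication_set_remove('billing_node', 'billing_data'::regclass);
    SELECT pg_replication_drop_node('billing_node');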
--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> wrote:
> we tried to discuss on a lower level what should be needed
> for a partial replication based on streaming replication.

We need to discuss a "partial recovery" before the partial replication. There are some related items in our ToDo list and previous discussions:

http://wiki.postgresql.org/wiki/Todo
- Allow WAL logging to be turned off for a table, but the table might be dropped or truncated during crash recovery
- Allow WAL logging to be turned off for a table, but the table would avoid being truncated/dropped

rmgr_hook
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01361.php

Could you try them first as a preparation for the partial replication?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Mon, Nov 30, 2009 at 4:56 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
> I don't think it's reasonable trying to discuss and implement this all
> in one huge patch so I propose implementing at least 1) as a separate
> patch.

I agree with this development plan.

> Now there unfortunately are two different schools of thought on how to implement transferring base data.
> 1. archive_command-like transfer command taking a source path/host and target path/host
>    - very flexible (think e.g. not transferring the data for multiple slaves over the whole country)
>    - harder to set up
>    - more in style of classical wal archiving
> 2. add the capability to the WAL Streaming patch's libpq-based protocol
>    - no additional configuration needed
>    - inflexible
>    - makes usage from non-streaming replication impossible
>
> I favor 1. but only lightly so.

I favor 2 ;) Because I think that it's too hard for users to set up a transfer command. One of streaming replication's merits is that users no longer need to specify a transfer command for log-shipping. So users can configure and use replication without complex settings. But #1 would spoil this merit.

> Detail Questions:
> - How to deal with multiple transfer requests at the same time?
>   There would be a need for multiple full backup requests for
>   individual tables by several clients at once.
>   Currently pg_start_backup() isn't allowed from
>   two clients in parallel, the second one gets an error.
>   We thought that pg_start_backup() and pg_stop_backup()
>   can turn into simple reference counts. IIRC, WALs
>   are still generated and _shipped to slaves_ during
>   a full backup, they are simply not yet applied to
>   base table files. So, in this case a pg_stop_backup()
>   issued from a slave decreases the refcount of the base backups
>   and the slave can simply resume applying its newly
>   received WALs to base files.

I'm not sure how. But at first, a multiple online-backup feature, rather than backup-shipping itself, might have to be addressed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
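For illustration, reference-counted backup mode as discussed above might behave roughly like this; the refcount semantics in the comments describe the proposal, not current behaviour (today the second pg_start_backup() simply raises an error):

    SELECT pg_start_backup('base for slave A');  -- proposed: backup refcount 0 -> 1
    SELECT pg_start_backup('base for slave B');  -- proposed: refcount 1 -> 2 (currently: ERROR, backup already in progress)
    -- ... both slaves copy their base files while WAL keeps being generated and shipped ...
    SELECT pg_stop_backup();                     -- proposed: refcount 2 -> 1, slave B resumes applying WAL
    SELECT pg_stop_backup();                     -- proposed: refcount 1 -> 0, backup mode ends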
Boszormenyi Zoltan wrote:
> c. splitting wal into different replication sets

Just a side note: in addition to its use for partial replication, this might have potential for performance-prioritizing databases or tablespaces.

Being able to separate WAL logging so that different DBs, tablespaces, etc went to different sets of WAL logs would allow a DBA to give some databases or tablespaces dedicated WAL logging space on faster storage. If partial recovery is implemented, it might also permit less important databases to be logged to fast-but-unsafe storage such as a non-BBU disk controller with write cache enabled, without putting more important databases in the same cluster in danger.

More importantly, if the WAL writing was done in different wal writer backends, the admin could also use nice and ionice to encourage the OS to favour WAL logging for some DBs over others.

Currently all these things require splitting the install into multiple clusters, incurring config management and backup overhead and, most importantly, partitioning shared memory. OTOH, even with split WAL logging, you still have the shared bgwriter to contend with, and the effects of an unimportant query pushing data related to more performance-critical DBs out of shm or OS cache. So perhaps splitting the cluster is actually the best answer, and a complete implementation of DB prioritization would land up looking a lot like multiple Pg clusters multiplexed on one port anyway...

In any case, I thought it worth mentioning as something that may be worth keeping in mind - or considering and disregarding - while looking at the WAL changes involved in partial replication.

--
Craig Ringer
On Monday 30 November 2009 03:57:11 Itagaki Takahiro wrote:
> Boszormenyi Zoltan <zb@cybertec.at> wrote:
> > we tried to discuss on a lower level what should be needed
> > for a partial replication based on streaming replication.
> We need to discuss a "partial recovery" before the partial replication.

If you do the filtering on the sending side you don't actually need partial recovery in the sense that you filter in the rmgr or similar.

Or do I miss something?

Andres
Andres Freund <andres@anarazel.de> wrote:
> > We need to discuss a "partial recovery" before the partial replication.
> If you do the filtering on the sending side you don't actually need partial
> recovery in the sense that you filter in the rmgr or similar.
>
> Or do I miss something?

Sorry, I didn't explain well. I just suggested the order of development. I think partial recovery is easier than partial replication because we don't need to think about networking or multiple clients in recovery.

Also, this feature could be developed on top of Streaming Replication, but that is still under development -- the code is not fixed. So, I'd suggest starting the development from the parts that are independent of Streaming Replication. I believe the partial replication will require modules developed for the partial recovery in the future.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Andres Freund wrote:
> On Monday 30 November 2009 03:57:11 Itagaki Takahiro wrote:
>> Boszormenyi Zoltan <zb@cybertec.at> wrote:
>>> we tried to discuss on a lower level what should be needed
>>> for a partial replication based on streaming replication.
>> We need to discuss a "partial recovery" before the partial replication.
> If you do the filtering on the sending side you don't actually need partial
> recovery in the sense that you filter in the rmgr or similar.
>
> Or do I miss something?

The question is whether filtering on the sending side is actually the "right thing" to do. It increases the overhead and the complexity on the master, especially if you think about different (partial) replication agreements for different slaves, and it might also be hard to integrate with the planned sync/async modes. On the other hand, if you filter on the master you might be able to avoid a lot of network traffic due to filtered wal records.

I think for a first step it might make more sense to look into doing the filtering on the receiving side and look into actual integration with SR at a later stage.

Stefan
On Monday 30 November 2009 10:32:50 Stefan Kaltenbrunner wrote:
> Andres Freund wrote:
> > On Monday 30 November 2009 03:57:11 Itagaki Takahiro wrote:
> >> Boszormenyi Zoltan <zb@cybertec.at> wrote:
> >>> we tried to discuss on a lower level what should be needed
> >>> for a partial replication based on streaming replication.
> >>
> >> We need to discuss a "partial recovery" before the partial replication.
> >
> > If you do the filtering on the sending side you don't actually need
> > partial recovery in the sense that you filter in the rmgr or similar.
> >
> > Or do I miss something?
>
> the question is whether filtering on the sending side is actually the
> "right thing" to do.
> It increases the overhead and the complexity on the master, especially
> if you think about different (partial) replication agreements for
> different slaves and it might also be hard to integrate with the planned
> sync/async modes.
> On the other hand if you filter on the master you might be able to avoid
> a lot of network traffic due to filtered wal records.
> I think for a first step it might make more sense to look into doing the
> filtering on the receiving side and look into actual integration with SR
> at a later stage.

I think filtering on the receiving side is harder by many degrees because you don't have an up-to-date copy of the catalog. I can't think of a design that does not impose severe constraints on the catalog and especially the replication settings to implement on the receiving side.

Andres
On Nov 30, 2009, at 1:55 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Boszormenyi Zoltan wrote:
>
>> c. splitting wal into different replication sets
>
> Just a side note: in addition to its use for partial replication, this
> might have potential for performance-prioritizing databases or
> tablespaces.
>
> Being able to separate WAL logging so that different DBs, tablespaces,
> etc went to different sets of WAL logs would allow a DBA to give some
> databases or tablespaces dedicated WAL logging space on faster storage.
> If partial recovery is implemented, it might also permit less important
> databases to be logged to fast-but-unsafe storage such as a non-BBU disk
> controller with write cache enabled, without putting more important
> databases in the same cluster in danger.

The danger here is that if we make crash recovery more complex, we'll introduce subtle bugs that will only be discovered after someone's data is toast.

...Robert
Craig Ringer <craig@postnewspapers.com.au> writes:
> Just a side note: in addition to its use for partial replication, this
> might have potential for performance-prioritizing databases or tablespaces.

> Being able to separate WAL logging so that different DBs, tablespaces,
> etc went to different sets of WAL logs would allow a DBA to give some
> databases or tablespaces dedicated WAL logging space on faster storage.

I don't think this can possibly work without introducing data corruption issues. What happens when a transaction touches tables in different tablespaces? You can't apply the changes out-of-order.

regards, tom lane
On Nov 30, 2009, at 10:32 AM, Stefan Kaltenbrunner wrote:
> Andres Freund wrote:
>> On Monday 30 November 2009 03:57:11 Itagaki Takahiro wrote:
>>> Boszormenyi Zoltan <zb@cybertec.at> wrote:
>>>> we tried to discuss on a lower level what should be needed
>>>> for a partial replication based on streaming replication.
>>> We need to discuss a "partial recovery" before the partial replication.
>> If you do the filtering on the sending side you don't actually need
>> partial recovery in the sense that you filter in the rmgr or similar.
>> Or do I miss something?
>
> the question is whether filtering on the sending side is actually the
> "right thing" to do.
> It increases the overhead and the complexity on the master,
> especially if you think about different (partial) replication
> agreements for different slaves and it might also be hard to
> integrate with the planned sync/async modes.
> On the other hand if you filter on the master you might be able to
> avoid a lot of network traffic due to filtered wal records.
> I think for a first step it might make more sense to look into doing
> the filtering on the receiving side and look into actual integration
> with SR at a later stage.
>
> Stefan

hello ...

one problem with not filtering on the master is that you will end up with a lot of complexity if you start adding new tables to a replica, because you just cannot add tables as easily as when you are doing stuff on the slave. the procedure seems way more complex. in addition to that, you are sending WAL which has to be discarded anyway. we thought about filtering "outside the master" a lot, but to me it did not sound like a good plan.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
> Just a side note: in addition to its use for partial replication, this
> might have potential for performance-prioritizing databases or
> tablespaces.

hello ...

this is an absolute non-starter. the WAL is designed to be "hyper ordered" and hyper critical. once you fuck up the order you will end up with a total disaster. WAL has to be applied in perfect order, without skipping depending objects and so on. any concept which tries to get around this fundamental law is broken.

hans

--
Cybertec Schönig & Schönig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
On Monday 30 November 2009 17:46:45 Hans-Jürgen Schönig wrote:
> On Nov 30, 2009, at 10:32 AM, Stefan Kaltenbrunner wrote:
> > the question is whether filtering on the sending side is actually the
> > "right thing" to do.
> > It increases the overhead and the complexity on the master,
> > especially if you think about different (partial) replication
> > agreements for different slaves and it might also be hard to
> > integrate with the planned sync/async modes.
> > On the other hand if you filter on the master you might be able to
> > avoid a lot of network traffic due to filtered wal records.
> > I think for a first step it might make more sense to look into doing
> > the filtering on the receiving side and look into actual integration
> > with SR at a later stage.
> one problem with not filtering on the master is that you will end up
> with a lot of complexity if you start adding new tables to a replica,
> because you just cannot add tables as easily as when you are doing stuff
> on the slave. the procedure seems way more complex.
> in addition to that you are sending WAL which has to be discarded
> anyway.
> we thought about filtering "outside the master" a lot but to me it did
> not sound like a good plan.

One possibility for the far future would be to allow filtering on a slave as well:

    master ---- full replication ---> primary slave --- split ---> slaves

possibly doing only catalog recovery on the primary slave. In my opinion that's heaps more complex and not better in all situations. So I would probably write it down as a nice idea but not more.

Andres
On 30/11/2009 11:07 PM, Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
>> Just a side note: in addition to its use for partial replication, this
>> might have potential for performance-prioritizing databases or tablespaces.
>
>> Being able to separate WAL logging so that different DBs, tablespaces,
>> etc went to different sets of WAL logs would allow a DBA to give some
>> databases or tablespaces dedicated WAL logging space on faster storage.
>
> I don't think this can possibly work without introducing data corruption
> issues. What happens when a transaction touches tables in different
> tablespaces? You can't apply the changes out-of-order.

Argh, good point, and one that should've been blindingly obvious.

At a database level something like that may still be handy, though I haven't the foggiest how one would handle the shared system catalogs.

--
Craig Ringer