Thread: Synch Replication - Synch rep 0114
Hi,
I have recently been testing Synch Replication (Synch rep 0114, Jan 14, 2009) on PostgreSQL version 8.4 (postgresql-8.4devel_20081229.tar.bz2).
I followed the steps in the Readme and also used the test script provided in the patch for the setup.
As per the wiki, I am able to bring up the walsender and walreceiver processes on a single server, and also when the primary and secondary are set up on different nodes (after making the necessary changes to the test script).
I can then see that the walsender and walreceiver processes are running.
Then I try to insert some records into the table created within the script, as below:
./psql
psql (8.4devel)
Type "help" for help.
postgres=# insert into temp values(5,'e');
I get the following output :
Standby 6820 FATAL: unexpected EOF on replication connection: lost synchronization with server: got message type "c", length -805175295
Primary 6821 LOG: unexpected EOF on replication connection
Primary 6821 LOG: replication done at: write 0/1000000 (file 000000010000000000000000), flush 0/1000000 (file 000000010000000000000000)
Standby 6820 LOG: replication done at: write 0/1000000 (file 000000010000000000000000), flush 0/1000000 (file 000000010000000000000000)
Standby 6812 LOG: could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
Standby 6812 LOG: redo done at 0/4A983C
Standby 6812 PANIC: could not open file "pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file or directory
Standby 6809 LOG: startup process (PID 6812) was terminated by signal 6: Aborted
Standby 6809 LOG: aborting startup due to startup process failure
INSERT 0 1
After this, I see that both walsender and walreceiver are down, while the writer process is still running.
Is this because there is no provision for replication between primary and secondary?
Or is it because write transactions are not supported?
When the primary and standby run on two different nodes, I am able to bring up the walsender and walreceiver processes.
But the records inserted on the primary are not reflected on the standby node, even for read transactions.
In that case, I would like to know exactly which features work with this patch.
The Readme section of the Synch Replication wiki only mentions checking whether the walsender and walreceiver processes are running.
What about replication and read-write transactions?
Also, with the latest patch, Synch rep 0128 (Jan 28, 2009), I am getting compilation errors.
Please let me know the current status of Synch Replication and which features are working properly.
Regards,
Smita Patil
Attachment
Hi,

On Fri, Jan 30, 2009 at 8:05 PM, Patil, Smita (NSN - IN/Bangalore) <smita.patil@nsn.com> wrote:
> Hi,
> I have been testing in recent, the Synch Replication(Synch rep 0114 (Jan 14,
> 2009) ) on PostgreSQL version 8.4 (
> postgresql-8.4devel_20081229.tar.bz2)

Thanks for your testing and report!

I'm afraid that the base HEAD version (postgresql-8.4devel_20081229.tar.bz2) is old, which might have caused the following error. So, please try to apply synch-rep v0128 patch to the latest HEAD, and test it. If you can use cvs, the following document might be helpful for you to get the latest HEAD.

http://www.postgresql.org/docs/8.3/static/anoncvs.html

> As per wiki, I am able to bring up the walsender and the walreceiver process
> in a single server as well when primary and seconday are setup on different
> nodes(making necessary changes to the test script)

What kind of change was required?

> Then I am able to see the walsender and walreceiver process are in progress.

Good!

> Then I try to insert some records into the table created (within the script)
> as below:
> ./psql
> psql (8.4devel)
> Type "help" for help.
>
> postgres=# insert into temp values(5,'e');

Please let me know the DDL of creating "temp" table. I'll test it also on my machine.

> After this, I see both walsender and walreceiver are down and writer process
> is still running.
> Is this because, there is no provision of replication between primary and
> secondary?

Yes, it's because unexpected error terminated replication (ie. walsender and walreceiver). But, such termination of replication doesn't affect the primary's normal processing, so walwriter was still running on the primary.

> Or is it because write transactions are not supported?

Write transactions are also supported like original postgres.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
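For readers following the log excerpts in this thread, the relationship between an LSN such as 0/4A983C and a segment file name such as 000000010000000000000000 can be sketched as below. This is an illustration in Python, not code from the patch. It assumes the default 16 MB WAL segment size and the three-field timeline/log/segment file naming that PostgreSQL used at the time; the function name `wal_file_name` and the `as_end` flag are made up for this example.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def wal_file_name(timeline: int, lsn: str, as_end: bool = False) -> str:
    """Return the 24-hex-digit segment file name that holds the given LSN.

    lsn is in the "logid/offset" hex form printed in the server log.
    With as_end=True the LSN is treated as an end position, so a value
    exactly on a segment boundary maps to the segment that ends there.
    """
    log_hex, offset_hex = lsn.split("/")
    log_id = int(log_hex, 16)
    offset = int(offset_hex, 16)
    if as_end:
        if offset == 0:
            # Crossing into the previous log id is left out of this sketch.
            raise ValueError("end position at offset 0 is not handled here")
        offset -= 1
    segment = offset // XLOG_SEG_SIZE
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


print(wal_file_name(1, "0/4A983C"))                # 000000010000000000000000
print(wal_file_name(1, "0/1000000", as_end=True))  # 000000010000000000000000
print(wal_file_name(1, "0/1000000"))               # 000000010000000000000001
```

With `as_end=True`, an LSN that falls exactly on a segment boundary is attributed to the segment that ends there, which appears to be why the log pairs "write 0/1000000" with file 000000010000000000000000, while the startup process then looks for segment 1.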
Hi,

I tried using the synchronous replication and I am unable to replicate the queries (e.g. create table temp1(int int);).

I have downloaded the latest postgres dev snapshot and the latest sync replication patch from the wiki.

I have attached the log file which has the trace statements displayed on the console. I changed 'test_synch_rep.sh' so that the standby database prints detailed trace messages.

I am actually getting an error related to the history file.

Could you please help with this?

regards,
Niranjan
Attachment
Hi,

Thanks for your testing and report!

On Tue, Feb 3, 2009 at 8:18 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Hi,
>
> I tried using the Synchronous replication and I'am unable to replicate
> the queries (Ex. create table temp1(int int);).
>
> I have downloaded the latest postgres dev snapshot and the latest sync
> replication patch from wiki.
>
> I have attached the log file which has the trace statements displayed on
> the console. I actually changed the 'test_synch_rep.sh', so that the
> standby database will put detailed trace messages.
>
> I'am actaully getting error related to history file.
>
> Could you please help regarding this.

The problem which you pointed out was not reproduced on my machine. I suspect that the base HEAD version might be different between us. I updated my pgsq

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

Ooops, my mail client has sent the previous message, on the way. Sorry.

On Wed, Feb 4, 2009 at 10:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> Thanks for your testing and report!
>
> On Tue, Feb 3, 2009 at 8:18 PM, K, Niranjan (NSN - IN/Bangalore)
> <niranjan.k@nsn.com> wrote:
>> Hi,
>>
>> I tried using the Synchronous replication and I'am unable to replicate
>> the queries (Ex. create table temp1(int int);).
>>
>> I have downloaded the latest postgres dev snapshot and the latest sync
>> replication patch from wiki.
>>
>> I have attached the log file which has the trace statements displayed on
>> the console. I actually changed the 'test_synch_rep.sh', so that the
>> standby database will put detailed trace messages.
>>
>> I'am actaully getting error related to history file.
>>
>> Could you please help regarding this.

The problem which you pointed out was not reproduced on my machine. I suspect that the base HEAD version might be different between us. I uploaded my pgsql source with synch rep patch (v0128), so please try it.

http://senduit.com/e5a942

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

On Wed, Feb 4, 2009 at 4:17 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Thanks for the response.
> I tried with the your sources and found the same issue.

Thanks for the retesting!

> Could you please help. If you need any more symptoms, I can re-work.

May I ask you some questions?

- Do you use a packet filtering software (eg. firewall)?
  If yes, what happens if you disabled such a software?

- Do you use SELinux?
  If yes, what happens if you disabled SELinux?

- Please run the following commands and report those results.
  * uname -a
  * pg_config

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

Thanks for the response.
I tried with your sources and found the same issue.

I will explain the steps that I followed, to avoid any gaps in understanding.

1) I assumed that the patch was already applied in your sources, so I went ahead with "configure", "make" and "make install"
2) Built pg_standby and copied it manually to the postgres bin directory
3) Copied test_sync_repl.sh into the postgres directory
4) Ran the script. Refer to "startup.log". Also verified that the walsender and walreceiver processes are running. Refer to "ps.log".

Tests done:
1) Executed "psql -l". Refer to "list_database.log"
2) Logged in to the standby with the command "psql -d replication" and executed the SQL "select * from pg_roles;". This worked fine.
3) Logged in to the primary with the command "psql -d postgres" and executed the SQL "create table temp1 (int int);". On execution of this query the standby instance came down. Refer to "createTable.log" (printed on console)

One of the hackers suggested removing the line
echo "host all all 0.0.0.0/0 trust" >> ${SBYDATA}/pg_hba.conf
for the standby database only. But the result was the same.

Environment that I use:
- RHEL 4.7
- Both primary and standby on the same machine

Could you please help? If you need any more symptoms, I can re-work.

Thanks,
Niranjan
Attachment
> - Do you use a packet filtering software (eg. firewall)?
>   If yes, what happens if you disabled such a software?

Firewall was disabled

> - Do you use SELinux?
>   If yes, what happens if you disabled SELinux?

SELinux is disabled.

> - Please run the following commands and report those results.
>   * uname -a
>   * pg_config

Please see the logs attached.

Thanks,
Niranjan
Attachment
Hi Niranjan,

On Wed, Feb 4, 2009 at 6:21 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
>
>> - Do you use a packet filtering software (eg. firewall)?
>>   If yes, what happens if you disabled such a software?
> Firewall was disabled
>
>> - Do you use SELinux?
>>   If yes, what happens if you disabled SELinux?
> SELinux is disabled.
>
>> - Please run the following commands and report those results.
>>   * uname -a
>>   * pg_config
> Please see the logs attached.

Thanks for the information!

> [postgres@node1 ~]$ uname -a
> Linux node1 2.6.9-78.ELsmp #1 SMP Wed Jul 9 15:39:47 EDT 2008 i686 i686 i386 GNU/Linux

Though I also tested synch-rep on i386 (I mainly use x86_64 for testing), the problem which you pointed out was not reproduced.

Since I couldn't identify the cause of the trouble from current logs, I added the further logging codes into the patch, and uploaded the source. If you have time, please try it and report the results.

http://senduit.com/ed60cc

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
> I added the further logging codes
> into the patch, and uploaded the source. If you have time,
> please try it and report the results.

Refer attached logs.

Regards,
Niranjan
Attachment
Hi Niranjan,

On Thu, Feb 5, 2009 at 3:34 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
>> I added the further logging codes
>> into the patch, and uploaded the source. If you have time,
>> please try it and report the results.
>
> Refer attached logs.

Thanks, but I have not identified the cause yet. Sorry.

I changed the code to dump all messages for replication, so please try it and report the result. The messages which the standby server receives are logged in the file (*1). Please send also that file.

http://senduit.com/d48e0a

(*1) <installation_directory>/sbydata/walreceiver_trace.out

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

> Thanks, but I have not identified the cause yet. Sorry.

No problem for me to re-test.

> I changed the code to dump all messages for replication, so
> please try it and report the result. The messages which the
> standby server receives are logged in the file (*1). Please
> send also that file.
>
> (*1) <installation_directory>/sbydata/walreceiver_trace.out

Refer attachments.

Regards,
Niranjan
Attachment
Hi Niranjan,

On Thu, Feb 5, 2009 at 10:50 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
>> I changed the code to dump all messages for replication, so
>> please try it and report the result. The messages which the
>> standby server receives are logged in the file (*1). Please
>> send also that file.
>>
>> (*1) <installation_directory>/sbydata/walreceiver_trace.out
>
> Refer attachments.

Thanks for interesting results!

According to the logs, walreceiver receives 'c' message (which is invalid for walreceiver) though walsender doesn't send it. So, next, I should diagnose more carefully the messages between those processes.

Please test synch-rep again and report the following information.

* server log
* walreceiver_trace.out
* netstat -nap (before and after replication crashes)
* tcpdump -i lo -n tcp

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

> * server log
> * walreceiver_trace.out
> * netstat -nap (before and after replication crashes)
> * tcpdump -i lo -n tcp
>

In addition to the requested logs, I have provided the network interface information.

regards,
Niranjan
Attachment
Hi,

On Fri, Feb 6, 2009 at 2:14 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Hi,
>
>> * server log
>> * walreceiver_trace.out
>> * netstat -nap (before and after replication crashes)
>> * tcpdump -i lo -n tcp
>>
> In addition to the requested logs, I have provided the network interface
> information.

Your tcpdump.log shows that walsender doesn't send any invalid messages, but walreceiver seems to receive it. So, I suspect libpq functions which walreceiver uses as the cause of the trouble.

I changed synch-rep to dump all the behaviors of those libpq functions.

http://senduit.com/42da75

Please retry new synch-rep code and report the following information.

* server log
* <installation_directory>/sbydata/walreceiver_trace.out
* tcpdump -i lo -n tcp

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

> Please retry new synch-rep code and report the following information.
>
> * server log
> * <installation_directory>/sbydata/walreceiver_trace.out
> * tcpdump -i lo -n tcp
>

Refer attachments.

regards,
Niranjan
Attachment
Hi Niranjan,

On Fri, Feb 6, 2009 at 6:46 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Hi,
>
>> Please retry new synch-rep code and report the following information.
>>
>> * server log
>> * <installation_directory>/sbydata/walreceiver_trace.out
>> * tcpdump -i lo -n tcp
>>
>
> Refer attachments.

I'm afraid that the message length may be greater than INT_MAX, which might cause the trouble on your machine. I changed my code again, in order to test my hypothesis.

http://senduit.com/3ccd93

Please run new synch-rep again and report the following information. Thank you very much for testing it repeatedly!

* server log
* <installation_directory>/sbydata/walreceiver_trace.out
* tcpdump -i lo -n tcp

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
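The error text reported at the start of the thread (got message type "c", length -805175295) comes from decoding a message header. In the PostgreSQL frontend/backend protocol version 3, each message starts with a one-byte type followed by a four-byte big-endian length that counts itself. A minimal sketch of that decoding follows, assuming the replication stream in this patch uses the same framing (the thread suggests walreceiver goes through libpq, but that is an inference, not something confirmed here). `parse_header` and `is_plausible` are hypothetical names, and the four length bytes are simply the two's-complement encoding of the logged value, not captured traffic.

```python
import struct


def parse_header(header: bytes):
    """Decode a 5-byte header into (type character, signed 32-bit length)."""
    # "!" selects network byte order with no padding: 1 byte + 4 bytes.
    msg_type, length = struct.unpack("!ci", header)
    return msg_type.decode("ascii"), length


def is_plausible(length: int) -> bool:
    """The length field counts its own 4 bytes, so anything below 4 is invalid."""
    return length >= 4


# Bytes chosen so that the signed length equals the value from the log.
bad = b"c" + bytes.fromhex("d0020001")
msg_type, length = parse_header(bad)
print(msg_type, length, is_plausible(length))  # c -805175295 False
```

A negative length like this one means the four bytes were never a real length field, which fits the later finding that the receiving side had corrupted its own state rather than the sender having sent a bad message.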
Hi,

> * server log
> * <installation_directory>/sbydata/walreceiver_trace.out
> * tcpdump -i lo -n tcp
>

Please refer attachments.

Regards,
Niranjan
Attachment
Hi Niranjan,

On Fri, Feb 6, 2009 at 10:17 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
>
> Hi,
>
>> * server log
>> * <installation_directory>/sbydata/walreceiver_trace.out
>> * tcpdump -i lo -n tcp
>>
> Please refer attachments.

Please try the updated synch-rep, which probably fixes the problem, I hope. If the same trouble was reproduced, please report the logs.

http://senduit.com/c21db5

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi Niranjan,

Thanks very much!

On Mon, Feb 9, 2009 at 3:08 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Now, the active and standby database are up & running even after the
> execution of the SQL (create table). What was the problem?

The problem is that 1-byte variable was assigned the value casted to 4-bytes, which overwrote another variable (which lives next to the 1-byte val) wrongly. This behavior varies based on environment (ex. memory alignment). So, the trouble wasn't reproduced on my machine though it occurred on yours. It's my disgraceful bug.. :(

> But when I logged in the standby instance by executing 'psql -d
> replication', I did not see the table that was created on the primary.
> I have few questions:
>
> - I'am not sure whether the replication is done but I'am not able to
> view? Will I be able to view the replication by logging inside to
> standby instance? Hotstandby patch will allow to read from standby. Is
> this patch integrated in sync replication patch?

No, hot standby and synch rep are independent patch now. So, you cannot issue any queries to the standby server during replication.

The progress of replication can be checked via 'ps' command as follows. This reports the LSN already the standby server has received and written (or fsynced).

------------
[primary]
$ pgrep -fl wal
1803 postgres: wal writer process
1830 postgres: wal sender process postgres 127.0.0.1(34604) replicated to: write 0/1F74DD0, flush 0/1F68878

[standby]
$ pgrep -fl wal
1828 postgres: wal receiver process replicated to: write 0/1F74DD0, flush 0/1F68878
------------

> - I brought down the active instance by executing 'pg_ctl -D
> /home/postgres/postgresHSB/actdata stop' hoping that trigger file will
> enable failover. But I was not able to login to standby instance. Not
> sure why?

Please let me know the failover procedure which you carried out. As follows?

1) pg_ctl -D /home/postgres/postgresHSB/actdata stop
2) touch /home/postgres/postgresHSB/finish.trigger

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
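The bug described in the message above (a 4-byte store into a 1-byte variable clobbering its neighbour, with the outcome depending on memory layout) can be imitated in a few lines. This is only a simulation over a byte buffer in Python, not the patch's C code. The offsets, the 0x7F marker value and the function names are invented for illustration, and little-endian byte order is assumed.

```python
import struct


def write_type(buf: bytearray, value: int, buggy: bool) -> None:
    """Store a message type at offset 0 of buf."""
    if buggy:
        # Bug: a 4-byte store into what should be a 1-byte slot.
        struct.pack_into("<i", buf, 0, value)
    else:
        # Fix: store exactly one byte.
        struct.pack_into("<B", buf, 0, value)


def neighbour_after_write(neighbour_offset: int, buggy: bool) -> int:
    """Place a marker at neighbour_offset, write the type, return the marker."""
    buf = bytearray(8)
    buf[neighbour_offset] = 0x7F
    write_type(buf, ord("w"), buggy)
    return buf[neighbour_offset]


# Neighbour directly behind the 1-byte variable: the buggy store zeroes it.
print(neighbour_after_write(1, buggy=True))   # 0
# Neighbour placed after 3 padding bytes: the same bug goes unnoticed.
print(neighbour_after_write(4, buggy=True))   # 127
# One-byte store: the neighbour is untouched in either layout.
print(neighbour_after_write(1, buggy=False))  # 127
```

The second case is one way to picture why the problem did not reproduce on every machine: if the compiler happens to place padding or an unused slot after the 1-byte variable, the stray three bytes land somewhere harmless.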
Hi Niranjan,

On Mon, Feb 9, 2009 at 6:58 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
>> 1) pg_ctl -D /home/postgres/postgresHSB/actdata stop
>> 2) touch /home/postgres/postgresHSB/finish.trigger
>
> Yes. This the procedure that I followed. I have attached the relevant
> logs.
> "change_standby_mode.log" - Commands used to change from continous
> recovery mode of the standby instance
> "ps.log" - ps command before and after executing the SQL.

Thanks for the informations!

---------------------------
[postgres@node1 ~]$ psql -d replication
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
---------------------------

I think that your standby postmaster is running under port = 5433, so please specify "-p 5433".

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi Niranjan,

On Mon, Feb 9, 2009 at 10:39 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> But after I login to replication database (note the active I had brought
> it down earlier & created a finish.trigger), I still cannot see the
> table that was created on the primary.
> Also please note that the LSN had changed after replication in the ps
> command.

Did you create the table in 'replication' database? If not, please connect to the correct database which includes the table. In log-shipping, the database objects are basically identical between the primary and the standby server.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
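Checking whether the LSN in the 'ps' output has moved, as mentioned in the quoted text, can be done by eye or with a small helper. The sketch below parses the "replicated to: write X/Y, flush X/Y" suffix shown earlier in the thread. The regular expression and the name `parse_progress` are assumptions made for this example, and the process-title format belongs to this patch version and may differ elsewhere.

```python
import re

# Matches the progress suffix of the walsender/walreceiver process title.
PROGRESS_RE = re.compile(
    r"write ([0-9A-Fa-f]+)/([0-9A-Fa-f]+), flush ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def parse_progress(ps_line: str):
    """Return ((write_log, write_off), (flush_log, flush_off)) or None."""
    match = PROGRESS_RE.search(ps_line)
    if match is None:
        return None
    w_log, w_off, f_log, f_off = (int(group, 16) for group in match.groups())
    return (w_log, w_off), (f_log, f_off)


sender = ("1830 postgres: wal sender process postgres 127.0.0.1(34604) "
          "replicated to: write 0/1F74DD0, flush 0/1F68878")
receiver = ("1828 postgres: wal receiver process "
            "replicated to: write 0/1F74DD0, flush 0/1F68878")

write_lsn, flush_lsn = parse_progress(sender)
# Tuples compare field by field, so this orders LSNs correctly.
print(flush_lsn <= write_lsn)                              # True
print(parse_progress(sender) == parse_progress(receiver))  # True
```

Running it before and after a write on the primary and comparing the tuples shows whether the standby has received and flushed new WAL, even though the standby cannot be queried while replication is in progress.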
Hi,

Now the active and standby databases are up and running even after the execution of the SQL (create table). What was the problem?

But when I logged in to the standby instance by executing 'psql -d replication', I did not see the table that was created on the primary.
I have a few questions:

- I am not sure whether the replication is done and I am simply not able to view it. Will I be able to view the replicated data by logging in to the standby instance? The Hot Standby patch will allow reading from the standby. Is this patch integrated in the sync replication patch?

- I brought down the active instance by executing 'pg_ctl -D /home/postgres/postgresHSB/actdata stop', hoping that the trigger file would enable failover. But I was not able to log in to the standby instance. Not sure why?

regards,
Niranjan
Attachment
Hi,

> > - I brought down the active instance by executing 'pg_ctl -D
> > /home/postgres/postgresHSB/actdata stop' hoping that trigger file will
> > enable failover. But I was not able to login to standby instance. Not
> > sure why?
>
> Please let me know the failover procedure which you carried
> out. As follows?
>
> 1) pg_ctl -D /home/postgres/postgresHSB/actdata stop
> 2) touch /home/postgres/postgresHSB/finish.trigger

Yes. This the procedure that I followed. I have attached the relevant logs.
"change_standby_mode.log" - Commands used to change from continous recovery mode of the standby instance
"ps.log" - ps command before and after executing the SQL.

Regards,
Niranjan
Attachment
Hi,

> [postgres@node1 ~]$ psql -d replication
> psql: could not connect to server: No such file or directory
>         Is the server running locally and accepting
>         connections on Unix domain socket
> "/tmp/.s.PGSQL.5432"?
> ---------------------------
>
> I think that your standby postmaster is running under port =
> 5433, so please specify "-p 5433".

It was silly that I missed this. :-(

But after I login to replication database (note the active I had brought it down earlier & created a finish.trigger), I still cannot see the table that was created on the primary.
Also please note that the LSN had changed after replication in the ps command.

I have attached the logs.

regards,
Niranjan
Attachment
Hi,

Thanks. Now it works.

A few questions:
1) Do you have an idea of when the Hot Standby patch and the Sync replication patch will be integrated?
2) I have used one physical machine to try the current synchronous replication patch. Is it OK for me to try with two separate machines for the primary and standby servers?
3) Do you have test programs that can be used for synchronous replication testing?
4) I am thinking of trying load/performance tests as well. What do you think? Is it too early to do this test?

regards,
Niranjan
Hi Niranjan,

Sorry for this late reply.

On Tue, Feb 10, 2009 at 3:25 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Thanks. Now it works.

Good news :)

> Few questions:
> 1) Do you have an idea by when the Hot standby patch and Sync
> replication patch will be integrated?

I think (hope) that they will be integrated in v8.5.

> 2) I have used 1 physical machine to try current patch of synchronous
> replication. Is it OK for me to try with 2 separate machines for Primary
> & Standby servers

Of course. The basic procedure to construct the synch rep environment is described in the attached document (log-shipping-record.html). If you want the more detailed information, please refer to the following link and build the documentation of synch rep.

http://www.postgresql.org/docs/current/interactive/docguide-build.html

> 3) Do you have test programs that can used for synchronous replication
> testing?

No, I've not used the automated test program. Yeah, since it's very useful, I'll make it before long.

> 4) I'am thinking of trying load/performance tests as well. What do you
> feel? Will it be too early to do this test?

Any kinds of testing welcome!

I attached the patch which fixed the problem which you reported. The source code which I uploaded before outputs many logs, which would harm the performance, so please try this new patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
Fujii Masao wrote:

I noticed two very minor issues while reading your docs:

> This is because WAL files generated in the primary server before this built-in
> replication starts have to be transferred to the standby server by
> using file-based log shipping. When archive_mode is unsent,

You probably mean "unset" here.

> enable_replication (boolean)

It has been said that variables that enable/disable features should only be named after the feature that they affect, omitting the "enable" verb. So in this case it should be set as "replication=off" or "replication=on".

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi,

Thanks for the comments!

On Fri, Feb 13, 2009 at 5:00 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Fujii Masao wrote:
>
> I noticed two very minor issues while reading your docs:
>
>> This is because WAL files generated in the primary server before this built-in
>> replication starts have to be transferred to the standby server by
>> using file-based log shipping. When archive_mode is unsent,
>
> You probably mean "unset" here.

I mean "unsent" here. This is one of valid values of archive_mode which I changed for synch-rep. Please see the description of archive_mode in the attached document.

>> enable_replication (boolean)
>
> It has been said that variables that enable/disable features should only
> be named after the feature that they affect, omitting the "enable" verb.
> So in this case it should be set as "replication=off" or
> "replication=on".

Okay, I will rename the parameter like you say. Thanks!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
hi,

[I am working in the same team as Niranjan]

Niranjan wrote:
> > 3) Do you have test programs that can used
> > for synchronous replication testing?
>
> No, I've not used the automated test program. Yeah, since
> it's very useful, I'll make it before long.
>
> > 4) I'am thinking of trying load/performance tests as well.
> > What do you feel? Will it be too early to do this test?
>
> Any kinds of testing welcome!

Actually, this is just to let you know that for _stability_ and performance tests we use the "Network Database Benchmark" which we open-sourced (GPLv2) in 2006.

Just recently one of our colleagues wrote a _small_ patch that makes it work out of the box with _PostgreSQL_/UnixODBC. The patch is now also available.

The main project page(s):
http://hoslab.cs.helsinki.fi/savane/projects/ndbbenchmark/
http://hoslab.cs.helsinki.fi/homepages/ndbbenchmark/

The patch:
http://hoslab.cs.helsinki.fi/savane/cookbook/?func=detailitem&item_id=141

The benchmark models a Telco home location register (HLR) application with lots of short read/write transactions whose ratio can be adjusted on the command line, e.g. to model read or write heavy transaction loads.

We'll re-use this benchmark as we have lots of existing measurements for other databases. Also we have a pretty good understanding of what to expect performance-wise with the different transaction mixes.

The actual benchmark specification is available from here

The benchmark spec:
http://hoslab.cs.helsinki.fi/downloads/ndbbenchmark/Network_Database_Benchmark_Definition_2006-02-01.pdf

Thoralf
Synch Replication: Synchronization of files between Primary & Standby
From: "K, Niranjan (NSN - IN/Bangalore)"
Hi,

Starting a new thread related to synchronization of the data files, WAL, etc. between the primary and standby servers in the Synchronous replication patch.

Use case: whenever the primary and standby are out of sync due to network problems.

The existing handling is to prepare the standby by:
1) Deleting the $PGDATA on the standby
2) Making a fresh base backup of the primary and loading this data onto the standby
3) Setting up the necessary configuration (e.g. recovery) and starting the standby server

Regarding the earlier discussions, please check the link below (point 2, related to a direct connection between primary and standby). I think we still need to work towards a conclusion on what will be done.
http://archives.postgresql.org/pgsql-hackers/2009-02/msg01160.php

One issue to be addressed is the usability of the solution for the mentioned use case.
For synchronization of files over a direct connection, there were suggestions to consider VLDB cases too.

Do you already have some ideas that are being implemented? We can kick-start the discussion so as to conclude on a possible solution.

Regards,
Niranjan
Hi,

On Wed, Apr 22, 2009 at 9:21 PM, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:
> Starting a new thread related to synchronization of the data files, WAL
> etc.. between Primary and standby servers in Synchronous replication
> patch.
>
> Use case: Whenever the primary and standby are out of sync due to
> network problems.
>
> Existing handling is to prepare the standby by
> 1) Deleting the $PGDATA on standby
> 2) Make a fresh base backup of the primary and load this data to the
> standby
> 3) Setup the necessary configurations (ex. recovery) and start the
> standby server
>
> In the earlier discussions, please check the link (point 2 related to
> direct connection between primary and standby), i think we still need to
> work to conclude on what will be done.
> http://archives.postgresql.org/pgsql-hackers/2009-02/msg01160.php

I'm now implementing the capability to transfer a file related to xlog (i.e. xlog segment file, backup history file and timeline history file). This is used when there are missing files in the standby, and they are automatically copied from the primary. As usability aspect, you don't need to configure warm-standby for Synch Rep any longer before starting the standby. I'll show the detailed design of it before very long.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center