Thread: Synch Replication - Synch rep 0114

Synch Replication - Synch rep 0114

From
"Patil, Smita (NSN - IN/Bangalore)"
Date:
Hi,
I have recently been testing Synch Replication (Synch rep 0114 (Jan 14, 2009)) on PostgreSQL version 8.4 (postgresql-8.4devel_20081229.tar.bz2).
I followed the steps in the Readme and also used the test script provided in the patch for the setup.
As per the wiki, I am able to bring up the walsender and walreceiver processes on a single server, as well as when the primary and secondary are set up on different nodes (after making the necessary changes to the test script).
 
I can then see that the walsender and walreceiver processes are running.
 
Then I try to insert some records into the table created (within the script) as below:
./psql
psql (8.4devel)
Type "help" for help.
 
postgres=# insert into temp values(5,'e');
 
I get the following output :
Standby 6820 FATAL:  unexpected EOF on replication connection: lost synchronization with server: got message type "c", length -805175295
 
Primary 6821 LOG:  unexpected EOF on replication connection
Primary 6821 LOG:  replication done at: write 0/1000000 (file 000000010000000000000000), flush 0/1000000 (file 000000010000000000000000)
Standby 6820 LOG:  replication done at: write 0/1000000 (file 000000010000000000000000), flush 0/1000000 (file 000000010000000000000000)
Standby 6812 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
Standby 6812 LOG:  redo done at 0/4A983C
Standby 6812 PANIC:  could not open file "pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file or directory
Standby 6809 LOG:  startup process (PID 6812) was terminated by signal 6: Aborted
Standby 6809 LOG:  aborting startup due to startup process failure
INSERT 0 1
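
A note on the file names in the log above: WAL segment file names consist of three 8-digit hexadecimal fields (timeline ID, log file number, segment number), which is why "000000010000000000000001" is reported as "log file 0, segment 1". A minimal Python sketch of that naming scheme follows; the function names are made up for illustration and are not part of PostgreSQL or the patch.

```python
def wal_file_name(timeline: int, log: int, seg: int) -> str:
    """Format a WAL segment file name as three 8-digit uppercase hex fields."""
    return "%08X%08X%08X" % (timeline, log, seg)


def parse_wal_file_name(name: str) -> tuple:
    """Split a 24-character WAL file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


if __name__ == "__main__":
    # File named in the standby's "could not open file" message.
    print(parse_wal_file_name("000000010000000000000001"))
```

Read this way, both files the standby fails to open belong to timeline 1 and log file 0.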
After this, I see that both walsender and walreceiver are down while the writer process is still running.
Is this because there is no provision for replication between primary and secondary?
Or is it because write transactions are not supported?
 
When the primary and standby run on two different nodes, I am able to bring up the walsender and walreceiver processes.
But the records inserted on the primary are not reflected on the standby node, not even for read transactions.
 
Given this, I would like to know exactly which features work with this patch.
The Readme section of the Synch Replication wiki only mentions checking whether the walsender and walreceiver processes are running.
What about replication itself, and read and write transactions?
 
Also, with the latest patch, Synch rep 0128 (Jan 28, 2009), I am getting compilation errors.
Please let me know the current status of Synch Replication and which features are working properly.
 
Regards,
Smita Patil

 
Attachment

Re: Synch Replication - Synch rep 0114

From
Fujii Masao
Date:
Hi,

On Fri, Jan 30, 2009 at 8:05 PM, Patil, Smita (NSN - IN/Bangalore)
<smita.patil@nsn.com> wrote:
> Hi,
> I have been testing in recent, the Synch Replication(Synch rep 0114 (Jan 14,
> 2009) ) on PostgreSQL version 8.4 (
> postgresql-8.4devel_20081229.tar.bz2)

Thanks for your testing and report!

I'm afraid that the base HEAD version (postgresql-8.4devel_20081229.tar.bz2)
is old, which might have caused the error you report below. So, please apply
the synch-rep v0128 patch to the latest HEAD and test it.

If you can use cvs, the following document might be helpful for you to
get the latest HEAD.
http://www.postgresql.org/docs/8.3/static/anoncvs.html

> As per wiki, I am able to bring up the walsender and the walreceiver process
> in a single server as well when primary and seconday are setup on different
> nodes(making necessary changes to the test script)

What kind of change was required?

> Then I am able to see the walsender and walreceiver process are in progress.

Good!

> Then I try to insert some records into the table created (within the script)
> as below:
> ./psql
> psql (8.4devel)
> Type "help" for help.
>
> postgres=# insert into temp values(5,'e');

Please let me know the DDL used to create the "temp" table. I'll also test it
on my machine.

> After this, I see both walsender and walreceiver are down and writer process
> is still running.
> Is this because, there is no provision of replication between primary and
> secondary?

Yes, it's because an unexpected error terminated replication (i.e. walsender
and walreceiver). However, such termination of replication doesn't affect the
primary's normal processing, so walwriter was still running on the primary.

> Or is it because write transactions are not supported?

Write transactions are supported, just as in unpatched PostgreSQL.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

I tried using synchronous replication and I am unable to replicate
queries (e.g. create table temp1(int int);).

I have downloaded the latest postgres dev snapshot and the latest sync
replication patch from the wiki.

I have attached the log file, which contains the trace statements displayed
on the console. I changed 'test_synch_rep.sh' so that the standby database
emits detailed trace messages.

I am actually getting an error related to the history file.

Could you please help with this?

regards,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi,

Thanks for your testing and report!

On Tue, Feb 3, 2009 at 8:18 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Hi,
>
> I tried using the Synchronous replication and I'am unable to replicate
> the queries (Ex. create table temp1(int int);).
>
> I have downloaded the latest postgres dev snapshot and the latest sync
> replication patch from wiki.
>
> I have attached the log file which has the trace statements displayed on
> the console. I actually changed the 'test_synch_rep.sh', so that the
> standby database will put detailed trace messages.
>
> I'am actaully getting error related to history file.
>
> Could you please help regarding this.

The problem which you pointed out was not reproduced on my machine.
I suspect that the base HEAD version might be different between us.
I updated my pgsq

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
Fujii Masao
Date:
Hi,

Oops, my mail client sent the previous message before I had finished writing it. Sorry.

On Wed, Feb 4, 2009 at 10:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> Thanks for your testing and report!
>
> On Tue, Feb 3, 2009 at 8:18 PM, K, Niranjan (NSN - IN/Bangalore)
> <niranjan.k@nsn.com> wrote:
>> Hi,
>>
>> I tried using the Synchronous replication and I'am unable to replicate
>> the queries (Ex. create table temp1(int int);).
>>
>> I have downloaded the latest postgres dev snapshot and the latest sync
>> replication patch from wiki.
>>
>> I have attached the log file which has the trace statements displayed on
>> the console. I actually changed the 'test_synch_rep.sh', so that the
>> standby database will put detailed trace messages.
>>
>> I'am actaully getting error related to history file.
>>
>> Could you please help regarding this.

The problem which you pointed out was not reproduced on my machine.
I suspect that the base HEAD version might be different between us.
I uploaded my pgsql source with synch rep patch (v0128), so please try it.

http://senduit.com/e5a942

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
Fujii Masao
Date:
Hi,

On Wed, Feb 4, 2009 at 4:17 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Thanks for the response.
> I tried with the your sources and found the same issue.

Thanks for the retesting!

> Could you please help. If you need any more symptoms, I can re-work.

May I ask you some questions?

- Do you use packet filtering software (e.g. a firewall)?
  If yes, what happens if you disable it?

- Do you use SELinux?
  If yes, what happens if you disable SELinux?

- Please run the following commands and report the results.
  * uname -a
  * pg_config

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

Thanks for the response.
I tried with your sources and found the same issue.

Let me explain the steps that I followed, to avoid any gaps in
understanding.
1) I assumed that the patch was already applied in your sources, so I
went ahead with "configure", "make" and "make install".
2) Built pg_standby and copied it manually to the postgres bin directory.
3) Copied test_sync_repl.sh to the postgres directory.
4) Ran the script. Refer to "startup.log". I also verified that the
walsender and walreceiver processes are running. Refer to "ps.log".

Tests done:
1) Executed "psql -l". Refer to "list_database.log".
2) Logged in to the standby with the command "psql -d replication" and
executed the SQL "select * from pg_roles;". This worked fine.
3) Logged in to the primary with the command "psql -d postgres" and
executed the SQL "create table temp1 (int int);". On execution of this
query the standby instance came down. Refer to "createTable.log"
(printed on console).

One of the hackers suggested removing the line "echo "host all all
0.0.0.0/0 trust"    >> ${SBYDATA}/pg_hba.conf" for the standby database
only. But here too the result was the same.

Environment that I use:
- RHEL 4.7
- Both primary and standby on the same machine

Could you please help? If you need any more symptoms, I can re-test.

Thanks,
Niranjan

Attachment

Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
> - Do you use a packet filtering software (eg. firewall)?
>   If yes, what happens if you disabled such a software?
Firewall was disabled

> - Do you use SELinux?
>   If yes, what happens if you disabled SELinux?
SELinux is disabled.

> - Please run the following commands and report those results.
>   * uname -a
>   * pg_config
Please see the logs attached.

Thanks,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Wed, Feb 4, 2009 at 6:21 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
>
>> - Do you use a packet filtering software (eg. firewall)?
>>   If yes, what happens if you disabled such a software?
> Firewall was disabled
>
>> - Do you use SELinux?
>>   If yes, what happens if you disabled SELinux?
> SELinux is disabled.
>
>> - Please run the following commands and report those results.
>>   * uname -a
>>   * pg_config
> Please see the logs attached.

Thanks for the information!

> [postgres@node1 ~]$ uname -a
> Linux node1 2.6.9-78.ELsmp #1 SMP Wed Jul 9 15:39:47 EDT 2008 i686 i686 i386 GNU/Linux

Though I also tested synch-rep on i386 (I mainly use x86_64 for testing),
the problem which you pointed out was not reproduced. Since I couldn't
identify the cause of the trouble from the current logs, I added further
logging code to the patch and uploaded the source. If you have time,
please try it and report the results.

http://senduit.com/ed60cc

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
>I added the further logging codes
> into the patch, and uploaded the source. If you have time,
> please try it and report the results.

Please refer to the attached logs.

Regards,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Thu, Feb 5, 2009 at 3:34 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
>>I added the further logging codes
>> into the patch, and uploaded the source. If you have time,
>> please try it and report the results.
>
> Refer attached logs.

Thanks, but I have not identified the cause yet. Sorry.
I changed the code to dump all replication messages, so
please try it and report the result. The messages which the
standby server receives are logged in the file (*1). Please
also send that file.

http://senduit.com/d48e0a

(*1) <installation_directory>/sbydata/walreceiver_trace.out

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

> Thanks, but I have not identified the cause yet. Sorry.

No problem for me to re-test.

> I changed the code to dump all messages for replication, so
> please try it and report the result. The messages which the
> standby server receives are logged in the file (*1). Please
> send also that file.
>
> (*1) <installation_directory>/sbydata/walreceiver_trace.out

Please refer to the attachments.

Regards,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Thu, Feb 5, 2009 at 10:50 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
>> I changed the code to dump all messages for replication, so
>> please try it and report the result. The messages which the
>> standby server receives are logged in the file (*1). Please
>> send also that file.
>>
>> (*1) <installation_directory>/sbydata/walreceiver_trace.out
>
> Refer attachments.

Thanks for the interesting results!

According to the logs, walreceiver receives a 'c' message (which is invalid
for walreceiver) even though walsender doesn't send one. So, next, I need to
examine the messages exchanged between those processes more carefully.
Please test synch-rep again and report the following information.

* server log
* walreceiver_trace.out
* netstat -nap (before and after replication crashes)
* tcpdump -i lo -n tcp

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

> * server log
> * walreceiver_trace.out
> * netstat -nap (before and after replication crashes)
> * tcpdump -i lo -n tcp
>
In addition to the requested logs, I have provided the network interface
information.

regards,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi,

On Fri, Feb 6, 2009 at 2:14 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Hi,
>
>> * server log
>> * walreceiver_trace.out
>> * netstat -nap (before and after replication crashes)
>> * tcpdump -i lo -n tcp
>>
> In addition to the requested logs, I have provided the network interface
> information.

Your tcpdump.log shows that walsender doesn't send any invalid message,
yet walreceiver seems to receive one. So I suspect that the libpq functions
which walreceiver uses are the cause of the trouble. I changed synch-rep to
trace everything those libpq functions do.

http://senduit.com/42da75

Please retry with the new synch-rep code and report the following information.

* server log
* <installation_directory>/sbydata/walreceiver_trace.out
* tcpdump -i lo -n tcp

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

> Please retry new synch-rep code and report the following information.
>
> * server log
> * <installation_directory>/sbydata/walreceiver_trace.out
> * tcpdump -i lo -n tcp
>

Please refer to the attachments.

regards,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Fri, Feb 6, 2009 at 6:46 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Hi,
>
>> Please retry new synch-rep code and report the following information.
>>
>> * server log
>> * <installation_directory>/sbydata/walreceiver_trace.out
>> * tcpdump -i lo -n tcp
>>
>
> Refer attachments.

I'm afraid that the message length may be greater than INT_MAX,
which might cause the trouble on your machine. I changed my code
again, in order to test my hypothesis.
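
For context on this hypothesis: in the PostgreSQL frontend/backend protocol (version 3), which libpq speaks, each message starts with a 1-byte type followed by a 4-byte length in network byte order. Any length whose top bit is set therefore reads as negative when interpreted as a signed integer. The Python sketch below shows that effect using the length -805175295 from the error reported earlier in this thread. The helper name and the reconstructed header bytes are illustrative assumptions, not data captured from the patch.

```python
import struct


def parse_header(header: bytes):
    """Decode a 5-byte message header: 1-byte type, then a 4-byte
    big-endian length, returned both as signed and as unsigned."""
    msg_type = header[0:1].decode("ascii")
    (signed_len,) = struct.unpack(">i", header[1:5])
    (unsigned_len,) = struct.unpack(">I", header[1:5])
    return msg_type, signed_len, unsigned_len


if __name__ == "__main__":
    # Hypothetical header bytes that would produce the reported error text.
    header = b"c" + struct.pack(">i", -805175295)
    print(parse_header(header))
```

The same four bytes read as 3489792001 when unsigned, which is above INT_MAX. As a later message in this thread explains, the actual cause turned out to be a memory overwrite rather than an oversized message.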

http://senduit.com/3ccd93

Please run the new synch-rep again and report the following information.
Thank you very much for testing it repeatedly!

* server log
* <installation_directory>/sbydata/walreceiver_trace.out
* tcpdump -i lo -n tcp

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

> * server log
> * <installation_directory>/sbydata/walreceiver_trace.out
> * tcpdump -i lo -n tcp
>
Please refer to the attachments.

Regards,
Niranjan

Attachment

Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Fri, Feb 6, 2009 at 10:17 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
>
> Hi,
>
>> * server log
>> * <installation_directory>/sbydata/walreceiver_trace.out
>> * tcpdump -i lo -n tcp
>>
> Please refer attachments.

Please try the updated synch-rep, which I hope fixes the problem.
If the same trouble is reproduced, please send the logs.

http://senduit.com/c21db5

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

Thanks very much!

On Mon, Feb 9, 2009 at 3:08 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Now, the active and standby database are up & running even after the
> execution of the SQL (create table). What was the problem?

The problem was that a 1-byte variable was assigned a value cast to 4 bytes,
so the write wrongly overwrote another variable located next to the 1-byte one.
This behavior varies with the environment (e.g. memory alignment), so
the trouble wasn't reproduced on my machine though it occurred on yours.

It's my disgraceful bug.. :(
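
The kind of overwrite described above can be simulated without touching real memory. The Python sketch below is only an illustration: the patch's actual code is not shown in this thread, and the 4-byte "memory" layout, the function names and the stored value are assumptions. It treats byte 0 of a buffer as the 1-byte variable and bytes 1 to 3 as its neighbours, then performs a 4-byte store at offset 0.

```python
import struct


def store_4_bytes(memory: bytearray, offset: int, value: int) -> None:
    """Write `value` as a native-endian 4-byte int starting at `offset`."""
    struct.pack_into("=i", memory, offset, value)


def overwritten_neighbours(value: int) -> list:
    """Return the offsets of neighbouring bytes changed by a 4-byte store
    aimed at a 1-byte variable located at offset 0."""
    # Byte 0 is the variable; bytes 1-3 are neighbours, pre-filled with 0xFF.
    memory = bytearray(b"\xff" * 4)
    store_4_bytes(memory, 0, value)
    return [i for i in range(1, 4) if memory[i] != 0xFF]


if __name__ == "__main__":
    print(overwritten_neighbours(ord("c")))
```

In real C code, what sits in those neighbouring bytes depends on the compiler, padding and byte order, which fits the observation that the failure appeared on one machine and not on another.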

> But when I logged in the standby instance by executing 'psql -d
> replication', I did not see the table that was created on the primary.
> I have few questions:
>
> - I'am not sure whether the replication is done but I'am not able to
> view? Will I be able to view the replication by logging inside to
> standby instance? Hotstandby patch will allow to read from standby. Is
> this patch integrated in sync replication patch?

No, hot standby and synch rep are independent patches at present, so you
cannot issue any queries to the standby server during replication.
The progress of replication can be checked via the 'ps' command as follows.
It reports the LSN up to which the standby server has already received and
written (or fsynced) WAL.

------------
[primary] $ pgrep -fl wal
1803 postgres: wal writer process
1830 postgres: wal sender process postgres 127.0.0.1(34604) replicated to: write 0/1F74DD0, flush 0/1F68878

[standby] $ pgrep -fl wal
1828 postgres: wal receiver process   replicated to: write 0/1F74DD0, flush 0/1F68878
------------
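
The two LSNs in this output can also be compared programmatically. The Python sketch below parses the "write X/Y, flush X/Y" part of such a process title and reports how many bytes have been written but not yet flushed. The function names are hypothetical, and the difference is only computed when both LSNs share the same first component, since this sketch makes no assumption about how positions map across log-file boundaries.

```python
import re

# Matches the progress text shown in the process title, even if the
# mail client wrapped the line between "write ...," and "flush ...".
LSN_RE = re.compile(
    r"write ([0-9A-Fa-f]+)/([0-9A-Fa-f]+),\s*flush ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def parse_progress(ps_line: str):
    """Extract the (write, flush) LSNs from a walsender/walreceiver title.
    Each LSN is returned as an (xlogid, xrecoff) pair of integers."""
    m = LSN_RE.search(ps_line)
    if m is None:
        raise ValueError("no replication progress found in %r" % ps_line)
    w_id, w_off, f_id, f_off = (int(g, 16) for g in m.groups())
    return (w_id, w_off), (f_id, f_off)


def unflushed_bytes(write, flush) -> int:
    """Bytes written but not yet flushed, for LSNs in the same log file."""
    if write[0] != flush[0]:
        raise ValueError("LSNs are in different log files")
    return write[1] - flush[1]


if __name__ == "__main__":
    line = ("1828 postgres: wal receiver process   replicated to: "
            "write 0/1F74DD0, flush 0/1F68878")
    print(unflushed_bytes(*parse_progress(line)))
```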

> - I brought down the active instance by executing 'pg_ctl -D
> /home/postgres/postgresHSB/actdata stop' hoping that trigger file will
> enable failover. But I was not able to login to standby instance. Not
> sure why?

Please let me know the failover procedure you carried out. Was it as follows?

1) pg_ctl -D /home/postgres/postgresHSB/actdata stop
2) touch /home/postgres/postgresHSB/finish.trigger

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Mon, Feb 9, 2009 at 6:58 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
>> 1) pg_ctl -D /home/postgres/postgresHSB/actdata stop
>> 2) touch /home/postgres/postgresHSB/finish.trigger
>
> Yes. This the procedure that I followed. I have attached the relevant
> logs.
> "change_standby_mode.log" - Commands used to change from continous
> recovery mode of the standby instance
> "ps.log" - ps command before and after executing the SQL.

Thanks for the information!

---------------------------
[postgres@node1 ~]$ psql -d replication
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
 
---------------------------

I think that your standby postmaster is running under port = 5433, so
please specify "-p 5433".
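
The socket path in the error message explains the failure: the Unix-domain socket file name ends in the port number, so a client using the default port 5432 looks for a different file than a server listening on 5433 creates. A tiny Python sketch of that relationship follows; the function name is made up, and "/tmp" is simply the directory shown in the error above.

```python
def unix_socket_path(port: int, socket_dir: str = "/tmp") -> str:
    """Path of the Unix-domain socket a client looks for on a given port."""
    return "%s/.s.PGSQL.%d" % (socket_dir, port)


if __name__ == "__main__":
    print(unix_socket_path(5432))  # what psql tried by default
    print(unix_socket_path(5433))  # what "-p 5433" would make it try
```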

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

On Mon, Feb 9, 2009 at 10:39 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> But after I login to replication database (note the active I had brought
> it down earlier & created a finish.trigger), I still cannot see the
> table that was created on the primary.
> Also please note that the LSN had changed after replication in the ps
> command.

Did you create the table in the 'replication' database? If not, please
connect to the database which actually contains the table.
In log-shipping, the database objects are basically identical
between the primary and the standby server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

Now, the active and standby databases are up and running even after the
execution of the SQL (create table). What was the problem?

But when I logged in to the standby instance by executing 'psql -d
replication', I did not see the table that was created on the primary.
I have a few questions:

- I am not sure whether replication has happened and I am simply unable
to view it. Will I be able to view the replicated data by logging in to
the standby instance? The Hot Standby patch will allow reads from the
standby. Is this patch integrated in the sync replication patch?

- I brought down the active instance by executing 'pg_ctl -D
/home/postgres/postgresHSB/actdata stop', hoping that the trigger file
would enable failover. But I was not able to log in to the standby
instance. I am not sure why.

regards,
Niranjan

Attachment

Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,


> > - I brought down the active instance by executing 'pg_ctl -D
> > /home/postgres/postgresHSB/actdata stop' hoping that trigger file will
> > enable failover. But I was not able to login to standby instance. Not
> > sure why?
>
> Please let me know the failover procedure which you carried
> out. As follows?
>
> 1) pg_ctl -D /home/postgres/postgresHSB/actdata stop
> 2) touch /home/postgres/postgresHSB/finish.trigger

Yes. This is the procedure that I followed. I have attached the relevant
logs.
"change_standby_mode.log" - Commands used to take the standby instance
out of continuous recovery mode
"ps.log" - ps command output before and after executing the SQL.

Regards,
Niranjan

Attachment

Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

> [postgres@node1 ~]$ psql -d replication
> psql: could not connect to server: No such file or directory
>         Is the server running locally and accepting
>                 connections on Unix domain socket
> "/tmp/.s.PGSQL.5432"?
> ---------------------------
>
> I think that your standby postmaster is running under port =
> 5433, so please specify "-p 5433".

It was silly that I missed this. :-(

But after logging in to the replication database (note that I had brought
down the active instance earlier and created a finish.trigger), I still
cannot see the table that was created on the primary.
Also, please note that the LSN shown by the ps command had changed after
replication.

I have attached the logs.

regards,
Niranjan

Attachment

Re: Synch Replication

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

Thanks. Now it works.

A few questions:
1) Do you have an idea of when the Hot Standby patch and the Sync
replication patch will be integrated?
2) I have used one physical machine to try the current synchronous
replication patch. Is it OK for me to try with two separate machines for
the primary and standby servers?
3) Do you have test programs that can be used for synchronous replication
testing?
4) I am thinking of trying load/performance tests as well. What do you
think? Is it too early to do this kind of test?

regards,
Niranjan


Re: Synch Replication

From
Fujii Masao
Date:
Hi Niranjan,

Sorry for this late reply.

On Tue, Feb 10, 2009 at 3:25 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Thanks. Now it works.

Good news :)

> Few questions:
> 1) Do you have an idea by when the Hot standby patch and Sync
> replication patch will be integrated?

I think (hope) that they will be integrated in v8.5.

> 2) I have used 1 physical machine to try current patch of synchronous
> replication. Is it OK for me to try with 2 separate machines for Primary
> & Standby servers

Of course. The basic procedure for setting up the synch rep environment
is described in the attached document (log-shipping-record.html).
If you want more detailed information, please refer to the following link
and build the synch rep documentation.

http://www.postgresql.org/docs/current/interactive/docguide-build.html

> 3) Do you have test programs that can used for synchronous replication
> testing?

No, I haven't used an automated test program. Since one would be very useful,
I'll make it before long.

> 4) I'am thinking of trying load/performance tests as well. What do you
> feel? Will it be too early to do this test?

Any kind of testing is welcome!

I have attached the patch which fixes the problem you reported.
The source code which I uploaded before writes a lot of log output,
which would harm performance, so please try this new patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Synch Replication

From
Alvaro Herrera
Date:
Fujii Masao escribió:

I noticed two very minor issues while reading your docs:

>        This is because WAL files generated in the primary server before this built-in
>        replication starts have to be transferred to the standby server by
>        using file-based log shipping. When archive_mode is unsent,

You probably mean "unset" here.

> enable_replication (boolean)

It has been said that variables that enable/disable features should only
be named after the feature that they affect, omitting the "enable" verb.
So in this case it should be set as "replication=off" or
"replication=on".

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Synch Replication

From
Fujii Masao
Date:
Hi,

Thanks for the comments!

On Fri, Feb 13, 2009 at 5:00 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Fujii Masao escribió:
>
> I noticed two very minor issues while reading your docs:
>
>>        This is because WAL files generated in the primary server before this built-in
>>        replication starts have to be transferred to the standby server by
>>        using file-based log shipping. When archive_mode is unsent,
>
> You probably mean "unset" here.

I do mean "unsent" here. It is one of the valid values of archive_mode,
which I changed for synch-rep. Please see the description of
archive_mode in the attached document.

>
>> enable_replication (boolean)
>
> It has been said that variables that enable/disable features should only
> be named after the feature that they affect, omitting the "enable" verb.
> So in this case it should be set as "replication=off" or
> "replication=on".

Okay, I will rename the parameter as you suggest. Thanks!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Synch Replication

From
"Czichy, Thoralf (NSN - FI/Helsinki)"
Date:
hi,

[I am working in the same team as Niranjan]

Niranjan wrote:
> > 3) Do you have test programs that can used
> > for synchronous replication testing?
>
> No, I've not used the automated test program. Yeah, since
> it's very useful, I'll make it before long.
>
> > 4) I'am thinking of trying load/performance tests as well.
> > What do you feel? Will it be too early to do this test?
>
> Any kinds of testing welcome!

Actually, this is just to let you know that for _stability_ and
performance tests we use the "Network Database Benchmark" which
we open-sourced (GPLv2) in 2006. Just recently one of our
colleagues wrote a _small_ patch that makes it work out of the
box with _PostgreSQL_/UnixODBC. The patch is now also available.

The main project page(s):
http://hoslab.cs.helsinki.fi/savane/projects/ndbbenchmark/
http://hoslab.cs.helsinki.fi/homepages/ndbbenchmark/

The patch:
http://hoslab.cs.helsinki.fi/savane/cookbook/?func=detailitem&item_id=141

The benchmark models a Telco home location register (HLR)
application with lots of short read/write transactions whose
ratio can be adjusted on the command line, e.g. to model read
or write heavy transaction loads. We'll re-use this benchmark
as we have lots of existing measurements for other databases.
Also we have a pretty good understanding of what to expect
performance-wise with the different transaction mixes. The
actual benchmark specification is available from here

The benchmark spec:
http://hoslab.cs.helsinki.fi/downloads/ndbbenchmark/Network_Database_Benchmark_Definition_2006-02-01.pdf

Thoralf


Synch Replication: Synchronization of files between Primary & Standby

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Hi,

I am starting a new thread on the synchronization of data files, WAL,
etc. between the primary and standby servers in the Synchronous
replication patch.

Use case: the primary and standby are out of sync due to network
problems.

The existing handling is to prepare the standby by
1) Deleting $PGDATA on the standby
2) Making a fresh base backup of the primary and loading this data onto
the standby
3) Setting up the necessary configuration (e.g. recovery) and starting
the standby server

Regarding the earlier discussion at the link below (point 2, related to
a direct connection between primary and standby), I think we still need
to work towards a conclusion on what will be done.
http://archives.postgresql.org/pgsql-hackers/2009-02/msg01160.php

One issue to be addressed is also the usability of the solution for the
use case mentioned above.

For synchronization of files over a direct connection, there were
suggestions to consider VLDB cases too. Do you already have some ideas
that are being implemented? We can kick-start the discussion so as to
settle on a possible solution.

Regards,
Niranjan


Re: Synch Replication: Synchronization of files between Primary & Standby

From
Fujii Masao
Date:
Hi,

On Wed, Apr 22, 2009 at 9:21 PM, K, Niranjan (NSN - IN/Bangalore)
<niranjan.k@nsn.com> wrote:
> Starting a new thread related to synchronization of the data files, WAL
> etc.. between Primary and standby servers in Synchronous replication
> patch.
>
> Use case: Whenever the primary and standby are out of sync due to
> network problems.
>
> Existing handling is to prepare the standby by
> 1) Deleting the $PGDATA on standby
> 2) Make a fresh base backup of the primary and load this data to the
> standby
> 3) Setup the necessary configurations (ex. recovery) and start the
> standby server
>
> In the earlier discussions, please check the link (point 2 related to
> direct connection between primary and standby), i think we still need to
> work to conclude on what will be done.
> http://archives.postgresql.org/pgsql-hackers/2009-02/msg01160.php

I'm now implementing the capability to transfer xlog-related files
(i.e. xlog segment files, backup history files and timeline history files).
It is used when files are missing on the standby: they are then
automatically copied from the primary. As for usability, you will no
longer need to configure warm standby for Synch Rep before starting
the standby.

I'll present the detailed design before very long.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center