Thread: Hot standby v5 patch - Databases created post backup remain inaccessible + replica SIGSEGV when coming out of standby

Another corner case:

1/ Setup master and replica with replica using pg_standby
2/ Create a new database (I used 'bench')
3/ Initialize the pgbench schema of size 100 in 'bench' (just to ensure 
the logs with the db creation get archived)
4/ Attempt to connect to 'bench' on the replica

Head from 2nd Nov with v5 patch applied, on FreeBSD 7.1-Prerelease as 
usual....
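
The steps above could be scripted roughly as follows. This is a sketch
only: the port numbers are hypothetical placeholders (the report does
not state them), the function merely builds the client command lines
rather than running them, and it assumes the master/replica pair with
pg_standby is already set up.

```python
# Sketch of the reproduction as client command lines.
# Ports are hypothetical placeholders; the report does not say which
# ports the master and replica were listening on.

def repro_commands(master_port, replica_port, dbname="bench", scale=100):
    """Return the argv lists for the reproduction steps, in order."""
    return [
        # Create the new database on the master.
        ["createdb", "-p", str(master_port), dbname],
        # Initialize a pgbench schema so that WAL containing the
        # database creation gets archived and shipped to the replica.
        ["pgbench", "-i", "-s", str(scale), "-p", str(master_port), dbname],
        # Try to connect to the new database on the replica.
        ["psql", "-p", str(replica_port), "-d", dbname, "-c", "SELECT 1"],
    ]


if __name__ == "__main__":
    for argv in repro_commands(master_port=5432, replica_port=5433):
        print(" ".join(argv))
```

Each list can be passed to subprocess.run() once the placeholder ports
are replaced with the real ones.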


postgres=# \l
                                     List of databases
   Name    |  Owner   | Encoding  | Collation | Ctype |          Access Privileges
-----------+----------+-----------+-----------+-------+-------------------------------------
 bench     | postgres | SQL_ASCII | C         | C     |
 postgres  | postgres | SQL_ASCII | C         | C     |
 template0 | postgres | SQL_ASCII | C         | C     | {=c/postgres,postgres=CTc/postgres}
 template1 | postgres | SQL_ASCII | C         | C     | {=c/postgres,postgres=CTc/postgres}
(4 rows)

postgres=# \c bench
FATAL:  database "bench" does not exist
Previous connection kept


Not sure if this is related at all, but if the replica is then 
instructed to finish recovery by touching its trigger file, we get:

DEBUG:  executing restore command "pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5439 /data0/pgarchive/8.4 00000001.history pg_xlog/RECOVERYHISTORY 000000000000000000000000 2>>standby.log"
DEBUG:  could not restore file "00000001.history" from archive: return code 0
DEBUG:  moving last restored xlog to "pg_xlog/000000020000000000000068"
LOG:  archive recovery complete
DEBUG:  Clear UnobservedXids
LOG:  clearing recovery locks
DEBUG:  reaping dead processes
LOG:  startup process (PID 4254) was terminated by signal 11: Segmentation fault
LOG:  aborting startup due to startup process failure
DEBUG:  proc_exit(1)
DEBUG:  shmem_exit(1)
DEBUG:  exit(1)

Using gdb:
#0  RelationClearRecoveryLocks () at inval.c:1702
1702            xl_rel_lock *lock = (xl_rel_lock *) lfirst(l);
(gdb) bt
#0  RelationClearRecoveryLocks () at inval.c:1702
#1  0x080d3849 in StartupXLOG () at xlog.c:5959
#2  0x080f1680 in AuxiliaryProcessMain (argc=2, argv=0xbfbfe6e8) at bootstrap.c:421
#3  0x08214d4d in StartChildProcess (type=StartupProcess) at postmaster.c:4104
#4  0x0821725b in PostmasterMain (argc=1, argv=0xbfbfec50) at postmaster.c:1034
#5  0x081bfa7b in main (argc=1, argv=0xbfbfec50) at main.c:188


regards

Mark


On Tue, 2008-11-04 at 18:33 +1300, Mark Kirkwood wrote:
> Another corner case:
> 
> 1/ Setup master and replica with replica using pg_standby
> 2/ Create a new database (I used 'bench')
> 3/ Initialize the pgbench schema of size 100 in 'bench' (just to ensure 
> the logs with the db creation get archived)
> 4/ Attempt to connect to 'bench' on the replica
> 
> Head from 2nd Nov with v5 patch applied, on FreeBSD 7.1-Prerelease as 
> usual....

Case acknowledged.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



On Tue, 2008-11-04 at 18:33 +1300, Mark Kirkwood wrote:

> postgres=# \l
>                                      List of databases
>    Name    |  Owner   | Encoding  | Collation | Ctype |          Access Privileges
> -----------+----------+-----------+-----------+-------+-------------------------------------
>  bench     | postgres | SQL_ASCII | C         | C     |
>  postgres  | postgres | SQL_ASCII | C         | C     |
>  template0 | postgres | SQL_ASCII | C         | C     | {=c/postgres,postgres=CTc/postgres}
>  template1 | postgres | SQL_ASCII | C         | C     | {=c/postgres,postgres=CTc/postgres}
> (4 rows)
> 
> postgres=# \c bench
> FATAL:  database "bench" does not exist
> Previous connection kept

CREATE DATABASE didn't trigger the db flat file update. The code for
that existed and was triggered in the other cases where a transaction
would normally rebuild the flat files. Simple fix, but a stupid oversight.

Spotted another problem: the flat files may not be built consistently
if a rebuild (BuildFlatFile) is triggered before we reach the recovery
consistency point. This is fixed by forcing a rebuild of the flat files
when we hit the consistency point.

Both are one-line changes, but I'll go looking for other issues there.
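
To make the two fixes concrete, here is a toy model of the behaviour
described above. It is not PostgreSQL code: every class and method name
is hypothetical, and it assumes that \l reads the catalog while the
connection path looks the database up in the flat file, which is how
the symptom in the report would arise.

```python
# Toy model of the two fixes. All names are hypothetical and do not
# correspond to actual PostgreSQL functions or WAL record types.

class ReplayModel:
    def __init__(self):
        # Catalog contents, as listed by \l in this model.
        self.databases = {"postgres", "template0", "template1"}
        # Flat-file copy, as consulted when connecting in this model.
        self.flat_file = set(self.databases)
        # Has replay reached the recovery consistency point yet?
        self.consistent = False
        # Was the most recent rebuild done on consistent data?
        self.flat_file_trusted = False

    def rebuild_flat_file(self):
        self.flat_file = set(self.databases)
        self.flat_file_trusted = self.consistent

    def replay_create_database(self, name, trigger_rebuild=True):
        self.databases.add(name)
        if trigger_rebuild:
            # Fix 1: replaying CREATE DATABASE triggers the update.
            self.rebuild_flat_file()

    def reach_consistency_point(self):
        self.consistent = True
        # Fix 2: force a rebuild once the data is consistent, in case
        # an earlier rebuild ran against inconsistent data.
        self.rebuild_flat_file()

    def can_connect(self, name):
        return name in self.flat_file


if __name__ == "__main__":
    buggy = ReplayModel()
    buggy.reach_consistency_point()
    buggy.replay_create_database("bench", trigger_rebuild=False)
    print("listed:", "bench" in buggy.databases)
    print("connectable:", buggy.can_connect("bench"))
```

With trigger_rebuild=False the model lists 'bench' but refuses the
connection, matching the reported symptom; with the default it accepts.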

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



On Tue, 2008-11-04 at 09:52 +0000, Simon Riggs wrote:

> > postgres=# \c bench
> > FATAL:  database "bench" does not exist
> > Previous connection kept
> 
> CREATE DATABASE didn't trigger the db flat file update. The code for
> that existed and was triggered in the other cases where a transaction
> would normally rebuild the flat files. Simple fix, but a stupid oversight.

Issue resolved.

> Spotted another problem: the flat files may not be built consistently
> if a rebuild (BuildFlatFile) is triggered before we reach the recovery
> consistency point. This is fixed by forcing a rebuild of the flat files
> when we hit the consistency point.

Issue resolved.

> Both are one-line changes, but I'll go looking for other issues there.

I also mentioned previously that I hadn't implemented locking yet during
flat file updates. After spending longer looking at the code around this
I no longer think it is required.

These changes will be rolled into the next patch version, soon.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Simon Riggs wrote:
> On Tue, 2008-11-04 at 18:33 +1300, Mark Kirkwood wrote:
>
>   
>> postgres=# \l
>>                                      List of databases
>>    Name    |  Owner   | Encoding  | Collation | Ctype |          Access Privileges
>> -----------+----------+-----------+-----------+-------+-------------------------------------
>>  bench     | postgres | SQL_ASCII | C         | C     |
>>  postgres  | postgres | SQL_ASCII | C         | C     |
>>  template0 | postgres | SQL_ASCII | C         | C     | {=c/postgres,postgres=CTc/postgres}
>>  template1 | postgres | SQL_ASCII | C         | C     | {=c/postgres,postgres=CTc/postgres}
>> (4 rows)
>>
>> postgres=# \c bench
>> FATAL:  database "bench" does not exist
>> Previous connection kept
>>     
>
> CREATE DATABASE didn't trigger the db flat file update. The code for
> that existed and was triggered in the other cases where a transaction
> would normally rebuild the flat files. Simple fix, but a stupid oversight.
>
> Spotted another problem: the flat files may not be built consistently
> if a rebuild (BuildFlatFile) is triggered before we reach the recovery
> consistency point. This is fixed by forcing a rebuild of the flat files
> when we hit the consistency point.
>
> Both are one-line changes, but I'll go looking for other issues there.
>
>   
Patching with v5d lets me access the newly created database. Another 
one down!

Cheers

Mark