Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections - Mailing list pgsql-general
From | John R Pierce |
---|---|
Subject | Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections |
Date | |
Msg-id | 4E930F6D.4010806@hogranch.com |
In response to | Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections (Sean Laurent <sean@studyblue.com>) |
Responses |
Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections
|
List | pgsql-general |
On 10/06/11 10:21 AM, Sean Laurent wrote:
> We've been running into a particularly strange problem that I'm trying
> to better understand. The super short version is that our application
> servers lose their connection to the database when I run a backup
> during periods of higher load and fail to reconnect.
>
> Here's an overview of the setup:
>
> - PostgreSQL 9.0.1 hosted on a cc1.4xlarge Amazon EC2 instance running
> CentOS 5.6
> - 8 disk RAID-0 array of EBS volumes used for primary data storage
> - 4 disk RAID-0 array of EBS volumes used for transaction logs
> - Root partition is ext3
> - RAID arrays are xfs
>
> Backups are taken using a script that runs the following workflow:
>
> - Tell Postgres to start a backup: SELECT pg_start_backup('RAID backup');
> - Run "xfs_freeze" on the primary RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the primary RAID array
> - Run "xfs_freeze" on the transaction log RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the transaction log RAID array
> - Tell Postgres the backup is finished: SELECT pg_stop_backup();
> - Remove old WAL files
>
> The whole process takes roughly 7 seconds on average. The RAID arrays
> are frozen for roughly 2 seconds on average.
>

While xfs_freeze is in effect, all writes are blocked. This is NOT what you want to do here, postgres does NOT expect you to take an atomic snapshot of the database files, rather, by bracketing your backup with pg_start_backup and pg_stop_backup, it puts things in a state where a file by file backup will be fine.

from the man pages...

    xfs_freeze halts new access to the filesystem and creates a stable
    image on disk. xfs_freeze is intended to be used with volume
    managers and hardware RAID devices that support the creation of
    snapshots.

    The mount-point argument is the pathname of the directory where the
    filesystem is mounted. The filesystem must be mounted to be frozen
    (see mount <http://linux.die.net/man/8/mount>(8)).

    The -f flag requests the specified XFS filesystem to be frozen from
    new modifications. When this is selected, all ongoing transactions
    in the filesystem are allowed to complete, new write system calls
    are halted, other calls which modify the filesystem are halted, and
    all dirty data, metadata, and log information are written to disk.
    Any process attempting to write to the frozen filesystem will block
    waiting for the filesystem to be unfrozen.

when postgres's writer processes block, I suspect things go sour fast.

-- 
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
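
For illustration only, here is a minimal Python sketch of the bracketing described above: start the backup, copy the files with whatever tool you like, then stop the backup, with no xfs_freeze involved. It is not from the original post. The function name bracketed_backup, its parameters, and the idea of driving psql through subprocess are all assumptions made for the example, and it assumes the 9.0-era pg_start_backup / pg_stop_backup functions named in the thread, with psql able to connect using its default settings.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> None:
    """Default runner: execute a command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def bracketed_backup(
    copy_cmd: Sequence[str],
    label: str = "RAID backup",
    runner: Callable[[Sequence[str]], None] = run,
) -> None:
    """Run copy_cmd between pg_start_backup and pg_stop_backup.

    copy_cmd is a hypothetical, caller-supplied file-level copy (for
    example an rsync or tar invocation). Nothing here freezes the
    filesystem, so postgres can keep writing while the copy runs.
    """
    # Double any single quotes so the label is a valid SQL string literal.
    safe_label = label.replace("'", "''")
    start = ["psql", "-c", f"SELECT pg_start_backup('{safe_label}');"]
    stop = ["psql", "-c", "SELECT pg_stop_backup();"]

    runner(start)
    try:
        runner(list(copy_cmd))
    finally:
        # Always leave backup mode, even if the copy step failed.
        runner(stop)
```

A call might look like bracketed_backup(["rsync", "-a", "/path/to/data/", "/path/to/backup/"]), where both paths are placeholders. Copy tools can report warnings when files change underneath them, so how to treat a non-zero exit from the copy step is left to the caller.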