Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections - Mailing list pgsql-general

From John R Pierce
Subject Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections
Date
Msg-id 4E930F6D.4010806@hogranch.com
In response to Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections  (Sean Laurent <sean@studyblue.com>)
Responses Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-general
On 10/06/11 10:21 AM, Sean Laurent wrote:
> We've been running into a particularly strange problem that I'm trying
> to better understand. The super short version is that our application
> servers lose their connection to the database when I run a backup
> during periods of higher load and fail to reconnect.
>
> Here's an overview of the setup:
>
> - PostgreSQL 9.0.1 hosted on a cc1.4xlarge Amazon EC2 instance running
> CentOS 5.6
> - 8 disk RAID-0 array of EBS volumes used for primary data storage
> - 4 disk RAID-0 array of EBS volumes used for transaction logs
> - Root partition is ext3
> - RAID arrays are xfs
>
> Backups are taken using a script that runs the following workflow:
>
> - Tell Postgres to start a backup: SELECT pg_start_backup('RAID backup');
> - Run "xfs_freeze" on the primary RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the primary RAID array
> - Run "xfs_freeze" on the transaction log RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the transaction log RAID array
> - Tell Postgres the backup is finished: SELECT pg_stop_backup();
> - Remove old WAL files
>
> The whole process takes roughly 7 seconds on average. The RAID arrays
> are frozen for roughly 2 seconds on average.
>

While xfs_freeze is in effect, all writes are blocked. This is NOT what
you want to do here. Postgres does NOT expect you to take an atomic
snapshot of the database files. Rather, by bracketing your backup with
pg_start_backup and pg_stop_backup, it puts things in a state where a
file-by-file backup will be fine.
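
To make the bracketing idea concrete, here is a minimal sketch of a file-level backup that never freezes the filesystem. It is only an illustration under stated assumptions: run_sql is a hypothetical callable that executes one SQL statement against the server (for example a thin wrapper around psql or a database driver), data_dir and dest_dir are placeholder paths, and WAL archiving is assumed to be configured already, since the WAL written during the copy is needed to restore from it.

```python
import shutil
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def backup_mode(run_sql: Callable[[str], None], label: str) -> Iterator[None]:
    """Bracket a file-level copy with pg_start_backup / pg_stop_backup."""
    # Double any single quotes so the label is a valid SQL string literal.
    quoted = label.replace("'", "''")
    run_sql("SELECT pg_start_backup('%s');" % quoted)
    try:
        yield
    finally:
        # Always leave backup mode, even if the copy fails part-way.
        run_sql("SELECT pg_stop_backup();")


def file_level_backup(
    run_sql: Callable[[str], None],
    data_dir: str,
    dest_dir: str,
    label: str = "RAID backup",
    copy_tree: Callable[[str, str], object] = shutil.copytree,
) -> None:
    """Copy the data directory while the server stays up and writable.

    run_sql is a hypothetical hook supplied by the caller. copy_tree
    defaults to shutil.copytree purely for illustration; on a busy
    server a tool that tolerates files changing during the copy
    (tar or rsync, say) is likely a better fit.
    """
    with backup_mode(run_sql, label):
        copy_tree(data_dir, dest_dir)
```

The point of the context manager is that pg_stop_backup() still runs if the copy raises, so the server is not left in backup mode. No xfs_freeze call appears anywhere, so Postgres can keep writing throughout.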

From the xfs_freeze man page:

    xfs_freeze halts new access to the filesystem and creates a stable
    image on disk. xfs_freeze is intended to be used with volume
    managers and hardware RAID devices that support the creation of
    snapshots.

    The mount-point argument is the pathname of the directory where the
    filesystem is mounted. The filesystem must be mounted to be frozen
    (see mount(8) <http://linux.die.net/man/8/mount>).

    The -f flag requests the specified XFS filesystem to be frozen from
    new modifications. When this is selected, all ongoing transactions
    in the filesystem are allowed to complete, new write system calls
    are halted, other calls which modify the filesystem are halted, and
    all dirty data, metadata, and log information are written to disk.
    Any process attempting to write to the frozen filesystem will block
    waiting for the filesystem to be unfrozen.


When Postgres's writer processes block, I suspect things go sour fast.




--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast

