Thread: Problems backing up

Problems backing up

From
Sebastien Boisvert
Date:

Hi all,

We have an OS X app which integrates postgres as its database backend, and recently we've had a couple of cases where
users haven't been able to perform a backup of their database. The failure gets reported as a problem in a table
("largedata") where we store large binary objects in a bytea column. Normally when this happens it's due to database
corruption, and we delete/fix the affected data to repair the database. However, in these recent cases, it appears that
database corruption is not the issue.

When we run into these backup problems we use a tool we've created that fetches all data from each row of that table
to verify it; in the corruption cases there is always one or more rows that can't be fetched, but in these cases all
data can be fetched, so it doesn't appear to be a corruption issue.
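In essence the tool does the equivalent of the following psql loop (just a sketch; the real tool goes through our
app's driver, and the database name here is a placeholder):

    # read every row's bytea value in full; md5() forces a complete detoast
    # without printing 100+ MB of binary to the terminal
    psql -At -d mydb -c "SELECT _rowid FROM public.largedata" | \
    while read id; do
        psql -d mydb -c "SELECT md5(rawdata) FROM public.largedata WHERE _rowid = '$id'" \
            > /dev/null || echo "row $id could not be fetched"
    done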

The backup is done using pg_dump, and the logs show the failure details below; prior to the failure I've seen
real/virtual memory balloon past 1GB.

While one of these problem databases is itself quite large (about 30 GB), the data in each row isn't too large (the
biggest object was 138871354 bytes, and our application can read it from the database just fine).

I've attached the details of the error(s) and the pg/table configuration below. I'm hoping someone can point us in the
right direction, as I'm not familiar enough with the intricacies and low-level implementation details of postgres to
know what to look for or what additional information to examine to determine the cause.

Error during backup:

(31971) malloc: *** mmap(size=1073745920) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
TopMemoryContext: 129928 total in 15 blocks; 9600 free (19 chunks); 120328 used
  Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
  TopTransactionContext: 8192 total in 1 blocks; 7776 free (0 chunks); 416 used
  MessageContext: 8192 total in 1 blocks; 5008 free (1 chunks); 3184 used
  smgr relation table: 24576 total in 2 blocks; 11952 free (4 chunks); 12624 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
    PortalHeapMemory: 536951860 total in 6 blocks; 11016 free (12 chunks); 536940844 used
      COPY TO: 564491844 total in 3 blocks; 8120 free (5 chunks); 564483724 used
  Relcache by OID: 24576 total in 2 blocks; 15584 free (3 chunks); 8992 used
  CacheMemoryContext: 4337488 total in 23 blocks; 1934512 free (3 chunks); 2402976 used
    pg_toast_30058_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_toast_29648_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_description_o_c_o_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_shdescription_o_c_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_database_datname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_trigger_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_trigger_tgconstrname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_constraint_contypid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_constraint_conrelid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attrdef_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_cast_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_rewrite_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_depend_reference_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_tablespace_spcname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_roles: 7168 total in 3 blocks; 2824 free (0 chunks); 4344 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
    pg_language_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
  MdSmgr: 8192 total in 1 blocks; 1096 free (0 chunks); 7096 used
  LOCALLOCK hash: 57344 total in 3 blocks; 33496 free (8 chunks); 23848 used
  Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
  ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2010-01-26 22:26:23.775 EST [PoetProductionsMain] - ERROR:  out of memory
2010-01-26 22:26:23.781 EST [PoetProductionsMain] - DETAIL:  Failed on request of size 1073741823.
2010-01-26 22:26:23.781 EST [PoetProductionsMain] - STATEMENT:  COPY public.largedata (_timestamp, createdate, _rowid, rawdata, plaintext, isplaintext, storesplaintextrepresentation, deletionuserid, deletiondate, deletionidentifier) TO stdout;

PG Version: 8.3.5 (with a custom patch to deal with OS X's default shared mem value)

Platform: Mac OS X (10.5 & 10.6)

ulimit reports 'unlimited'

Config file has these values - tried different (bigger) values to no avail:
max_connections = 20                    # (change requires restart)
unix_socket_directory = '/tmp/DL-501'   # (change requires restart)
unix_socket_permissions = 0700          # begin with 0 to use octal notation
shared_buffers = 2400kB                 # min 128kB or max_connections*16kB
max_fsm_pages = 20000                   # min max_fsm_relations*16, 6 bytes each
max_files_per_process = 100             # min 25
checkpoint_segments = 10                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min              # range 30s-1h
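In case it matters, the values the running server actually uses can be confirmed from psql (database name is a
placeholder):

    psql -d mydb -c "SELECT name, setting, unit, source
                       FROM pg_settings
                      WHERE name IN ('shared_buffers', 'max_connections',
                                     'work_mem', 'maintenance_work_mem');"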



Re: Problems backing up

From
Tom Lane
Date:
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> [ COPY fails to dump a 138MB bytea column ]

If you can't switch to a 64-bit build of Postgres, you might need to
think about converting those byteas to large objects.  It's expected for
COPY to require memory space equal to several times the width of the row
it's trying to dump, because it has to have multiple copies of the row
data in order to do format conversion.  In this case the required space
seems to be going well past 1GB, so I wonder whether you are doing
anything that exacerbates the memory requirement, for instance by
forcing an encoding conversion to something other than the database's
server_encoding.
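As a very rough sketch (untested; assumes plpgsql is installed, and the table/column names are just taken from the
COPY statement in your log), that conversion could look like:

    psql -d mydb <<'SQL'
    -- helper: copy a bytea value into a new large object and return its OID
    CREATE OR REPLACE FUNCTION bytea_to_lo(data bytea) RETURNS oid AS $$
    DECLARE
        loid oid;
        fd   integer;
    BEGIN
        loid := lo_creat(-1);
        fd   := lo_open(loid, 131072);   -- 131072 = INV_WRITE
        PERFORM lowrite(fd, data);
        PERFORM lo_close(fd);
        RETURN loid;
    END;
    $$ LANGUAGE plpgsql;

    ALTER TABLE public.largedata ADD COLUMN rawdata_oid oid;
    UPDATE public.largedata SET rawdata_oid = bytea_to_lo(rawdata)
     WHERE rawdata IS NOT NULL;
    SQL

Large objects get dumped in chunks rather than as one huge escaped literal, so the dump doesn't need to hold the
whole value in memory at once.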

But having said that, it seems a bit odd that it is failing at what
seems to be only around 2GB of memory demand.  Even in a 32-bit build
I'd expect there to be enough address space for that, especially in
view of your very small (arguably too small) shared_buffers setting.
Are you sure the postmaster is really being started with 'unlimited'
ulimits?
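It might be worth checking the limits in the context that actually launches the server, not just an interactive
shell; on OS X that could look something like this (the data directory path is a placeholder):

    # resource limits in the launchd context
    sudo launchctl limit

    # and in the shell/script that starts the server, just before starting it
    ulimit -a
    pg_ctl -D /path/to/data start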

            regards, tom lane

Re: Problems backing up

From
Sebastien Boisvert
Date:
----- Original Message ----
> From: Tom Lane <tgl@sss.pgh.pa.us>
>> [ COPY fails to dump a 138MB bytea column ]

> I wonder whether you are doing anything that exacerbates
> the memory requirement, for instance by forcing an encoding conversion to
> something other than the database's server_encoding.

Our backups are done with "-F c" (in addition to the normal user/host/port options). As far as I know that shouldn't be
triggering any type conversions (which is UTF8 all-around). If you still think that might be the case, is there a way
to force it _not_ to do the conversion?
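For reference, the dump is invoked essentially like this (host/user/database names are placeholders); if pinning the
client encoding explicitly would help, I assume -E is the way to do it:

    pg_dump -h localhost -p 5432 -U appuser -F c -E UTF8 \
            -f backup.dump appdb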

The restores are done using the same option. We've recently hit a similar problem where restoring a backup fails with
the same type of out-of-memory error. Backing up that database works fine, however, as does reading all data from it.



Re: Problems backing up

From
Tom Lane
Date:
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> ----- Original Message ----
>> From: Tom Lane <tgl@sss.pgh.pa.us>
>>> [ COPY fails to dump a 138MB bytea column ]

>> I wonder whether you are doing anything that exacerbates
>> the memory requirement, for instance by forcing an encoding conversion to
>> something other than the database's server_encoding.

> Our backups are done with "-F c" (in addition to the normal user/host/port options). As far as I know that shouldn't be
> triggering any type conversions (which is UTF8 all-around).

No, that wouldn't do it.  I'm thinking about forcing client_encoding to
be different from the database encoding, via pg_dump's -E switch, or
indirectly via PGCLIENTENCODING or PGOPTIONS or a user-specific
client_encoding setting (there are some other ways to set
client_encoding too).  You could double check that by seeing what
client_encoding is specified in pg_dump's output.
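With a custom-format archive, one way to do that is to turn it back into a script and look at the SET commands near
the top, e.g.:

    pg_restore backup.dump | head -n 20 | grep -i client_encoding
    # a UTF8 dump should show something like:
    #   SET client_encoding = 'UTF8';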

            regards, tom lane