[BUGS] BUG #14555: EBUSY error on read() on NFS - Mailing list pgsql-bugs
From | ashwath.rao@altair.com |
---|---|
Subject | [BUGS] BUG #14555: EBUSY error on read() on NFS |
Date | |
Msg-id | 20170220030121.1265.79953@wrigleys.postgresql.org |
Responses | Re: [BUGS] BUG #14555: EBUSY error on read() on NFS (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 14555
Logged by: Ashwath Rao
Email address: ashwath.rao@altair.com
PostgreSQL version: 9.3.6
Operating system: SLES11SP4
Description:

We use Postgres version 9.3.6 with our product PBS Professional. We're having an issue that seems to have cropped up at one of our customers because of the filesystem used for the datastore, but that was only exposed by the PostgreSQL update.

The datastore sometimes does not work. When we actually try to dump the datastore, we get messages very similar to the pg_log messages:

# /opt/pbs/default/pgsql/bin/pg_dump -U <USER> -p 15007 pbs_datastore > pbs_datastore_14022017.sql
Password:
pg_dump: Dumping the contents of table "job_attr" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: could not read block 69600 in file "base/16384/16555": Device or resource busy
pg_dump: The command was: COPY pbs.job_attr (ji_jobid, attr_name, attr_resource, attr_value, attr_flags) TO stdout;

Once we have this, we seem to have errors only on that one file, but EBUSY is _really_ puzzling. It's not one of the valid errno settings for a read() system call.

Bizarrely, problems seem to be reported on a small number of blocks:

# grep ERR /panfs/e/PBS/datastore/pg_log/pbs_dataservice_log.Tue
2017-02-14 08:13:33 UTCERROR: could not read block 69600 in file "base/16384/16555": Device or resource busy
2017-02-14 08:15:59 UTCERROR: could not read block 99298 in file "base/16384/16555": Device or resource busy
2017-02-14 08:37:29 UTCERROR: could not read block 9608 in file "base/16384/16555": Device or resource busy

But it's not consistently the same block:

# /opt/pbs/default/pgsql/bin/pg_dump -U <USER> -p 15007 pbs_datastore > pbs_datastore_17012017_new.sql
Password:
pg_dump: Dumping the contents of table "job_attr" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: could not read block 105740 in file "base/16384/16555": Device or resource busy
pg_dump: The command was: COPY pbs.job_attr (ji_jobid, attr_name, attr_resource, attr_value, attr_flags) TO stdout;

When you're in this state, even the usual error recovery methods don't work reliably:

# $PBS_EXEC/pgsql/bin/psql -U <USER> -p 15007 -d pbs_datastore
Password for user crayadm:
psql (9.3.6)
Type "help" for help.

pbs_datastore=# set search_path to pbs;
SET
pbs_datastore=# set zero_damaged_pages=on;
SET
pbs_datastore=# vacuum full;
ERROR: could not read block 99298 in file "base/16384/16555": Device or resource busy

The file can be read *sequentially* without any issue, though:

dd if=/panfs/e/PBS/datastore/base/16384/16555 of=/tmp/16555
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 10.1065 s, 106 MB/s
xcepbs00:~ # echo $?
0

Last time we had this, just making a tarball and untarring things fixed everything! In other words, the files appeared to be readable sequentially without any issue, but the I/O patterns used by PostgreSQL seemed to cause trouble.

We are trying to find out what can actually return EBUSY. Is this on a read call, or could it be another call? Does the message tell us anything about _where_ in the code the failure occurred?

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
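One way to separate the sequential dd read from block-at-a-time access is to read the reported blocks directly by offset, outside PostgreSQL. The following is a minimal sketch, not a reproduction of the server's exact system-call sequence: it assumes the default 8192-byte block size, assumes the reported block numbers fall inside this first 1 GB segment file (they are all below 131072), and uses os.pread as a stand-in for a positioned read. The names read_block and probe_blocks are hypothetical helpers introduced for illustration.

```python
import errno
import os

# Assumption: the server was built with PostgreSQL's default 8 kB block size.
BLOCK_SIZE = 8192


def read_block(fd, block_number, block_size=BLOCK_SIZE):
    """Read one block at its byte offset (block_number * block_size)."""
    return os.pread(fd, block_size, block_number * block_size)


def probe_blocks(path, block_numbers, block_size=BLOCK_SIZE):
    """Try each block once and return {block_number: status}.

    status is "ok", "short read (N bytes)", or the symbolic errno name
    (for example "EBUSY") if the read raised OSError.
    """
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        for block in block_numbers:
            try:
                data = read_block(fd, block, block_size)
            except OSError as exc:
                # Map the numeric errno to its name; fall back to the number.
                results[block] = errno.errorcode.get(exc.errno, str(exc.errno))
            else:
                if len(data) == block_size:
                    results[block] = "ok"
                else:
                    results[block] = f"short read ({len(data)} bytes)"
    finally:
        os.close(fd)
    return results


# Hypothetical usage with the path and block numbers from the report:
# probe_blocks("/panfs/e/PBS/datastore/base/16384/16555", [69600, 99298, 9608, 105740])
```

If the same blocks return EBUSY here, the error is coming from the filesystem for positioned reads in general rather than from anything specific to the server; if they all read cleanly, the failure may depend on timing or on concurrent access that this probe does not reproduce.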