Thread: Losing data
I have had a serious loss of data and wondered if anyone could shed any light on what may have happened. My users have been writing reports on students. No error messages have been produced and when called back up the reports seem to be present at the time of writing. However, next day they have disappeared, and they do not appear in a pg_dump. They seem to have been kept in memory and never written to disk. We are using Zope and connecting to Postgres through psycopg on Centos 5. I suspect a hard disk failure but any other ideas would be welcome. Would these reports be in the WAL? regards garry
On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > I have had a serious loss of data and wondered if anyone could shed any light > on what may have happened. > My users have been writing reports on students. No error messages have been > produced and when called back up the reports seem to be present at the time > of writing. However, next day they have disappeared, and they do not appear > in a pg_dump. They seem to have been kept in memory and never written to > disk. > We are using Zope and connecting to Postgres through psycopg on Centos 5. > I suspect a hard disk failure but any other ideas would be welcome. > Would these reports be in the WAL? If it was hardware related you would know, quickly. This sounds a great deal more like an application level interaction. Perhaps your zope application caches things for a while before committing to disk? Are you familiar with the actual database itself? Have you checked the database logs for errors? Sincerely, Joshua D. rae
On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: > On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > > I have had a serious loss of data and wondered if anyone could shed any > > light on what may have happened. > > My users have been writing reports on students. No error messages have > > been produced and when called back up the reports seem to be present at > > the time of writing. However, next day they have disappeared, and they do > > not appear in a pg_dump. They seem to have been kept in memory and never > > written to disk. > > We are using Zope and connecting to Postgres through psycopg on Centos 5. > > I suspect a hard disk failure but any other ideas would be welcome. > > Would these reports be in the WAL? > > If it was hardware related you would know, quickly. This sounds a great > deal more like an application level interaction. Perhaps your zope > application caches things for a while before committing to disk? Yes I thought of this but once the report is sent to the DB a separate query is run to get all of that teacher's reports and these are then displayed on a new page. They all appear here but then disappear later. Zope has transaction machinery that rolls everything back on an error, so Postgres must have indicated a successful write somehow. I read in a Postgres manual that the hard disk may report to the OS that a write has occured when it actually has not, is this possible? Oh, and the problem has been intermittant. Another thing that happened this morning is that Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the OS reported this correctly. Restarting postgres sorted it, could this be the problem? Regards Garry
On Thu, Jun 19, 2008 at 9:55 AM, Garry Saddington <garry@schoolteachers.co.uk> wrote: > I have had a serious loss of data and wondered if anyone could shed any light > on what may have happened. > My users have been writing reports on students. No error messages have been > produced and when called back up the reports seem to be present at the time > of writing. However, next day they have disappeared, and they do not appear > in a pg_dump. They seem to have been kept in memory and never written to > disk. > We are using Zope and connecting to Postgres through psycopg on Centos 5. > I suspect a hard disk failure but any other ideas would be welcome. > Would these reports be in the WAL? > regards Just a guess, but do you have some kind of search engine pointing at the application? Could it be hitting a "delete" link maybe?
In response to Garry Saddington <garry@schoolteachers.co.uk>: > On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: > > On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > > > I have had a serious loss of data and wondered if anyone could shed any > > > light on what may have happened. > > > My users have been writing reports on students. No error messages have > > > been produced and when called back up the reports seem to be present at > > > the time of writing. However, next day they have disappeared, and they do > > > not appear in a pg_dump. They seem to have been kept in memory and never > > > written to disk. > > > We are using Zope and connecting to Postgres through psycopg on Centos 5. > > > I suspect a hard disk failure but any other ideas would be welcome. > > > Would these reports be in the WAL? > > > > If it was hardware related you would know, quickly. This sounds a great > > deal more like an application level interaction. Perhaps your zope > > application caches things for a while before committing to disk? > Yes I thought of this but once the report is sent to the DB a separate query > is run to get all of that teacher's reports and these are then displayed on a > new page. They all appear here but then disappear later. Zope has transaction > machinery that rolls everything back on an error, so Postgres must have > indicated a successful write somehow. I read in a Postgres manual that the > hard disk may report to the OS that a write has occured when it actually has > not, is this possible? No. If that happens you end up with corrupt disks. The chance of that going unnoticed by the OS is pretty slim. > Oh, and the problem has been intermittant. Another > thing that happened this morning is that Postgres had today as 18/06/2008 > when in fact it was 19/06/2008 and the OS reported this correctly. Restarting > postgres sorted it, could this be the problem? Sounds to me like there's something seriously wrong with you OS or your PostgreSQL install. What version of PostgreSQL is this? What OS? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk> wrote: > I read in a > Postgres manual that the hard disk may report to the OS that a write has > occured when it actually has not, is this possible? Yeah. But unless the power suddenly turned off that wouldn't cause data loss. > Oh, and the problem > has been intermittant. Another thing that happened this morning is that > Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the > OS reported this correctly. Restarting postgres sorted it, could this be > the problem? I strongly suspect the problem is between the keyboard and the chair. In any case, however, if PostgreSQL reported the transaction complete and the machine didn't experience any hardware problems (like sudden power or disk failure), I would certainly not suspect PostgreSQL as the source of the problem. -- Alan
On Thursday 19 June 2008 18:10, Bill Moran wrote: > In response to Garry Saddington <garry@schoolteachers.co.uk>: > > On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: > > > On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > > > > I have had a serious loss of data and wondered if anyone could shed > > > > any light on what may have happened. > > > > My users have been writing reports on students. No error messages > > > > have been produced and when called back up the reports seem to be > > > > present at the time of writing. However, next day they have > > > > disappeared, and they do not appear in a pg_dump. They seem to have > > > > been kept in memory and never written to disk. > > > > We are using Zope and connecting to Postgres through psycopg on > > > > Centos 5. I suspect a hard disk failure but any other ideas would be > > > > welcome. Would these reports be in the WAL? > > > > > > If it was hardware related you would know, quickly. This sounds a great > > > deal more like an application level interaction. Perhaps your zope > > > application caches things for a while before committing to disk? > > > > Yes I thought of this but once the report is sent to the DB a separate > > query is run to get all of that teacher's reports and these are then > > displayed on a new page. They all appear here but then disappear later. > > Zope has transaction machinery that rolls everything back on an error, so > > Postgres must have indicated a successful write somehow. I read in a > > Postgres manual that the hard disk may report to the OS that a write has > > occured when it actually has not, is this possible? > > No. If that happens you end up with corrupt disks. The chance of that > going unnoticed by the OS is pretty slim. > > > Oh, and the problem has been intermittant. Another > > thing that happened this morning is that Postgres had today as 18/06/2008 > > when in fact it was 19/06/2008 and the OS reported this correctly. > > Restarting postgres sorted it, could this be the problem? > > Sounds to me like there's something seriously wrong with you OS or your > PostgreSQL install. What version of PostgreSQL is this? What OS? > > -- Centos 5 with the Posgres that comes with it - 8.1 regards garry
-------------- Original message ---------------------- From: Garry Saddington <garry@schoolteachers.co.uk> > On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: > > On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > > > I have had a serious loss of data and wondered if anyone could shed any > > > light on what may have happened. > > > My users have been writing reports on students. No error messages have > > > been produced and when called back up the reports seem to be present at > > > the time of writing. However, next day they have disappeared, and they do > > > not appear in a pg_dump. They seem to have been kept in memory and never > > > written to disk. > > > We are using Zope and connecting to Postgres through psycopg on Centos 5. > > > I suspect a hard disk failure but any other ideas would be welcome. > > > Would these reports be in the WAL? > > > > If it was hardware related you would know, quickly. This sounds a great > > deal more like an application level interaction. Perhaps your zope > > application caches things for a while before committing to disk? > Yes I thought of this but once the report is sent to the DB a separate query > is run to get all of that teacher's reports and these are then displayed on a > new page. They all appear here but then disappear later. Zope has transaction > machinery that rolls everything back on an error, so Postgres must have > indicated a successful write somehow. I read in a Postgres manual that the > hard disk may report to the OS that a write has occured when it actually has > not, is this possible? Oh, and the problem has been intermittant. Another > thing that happened this morning is that Postgres had today as 18/06/2008 > when in fact it was 19/06/2008 and the OS reported this correctly. Restarting > postgres sorted it, could this be the problem? > Regards > Garry > Seems like a transaction with no commit. Basically along as the session is active the data is there but once the sessionis closed the data does not persist. -- Adrian Klaver aklaver@comcast.net
On Thursday 19 June 2008 18:15, Alan Hodgson wrote: > On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk> > > wrote: > > I read in a > > Postgres manual that the hard disk may report to the OS that a write has > > occured when it actually has not, is this possible? > > Yeah. But unless the power suddenly turned off that wouldn't cause data > loss. > > > Oh, and the problem > > has been intermittant. Another thing that happened this morning is that > > Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the > > OS reported this correctly. Restarting postgres sorted it, could this be > > the problem? > > I strongly suspect the problem is between the keyboard and the chair. I'd love to agree, but I have seen this first hand as a user! > > In any case, however, if PostgreSQL reported the transaction complete and > the machine didn't experience any hardware problems (like sudden power or > disk failure), I would certainly not suspect PostgreSQL as the source of > the problem. What has happened to the reports then? I have used this combination of Zope and Postgres for 5 years with no problems like this before and we have written one complete set of reports on this server in the past 6 weeks. The problem seems to have started last friday, when reports started to go missing. regards Garry
On Thursday 19 June 2008 18:09, Scott Marlowe wrote: > On Thu, Jun 19, 2008 at 9:55 AM, Garry Saddington > > <garry@schoolteachers.co.uk> wrote: > > I have had a serious loss of data and wondered if anyone could shed any > > light on what may have happened. > > My users have been writing reports on students. No error messages have > > been produced and when called back up the reports seem to be present at > > the time of writing. However, next day they have disappeared, and they do > > not appear in a pg_dump. They seem to have been kept in memory and never > > written to disk. > > We are using Zope and connecting to Postgres through psycopg on Centos 5. > > I suspect a hard disk failure but any other ideas would be welcome. > > Would these reports be in the WAL? > > regards > > Just a guess, but do you have some kind of search engine pointing at > the application? Could it be hitting a "delete" link maybe? Don't think so, once a teacher has entered a report there is no way that they can delete it. Regards Garry
Garry Saddington wrote: > On Thursday 19 June 2008 18:15, Alan Hodgson wrote: >> On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk> >> >> wrote: >>> I read in a >>> Postgres manual that the hard disk may report to the OS that a write has >>> occured when it actually has not, is this possible? >> Yeah. But unless the power suddenly turned off that wouldn't cause data >> loss. >> >>> Oh, and the problem >>> has been intermittant. Another thing that happened this morning is that >>> Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the >>> OS reported this correctly. Restarting postgres sorted it, could this be >>> the problem? >> I strongly suspect the problem is between the keyboard and the chair. > I'd love to agree, but I have seen this first hand as a user! >> In any case, however, if PostgreSQL reported the transaction complete and >> the machine didn't experience any hardware problems (like sudden power or >> disk failure), I would certainly not suspect PostgreSQL as the source of >> the problem. > What has happened to the reports then? I have used this combination of Zope > and Postgres for 5 years with no problems like this before and we have > written one complete set of reports on this server in the past 6 weeks. The > problem seems to have started last friday, when reports started to go > missing. What has changed prior to Friday? -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
On Thursday 19 June 2008 18:52, Adrian Klaver wrote: > -------------- Original message ---------------------- > From: Garry Saddington <garry@schoolteachers.co.uk> > > > On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: > > > On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > > > > I have had a serious loss of data and wondered if anyone could shed > > > > any light on what may have happened. > > > > My users have been writing reports on students. No error messages > > > > have been produced and when called back up the reports seem to be > > > > present at the time of writing. However, next day they have > > > > disappeared, and they do not appear in a pg_dump. They seem to have > > > > been kept in memory and never written to disk. > > > > We are using Zope and connecting to Postgres through psycopg on > > > > Centos 5. I suspect a hard disk failure but any other ideas would be > > > > welcome. Would these reports be in the WAL? > > > > > > If it was hardware related you would know, quickly. This sounds a great > > > deal more like an application level interaction. Perhaps your zope > > > application caches things for a while before committing to disk? > > > > Yes I thought of this but once the report is sent to the DB a separate > > query is run to get all of that teacher's reports and these are then > > displayed on a new page. They all appear here but then disappear later. > > Zope has transaction machinery that rolls everything back on an error, so > > Postgres must have indicated a successful write somehow. I read in a > > Postgres manual that the hard disk may report to the OS that a write has > > occured when it actually has not, is this possible? Oh, and the problem > > has been intermittant. Another thing that happened this morning is that > > Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the > > OS reported this correctly. Restarting postgres sorted it, could this be > > the problem? > > Regards > > Garry > > Seems like a transaction with no commit. Basically along as the session is > active the data is there but once the session is closed the data does not > persist. > Makes sense but what is to blame? Regards Garry
On Thu, 2008-06-19 at 19:06 +0100, Garry Saddington wrote: > On Thursday 19 June 2008 18:15, Alan Hodgson wrote: > > On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk> Although I appreciate that this is a funky problem, the problem doesn't yet exist and we are operating in a diagnostic vacuum. Can you duplicate the problem now? Do you have any logs from PostgreSQL that describe the session that either previously or currently (based on your tests) the problem? Do those logs actually show ERROR? What about your apachelog or zope logs? Have you turned up the logging in all three components to ensure that the problem actually does exist? Sincerely, Joshua D. Drake > regards > Garry >
On Thu, 2008-06-19 at 19:12 +0100, Garry Saddington wrote: > On Thursday 19 June 2008 18:52, Adrian Klaver wrote: > > -------------- Original message ---------------------- > > From: Garry Saddington <garry@schoolteachers.co.uk> > > Seems like a transaction with no commit. Basically along as the session is > > active the data is there but once the session is closed the data does not > > persist. > > > Makes sense but what is to blame? If this is indeed the problem, it is likely a lack of an explicit commit to your connect object within zope. In pysocpg2 unless you set your isolation level to something different it automatically creates a new transaction for your session so if you don't do conn.commit() everything you do will be rolled back. Joshua D. Drake > Regards > Garry >
-------------- Original message ---------------------- From: Garry Saddington <garry@schoolteachers.co.uk> > On Thursday 19 June 2008 18:52, Adrian Klaver wrote: > > -------------- Original message ---------------------- > > From: Garry Saddington <garry@schoolteachers.co.uk> > > > > > On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: > > > > On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: > > > > > I have had a serious loss of data and wondered if anyone could shed > > > > > any light on what may have happened. > > > > > My users have been writing reports on students. No error messages > > > > > have been produced and when called back up the reports seem to be > > > > > present at the time of writing. However, next day they have > > > > > disappeared, and they do not appear in a pg_dump. They seem to have > > > > > been kept in memory and never written to disk. > > > > > We are using Zope and connecting to Postgres through psycopg on > > > > > Centos 5. I suspect a hard disk failure but any other ideas would be > > > > > welcome. Would these reports be in the WAL? > > > > > > > > If it was hardware related you would know, quickly. This sounds a great > > > > deal more like an application level interaction. Perhaps your zope > > > > application caches things for a while before committing to disk? > > > > > > Yes I thought of this but once the report is sent to the DB a separate > > > query is run to get all of that teacher's reports and these are then > > > displayed on a new page. They all appear here but then disappear later. > > > Zope has transaction machinery that rolls everything back on an error, so > > > Postgres must have indicated a successful write somehow. I read in a > > > Postgres manual that the hard disk may report to the OS that a write has > > > occured when it actually has not, is this possible? Oh, and the problem > > > has been intermittant. Another thing that happened this morning is that > > > Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the > > > OS reported this correctly. Restarting postgres sorted it, could this be > > > the problem? > > > Regards > > > Garry > > > > Seems like a transaction with no commit. Basically along as the session is > > active the data is there but once the session is closed the data does not > > persist. > > > Makes sense but what is to blame? Therein lies the rub. As Josh mentioned in another post you will need turn up the logging for the various components andrerun the suspect procedure. I find the Postgres logs to most informative as they show what is actually hitting the database.Once I see the actual SQL statement I work backwards to see how it got there and in that form. > Regards > Garry -- Adrian Klaver aklaver@comcast.net
Garry Saddington <garry@schoolteachers.co.uk> writes: > Oh, and the problem has been intermittant. Another > thing that happened this morning is that Postgres had today as 18/06/2008 > when in fact it was 19/06/2008 and the OS reported this correctly. Two theories about that one: 1. Postgres' timezone setting is far away from where you are. 2. You were in a transaction that had been open since yesterday. now() and related functions show time of transaction start. #2 is interesting since forgetting to commit an open transaction might also explain the "data loss" complaint. regards, tom lane
Adrian Klaver wrote: >> Yes I thought of this but once the report is sent to the DB a >> separate query is run to get all of that teacher's reports and >> these are then displayed on a new page. They all appear here but >> then disappear later. Zope has transaction machinery that rolls >> everything back on an error, so Postgres must have indicated a >> successful write somehow. I read in a Postgres manual that the >> hard disk may report to the OS that a write has occured when it >> actually has not, is this possible? Oh, and the problem has been >> intermittant. Another thing that happened this morning is that >> Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and >> the OS reported this correctly. Restarting postgres sorted it, >> could this be the problem? Regards Garry >> > > Seems like a transaction with no commit. Basically along as the > session is active the data is there but once the session is closed > the data does not persist. > This is my thinking. Once they are entered and then get listed in the new page can another user see the reports? Do they log out and then back in and the reports are gone? If so then maybe the commit step is getting skipped for some reason. At what stage are things committed? After entry or at user logoff? Is there some user data updated on logout that gets an error and rolls back? -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
On Thursday 19 June 2008 19:03, Joshua D. Drake wrote: > On Thu, 2008-06-19 at 19:06 +0100, Garry Saddington wrote: > > On Thursday 19 June 2008 18:15, Alan Hodgson wrote: > > > On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk> > > Although I appreciate that this is a funky problem, the problem doesn't > yet exist and we are operating in a diagnostic vacuum. > > Can you duplicate the problem now? The problem is not consistent, some reports are lsot others not. > > Do you have any logs from PostgreSQL that describe the session that > either previously or currently (based on your tests) the problem? These are at school so I will look tomorrow > > Do those logs actually show ERROR? There are no errors relating to reports > > What about your apachelog or zope logs? Again tomorrow > > Have you turned up the logging in all three components to ensure that > the problem actually does exist? Will do this tomorrow regards and thanks for your efforts. PS Could it be somethiing to do with libpq.so.4/5 required by psycopg and will the WAL hold data to be replayed?
On Thu, 2008-06-19 at 20:29 +0100, Garry Saddington wrote: > On Thursday 19 June 2008 19:03, Joshua D. Drake wrote: > > On Thu, 2008-06-19 at 19:06 +0100, Garry Saddington wrote: > > > On Thursday 19 June 2008 18:15, Alan Hodgson wrote: > > > > On Thursday 19 June 2008, Garry Saddington > <garry@schoolteachers.co.uk> > > > PS > Could it be somethiing to do with libpq.so.4/5 required by psycopg and will I think it certainly has something to do with psycopg and the way it handles commits. Especially based on the description of what you say above where some work and some don't. My guess (as I mentioned previously) is there is a bit of code in your report generator that doesn't call commit correctly. > the WAL hold data to be replayed? Only if you are pushing the logs off for standby operations and only if you commit. Sincerely, Joshua D. Drake >
Hi, Garry Saddington wrote: > On Thursday 19 June 2008 16:55, Joshua D. Drake wrote: >> On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote: ... > Yes I thought of this but once the report is sent to the DB a separate query > is run to get all of that teacher's reports and these are then displayed on a > new page. They all appear here but then disappear later. Zope has transaction > machinery that rolls everything back on an error, so Postgres must have are you running the report within the same request that made the write? In this case you would be in the same transaction (and see uncommitted data) if then an exeption is raised after you display the report the transaction would be rolled back. You should check your zope logs. Regards Tino
Attachment
On Thursday 19 June 2008 14:06:38 Garry Saddington wrote: > > In any case, however, if PostgreSQL reported the transaction complete and > > the machine didn't experience any hardware problems (like sudden power or > > disk failure), I would certainly not suspect PostgreSQL as the source of > > the problem. > > What has happened to the reports then? I have used this combination of Zope > and Postgres for 5 years with no problems like this before and we have > written one complete set of reports on this server in the past 6 weeks. The > problem seems to have started last friday, when reports started to go > missing. Out of curiosity, what is your vacuum strategy? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > On Thursday 19 June 2008 14:06:38 Garry Saddington wrote: >> The problem seems to have started last friday, when reports started to go >> missing. > Out of curiosity, what is your vacuum strategy? If you're thinking "transaction ID wraparound", I believe we can rule that out, because Garry says he's running PG 8.1. 8.1 is not terribly proactive about preventing wraparound (no forced autovacuums), but it will squawk loudly about impending wraparound and shut down before it hits the wall. My bet is that something was changed in the client-side software last week that has broken its commit-issuing behavior. regards, tom lane
On Friday 20 June 2008 05:26, Robert Treat wrote: > On Thursday 19 June 2008 14:06:38 Garry Saddington wrote: > > > In any case, however, if PostgreSQL reported the transaction complete > > > and the machine didn't experience any hardware problems (like sudden > > > power or disk failure), I would certainly not suspect PostgreSQL as the > > > source of the problem. > > > > What has happened to the reports then? I have used this combination of > > Zope and Postgres for 5 years with no problems like this before and we > > have written one complete set of reports on this server in the past 6 > > weeks. The problem seems to have started last friday, when reports > > started to go missing. > > Out of curiosity, what is your vacuum strategy? We back up and vacuum at the same time each day. garry