Thread: getting status transaction error

getting status transaction error

From
"Merlin Moncure"
Date:
around 6:30 this morning, I started getting the following messages in my log:

Feb  6 06:33:34 mojo postgres[1117]: [2-1] :: ERROR:  could not access
status of transaction 544441911
Feb  6 06:34:35 mojo postgres[1128]: [2-1] :: ERROR:  could not access
status of transaction 544441911
[...]

repeated roughly once a minute.  I've never seen this before. this is
on postgres 8.1.1 running on fedora core 4 smp.  I don't have any
contextual information yet but I'm getting ready to turn statement
logging on.

Anybody know what this is?

merlin

Re: getting status transaction error

From
"Merlin Moncure"
Date:
On 2/6/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> around 6:30 this morning, I started getting the following messages in my log:
>
> Feb  6 06:33:34 mojo postgres[1117]: [2-1] :: ERROR:  could not access
> status of transaction 544441911
> Feb  6 06:34:35 mojo postgres[1128]: [2-1] :: ERROR:  could not access
> status of transaction 544441911
> [...]
>
> repeated roughly once a minute.  I've never seen this before. this is
> on postgres 8.1.1 running on fedora core 4 smp.  I don't have any
> contextual information yet but I'm getting ready to turn statement
> logging on.
>
> Anybody know what this is?
[x-posting to -hackers]

actually, here is some more relevant bits from the log.
Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
processing database "template0"
Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
status of transaction 544441911
Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
file "pg_clog/0207": No such file or directory

repeated ad-naseum

merlin

Re: [HACKERS] getting status transaction error

From
Stefan Kaltenbrunner
Date:
Merlin Moncure wrote:
> On 2/6/07, Merlin Moncure <mmoncure@gmail.com> wrote:
>> around 6:30 this morning, I started getting the following messages in
>> my log:
>>
>> Feb  6 06:33:34 mojo postgres[1117]: [2-1] :: ERROR:  could not access
>> status of transaction 544441911
>> Feb  6 06:34:35 mojo postgres[1128]: [2-1] :: ERROR:  could not access
>> status of transaction 544441911
>> [...]
>>
>> repeated roughly once a minute.  I've never seen this before. this is
>> on postgres 8.1.1 running on fedora core 4 smp.  I don't have any
>> contextual information yet but I'm getting ready to turn statement
>> logging on.
>>
>> Anybody know what this is?
> [x-posting to -hackers]
>
> actually, here is some more relevant bits from the log.
> Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
> processing database "template0"
> Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
> status of transaction 544441911
> Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
> file "pg_clog/0207": No such file or directory

hmm I first thought it could have been
http://archives.postgresql.org/pgsql-committers/2006-01/msg00288.php
which affects 8.1.1 but that's not the very same error as the one
created by the above bug.


Stefan

Re: [HACKERS] getting status transaction error

From
"Merlin Moncure"
Date:
On 2/6/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
> Merlin Moncure wrote:
> > On 2/6/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> >> around 6:30 this morning, I started getting the following messages in
> >> my log:
> >>
> >> Feb  6 06:33:34 mojo postgres[1117]: [2-1] :: ERROR:  could not access
> >> status of transaction 544441911
> >> Feb  6 06:34:35 mojo postgres[1128]: [2-1] :: ERROR:  could not access
> >> status of transaction 544441911
> >> [...]
> >>
> >> repeated roughly once a minute.  I've never seen this before. this is
> >> on postgres 8.1.1 running on fedora core 4 smp.  I don't have any
> >> contextual information yet but I'm getting ready to turn statement
> >> logging on.
> >>
> >> Anybody know what this is?
> > [x-posting to -hackers]
> >
> > actually, here is some more relevant bits from the log.
> > Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
> > processing database "template0"
> > Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
> > status of transaction 544441911
> > Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
> > file "pg_clog/0207": No such file or directory
>
> hmm I first thought it could have been
> http://archives.postgresql.org/pgsql-committers/2006-01/msg00288.php
> which affects 8.1.1 but that's not the very same error as the one
> created by the above bug.

ya, it doesn't seem to match, as this seems to be repeating quite
regularly.  interesting that my 'clog' files start at 06B6 and count
up. 0207 is way off the charts.

a lot of applications are hitting this database, and so far everything
seems to be running ok (i found this log msg by accident), but I am
now officially very nervous.

merlin

Re: getting status transaction error

From
Alvaro Herrera
Date:
Merlin Moncure wrote:
> On 2/6/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> >around 6:30 this morning, I started getting the following messages in my
> >log:
> >
> >Feb  6 06:33:34 mojo postgres[1117]: [2-1] :: ERROR:  could not access
> >status of transaction 544441911
> >Feb  6 06:34:35 mojo postgres[1128]: [2-1] :: ERROR:  could not access
> >status of transaction 544441911
> >[...]
> >
> >repeated roughly once a minute.  I've never seen this before. this is
> >on postgres 8.1.1 running on fedora core 4 smp.  I don't have any
> >contextual information yet but I'm getting ready to turn statement
> >logging on.
> >
> >Anybody know what this is?
> [x-posting to -hackers]
>
> actually, here is some more relevant bits from the log.
> Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
> processing database "template0"
> Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
> status of transaction 544441911
> Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
> file "pg_clog/0207": No such file or directory

I guess the problem here is that autovacuum believes that template0
needs a database-wide vacuum due to Xid wraparound getting closer.  And
that database seems to have Xid 544441911 somewhere, the clog bit for
which was in the 0207 file which was deleted some time ago.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: getting status transaction error

From
"Merlin Moncure"
Date:
On 2/6/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > actually, here is some more relevant bits from the log.
> > Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
> > processing database "template0"
> > Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
> > status of transaction 544441911
> > Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
> > file "pg_clog/0207": No such file or directory
>
> I guess the problem here is that autovacuum believes that template0
> needs a database-wide vacuum due to Xid wraparound getting closer.  And
> that database seems to have Xid 544441911 somewhere, the clog bit for
> which was in the 0207 file which was deleted some time ago.

Latest checkpoint's NextXID:         2162841139
2^31:                                           2147483648

is this related?
merlin

Re: [HACKERS] getting status transaction error

From
Alvaro Herrera
Date:
Merlin Moncure wrote:

> ya, it doesn't seem to match, as this seems to be repeating quite
> regularly.  interesting that my 'clog' files start at 06B6 and count
> up. 0207 is way off the charts.
>
> a lot of applications are hitting this database, and so far everything
> seems to be running ok (i found this log msg by accident), but I am
> now officially very nervous.

I don't think there's much cause for concern here.  If my theory is
correct, this is an autovacuum bug which was fixed in 8.1.7.

What I'd do is create a 0207 clog file, fill it with 0x55 (which is
"transactions committed" for all transactions in that interval), and do
a VACUUM FREEZE on that database.  You'll need to set
pg_database.datallowconn=true beforehand.

Of course, I'd copy the files somewhere else and experiment on a scratch
postmaster, running on a different port, just to be sure ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [HACKERS] getting status transaction error

From
"Merlin Moncure"
Date:
n 2/6/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Merlin Moncure wrote:
>
> > ya, it doesn't seem to match, as this seems to be repeating quite
> > regularly.  interesting that my 'clog' files start at 06B6 and count
> > up. 0207 is way off the charts.
> >
> > a lot of applications are hitting this database, and so far everything
> > seems to be running ok (i found this log msg by accident), but I am
> > now officially very nervous.
>
> I don't think there's much cause for concern here.  If my theory is
> correct, this is an autovacuum bug which was fixed in 8.1.7.
>
> What I'd do is create a 0207 clog file, fill it with 0x55 (which is
> "transactions committed" for all transactions in that interval), and do
> a VACUUM FREEZE on that database.  You'll need to set
> pg_database.datallowconn=true beforehand.
>
> Of course, I'd copy the files somewhere else and experiment on a scratch
> postmaster, running on a different port, just to be sure ...

thats a big help, database is actually fairly huge, so I may have to
just go ahead and do it.   I'm off to a meeting, but I'll check back
when I'm done and assuming nobody else says 'don't do that', I'll try
the fix and post back with the result.

thanks all,
merlin

Re: [HACKERS] getting status transaction error

From
Alvaro Herrera
Date:
Merlin Moncure wrote:
> On 2/6/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >> actually, here is some more relevant bits from the log.
> >> Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
> >> processing database "template0"
> >> Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
> >> status of transaction 544441911
> >> Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
> >> file "pg_clog/0207": No such file or directory
> >
> Latest checkpoint's NextXID:         2162841139
> 2^31:                                           2147483648

I think the relevant arithmetic here is

echo "2162841139 544441911 - p" | dc
1618399228

That's a billion and a half transactions.  Autovacuum uses the formula

        this_whole_db = (tmp->age >
                         (int32) ((MaxTransactionId >> 3) * 3 - 100000));

to determine whether it needs database-wide vacuum.
(MaxTransactionId >> 3) is 536870911, so the calculation is
536870911 * 3 - 100000

echo "536870911 3 * 100000 - p" | dc
1610512733

which looks awfully close to the number above.  About 7 million
transactions must have passed since the first time the error showed up
-- does that sound likely?

Well, scratch that -- what's the _current_ Xid? (not lastest
checkpoint's)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [HACKERS] getting status transaction error

From
Alvaro Herrera
Date:
Merlin Moncure wrote:
> n 2/6/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >Merlin Moncure wrote:
> >
> >> ya, it doesn't seem to match, as this seems to be repeating quite
> >> regularly.  interesting that my 'clog' files start at 06B6 and count
> >> up. 0207 is way off the charts.
> >>
> >> a lot of applications are hitting this database, and so far everything
> >> seems to be running ok (i found this log msg by accident), but I am
> >> now officially very nervous.
> >
> >I don't think there's much cause for concern here.  If my theory is
> >correct, this is an autovacuum bug which was fixed in 8.1.7.
> >
> >What I'd do is create a 0207 clog file, fill it with 0x55 (which is
> >"transactions committed" for all transactions in that interval), and do
> >a VACUUM FREEZE on that database.  You'll need to set
> >pg_database.datallowconn=true beforehand.
> >
> >Of course, I'd copy the files somewhere else and experiment on a scratch
> >postmaster, running on a different port, just to be sure ...
>
> thats a big help, database is actually fairly huge, so I may have to
> just go ahead and do it.   I'm off to a meeting, but I'll check back
> when I'm done and assuming nobody else says 'don't do that', I'll try
> the fix and post back with the result.

Well, you don't need to copy all databases for the test area, just the
base/<oid> dir for template0 (along with all pg_xlog and pg_clog files,
etc, but these shouldn't be as big as all the other stuff in base/).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [HACKERS] getting status transaction error

From
"Merlin Moncure"
Date:
On 2/6/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Merlin Moncure wrote:
> > On 2/6/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > >> actually, here is some more relevant bits from the log.
> > >> Feb  6 06:31:33 mojo postgres[1088]: [1-1] :: LOG:  autovacuum:
> > >> processing database "template0"
> > >> Feb  6 06:31:33 mojo postgres[1088]: [2-1] :: ERROR:  could not access
> > >> status of transaction 544441911
> > >> Feb  6 06:31:33 mojo postgres[1088]: [2-2] :: DETAIL:  could not open
> > >> file "pg_clog/0207": No such file or directory
> > >
> > Latest checkpoint's NextXID:         2162841139
> > 2^31:                                           2147483648
>
> I think the relevant arithmetic here is
>
> echo "2162841139 544441911 - p" | dc
> 1618399228
>
> That's a billion and a half transactions.  Autovacuum uses the formula
>
>         this_whole_db = (tmp->age >
>                          (int32) ((MaxTransactionId >> 3) * 3 - 100000));
>
> to determine whether it needs database-wide vacuum.
> (MaxTransactionId >> 3) is 536870911, so the calculation is
> 536870911 * 3 - 100000
>
> echo "536870911 3 * 100000 - p" | dc
> 1610512733
>
> which looks awfully close to the number above.  About 7 million
> transactions must have passed since the first time the error showed up
> -- does that sound likely?
>
> Well, scratch that -- what's the _current_ Xid? (not lastest
> checkpoint's)

I don't know any better way to get that than this:
postgres=# insert into foo default values;
INSERT 0 1
postgres=# select xmin,xmax from foo;
    xmin    | xmax
------------+------
 2163877346 |    0
(1 row)

merlin

Re: getting status transaction error

From
"Kevin Field"
Date:
This part...

> repeated roughly once a minute.  I've never seen this before. this is

...might mean some connection with autovacuum, maybe?  I have no clue beyond that.  Hope things go well for you.

Kev