Re: Could not open file "pg_subtrans/01EB" - Mailing list pgsql-admin

From Mariel Cherkassky
Subject Re: Could not open file "pg_subtrans/01EB"
Date
Msg-id CA+t6e1=NEhMCrZS9bcrhjAbuO_SVyAu2n98zSuY7xrNoZKitRA@mail.gmail.com
Whole thread Raw
In response to Re: Could not open file "pg_subtrans/01EB"  (Mariel Cherkassky <mariel.cherkassky@gmail.com>)
List pgsql-admin
Any idea what should I do ?

‫בתאריך יום ב׳, 27 באוג׳ 2018 ב-15:05 מאת ‪Mariel Cherkassky‬‏ <‪mariel.cherkassky@gmail.com‬‏>:‬
So As Tom Lane suggested I upgraded the database to the 9.2.24 version. I vacuumed (vacuum verbose) all the databases and for one of the databases (that has a duplicated record in pg_database) I got the next meesages for two of the obejcts : 
2018-08-27 16:57:59 +08 db4 22026  WARNING:  relation "a" page 1560 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4  22026  WARNING:  relation "a" page 1561 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4  22026  WARNING:  relation "a" page 1562 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4   22026  WARNING:  relation "a" page 1563 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4   22026  WARNING:  relation "a" page 1564 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4  22026  WARNING:  relation "a" page 1565 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4  22026  WARNING:  relation "a" page 1566 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4  22026  WARNING:  relation "a" page 1567 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4   22026  WARNING:  relation "a" page 1568 is uninitialized --- fixing
2018-08-27 16:58:00 +08  db4   22026  WARNING:  relation "a" page 1569 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1031 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1032 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1033 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1034 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1035 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1036 is uninitialized --- fixing
2018-08-27 16:58:23 +08  db4  22026  WARNING:  relation "b" page 1037 is uninitialized --- fixing

I tried re indexing the problematic databases : 
db1>reindex system db1
2018-08-27 17:09:44 +08 db1 23218  ERROR:  could not access status of transaction 32212695
2018-08-27 17:09:44 +08 db1 23218  DETAIL:  Could not open file "pg_subtrans/01EB": No such file or directory.

I tried to delete the duplicated rows in pg_database but the delete doesnt delete anything : 

select ctid,xmin,xmax,datname from pg_database order by datname;
  ctid  | xmin  |   xmax   |    datname
--------+-------+----------+----------------
 (0,21) |  2351 |        0 | db1
 (0,4)  |  1809 | 21093518 | db1
 (0,3)  |  1806 |        0 | postgres
 (0,24) | 12594 |        0 | db2
 (0,2)  |  1803 |        0 | template0
 (0,1)  |  1802 |        0 | template1
 (0,22) |  3590 |        0 | db3
 (0,23) |  3592 |        0 | db4
 (0,5)  |  1811 | 21077312 | db4
(9 rows)

As you can see db1 and db4 have duplicated records. I tried to delete them : 
delete from pg_database where ctid='(0,4)';
DELETE 0

but the record does exist : 
select ctid,datname from pg_database where ctid='(0,4)';
 ctid  | datname
-------+---------
 (0,4) | db1
(1 row)

I set the zero_damaged_pages to on but it didnt help either.

How can I continue ?


‫בתאריך יום א׳, 26 באוג׳ 2018 ב-19:42 מאת ‪Mariel Cherkassky‬‏ <‪mariel.cherkassky@gmail.com‬‏>:‬
1.I'm not really sure if they have one but its not my business.. I'm just trying to help those guys with an application that is based on postgres...
2.Yeah I realized that it is an option, but do you really think that it should be the first solution ? I tried to search for bugs that seems identical to my case but I didnt find any. 
3.I set the vacuum_freeze_table_age to 0 and vacuumed all the duplicated databases but it seems that it didn't solve their problem. 
4.I tried to delete the records according to the value in xmin/xmax and the result was that after the delete postgresql didnt recognize that I have those databases. I queried the ctid and the xmin/xman : 
postgres=# select xmin,xmax,datname,ctid from pg_Database;
 xmin  |   xmax   |    datname     |  ctid
-------+----------+----------------+--------
  1802 |        0 | template1      | (0,1)
  1803 |        0 | template0      | (0,2)
  1806 |        0 | postgres       | (0,3)
  1809 | 21093518 | db1            | (0,4)
  1811 | 21077312 | db2 | (0,5)
  2351 |        0 | db1            | (0,21)
  3590 |        0 | db3  | (0,22)
  3592 |        0 | db2 | (0,23)
 12594 |        0 | db4            | (0,24)

I tried to delete db1 with xmax 0 and db2 with xmax 2 (the opposite of what you suggested with the ctid). Now, I thought that if the xmax is set to 0 it means that I "didnt have" any update / delete operations and therefore I should delete those databases. Does it matters which one to delete ? I read about the xmin/xmax/ctid columns but I thought that what I did was legit.

Moreover I realized that I have duplicated rows in more system tables so i don`t really like this solution.

My next attempt will be to upgrade to 9.2.24.

Thanks  Tom ! ,

 Regards Mariel.

‫בתאריך יום א׳, 26 באוג׳ 2018 ב-18:51 מאת ‪Tom Lane‬‏ <‪tgl@sss.pgh.pa.us‬‏>:‬
Mariel Cherkassky <mariel.cherkassky@gmail.com> writes:
> I'm trying to investigate a database of one of our clients. The database
> version is 9.2.5.

1. Fire their DBA for dereliction of duty.

2. Update to the last available release of 9.2.x (9.2.24, looks like).

3. Vacuum everything and see if it gets better.

Vacuuming may or may not fix the observed data corruption problems, but
it's silly to ignore the fact that they're missing four years worth
of bug fixes in that branch.  In particular I'm noticing the first
entry in the change notes for 9.2.6, which recommends "vacuuming all
tables in all databases while having vacuum_freeze_table_age set to zero":
https://www.postgresql.org/docs/9.2/static/release-9-2-6.html
That problem as-described seems to involve rows disappearing, rather than
getting duplicated, but I wouldn't rule out that it could also have
symptoms like this.

If that doesn't fix things, you could then try:

4. Manually eliminate duplicate rows by identifying which one of each pair
seems older and deleting it with DELETE ... WHERE ctid = '(x,y)'.  Then
reindex to confirm no duplicates remain.

But you still need step 2, to reduce the odds the problem will recur.

                        regards, tom lane

pgsql-admin by date:

Previous
From: wambacher@posteo.de
Date:
Subject: Re: tuple concurrently updated
Next
From: Scott Ribe
Date:
Subject: Re: How to get alerted automatically whenever a table structure ischanged between Publisher and Subscriber in Logical Replication?