Thread: four template0 databases after vacuum

four template0 databases after vacuum

From
Kazuaki Fujikura
Date:
Hi there,

Version: 9.1.6 running since Dec, 2014
We have 3 different databases. 

[problem history/background]

Jan 10th, 2016:
The first problem was autovacuum issue.
- autovacuum could not finish successfully.
- I set autovacuum_freeze_max_age to 2 hundreds million.
- autovacuum immediately finished against the database which age was over 2 hundreds million.
- so, autovacuum did not go next database
- I then run "vacuumdb -az" and  run vacuum freeze analyze against template0 after setting datallowconn to true
- I set datallowconn to false

============================================================
$ vacuumdb -az
$ psql template1
template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';
UPDATE 1
template1=# \c template0
template0=# VACUUM FREEZE ANALYZE  ;
VACUUM
template0=# \c template1
template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';
UPDATE 1
============================================================

- At this point, there is one template0 only


Jan 30th, 2016:
After three weeks, I again hit the same issue - autovacuum could not finish.
This time, the age did not reduce with manual vacuum.
I then run vacuum full to pg_database. The age of pg_database becomes minus value.
Then, autovacuum started again.

============================================================
target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ;
                    relname                     |    age    
------------------------------------------------+-----------
 pg_database                                    | 219383067
target_db=# VACUUM FREEZE ;
VACUUM
target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ;
                    relname                     |    age    
------------------------------------------------+-----------
 pg_database                                    | 219387307
target_db=# VACUUM FULL ;
VACUUM
                    relname                     |    age    
------------------------------------------------+-----------
 pg_database                                    | -1861408089
============================================================



Yesterday:

I run the following command to run vacuum full to all pg_database.
Then, I run vacuum freeze analyze and vacuum  full to template0 after setting datallowconn.

============================================================
$ psql -lt | awk '{print $1}' | grep -v ^$ | grep -v ^\| | while read line; do psql ${line} -c "VACUUM FULL pg_database;"; done
VACUUM ....

$ psql template1
template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';
UPDATE 1
template1=# \c template0
template0=# VACUUM FULL  ;
VACUUM
template0=# \c template1
template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';
UPDATE 4
============================================================

***Then I HAVE FOUR template0 DATABASES***




[Current problems]

We now have three issues in our production.

1. It looks four template0 databases exist
2. Xid of template0 keeps growing
3. Can not freeze xid of template0



1. It looks four template0 databases exist

============================================================ 
$ psql -l | grep template0 
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
============================================================

These have same dataid.

============================================================ 
postgres=# SELECT datid, datname FROM pg_stat_database where datname = 'template0';
 datid |  datname  
-------+-----------
 12772 | template0
 12772 | template0
 12772 | template0
 12772 | template0
(4 rows)
============================================================



2. Xid of template0 keeps growing
 
============================================================
postgres=# SELECT datname, age(datfrozenxid) FROM pg_database order by age desc; 
                    datname                     |    age    
------------------------------------------------+-----------
 template0                                      | 198431852
 template0                                      | 198431852
 template0                                      | 198431852
 template0                                      |  50480024
 template1                                      |  45629585
============================================================

At this moment, the maximum age value of all databases is template0.
The age value keeps growing.

One of 4 template0 is young (504080024). Other three template0s are still old.
 

 3. Can not freeze xid of template0
 
To reset xid of template0, I did vacuum full/ vacuum freeze to template0. But,
the age of three template0 did not change. Only of of 4 template0 had successfully
changed the age young.
 
============================================================
$ psql template1
template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';
UPDATE 4
template1=# \c template0
template0=# VACUUM FREEZE ANALYZE  ;
VACUUM
template0=# VACUUM FULL  ;
VACUUM
template0=# \c template1
template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';
UPDATE 4
============================================================

I run the commands above. But, I could not change the age of three template0 databases.



[My idea to fix this]

If I don't do anything about this, I think our production service will be down because it exceeds the limit of xid.

I guess if I drop all template0 and create template0 again, then everything gets back normal.
But I am not quite sure if my approach is right.

I would appreciate any suggestion/comments.

Best regards,
Kazuaki Fujikura

Re: four template0 databases after vacuum

From
"Karsten Hilbert"
Date:
Just a shot in the dark for a possible lead to follow down (sorry for top-posting):

 

Is there index corruption on system tables ?

 

(like, several index entries pointing to the one template0 row)

 

Karsten

 

 

Gesendet: Sonntag, 07. Februar 2016 um 03:43 Uhr
Von: "Kazuaki Fujikura" <fujya@fujya.com>
An: pgsql-general@postgresql.org
Betreff: [GENERAL] four template0 databases after vacuum



Hi there,

 

Version: 9.1.6 running since Dec, 2014

We have 3 different databases. 

 


[problem history/background]

 

Jan 10th, 2016:

The first problem was autovacuum issue.
- autovacuum could not finish successfully.

- I set autovacuum_freeze_max_age to 2 hundreds million.

- autovacuum immediately finished against the database which age was over 2 hundreds million.

- so, autovacuum did not go next database

- I then run "vacuumdb -az" and  run vacuum freeze analyze against template0 after setting datallowconn
totrue 

- I set datallowconn to false

 

============================================================

$ vacuumdb -az



$ psql template1

template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';

UPDATE 1

template1=# \c template0

template0=# VACUUM FREEZE ANALYZE  ;

VACUUM

template0=# \c template1

template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';

UPDATE 1

============================================================

 



- At this point, there is one template0 only

 

 



Jan 30th, 2016:
After three weeks, I again hit the same issue - autovacuum could not finish.

This time, the age did not reduce with manual vacuum.

I then run vacuum full to pg_database. The age of pg_database becomes minus value.

Then, autovacuum started again.

 

============================================================


target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ;

                    relname            
       |    age     

------------------------------------------------+-----------

 pg_database                            
      | 219383067 

target_db=# VACUUM FREEZE ;

VACUUM

target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ;

                    relname            
       |    age     

------------------------------------------------+-----------

 pg_database                            
      | 219387307 

target_db=# VACUUM FULL ;

VACUUM

                    relname            
       |    age     

------------------------------------------------+-----------

 pg_database                            
      | -1861408089 



============================================================

 

 

 


Yesterday:
 

I run the following command to run vacuum full to all pg_database.

Then, I run vacuum freeze analyze and vacuum  full to template0 after setting datallowconn.

 

============================================================

$ psql -lt | awk '{print $1}' | grep -v ^$ | grep -v ^\| | while read
line;do psql ${line} -c "VACUUM FULL pg_database;"; done 

VACUUM ....

 


$ psql template1

template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';

UPDATE 1

template1=# \c template0

template0=# VACUUM FULL  ;

VACUUM

template0=# \c template1

template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';

UPDATE 4
============================================================

 

***Then I HAVE FOUR template0 DATABASES***


 

 

 


[Current problems]
 

We now have three issues in our production.
 

1. It looks four template0 databases exist

2. Xid of template0 keeps growing

3. Can not freeze xid of template0

 

 

 


1. It looks four template0 databases exist

 

============================================================ 

$ psql -l | grep template0 

 template0                            
        | postgres   | UTF8             | C
      | C                 | =c/postgres  
      + 

 template0                            
        | postgres   | UTF8             | C
      | C                 | =c/postgres  
      + 

 template0                            
        | postgres   | UTF8             | C
      | C                 | =c/postgres  
      + 

 template0                            
        | postgres   | UTF8             | C
      | C                 | =c/postgres  
      + 

============================================================

 

These have same dataid.

 

============================================================ 

postgres=# SELECT datid, datname FROM pg_stat_database where datname = 'template0';

 datid |  datname  

-------+-----------

 12772 | template0

 12772 | template0

 12772 | template0

 12772 | template0

(4 rows)

============================================================

 

 

 


2. Xid of template0 keeps growing

 

============================================================

postgres=# SELECT datname, age(datfrozenxid) FROM pg_database order by age desc; 

                    datname            
       |    age     

------------------------------------------------+-----------

 template0                            
        | 198431852 

 template0                            
        | 198431852 

 template0                            
        | 198431852 

 template0                            
        |  50480024 

 template1                            
        |  45629585 

============================================================

 

At this moment, the maximum age value of all databases is template0.

The age value keeps growing.

 

One of 4 template0 is young (504080024). Other three template0s are still old.

 

 


 3. Can not freeze xid of template0

 

To reset xid of template0, I did vacuum full/ vacuum freeze to template0. But,

the age of three template0 did not change. Only of of 4 template0 had successfully

changed the age young.

 

============================================================

$ psql template1

template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';

UPDATE 4

template1=# \c template0

template0=# VACUUM FREEZE ANALYZE  ;

VACUUM

template0=# VACUUM FULL  ;

VACUUM

template0=# \c template1

template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';

UPDATE 4

============================================================

 

I run the commands above. But, I could not change the age of three template0 databases.

 


 


 

[My idea to fix this]

 

If I don't do anything about this, I think our production service will be down because it exceeds the limit of
xid.

 

I guess if I drop all template0 and create template0 again, then everything gets back normal.

But I am not quite sure if my approach is right.

 

I would appreciate any suggestion/comments.

 

Best regards,

Kazuaki Fujikura

Re: four template0 databases after vacuum

From
Melvin Davidson
Date:

On Sun, Feb 7, 2016 at 4:51 AM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
Just a shot in the dark for a possible lead to follow down (sorry for top-posting):
 
Is there index corruption on system tables ?
 
(like, several index entries pointing to the one template0 row)
 
Karsten
 
 
Gesendet: Sonntag, 07. Februar 2016 um 03:43 Uhr
Von: "Kazuaki Fujikura" <fujya@fujya.com>
An: pgsql-general@postgresql.org
Betreff: [GENERAL] four template0 databases after vacuum
Hi there,
 
Version: 9.1.6 running since Dec, 2014
We have 3 different databases. 
 
[problem history/background]
 
Jan 10th, 2016:
The first problem was autovacuum issue.
- autovacuum could not finish successfully.
- I set autovacuum_freeze_max_age to 2 hundreds million.
- autovacuum immediately finished against the database which age was over 2 hundreds million.
- so, autovacuum did not go next database
- I then run "vacuumdb -az" and  run vacuum freeze analyze against template0 after setting datallowconn to true
- I set datallowconn to false
 
============================================================
$ vacuumdb -az
$ psql template1
template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';
UPDATE 1
template1=# \c template0
template0=# VACUUM FREEZE ANALYZE  ;
VACUUM
template0=# \c template1
template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';
UPDATE 1
============================================================
 
- At this point, there is one template0 only
 
 
Jan 30th, 2016:
After three weeks, I again hit the same issue - autovacuum could not finish.
This time, the age did not reduce with manual vacuum.
I then run vacuum full to pg_database. The age of pg_database becomes minus value.
Then, autovacuum started again.
 
============================================================
target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ;
                    relname                     |    age    
------------------------------------------------+-----------
 pg_database                                    | 219383067
target_db=# VACUUM FREEZE ;
VACUUM
target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ;
                    relname                     |    age    
------------------------------------------------+-----------
 pg_database                                    | 219387307
target_db=# VACUUM FULL ;
VACUUM
                    relname                     |    age    
------------------------------------------------+-----------
 pg_database                                    | -1861408089
============================================================
 
 
 
Yesterday:
 
I run the following command to run vacuum full to all pg_database.
Then, I run vacuum freeze analyze and vacuum  full to template0 after setting datallowconn.
 
============================================================
$ psql -lt | awk '{print $1}' | grep -v ^$ | grep -v ^\| | while read line; do psql ${line} -c "VACUUM FULL pg_database;"; done
VACUUM ....
 
$ psql template1
template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';
UPDATE 1
template1=# \c template0
template0=# VACUUM FULL  ;
VACUUM
template0=# \c template1
template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';
UPDATE 4
============================================================
 
***Then I HAVE FOUR template0 DATABASES***
 
 
 
[Current problems]
 
We now have three issues in our production.
 
1. It looks four template0 databases exist
2. Xid of template0 keeps growing
3. Can not freeze xid of template0
 
 
 
1. It looks four template0 databases exist
 
============================================================ 
$ psql -l | grep template0 
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
 template0                                      | postgres   | UTF8             | C        | C                 | =c/postgres          +
============================================================
 
These have same dataid.
 
============================================================ 
postgres=# SELECT datid, datname FROM pg_stat_database where datname = 'template0';
 datid |  datname  
-------+-----------
 12772 | template0
 12772 | template0
 12772 | template0
 12772 | template0
(4 rows)
============================================================
 
 
 
2. Xid of template0 keeps growing
 
============================================================
postgres=# SELECT datname, age(datfrozenxid) FROM pg_database order by age desc; 
                    datname                     |    age    
------------------------------------------------+-----------
 template0                                      | 198431852
 template0                                      | 198431852
 template0                                      | 198431852
 template0                                      |  50480024
 template1                                      |  45629585
============================================================
 
At this moment, the maximum age value of all databases is template0.
The age value keeps growing.
 
One of 4 template0 is young (504080024). Other three template0s are still old.
 
 
 3. Can not freeze xid of template0
 
To reset xid of template0, I did vacuum full/ vacuum freeze to template0. But,
the age of three template0 did not change. Only of of 4 template0 had successfully
changed the age young.
 
============================================================
$ psql template1
template1=# UPDATE pg_database SET datallowconn = TRUE where datname = 'template0';
UPDATE 4
template1=# \c template0
template0=# VACUUM FREEZE ANALYZE  ;
VACUUM
template0=# VACUUM FULL  ;
VACUUM
template0=# \c template1
template1=# UPDATE pg_database SET datallowconn = FALSE where datname = 'template0';
UPDATE 4
============================================================
 
I run the commands above. But, I could not change the age of three template0 databases.
 
 
 
[My idea to fix this]
 
If I don't do anything about this, I think our production service will be down because it exceeds the limit of xid.
 
I guess if I drop all template0 and create template0 again, then everything gets back normal.
But I am not quite sure if my approach is right.
 
I would appreciate any suggestion/comments.
 
Best regards,
Kazuaki Fujikura
With regards to Karsten's thought, here is a query to find any pg_catalog indexes that are corrupt.

SELECT n.nspname as schema,
       i.relname as table,
       i.indexrelname as index,
       i.idx_scan,
       i.idx_tup_read,
       i.idx_tup_fetch,
       CASE WHEN idx.indisprimary
            THEN 'pkey'
            WHEN idx.indisunique
            THEN 'uidx'
            ELSE 'idx'
            END AS type,
       pg_get_indexdef(idx.indexrelid),
       CASE WHEN idx.indisvalid
            THEN 'valid'
            ELSE 'INVALID'
            END as statusi,
       pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname)) as size_in_bytes,
       pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname))) as size
  FROM pg_stat_all_indexes i
  JOIN pg_class c ON (c.oid = i.relid)
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
  JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
 WHERE n.nspname = 'pg_catalog'
   AND NOT idx.indisvalid
 ORDER BY 1, 2, 3;


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: four template0 databases after vacuum

From
Adrian Klaver
Date:
On 02/06/2016 06:43 PM, Kazuaki Fujikura wrote:
> Hi there,
>
> Version: 9.1.6 running since Dec, 2014
> We have 3 different databases.
>

See comment in line.

> [problem history/background]
>
> Jan 10th, 2016:
> The first problem was autovacuum issue.
> - autovacuum could not finish successfully.
> - I set autovacuum_freeze_max_age to 2 hundreds million.
> - autovacuum immediately finished against the database which age was
> over 2 hundreds million.

Which was?

> - so, autovacuum did not go next database

Which was?

> - I then run "vacuumdb -az" and  run vacuum freeze analyze against
> template0 after setting datallowconn to true

Why? template0 is by default read-only there should be nothing happening
it to require vacuuming.

> - I set datallowconn to false
>
> ============================================================
> $ vacuumdb -az
> $ psql template1
> template1=# UPDATE pg_database SET datallowconn = TRUE where datname =
> 'template0';
> UPDATE 1
> template1=# \c template0
> template0=# VACUUM FREEZE ANALYZE  ;
> VACUUM
> template0=# \c template1
> template1=# UPDATE pg_database SET datallowconn = FALSE where datname =
> 'template0';
> UPDATE 1
> ============================================================
>
> - At this point, there is one template0 only
>
>
> Jan 30th, 2016:
> After three weeks, I again hit the same issue - autovacuum could not finish.

Not finish on what?

> This time, the age did not reduce with manual vacuum.

Age of what?

> I then run vacuum full to pg_database. The age of pg_database becomes
> minus value.
> Then, autovacuum started again.
>
> ============================================================
> target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE
> relkind = 'r' ;
>                      relname                     |    age
> ------------------------------------------------+-----------
>   pg_database                                    | 219383067
> target_db=# VACUUM FREEZE ;
> VACUUM
> target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE
> relkind = 'r' ;
>                      relname                     |    age
> ------------------------------------------------+-----------
>   pg_database                                    | 219387307
> target_db=# VACUUM FULL ;
> VACUUM
>                      relname                     |    age
> ------------------------------------------------+-----------
>   pg_database                                    | -1861408089
> ============================================================
>
>
>
> Yesterday:
>
> I run the following command to run vacuum full to all pg_database.

This I do not get, how the xid count on pg_database so quickly and to
such an extent that it needs a VACUUM FULL?

Is there a script that is creating and dropping databases rapidly?

> Then, I run vacuum freeze analyze and vacuum  full to template0 after
> setting datallowconn.
>
> ============================================================
> $ psql -lt | awk '{print $1}' | grep -v ^$ | grep -v ^\| | while read
> line; do psql ${line} -c "VACUUM FULL pg_database;"; done
> VACUUM ....

What are you trying to do with the above?
I do not think it is a coincidence that the first time the above was run
in this sequence, shortly after 4 template0 databases appear.

>
> $ psql template1
> template1=# UPDATE pg_database SET datallowconn = TRUE where datname =
> 'template0';
> UPDATE 1
> template1=# \c template0
> template0=# VACUUM FULL  ;
> VACUUM
> template0=# \c template1
> template1=# UPDATE pg_database SET datallowconn = FALSE where datname =
> 'template0';
> UPDATE 4
> ============================================================
>
> ***Then I HAVE FOUR template0 DATABASES***
>
>
>
>
> [Current problems]
>
> We now have three issues in our production.
>
> 1. It looks four template0 databases exist
> 2. Xid of template0 keeps growing
> 3. Can not freeze xid of template0
>
>
>
> 1. It looks four template0 databases exist
>
> ============================================================
> $ psql -l | grep template0
>   template0                                      | postgres   | UTF8
>          | C        | C                 | =c/postgres          +
>   template0                                      | postgres   | UTF8
>          | C        | C                 | =c/postgres          +
>   template0                                      | postgres   | UTF8
>          | C        | C                 | =c/postgres          +
>   template0                                      | postgres   | UTF8
>          | C        | C                 | =c/postgres          +
> ============================================================
>
> These have same dataid.
>
> ============================================================
> postgres=# SELECT datid, datname FROM pg_stat_database where datname =
> 'template0';
>   datid |  datname
> -------+-----------
>   12772 | template0
>   12772 | template0
>   12772 | template0
>   12772 | template0
> (4 rows)
> ============================================================
>
>
>
> 2. Xid of template0 keeps growing
> ============================================================
> postgres=# SELECT datname, age(datfrozenxid) FROM pg_database order by
> age desc;
>                      datname                     |    age
> ------------------------------------------------+-----------
>   template0                                      | 198431852
>   template0                                      | 198431852
>   template0                                      | 198431852
>   template0                                      |  50480024
>   template1                                      |  45629585
> ============================================================

Can you show?:

SELECT * from pg_database;

If you do not want to show the whole cluster, then at least the
databases involved in this discussion.

>
> At this moment, the maximum age value of all databases is template0.
> The age value keeps growing.
>
> One of 4 template0 is young (504080024). Other three template0s are
> still old.
>
>   3. Can not freeze xid of template0
> To reset xid of template0, I did vacuum full/ vacuum freeze to
> template0. But,
> the age of three template0 did not change. Only of of 4 template0 had
> successfully
> changed the age young.
> ============================================================
> $ psql template1
> template1=# UPDATE pg_database SET datallowconn = TRUE where datname =
> 'template0';
> UPDATE 4
> template1=# \c template0
> template0=# VACUUM FREEZE ANALYZE  ;
> VACUUM
> template0=# VACUUM FULL  ;
> VACUUM
> template0=# \c template1
> template1=# UPDATE pg_database SET datallowconn = FALSE where datname =
> 'template0';
> UPDATE 4
> ============================================================
>
> I run the commands above. But, I could not change the age of three
> template0 databases.
>
>
>
> [My idea to fix this]
>
> If I don't do anything about this, I think our production service will
> be down because it exceeds the limit of xid.
>
> I guess if I drop all template0 and create template0 again, then
> everything gets back normal.
> But I am not quite sure if my approach is right.
>
> I would appreciate any suggestion/comments.
>
> Best regards,
> Kazuaki Fujikura


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: four template0 databases after vacuum

From
Kazuaki Fujikura
Date:
Thank you for your comments. 

First, I think I need to tell you our database situation

- 3 physical databases (installed in different servers. 1master, 2 slave servers.)
- more than logical 1100 databases in each servers


[Karsten and Melvin]
It shows 0 records in template0 with the query you provided.

===============================
 schema | table | index | idx_scan | idx_tup_read | idx_tup_fetch | type | pg_get_indexdef | statusi | size_in_bytes | size 
--------+-------+-------+----------+--------------+---------------+------+-----------------+---------+---------------+------
(0 rosw)
===============================




[Adrian]

>> [problem history/background] 
>> 
>> Jan 10th, 2016: 
>> The first problem was autovacuum issue. 
>> - autovacuum could not finish successfully. 
>> - I set autovacuum_freeze_max_age to 2 hundreds million. 
>> - autovacuum immediately finished against the database which age was 
>> over 2 hundreds million. 
>
>Which was? 

Any logical database (we have more than 1100 databases) which age (relfrozenxid) is more than 2 hundreds
million shows that autovacuum runs repeatedly (it starts and stops autovacuum process repeatedly with no
vacuum processing).


>> - so, autovacuum did not go next database 
>
>Which was? 

- I saw autovacuum stops at template0 because it can not run vacuum freeze against it
- I thought it was because the age of template0 exceeds the config parameter of autovacuum kick, which is 2 hundreds milliions
- So, I wanted to reduce the age of template0 (I don't know why it increases though)


>Not finish on what?

I could finish vacuum manually.
But, autovacuum was not finished.


>> This time, the age did not reduce with manual vacuum. 
>Age of what? 

age(relfrozenxid) of template0.


>This I do not get, how the xid count on pg_database so quickly and to 
>such an extent that it needs a VACUUM FULL? 
>Is there a script that is creating and dropping databases rapidly? 

We have more than 1100 databases and create new database every day
whenever new customer comes.
Number of transactions are more than ten millions in total of 1100+ database.



>What are you trying to do with the above? 
>I do not think it is a coincidence that the first time the above was run 
>in this sequence, shortly after 4 template0 databases appear. 


I run vacuum full because I could not change the value of relfrozenxid of pg_database with vacuum/vacuum freeze.
Except template0 database, I can change relfrozenxid if I run vacuum full pg_database.



>Can you show?: 
>
>SELECT * from pg_database; 
>
>If you do not want to show the whole cluster, then at least the 
>databases involved in this discussion. 

===============================

postgres=# SELECT oid,ctid,* from pg_database where datname =  'template0' ;
  oid  |  ctid   |  datname  | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace |        
       datacl                
-------+---------+-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
-----------------------------
 12772 | (36,25) | template0 |     10 |        6 | C          | C        | t             | f            |           -1 |         12772 |   2412920847 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
 12772 | (36,26) | template0 |     10 |        6 | C          | C        | t             | f            |           -1 |         12772 |   2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
 12772 | (36,27) | template0 |     10 |        6 | C          | C        | t             | f            |           -1 |         12772 |   2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
 12772 | (36,28) | template0 |     10 |        6 | C          | C        | t             | f            |           -1 |         12772 |   2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
(4 rows)
===============================

oid is same value.
But ctid is different values.

The rest of records has our customer name. If you need more info from here, I can send you the whole data.

Best regards,
Kazuaki Fujikura

Re: four template0 databases after vacuum

From
Adrian Klaver
Date:
On 02/08/2016 04:16 AM, Kazuaki Fujikura wrote:
> Thank you for your comments.
>
> First, I think I need to tell you our database situation
>
> - 3 physical databases (installed in different servers. 1master, 2 slave
> servers.)

For future reference the above are generally called database clusters or
instances to distinguish them from the databases created inside them,
what you call logical databases. Thanks for explaining it helps clear up
some confusion on my part.


Can you explain what your replication set up is?

> - more than logical 1100 databases in each servers
>
>
> [Karsten and Melvin]
> It shows 0 records in template0 with the query you provided.
>
> ===============================
>   schema | table | index | idx_scan | idx_tup_read | idx_tup_fetch |
> type | pg_get_indexdef | statusi | size_in_bytes | size
>
--------+-------+-------+----------+--------------+---------------+------+-----------------+---------+---------------+------
> (0 rosw)
> ===============================
>
>

So are you doing the below on the master, the slaves or all?

>
>
> [Adrian]
>
>  >> [problem history/background]
>  >>
>  >> Jan 10th, 2016:
>  >> The first problem was autovacuum issue.
>  >> - autovacuum could not finish successfully.
>  >> - I set autovacuum_freeze_max_age to 2 hundreds million.
>  >> - autovacuum immediately finished against the database which age was
>  >> over 2 hundreds million.
>  >
>  >Which was?
>
> Any logical database (we have more than 1100 databases) which age
> (relfrozenxid) is more than 2 hundreds
> million shows that autovacuum runs repeatedly (it starts and stops
> autovacuum process repeatedly with no
> vacuum processing).
>
>
>  >> - so, autovacuum did not go next database
>  >
>  >Which was?
>
> - I saw autovacuum stops at template0 because it can not run vacuum
> freeze against it
> - I thought it was because the age of template0 exceeds the config
> parameter of autovacuum kick, which is 2 hundreds milliions
> - So, I wanted to reduce the age of template0 (I don't know why it
> increases though)
>
>
>  >Not finish on what?
>
> I could finish vacuum manually.
> But, autovacuum was not finished.
>
>
>  >> This time, the age did not reduce with manual vacuum.
>  >Age of what?
>
> age(relfrozenxid) of template0.
>
>
>  >This I do not get, how the xid count on pg_database so quickly and to
>  >such an extent that it needs a VACUUM FULL?
>  >Is there a script that is creating and dropping databases rapidly?
>
> We have more than 1100 databases and create new database every day
> whenever new customer comes.
> Number of transactions are more than ten millions in total of 1100+
> database.
>
>
>
>  >What are you trying to do with the above?
>  >I do not think it is a coincidence that the first time the above was run
>  >in this sequence, shortly after 4 template0 databases appear.
>
>
> I run vacuum full because I could not change the value of relfrozenxid
> of pg_database with vacuum/vacuum freeze.
> Except template0 database, I can change relfrozenxid if I run vacuum
> full pg_database.
>
>
>
>  >Can you show?:
>  >
>  >SELECT * from pg_database;
>  >
>  >If you do not want to show the whole cluster, then at least the
>  >databases involved in this discussion.
>
> ===============================
>
> postgres=# SELECT oid,ctid,* from pg_database where datname =  'template0' ;
>    oid  |  ctid   |  datname  | datdba | encoding | datcollate |
> datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid |
> datfrozenxid | dattablespace |
>         datacl
>
-------+---------+-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
> -----------------------------
>   12772 | (36,25) | template0 |     10 |        6 | C          | C
>   | t             | f            |           -1 |         12772 |
> 2412920847 |          1663 | {=c/pos
> tgres,postgres=CTc/postgres}
>   12772 | (36,26) | template0 |     10 |        6 | C          | C
>   | t             | f            |           -1 |         12772 |
> 2264969019 |          1663 | {=c/pos
> tgres,postgres=CTc/postgres}
>   12772 | (36,27) | template0 |     10 |        6 | C          | C
>   | t             | f            |           -1 |         12772 |
> 2264969019 |          1663 | {=c/pos
> tgres,postgres=CTc/postgres}
>   12772 | (36,28) | template0 |     10 |        6 | C          | C
>   | t             | f            |           -1 |         12772 |
> 2264969019 |          1663 | {=c/pos
> tgres,postgres=CTc/postgres}
> (4 rows)
> ===============================
>
> oid is same value.
> But ctid is different values.
>
> The rest of records has our customer name. If you need more info from
> here, I can send you the whole data.

No at this point I do not think that is necessary.

>
> Best regards,
> Kazuaki Fujikura


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: four template0 databases after vacuum

From
Kazuaki Fujikura
Date:
Thank you for your comments. 

>Can you explain what your replication set up is? 

Streaming Replication.
=>
master----slave1 (async)
master----slave2 (async)

>So are you doing the below on the master, the slaves or all? 
=>master

Our current plan at this moment is
- Upgrade 9.1.19 from 9.1.6 at the end of March at our scheduled maintenance
- Export and import pg_dump files ( to eliminate the effect of template0 and xid, which are all reset at import)

We are still afraid that it is too late (as we still have 4 strange template0 files). So, if you have any ideas/suggestions which we can try before scheduled maintenance, that is much appreciated.


Best regards,
Kazuaki Fujikura


2016-02-09 1:51 GMT+09:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 02/08/2016 04:16 AM, Kazuaki Fujikura wrote:
Thank you for your comments.

First, I think I need to tell you our database situation

- 3 physical databases (installed in different servers. 1master, 2 slave
servers.)

For future reference the above are generally called database clusters or instances to distinguish them from the databases created inside them, what you call logical databases. Thanks for explaining it helps clear up some confusion on my part.


Can you explain what your replication set up is?

- more than logical 1100 databases in each servers


[Karsten and Melvin]
It shows 0 records in template0 with the query you provided.

===============================
  schema | table | index | idx_scan | idx_tup_read | idx_tup_fetch |
type | pg_get_indexdef | statusi | size_in_bytes | size
--------+-------+-------+----------+--------------+---------------+------+-----------------+---------+---------------+------
(0 rosw)
===============================



So are you doing the below on the master, the slaves or all?




[Adrian]

 >> [problem history/background]
 >>
 >> Jan 10th, 2016:
 >> The first problem was autovacuum issue.
 >> - autovacuum could not finish successfully.
 >> - I set autovacuum_freeze_max_age to 2 hundreds million.
 >> - autovacuum immediately finished against the database which age was
 >> over 2 hundreds million.
 >
 >Which was?

Any logical database (we have more than 1100 databases) which age
(relfrozenxid) is more than 2 hundreds
million shows that autovacuum runs repeatedly (it starts and stops
autovacuum process repeatedly with no
vacuum processing).


 >> - so, autovacuum did not go next database
 >
 >Which was?

- I saw autovacuum stops at template0 because it can not run vacuum
freeze against it
- I thought it was because the age of template0 exceeds the config
parameter of autovacuum kick, which is 2 hundreds milliions
- So, I wanted to reduce the age of template0 (I don't know why it
increases though)


 >Not finish on what?

I could finish vacuum manually.
But, autovacuum was not finished.


 >> This time, the age did not reduce with manual vacuum.
 >Age of what?

age(relfrozenxid) of template0.


 >This I do not get, how the xid count on pg_database so quickly and to
 >such an extent that it needs a VACUUM FULL?
 >Is there a script that is creating and dropping databases rapidly?

We have more than 1100 databases and create new database every day
whenever new customer comes.
Number of transactions are more than ten millions in total of 1100+
database.



 >What are you trying to do with the above?
 >I do not think it is a coincidence that the first time the above was run
 >in this sequence, shortly after 4 template0 databases appear.


I run vacuum full because I could not change the value of relfrozenxid
of pg_database with vacuum/vacuum freeze.
Except template0 database, I can change relfrozenxid if I run vacuum
full pg_database.



 >Can you show?:
 >
 >SELECT * from pg_database;
 >
 >If you do not want to show the whole cluster, then at least the
 >databases involved in this discussion.

===============================

postgres=# SELECT oid,ctid,* from pg_database where datname =  'template0' ;
   oid  |  ctid   |  datname  | datdba | encoding | datcollate |
datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid |
datfrozenxid | dattablespace |
        datacl
-------+---------+-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
-----------------------------
  12772 | (36,25) | template0 |     10 |        6 | C          | C
  | t             | f            |           -1 |         12772 |
2412920847 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
  12772 | (36,26) | template0 |     10 |        6 | C          | C
  | t             | f            |           -1 |         12772 |
2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
  12772 | (36,27) | template0 |     10 |        6 | C          | C
  | t             | f            |           -1 |         12772 |
2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
  12772 | (36,28) | template0 |     10 |        6 | C          | C
  | t             | f            |           -1 |         12772 |
2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
(4 rows)
===============================

oid is same value.
But ctid is different values.

The rest of records has our customer name. If you need more info from
here, I can send you the whole data.

No at this point I do not think that is necessary.



Best regards,
Kazuaki Fujikura


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: four template0 databases after vacuum

From
Adrian Klaver
Date:
On 02/10/2016 07:19 AM, Kazuaki Fujikura wrote:
> Thank you for your comments.
>
>  >Can you explain what your replication set up is?
>
> Streaming Replication.
> =>
> master----slave1 (async)
> master----slave2 (async)
>
>  >So are you doing the below on the master, the slaves or all?
> =>master
>
> Our current plan at this moment is
> - Upgrade 9.1.19 from 9.1.6 at the end of March at our scheduled maintenance
> - Export and import pg_dump files ( to eliminate the effect of template0
> and xid, which are all reset at import)
>
> We are still afraid that it is too late (as we still have 4 strange
> template0 files). So, if you have any ideas/suggestions which we can try
> before scheduled maintenance, that is much appreciated.

On which of the database clusters are you seeing the 4 template0 databases?

 From a previous post:

"We have more than 1100 databases and create new database every day
whenever new customer comes. .."

How are the databases created?

>
>
> Best regards,
> Kazuaki Fujikura
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: four template0 databases after vacuum

From
Kazuaki Fujikura
Date:
Firstly, I apologise for the delay in replying to you.


>On which of the database clusters are you seeing the 4 template0 databases? 
template0 looks the same on both the master and slave,

=======================================================
$ psql -U user -l -h master | grep template0
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
$ psql -U user -l -h slave1 | grep template0
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
$ psql -U user -l -h slave2 | grep template0
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
 template0                                      | postgres   | UTF8     | C       | C          | =c/postgres          +
=======================================================


>How are the databases created? 

We create database at createdb command.
ex) createdb -O user -E UTF8 --lc-ctype ja_JP.UTF8  -T template0 <dbname>


Best regards,
Kazuaki Fujikura


2016-02-11 4:41 GMT+09:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 02/10/2016 07:19 AM, Kazuaki Fujikura wrote:
Thank you for your comments.

 >Can you explain what your replication set up is?

Streaming Replication.
=>
master----slave1 (async)
master----slave2 (async)

 >So are you doing the below on the master, the slaves or all?
=>master

Our current plan at this moment is
- Upgrade 9.1.19 from 9.1.6 at the end of March at our scheduled maintenance
- Export and import pg_dump files ( to eliminate the effect of template0
and xid, which are all reset at import)

We are still afraid that it is too late (as we still have 4 strange
template0 files). So, if you have any ideas/suggestions which we can try
before scheduled maintenance, that is much appreciated.

On which of the database clusters are you seeing the 4 template0 databases?

From a previous post:

"We have more than 1100 databases and create new database every day
whenever new customer comes. .."

How are the databases created?




Best regards,
Kazuaki Fujikura



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: four template0 databases after vacuum

From
"David G. Johnston"
Date:
On Sun, Feb 7, 2016 at 8:14 AM, Melvin Davidson <melvin6925@gmail.com> wrote:
With regards to Karsten's thought, here is a query to find any pg_catalog indexes that are corrupt.

SELECT n.nspname as schema,
       i.relname as table,
       i.indexrelname as index,
       i.idx_scan,
       i.idx_tup_read,
       i.idx_tup_fetch,
       CASE WHEN idx.indisprimary
            THEN 'pkey'
            WHEN idx.indisunique
            THEN 'uidx'
            ELSE 'idx'
            END AS type,
       pg_get_indexdef(idx.indexrelid),
       CASE WHEN idx.indisvalid
            THEN 'valid'
            ELSE 'INVALID'
            END as statusi,
       pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname)) as size_in_bytes,
       pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname))) as size
  FROM pg_stat_all_indexes i
  JOIN pg_class c ON (c.oid = i.relid)
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
  JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
 WHERE n.nspname = 'pg_catalog'
   AND NOT idx.indisvalid
 ORDER BY 1, 2, 3;


This only returns "invalid" indexes but couldn't some forms of corruption result in errors without the "indisvalid" flag being removed?

Given that OID is a PK I don't see how there isn't some form of corruption going on here.

Aside from their presence indicating that something is wrong, somewhere, the ​fact that there are four of these template0's doesn't seem that problematic.  They neither depend upon nor are depended upon within the cluster.  Though I am a bit curious how "createdb -T template0" still works.  I guess it must ignore the extra records during its search...

I've trimmed a lot here but both up and down thread two observations seem meaningful when considering that as far as PostgreSQL is concerned only one of the 4 template0s is even visible.  There isn't enough use of "ctid" in the other queries to confirm that they are all talking about the same physical entry.

1)
​"""
​But, the age of three template0 did not change. Only [one] of 4 template0 had successfully changed the age young.
"""
and 2)
"""
 12772 | (36,25) | template0 |     10 |        6 | C          | C        | t             | f            |           -1 |         12772 |   2412920847 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
 12772 | (36,26) | template0 |     10 |        6 | C          | C        | t             | f            |           -1 |         12772 |   2264969019 |          1663 | {=c/pos
tgres,postgres=CTc/postgres}
​"""

Note that the first row recognizes that the maximum "​datlastsysoid" has significantly increased compared to what the other three rows report (all identical to the second record).

(36,25) seems to be in play while all of others, while present, are simply ignored.

Now, that said, if not every part of the system (like autovacuum-calculations) truly is ignoring them then that discrepancy is quite likely to cause a problem.

The question is whether its worth any effort exploring this further or should we just advise that the system is presently in an inconsistent state and that said inconsistency needs to be corrected by any means possible - in this case the obvious answer is DELETE FROM pg_database WHERE oid = # AND ctid != (36,25);followed by a REINDEX on pg_database.

I do not suggest this myself - I would only do this myself if a hacker agreed.

The rest of it hints to a potential bug or improvement, somewhere, but I suspect the cost/benefit of finding it is going to be prohibitive based upon a single report on an unpatched release.  But that is coming from a non-hacker.

David J.