Thread: Temporary tables prevent autovacuum, leading to XID wraparound

Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
Hello,

I've found a problem that an orphaned temporary table could cause XID wraparound.  Our customer encountered this
problemwith PG 9.5.2, but I think this will happen with the latest PG.
 

I'm willing to fix this, but I'd like to ask you what approach we should take.


PROBLEM
====================================

The customer has a database for application data, which I call it user_db here.  They don't store application data in
postgresdatabase.
 

No tables in user_db was autovacuumed for more than a month, leading to user tables bloating.  Those tables are
eligiblefor autovacuum according to pg_stat_all_tables and autovacuum settings.
 

age(datfrozenxid) of user_db and postgres databases are greater than autovacuum_max_freeze_age, so they are eligible
forautovacuuming for XID wraparound.
 

There are user tables in user_db whose age(relfrozenxid) is greater than autovacuum_freeze_max_age, so those tables
shouldget autovacuum treatment.
 


CAUSE
====================================

postgres database has a table named pg_temp_3.fetchchunks, whose age(relfrozenxid) is greater than
autovacuum_freeze_max_age. This temporary table is the culprit.  pg_temp_3.fetchchunks is created by pg_rewind.  The
customersays they ran pg_rewind.
 

autovacuum launcher always choose postgres, because do_start_worker() scans pg_database and finds that postgres
databaseneeds vacuuming for XID wraparound.  user_db is never chosen for vacuuming, although it also needs vacuuming
forXID wraparound.
 

autovacuum worker doesn't delete pg_temp3.fetchchunks, because the backendid 3 is used by some application so
autovacuumworker thinks that the backend is active and the temporary table cannot be dropped.
 

I don't know why pg_temp3.fetchchunks still exists.  Maybe the user ran pg_ctl stop -mi while pg_rewind was running.


FIX
====================================

I have the following questions.  Along which line should I proceed to fix the problem?

* Why does autovacuum launcher always choose only one database when that database need vacuuming for XID wraparound?
Shouldn'tit also choose other databases?
 

* I think temporary tables should not require vacuuming for XID wraparound.  Furtherover, should updates/deletes to
temporarytables  be in-place instead of creating garbage, so that any form of vacuum is unnecessary?  Other sessions do
notneed to read temporary tables.
 

* In this incident, autovacuum worker misjudged that pg_temp_3.fetchchunks can't be deleted, although the creator
(pg_rewind)is no longer active.  How can we delete orphan temporary tables safely?
 


Regards
Takayuki Tsunakawa





Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Thu, Jan 25, 2018 at 06:14:41AM +0000, Tsunakawa, Takayuki wrote:
> I don't know why pg_temp3.fetchchunks still exists.  Maybe the user
> ran pg_ctl stop -mi while pg_rewind was running.

Likely that was the case :(

As a superuser, DROP TABLE should work on the temporary schema of
another session. Have you tried that to solve the situation?

> * I think temporary tables should not require vacuuming for XID
> wraparound.  Furtherover, should updates/deletes to temporary tables
> be in-place instead of creating garbage, so that any form of vacuum is
> unnecessary?  Other sessions do not need to read temporary tables.

Yeah, there are many areas of improvements in this area. Temp tables
also generate WAL..

> * In this incident, autovacuum worker misjudged that
> pg_temp_3.fetchchunks can't be deleted, although the creator
> (pg_rewind) is no longer active.  How can we delete orphan temporary
> tables safely?

As long as Postgres sees that its temporary schema is in use, it would
think that the table is not orphaned. Another thing possible would be to
have the session now holding this schema space to reuse fetchchunks so
as things are reset.
--
Michael

Attachment

RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
> From: Michael Paquier [mailto:michael.paquier@gmail.com]
> As a superuser, DROP TABLE should work on the temporary schema of another
> session. Have you tried that to solve the situation?

Yes, we asked the customer to do that today.  I think the customer will do in the near future.

> > * In this incident, autovacuum worker misjudged that
> > pg_temp_3.fetchchunks can't be deleted, although the creator
> > (pg_rewind) is no longer active.  How can we delete orphan temporary
> > tables safely?
> 
> As long as Postgres sees that its temporary schema is in use, it would think
> that the table is not orphaned. Another thing possible would be to have
> the session now holding this schema space to reuse fetchchunks so as things
> are reset.

I understood you suggested a new session which recycle the temp schema should erase the zombie metadata of old temp
tablesor recreate the temp schema.  That sounds easy.
 

Regards
Takayuki Tsunakawa




Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Thu, Jan 25, 2018 at 08:10:00AM +0000, Tsunakawa, Takayuki wrote:
>>> * In this incident, autovacuum worker misjudged that
>>> pg_temp_3.fetchchunks can't be deleted, although the creator
>>> (pg_rewind) is no longer active.  How can we delete orphan temporary
>>> tables safely?
>>
>> As long as Postgres sees that its temporary schema is in use, it would think
>> that the table is not orphaned. Another thing possible would be to have
>> the session now holding this schema space to reuse fetchchunks so as things
>> are reset.
>
> I understood you suggested a new session which recycle the temp schema
> should erase the zombie metadata of old temp tables or recreate the
> temp schema.  That sounds easy.

If the new session makes use of the same temporary schema where the
orphan table is, cleanup is possible. Now you have a problem if this is
not available as this depends on the backend ID uniquely assigned. It
would be better to just drop the table manually at the end.
--
Michael

Attachment

RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com]
> On Thu, Jan 25, 2018 at 08:10:00AM +0000, Tsunakawa, Takayuki wrote:
> > I understood you suggested a new session which recycle the temp schema
> > should erase the zombie metadata of old temp tables or recreate the
> > temp schema.  That sounds easy.
> 
> If the new session makes use of the same temporary schema where the orphan
> table is, cleanup is possible. Now you have a problem if this is not available
> as this depends on the backend ID uniquely assigned.

Ouch, you're right.  If the new session "uses the temp schema," it has a chance to clean the old temp table metadata.
However,it won't help if the session doesn't try to use the temp schema by creating a temp table...
 


> It would be better
> to just drop the table manually at the end.

Just to solve this very incident, it's so.  But this is a bug, so we need to fix it somehow.

Regards
Takayuki Tsunakawa





Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Masahiko Sawada
Date:
On Thu, Jan 25, 2018 at 3:14 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
>
> FIX
> ====================================
>
> I have the following questions.  Along which line should I proceed to fix the problem?
>
> * Why does autovacuum launcher always choose only one database when that database need vacuuming for XID wraparound?
Shouldn'tit also choose other databases?
 

Yeah, I'd also like to fix this issue. This can be problem even in
other case; there are two databases that require anti-wraparound
vacuum, and one of them has a very large table that could take a long
time to vacuum. In this case, if autovacuum chooses the database
having big table first, another database would not be selected until
an autovacuum worker completes vacuum on the large table. To deal with
it, I think we can make autovacuum workers tell that it is no longer
necessary to launch a new autovacuum worker on the database to
autovacuum launcher before exit. For example, autovacuum worker
reports both the total number of relations and the number of relations
that require an anti-wraparound vacuum to the stats collector.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Thu, Jan 25, 2018 at 1:14 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> * I think temporary tables should not require vacuuming for XID wraparound.  Furtherover, should updates/deletes to
temporarytables  be in-place instead of creating garbage, so that any form of vacuum is unnecessary?  Other sessions do
notneed to read temporary tables. 

Temporary tables contain XIDs, so they need to be vacuumed for XID
wraparound.  Otherwise, queries against those tables by the session
that created them could yield wrong answers.  However, autovacuum
can't perform that vacuuming; it would have to be done by the session.
I think we should consider having backends try to remove their
temporary schema on startup; then, if a temp table in a backend is old
enough that it's due for vacuum for wraparound, have autovacuum kill
the connection.  The former is necessary to prevent sessions from
being killed on account of temp tables they "inherited" from a backend
that didn't exit cleanly.

The in-place update idea won't work for a couple of reasons.  First, a
command shouldn't see the results of modifications made earlier in the
same command.  Second, using cursors, it's possible to have more than
one distinct snapshot open against a temporary table at the same time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
> On Thu, Jan 25, 2018 at 3:14 PM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > * Why does autovacuum launcher always choose only one database when that
> database need vacuuming for XID wraparound?  Shouldn't it also choose other
> databases?
> 
> Yeah, I'd also like to fix this issue. This can be problem even in other
> case; there are two databases that require anti-wraparound vacuum, and one
> of them has a very large table that could take a long time to vacuum. In
> this case, if autovacuum chooses the database having big table first,
> another database would not be selected until an autovacuum worker completes
> vacuum on the large table. To deal with it, I think we can make autovacuum
> workers tell that it is no longer necessary to launch a new autovacuum worker
> on the database to autovacuum launcher before exit. For example, autovacuum
> worker reports both the total number of relations and the number of relations
> that require an anti-wraparound vacuum to the stats collector.

Thanks for commenting.  I believe you have deep knowledge and experience with vacuum because you did a great work for
freezemap in 9.6, so I appreciate your help!
 

How would you use those two counts?

How about just modifying do_start_worker(), so that the launcher chooses a database in the following order?

1. wraparound-risky database not being vacuumed by any worker
2. non-wraparound-risky database  not being vacuumed by any worker
3. wraparound-risky database being vacuumed by any worker
4. non-wraparound-risky database being vacuumed by any worker

Regards
Takayuki Tsunakawa



RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> I think we should consider having backends try to remove their temporary
> schema on startup; then, if a temp table in a backend is old enough that
> it's due for vacuum for wraparound, have autovacuum kill the connection.
> The former is necessary to prevent sessions from being killed on account
> of temp tables they "inherited" from a backend that didn't exit cleanly.

That seems to be the only reasonable solution.  One might feel it annoying to emit WAL during connection establishment
todelete the temp schema, but even now the client authentication can emit WAL for hot pruning while scanning
pg_database,pg_authid, etc.  Thanks.
 

> The in-place update idea won't work for a couple of reasons.  First, a
> command shouldn't see the results of modifications made earlier in the same
> command.  Second, using cursors, it's possible to have more than one
> distinct snapshot open against a temporary table at the same time.

You're right.  And if the transaction rolls back, it needs to see the old tuples, which requires undo log.

Regards
Takayuki Tsunakawa


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Masahiko Sawada
Date:
On Fri, Jan 26, 2018 at 2:22 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
>> On Thu, Jan 25, 2018 at 3:14 PM, Tsunakawa, Takayuki
>> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> > * Why does autovacuum launcher always choose only one database when that
>> database need vacuuming for XID wraparound?  Shouldn't it also choose other
>> databases?
>>
>> Yeah, I'd also like to fix this issue. This can be problem even in other
>> case; there are two databases that require anti-wraparound vacuum, and one
>> of them has a very large table that could take a long time to vacuum. In
>> this case, if autovacuum chooses the database having big table first,
>> another database would not be selected until an autovacuum worker completes
>> vacuum on the large table. To deal with it, I think we can make autovacuum
>> workers tell that it is no longer necessary to launch a new autovacuum worker
>> on the database to autovacuum launcher before exit. For example, autovacuum
>> worker reports both the total number of relations and the number of relations
>> that require an anti-wraparound vacuum to the stats collector.
>
> Thanks for commenting.  I believe you have deep knowledge and experience with vacuum because you did a great work for
freezemap in 9.6, so I appreciate your help!
 

Thanks. The most of hackers on the community have deep knowledge, so
I'd like to hear other opinion.

> How would you use those two counts?

What I thought is that a worker reports these two values after scanned
pg_class and after freezed a table. The launcher decides to launch a
new worker if the number of tables requiring anti-wraparound vacuum is
greater than the number of workers running on the database. Similarly,
the autovacuum launcher doesn't launch a new worker if two values are
equal, which means all tables requiring an anti-wraparound vacuum is
being vacuumed. There are chances that new relation is added during a
worker is running on the last one table that requires anti-wraparound
vacuum and launcher launches a new worker on the database. I think
it's no problem because the new worker would update that two values
and exits soon.

>
> How about just modifying do_start_worker(), so that the launcher chooses a database in the following order?
>
> 1. wraparound-risky database not being vacuumed by any worker
> 2. non-wraparound-risky database  not being vacuumed by any worker
> 3. wraparound-risky database being vacuumed by any worker
> 4. non-wraparound-risky database being vacuumed by any worker
>

IMO the limiting the number of worker on a database to 1 seems risky.
If a database has many tables that require an anti-wraparound vacuum,
it takes a long time to freeze the all of these tables. In current
implementation, as I mentioned as above, launcher can launch multiple
worker on the one database. Since the above idea would be complex a
bit, as an alternated idea it might be better to specify the number of
worker to launch per database by a new GUC parameter or something. If
the number of worker running on the database exceeds that limit, the
launcher doesn't select the database even if the database is about to
wraparound.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
> What I thought is that a worker reports these two values after scanned
> pg_class and after freezed a table. The launcher decides to launch a new
> worker if the number of tables requiring anti-wraparound vacuum is greater
> than the number of workers running on the database. Similarly, the
> autovacuum launcher doesn't launch a new worker if two values are equal,
> which means all tables requiring an anti-wraparound vacuum is being vacuumed.
> There are chances that new relation is added during a worker is running
> on the last one table that requires anti-wraparound vacuum and launcher
> launches a new worker on the database. I think it's no problem because the
> new worker would update that two values and exits soon.

I got it.  Currently, the launcher assigns all workers to one database even if that database has only one table in
dangerof wraparound.  With your suggestion, the launcher assigns as many workers as the tables to be frozen, and use
remainingworkers for the other databases.
 


> > How about just modifying do_start_worker(), so that the launcher chooses
> a database in the following order?
> >
> > 1. wraparound-risky database not being vacuumed by any worker 2.
> > non-wraparound-risky database  not being vacuumed by any worker 3.
> > wraparound-risky database being vacuumed by any worker 4.
> > non-wraparound-risky database being vacuumed by any worker
> >
> 
> IMO the limiting the number of worker on a database to 1 seems risky.
> If a database has many tables that require an anti-wraparound vacuum, it
> takes a long time to freeze the all of these tables. In current
> implementation, as I mentioned as above, launcher can launch multiple worker
> on the one database. 

I can understand your concern.  On the other hand, it's unfair that one database could monopolize all workers, because
otherdatabases might also be facing wraparound risk.
 

> Since the above idea would be complex a bit, as an
> alternated idea it might be better to specify the number of worker to launch
> per database by a new GUC parameter or something. If the number of worker
> running on the database exceeds that limit, the launcher doesn't select
> the database even if the database is about to wraparound.

I'm afraid the GUC would be difficult for the user to understand and tune.

I want to submit the patch for handling the garbage temporary table metadata as Robert suggested in the next CF.  That
shouldbe enough to prevent this customer's problem.  I would appreciate if anyone could address the other improvement
thatSawada-san proposed.
 

Regards
Takayuki Tsunakawa



Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Masahiko Sawada
Date:
On Mon, Jan 29, 2018 at 3:33 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
>> What I thought is that a worker reports these two values after scanned
>> pg_class and after freezed a table. The launcher decides to launch a new
>> worker if the number of tables requiring anti-wraparound vacuum is greater
>> than the number of workers running on the database. Similarly, the
>> autovacuum launcher doesn't launch a new worker if two values are equal,
>> which means all tables requiring an anti-wraparound vacuum is being vacuumed.
>> There are chances that new relation is added during a worker is running
>> on the last one table that requires anti-wraparound vacuum and launcher
>> launches a new worker on the database. I think it's no problem because the
>> new worker would update that two values and exits soon.
>
> I got it.  Currently, the launcher assigns all workers to one database even if that database has only one table in
dangerof wraparound.  With your suggestion, the launcher assigns as many workers as the tables to be frozen, and use
remainingworkers for the other databases. 
>
>
>> > How about just modifying do_start_worker(), so that the launcher chooses
>> a database in the following order?
>> >
>> > 1. wraparound-risky database not being vacuumed by any worker 2.
>> > non-wraparound-risky database  not being vacuumed by any worker 3.
>> > wraparound-risky database being vacuumed by any worker 4.
>> > non-wraparound-risky database being vacuumed by any worker
>> >
>>
>> IMO the limiting the number of worker on a database to 1 seems risky.
>> If a database has many tables that require an anti-wraparound vacuum, it
>> takes a long time to freeze the all of these tables. In current
>> implementation, as I mentioned as above, launcher can launch multiple worker
>> on the one database.
>
> I can understand your concern.  On the other hand, it's unfair that one database could monopolize all workers,
becauseother databases might also be facing wraparound risk. 

On third thought, we can change the policy of launching workers so
that the launcher dispatches workers evenly to wraparound-risky
databases instead of choosing only one most wraparound-risky database.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
> On Mon, Jan 29, 2018 at 3:33 PM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > I can understand your concern.  On the other hand, it's unfair that one
> database could monopolize all workers, because other databases might also
> be facing wraparound risk.
> 
> On third thought, we can change the policy of launching workers so that
> the launcher dispatches workers evenly to wraparound-risky databases
> instead of choosing only one most wraparound-risky database.

+1

Regards
Takayuki Tsunakawa


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Mon, Jan 29, 2018 at 1:33 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
>> Since the above idea would be complex a bit, as an
>> alternated idea it might be better to specify the number of worker to launch
>> per database by a new GUC parameter or something. If the number of worker
>> running on the database exceeds that limit, the launcher doesn't select
>> the database even if the database is about to wraparound.
>
> I'm afraid the GUC would be difficult for the user to understand and tune.

I agree.  It's autovacuum's job to do the correct thing.  If it's not,
then we need figure out how to make it do the right thing.  Adding a
GUC seems like saying we don't know what the right thing to do is but
we hope the user does know.  That's not a good answer.

Unfortunately, I think a full solution to the problem of allocating AV
workers to avoid wraparound is quite complex.  Suppose that database A
will force a shutdown due to impending wraparound in 4 hours and
database B will force a shutdown in 12 hours.  On first blush, it
seems that we should favor adding workers to A.  But it might be that
database A needs only 2 hours of vacuuming to avoid a shutdown whereas
B needs 12 hours.  In that case, it seems that we ought to instead
favor adding workers to B.   However, it might be the case that A has
more table coming do for wraparound 6 hours from now, and we need
another 15 hours of vacuuming to avoid that shutdown.  That would
favor adding workers to A.  Then again, it might be that A and B
already both have workers, and that adding yet another worker to A
won't speed anything up (because only large tables remain to be
processed and each has a worker already), whereas adding a worker to B
would speed things up (because it still has a big table that we could
immediately start to vacuum for wraparound).  In that case, perhaps we
ought to instead add a worker to B.  But, thinking some more, it seems
like that should cause autovacuum_vacuum_cost_limit to be reallocated
among the workers, making the existing vacuum processes take longer,
which might actually make a bad situation worse.  It seems possible
that the right answer could be to start no new autovacuum worker at
all.

Given all of the foregoing this seems like a very hard problem.  I
can't even articulate a clear set of rules for what our priorities
should be, and it seems that such rules would depend on the rate at
which we're consuming XIDs, how close we are in each database to a
wraparound shutdown, what tables exist in each database, how big the
not-all-frozen part of each one is, how big their indexes are, how
much they're holding back relfrozenxid, and which ones already have
workers, among other things.  I think it's quite possible that we can
come up with something that's better than what we have now without
embarking on a huge project, but it's not going to be anywhere near
perfect because this is really complicated, and there's a real risk
that we'll just making some cases better and others worse rather than
actually coming out ahead overall.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> Unfortunately, I think a full solution to the problem of allocating AV
> workers to avoid wraparound is quite complex.

Yes, that easily puts my small brain into an infinite loop...

> Given all of the foregoing this seems like a very hard problem.  I can't
> even articulate a clear set of rules for what our priorities should be,
> and it seems that such rules would depend on the rate at which we're consuming
> XIDs, how close we are in each database to a wraparound shutdown, what tables
> exist in each database, how big the not-all-frozen part of each one is,
> how big their indexes are, how much they're holding back relfrozenxid, and
> which ones already have workers, among other things.  I think it's quite
> possible that we can come up with something that's better than what we have
> now without embarking on a huge project, but it's not going to be anywhere
> near perfect because this is really complicated, and there's a real risk
> that we'll just making some cases better and others worse rather than
> actually coming out ahead overall.

So a simple improvement would be to assign workers fairly to databases facing a wraparound risk, as Sawada-san
suggested.

One ultimate solution should be the undo-based MVCC that makes vacuuming unnecessary, which you proposed about a year
ago...

Regards
Takayuki Tsunakawa



Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Tue, Jan 30, 2018 at 7:04 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> So a simple improvement would be to assign workers fairly to databases facing a wraparound risk, as Sawada-san
suggested.

Is that always an improvement, or does it make some cases better and
others worse?

> One ultimate solution should be the undo-based MVCC that makes vacuuming unnecessary, which you proposed about a year
ago...

And blogged about yesterday.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Masahiko Sawada
Date:
On Thu, Feb 1, 2018 at 2:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 30, 2018 at 7:04 PM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> So a simple improvement would be to assign workers fairly to databases facing a wraparound risk, as Sawada-san
suggested.
>
> Is that always an improvement, or does it make some cases better and
> others worse?

I think the idea would not be an improvement, but just change the
policy. The current launcher's policy is "let's launch a new worker as
much as possible on the database that is at risk of wraparound most".
The idea I suggested makes the cases mentioned on this thread better
while perhaps making other cases worse.

To improve while keeping the current policy, we might want to use the
first idea I proposed. That is, we don't  launch a new worker on a
database impending wraparound if the last table of the database is
being vacuumed. But it needs to share new information such as what
tables exist in each database and which tables already have worker. It
might be overkill in order to deal with only such a corner case
though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Wed, Jan 31, 2018 at 7:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think the idea would not be an improvement, but just change the
> policy. The current launcher's policy is "let's launch a new worker as
> much as possible on the database that is at risk of wraparound most".
> The idea I suggested makes the cases mentioned on this thread better
> while perhaps making other cases worse.
>
> To improve while keeping the current policy, we might want to use the
> first idea I proposed. That is, we don't  launch a new worker on a
> database impending wraparound if the last table of the database is
> being vacuumed. But it needs to share new information such as what
> tables exist in each database and which tables already have worker. It
> might be overkill in order to deal with only such a corner case
> though.

Something like that might work, but I think it needs more thought.
Maybe, for each database currently being processed by at least 1
worker, advertise in shared memory the oldest XID that isn't already
being vacuumed by some AV worker; when considering which database is
at greatest risk of wraparound, if that value is available, use it
instead of the database's datfrozenxid.  Then when all tables that
make that database riskier than some other database already have
workers, some other database can get a chance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Masahiko Sawada
Date:
On Fri, Feb 2, 2018 at 12:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 7:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I think the idea would not be an improvement, but just change the
>> policy. The current launcher's policy is "let's launch a new worker as
>> much as possible on the database that is at risk of wraparound most".
>> The idea I suggested makes the cases mentioned on this thread better
>> while perhaps making other cases worse.
>>
>> To improve while keeping the current policy, we might want to use the
>> first idea I proposed. That is, we don't  launch a new worker on a
>> database impending wraparound if the last table of the database is
>> being vacuumed. But it needs to share new information such as what
>> tables exist in each database and which tables already have worker. It
>> might be overkill in order to deal with only such a corner case
>> though.
>
> Something like that might work, but I think it needs more thought.
> Maybe, for each database currently being processed by at least 1
> worker, advertise in shared memory the oldest XID that isn't already
> being vacuumed by some AV worker; when considering which database is
> at greatest risk of wraparound, if that value is available, use it
> instead of the database's datfrozenxid.  Then when all tables that
> make that database riskier than some other database already have
> workers, some other database can get a chance.
>

Thank you for suggestion. It sounds more smarter. So it would be more
better if we vacuums database for anti-wraparound in ascending order
of relfrozenxid?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
> Thank you for suggestion. It sounds more smarter. So it would be more better
> if we vacuums database for anti-wraparound in ascending order of
> relfrozenxid?

I thought so, too.  The current behavior is inconsistent: the launcher tries to assign all workers to one database with
thebiggest wraparound risk in order to eliminate the risk as fast as possible, but the workers don't get hurry to
reducethe risk.
 

Regards
Takayuki Tsunakawa



Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for suggestion. It sounds more smarter. So it would be more
> better if we vacuums database for anti-wraparound in ascending order
> of relfrozenxid?

Currently, we're doing it based on datfrozenxid.  I was looking for a
small, relatively safe change to improve things, which led to my
proposal: continue using datfrozenxid when the database isn't being
vacuumed, but when it is, substitute the datfrozenxid that we expect
the database *will have* after the tables currently being vacuumed are
finished for the actual datfrozenxid.

On further reflection, though, I see problems.  Suppose db1 has
age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m
and 400m.  db2 has age(datfrozenxid) = 500m.  Then suppose #1 starts
vacuuming the table with age 600m.  The AV launcher sees that, with
that table out of the way, we'll be able to bring datfrozenxid forward
to 400m and so sends worker #2 to db2.  But actually, when the worker
in db1 finishes vacuuming the table with age 600m, it's going to have
to vacuum the table with age 400m before performing any relfrozenxid
update.  So actually, the table with age 400m is holding the
relfrozenxid at 600m just as much as if it too had a relfrozenxid of
600m.  In fact, any tables in the same database that need routine
vacuuming are a problem, too: we're going to vacuum all of those
before updating relfrozenxid, too.

Maybe we should start by making the scheduling algorithm used by the
individual workers smarter:

1. Let's teach do_autovacuum() that, when there are any tables needing
vacuum for wraparound, either for relfrozenxid or relminmxid, it
should vacuum only those and forget about the rest.  This seems like
an obvious optimization to prevent us from delaying
datfrozenxid/datminmxid updates for the sake of vacuuming tables that
are "merely" bloated.

2. Let's have do_autovacuum() sort the tables to be vacuumed in
ascending order by relfrozenxid, so older tables are vacuumed first.
A zero-order idea is to put tables needing relfrozenxid vacuuming
before tables needing relminmxid vacuuming, and sort the latter by
ascending relminmxid, but maybe there's something smarter possible
there.  The idea here is to vacuum tables in order of priority rather
than in whatever order they happen to be physically mentioned in
pg_class.

3. Let's have do_autovacuum() make an extra call to
vac_update_datfrozenxid() whenever the next table to be vacuumed is at
least 10 million XIDs (or MXIDs) newer than the first one it vacuumed
either since the last call to vac_update_datfrozenxid() or, if none,
since it started.  That way, we don't have to wait for every single
table to get vacuumed before we can consider advancing
relfrozenxid/relminmxid.

4. When it's performing vacuuming for wraparound, let's have AV
workers advertise in shared memory the oldest relfrozenxid and
relminmxid that it might exist in the database.  Given #1 and #2, this
is pretty easy, since we start by moving through tables in increasing
relfrozenxid order and then shift to moving through them in increasing
relminmxid order.  When we're working through the relfrozenxid tables,
the oldest relminmxid doesn't move, and the oldest relfrozenxid is
that of the next table in the list.  When we're working through the
relminmxid tables, it's the reverse.  We need a little cleverness to
figure out what value to advertise when we're on the last table in
each list -- it should be the next-higher value, even though that will
be above the relevant threshold, not a sentinel value.

5. With those steps in place, I think we can now adopt my previous
idea to have the AV launcher use any advertised relfrozenxid (and, as
I now realize, relminmxid) instead of the ones found in pg_database,
because now we know that individual workers are definitely focused on
getting relfrozenxid (and/or relminmxid) as soon as possible, and
vacuuming unrelated tables won't help them do it any faster.

This gets us fairly close to vacuuming tables in decreasing order of
wraparound danger across the entire cluster.  It's not perfect.  It
prefers to keep vacuuming tables in the same database rather than
having a worker exit and maybe launching a new one in a different
database -- but the alternative is not very appealing.  If we didn't
do it that way, and if we had a million tables with XIDs that were
closely spaced spread across different databases, we'd have to
terminate and relaunching workers at a very high rate to get
everything sorted out, which would be inefficient and annoying to
program.  Also, it keeps the existing hard prioritization of
relfrozenxid over relminmxid, which could theoretically be wrong for
some installation.  But I think that might not be a big problem in
practice, and it seems like that could be separately improved at
another time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> Temporary tables contain XIDs, so they need to be vacuumed for XID
> wraparound.  Otherwise, queries against those tables by the session
> that created them could yield wrong answers.  However, autovacuum
> can't perform that vacuuming; it would have to be done by the session.
> I think we should consider having backends try to remove their
> temporary schema on startup; then, if a temp table in a backend is old
> enough that it's due for vacuum for wraparound, have autovacuum kill
> the connection.  The former is necessary to prevent sessions from
> being killed on account of temp tables they "inherited" from a backend
> that didn't exit cleanly.

The attached patch does the former.  The small change in autovacuum.c is mainly for autovac launcher and background
workerswhich don't connect to a database.  I'll add this to the next CF.  I'd like this to be back-patched.
 

I didn't do the latter, because killing the connection seemed a bit overkill.  If we're going to do it, then we should
alsokill the connection which is preventing vacuum regardless of whether it has temporary tables in its session.
 

Regards
Takayuki Tsunakawa




Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Sun, Feb 4, 2018 at 10:10 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> The attached patch does the former.  The small change in autovacuum.c is mainly for autovac launcher and background
workerswhich don't connect to a database.  I'll add this to the next CF.  I'd like this to be back-patched.
 

This is really two separate changes:

1. Teach backends to remove any leftover pg_temp_%d schema on startup.

2. Teach autovacuum to remove any leftover tables in a pg_temp_%d
schema if the backend is active but in some other database (rather
than only when the backend is not active at all).

Both of those changes seem broadly reasonable to me, but what do other
people think?

#1 is bound to have some small impact on backend startup time, but
it's probably negligible (one extra syscache lookup, in the common
case).  I guess one problem is that if the cleanout of the old
pg_temp_%d schema should fail, then the backend will probably end up
exiting.  So it would be really bad if you had catalog corruption
preventing the removal of pg_temp_2, because then every time you
connect, it will try to remove that schema, fail, and disconnect you.
On the other hand, there's plenty of kinds of catalog corruption that
can prevent a database connection already.  It would be nice if we
could think of a way to skip this cleanup if the previous backend
failed to complete it, but I don't see a way to do that off-hand.  I
think we should at least skip it if we're in single-user mode, though.

I don't see any problem with #2, although that part of the patch could
use a little cosmetic work.

I would not be inclined to back-patch these changes, or at least not
farther than v10.  Before v10, we only tried to drop temporary tables
if they were old enough to threaten wraparound, so we've already made
this much more aggressive than it used to be.  I think making it more
aggressive in the ways mentioned above is a good idea, but I'm not
sure that the failure to do so constitutes a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Tue, Feb 06, 2018 at 11:33:59AM -0500, Robert Haas wrote:
> This is really two separate changes:
>
> 1. Teach backends to remove any leftover pg_temp_%d schema on
> startup.

I am not sure that we would like to give up that easily the property
that we have now to clean up past temporary files only at postmaster
startup and only when not in recovery.  If you implement that, there is
a risk that the backend you are starting is eating the connection slot
and by consequence its temporary schema and its set of temporary tables
on which one may want to look into after a crash.

> 2. Teach autovacuum to remove any leftover tables in a pg_temp_%d
> schema if the backend is active but in some other database (rather
> than only when the backend is not active at all).

Yeah.  Here we can do something.  This does not sound much difficult to
me.
--
Michael

Attachment

RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com]
> I am not sure that we would like to give up that easily the property that
> we have now to clean up past temporary files only at postmaster startup
> and only when not in recovery.  If you implement that, there is a risk that
> the backend you are starting is eating the connection slot and by consequence
> its temporary schema and its set of temporary tables on which one may want
> to look into after a crash.

postmaster deletes temporary relation files at startup by calling RemovePgTempFiles() regardless of whether it's in
recovery. It doesn't call that function during auto restart after a crash when restart_after_crash is on.
 


> > 2. Teach autovacuum to remove any leftover tables in a pg_temp_%d
> > schema if the backend is active but in some other database (rather
> > than only when the backend is not active at all).
> 
> Yeah.  Here we can do something.  This does not sound much difficult to
> me.

I did that in my patch.

Regards
Takayuki Tsunakawa




Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Wed, Feb 07, 2018 at 12:37:50AM +0000, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:michael.paquier@gmail.com]
>> I am not sure that we would like to give up that easily the property that
>> we have now to clean up past temporary files only at postmaster startup
>> and only when not in recovery.  If you implement that, there is a risk that
>> the backend you are starting is eating the connection slot and by consequence
>> its temporary schema and its set of temporary tables on which one may want
>> to look into after a crash.
>
> postmaster deletes temporary relation files at startup by calling
> RemovePgTempFiles() regardless of whether it's in recovery.  It
> doesn't call that function during auto restart after a crash when
> restart_after_crash is on.

The comment on top of RemovePgTempFiles() states the following:
 * NOTE: we could, but don't, call this during a post-backend-crash restart
 * cycle.  The argument for not doing it is that someone might want to examine
 * the temp files for debugging purposes.  This does however mean that
 * OpenTemporaryFile had better allow for collision with an existing temp
 * file name.

>> > 2. Teach autovacuum to remove any leftover tables in a pg_temp_%d
>> > schema if the backend is active but in some other database (rather
>> > than only when the backend is not active at all).
>>
>> Yeah.  Here we can do something.  This does not sound much difficult to
>> me.
>
> I did that in my patch.

Nice to hear that.  Please note that I did not check your patch, so I
cannot conclude on its correctness in details.
--
Michael

Attachment

RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com]
> > postmaster deletes temporary relation files at startup by calling
> > RemovePgTempFiles() regardless of whether it's in recovery.  It
> > doesn't call that function during auto restart after a crash when
> > restart_after_crash is on.
> 
> The comment on top of RemovePgTempFiles() states the following:
>  * NOTE: we could, but don't, call this during a post-backend-crash restart
>  * cycle.  The argument for not doing it is that someone might want to
> examine
>  * the temp files for debugging purposes.  This does however mean that
>  * OpenTemporaryFile had better allow for collision with an existing temp
>  * file name.

Yes, I saw that comment.  postmaster keeps orphaned temp relation files only after a crash when restart_after_crash is
on.


> Nice to hear that.  Please note that I did not check your patch, so I cannot
> conclude on its correctness in details.

I thought so.  Don't mind.

Regards
Takayuki Tsunakawa




Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Tue, Feb 6, 2018 at 6:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I am not sure that we would like to give up that easily the property
> that we have now to clean up past temporary files only at postmaster
> startup and only when not in recovery.  If you implement that, there is
> a risk that the backend you are starting is eating the connection slot
> and by consequence its temporary schema and its set of temporary tables
> on which one may want to look into after a crash.

You can't actually do that from SQL, because as soon as you try to
access your pg_temp schema, it'll be cleared out.  All this changes
does is move that forward from time-of-first-access to session start.
That's a significant change that needs discussion, but I don't think
it has the effect you are supposing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Masahiko Sawada
Date:
On Sat, Feb 3, 2018 at 1:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Thank you for suggestion. It sounds more smarter. So it would be more
>> better if we vacuums database for anti-wraparound in ascending order
>> of relfrozenxid?
>
> Currently, we're doing it based on datfrozenxid.  I was looking for a
> small, relatively safe change to improve things, which led to my
> proposal: continue using datfrozenxid when the database isn't being
> vacuumed, but when it is, substitute the datfrozenxid that we expect
> the database *will have* after the tables currently being vacuumed are
> finished for the actual datfrozenxid.
>
> On further reflection, though, I see problems.  Suppose db1 has
> age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m
> and 400m.  db2 has age(datfrozenxid) = 500m.  Then suppose #1 starts
> vacuuming the table with age 600m.  The AV launcher sees that, with
> that table out of the way, we'll be able to bring datfrozenxid forward
> to 400m and so sends worker #2 to db2.  But actually, when the worker
> in db1 finishes vacuuming the table with age 600m, it's going to have
> to vacuum the table with age 400m before performing any relfrozenxid
> update.  So actually, the table with age 400m is holding the
> relfrozenxid at 600m just as much as if it too had a relfrozenxid of
> 600m.  In fact, any tables in the same database that need routine
> vacuuming are a problem, too: we're going to vacuum all of those
> before updating relfrozenxid, too.
>
> Maybe we should start by making the scheduling algorithm used by the
> individual workers smarter:
>
> 1. Let's teach do_autovacuum() that, when there are any tables needing
> vacuum for wraparound, either for relfrozenxid or relminmxid, it
> should vacuum only those and forget about the rest.  This seems like
> an obvious optimization to prevent us from delaying
> datfrozenxid/datminmxid updates for the sake of vacuuming tables that
> are "merely" bloated.
>
> 2. Let's have do_autovacuum() sort the tables to be vacuumed in
> ascending order by relfrozenxid, so older tables are vacuumed first.
> A zero-order idea is to put tables needing relfrozenxid vacuuming
> before tables needing relminmxid vacuuming, and sort the latter by
> ascending relminmxid, but maybe there's something smarter possible
> there.  The idea here is to vacuum tables in order of priority rather
> than in whatever order they happen to be physically mentioned in
> pg_class.
>
> 3. Let's have do_autovacuum() make an extra call to
> vac_update_datfrozenxid() whenever the next table to be vacuumed is at
> least 10 million XIDs (or MXIDs) newer than the first one it vacuumed
> either since the last call to vac_update_datfrozenxid() or, if none,
> since it started.  That way, we don't have to wait for every single
> table to get vacuumed before we can consider advancing
> relfrozenxid/relminmxid.
>
> 4. When it's performing vacuuming for wraparound, let's have AV
> workers advertise in shared memory the oldest relfrozenxid and
> relminmxid that it might exist in the database.  Given #1 and #2, this
> is pretty easy, since we start by moving through tables in increasing
> relfrozenxid order and then shift to moving through them in increasing
> relminmxid order.  When we're working through the relfrozenxid tables,
> the oldest relminmxid doesn't move, and the oldest relfrozenxid is
> that of the next table in the list.  When we're working through the
> relminmxid tables, it's the reverse.  We need a little cleverness to
> figure out what value to advertise when we're on the last table in
> each list -- it should be the next-higher value, even though that will
> be above the relevant threshold, not a sentinel value.

So I think we should includes tables as well that are not at risk of
wraparound in order to get the next-higher value (that is, the oldest
table of the non-risked tables) instead of forgetting them. And then
we skip vacuuming them.

>
> 5. With those steps in place, I think we can now adopt my previous
> idea to have the AV launcher use any advertised relfrozenxid (and, as
> I now realize, relminmxid) instead of the ones found in pg_database,
> because now we know that individual workers are definitely focused on
> getting relfrozenxid (and/or relminmxid) as soon as possible, and
> vacuuming unrelated tables won't help them do it any faster.
>
> This gets us fairly close to vacuuming tables in decreasing order of
> wraparound danger across the entire cluster.  It's not perfect.  It
> prefers to keep vacuuming tables in the same database rather than
> having a worker exit and maybe launching a new one in a different
> database -- but the alternative is not very appealing.  If we didn't
> do it that way, and if we had a million tables with XIDs that were
> closely spaced spread across different databases, we'd have to
> terminate and relaunching workers at a very high rate to get
> everything sorted out, which would be inefficient and annoying to
> program.  Also, it keeps the existing hard prioritization of
> relfrozenxid over relminmxid, which could theoretically be wrong for
> some installation.  But I think that might not be a big problem in
> practice, and it seems like that could be separately improved at
> another time.
>

I think this algorithm works fine and improves the current behavior.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> This is really two separate changes:

> 1. Teach backends to remove any leftover pg_temp_%d schema on startup.

> 2. Teach autovacuum to remove any leftover tables in a pg_temp_%d
> schema if the backend is active but in some other database (rather
> than only when the backend is not active at all).

> Both of those changes seem broadly reasonable to me, but what do other
> people think?

I just read through this thread for the first time; sorry for not
paying attention sooner.

I'm uncomfortable with all the discussion of changing the autovacuum
launcher's algorithm; all of that seems complicated and unsure of making
anyone's life better.  It definitely does nothing for the problem
originally stated, that autovacuum is skipping an orphaned temp table
altogether.  That's not relevant to the actually proposed patch;
I'm just saying that I'm not sure anything useful would come of that.

Now as for the problem originally stated, step 1 alone doesn't fix it,
and there's reason not to like that change much.  Forcing backends to
clear their temp schemas immediately on connection will slow down
connection times, and for applications that never make any temp tables,
that's just a dead loss (though admittedly it might not be too expensive
in that case).  But the problem here is that it doesn't fix the issue.
The reason that a temp table might stay orphaned for a long time, if we're
speaking of an app that does use temp tables, is that it's in a very
high-numbered temp schema and only once in a blue moon does the database
have enough connections for that temp schema to be used at all.  Forcing
on-connection cleaning doesn't fix that.

So the correct fix is to improve autovacuum's check to discover whether
an old temp table is orphaned, so that it isn't fooled by putative
owner processes that are connected to some other DB.  Step 2 of the
proposed patch tries to do that, but it's only half right: we need a
change near line 2264 not only line 2090.  (Evidently, we need either
a comment that the logic is repeated, or else refactor so that there's
only one copy.)

Now, you can argue that autovacuum's check can be fooled by an "owner"
backend that is connected to the current DB but hasn't actually taken
possession of its assigned temp schema (and hence the table in question
really is orphaned after all).  This edge case could be taken care of by
having backends clear their temp schema immediately, as in step 1 of the
patch.  But I still think that that is an expensive way to catch what
would be a really infrequent case.  Perhaps a better idea is to have
backends advertise in PGPROC whether they have taken possession of their
assigned temp schema, and then autovacuum could ignore putative owners
that aren't showing that flag.  Or we could just do nothing about that,
on the grounds that nobody has shown the case to be worth worrying about.
Temp schemas that are sufficiently low-numbered to be likely to have
an apparent owner are also likely to get cleaned out by actual temp
table creation reasonably often, so I'm not at all convinced that we
need a mechanism that covers this case.  We do need something for the
cross-DB case though, because even a very low-numbered temp schema
can be problematic if it's in a seldom-used DB, which seems to be the
case that was complained of originally.

On the whole, my vote is to fix and apply step 2, and leave it at that.

            regards, tom lane


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> I just read through this thread for the first time; sorry for not paying
> attention sooner.

Don't mind, please.  It's very happy that you gave attention now.


> I'm uncomfortable with all the discussion of changing the autovacuum
> launcher's algorithm; all of that seems complicated and unsure of making
> anyone's life better.  It definitely does nothing for the problem
> originally stated, that autovacuum is skipping an orphaned temp table
> altogether.  That's not relevant to the actually proposed patch; I'm just
> saying that I'm not sure anything useful would come of that.

Yes.  Sawada-san is addressing the autovacuum launcher algorithm in another thread.




> The reason that a temp table might stay orphaned for a long time, if we're
> speaking of an app that does use temp tables, is that it's in a very
> high-numbered temp schema and only once in a blue moon does the database
> have enough connections for that temp schema to be used at all.  Forcing
> on-connection cleaning doesn't fix that.

Uh, you're right.


> So the correct fix is to improve autovacuum's check to discover whether
> an old temp table is orphaned, so that it isn't fooled by putative owner
> processes that are connected to some other DB.  Step 2 of the proposed patch
> tries to do that, but it's only half right: we need a change near line 2264
> not only line 2090.  (Evidently, we need either a comment that the logic

> is repeated, or else refactor so that there's only one copy.)

I see...


> Now, you can argue that autovacuum's check can be fooled by an "owner"
> backend that is connected to the current DB but hasn't actually taken
> possession of its assigned temp schema (and hence the table in question
> really is orphaned after all).  This edge case could be taken care of by
> having backends clear their temp schema immediately, as in step 1 of the
> patch.  But I still think that that is an expensive way to catch what would
> be a really infrequent case.  Perhaps a better idea is to have backends
> advertise in PGPROC whether they have taken possession of their assigned
> temp schema, and then autovacuum could ignore putative owners that aren't
> showing that flag.

That autovacuum-driven approach sounds interesting.  But it seems to require some new locking to prevent a race
conditionbetween the autovacuum launcher and the backend:  autovacuum launcher thinks that a temp schema is not owned
byanyone, then a backend beings to use that temp schema, and autovacuum launcher deletes temp tables in that schema.
 



> Or we could just do nothing about that, on the grounds
> that nobody has shown the case to be worth worrying about.
> Temp schemas that are sufficiently low-numbered to be likely to have an
> apparent owner are also likely to get cleaned out by actual temp table
> creation reasonably often, so I'm not at all convinced that we need a
> mechanism that covers this case.  We do need something for the cross-DB
> case though, because even a very low-numbered temp schema can be problematic
> if it's in a seldom-used DB, which seems to be the case that was complained
> of originally.

Yes, the original problem was the low-numbered temp schema (pg_temp_3) which pg_rewind had used.  So we want to rescue
thiscase.
 


> On the whole, my vote is to fix and apply step 2, and leave it at that.

Regards
Takayuki Tsunakawa




Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Tue, Mar 6, 2018 at 6:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now as for the problem originally stated, step 1 alone doesn't fix it,
> and there's reason not to like that change much.  Forcing backends to
> clear their temp schemas immediately on connection will slow down
> connection times, and for applications that never make any temp tables,
> that's just a dead loss (though admittedly it might not be too expensive
> in that case).

I think that's a little short-sighted.  I think we really want temp
tables of no-longer-running backends to go away as soon as possible;
that should be viewed as a gain in and of itself.  One, it saves disk
space.  Two, it prevents them from causing wraparound problems.  I
believe we've had people complain about both, but definitely the
latter.

> Now, you can argue that autovacuum's check can be fooled by an "owner"
> backend that is connected to the current DB but hasn't actually taken
> possession of its assigned temp schema (and hence the table in question
> really is orphaned after all).  This edge case could be taken care of by
> having backends clear their temp schema immediately, as in step 1 of the
> patch.  But I still think that that is an expensive way to catch what
> would be a really infrequent case.

I think we should try to do something about this case -- if not now,
then later.  I agree that it would be better if we could get
autovacuum to do it instead of doing it in the foreground.  I don't
really share your concern about performance; one extra syscache lookup
at backend startup isn't going to break the bank.  Rather, I'm
concerned about reliability.  As I said upthread:

"So it would be really bad if you had catalog corruption
preventing the removal of pg_temp_2, because then every time you
connect, it will try to remove that schema, fail, and disconnect you."

Now granted your database *shouldn't* have catalog corruption, but a
lot of things that shouldn't happen sometimes do, and it's better when
those problem don't cause cascading failures.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 6, 2018 at 6:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now as for the problem originally stated, step 1 alone doesn't fix it,
>> and there's reason not to like that change much.  Forcing backends to
>> clear their temp schemas immediately on connection will slow down
>> connection times, and for applications that never make any temp tables,
>> that's just a dead loss (though admittedly it might not be too expensive
>> in that case).

> I think that's a little short-sighted.  I think we really want temp
> tables of no-longer-running backends to go away as soon as possible;
> that should be viewed as a gain in and of itself.

We already have autovacuum taking care of that, and as I stated, asking
backends to do it is provably insufficient.  The right path is to make
autovacuum cover the cases it's missing today.

> I don't
> really share your concern about performance; one extra syscache lookup
> at backend startup isn't going to break the bank.

If it were just that, I wouldn't be worried either.  But it's not.
Once the pg_namespace row exists, which it will in an installation
that's been in use for for any length of time, you're going to have
to actively search for entries in that schema.  I'm not sure how
expensive a performDeletion() scan that finds nothing to do really
is, but for sure it's more than the syscache lookup you expended to
find the pg_namespace row.

It's perhaps also worth reminding people that this whole discussion
pertains only to crash recovery; if the previous owner of the temp
schema had exited cleanly, it would've cleaned things up.  I'm unexcited
about adding overhead for crash recovery to the normal connection
code path when it's not necessary, and even more so when it's neither
necessary nor sufficient.

            regards, tom lane


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Robert Haas
Date:
On Thu, Mar 8, 2018 at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We already have autovacuum taking care of that, and as I stated, asking
> backends to do it is provably insufficient.  The right path is to make
> autovacuum cover the cases it's missing today.

Well, your counter-proposal of teaching autovacuum to recognize the
case where the backend and the orphaned schema are in different
databases is also provably insufficient.  We have to make more than
one change here to really fix this, and the two changes that
Tsunakawa-san proposed, taken together, are sufficient; either one by
itself is not.  Doing only #2 out of his proposals, as you proposed,
is better than doing nothing, but it's not a complete fix.

>> I don't
>> really share your concern about performance; one extra syscache lookup
>> at backend startup isn't going to break the bank.
>
> If it were just that, I wouldn't be worried either.  But it's not.
> Once the pg_namespace row exists, which it will in an installation
> that's been in use for for any length of time, you're going to have
> to actively search for entries in that schema.  I'm not sure how
> expensive a performDeletion() scan that finds nothing to do really
> is, but for sure it's more than the syscache lookup you expended to
> find the pg_namespace row.

True, but that's a rare scenario.

Still, I'm not sure we're too far apart on the underlying issue here;
we both would prefer, for one reason or another, if autovacuum could
just take care of this.  But I think that Tsunakawa-san is correct to
guess that your proposal might have some race conditions that need to
be solved somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> So the correct fix is to improve autovacuum's check to discover whether
> an old temp table is orphaned, so that it isn't fooled by putative owner
> processes that are connected to some other DB.  Step 2 of the proposed patch
> tries to do that, but it's only half right: we need a change near line 2264
> not only line 2090.  (Evidently, we need either a comment that the logic
> is repeated, or else refactor so that there's only one copy.)
> 
> Now, you can argue that autovacuum's check can be fooled by an "owner"
> backend that is connected to the current DB but hasn't actually taken
> possession of its assigned temp schema (and hence the table in question
> really is orphaned after all).  This edge case could be taken care of by
> having backends clear their temp schema immediately, as in step 1 of the
> patch.  But I still think that that is an expensive way to catch what would
> be a really infrequent case.  Perhaps a better idea is to have backends
> advertise in PGPROC whether they have taken possession of their assigned
> temp schema, and then autovacuum could ignore putative owners that aren't
> showing that flag.  Or we could just do nothing about that, on the grounds
> that nobody has shown the case to be worth worrying about.
> Temp schemas that are sufficiently low-numbered to be likely to have an
> apparent owner are also likely to get cleaned out by actual temp table
> creation reasonably often, so I'm not at all convinced that we need a
> mechanism that covers this case.  We do need something for the cross-DB
> case though, because even a very low-numbered temp schema can be problematic
> if it's in a seldom-used DB, which seems to be the case that was complained
> of originally.
> 
> On the whole, my vote is to fix and apply step 2, and leave it at that.

Done.  It seems to work well.

Regards
Takayuki Tsunakawa


Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Tue, Mar 13, 2018 at 08:08:48AM +0000, Tsunakawa, Takayuki wrote:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> On the whole, my vote is to fix and apply step 2, and leave it at that.

Yeah, I have been thinking about the idea 1 mentioned above, or in short
clean up the temporary namespace at connection start instead of
first-use of it, and while that would make the cleanup more aggressive,
it could be possible as well that only having autovacuum do the work
could be enough, so I am +-0 on this idea.

> Done.  It seems to work well.

I have looked at the patch proposed.

+   /* Does the backend own the temp schema? */
+   if (proc->tempNamespaceId != namespaceID)
+       return false;
I have a very hard time believing that this is safe lock-less, and a
spin lock would be enough it seems.

+   /* Is the backend connected to this database? */
+   if (proc->databaseId != MyDatabaseId)
+       return false;
Wouldn't it be more interesting to do this cleanup as well if the
backend is *not* connected to the database autovacuum'ed?  This would
make the cleanup more aggresive, which is better.
--
Michael

Attachment

RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> +   /* Does the backend own the temp schema? */
> +   if (proc->tempNamespaceId != namespaceID)
> +       return false;
> I have a very hard time believing that this is safe lock-less, and a spin
> lock would be enough it seems.

The lwlock in BackendIdGetProc() flushes the CPU cache so that proc->tempNamespaceId reflects the latest value.  Or, do
youmean another spinlock elsewhere?
 


> +   /* Is the backend connected to this database? */
> +   if (proc->databaseId != MyDatabaseId)
> +       return false;
> Wouldn't it be more interesting to do this cleanup as well if the backend
> is *not* connected to the database autovacuum'ed?  This would make the
> cleanup more aggresive, which is better.

IIUC, the above code does what you suggest.  proc->databaseId is the database the client is connected to, and
MyDatabaseIdis the database being autovacuumed (by this autovacuum worker.)
 


Regards
Takayuki Tsunakawa





Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Kyotaro HORIGUCHI
Date:
Hello.

At Wed, 18 Jul 2018 07:34:10 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in
<0A3221C70F24FB45833433255569204D1FA538FD@G01JPEXMBYT05>
> From: Michael Paquier [mailto:michael@paquier.xyz]
> > +   /* Does the backend own the temp schema? */
> > +   if (proc->tempNamespaceId != namespaceID)
> > +       return false;
> > I have a very hard time believing that this is safe lock-less, and a spin
> > lock would be enough it seems.
> 
> The lwlock in BackendIdGetProc() flushes the CPU cache so that proc->tempNamespaceId reflects the latest value.  Or,
doyou mean another spinlock elsewhere?
 

It seems to be allowed that the series of checks on *proc results
in false-positive, which is the safer side for the usage, even it
is not atomically updated. Actually ->databaseId is written
without taking a lock.

> > +   /* Is the backend connected to this database? */
> > +   if (proc->databaseId != MyDatabaseId)
> > +       return false;
> > Wouldn't it be more interesting to do this cleanup as well if the backend
> > is *not* connected to the database autovacuum'ed?  This would make the
> > cleanup more aggresive, which is better.
> 
> IIUC, the above code does what you suggest.  proc->databaseId is the database the client is connected to, and
MyDatabaseIdis the database being autovacuumed (by this autovacuum worker.)
 

It seems that you're right. But maybe the comments in
backned_uses_temp_namespace should be written invsered. And the
comment on the last condition needs to be more detailed. For
example:

|  /* Not used if the backend is not on connection */
|  if (proc == NULL)
|  ..
|  /* Not used if the backend is on another database */
|  if (proc->databaseId !=  MyDatabaseID)
|  ..
|  /* 
|   * Not used if this backend has no temp namespace or one with
|   * different OID.  However we could reuse the namespce OID,
|   * tables in the old teblespace have been cleaned up at the
|   * creation time.  See InitTempTableNamespace.
|   */
| if (proc->tempNamespaceId != namespaceId)
...(is it correct?)

backend_uses_temp_namespace is taking two parameters but the
first one is always derived from the second. backendID doesn't
seem to be needed outside so just one parameter namespaceID is
needed.

There may be a case where classForm->relnamespace is not found in
catalog or nspname is corrupt, but the namespace is identified as
inactive and orphaned tables are cleaned up in the case so we
don't need to inform that to users..


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Thu, Jul 26, 2018 at 07:05:11PM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 18 Jul 2018 07:34:10 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in
<0A3221C70F24FB45833433255569204D1FA538FD@G01JPEXMBYT05>
>> From: Michael Paquier [mailto:michael@paquier.xyz]
>>> +   /* Does the backend own the temp schema? */
>>> +   if (proc->tempNamespaceId != namespaceID)
>>> +       return false;
>>> I have a very hard time believing that this is safe lock-less, and a spin
>>> lock would be enough it seems.
>>
>> The lwlock in BackendIdGetProc() flushes the CPU cache so that
>> proc->tempNamespaceId reflects the latest value.  Or, do you mean
>> another spinlock elsewhere?
>
> It seems to be allowed that the series of checks on *proc results
> in false-positive, which is the safer side for the usage, even it
> is not atomically updated. Actually ->databaseId is written
> without taking a lock.

Well, from postinit.c there are a couple of assumptions in this case, so
it is neither white nor black:
/*
 * Now we can mark our PGPROC entry with the database ID.
 *
 * We assume this is an atomic store so no lock is needed; though actually
 * things would work fine even if it weren't atomic.  Anyone searching the
 * ProcArray for this database's ID should hold the database lock, so they
 * would not be executing concurrently with this store.  A process looking
 * for another database's ID could in theory see a chance match if it read
 * a partially-updated databaseId value; but as long as all such searches
 * wait and retry, as in CountOtherDBBackends(), they will certainly see
 * the correct value on their next try.
 */
MyProc->databaseId = MyDatabaseId;

Anyway, I have spent some time on this patch, and the thing is doing a
rather bad job about why it is fine to assume that the update can happen
lock-less, and the central part of it seems to be that autovacuum cannot
see the temporary schema created, and any objects on it when it scans
pg_class until it gets committed, and the tracking field is updated in
PGPROC before the commit.

> backend_uses_temp_namespace is taking two parameters but the
> first one is always derived from the second. backendID doesn't
> seem to be needed outside so just one parameter namespaceID is
> needed.

Yes, that reduces the number of calls to GetTempNamespaceBackendId.
autovacuum.c is a pretty bad place for stuff as namespace.c holds all
the logic related to temporary tablespaces, so I renamed the routine to
isTempNamespaceInUse and moved it there.

The patch I have now is attached.  I have not been able to test it much,
particularly with orphaned temp tables and autovacuum, which is
something I'll try to look at this week end or perhaps at the beginning
of next week, heading toward a commit if I am fine enough with it.
Please feel free to look at it for the time being.

Thanks,
--
Michael

Attachment

RE: Temporary tables prevent autovacuum, leading to XID wraparound

From
"Tsunakawa, Takayuki"
Date:
Thank you, Michael and Horiguchi-san,

From: Michael Paquier [mailto:michael@paquier.xyz]
> autovacuum.c is a pretty bad place for stuff as namespace.c holds all the
> logic related to temporary tablespaces, so I renamed the routine to
> isTempNamespaceInUse and moved it there.

I don't have a strong opinion, but I wonder which of namespace.c or autovacuum.c is suitable, because
isTempNamespaceInUseis specific to autovacuum.
 


Regards
Takayuki Tsunakawa
 





Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Fri, Jul 27, 2018 at 08:27:26AM +0000, Tsunakawa, Takayuki wrote:
> I don't have a strong opinion, but I wonder which of namespace.c or
> autovacuum.c is suitable, because isTempNamespaceInUse is specific to
> autovacuum.

I think that there is also a point in allowing other backends to use it
as well, so I left it in namespace.c.  I have been doing more testing
with this patch today.  In order to catch code paths where this is
triggered I have for example added an infinite loop in do_autovacuum
when finding a temp table which exits once a given on-disk file is
found.  This lets plenty of time to attach autovacuum to a debugger, and
play with other sessions in parallel, so I have checked the
transactional assumption this patch relied on, and tested down to v10 as
that's where removal of orphaned temp relations has been made more
aggressive. This patch can also be applied cleanly there.

The check on MyDatabaseId does not actually matter much as the same
temporary namespace OID would get reused only after an OID wraparound
with a backend using a different database but I left it anyway as that's
more consistent to me to check for database and namespace, and added a
sanity check to make sure that the routine gets called only by a process
connected to a database.

I am going to be out of town for a couple of days, and the next minor
release planned is close by, so I am not pushing that today, but I'd
like to do so after the next minor release if there are no objections.
Attached is a patch with a proposal of commit message.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Kyotaro HORIGUCHI
Date:
At Mon, 30 Jul 2018 16:59:16 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180730075916.GB2878@paquier.xyz>
> On Fri, Jul 27, 2018 at 08:27:26AM +0000, Tsunakawa, Takayuki wrote:
> > I don't have a strong opinion, but I wonder which of namespace.c or
> > autovacuum.c is suitable, because isTempNamespaceInUse is specific to
> > autovacuum.
> 
> I think that there is also a point in allowing other backends to use it
> as well, so I left it in namespace.c.  I have been doing more testing
> with this patch today.  In order to catch code paths where this is

+1 for namespace.c from me.

> triggered I have for example added an infinite loop in do_autovacuum
> when finding a temp table which exits once a given on-disk file is
> found.  This lets plenty of time to attach autovacuum to a debugger, and
> play with other sessions in parallel, so I have checked the
> transactional assumption this patch relied on, and tested down to v10 as
> that's where removal of orphaned temp relations has been made more
> aggressive. This patch can also be applied cleanly there.
> 
> The check on MyDatabaseId does not actually matter much as the same
> temporary namespace OID would get reused only after an OID wraparound
> with a backend using a different database but I left it anyway as that's
> more consistent to me to check for database and namespace, and added a
> sanity check to make sure that the routine gets called only by a process
> connected to a database.
> 
> I am going to be out of town for a couple of days, and the next minor
> release planned is close by, so I am not pushing that today, but I'd
> like to do so after the next minor release if there are no objections.
> Attached is a patch with a proposal of commit message.

"int backendID" is left in autovacuum.c as an unused variable.

+  * decide if a temporary file entry can be marked as orphaned or not. Even
+  * if this is an atomic operation, this can be safely done lock-less as no

"Even if this is *not* an atomic operation" ?


+  * Mark the proc entry as owning this namespace which autovacuum uses to
+  * decide if a temporary file entry can be marked as orphaned or not. Even
+  * if this is an atomic operation, this can be safely done lock-less as no
+  * temporary relations associated to it can be seen yet by autovacuum when
+  * scanning pg_class.
+  */
+ MyProc->tempNamespaceId = namespaceId;

The comment looks wrong. After a crash having temp tables,
pg_class has orphaned temp relations and the namespace can be of
reconnected backends especially for low-numbered backends. So
autovacuum may look the shared variable while the reconnected
backend is writing it. The only harmful misbehavior is false
"unused" judgement. This happens only when reusing the same
namespace ID but the temp namespace is already cleaned up by
InitTempTableNamespace in the case. In the case of false "used"
judgement, the later autovacuum rounds will do the clean up
correctly.

In short, I think it is safely done lock-less but the reason is
not no concurrent reads but transient reads do not harm
(currently..).

| * Mark the proc entry as owning this namespace which autovacuum uses to
| * decide if a temporary file entry can be marked as orphaned or not. Even
| * if this is not an atomic operation, this can be safely done lock-less as
| * a transient read doesn't let autovacuum go wrong.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Tue, Jul 31, 2018 at 01:39:07PM +0900, Kyotaro HORIGUCHI wrote:
> "int backendID" is left in autovacuum.c as an unused variable.

Fixed.

> "Even if this is *not* an atomic operation" ?

Yeah, I am reworking this comment a bit more to map with the other
PGPROC fields.

> +  * Mark the proc entry as owning this namespace which autovacuum uses to
> +  * decide if a temporary file entry can be marked as orphaned or not. Even
> +  * if this is an atomic operation, this can be safely done lock-less as no
> +  * temporary relations associated to it can be seen yet by autovacuum when
> +  * scanning pg_class.
> +  */
> + MyProc->tempNamespaceId = namespaceId;
>
> The comment looks wrong. After a crash having temp tables,
> pg_class has orphaned temp relations and the namespace can be of
> reconnected backends especially for low-numbered backends

I don't quite understand your comment.  InitProcess() initializes the
field, so if a backend reconnects and uses the same proc slot as a
backend which had a temporary namespace, then you would discard orphaned
tables.  Anyway, I have reworked it as such:
+   /*
+    * Mark MyProc as owning this namespace which other processes can use to
+    * decide if a temporary namespace is in use or not.  We assume that
+    * assignment of namespaceId is an atomic operation.  Even if it is not,
+    * there is as no temporary relations associated to it and the temporary
+    * namespace creation is not committed yet, so none of its contents can
+    * be seen yet if scanning pg_class or pg_namespace.
+    */

As new minor versions have been tagged, I am planning to commit this
patch soon with some tweaks for the comments.  As this introduces a new
field to PGPROC, so back-patching the thing as-is would cause an ABI
breakage.  Are folks here fine with the new field added to the bottom of
the structure for the backpatched versions, including v11?  I have found
about commit 13752743 which has also added a new field called
isBackgroundWorker in the middle of PGPROC in a released branch, which
looks to me like an ABI breakage...
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Andres Freund
Date:
Hi,

On 2018-07-30 16:59:16 +0900, Michael Paquier wrote:
> +    /*
> +     * Mark MyProc as owning this namespace which other processes can use to
> +     * decide if a temporary namespace is in use or not. Even if this is an
> +     * atomic operation, this can be safely done lock-less as no temporary
> +     * relations associated to it can be seen yet if scanning pg_class.
> +     */
> +    MyProc->tempNamespaceId = namespaceId;

I can't parse this. "Even if this is an    atomic operation, this can be
safely done lock-less" - that seems like a contradictory sentence. Is
there a "not" missing?

Also, this seems like insufficient reasoning. What guarantees the
visibility of the flag? You're going to have to talk about externally
implied memory ordering here.  Or add explicit barriers - the latter is
probably preferrable.

Greetings,

Andres Freund


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Alvaro Herrera
Date:
On 2018-Aug-08, Michael Paquier wrote:

> As this introduces a new
> field to PGPROC, so back-patching the thing as-is would cause an ABI
> breakage.  Are folks here fine with the new field added to the bottom of
> the structure for the backpatched versions, including v11?  I have found
> about commit 13752743 which has also added a new field called
> isBackgroundWorker in the middle of PGPROC in a released branch, which
> looks to me like an ABI breakage...

Unnoticed ABI breaks make my hair stand on end. 

I suppose if we didn't know about 13752743 earlier, then not much
outside code relies on PGPROC, or at least its members after
isBackgroundWorker.  I wouldn't move it now (I suppose anyone who cared
has already adjusted for it), but please do put your new member at the
end in backbranches.

I'm unsure about pg11 -- is it a backbranch already or not?  Since we've
released beta3 already, ISTM we should consider it so.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Unnoticed ABI breaks make my hair stand on end. 

Yeah.

> I'm unsure about pg11 -- is it a backbranch already or not?  Since we've
> released beta3 already, ISTM we should consider it so.

No.  See
https://www.postgresql.org/message-id/12725.1533737052%40sss.pgh.pa.us

At this point I'd judge it's still more valuable to minimize differences
between v11 and v12 than to avoid ABI breakages in v11.  It's unlikely
this would be the last ABI break before 11.0 anyway.

            regards, tom lane


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Wed, Aug 08, 2018 at 10:50:34AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I'm unsure about pg11 -- is it a backbranch already or not?  Since we've
>> released beta3 already, ISTM we should consider it so.
>
> No.  See
> https://www.postgresql.org/message-id/12725.1533737052%40sss.pgh.pa.us
>
> At this point I'd judge it's still more valuable to minimize differences
> between v11 and v12 than to avoid ABI breakages in v11.  It's unlikely
> this would be the last ABI break before 11.0 anyway.

Thanks!  Based on that I will get this patch into HEAD and 11 pretty
soon.  v10 is definitely out of scope, even if it would have been
interesting to make autovacuum more aggressive there as well, as that's
where the last improvements have been done.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
Hi Andres,

(Not my intention to miss your message, I have just noticed it.)

On Wed, Aug 08, 2018 at 01:41:27AM -0700, Andres Freund wrote:
> I can't parse this. "Even if this is an atomic operation, this can be
> safely done lock-less" - that seems like a contradictory sentence. Is
> there a "not" missing?

Yes, a "not" has gone missing here.  I reworked the comment block as
mentioned upthread.

> Also, this seems like insufficient reasoning. What guarantees the
> visibility of the flag? You're going to have to talk about externally
> implied memory ordering here.  Or add explicit barriers - the latter is
> probably preferrable.

Well, we use BackendIdGetProc() in this case, where we could finish with
information out-of-date pretty quickly, and there is no special
reasoning for backendId and databaseId for autovacuum but...  Perhaps
you could explain more what you have in mind?  And it is not like this
relies on the number of elements in PGPROC.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Andres Freund
Date:

On August 9, 2018 12:41:17 AM GMT+05:30, Michael Paquier <michael@paquier.xyz> wrote:
>Hi Andres,
>
>(Not my intention to miss your message, I have just noticed it.)
>
>On Wed, Aug 08, 2018 at 01:41:27AM -0700, Andres Freund wrote:
>> I can't parse this. "Even if this is an atomic operation, this can be
>> safely done lock-less" - that seems like a contradictory sentence. Is
>> there a "not" missing?
>
>Yes, a "not" has gone missing here.  I reworked the comment block as
>mentioned upthread.
>
>> Also, this seems like insufficient reasoning. What guarantees the
>> visibility of the flag? You're going to have to talk about externally
>> implied memory ordering here.  Or add explicit barriers - the latter
>is
>> probably preferrable.
>
>Well, we use BackendIdGetProc() in this case, where we could finish
>with
>information out-of-date pretty quickly, and there is no special
>reasoning for backendId and databaseId for autovacuum but...  Perhaps
>you could explain more what you have in mind?  And it is not like this
>relies on the number of elements in PGPROC.

My point is that the documentation isn't sufficient. Not that there's an active problem.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Thu, Aug 09, 2018 at 08:32:32AM +0530, Andres Freund wrote:
> My point is that the documentation isn't sufficient. Not that there's an active problem.

OK.  Attached is the latest version if the patch I have that I was
preparing for commit.

On top of isTempNamespaceInUse I have this note:
+ * Note: this can be used while scanning relations in pg_class to detect
+ * orphaned temporary tables or namespaces with a backend connected to a
+ * given database.

Would you be fine if I add an extra note like what's in
BackendIdGetProc?  Say "The result may be out of date quickly, so the
caller must be careful how to handle this information."
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Andres Freund
Date:
On 2018-08-09 09:00:29 +0200, Michael Paquier wrote:
> On Thu, Aug 09, 2018 at 08:32:32AM +0530, Andres Freund wrote:
> > My point is that the documentation isn't sufficient. Not that there's an active problem.
> 
> OK.  Attached is the latest version if the patch I have that I was
> preparing for commit.
> 
> On top of isTempNamespaceInUse I have this note:
> + * Note: this can be used while scanning relations in pg_class to detect
> + * orphaned temporary tables or namespaces with a backend connected to a
> + * given database.
> 
> Would you be fine if I add an extra note like what's in
> BackendIdGetProc?  Say "The result may be out of date quickly, so the
> caller must be careful how to handle this information."

That's a caveat, not a documentation of the memory ordering /
concurrency. You somewhere need a comment to the effect of "We are
guaranteed to see a recent enough value of ->tempNamespace because
anybody creating a temporary table acquires a lock - serving as a memory
barrier - during the creation of said table, after assigning the
tempNamespace variable. At the same time, before considering dropping a
temporary table as orphaned, we acquire a lock and recheck tempNamespace
afterwards".  Note that I'm not explaining the concrete model you have
here, but the way you could explain a theoretical one.

- Andres


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Thu, Aug 09, 2018 at 02:29:54AM -0700, Andres Freund wrote:
> On 2018-08-09 09:00:29 +0200, Michael Paquier wrote:
>> Would you be fine if I add an extra note like what's in
>> BackendIdGetProc?  Say "The result may be out of date quickly, so the
>> caller must be careful how to handle this information."
>
> That's a caveat, not a documentation of the memory ordering /
> concurrency.

I think that I am going to add this note anyway in the new routine.

> You somewhere need a comment to the effect of "We are
> guaranteed to see a recent enough value of ->tempNamespace because
> anybody creating a temporary table acquires a lock - serving as a memory
> barrier - during the creation of said table, after assigning the
> tempNamespace variable. At the same time, before considering dropping a
> temporary table as orphaned, we acquire a lock and recheck tempNamespace
> afterwards".  Note that I'm not explaining the concrete model you have
> here, but the way you could explain a theoretical one.

Okay, here is an idea...  When assigning the value I have now that:
+   /*
+    * Mark MyProc as owning this namespace which other processes can use to
+    * decide if a temporary namespace is in use or not.  We assume that
+    * assignment of namespaceId is an atomic operation.  Even if it is not,
+    * there is no visible temporary relations associated to it and the
+    * temporary namespace creation is not committed yet, so none of its
+    * contents should be seen yet if scanning pg_class or pg_namespace.
+    */
I actually have tried to mention what you are willing to see in the
comments with the last sentence.  So that is awkward :)

I would propose to reword the last sentence of the patch as follows
then:
"Even if it is not atomic, the temporary relation which resulted in the
creation of this temporary namespace is still locked until the current
transaction commits, so it would not be accessible yet."

When resetting the value on abort I have that:
+   /*
+    * Reset the temporary namespace flag in MyProc. The creation of
+    * the temporary namespace has failed for some reason and should
+    * not be seen by other processes as it has not been committed
+    * yet, hence this would be fine even if not atomic, still we
+    * assume that it is an atomic assignment.
+    */

Hence I would propose the following wording for this part:
"Reset the temporary namespace flag in MyProc.  We assume that this
operation is atomic, however it would be fine even if not atomic as the
temporary table which created this namespace is still locked until this
transaction aborts so it would not be visible yet."

Better ideas are of course welcome.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Thu, Aug 09, 2018 at 06:50:47PM +0200, Michael Paquier wrote:
> Better ideas are of course welcome.

I have gone back-and-forth on this patch for the last couple of days,
reworded the comment blocks to outline the point Andres has been making,
and I have finally been able to push it and back-patched it down to v11
but not further down as we don't want an ABI breakage in already
released branches.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Andres Freund
Date:
On 2018-08-09 18:50:47 +0200, Michael Paquier wrote:
> On Thu, Aug 09, 2018 at 02:29:54AM -0700, Andres Freund wrote:
> +   /*
> +    * Mark MyProc as owning this namespace which other processes can use to
> +    * decide if a temporary namespace is in use or not.  We assume that
> +    * assignment of namespaceId is an atomic operation.  Even if it is not,
> +    * there is no visible temporary relations associated to it and the
> +    * temporary namespace creation is not committed yet, so none of its
> +    * contents should be seen yet if scanning pg_class or pg_namespace.
> +    */

> I actually have tried to mention what you are willing to see in the
> comments with the last sentence.  So that is awkward :)

I don't know what you're trying to say with this.

> I would propose to reword the last sentence of the patch as follows
> then:
> "Even if it is not atomic, the temporary relation which resulted in the
> creation of this temporary namespace is still locked until the current
> transaction commits, so it would not be accessible yet."
> 
> When resetting the value on abort I have that:
> +   /*
> +    * Reset the temporary namespace flag in MyProc. The creation of
> +    * the temporary namespace has failed for some reason and should
> +    * not be seen by other processes as it has not been committed
> +    * yet, hence this would be fine even if not atomic, still we
> +    * assume that it is an atomic assignment.
> +    */
> 
> Hence I would propose the following wording for this part:
> "Reset the temporary namespace flag in MyProc.  We assume that this
> operation is atomic, however it would be fine even if not atomic as the
> temporary table which created this namespace is still locked until this
> transaction aborts so it would not be visible yet."

I don't think that comment, nor the comment that you ended up
committing:
+
+           /*
+            * Reset the temporary namespace flag in MyProc.  We assume that
+            * this operation is atomic.  Even if it is not, the temporary
+            * table which created this namespace is still locked until this
+            * transaction aborts so it would not be visible yet, acting as a
+            * barrier.
+            */

is actually correct. *Holding* a lock isn't a memory barrier. Acquring
or releasing one is.

Greetings,

Andres Freund


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Alvaro Herrera
Date:
From 1e9ba9fa9b172bda1ea54b1f3be1b930973ff45f Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 8 Aug 2018 19:45:31 +0200
Subject: [PATCH] Make autovacuum more aggressive to remove orphaned temp
 tables

I was here to complain about this piece:

> @@ -3975,6 +4033,15 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
>              myTempNamespace = InvalidOid;
>              myTempToastNamespace = InvalidOid;
>              baseSearchPathValid = false;    /* need to rebuild list */
> +
> +            /*
> +             * Reset the temporary namespace flag in MyProc. The creation of
> +             * the temporary namespace has failed for some reason and should
> +             * not be seen by other processes as it has not been committed
> +             * yet, hence this would be fine even if not atomic, still we
> +             * assume that it is an atomic assignment.
> +             */
> +            MyProc->tempNamespaceId = InvalidOid;
>          }
>      }

But after looking carefully, I realized that this only runs if the
subtransaction being aborted is the same that set up the temp namespace
(or one of its children) in the first place; therefore it's correct to
tear it down unconditionally.  I was afraid of this clobbering a temp
namespace that had been set up by some other branch of the transaction
tree.  However, that's not a concern because we compare the
subTransactionId (and update the value when propagating to parent
subxact.)

I do share Andres' concerns on the wording the comment.  I would say
something like

/*
 * Reset the temporary namespace flag in MyProc.  We assume this to be
 * an atomic assignment.
 *
 * Because this subtransaction is rolling back, the pg_namespace
 * row is not visible to anyone else anyway, but that doesn't matter:
 * it's not a problem if objects contained in this namespace are removed
 * concurrently.
 */

The fact of assignment being atomic and the fact of the pg_namespace row
being visible are separately important.  You care about it being atomic
because it means you must not have someone read "16" (0x10) when you
were partway removing the value "65552" (0x10010), thus causing that
someone removing namespace 16.  And you care about the visibility of the
pg_namespace row because of whether you're worried about a third party
removing the tables from that namespace or not: since the subxact is
aborting, you are not.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Mon, Aug 13, 2018 at 02:56:16AM -0700, Andres Freund wrote:
> On 2018-08-09 18:50:47 +0200, Michael Paquier wrote:
> I don't think that comment, nor the comment that you ended up
> committing:
> +
> +           /*
> +            * Reset the temporary namespace flag in MyProc.  We assume that
> +            * this operation is atomic.  Even if it is not, the temporary
> +            * table which created this namespace is still locked until this
> +            * transaction aborts so it would not be visible yet, acting as a
> +            * barrier.
> +            */
>
> is actually correct. *Holding* a lock isn't a memory barrier. Acquring
> or releasing one is.

I cannot guess what you think, but would something like the attached be
more adapted?  Both things look rather similar to me now, likely for you
it does not.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Alvaro Herrera
Date:
If writing an OID were not atomic, the assignment would be really
dangerous.  I don't think your proposed update is good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Mon, Aug 13, 2018 at 02:15:07PM -0300, Alvaro Herrera wrote:
> If writing an OID were not atomic, the assignment would be really
> dangerous.  I don't think your proposed update is good.

OK, I am withdrawing this one then.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Mon, Aug 13, 2018 at 01:53:18PM -0300, Alvaro Herrera wrote:
> I do share Andres' concerns on the wording the comment.  I would say
> something like
>
> /*
>  * Reset the temporary namespace flag in MyProc.  We assume this to be
>  * an atomic assignment.
>  *
>  * Because this subtransaction is rolling back, the pg_namespace
>  * row is not visible to anyone else anyway, but that doesn't matter:
>  * it's not a problem if objects contained in this namespace are removed
>  * concurrently.
>  */

> The fact of assignment being atomic and the fact of the pg_namespace row
> being visible are separately important.  You care about it being atomic
> because it means you must not have someone read "16" (0x10) when you
> were partway removing the value "65552" (0x10010), thus causing that
> someone removing namespace 16.  And you care about the visibility of the
> pg_namespace row because of whether you're worried about a third party
> removing the tables from that namespace or not: since the subxact is
> aborting, you are not.

I was thinking about adding "Even if it is not atomic" or such at the
beginning of the paragraph, but at the end your phrasing sounds better
to me.  So I have hacked up the attached, which also reworks the comment
in InitTempTableNamespace in the same spirit.  Thoughts?
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Tue, Aug 14, 2018 at 06:53:32PM +0200, Michael Paquier wrote:
> I was thinking about adding "Even if it is not atomic" or such at the
> beginning of the paragraph, but at the end your phrasing sounds better
> to me.  So I have hacked up the attached, which also reworks the comment
> in InitTempTableNamespace in the same spirit.  Thoughts?

It has been close to one week since I sent this patch.  Anything?  I
don't like to keep folks unhappy about anything I committed.
--
Michael

Attachment

Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Alvaro Herrera
Date:
On 2018-Aug-20, Michael Paquier wrote:

> On Tue, Aug 14, 2018 at 06:53:32PM +0200, Michael Paquier wrote:
> > I was thinking about adding "Even if it is not atomic" or such at the
> > beginning of the paragraph, but at the end your phrasing sounds better
> > to me.  So I have hacked up the attached, which also reworks the comment
> > in InitTempTableNamespace in the same spirit.  Thoughts?
> 
> It has been close to one week since I sent this patch.  Anything?  I
> don't like to keep folks unhappy about anything I committed.

Seems reasonable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Temporary tables prevent autovacuum, leading to XID wraparound

From
Michael Paquier
Date:
On Mon, Aug 20, 2018 at 10:54:28AM -0300, Alvaro Herrera wrote:
> Seems reasonable.
Thanks Álvaro, pushed.
--
Michael

Attachment