Thread: Fun fact about autovacuum and orphan temp tables

Fun fact about autovacuum and orphan temp tables

From

Grigory Smolkin

Date:

05 September 2016, 11:54:12

<p>Hello, hackers!<p>We were testing how well some application works with PostgreSQL and stumbled upon an autovacuum
behaviorwhich I fail to understand.<br /> Application in question have a habit to heavily use temporary tables in funny
ways.<br/> For example it creates A LOT of them.<br /> Which is ok.<br /> Funny part is that it never drops them. So
whenbackend is finally terminated, it tries to drop them and fails with error:<br /><br /> FATAL:  out of shared
memory<br/> HINT:  You might need to increase max_locks_per_transaction<br /><br /> If I understand that rigth, we are
tryingto drop all these temp tables in one transaction and running out of locks to do so.<br /> After that
postgresql.logis flooded at the rate 1k/s with messages like that:<br /><br /> LOG:  autovacuum: found orphan temp
table"pg_temp_15"."tt38147" in database "DB_TEST"<br /><br /> It produces a noticeable load on the system and it`s
gettingworst with every terminated backend or restart.<br /> I did some RTFS and it appears that autovacuum has no
intentionof cleaning that orphan tables unless<br /> it`s wraparound time:<br /><br />
src/backend/postmaster/autovacuum.c<br/>              /* We just ignore it if the owning backend is still active */<br
/> 2037             if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)<br />  2038             {<br
/> 2039                 /*<br />  2040                  * We found an orphan temp table (which was probably left<br />
 2041                 * behind by a crashed backend).  If it's so old as to need<br />  2042                  * vacuum
forwraparound, forcibly drop it.  Otherwise just<br />  2043                  * log a complaint.<br />
 2044                 */<br />  2045                 if (wraparound)<br />  2046                 {<br />
 2047                    ObjectAddress object;<br />  2048 <br />  2049                     ereport(LOG,<br />
 2050                            (errmsg("autovacuum: dropping orphan temp table \"%s\".\"%s\" in database \"%s\"",<br
/> 2051                                  get_namespace_name(classForm->relnamespace),<br />
 2052                                    NameStr(classForm->relname),<br />
 2053                                    get_database_name(MyDatabaseId))));<br />  2054                    
object.classId= RelationRelationId;<br />  2055                     object.objectId = relid;<br />
 2056                    object.objectSubId = 0;<br />  2057                     performDeletion(&object,
DROP_CASCADE,PERFORM_DELETION_INTERNAL);<br />  2058                 }<br />  2059                 else<br />
 2060                {<br />  2061                     ereport(LOG,<br />  2062                            
(errmsg("autovacuum:found orphan temp table \"%s\".\"%s\" in database \"%s\"",<br />
 2063                                 get_namespace_name(classForm->relnamespace),<br />
 2064                                    NameStr(classForm->relname),<br />
 2065                                    get_database_name(MyDatabaseId))));<br />  2066                 }<br />
 2067            }<br />  2068         }<br /><br /><br /> What is more troubling is that pg_statistic is starting to
bloatbadly.<br /><br /> LOG:  automatic vacuum of table "DB_TEST.pg_catalog.pg_statistic": index scans: 0<br />        
pages:0 removed, 68225 remain, 0 skipped due to pins<br />         tuples: 0 removed, 2458382 remain, 2408081 are dead
butnot yet removable<br />         buffer usage: 146450 hits, 31 misses, 0 dirtied<br />         avg read rate: 0.010
MB/s,avg write rate: 0.000 MB/s<br />         system usage: CPU 3.27s/6.92u sec elapsed 23.87 sec<br /><br /> What is
thepurpose of keeping orphan tables around and not dropping them on the spot?<br /><br /><br /><pre
class="moz-signature"cols="72">-- 
 
Grigory Smolkin
Postgres Professional: <a class="moz-txt-link-freetext"
href="http://www.postgrespro.com">http://www.postgrespro.com</a>
The Russian Postgres Company</pre>

Re: Fun fact about autovacuum and orphan temp tables

From

Vik Fearing

Date:

05 September 2016, 13:13:23

On 09/05/2016 01:54 PM, Grigory Smolkin wrote:
> What is the purpose of keeping orphan tables around and not dropping
> them on the spot?

You can read the discussion about it here:
https://www.postgresql.org/message-id/flat/3507.1214581513%40sss.pgh.pa.us
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Fun fact about autovacuum and orphan temp tables

From

Alvaro Herrera

Date:

05 September 2016, 13:34:50

Grigory Smolkin wrote:

> Funny part is that it never drops them. So when backend is finally
> terminated, it tries to drop them and fails with error:
> 
> FATAL:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction
> 
> If I understand that rigth, we are trying to drop all these temp tables in
> one transaction and running out of locks to do so.

Hmm, yeah, I suppose it does that, and it does seem pretty inconvenient.
It is certainly pointless to hold onto these locks for temp tables.  I
wonder how ugly would be to fix this problem ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fun fact about autovacuum and orphan temp tables

From

Grigory Smolkin

Date:

05 September 2016, 14:05:06

<br /><div class="moz-cite-prefix">On 09/05/2016 04:34 PM, Alvaro Herrera wrote:<br /></div><blockquote
cite="mid:20160905133440.GA671130@alvherre.pgsql"type="cite"><pre wrap="">Grigory Smolkin wrote:
 

</pre><blockquote type="cite"><pre wrap="">Funny part is that it never drops them. So when backend is finally
terminated, it tries to drop them and fails with error:

FATAL:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction

If I understand that rigth, we are trying to drop all these temp tables in
one transaction and running out of locks to do so.
</pre></blockquote><pre wrap="">
Hmm, yeah, I suppose it does that, and it does seem pretty inconvenient.
It is certainly pointless to hold onto these locks for temp tables.  I
wonder how ugly would be to fix this problem ...

</pre></blockquote><br /> Thank you for your interest in this problem.<br /> I dont think this is a source of problem.
Uglyfix here would only force backend to terminate properly.<br /> It will not help at all in cause of server crash or
poweroutage.<br /> We need a way to tell autovacuum, that we don`t need orphan temp tables, so they can be removed
usingexisting routine.<br /><br /> The least invasive solution would be to have a guc, something like
'keep_orphan_temp_tables'with boolean value.<br /> Which would determine a autovacuum worker policy toward encountered
orphantemp tables.<br /><br /><pre class="moz-signature" cols="72">-- 
 
Grigory Smolkin
Postgres Professional: <a class="moz-txt-link-freetext"
href="http://www.postgrespro.com">http://www.postgrespro.com</a>
The Russian Postgres Company</pre>

Re: Fun fact about autovacuum and orphan temp tables

From

Alvaro Herrera

Date:

05 September 2016, 15:48:45

Grigory Smolkin wrote:
> 
> On 09/05/2016 04:34 PM, Alvaro Herrera wrote:
> >Grigory Smolkin wrote:
> >
> >>Funny part is that it never drops them. So when backend is finally
> >>terminated, it tries to drop them and fails with error:
> >>
> >>FATAL:  out of shared memory
> >>HINT:  You might need to increase max_locks_per_transaction
> >>
> >>If I understand that rigth, we are trying to drop all these temp tables in
> >>one transaction and running out of locks to do so.
> >Hmm, yeah, I suppose it does that, and it does seem pretty inconvenient.
> >It is certainly pointless to hold onto these locks for temp tables.  I
> >wonder how ugly would be to fix this problem ...
> >
> 
> Thank you for your interest in this problem.
> I dont think this is a source of problem. Ugly fix here would only force
> backend to terminate properly.
> It will not help at all in cause of server crash or power outage.
> We need a way to tell autovacuum, that we don`t need orphan temp tables, so
> they can be removed using existing routine.

It is always possible to drop the containing schemas; and as soon as
some other backend uses the BackendId 15 (in your example) the tables
would be removed anyway.  This only becomes a longstanding problem when
the crashing backend uses a high-numbered BackendId that's not reused
promptly enough.

> The least invasive solution would be to have a guc, something like
> 'keep_orphan_temp_tables' with boolean value.
> Which would determine a autovacuum worker policy toward encountered orphan
> temp tables.

The stated reason for keeping them around is to ensure you have time to
do some forensics research in case there was something useful in the
crashing backend.  My feeling is that if the reason they are kept around
is not a crash but rather some implementation defect that broke end-time
cleanup, then they don't have their purported value and I would rather
just remove them.

I have certainly faced my fair share of customers with dangling temp
tables, and would like to see this changed in some way or another.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fun fact about autovacuum and orphan temp tables

From

Bruce Momjian

Date:

05 September 2016, 17:14:59

On Mon, Sep  5, 2016 at 12:48:32PM -0300, Alvaro Herrera wrote:
> > The least invasive solution would be to have a guc, something like
> > 'keep_orphan_temp_tables' with boolean value.
> > Which would determine a autovacuum worker policy toward encountered orphan
> > temp tables.
> 
> The stated reason for keeping them around is to ensure you have time to
> do some forensics research in case there was something useful in the
> crashing backend.  My feeling is that if the reason they are kept around
> is not a crash but rather some implementation defect that broke end-time
> cleanup, then they don't have their purported value and I would rather
> just remove them.
> 
> I have certainly faced my fair share of customers with dangling temp
> tables, and would like to see this changed in some way or another.

I don't think we look at those temp tables frequently enough to justify
keeping them around for all users.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Re: Fun fact about autovacuum and orphan temp tables

From

Jim Nasby

Date:

06 September 2016, 16:22:00

On 9/5/16 12:14 PM, Bruce Momjian wrote:
>> > I have certainly faced my fair share of customers with dangling temp
>> > tables, and would like to see this changed in some way or another.
> I don't think we look at those temp tables frequently enough to justify
> keeping them around for all users.

Plus, if we cared about forensics, we'd prevent re-use of the orphaned 
schemas by new backends. That doesn't seem like a good idea for normal 
use, but if we had a preserve_orphaned_temp_objects GUC someone could 
add that behavior.

Isn't there some other GUC aimed at preserving data for forensics 
(besides zero_damaged_pages)? Maybe we could just broaden that to 
include orphaned temp objects.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Fun fact about autovacuum and orphan temp tables

From

Robert Haas

Date:

07 September 2016, 15:38:16

On Mon, Sep 5, 2016 at 1:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Sep  5, 2016 at 12:48:32PM -0300, Alvaro Herrera wrote:
>> > The least invasive solution would be to have a guc, something like
>> > 'keep_orphan_temp_tables' with boolean value.
>> > Which would determine a autovacuum worker policy toward encountered orphan
>> > temp tables.
>>
>> The stated reason for keeping them around is to ensure you have time to
>> do some forensics research in case there was something useful in the
>> crashing backend.  My feeling is that if the reason they are kept around
>> is not a crash but rather some implementation defect that broke end-time
>> cleanup, then they don't have their purported value and I would rather
>> just remove them.
>>
>> I have certainly faced my fair share of customers with dangling temp
>> tables, and would like to see this changed in some way or another.
>
> I don't think we look at those temp tables frequently enough to justify
> keeping them around for all users.

+1.  I think it would be much better to nuke them more aggressively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fun fact about autovacuum and orphan temp tables

From

"Constantin S. Pan"

Date:

20 October 2016, 12:30:37

On Mon, 5 Sep 2016 14:54:05 +0300
Grigory Smolkin <g.smolkin@postgrespro.ru> wrote:

> Hello, hackers!
>
> We were testing how well some application works with PostgreSQL and
> stumbled upon an autovacuum behavior which I fail to understand.
> Application in question have a habit to heavily use temporary tables
> in funny ways.
> For example it creates A LOT of them.
> Which is ok.
> Funny part is that it never drops them. So when backend is finally
> terminated, it tries to drop them and fails with error:
>
> FATAL:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction
>
> If I understand that rigth, we are trying to drop all these temp
> tables in one transaction and running out of locks to do so.
> After that postgresql.log is flooded at the rate 1k/s with messages
> like that:
>
> LOG:  autovacuum: found orphan temp table "pg_temp_15"."tt38147" in
> database "DB_TEST"
>
> It produces a noticeable load on the system and it`s getting worst
> with every terminated backend or restart.
> I did some RTFS and it appears that autovacuum has no intention of
> cleaning that orphan tables unless
> it`s wraparound time:
>
> src/backend/postmaster/autovacuum.c
>               /* We just ignore it if the owning backend is still
> active */ 2037             if (backendID == MyBackendId ||
> BackendIdGetProc(backendID) == NULL)
>   2038             {
>   2039                 /*
>   2040                  * We found an orphan temp table (which was
> probably left
>   2041                  * behind by a crashed backend).  If it's so
> old as to need
>   2042                  * vacuum for wraparound, forcibly drop it.
> Otherwise just
>   2043                  * log a complaint.
>   2044                  */
>   2045                 if (wraparound)
>   2046                 {
>   2047                     ObjectAddress object;
>   2048
>   2049                     ereport(LOG,
>   2050                             (errmsg("autovacuum: dropping
> orphan temp table \"%s\".\"%s\" in database \"%s\"",
>   2051 get_namespace_name(classForm->relnamespace),
>   2052 NameStr(classForm->relname),
>   2053 get_database_name(MyDatabaseId))));
>   2054                     object.classId = RelationRelationId;
>   2055                     object.objectId = relid;
>   2056                     object.objectSubId = 0;
>   2057                     performDeletion(&object, DROP_CASCADE,
> PERFORM_DELETION_INTERNAL);
>   2058                 }
>   2059                 else
>   2060                 {
>   2061                     ereport(LOG,
>   2062                             (errmsg("autovacuum: found orphan
> temp table \"%s\".\"%s\" in database \"%s\"",
>   2063 get_namespace_name(classForm->relnamespace),
>   2064 NameStr(classForm->relname),
>   2065 get_database_name(MyDatabaseId))));
>   2066                 }
>   2067             }
>   2068         }
>
>
> What is more troubling is that pg_statistic is starting to bloat
> badly.
>
> LOG:  automatic vacuum of table "DB_TEST.pg_catalog.pg_statistic":
> index scans: 0
>          pages: 0 removed, 68225 remain, 0 skipped due to pins
>          tuples: 0 removed, 2458382 remain, 2408081 are dead but not
> yet removable
>          buffer usage: 146450 hits, 31 misses, 0 dirtied
>          avg read rate: 0.010 MB/s, avg write rate: 0.000 MB/s
>          system usage: CPU 3.27s/6.92u sec elapsed 23.87 sec
>
> What is the purpose of keeping orphan tables around and not dropping
> them on the spot?
>
>

Hey Hackers,

I tried to fix the problem with a new backend not being
able to reuse a temporary namespace when it contains
thousands of temporary tables. I disabled locking of objects
during namespace clearing process. See the patch attached.
Please tell me if there are any reasons why this is wrong.

I also added a GUC variable and changed the condition in
autovacuum to let it nuke orphan tables on its way.
See another patch attached.

Regards,
Constantin Pan

Attachment

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

21 October 2016, 05:16:41

On Thu, Sep 8, 2016 at 12:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 5, 2016 at 1:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I don't think we look at those temp tables frequently enough to justify
>> keeping them around for all users.
>
> +1.  I think it would be much better to nuke them more aggressively.

+1 from here as well. Making the deletion of orphaned temp tables even
in the non-wraparound autovacuum case mandatory looks to be the better
move to me. I can see that it could be important to be able to look at
some of temp tables' data after a crash, but the argument looks weak
compared to the potential bloat of catalog tables because of those
dangling temp relations. And I'd suspect that there are far more users
who would like to see this removal more aggressive than users caring
about having a look at those orphaned tables after a crash.
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

21 October 2016, 05:29:35

On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan <kvapen@gmail.com> wrote:
> I tried to fix the problem with a new backend not being
> able to reuse a temporary namespace when it contains
> thousands of temporary tables. I disabled locking of objects
> during namespace clearing process. See the patch attached.
> Please tell me if there are any reasons why this is wrong.

That's invasive. I am wondering if a cleaner approach here would be a
flag in deleteOneObject() that performs the lock cleanup, as that's
what you are trying to solve here.

> I also added a GUC variable and changed the condition in
> autovacuum to let it nuke orphan tables on its way.
> See another patch attached.

It seems to me that you'd even want to make the drop of orphaned
tables mandatory once it is detected even it is not a wraparound
autovacuum. Dangling temp tables have higher chances to hit users than
diagnostic of orphaned temp tables after a crash. (A background worker
could be used for existing versions to clean up that more aggressively
actually)
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

21 October 2016, 05:31:15

On Fri, Oct 21, 2016 at 2:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan <kvapen@gmail.com> wrote:
>> I tried to fix the problem with a new backend not being
>> able to reuse a temporary namespace when it contains
>> thousands of temporary tables. I disabled locking of objects
>> during namespace clearing process. See the patch attached.
>> Please tell me if there are any reasons why this is wrong.
>
> That's invasive. I am wondering if a cleaner approach here would be a
> flag in deleteOneObject() that performs the lock cleanup, as that's
> what you are trying to solve here.
>
>> I also added a GUC variable and changed the condition in
>> autovacuum to let it nuke orphan tables on its way.
>> See another patch attached.
>
> It seems to me that you'd even want to make the drop of orphaned
> tables mandatory once it is detected even it is not a wraparound
> autovacuum. Dangling temp tables have higher chances to hit users than
> diagnostic of orphaned temp tables after a crash. (A background worker
> could be used for existing versions to clean up that more aggressively
> actually)

You should as well add your patch to the next commit fest, so as to be
sure that it will attract more reviews and more attention:
https://commitfest.postgresql.org/11/
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

"Constantin S. Pan"

Date:

21 October 2016, 10:09:05

On Fri, 21 Oct 2016 14:29:24 +0900
Michael Paquier <michael.paquier@gmail.com> wrote:

> That's invasive. I am wondering if a cleaner approach here would be a
> flag in deleteOneObject() that performs the lock cleanup, as that's
> what you are trying to solve here.

The problem occurs earlier, at the findDependentObjects step. All the
objects inside the namespace are being locked before any of them gets
deleted, which leads to the "too many locks" condition.

Cheers,
Constantin Pan

Re: Fun fact about autovacuum and orphan temp tables

From

Tom Lane

Date:

21 October 2016, 13:47:31

Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan <kvapen@gmail.com> wrote:
>> I tried to fix the problem with a new backend not being
>> able to reuse a temporary namespace when it contains
>> thousands of temporary tables. I disabled locking of objects
>> during namespace clearing process. See the patch attached.
>> Please tell me if there are any reasons why this is wrong.

> That's invasive.

Invasive or otherwise, it's *completely unacceptable*.  Without a lock
you have no way to be sure that nothing else is touching the table.

A less broken approach might be to split the cleanup into multiple shorter
transactions, that is, after every N objects stop and commit what you've
done so far.  This shouldn't be that hard to do during backend exit, as
I'm pretty sure we're starting a new transaction just for this purpose
anyway.  I don't know if it'd be possible to do it during the automatic
cleanup when glomming onto a pre-existing temp namespace, because we're
already within a user-started transaction at that point.  But if we solve
the problem where it's being created, maybe that's enough for now.

>> I also added a GUC variable and changed the condition in
>> autovacuum to let it nuke orphan tables on its way.
>> See another patch attached.

> It seems to me that you'd even want to make the drop of orphaned
> tables mandatory once it is detected even it is not a wraparound
> autovacuum.

If we are willing to do that then we don't really have to solve the
problem on the backend side.  One could expect that autovacuum would
clean things up within a few minutes after a backend failure.  We'd
have to be really darn sure that that "orphaned backend" test can
never have any false positives, though.  I'm not sure that it was
ever designed to be race-condition-proof.
        regards, tom lane

Re: Fun fact about autovacuum and orphan temp tables

From

Jim Nasby

Date:

21 October 2016, 15:16:50

On 10/21/16 8:47 AM, Tom Lane wrote:
>> It seems to me that you'd even want to make the drop of orphaned
>> > tables mandatory once it is detected even it is not a wraparound
>> > autovacuum.
> If we are willing to do that then we don't really have to solve the
> problem on the backend side.  One could expect that autovacuum would
> clean things up within a few minutes after a backend failure.

Unless all the autovac workers are busy working on huge tables... maybe 
a delay of several hours/days is OK in this case, but it's not wise to 
assume autovac will always get to something within minutes.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

21 October 2016, 22:41:49

On Sat, Oct 22, 2016 at 12:15 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 10/21/16 8:47 AM, Tom Lane wrote:
>>>
>>> It seems to me that you'd even want to make the drop of orphaned
>>> > tables mandatory once it is detected even it is not a wraparound
>>> > autovacuum.
>>
>> If we are willing to do that then we don't really have to solve the
>> problem on the backend side.  One could expect that autovacuum would
>> clean things up within a few minutes after a backend failure.
>
> Unless all the autovac workers are busy working on huge tables... maybe a
> delay of several hours/days is OK in this case, but it's not wise to assume
> autovac will always get to something within minutes.

I am really thinking that we should just do that and call it a day
then, but document the fact that if one wants to look at the content
of orphaned tables after a crash he had better turn autovacuum to off
for the time of the analysis.
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

Tom Lane

Date:

22 October 2016, 00:45:23

Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Oct 22, 2016 at 12:15 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 10/21/16 8:47 AM, Tom Lane wrote:
>>> If we are willing to do that then we don't really have to solve the
>>> problem on the backend side.  One could expect that autovacuum would
>>> clean things up within a few minutes after a backend failure.

> I am really thinking that we should just do that and call it a day
> then, but document the fact that if one wants to look at the content
> of orphaned tables after a crash he had better turn autovacuum to off
> for the time of the analysis.

Yeah, agreed.  This also points up the value of Robert's suggestion
of a "really off" setting.
        regards, tom lane

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

22 October 2016, 06:42:55

On Sat, Oct 22, 2016 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sat, Oct 22, 2016 at 12:15 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> On 10/21/16 8:47 AM, Tom Lane wrote:
>>>> If we are willing to do that then we don't really have to solve the
>>>> problem on the backend side.  One could expect that autovacuum would
>>>> clean things up within a few minutes after a backend failure.
>
>> I am really thinking that we should just do that and call it a day
>> then, but document the fact that if one wants to look at the content
>> of orphaned tables after a crash he had better turn autovacuum to off
>> for the time of the analysis.
>
> Yeah, agreed.  This also points up the value of Robert's suggestion
> of a "really off" setting.

Okay, so I suggest something like the attached as HEAD-only change
because that's a behavior modification.
--
Michael

Attachment

autovacuum-orphan-cleanup.patch

Re: Fun fact about autovacuum and orphan temp tables

From

Haribabu Kommi

Date:

17 November 2016, 01:24:30

Hi Dilip,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.

But you haven't shared your review yet. Can you please try to share your views

about the patch. This will help us in smoother operation of commitfest.

Michael had sent an updated patch based on some discussion.

Please Ignore if you already shared your review.

Regards,

Hari Babu

Fujitsu Australia

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

17 November 2016, 04:14:06

On Wed, Nov 16, 2016 at 5:24 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> This is a gentle reminder.
>
> you assigned as reviewer to the current patch in the 11-2016 commitfest.
> But you haven't shared your review yet. Can you please try to share your
> views
> about the patch. This will help us in smoother operation of commitfest.

Thanks for the reminder.

> Michael had sent an updated patch based on some discussion.
> Please Ignore if you already shared your review.

Hm. Thinking about that again, having a GUC to control if orphaned
temp tables in autovacuum is an overkill (who is going to go into this
level of tuning!?) and that we had better do something more aggressive
as there have been cases of users complaining about dangling temp
tables. I suspect the use case where people would like to have a look
at orphaned temp tables after a backend crash is not that wide, at
least a method would be to disable autovacuum after a crash if such a
monitoring is necessary. Tom has already stated upthread that the
patch to remove wildly locks is not acceptable, and he's clearly
right.

So the best move would be really to make the removal of orphaned temp
tables more aggressive, and not bother about having a GUC to control
that. The patch sent in
https://www.postgresql.org/message-id/CAB7nPqSRYwaz1i12mPkH06_roO3ifgCgR88_aeX1OEg2r4OaNw@mail.gmail.com
does so, so I am marking the CF entry as ready for committer for this
patch to attract some committer's attention on the matter.
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

Robert Haas

Date:

17 November 2016, 15:37:22

On Wed, Nov 16, 2016 at 11:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hm. Thinking about that again, having a GUC to control if orphaned
> temp tables in autovacuum is an overkill (who is going to go into this
> level of tuning!?) and that we had better do something more aggressive
> as there have been cases of users complaining about dangling temp
> tables. I suspect the use case where people would like to have a look
> at orphaned temp tables after a backend crash is not that wide, at
> least a method would be to disable autovacuum after a crash if such a
> monitoring is necessary. Tom has already stated upthread that the
> patch to remove wildly locks is not acceptable, and he's clearly
> right.
>
> So the best move would be really to make the removal of orphaned temp
> tables more aggressive, and not bother about having a GUC to control
> that. The patch sent in
> https://www.postgresql.org/message-id/CAB7nPqSRYwaz1i12mPkH06_roO3ifgCgR88_aeX1OEg2r4OaNw@mail.gmail.com
> does so, so I am marking the CF entry as ready for committer for this
> patch to attract some committer's attention on the matter.

The whole point of the review process is to get an opinion from
somebody other than the original author on the patch in question.  If
you write a patch and then mark your own patch Ready for Committer,
you're defeating the entire purpose of this process.  I think that's
what you did here, I think you have done it on other occasions in the
not-too-distant patch, and I wish you'd stop.  I understand perfectly
well that there are times when a committer needs to involve themselves
directly in a patch even when nobody else has reviewed it, because
that just has to happen, and I try to keep an eye out for such
scenarios, but it's frustrating to clear time to work on the RfC
backlog and then discover patches that have just been marked that way
without discussion or agreement.

Now, on the substance of this issue, I think your proposed change to
clean up temporary tables immediately rather than waiting until they
become old enough to threaten wraparound is a clear improvement, and
IMHO we should definitely do it.  However, I think your proposed
documentation changes are pretty well inadequate.  If somebody
actually does want to get at the temporary tables of a crashed
backend, they will need to do a lot more than what you mention in that
update.  Even if they set autovacuum=off, they will be vulnerable to
having those tables removed by another backend with the same backend
ID that reinitializes the temporary schema.  And even if that doesn't
happen, they won't be able to select from those tables because they'll
get an error about reading temporary tables of another session.  To
fix that they'd have to move the temporary tables into some other
schema AND change the filenames on disk.  And then on top of that, by
the time anybody even found this in the documentation, it would be far
too late to shut off autovacuum in the first place because it runs
every minute (unless you have raised autovacuum_naptime, which you
shouldn't).  I think we just shouldn't document this.  The proposed
behavior change is basically switching from an extremely conservative
behavior with surprising consequences to what people would expect
anyway, and anyone who needs to dig information out of another
session's temporary tables is going to need to know a lot more about
the server internals than we normally explain in the documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

17 November 2016, 18:41:13

On Thu, Nov 17, 2016 at 7:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The whole point of the review process is to get an opinion from
> somebody other than the original author on the patch in question.  If
> you write a patch and then mark your own patch Ready for Committer,
> you're defeating the entire purpose of this process.  I think that's
> what you did here, I think you have done it on other occasions in the
> not-too-distant patch, and I wish you'd stop.  I understand perfectly
> well that there are times when a committer needs to involve themselves
> directly in a patch even when nobody else has reviewed it, because
> that just has to happen, and I try to keep an eye out for such
> scenarios, but it's frustrating to clear time to work on the RfC
> backlog and then discover patches that have just been marked that way
> without discussion or agreement.

OK, sorry about that, I switched it back to "Needs Review" FWIW. For
my defense, after a review of the first set of patches proposed, I
suggested one way to go which is the second patch I sent. Bruce and
yourself mentioned that nuking them more aggressively would be better
to have. Tom mentioned that doing so and having a GUC would be worth
it. In short this sounded like an agreement to me, which is why I
proposed a new patch to make the discussion move on. And the patch is
not that complicated. When looking at it I asked myself about
potential timing issues regarding fact of doing this cleanup more
aggressively but discarded them at the end.

> Now, on the substance of this issue, I think your proposed change to
> clean up temporary tables immediately rather than waiting until they
> become old enough to threaten wraparound is a clear improvement, and
> IMHO we should definitely do it.  However, I think your proposed
> documentation changes are pretty well inadequate.  If somebody
> actually does want to get at the temporary tables of a crashed
> backend, they will need to do a lot more than what you mention in that
> update.  Even if they set autovacuum=off, they will be vulnerable to
> having those tables removed by another backend with the same backend
> ID that reinitializes the temporary schema.  And even if that doesn't
> happen, they won't be able to select from those tables because they'll
> get an error about reading temporary tables of another session.  To
> fix that they'd have to move the temporary tables into some other
> schema AND change the filenames on disk.  And then on top of that, by
> the time anybody even found this in the documentation, it would be far
> too late to shut off autovacuum in the first place because it runs
> every minute (unless you have raised autovacuum_naptime, which you
> shouldn't).  I think we just shouldn't document this. The proposed
> behavior change is basically switching from an extremely conservative
> behavior with surprising consequences to what people would expect
> anyway, and anyone who needs to dig information out of another
> session's temporary tables is going to need to know a lot more about
> the server internals than we normally explain in the documentation.

Okay, let's remove the documentation then. What do you think about the
updated version attached?
(Even this patch enters into "Needs Review" state).
--
Michael

Attachment

autovacuum-orphan-cleanup-v2.patch

Re: Fun fact about autovacuum and orphan temp tables

From

Robert Haas

Date:

17 November 2016, 19:16:36

On Thu, Nov 17, 2016 at 1:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Okay, let's remove the documentation then. What do you think about the
> updated version attached?
> (Even this patch enters into "Needs Review" state).

LGTM.  I'll commit it if there are not objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

17 November 2016, 21:25:12

On Thu, Nov 17, 2016 at 11:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 17, 2016 at 1:41 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Okay, let's remove the documentation then. What do you think about the
>> updated version attached?
>> (Even this patch enters into "Needs Review" state).
>
> LGTM.  I'll commit it if there are not objections.

Thank you.
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

Dilip Kumar

Date:

18 November 2016, 14:30:50

On Thu, Nov 17, 2016 at 6:54 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> you assigned as reviewer to the current patch in the 11-2016 commitfest.
> But you haven't shared your review yet. Can you please try to share your
> views
> about the patch. This will help us in smoother operation of commitfest.
>
> Michael had sent an updated patch based on some discussion.
>
> Please Ignore if you already shared your review.


Hi Hari,

So far I haven't spent time on this, I am planning to spend in next week.
But it seems that lot of review has happened and patch is already in
good shapes. So I think other reviewers can take decision.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Fun fact about autovacuum and orphan temp tables

From

Robert Haas

Date:

18 November 2016, 21:12:10

On Thu, Nov 17, 2016 at 4:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Nov 17, 2016 at 11:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Nov 17, 2016 at 1:41 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> Okay, let's remove the documentation then. What do you think about the
>>> updated version attached?
>>> (Even this patch enters into "Needs Review" state).
>>
>> LGTM.  I'll commit it if there are not objections.
>
> Thank you.

On further reflection, I'm not sure this fixes the original complaint.
In fact, it might even make it worse.  The performDeletion() call here
is going to run inside of a loop that's scanning through pg_class, so
all in a single transaction.  So we'll actually try to drop ALL
orphaned tables for ALL backends that failed to clean up in a single
transaction, as opposed to the current state of affairs where we will
drop all orphaned tables for ONE backend in a single transaction.  If
anything, that could require MORE locks.  And if it starts failing,
then all autovacuum activity in that database will cease.  Oops.

So now I think that we probably need to make this logic a bit smarter.
Add all of the OIDs that need to be dropped to a list.  Then have a
loop prior to the main loop (where it says "Perform operations on
collected tables.") which iterates over that list and drops those
tables one by one, starting a transaction every (say) 100 tables or
after an error.  For bonus points, if a transaction fails, put all of
the OIDs except the one that provoked the failure back into the list
of OIDs to be dropped, so that we still make a progress even if some
DROPs are failing for some reason.

That might sound adding unnecessary work just for the sake of
paranoia, but I don't think it is.  Failures here won't be common, but
since they are entirely automated there will be no human intelligence
available to straighten things out.  Barring considerable caution,
we'll just enter a flaming death spiral.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

19 November 2016, 05:16:07

On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> So now I think that we probably need to make this logic a bit smarter.
> Add all of the OIDs that need to be dropped to a list.  Then have a
> loop prior to the main loop (where it says "Perform operations on
> collected tables.") which iterates over that list and drops those
> tables one by one, starting a transaction every (say) 100 tables or
> after an error.  For bonus points, if a transaction fails, put all of
> the OIDs except the one that provoked the failure back into the list
> of OIDs to be dropped, so that we still make a progress even if some
> DROPs are failing for some reason.

Okay.

> That might sound adding unnecessary work just for the sake of
> paranoia, but I don't think it is.  Failures here won't be common, but
> since they are entirely automated there will be no human intelligence
> available to straighten things out.  Barring considerable caution,
> we'll just enter a flaming death spiral.

Thinking more paranoid, an extra way to enter in this flaming death
spiral is to not limit the maximum number of failures authorized when
dropping a set of orphaned tables and transactions fail multiple
times. This is basically not important as the relation on which the
drop failed gets dropped from the list but failing on each one of them
is a good way to slow down autovacuum, so putting a limit of say 10
transactions failing is I think really important.

I have played with what you suggested, and finished with the patch
attached. I have run some tests using this function to create some
temp tables with several backends to be sure that multiple backend IDs
are used:

CREATE FUNCTION create_temp_tables(i int) RETURNS void
AS $$
BEGIN
FOR i IN 1..i LOOP
  EXECUTE 'CREATE TEMP TABLE aa' || i || ' (a int);';
END LOOP;
END
$$ LANGUAGE plpgsql;

Then I killed the instance. At restart I could see a bunch of temp
tables in pg_class, and I let autovacuum do the cleanup after restart.
I have tested as well the error code path in the PG_TRY() block by
enforcing manually a elog(ERROR) to be sure that the maximum number of
failures is correctly handled, better safe than sorry.
--
Michael

Attachment

autovacuum-orphan-cleanup-v3.patch

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

21 November 2016, 03:42:18

On Sat, Nov 19, 2016 at 2:16 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> That might sound adding unnecessary work just for the sake of
>> paranoia, but I don't think it is.  Failures here won't be common, but
>> since they are entirely automated there will be no human intelligence
>> available to straighten things out.  Barring considerable caution,
>> we'll just enter a flaming death spiral.
>
> Thinking more paranoid, an extra way to enter in this flaming death
> spiral is to not limit the maximum number of failures authorized when
> dropping a set of orphaned tables and transactions fail multiple
> times. This is basically not important as the relation on which the
> drop failed gets dropped from the list but failing on each one of them
> is a good way to slow down autovacuum, so putting a limit of say 10
> transactions failing is I think really important.

By the way, when hacking this patch I asked myself three questions:
1) How many objects should be dropped per transaction? 50 sounds like
a fine number to me so the patch I sent is doing so. This should
definitely not be more than the default for max_locks_per_transaction,
aka 64. Another thing to consider would be to use a number depending
on max_locks_per_transaction, like half of it.
2) How many transaction failures should autovacuum forgive? The patch
uses 10. Honestly I put that number out of thin air.
3) Should the drop of orphaned tables be done for a wraparound
autovacuum? Obviously, the answer is no. It is vital not to consume
transaction XIDs in this case. The patch I sent is dropping the
objects even for a wraparound job, that's easily switchable.
-- 
Michael

Re: Fun fact about autovacuum and orphan temp tables

From

Robert Haas

Date:

21 November 2016, 18:06:14

On Sun, Nov 20, 2016 at 10:42 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Nov 19, 2016 at 2:16 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> That might sound adding unnecessary work just for the sake of
>>> paranoia, but I don't think it is.  Failures here won't be common, but
>>> since they are entirely automated there will be no human intelligence
>>> available to straighten things out.  Barring considerable caution,
>>> we'll just enter a flaming death spiral.
>>
>> Thinking more paranoid, an extra way to enter in this flaming death
>> spiral is to not limit the maximum number of failures authorized when
>> dropping a set of orphaned tables and transactions fail multiple
>> times. This is basically not important as the relation on which the
>> drop failed gets dropped from the list but failing on each one of them
>> is a good way to slow down autovacuum, so putting a limit of say 10
>> transactions failing is I think really important.
>
> By the way, when hacking this patch I asked myself three questions:
> 1) How many objects should be dropped per transaction? 50 sounds like
> a fine number to me so the patch I sent is doing so. This should
> definitely not be more than the default for max_locks_per_transaction,
> aka 64. Another thing to consider would be to use a number depending
> on max_locks_per_transaction, like half of it.
> 2) How many transaction failures should autovacuum forgive? The patch
> uses 10. Honestly I put that number out of thin air.
> 3) Should the drop of orphaned tables be done for a wraparound
> autovacuum? Obviously, the answer is no. It is vital not to consume
> transaction XIDs in this case. The patch I sent is dropping the
> objects even for a wraparound job, that's easily switchable.

I don't think that you should skip it in the wraparound case, because
it might be one of the temp tables that is threatening wraparound.

I've committed the patch after doing some work on the comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fun fact about autovacuum and orphan temp tables

From

Michael Paquier

Date:

21 November 2016, 21:15:40

On Tue, Nov 22, 2016 at 3:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't think that you should skip it in the wraparound case, because
> it might be one of the temp tables that is threatening wraparound.
>
> I've committed the patch after doing some work on the comments.

OK, thanks.
-- 
Michael