Thread: cleanup temporary files after crash

cleanup temporary files after crash

From: Euler Taveira
Hi,

Bug #16427 mentioned that temporary files are not removed after a crash. I heard similar complaints from some customers. In the bug report [1], I proposed a new GUC to control whether temporary files are removed after crash recovery. The current behavior is only useful for debugging purposes. It also has an undesirable consequence: you have to restart the service to reclaim the space. Interrupting service continuity isn't always an option, yet with limited disk space you may have no choice but to restart the service.

The new GUC cleanup_temp_files_after_crash is marked as SIGHUP, so you can change it for a debugging session without interrupting the service. The default value is on, which means it changes the current behavior. Documentation and tests are included.
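
For instance, with the patch applied you could keep the files around for a
debugging session like this (an illustrative psql session):

  -- keep temporary files around for a debugging session; the GUC is
  -- SIGHUP, so a reload is enough, no restart required
  ALTER SYSTEM SET cleanup_temp_files_after_crash = off;
  SELECT pg_reload_conf();

  -- when done, restore the default (remove the files after a crash)
  ALTER SYSTEM RESET cleanup_temp_files_after_crash;
  SELECT pg_reload_conf();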


Regards,

--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment

Re: cleanup temporary files after crash

From: Tomas Vondra
Hi Euler,

On Wed, Oct 28, 2020 at 11:16:26AM -0300, Euler Taveira wrote:
>Hi,
>
>Bug #16427 mentioned that temporary files are not removed after a crash. I
>heard similar complaints from some customers. In the bug report [1], I
>proposed a new GUC to control whether temporary files are removed after
>crash recovery. The current behavior is only useful for debugging purposes.
>It also has an undesirable consequence: you have to restart the service to
>reclaim the space. Interrupting service continuity isn't always an option,
>yet with limited disk space you may have no choice but to restart the
>service.
>
>The new GUC cleanup_temp_files_after_crash is marked as SIGHUP, so you can
>change it for a debugging session without interrupting the service. The
>default value is on, which means it changes the current behavior.
>Documentation and tests are included.
>

I did a quick review and the patch seems fine to me. Let's wait for a
bit and see if there are any objections - if not, I'll get it committed
in the next CF.

One thing I'm not sure about is whether we should have the GUC as
proposed, or have a negative "keep_temp_files_after_restart" defaulting
to false. But I don't have a very good justification for the alternative
other than vague personal preference.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: cleanup temporary files after crash

From: Euler Taveira
On Wed, 28 Oct 2020 at 15:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I did a quick review and the patch seems fine to me. Let's wait for a
bit and see if there are any objections - if not, I'll get it committed
in the next CF.


Tomas, thanks for your review.
 
One thing I'm not sure about is whether we should have the GUC as
proposed, or have a negative "keep_temp_files_after_restart" defaulting
to false. But I don't have a very good justification for the alternative
other than vague personal preference.


I thought about not providing a GUC at all, or providing it in the developer
section. I've never heard anyone say that they use those temporary files to
investigate an issue. Regarding a crash, all information is already available
and temporary files don't provide extra details. This new GUC just keeps the
previous behavior. I'm fine without the GUC, though.


--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: cleanup temporary files after crash

From: Michael Paquier
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> I thought about not providing a GUC at all, or providing it in the developer
> section. I've never heard anyone say that they use those temporary
> files to investigate an issue. Regarding a crash, all information is already
> available and temporary files don't provide extra details. This new
> GUC just keeps the previous behavior. I'm fine without the GUC, though.

The original behavior is as old as 4a5f38c4, and the last time we talked
about it there were arguments for keeping the existing behavior of not
cleaning up files in a restart-after-crash scenario, for the sake of being
useful just "in case".  I have never used that property myself, TBH, and I
have seen many more cases of users caring about the data folder not facing
an ENOSPC, particularly if they don't use different partitions for pg_wal/
and the main data folder.
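
FWIW, a quick way to gauge that space from SQL (illustrative;
pg_ls_tmpdir() is available in v12 and newer):

  -- how many leftover temp files are there, and how much space do they hold?
  SELECT count(*) AS files, pg_size_pretty(sum(size)) AS total
    FROM pg_ls_tmpdir();
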
--
Michael

Attachment

Re: cleanup temporary files after crash

From: Anastasia Lubennikova
On 01.11.2020 04:25, Michael Paquier wrote:
> On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
>> I thought about not providing a GUC at all, or providing it in the developer
>> section. I've never heard anyone say that they use those temporary
>> files to investigate an issue. Regarding a crash, all information is already
>> available and temporary files don't provide extra details. This new
>> GUC just keeps the previous behavior. I'm fine without the GUC, though.
> The original behavior is as old as 4a5f38c4, and the last time we talked
> about it there were arguments for keeping the existing behavior of not
> cleaning up files in a restart-after-crash scenario, for the sake of being
> useful just "in case".  I have never used that property myself, TBH, and I
> have seen many more cases of users caring about the data folder not facing
> an ENOSPC, particularly if they don't use different partitions for pg_wal/
> and the main data folder.
> --
> Michael

Thank you, Euler, for submitting this.
+1 for the feature. One of the platforms we support uses temp files a lot,
and we have faced this problem, while we never actually used these orphan
files for debugging purposes.

I also think that the GUC is not needed here. This 'feature' was internal
from the very beginning, so users shouldn't care about preserving the old
behavior. Without the GUC the patch is very simple, please see the attached
version. I also omitted the test, because I am not sure it would be stable,
given that RemovePgTempFiles() allows the possibility of failure.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: cleanup temporary files after crash

From: Euler Taveira
On Thu, 26 Nov 2020 at 05:48, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:

I also think that the GUC is not needed here. This 'feature' was internal
from the very beginning, so users shouldn't care about preserving the old
behavior. Without the GUC the patch is very simple, please see the attached
version. I also omitted the test, because I am not sure it would be stable,
given that RemovePgTempFiles() allows the possibility of failure.

Anastasia, thanks for reviewing it. As I said, I'm fine without the GUC.
However, if we decide to go with the GUC, the default behavior should be to
remove the temporary files after a crash.

Regards,

--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: cleanup temporary files after crash

From: Tomas Vondra

On 3/17/21 2:34 AM, Euler Taveira wrote:
> On Sun, Mar 14, 2021, at 11:01 PM, Thomas Munro wrote:
>> On Wed, Mar 10, 2021 at 1:31 AM Michael Paquier <michael@paquier.xyz> wrote:
>> > On Tue, Mar 09, 2021 at 02:28:43AM +0100, Tomas Vondra wrote:
>> > > Let's move this patch forward. Based on the responses, I agree the
>> > > default behavior should be to remove the temp files, and I think we
>> > > should have the GUC (on the off chance that someone wants to preserve
>> > > the temporary files for debugging or whatever other reason).
>> >
>> > Thanks for taking care of this.  I am having some second-thoughts
>> > about changing this behavior by default, still that's much more useful
>> > this way.
>>
>> +1 for having it on by default.
>>
>> I was also just looking at this patch and came here to say LGTM except
>> for two cosmetic things, below.
> Thanks for taking a look at this patch. I'm not sure Tomas is preparing a
> new patch that includes the suggested modifications, but I decided to do it.
> This new version has the new GUC name (using "remove"). I also replaced
> "cleanup" with "remove" in all the remaining places. As pointed out by
> Thomas, I reworded the paragraph that describes the GUC, moving the default
> information to the beginning of the sentence. I also added the "literal"
> markup as suggested by Michael. The postgresql.conf.sample was fixed. The
> tests were slightly modified. I reworded some comments and added a hack to
> avoid breaking the temporary file test on slow machines. A usleep() after
> sending the query provides some time for the query to create the temporary
> file. I used an arbitrary sleep (10 ms) that seems to be sufficient.
> 

Thanks. Pushed with some minor changes to docs wording.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Tomas Vondra
Hmm,

crake and florican seem to have failed because of this commit, with this
error in the new TAP test:

error running SQL: 'psql:<stdin>:1: ERROR:  could not open directory
"base/pgsql_tmp": No such file or directory'
while running 'psql -XAtq -d port=64336 host=/tmp/sv1WjSvj3P
dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT COUNT(1)
FROM pg_ls_dir($$base/pgsql_tmp$$)' at
/home/andrew/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
line 1572.

So it seems the pgsql_tmp directory does not exist, for some reason.
Considering the directory should be created for the first temp file,
that either means the query in the TAP test does not actually create a
temp file on those machines, or it gets killed too early.

The 500 rows used by the test seems fairly low, so maybe those machines
can do the sort entirely in memory?
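
One way to check that by hand (an illustrative psql session, not
buildfarm output):

  SET work_mem = '64kB';
  EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM generate_series(1, 500) s(i) ORDER BY i DESC;
  -- "Sort Method: external merge  Disk: ...kB" means a temp file was used;
  -- "Sort Method: quicksort  Memory: ...kB" means the sort fit in memory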

The other option is that the sleep in the TAP test is a bit too short,
but those machines don't seem to be that slow.

Anyway, a TAP test relying on timing like this may not be the best idea,
so I wonder how else to test this ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/18/21 6:51 PM, Tomas Vondra wrote:
> Hmm,
> 
> crake and florican seem to have failed because of this commit, with this
> error in the new TAP test:
> 
> error running SQL: 'psql:<stdin>:1: ERROR:  could not open directory
> "base/pgsql_tmp": No such file or directory'
> while running 'psql -XAtq -d port=64336 host=/tmp/sv1WjSvj3P
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT COUNT(1)
> FROM pg_ls_dir($$base/pgsql_tmp$$)' at
> /home/andrew/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
> line 1572.
> 
> So it seems the pgsql_tmp directory does not exist, for some reason.
> Considering the directory should be created for the first temp file,
> that either means the query in the TAP test does not actually create a
> temp file on those machines, or it gets killed too early.
> 
> The 500 rows used by the test seems fairly low, so maybe those machines
> can do the sort entirely in memory?
> 
> The other option is that the sleep in the TAP test is a bit too short,
> but those machines don't seem to be that slow.
> 
> Anyway, a TAP test relying on timing like this may not be the best idea,
> so I wonder how else to test this ...
> 

I think a better way to test this would be to use a tuple lock:

setup:

  create table t (a int unique);

session 1:

  begin;
  insert into t values (1);
  ... keep open ...

session 2:

   begin;
   set work_mem = '64kB';
   insert into t select i from generate_series(1,10000) s(i);
   ... should block ...

Then, once the second session is waiting on the tuple, kill the
backend. We might as well test that there actually is a temp file first,
and then test that it disappeared.
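
A rough sketch of the surrounding TAP sequence (not an actual patch;
it assumes $node is a PostgresNode and the two sessions above run as
background psql processes, whose plumbing is elided):

  # safe_psql() and poll_query_until() are PostgresNode.pm helpers;
  # ok()/is() come from Test::More
  my $pid = $node->safe_psql('postgres',
      "SELECT pid FROM pg_stat_activity WHERE wait_event_type = 'Lock'");

  # first verify that the blocked INSERT has spilled a temp file
  ok($node->safe_psql('postgres',
      "SELECT COUNT(1) FROM pg_ls_dir('base/pgsql_tmp')") > 0,
     'temporary file created before the crash');

  kill 'KILL', $pid;    # simulate a crash

  # wait until the server has finished crash recovery
  $node->poll_query_until('postgres', 'SELECT 1', '1');

  is($node->safe_psql('postgres',
      "SELECT COUNT(1) FROM pg_ls_dir('base/pgsql_tmp')"), '0',
     'temporary file removed after crash recovery');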


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Euler Taveira
On Thu, Mar 18, 2021, at 4:20 PM, Tomas Vondra wrote:
I think a better way to test this would be to use a tuple lock:
I predicted such issues with this test.

setup:

  create table t (a int unique);

session 1:

  begin;
  insert into t values (1);
  ... keep open ...

session 2:

   begin;
   set work_mem = '64kB';
   insert into t select i from generate_series(1,10000) s(i);
   ... should block ...

Then, once the second session is waiting on the tuple, kill the
backend. We might as well test that there actually is a temp file first,
and then test that it disappeared.
Your suggestion works for me. Maybe you could use fewer rows in the session 2
query. I experimented with 1k rows and it generates a temporary file.


--
Euler Taveira

Re: cleanup temporary files after crash

From: Tomas Vondra

On 3/18/21 9:06 PM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 4:20 PM, Tomas Vondra wrote:
>> I think a better way to test this would be to use a tuple lock:
> I predicted such issues with this test.
> 
>> setup:
>>
>>   create table t (a int unique);
>>
>> session 1:
>>
>>   begin;
>>   insert into t values (1);
>>   ... keep open ...
>>
>> session 2:
>>
>>    begin;
>>    set work_mem = '64kB';
>>    insert into t select i from generate_series(1,10000) s(i);
>>    ... should block ...
>>
>> Then, once the second session is waiting on the tuple, kill the
>> backend. We might as well test that there actually is a temp file first,
>> and then test that it disappeared.
> Your suggestion works for me. Maybe you could use fewer rows in the session 2
> query. I experimented with 1k rows and it generates a temporary file.
> 

OK. Can you prepare a patch with the proposed test approach?

FWIW I can reproduce this on a 32-bit ARM system (rpi4), where 500 rows
simply do not produce a temp file, while with 1000 rows it works fine. On
x86_64 the temp file is created even with 500 rows. So there clearly is
some platform dependency, not sure if it's due to 32/64 bits, alignment or
something else. In any case, 500 rows seems to be right at the threshold.

We need to do both - stop relying on timing and increase the number of
rows, to consistently get temp files.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Euler Taveira
On Thu, Mar 18, 2021, at 5:51 PM, Tomas Vondra wrote:
OK. Can you prepare a patch with the proposed test approach?
I'm on it.

FWIW I can reproduce this on a 32-bit ARM system (rpi4), where 500 rows
simply do not produce a temp file, while with 1000 rows it works fine. On
x86_64 the temp file is created even with 500 rows. So there clearly is
some platform dependency, not sure if it's due to 32/64 bits, alignment or
something else. In any case, 500 rows seems to be right at the threshold.

We need to do both - stop relying on timing and increase the number of
rows, to consistently get temp files.
Yeah.


--
Euler Taveira

Re: cleanup temporary files after crash

From: Euler Taveira
On Thu, Mar 18, 2021, at 6:00 PM, Euler Taveira wrote:
On Thu, Mar 18, 2021, at 5:51 PM, Tomas Vondra wrote:
OK. Can you prepare a patch with the proposed test approach?
I'm on it.
What do you think about this patch?


--
Euler Taveira

Attachment

Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/18/21 10:44 PM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 6:00 PM, Euler Taveira wrote:
>> On Thu, Mar 18, 2021, at 5:51 PM, Tomas Vondra wrote:
>>> OK. Can you prepare a patch with the proposed test approach?
>> I'm on it.
> What do you think about this patch?
> 

Well, that's better, but it still does not do the trick on the 32-bit
machine - in that case 1000 rows with int4 still fit into work_mem, so
the temp file is not created. Per my experiments about 1040 rows are
needed - soooo close ;-) So let's make it 2000.

We might as well check that the temp file actually exists, before
killing the backend. Just to be sure.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Euler Taveira
On Thu, Mar 18, 2021, at 8:34 PM, Tomas Vondra wrote:
Well, that's better, but it still does not do the trick on the 32-bit
machine - in that case 1000 rows with int4 still fit into work_mem, so
the temp file is not created. Per my experiments about 1040 rows are
needed - soooo close ;-) So let's make it 2000.
My 32-bit laptop needs some repairs so I blindly chose 1k rows.

We might as well check that the temp file actually exists, before
killing the backend. Just to be sure.
Do you mean with remove_temp_files_after_crash = on? New version attached.


--
Euler Taveira

Attachment

Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/19/21 1:54 AM, Euler Taveira wrote:
> On Thu, Mar 18, 2021, at 8:34 PM, Tomas Vondra wrote:
>> Well, that's better, but it still does not do the trick on the 32-bit
>> machine - in that case 1000 rows with int4 still fit into work_mem, so
>> the temp file is not created. Per my experiments about 1040 rows are
>> needed - soooo close ;-) So let's make it 2000.
> My 32-bit laptop needs some repairs so I blindly chose 1k rows.
> 

Yeah, it's not immediately obvious how many rows are needed. 2000 would
be enough, but I just bumped it up to 5000 for good measure.

>> We might as well check that the temp file actually exists, before
>> killing the backend. Just to be sure.
> Do you mean with remove_temp_files_after_crash = on? New version attached.
> 

On second thought, we don't need that check. I realized the second part
of the test (checking remove_temp_files_after_crash=off) actually does
verify that we've created the file, so that should be OK.

I've pushed the previous version with the 5000 rows. Let's see what
crake and florican think about this ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/19/21 2:21 AM, Tomas Vondra wrote:
> On 3/19/21 1:54 AM, Euler Taveira wrote:
>> On Thu, Mar 18, 2021, at 8:34 PM, Tomas Vondra wrote:
>>> Well, that's better, but it still does not do the trick on the 32-bit
>>> machine - in that case 1000 rows with int4 still fit into work_mem, so
>>> the temp file is not created. Per my experiments about 1040 rows are
>>> needed - soooo close ;-) So let's make it 2000.
>> My 32-bit laptop needs some repairs so I blindly chose 1k rows.
>>
> 
> Yeah, it's not immediately obvious how many rows are needed. 2000 would
> be enough, but I just bumped it up to 5000 for good measure.
> 
>>> We might as well check that the temp file actually exists, before
>>> killing the backend. Just to be sure.
>> Do you mean with remove_temp_files_after_crash = on? New version attached.
>>
> 
> On second thought, we don't need that check. I realized the second part
> of the test (checking remove_temp_files_after_crash=off) actually does
> verify that we've created the file, so that should be OK.
> 
> I've pushed the previous version with the 5000 rows. Let's see what
> crake and florican think about this ...
> 

So crake and florican seem to be happy now. Not sure about lapwing yet.

But interestingly enough, prion and curculio got unhappy. They worked
fine with the older test, but now it fails with the "no such file or
directory" message. I wonder what makes them different from the other
x86_64 machines ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Tom Lane
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> So crake and florican seem to be happy now. Not sure about lapwing yet.
> But interestingly enough, prion and curculio got unhappy. They worked
> fine with the older test, but now it fails with the "no such file or
> directory" message. I wonder what makes them different from the other
> x86_64 machines ...

I'm confused about why the new query would use a temp file at all.
Maybe they aren't using the same plan?

            regards, tom lane



Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/19/21 3:57 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> So crake and florican seem to be happy now. Not sure about lapwing yet.
>> But interestingly enough, prion and curculio got unhappy. They worked
>> fine with the older test, but now it fails with the "no such file or
>> directory" message. I wonder what makes them different from the other
>> x86_64 machines ...
> 
> I'm confused about why the new query would use a temp file at all.
> Maybe they aren't using the same plan?
> 

Because generate_series() stashes the results into a tuplestore, and the
test sets work_mem to just 64kB. Maybe I'm missing some detail, though.

The plan is extremely simple - just a Function Scan feeding data into an
Insert; not sure what other plan could be used.
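
For reference, this is the plan shape I would expect (illustrative output):

  SET work_mem = '64kB';
  EXPLAIN (COSTS OFF)
    INSERT INTO t SELECT i FROM generate_series(1, 5000) s(i);
  --  Insert on t
  --    ->  Function Scan on generate_series s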


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Tom Lane
[ reads code... ]
... no, I think the problem is the test is still full of race conditions.

In the first place, waiting till you see the output of a SELECT that's
before the useful query is not enough to guarantee that the useful query
is done, or even started.  That's broken on both sessions.

In the second place, even if the second INSERT has started, you don't know
that it's reached the point of blocking on the tuple conflict yet.
Which in turn means that it might not've filled its tuplestore yet.

In short, this script is designed to ensure that the test query can't
finish too soon, but that proves nothing about whether the test query
has even started.  And since you also haven't really guaranteed that the
intended-to-be-blocking query is done, I don't think that the first
condition really holds either.

            regards, tom lane



Re: cleanup temporary files after crash

From: Euler Taveira
On Fri, Mar 19, 2021, at 12:23 AM, Tom Lane wrote:
[ reads code... ]
... no, I think the problem is the test is still full of race conditions.

In the first place, waiting till you see the output of a SELECT that's
before the useful query is not enough to guarantee that the useful query
is done, or even started.  That's broken on both sessions.
That's an ugly and fragile mechanism to work around the fact that pump_until
only reacts after the query returns.

In the second place, even if the second INSERT has started, you don't know
that it's reached the point of blocking on the tuple conflict yet.
Which in turn means that it might not've filled its tuplestore yet.

In short, this script is designed to ensure that the test query can't
finish too soon, but that proves nothing about whether the test query
has even started.  And since you also haven't really guaranteed that the
intended-to-be-blocking query is done, I don't think that the first
condition really holds either.
In order to avoid the race condition between filling the tuplestore and
killing the backend, we could use a poll_query_until() before the SIGKILL to
wait for the temporary file to be created. Do you think this modification
will make the test more stable?
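
Something along these lines (a sketch; $pid is the blocked backend
captured earlier):

  # wait until the blocked backend has created its temp file, and only
  # then deliver SIGKILL; the extra pg_ls_dir() arguments (missing_ok,
  # include_dot_dirs) tolerate the directory not existing yet
  $node->poll_query_until('postgres',
      "SELECT COUNT(1) > 0 FROM pg_ls_dir('base/pgsql_tmp', true, false)",
      't');
  kill 'KILL', $pid;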


--
Euler Taveira

Attachment

Re: cleanup temporary files after crash

From: Tomas Vondra

On 3/19/21 3:17 PM, Euler Taveira wrote:
> On Fri, Mar 19, 2021, at 12:23 AM, Tom Lane wrote:
>> [ reads code... ]
>> ... no, I think the problem is the test is still full of race conditions.
>>
>> In the first place, waiting till you see the output of a SELECT that's
>> before the useful query is not enough to guarantee that the useful query
>> is done, or even started.  That's broken on both sessions.
> That's an ugly and fragile mechanism to work around the fact that pump_until
> only reacts after the query returns.
> 
>> In the second place, even if the second INSERT has started, you don't know
>> that it's reached the point of blocking on the tuple conflict yet.
>> Which in turn means that it might not've filled its tuplestore yet.
>>
>> In short, this script is designed to ensure that the test query can't
>> finish too soon, but that proves nothing about whether the test query
>> has even started.  And since you also haven't really guaranteed that the
>> intended-to-be-blocking query is done, I don't think that the first
>> condition really holds either.
> In order to avoid the race condition between filling the tuplestore and
> killing the backend, we could use a poll_query_until() before the SIGKILL
> to wait for the temporary file to be created. Do you think this
> modification will make the test more stable?
> 

Wow, I thought I understood the Perl code, but clearly I was wrong.

I think the solution is not particularly difficult.

For the initial insert, it's fine to switch the order of the SQL commands,
so that the insert happens first and only then do we emit the string.

For the second insert, we can't do that, because it's expected to block on
the lock. So we have to verify that by looking at the lock directly.

I however don't understand what the pump_until function does. AFAICS it
should run the query in a loop and bail out once it matches the expected
output. But that does not seem to happen, so when the query gets executed
before the lock is there, it results in an infinite loop.

In the attached patch I've simulated this by random() < 0.5.

If I replace this with a wait loop in a plpgsql block, that works
perfectly fine (no infinite loops). Tested both on x86_64 and rpi.
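
The wait loop looks roughly like this (a sketch of the idea, not the
exact patch):

  DO $$
  DECLARE
    c int;
  BEGIN
    LOOP
      -- proceed only once a backend is actually blocked on the tuple lock
      SELECT count(*) INTO c FROM pg_locks WHERE NOT granted;
      EXIT WHEN c > 0;
      PERFORM pg_sleep(0.1);
    END LOOP;
  END $$;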



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/19/21 5:23 PM, Tomas Vondra wrote:
> ...
>
> If I replace this with a wait loop in a plpgsql block, that works
> perfectly fine (no infinite loops). Tested both on x86_64 and rpi.
>

For the record, here's the version with plpgsql block, which seems to be
working just fine.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: cleanup temporary files after crash

From: Tomas Vondra
On 3/19/21 5:26 PM, Tomas Vondra wrote:
> On 3/19/21 5:23 PM, Tomas Vondra wrote:
>> ...
>>
>> If I replace this with a wait loop in a plpgsql block, that works
>> perfectly fine (no infinite loops). Tested both on x86_64 and rpi.
>>
> 
> For the record, here's the version with plpgsql block, which seems to be
> working just fine.
> 

I've pushed this version, with the plpgsql block. I've tested it on all
the machines I have here, hopefully it'll make buildfarm happy too.

AFAICS I was mistaken about what the pump() functions do - they clearly do
not run the command repeatedly, they just wait for the right output to
appear. So a busy loop in plpgsql seems like a reasonable solution. Perhaps
there's a better way to do this in TAP, not sure.

My brain hurts from reading too much Perl today ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: cleanup temporary files after crash

From: Euler Taveira
On Fri, Mar 19, 2021, at 2:27 PM, Tomas Vondra wrote:
I've pushed this version, with the plpgsql block. I've tested it on all
the machines I have here, hopefully it'll make buildfarm happy too.
Thanks for taking care of this issue.

AFAICS I was mistaken about what the pump() functions do - they clearly do
not run the command repeatedly, they just wait for the right output to
appear. So a busy loop in plpgsql seems like a reasonable solution. Perhaps
there's a better way to do this in TAP, not sure.
It took me three times longer to get that fragile test right than to write the code itself. :-/

My brain hurts from reading too much Perl today ...
I feel the same when I have to deal with Perl code.

It seems the buildfarm animals are happy with this fix.

--
Euler Taveira

Re: cleanup temporary files after crash

From: Euler Taveira
On Fri, Jun 11, 2021, at 9:43 PM, Justin Pryzby wrote:
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > > The current behavior is only useful for debugging purposes.

On Wed, 28 Oct 2020 at 15:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > One thing I'm not sure about is whether we should have the GUC as
> > proposed, or have a negative "keep_temp_files_after_restart" defaulting
> > to false. But I don't have a very good justification for the alternative
> > other than vague personal preference.

On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> I thought about not providing a GUC at all, or providing it in the developer
> section. I've never heard anyone say that they use those temporary files to
> investigate an issue. Regarding a crash, all information is already available
> and temporary files don't provide extra details. This new GUC just keeps the
> previous behavior. I'm fine without the GUC, though.

Should this GUC be classified as a developer option, and removed from
postgresql.conf.sample?
It probably should.

That was discussed initially in October but not since.
Was it? I seem to have missed this suggestion.

I'm attaching a patch to fix it.


--
Euler Taveira

Attachment