On 3/19/21 3:17 PM, Euler Taveira wrote:
> On Fri, Mar 19, 2021, at 12:23 AM, Tom Lane wrote:
>> [ reads code... ]
>> ... no, I think the problem is the test is still full of race conditions.
>>
>> In the first place, waiting till you see the output of a SELECT that's
>> before the useful query is not enough to guarantee that the useful query
>> is done, or even started. That's broken on both sessions.
> That's an ugly and fragile mechanism to workaround the fact that pump_until
> reacts after you have the query return.
>
>> In the second place, even if the second INSERT has started, you don't know
>> that it's reached the point of blocking on the tuple conflict yet.
>> Which in turn means that it might not've filled its tuplestore yet.
>>
>> In short, this script is designed to ensure that the test query can't
>> finish too soon, but that proves nothing about whether the test query
>> has even started. And since you also haven't really guaranteed that the
>> intended-to-be-blocking query is done, I don't think that the first
>> condition really holds either.
> In order to avoid the race condition between filling the tuplestore and
> killing
> the backend, we could use a pool_query_until() before SIGKILL to wait the
> temporary file being created. Do you think this modification will make this
> test more stable?
>
Wow, I thought I understand the perl code, but clearly I was wrong.
I think the solution is not particularly difficult.
For the initial insert, it's fine to switch the order of the SQL
commands, so that the insert happens first, and only then emit the string.
For the second insert, we can't do that, because it's expected to block
on the lock. So we have to verify that by looking at the lock directly.
I however don't understand what the pump_until function does. AFAICS it
should run the query in a loop, and bail out once it matches the
expected output. But that does not seem to happen, so when the query
gets executed before the lock is there, it results in infinite loop.
In the attached patch I've simulated this by random() < 0.5.
If I replace this with a wait loop in a plpgsql block, that works
perfectly fine (no infinite loops). Tested both on x86_64 and rpi.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company