BUG #16685: The ecpg thread/descriptor test fails sometimes on Windows - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16685: The ecpg thread/descriptor test fails sometimes on Windows
Msg-id 16685-d6cd241872c101d3@postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16685
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 13.0
Operating system:   Ubuntu 20.04
Description:

When running `vcregress ecpgcheck`, sometimes I get:
test thread/descriptor            ... stderr FAILED       99 ms

regression.diffs contains:
--- .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr	2019-12-04 16:05:46 +0300
+++ .../src/interfaces/ecpg/test/results/thread-descriptor.stderr	2020-10-20 10:00:34 +0300
@@ -0,0 +1 @@
+SQL error: descriptor "mydesc" not found on line 31

See also:
https://www.postgresql.org/message-id/flat/230799.1603045446%40sss.pgh.pa.us

In descriptor.pgc we have:
30:          EXEC SQL ALLOCATE DESCRIPTOR mydesc;
31:          EXEC SQL DEALLOCATE DESCRIPTOR mydesc;
So the mydesc descriptor disappeared somehow just after allocation.

`EXEC SQL ALLOCATE DESCRIPTOR` and `EXEC SQL DEALLOCATE DESCRIPTOR` are
implemented in ECPGallocate_desc and ECPGdeallocate_desc in
ecpglib\descriptor.c, respectively, so I looked into the code.

I found that the get_descriptors() function, which is called in
ECPGdeallocate_desc, can sometimes return null:
static struct descriptor *
get_descriptors(void)
{
    pthread_once(&descriptor_once, descriptor_key_init);
    return (struct descriptor *) pthread_getspecific(descriptor_key);
}
pthread_getspecific(key) is implemented on Windows as TlsGetValue(key).
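
For reference, the once-initialized thread-specific key pattern that
get_descriptors() relies on can be sketched with plain POSIX threads as
follows. This is only an illustration: the demo_* names are hypothetical and
are not taken from ecpglib, and it uses native pthreads rather than the
Windows emulation discussed here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; they only mirror the shape of the ecpglib code. */
static pthread_key_t  demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* Runs exactly once per process, creating the thread-specific key. */
static void
demo_key_init(void)
{
    int rc = pthread_key_create(&demo_key, NULL);

    assert(rc == 0);
    (void) rc;
}

/* Same shape as get_descriptors(): ensure the key exists, then read it. */
static void *
demo_get(void)
{
    pthread_once(&demo_once, demo_key_init);
    return pthread_getspecific(demo_key);
}

/* Counterpart that stores a per-thread value under the same key. */
static void
demo_set(void *value)
{
    pthread_once(&demo_once, demo_key_init);
    pthread_setspecific(demo_key, value);
}

/* A fresh thread has stored nothing yet, so it should read NULL. */
static void *
demo_worker(void *arg)
{
    (void) arg;
    return demo_get();
}

/*
 * Returns 1 if the calling thread reads back what it stored and a second
 * thread sees NULL for the same key, 0 otherwise.
 */
static int
demo_roundtrip_ok(void)
{
    int       marker = 0;
    void     *seen_by_other = &marker;
    pthread_t tid;

    demo_set(&marker);
    if (pthread_create(&tid, NULL, demo_worker, NULL) != 0)
        return 0;
    if (pthread_join(tid, &seen_by_other) != 0)
        return 0;
    return demo_get() == &marker && seen_by_other == NULL;
}
```

With a correct pthread_once, a value stored by a thread is always read back
by that same thread, which is the property the failing test shows being
violated.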

To make the bug easier to reproduce, I replaced the contents of ecpg_schedule
with 100 "test: thread/descriptor" lines and ran `vcregress ecpgcheck` in a
loop of 100 iterations. With this setup it takes just a few minutes to get a
failure.

The following debugging code, inserted into ECPGallocate_desc:
+++ b/src/interfaces/ecpg/ecpglib/descriptor.c
@@ -829,6 +829,17 @@ ECPGallocate_desc(int line, const char *name)
        }
        strcpy(new->name, name);
        set_descriptors(new);
+
+       long initialdk = descriptor_key;
+       for (int n = 0; n < 1000; n++) {
+               void *new1 = TlsGetValue(descriptor_key);
+               if (!new1) {
+                       DWORD lasterr = GetLastError();
+                       fprintf(stdout, "TlsGetValue() returned null on iteration %d, error: %d, descriptor_key: %d, initial descriptor_key: %d.\n",
+                                       n, lasterr, descriptor_key, initialdk);
+                       exit(2);
+               }
+       }
        return true;
 }
shows on a failure:
TlsGetValue() returned null on iteration 209, error: 0, descriptor_key: 28, initial descriptor_key: 0.
or
TlsGetValue() returned null on iteration 369, error: 0, descriptor_key: 28, initial descriptor_key: 0.

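So descriptor_key changed after set_descriptors(new): its value read right
after that call was 0, but it was 28 by the time TlsGetValue() returned null.
A subsequent get_descriptors() call would therefore return null, as seen in
the test failure.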
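
To spell out why a changed key makes the lookup fail, here is a toy,
single-threaded model of a per-thread slot array. It is a sketch under stated
assumptions, not the Windows TLS implementation: the demo_* names and the slot
count are hypothetical, and the key values 0 and 28 are simply the ones from
the debug output.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 64           /* hypothetical slot count, illustration only */

/* Toy stand-in for one thread's TLS slots, indexed by key. */
static void *demo_slots[DEMO_SLOTS];

static void
demo_tls_set(unsigned key, void *value)
{
    assert(key < DEMO_SLOTS);
    demo_slots[key] = value;
}

static void *
demo_tls_get(unsigned key)
{
    assert(key < DEMO_SLOTS);
    return demo_slots[key];
}

/*
 * Store a value under the key in effect at store time, then look it up
 * under the key in effect at lookup time. All slots start out empty.
 */
static void *
demo_store_then_lookup(unsigned key_at_store, unsigned key_at_lookup,
                       void *value)
{
    size_t i;

    for (i = 0; i < DEMO_SLOTS; i++)
        demo_slots[i] = NULL;
    demo_tls_set(key_at_store, value);
    return demo_tls_get(key_at_lookup);
}
```

In this model, demo_store_then_lookup(28, 28, p) returns p, while
demo_store_then_lookup(0, 28, p) returns NULL: the value is still stored, but
under a key that is no longer the one being used for the lookup.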

