The following bug has been logged on the website:
Bug reference: 16685
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 13.0
Operating system: Ubuntu 20.04
Description:
When running `vcregress ecpgcheck`, sometimes I get:
test thread/descriptor ... stderr FAILED 99 ms
regression.diffs contains:
--- .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr 2019-12-04 16:05:46 +0300
+++ .../src/interfaces/ecpg/test/results/thread-descriptor.stderr 2020-10-20 10:00:34 +0300
@@ -0,0 +1 @@
+SQL error: descriptor "mydesc" not found on line 31
See also:
https://www.postgresql.org/message-id/flat/230799.1603045446%40sss.pgh.pa.us
In descriptor.pgc we have:
30: EXEC SQL ALLOCATE DESCRIPTOR mydesc;
31: EXEC SQL DEALLOCATE DESCRIPTOR mydesc;
So the mydesc descriptor disappeared somehow just after allocation.
`EXEC SQL ALLOCATE DESCRIPTOR` and `EXEC SQL DEALLOCATE DESCRIPTOR` are
implemented in ECPGallocate_desc and ECPGdeallocate_desc in
ecpglib\descriptor.c, respectively, so I looked into the code.
I found that the get_descriptors() function called in ECPGdeallocate_desc
can sometimes return null:
static struct descriptor *
get_descriptors(void)
{
pthread_once(&descriptor_once, descriptor_key_init);
return (struct descriptor *) pthread_getspecific(descriptor_key);
}
pthread_getspecific(key) is implemented on Windows as TlsGetValue(key).
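For readers less familiar with this pattern, here is a minimal sketch of what get_descriptors() relies on: a key created once per process via pthread_once(), and a value stored under that key separately for each thread. It uses native POSIX threads rather than the Windows wrappers discussed here, and all demo_* names are hypothetical, not taken from ecpglib.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for ecpglib's descriptor_once / descriptor_key. */
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static pthread_key_t demo_key;

/* Intended to run exactly once per process, whichever thread gets there first. */
static void
demo_key_init(void)
{
	int			rc = pthread_key_create(&demo_key, NULL);

	assert(rc == 0);
	(void) rc;
}

/* Same shape as get_descriptors(): returns the calling thread's value. */
static void *
demo_get(void)
{
	pthread_once(&demo_once, demo_key_init);
	return pthread_getspecific(demo_key);
}

/* Counterpart that stores a value for the calling thread only. */
static void
demo_set(void *value)
{
	pthread_once(&demo_once, demo_key_init);
	pthread_setspecific(demo_key, value);
}

/* Thread body: records whether a new thread starts with a NULL value. */
static void *
demo_worker(void *arg)
{
	int		   *saw_null = arg;

	*saw_null = (demo_get() == NULL);
	return NULL;
}

/* Returns 1 if a value set in this thread is read back unchanged. */
int
demo_roundtrip_same_thread(void)
{
	static int	payload;

	demo_set(&payload);
	return demo_get() == &payload;
}

/* Returns 1 if another thread does not see this thread's value, -1 on error. */
int
demo_other_thread_sees_null(void)
{
	static int	payload;
	pthread_t	tid;
	int			saw_null = 0;

	demo_set(&payload);
	if (pthread_create(&tid, NULL, demo_worker, &saw_null) != 0)
		return -1;
	if (pthread_join(tid, NULL) != 0)
		return -1;
	return saw_null;
}
```

With a correct once-initialization, a value set by a thread stays visible to that same thread for as long as the key is unchanged; a null result is expected only in a thread that has not stored anything yet.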
To make the bug easier to reproduce, I replaced the contents of ecpg_schedule
with 100 "test: thread/descriptor" lines and ran `vcregress ecpgcheck` in a
loop of 100 iterations. With this setup, it takes only a few minutes to get a
failure.
The following debugging code, inserted into ECPGallocate_desc:
+++ b/src/interfaces/ecpg/ecpglib/descriptor.c
@@ -829,6 +829,17 @@ ECPGallocate_desc(int line, const char *name)
}
strcpy(new->name, name);
set_descriptors(new);
+
+ long initialdk = descriptor_key;
+ for (int n = 0; n < 1000; n++) {
+ void *new1 = TlsGetValue(descriptor_key);
+ if (!new1) {
+ DWORD lasterr = GetLastError();
+ fprintf(stdout, "TlsGetValue() returned null on iteration %d, error: %d, descriptor_key: %d, initial descriptor_key: %d.\n",
+ n, lasterr, descriptor_key, initialdk);
+ exit(2);
+ }
+ }
return true;
}
prints the following on failure:
TlsGetValue() returned null on iteration 209, error: 0, descriptor_key: 28, initial descriptor_key: 0.
or
TlsGetValue() returned null on iteration: 369, error: 0, descriptor_key: 28, initial descriptor_key: 0
So descriptor_key changed (from 0 to 28) after set_descriptors(new) was
called, and a subsequent get_descriptors() call would return null, as seen in
the test failure.
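To illustrate why a changed key alone is enough to produce the "not found" error, here is a minimal POSIX sketch. It does not reproduce the race on Windows; it simulates the key change deterministically in a single thread by creating a second key after a value has been stored under the first. The names demo_key and value_lost_after_key_change are hypothetical and do not appear in ecpglib.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for ecpglib's descriptor_key. */
static pthread_key_t demo_key;

/*
 * Returns 1 if a value stored under the first key is no longer visible
 * once demo_key has been replaced by a second key, 0 if it is still
 * visible, and -1 if a pthread call fails.
 */
int
value_lost_after_key_change(void)
{
	static int	payload = 42;
	pthread_key_t first;
	pthread_key_t second;
	void	   *before;
	void	   *after;

	/* The equivalent of set_descriptors(new) under the initial key. */
	if (pthread_key_create(&first, NULL) != 0)
		return -1;
	demo_key = first;
	if (pthread_setspecific(demo_key, &payload) != 0)
		return -1;
	before = pthread_getspecific(demo_key);

	/* Simulate the key variable changing after the value was stored. */
	if (pthread_key_create(&second, NULL) != 0)
		return -1;
	demo_key = second;

	/* The equivalent of get_descriptors(): looks under the new key. */
	after = pthread_getspecific(demo_key);

	pthread_key_delete(first);
	pthread_key_delete(second);

	return before == &payload && after == NULL;
}
```

The value is still stored under the first key, but any lookup made through the updated key variable finds nothing, which matches the null returned by TlsGetValue() in the debugging output above.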