BUG #18493: COPY FROM STDIN BINARY failure - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18493: COPY FROM STDIN BINARY failure
Date
Msg-id 18493-803cb5dd2f69746c@postgresql.org
Whole thread Raw
Responses Re: BUG #18493: COPY FROM STDIN BINARY failure
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18493
Logged by:          Paul Tobey
Email address:      ptobey@rainbird.com
PostgreSQL version: 16.1
Operating system:   Windows Server 2022
Description:

(This post duplicates one including the noted attachments sent directly to
pgsql-bugs mailing list; I wanted a bug number assigned)

I'm running "PostgreSQL 16.1, compiled by Visual C++ build 1937, 64-bit" in
an AWS EC2 Windows Server 2022 Datacenter, Intel Xeon Platinum
8151(at)3(dot)4GHz/4/8, 64GB RAM, 2TB disk.

The client software is .NET/Windows based (using Npgsql), running ~40
threads, each using COPY table FROM STDIN (FORMAT BINARY) commands via a
separate connection, initializing a table. This all runs fine for about an
hour, roughly 9500 COPY operations, before reaching a state where the COPY
operations are no longer working (never complete), ending in timeouts
(~17min timeout; normal execution time is < 15s). At the failure point
273,558,408 of 313,886,247 total rows have been successfully inserted. The
failure point in repeated runs is approximately but not exactly the same.
There are no foreign keys in the table and all indices are disabled.

At the failure point, this is the state indicated in pgAdmin 4. There seems
to be one IPC:BufferIO for each client with similar content each in the same
"waiting" state:

(See previous pgsql-bugs mailing list post which includes attachments)

The PostgreSQL log isn't very revealing, at least to me, but I've attached
it (LOG). (See previous post to pgsql-bugs mailing list for attachments)
FYI, there are multiple COPY operations starting at,

20:27:30
20:27:35
20:27:37
20:27:42
...
20:28:23

none of which finish successfully before timing out at 20:45:25, et al. The
timeout occurs from the NpgsqlBinaryImporter.Complete() method; I think that
means when it's waiting for the data transmission to be completed before
sending the end-of-stream/complete marker. For reference, the last 20
successful COPY operations (client side) are shown below:

2024-05-30 13:27:18.478 -07:00 [DBG] thread=18, write completed (id range =
1833675856 - 1833705498), milliseconds=6021
2024-05-30 13:27:18.530 -07:00 [DBG] thread=9, write completed (id range =
1839727448 - 1839757090), milliseconds=3595
2024-05-30 13:27:19.192 -07:00 [DBG] thread=31, write completed (id range =
1834209820 - 1834239462), milliseconds=6921
2024-05-30 13:27:19.282 -07:00 [DBG] thread=24, write completed (id range =
1839905436 - 1839935078), milliseconds=3116
2024-05-30 13:27:20.758 -07:00 [DBG] thread=17, write completed (id range =
1839549460 - 1839579102), milliseconds=4188
2024-05-30 13:27:21.140 -07:00 [DBG] thread=12, write completed (id range =
1837769580 - 1837799222), milliseconds=6123
2024-05-30 13:27:21.391 -07:00 [DBG] thread=25, write completed (id range =
1837235616 - 1837265258), milliseconds=6484
2024-05-30 13:27:21.521 -07:00 [DBG] thread=6, write completed (id range =
1839015496 - 1839045138), milliseconds=5115
2024-05-30 13:27:23.100 -07:00 [DBG] thread=20, write completed (id range =
1837057628 - 1837087270), milliseconds=6161
2024-05-30 13:27:23.100 -07:00 [DBG] thread=38, write completed (id range =
1838837508 - 1838867150), milliseconds=5261
2024-05-30 13:27:23.136 -07:00 [DBG] thread=1, write completed (id range =
1838125556 - 1838155198), milliseconds=4007
2024-05-30 13:27:23.815 -07:00 [DBG] thread=29, write completed (id range =
1838481532 - 1838511174), milliseconds=5716
2024-05-30 13:27:24.484 -07:00 [DBG] thread=30, write completed (id range =
1839371472 - 1839401114), milliseconds=5515
2024-05-30 13:27:24.896 -07:00 [DBG] thread=22, write completed (id range =
1837591592 - 1837621234), milliseconds=5235
2024-05-30 13:27:25.481 -07:00 [DBG] thread=32, write completed (id range =
1837947568 - 1837977210), milliseconds=4661
2024-05-30 13:27:25.982 -07:00 [DBG] thread=33, write completed (id range =
1835455736 - 1835485378), milliseconds=6355
2024-05-30 13:27:26.859 -07:00 [DBG] thread=19, write completed (id range =
1840439400 - 1840469042), milliseconds=3383
2024-05-30 13:27:27.740 -07:00 [DBG] thread=16, write completed (id range =
1840261412 - 1840291054), milliseconds=2299
2024-05-30 13:27:28.376 -07:00 [DBG] thread=10, write completed (id range =
1840083424 - 1840113066), milliseconds=3030
2024-05-30 13:27:30.447 -07:00 [DBG] thread=26, write completed (id range =
1838659520 - 1838689162), milliseconds=4026

I've also attached the configuration settings active during this run (CSV).
(See previous pgsql-bugs mailing list post for attachments)

The expected result is a successful COPY operation in a time roughly
consistent with the other recent operations which are writing similar row
counts, ~40000. I'm hoping for a suggestion or two about how to localize the
error or, if you recognize the symptom, what to correct. I'm sorry it's not
a simpler crash or query failure.

Thanks and regards,
Paul T.


pgsql-bugs by date:

Previous
From: Chant Guo
Date:
Subject: Error of reinstallatio
Next
From: Erik Wienhold
Date:
Subject: Re: Missing semicolumn in anonymous plpgsql block does not raise syntax error