Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers
| From | Mihail Nikalayeu |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Date | |
| Msg-id | CADzfLwWNz_jwi7KVOmJ9D97+zwxsiwDSqSUUJ9oqUCOqkbGnRA@mail.gmail.com |
| In response to | Re: Adding REPACK [concurrently] (Mihail Nikalayeu <mihailnikalayeu@gmail.com>) |
| List | pgsql-hackers |
Hello!
On Sat, Dec 13, 2025 at 7:45 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:
> Stress tests for REPACK concurrently in attachment.
To run:
ninja && meson test --suite setup && meson test --print-errorlogs --suite amcheck *007*
ninja && meson test --suite setup && meson test --print-errorlogs --suite amcheck *008*
Results for v28:
Up to "v28-0005-Use-background-worker-to-do-logical-decoding.patch":
Technically the tests pass, but sometimes I saw 0% CPU usage for long
periods with the two stacks below, first the repack worker and then the
backend running REPACK (it seems to happen more often with 0008):
epoll_wait 0x000078b99512a037
WaitEventSetWaitBlock waiteventset.c:1192
WaitEventSetWait waiteventset.c:1140
WaitLatch latch.c:196
decode_concurrent_changes cluster.c:2702
repack_worker_internal cluster.c:3777
RepackWorkerMain cluster.c:3725
BackgroundWorkerMain bgworker.c:850
postmaster_child_launch launch_backend.c:268
StartBackgroundWorker postmaster.c:4168
maybe_start_bgworkers postmaster.c:4334
LaunchMissingBackgroundProcesses postmaster.c:3408
ServerLoop postmaster.c:1728
PostmasterMain postmaster.c:1403
main main.c:231
epoll_wait 0x000078b99512a037
WaitEventSetWaitBlock waiteventset.c:1192
WaitEventSetWait waiteventset.c:1140
WaitLatch latch.c:196
ConditionVariableTimedSleep condition_variable.c:165
ConditionVariableSleep condition_variable.c:100
process_concurrent_changes cluster.c:3042
rebuild_relation_finish_concurrent cluster.c:3303
rebuild_relation cluster.c:1121
cluster_rel cluster.c:731
process_single_relation cluster.c:2405
ExecRepack cluster.c:391
standard_ProcessUtility utility.c:864
ProcessUtility utility.c:525
PortalRunUtility pquery.c:1148
PortalRunMulti pquery.c:1306
PortalRun pquery.c:783
exec_simple_query postgres.c:1280
PostgresMain postgres.c:4779
BackendMain backend_startup.c:124
postmaster_child_launch launch_backend.c:268
BackendStartup postmaster.c:3598
ServerLoop postmaster.c:1713
PostmasterMain postmaster.c:1403
main main.c:231
This is probably because of
> 100000L, /* XXX Tune the delay. */
The timeout is in milliseconds, so that is 100 seconds, which is
clearly too much.
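To illustrate why the size of that timeout matters, here is a small
standalone sketch. It is not PostgreSQL code: poll() on a pipe is only a
stand-in for a latch wait, the function names are hypothetical, and the
idea that a wakeup is sometimes missed is my assumption, not something I
have confirmed in the patch. It shows that a waiter whose wakeup never
arrives sits idle for the whole timeout, while a delivered wakeup ends
the wait right away.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading, in milliseconds. */
long
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/*
 * Hypothetical stand-in for a latch wait: block until wake_fd becomes
 * readable or timeout_ms elapses.  Returns the time spent blocked, in
 * milliseconds.
 */
long
wait_for_wakeup(int wake_fd, int timeout_ms)
{
    struct pollfd pfd = {.fd = wake_fd, .events = POLLIN};
    long        start = now_ms();

    assert(timeout_ms >= 0);
    (void) poll(&pfd, 1, timeout_ms);
    return now_ms() - start;
}

/* Nobody writes to the pipe: the waiter stalls for the full timeout. */
long
stall_without_wakeup(int timeout_ms)
{
    int         fds[2];
    long        waited;

    if (pipe(fds) != 0)
        return -1;
    waited = wait_for_wakeup(fds[0], timeout_ms);
    close(fds[0]);
    close(fds[1]);
    return waited;
}

/* A byte is already in the pipe: the wait ends almost immediately. */
long
stall_with_wakeup(int timeout_ms)
{
    int         fds[2];
    long        waited = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "x", 1) == 1)
        waited = wait_for_wakeup(fds[0], timeout_ms);
    close(fds[0]);
    close(fds[1]);
    return waited;
}
```

Calling stall_without_wakeup(200) blocks for roughly 200 ms, while
stall_with_wakeup(200) returns almost at once. With 100000 in place of
200, the first case would be idle for the full 100 seconds, which would
match the long 0% CPU periods above if a wakeup really is being lost.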
For "v28-0006-Use-multiple-snapshots-to-copy-the-data.patch":
0007: crash with
TRAP: failed Assert("portal->portalSnapshot == GetActiveSnapshot()"), File: "../src/backend/tcop/pquery.c", Line: 1169, PID: 178414
postgres: CIC_test: nkey postgres [local] REPACK(ExceptionalCondition+0xbe)[0x5743f9a955bb]
postgres: CIC_test: nkey postgres [local] REPACK(+0x67fac4)[0x5743f98a7ac4]
postgres: CIC_test: nkey postgres [local] REPACK(+0x67fced)[0x5743f98a7ced]
postgres: CIC_test: nkey postgres [local] REPACK(PortalRun+0x346)[0x5743f98a7107]
postgres: CIC_test: nkey postgres [local] REPACK(+0x6773bb)[0x5743f989f3bb]
postgres: CIC_test: nkey postgres [local] REPACK(PostgresMain+0xc1c)[0x5743f98a4f58]
postgres: CIC_test: nkey postgres [local] REPACK(+0x6726c6)[0x5743f989a6c6]
postgres: CIC_test: nkey postgres [local] REPACK(postmaster_child_launch+0x191)[0x5743f979678c]
postgres: CIC_test: nkey postgres [local] REPACK(+0x5755ca)[0x5743f979d5ca]
postgres: CIC_test: nkey postgres [local] REPACK(+0x572972)[0x5743f979a972]
postgres: CIC_test: nkey postgres [local] REPACK(PostmasterMain+0x168a)[0x5743f979a225]
postgres: CIC_test: nkey postgres [local] REPACK(main+0x3a1)[0x5743f9662176]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x77f80402a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x77f80402a28b]
postgres: CIC_test: nkey postgres [local] REPACK(_start+0x25)[0x5743f9311eb5]
0008: pass
Best regards,
Mikhail.