PostgreSQL and a Catch-22 Issue related to dead rows - Mailing list pgsql-performance

From Lars Aksel Opsahl
Subject PostgreSQL and a Catch-22 Issue related to dead rows
Date
Msg-id AM7P189MB10281DCA1FD196A5610484B29D3C2@AM7P189MB1028.EURP189.PROD.OUTLOOK.COM
Whole thread Raw
Responses Re: PostgreSQL and a Catch-22 Issue related to dead rows
Re: PostgreSQL and a Catch-22 Issue related to dead rows
Re: PostgreSQL and a Catch-22 Issue related to dead rows
List pgsql-performance
Hi
When processing multiple simple feature layers through PostGIS Topology to perform overlays and eliminate small areas/slivers, we face a complex workflow. To manage the workload, we split the input into smaller jobs using the Content Balanced Grid. These jobs are then executed in parallel using Postgres Execute Parallel.
In one case, we processed a total of 750 cells, with an overall runtime of 40 hours. However, one specific cell took over 12 hours to complete, most of which was spent on removing small areas by deleting edges in PostGIS Topology. The root cause of this delay is related to removal of dead rows.
By introducing periodic COMMIT statements and VACUUM (FULL) operations, we managed to reduce the processing time for that single cell to approximately 3 hours. However, when we scaled this updated code to use 100 parallel threads, we encountered the expected “LWLock | SubtransControlLock” bottleneck, leading to an overall increase in runtime for all the cells.
One issue is the inability to accurately estimate the size of a job before execution, making it difficult to optimize job splitting in advance. Currently, the only viable solution seems to be implementing a timer for each job. The job would terminate after a predefined time, committing its progress and returning control to the caller. The caller must then trigger new jobs to complete the remaining tasks until done.
If there were a way to remove dead rows without requiring a commit from totally unrelated jobs, it would be much easier. The tables in use are unlogged and I already control the vacuum process from the caller. I can't use temp tables. Such a mechanism would also prevent unrelated long running jobs to cause problems for my jobs. To avoid that problem now, I have to move my jobs to a server not used by others also.

Yes there are very good reason for the way removal for dead rows work now, but is there any chance of adding an option when creating table to disable this behavior for instance for unlogged tables ?

Thanks

Lars


pgsql-performance by date:

Previous
From: Nikita Malakhov
Date:
Subject: Re: Performance of Query 60 on TPC-DS Benchmark
Next
From: Greg Sabino Mullane
Date:
Subject: Re: PostgreSQL and a Catch-22 Issue related to dead rows