Home > mailing lists

PostgreSQL and a Catch-22 Issue related to dead rows - Mailing list pgsql-performance

From	Lars Aksel Opsahl
Subject	PostgreSQL and a Catch-22 Issue related to dead rows
Date	December 9, 2024 14:02:53
Msg-id	AM7P189MB10281DCA1FD196A5610484B29D3C2@AM7P189MB1028.EURP189.PROD.OUTLOOK.COM Whole thread Raw
Responses	Re: PostgreSQL and a Catch-22 Issue related to dead rows Re: PostgreSQL and a Catch-22 Issue related to dead rows Re: PostgreSQL and a Catch-22 Issue related to dead rows
List	pgsql-performance

Tree view

When processing multiple simple feature layers through PostGIS Topology to perform overlays and eliminate small areas/slivers, we face a complex workflow. To manage the workload, we split the input into smaller jobs using the Content Balanced Grid. These jobs are then executed in parallel using Postgres Execute Parallel.

In one case, we processed a total of 750 cells, with an overall runtime of 40 hours. However, one specific cell took over 12 hours to complete, most of which was spent on removing small areas by deleting edges in PostGIS Topology. The root cause of this delay is related to removal of dead rows.

By introducing periodic COMMIT statements and VACUUM (FULL) operations, we managed to reduce the processing time for that single cell to approximately 3 hours. However, when we scaled this updated code to use 100 parallel threads, we encountered the expected “LWLock | SubtransControlLock” bottleneck, leading to an overall increase in runtime for all the cells.

One issue is the inability to accurately estimate the size of a job before execution, making it difficult to optimize job splitting in advance. Currently, the only viable solution seems to be implementing a timer for each job. The job would terminate after a predefined time, committing its progress and returning control to the caller. The caller must then trigger new jobs to complete the remaining tasks until done.

If there were a way to remove dead rows without requiring a commit from totally unrelated jobs, it would be much easier. The tables in use are unlogged and I already control the vacuum process from the caller. I can't use temp tables. Such a mechanism would also prevent unrelated long running jobs to cause problems for my jobs. To avoid that problem now, I have to move my jobs to a server not used by others also.

Yes there are very good reason for the way removal for dead rows work now, but is there any chance of adding an option when creating table to disable this behavior for instance for unlogged tables ?

Thanks

Lars

pgsql-performance by date:

From: Nikita Malakhov
Date: 28 November 2024, 10:58:04
Subject: Re: Performance of Query 60 on TPC-DS Benchmark

From: Greg Sabino Mullane
Date: 09 December 2024, 16:35:20
Subject: Re: PostgreSQL and a Catch-22 Issue related to dead rows

PostgreSQL and a Catch-22 Issue related to dead rows - Mailing list pgsql-performance

Previous

Next