Re: parallelizing the archiver - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: parallelizing the archiver |
Date | |
Msg-id | BE2F4111-B79C-418B-9A10-6FC9C31A6730@amazon.com Whole thread Raw |
In response to | Re: parallelizing the archiver (Magnus Hagander <magnus@hagander.net>) |
Responses |
Re: parallelizing the archiver
|
List | pgsql-hackers |
On 10/22/21, 7:43 AM, "Magnus Hagander" <magnus@hagander.net> wrote: > I still like the idea of loading the library via a special > parameter, archive_library or such. My first attempt [0] added a GUC like this, so I can speak to some of the interesting design decisions that follow. The simplest thing we could do would be to add the archive_library GUC and to load that just like the library is at the end of shared_preload_libraries. This would mean that the archive library could be specified in either GUC, and there would effectively be no difference between the two. The next thing we could consider doing is adding a new boolean called process_archive_library_in_progress, which would be analogous to process_shared_preload_libraries_in_progress. If a library is loaded from the archive_library GUC, its _PG_init() will be called with process_archive_library_in_progress set. This also means that if a library is specified in both shared_preload_libraries and archive_library, we'd call its _PG_init() twice. The library could then branch based on whether process_shared_preload_libraries_in_progress or process_archive_library_in_progress was set. Another approach would be to add a new initialization function (e.g., PG_archive_init()) that would be used if the library is being loaded from archive_library. Like before, you can use the library for both shared_preload_libraries and archive_library, but your initialization logic would be expected to go in separate functions. However, there still wouldn't be anything forcing that. A library could still break the rules and do everything in _PG_init() and be loaded via shared_preload_libraries. One more thing we could do is to discover the relevant symbols for archiving in library loading function. Rather than expecting the initialization function to set the hook correctly, we'd just look up the _PG_archive() function during loading. Again, a library could probably still break the rules and do everything in _PG_init()/shared_preload_libraries, but there would at least be a nicer interface available. I believe the main drawbacks of going down this path are the additional complexity in the backend and the slippery slope of adding all kinds of new GUCs in the future. My original patch also tried to do something similar for some other shell command GUCs (archive_cleanup_command, restore_command, and recovery_end_command). While I'm going to try to keep this focused on archive_command for now, presumably we'd eventually want the ability to use hooks for all of these things. I don't know if we really want to incur a new GUC for every single one of these. To be clear, I'm not against adding a GUC if it seems like the right thing to do. I just want to make sure we are aware of the tradeoffs compared to a simple shared_preload_libraries approach with its terrible UX. > Wouldn't that be better as a callback into the module? So that > shell_archive would implement the check for XLogArchiveCommand. Then > another third party module can make it's own decision on what to > check. And PGarchive would then be a struct that holds a function > pointer to the archive command and another function pointer to the > isenabled command? (I think having a struct for it would be useful > regardless -- for possible future extensions with more API points). +1. This crossed my mind, too. I'll add this in the next revision. Nathan [0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com
pgsql-hackers by date: