RE: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: long-standing data loss bug in initial sync of logical replication |
Date | |
Msg-id | TYAPR01MB569258257A869477A27F0BD0F56A2@TYAPR01MB5692.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: long-standing data loss bug in initial sync of logical replication (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
List | pgsql-hackers |
Dear Shlok, > Hi, > > I tried to add changes to selectively invalidate the cache to reduce > the performance degradation during the distribution of invalidations. Thanks for improving the patch! >... > > Solution: > 1. When we alter a publication using commands like ‘ALTER PUBLICATION > pub_name DROP TABLE table_name’, first all tables in the publications > are invalidated using the function ‘rel_sync_cache_relation_cb’. Then > again ‘rel_sync_cache_publication_cb’ function is called which > invalidates all the tables. On my environment, rel_sync_cache_publication_cb() was called first and invalidate all the entries, then rel_sync_cache_relation_cb() was called and the specified entry is invalidated - hence second is NO-OP. > This happens because of the following > callback registered: > CacheRegisterSyscacheCallback(PUBLICATIONRELMAP, > rel_sync_cache_publication_cb, (Datum) 0); But even in this case, I could understand that you want to remove the rel_sync_cache_publication_cb() callback. > 2. When we add/drop a schema to/from a publication using command like > ‘ALTER PUBLICATION pub_name ADD TABLES in SCHEMA schema_name’, first > all tables in that schema are invalidated using > ‘rel_sync_cache_relation_cb’ and then again > ‘rel_sync_cache_publication_cb’ function is called which invalidates > all the tables. Even in this case, rel_sync_cache_publication_cb() was called first and then rel_sync_cache_relation_cb(). > > 3. When we alter a namespace using command like ‘ALTER SCHEMA > schema_name RENAME to new_schema_name’ all the table in cache are > invalidated as ‘rel_sync_cache_publication_cb’ is called due to the > following registered callback: > CacheRegisterSyscacheCallback(NAMESPACEOID, > rel_sync_cache_publication_cb, (Datum) 0); > > So, we added a new callback function ‘rel_sync_cache_namespacerel_cb’ > will be called instead of function ‘rel_sync_cache_publication_cb’ , > which invalidates only the cache of the tables which are part of that > particular namespace. For the new function the ‘namespace id’ is added > in the Invalidation message. Hmm, I feel this fix is too much. Unlike ALTER PUBLICATION statements, I think ALTER SCHEMA is rarely executed at the production stage. However, this approach requires adding a new cache callback system, which affects the entire postgres system; this is not very beneficial compared to the outcome. It should be discussed on another thread to involve more people, and then we can add the improvement after being accepted. > Performance Comparison: > I have run the same tests as shared in [1] and observed a significant > decrease in the degradation with the new changes. With selective > invalidation degradation is around ~5%. This results are an average of > 3 runs. IIUC, the executed workload did not contain ALTER SCHEMA command, so third improvement did not contribute this improvement. Best regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: