If you are worried about the audit trigger and the additional audit table needed to hold all the deleted rows, you could instead add a deleted_flag column to your base table. When a delete happens, your code updates that flag rather than physically deleting the record from the base table, so you avoid the performance impact of a row-level trigger on the base tables. This makes the main table bulkier, but it has less impact on the incoming data load to the base/main tables.
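A minimal sketch of that soft-delete idea on Postgres, assuming a hypothetical base table orders that already carries an update_timestamp audit column (the table and column names are only illustrative):

-- Add the flag once; existing rows default to "not deleted".
ALTER TABLE orders ADD COLUMN deleted_flag boolean NOT NULL DEFAULT false;

-- Application code marks the row deleted instead of issuing a DELETE.
UPDATE orders
   SET deleted_flag     = true,
       update_timestamp = now()
 WHERE order_id = :id;

-- The downstream extract then picks up deletes with the same bookmark
-- query it already uses for inserts and updates; rows with
-- deleted_flag = true are applied as deletes on the target.
SELECT order_id, deleted_flag, update_timestamp
  FROM orders
 WHERE update_timestamp > :last_bookmark;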
Hello,
My understanding is that replication tools normally rely on the database transaction logs to find the CDC/delta changes (insert/update/delete) for tables and then move those delta changes to the target system/database, whatever the source database may be (open-source Postgres, AWS RDS, etc.). I have never done it myself, but I assume that manually scanning the DB logs is not easy, and access to the logs may not even be granted, for security reasons. I hope my understanding is correct here.
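For what it is worth, on plain Postgres the transaction-log mechanism those tools use is exposed through logical decoding, so the log-based changes can be read with built-in functions rather than by scanning log files. A minimal sketch, assuming wal_level = logical, the replication privilege, and the built-in test_decoding plugin (the slot name is arbitrary):

-- Create a logical replication slot that decodes WAL into readable changes.
SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'test_decoding');

-- Later, pull the committed inserts/updates/deletes since the last call;
-- consuming them also advances the slot, which acts as the bookmark.
SELECT lsn, xid, data
  FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);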
Thus, in the absence of such a replication tool (perhaps because of the additional cost involved), if someone wants to find the delta changes (insert/update/delete) in a database as efficiently as possible and move them to the target database in a continuous data-streaming setup, I can only think of the option below.
i.e. maintaining audit columns such as create_timestamp/update_timestamp in every source table, so that they can be used to get the CDC for insert and update statements and also for bookmarking. But to find the deltas for deletes, there is not much option other than row-level triggers on the base tables that populate a separate audit table with the deleted rows, and this is going to crawl if we get a lot of deletes (in the millions) on the source tables. A sketch of this setup is shown below.
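A minimal sketch of that trigger-plus-audit-table setup on Postgres, again with purely illustrative names (orders, order_id, orders_delete_audit):

-- Audit table that receives one row per deleted base-table row.
CREATE TABLE orders_delete_audit (
    order_id   bigint,
    deleted_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_order_delete() RETURNS trigger AS $$
BEGIN
    INSERT INTO orders_delete_audit (order_id) VALUES (OLD.order_id);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- Row-level trigger: fires once per deleted row, which is exactly
-- where the cost shows up when millions of rows are deleted.
CREATE TRIGGER trg_orders_delete
AFTER DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION capture_order_delete();

-- Inserts and updates are bookmarked off the audit columns:
SELECT *
  FROM orders
 WHERE update_timestamp > :last_bookmark
 ORDER BY update_timestamp;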
I want to know from the experts whether there is any other way to make this manual CDC and bookmarking more efficient for such continuous delta data movement scenarios?
Regards
Lok