Logical replication for async service communication? - Mailing list pgsql-general

From Sean Huber
Subject Logical replication for async service communication?
Date
Msg-id CAM8f5Mi1Ftj+48PZxN1AbM-P=4YMLENY5zRaPwTbmbkFwCsTkA@mail.gmail.com
List pgsql-general
Has anyone attempted to use logical replication with table partitioning for async service communication?


Services would commit messages to their own databases along with the rest of their data (with the same transactional guarantees). Those messages would then be replicated in near real time (with all of logical replication's features and guarantees) to the receiving service's database, where that service's workers (e.g. que-rb, SKIP LOCKED polling, etc.) are waiting to respond by inserting messages into their own database to be replicated back.

Throw in a trigger to automatically acknowledge, clean up, and notify on messages and I think we've got something that resembles a queue? Maybe that same trigger could match incoming messages against a "routes" table (based on message type, certain JSON schemas in the payload, etc.) and write matches to the que-rb jobs table instead, for some kind of distributed/replicated work queue hybrid?
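To make the "routes" matching idea concrete, here is a rough sketch in Python rather than in a trigger. Everything in it is hypothetical: the `Route` fields, the job class names, and the `match_routes` helper are made up for illustration, and "payload contains these top-level keys" is only a simplified stand-in for real JSON schema matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One row of a hypothetical "routes" table."""
    message_type: str          # message type this route applies to
    required_keys: frozenset   # simplified stand-in for a JSON schema check
    job_class: str             # job to enqueue when the route matches


def match_routes(routes, message_type, payload):
    """Return the job classes of all routes that match an incoming message.

    A route matches when the message type is equal and every required key
    is present at the top level of the payload.
    """
    return [
        route.job_class
        for route in routes
        if route.message_type == message_type
        and route.required_keys.issubset(payload)
    ]


# Hypothetical routes, for illustration only.
ROUTES = [
    Route("order.placed", frozenset({"order_id"}), "ReserveStockJob"),
    Route("order.placed", frozenset({"order_id", "coupon"}), "RedeemCouponJob"),
    Route("user.created", frozenset({"email"}), "SendWelcomeJob"),
]
```

With these routes, an "order.placed" message whose payload only has `order_id` would be routed to `ReserveStockJob` alone, while one that also carries `coupon` would produce both order jobs. In the design described above, the equivalent logic would live in a trigger that inserts one row into the jobs table per match.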

My motivations for this line of thinking were mostly based around high availability and isolating service downtime/failures from each other. Our PostgreSQL databases are the most critical pieces of infrastructure for all of our services - if one is down then we don't want the impacted service to even attempt to do work. On the other hand, we don't want a service's downtime to impact its ability to receive (queued) messages from other services, which it can resume consuming (once, in order) when it's back up.

We're exploring other message queues but keep getting drawn back to PostgreSQL because we can get the same transactional guarantees with our messages/jobs as the rest of our data. Even the act of enqueuing a job or sending a message to another service is something that must be committed and can be rolled back like everything else. 
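The property described here (a message is enqueued only if the surrounding transaction commits) can be sketched in a few lines. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server, and the `orders` and `outbox` tables and the `place_order` helper are hypothetical names, not part of any existing schema.

```python
import json
import sqlite3


def open_db():
    """Create an in-memory database with a data table and a message table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "total INTEGER NOT NULL CHECK (total >= 0))"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY, type TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def place_order(conn, total):
    """Write the outgoing message and the order in a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"total": total})),
        )
        # A negative total violates the CHECK constraint and aborts the
        # whole transaction, including the message inserted just above.
        conn.execute("INSERT INTO orders (total) VALUES (?)", (total,))


def count(conn, table):
    """Return the number of rows in one of the two demo tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Calling `place_order(conn, 10)` leaves one row in each table. Calling `place_order(conn, -5)` raises `sqlite3.IntegrityError` and leaves both tables unchanged, so no message exists for an order that was never committed. In the setup discussed in this thread, the `outbox` table would be the one published via logical replication.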

For our potential use case specifically, we're not dealing with high levels of real-time traffic - we're not even close to 1k jobs/messages per second.

I'm looking to poke holes in this concept before sinking any more time into exploring the idea. Any feedback/warnings/concerns would be much appreciated. Thanks for your time!

Sean Huber
