Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date
Msg-id 50819B5B.7030507@agliodbs.com
In response to Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility  (Jim Nasby <jim@nasby.net>)
List pgsql-hackers
> If we describe a queue as something you put stuff in at one end and
> get it out in same or some other specific order at the other end, then
> WAL _is_ a queue when you use it for replication  (if you just write to it,
> then it is "Log", if you write and read, it is "Queue")

For that matter, WAL is a queue you use for recovery.  But then,
BerkeleyDB is a database just as PostgreSQL is a database.  That
doesn't mean you can use BerkeleyDB and PostgreSQL for all the same tasks.

> All it takes to make this scenario work is keeping track of LSN or simply
> log position on the slave side.
> 
> What you seem to be wanting is support for cooperative consumers,
> that is, multiple consumers on the same queue working together and
> sharing the work to process the incoming events.
> 
> This can be easily achieved using a single ordered event stream and
> extra bookkeeping structures on the consumer side (look at cooperative
> consumer samples in skytools).
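
To make the consumer-side bookkeeping concrete, here is a minimal Python
sketch. It is not taken from skytools; the function name `events_for` and
the modulo rule for dividing work are assumptions made up for illustration.
Each consumer reads the same ordered stream, keeps its own position, and
handles only the events assigned to it.

```python
def events_for(worker_id, n_workers, stream, position):
    """Return (events claimed by this worker, new stream position).

    Hypothetical rule: a worker claims an event when
    event_id % n_workers == worker_id.
    """
    mine = [
        (ev_id, payload)
        for ev_id, payload in stream[position:]
        if ev_id % n_workers == worker_id
    ]
    return mine, len(stream)


# One ordered event stream, shared by all consumers.
stream = [(i, f"event-{i}") for i in range(6)]

# Bookkeeping lives on the consumer side: one position per worker.
positions = {0: 0, 1: 0}
handled = {0: [], 1: []}

for worker_id in positions:
    mine, positions[worker_id] = events_for(
        worker_id, 2, stream, positions[worker_id]
    )
    handled[worker_id].extend(ev_id for ev_id, _ in mine)
```

The queue itself stays a plain ordered log; everything about who handles
what is decided by the readers.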

What I'm saying is, we'll get nowhere promoting an application queue
which is permanently inferior to existing, popular open source software.
My advice: Forget about the application queue aspects of this.  Focus
on making it work for replication and matviews, which are already hard
use cases to optimize.

If someone can turn this feature into the base for a distributed
queueing system later, then great.  But let's not complicate this
feature by worrying about a use case it may never fulfill.

> Thanks to introducing logical replication, it now makes sense to have
> actions recorded _only_ in this queue and this is what the whole RFC was
> about.

Yes, I agree.

I'm just pointing out that the needs of a replication queue and of an
application queue are divergent.

> Currently the "clear tech spec" is just this:
> 
> * works as table on INSERTS up to inserting logical WAL record
>   describing the insert but no data is inserted locally.

Yeah, I think where you confused a bunch of people here is the
definition of "locally".  Let me see if I understand this:

* a Writer would INSERT data into the LOG ONLY TABLE (L.O.T.).  The
write would be synced to WAL, but no in-memory or on-disk version of
the table would be updated.

* Readers could subscribe to the LSN for the L.O.T. and would receive a
stream of INSERTs, which they could handle as they wished.

Is my understanding correct?  If it is, I have more questions!
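
For illustration, the Writer/Reader model described above can be sketched
as a toy in Python. This is only a model of the idea under discussion, not
PostgreSQL code: the class names `LogOnlyTable` and `Reader`, and the use of
a list index as a stand-in for the LSN, are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class LogOnlyTable:
    """Toy log-only table: inserts reach the log, never the heap."""
    wal: list = field(default_factory=list)   # stand-in for the WAL stream
    heap: list = field(default_factory=list)  # stays empty by design

    def insert(self, row):
        """Append a logical insert record; return its position (toy LSN)."""
        lsn = len(self.wal)
        self.wal.append((lsn, dict(row)))
        return lsn

    def read_from(self, lsn):
        """Yield (lsn, row) for every record at or after the given position."""
        yield from self.wal[lsn:]


@dataclass
class Reader:
    """A subscriber that remembers the next position it needs."""
    next_lsn: int = 0
    seen: list = field(default_factory=list)

    def poll(self, table):
        """Consume all new inserts and advance the stored position."""
        for lsn, row in table.read_from(self.next_lsn):
            self.seen.append(row)
            self.next_lsn = lsn + 1


lot = LogOnlyTable()
lot.insert({"event": "login", "user": "alice"})

reader = Reader()
reader.poll(lot)

lot.insert({"event": "logout", "user": "alice"})
reader.poll(lot)
```

The Writer never touches `heap`, and each Reader needs nothing from the
table beyond the stream and its own saved position.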

> with all things that follow from the local table having no data
>   - unique constraints don't make sense
>   - indexes make no sense
>   -  updates and deletes hit no data
>   - etc. . .

Right.

> As Simon explained, the initial RFC was just  about not keeping the
> data in local table if we know it will never be accessed

Ah, so to answer Simon's question: no, this RFC makes no sense without a
description of expected Reader activity.

> (at least not
> for anything except vacuum and delete/truncate)

If the table is not being represented as a table in the catalog or on
disk, why would it ever need to be vacuumed?

> It is very hard for me to tell for sure if walsender->walreceiver combo
>  "reads the events" on master or slave side

Well, presumably the only way a Reader on the master could get the queue
would be for the master to subscribe to its own LSN.  No?

> HEAD is the queue producer, where the events go in (any insert on master)
> 
> TAIL (to avoid another word) is where they come out
>  (walsender -> walreceiver moving the events to slave)

BTW, I suggest using "Writer" and "Reader" for the queue roles, not
"Head" and "Tail", which terms are rather unclear.

> Think of an analogy with a snake feeding on berries used by
> an ant colony to get the nutrients in the berries to its nest :)

That's a very ... unique analogy. ;-)

>>> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?
>>
> I guess WRITE ONLY tables would get us more publicity, but would not be
> entirely correct, as the data is readable from the log.

I like LOG ONLY TABLES, actually; it's the mirror of UNLOGGED TABLEs.
Or REPLICATION MESSAGE TABLE.

Now, since I've pointed out what use case this mechanism does not apply
to (replacing a generic application queue), let me point out some
which it *does* apply to, and handily:

* Updating matviews on a replica
* Updating a cache (assuming an autonomous LSN reader)
* Remote security logging (especially if combined with command triggers)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


