Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility
Date
Msg-id 508050E1.5050202@2ndQuadrant.com
In response to Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility  (Josh Berkus <josh@agliodbs.com>)
Responses Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility  (Christopher Browne <cbbrowne@gmail.com>)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On 10/18/2012 07:33 PM, Josh Berkus wrote:
> Simon,
>
>
>> It's hard to work out how to reply to this because it's just so off
>> base. I don't agree with the restrictions you think you see at all,
>> saying it politely rather than giving a one word answer.
> You have inside knowledge of Hannu's design.
Actually, Simon currently has no more knowledge of this specific
design than you do. I posted it on this list as soon as I had figured
it out as a possible solution to a specific problem: supporting full
pgQ/Londiste functionality in WAL-based logical replication with
minimal overhead.

(Well, actually I let it settle for a few weeks, but I did not discuss
it off-list before that.)

Simon may have a better grasp of it thanks to having worked on the
BDR/Logical Replication design, and thus having a better, or at least
more recent, understanding of the issues involved in logical
replication.

When mapping Londiste/Slony message capture onto logical WAL, the WAL
already _is_ the event queue for replication. NOT LOGGED tables also make
it usable for non-replication purposes through the same mechanisms. (The
equivalent in a trigger-based system would be a log trigger that captures
the insert event and then cancels the insert.)
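
To make the trigger-based equivalent concrete, here is a small Python simulation (not PL/pgSQL, and not pgQ code). It models the PostgreSQL rule that a BEFORE ROW trigger returning NULL skips the row; `Table`, `log_and_cancel` and `event_log` are hypothetical names, with `event_log` standing in for the event table or WAL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Trigger = Callable[[Row], Optional[Row]]


@dataclass
class Table:
    """Toy table that fires BEFORE INSERT row triggers.

    Mirrors the PostgreSQL rule that a BEFORE ROW trigger returning NULL
    (None here) skips the row, and later triggers are not fired.
    """
    rows: List[Row] = field(default_factory=list)
    before_insert: List[Trigger] = field(default_factory=list)

    def insert(self, row: Row) -> int:
        for trigger in self.before_insert:
            result = trigger(row)
            if result is None:
                return 0  # insert cancelled: zero rows stored
            row = result
        self.rows.append(row)
        return 1


# Hypothetical stand-in for the event log (pgQ event table or WAL).
event_log: List[Row] = []


def log_and_cancel(row: Row) -> Optional[Row]:
    """Capture the insert event, then cancel the insert itself."""
    event_log.append(dict(row))
    return None


queue = Table(before_insert=[log_and_cancel])
stored = queue.insert({"id": 1, "payload": "hello"})
```

The event survives only in the log; the table itself stays empty, which is what a log-only table would give without the trigger round trip.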

> I am merely going from his
> description *on this list*, because that's all I have to go on.
>
> He requested comments, so here I am, commenting.  I'm *hoping* that it's
> merely the description which is poor and not the conception of the
> feature.  *As Hannu described the feature* it sounds useless and
> obscure, and miles away from powering any kind of general queueing
> mechanism.
If we describe a queue as something you put stuff into at one end and
get it out of at the other end, in the same or some other specific order,
then the WAL _is_ a queue when you use it for replication. (If you only
write to it, it is a "log"; if you write to it and read from it, it is
a "queue".)

That is, the WAL already is a persistent and ordered (that is how WAL
works) stream of messages ("WAL records") that are generated on the
"master" and replayed on one or more consumers (called "slaves" in the
case of simple replication).

All it takes to make this scenario work is keeping track of the LSN, or
simply the log position, on the slave side.
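
A minimal sketch of that slave-side bookkeeping, assuming a toy in-memory log: positions here are consecutive integers, whereas real LSNs are byte positions in the WAL. `Wal` and `Consumer` are hypothetical names, not PostgreSQL or skytools APIs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Wal:
    """Toy append-only log: each record gets a strictly increasing position."""
    records: List[Tuple[int, str]] = field(default_factory=list)
    next_lsn: int = 1

    def append(self, payload: str) -> int:
        lsn = self.next_lsn
        self.records.append((lsn, payload))
        self.next_lsn += 1
        return lsn

    def read_from(self, start_lsn: int) -> Iterator[Tuple[int, str]]:
        """Subscribe from start_lsn: yield records in LSN order."""
        for lsn, payload in self.records:
            if lsn >= start_lsn:
                yield lsn, payload


@dataclass
class Consumer:
    """Slave-side consumer; its only state is the last position it applied."""
    last_lsn: int = 0
    applied: List[str] = field(default_factory=list)

    def poll(self, wal: Wal) -> None:
        for lsn, payload in wal.read_from(self.last_lsn + 1):
            self.applied.append(payload)
            self.last_lsn = lsn  # persisting this is the only bookkeeping needed


wal = Wal()
wal.append("insert a")
wal.append("insert b")
replica = Consumer()
replica.poll(wal)      # applies both records
wal.append("insert c")
replica.poll(wal)      # resumes after last_lsn, applies only the new record
```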

What you seem to want is support for cooperative consumers, that is,
multiple consumers on the same queue working together and sharing the
work of processing the incoming events.

This can easily be achieved using a single ordered event stream and
extra bookkeeping structures on the consumer side (look at the
cooperative consumer samples in skytools).
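
A rough sketch of such consumer-side bookkeeping, assuming the stream is cut into numbered batches that are handed out one per worker. This is not the skytools API; `make_batches`, `CoopGroup` and its methods are hypothetical, and the in-process `claim` stands in for the record locking a real implementation would do in the database.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple

Batch = Tuple[int, List[str]]  # (batch_id, events)


def make_batches(events: List[str], size: int) -> List[Batch]:
    """Cut the single ordered stream into consecutive, numbered batches."""
    return [(i // size, events[i:i + size]) for i in range(0, len(events), size)]


@dataclass
class CoopGroup:
    """Bookkeeping shared by cooperating workers of one logical consumer."""
    pending: Deque[Batch]
    claimed: Dict[int, str] = field(default_factory=dict)  # batch_id -> worker
    done: Set[int] = field(default_factory=set)

    def claim(self, worker: str) -> Optional[Batch]:
        """Hand the next unclaimed batch to exactly one worker."""
        if not self.pending:
            return None
        batch = self.pending.popleft()
        self.claimed[batch[0]] = worker
        return batch

    def finish(self, batch_id: int) -> None:
        """Mark a claimed batch as processed."""
        del self.claimed[batch_id]
        self.done.add(batch_id)

    def committed_upto(self) -> int:
        """Highest batch id such that it and all earlier ones are done, else -1."""
        n = -1
        while n + 1 in self.done:
            n += 1
        return n


events = [f"ev{i}" for i in range(5)]
group = CoopGroup(deque(make_batches(events, 2)))
first = group.claim("worker-a")    # batch 0
second = group.claim("worker-b")   # batch 1, processed in parallel
group.finish(second[0])
before = group.committed_upto()    # -1: batch 0 is not finished yet
group.finish(first[0])
after = group.committed_upto()     # 1: batches 0 and 1 are both done
```

The ordered stream itself is untouched; all the cooperation lives in the small amount of state the consumers keep between them.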

What I suggested was an optimisation for the case where you know that
you will never need the data on the master side and are only interested
in it on the slave side.

By writing rows/events/messages only to the log (or stream, or queue),
you avoid the need to clean them up on the master later, whether by
DELETE, by TRUNCATE or by rotating tables.
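
To illustrate the difference in master-side state, here is a simplified Python comparison. `RotatingTableQueue` only loosely follows the idea of pgQ's rotating event tables and assumes consumers are always done with a table before inserts rotate back to it; both class names are hypothetical.

```python
from typing import List


class RotatingTableQueue:
    """Table-backed queue: events stay on the master until their table is truncated.

    Simplification: every consumer is assumed to be past a table by the
    time inserts rotate back to it.
    """

    def __init__(self, ntables: int = 3) -> None:
        self.tables: List[List[str]] = [[] for _ in range(ntables)]
        self.current = 0
        self.truncates = 0

    def insert(self, event: str) -> None:
        self.tables[self.current].append(event)

    def rotate(self) -> None:
        """Switch inserts to the next table, truncating it if it still holds events."""
        self.current = (self.current + 1) % len(self.tables)
        if self.tables[self.current]:
            self.tables[self.current].clear()  # the TRUNCATE step
            self.truncates += 1

    def stored_on_master(self) -> int:
        return sum(len(t) for t in self.tables)


class LogOnlyQueue:
    """Log-only variant: events go only to the (logical) WAL stream."""

    def __init__(self) -> None:
        self.wal: List[str] = []

    def insert(self, event: str) -> None:
        self.wal.append(event)  # no local row, so no queue-level cleanup later

    def stored_on_master(self) -> int:
        return 0


tq = RotatingTableQueue()
lq = LogOnlyQueue()
for i in range(6):
    event = f"ev{i}"
    tq.insert(event)
    lq.insert(event)
    if i % 2 == 1:
        tq.rotate()  # e.g. a periodic rotation
```

Only the table-backed version holds events locally and needs a TRUNCATE step; in the log-only version, reclaiming space is left to PostgreSQL's normal WAL recycling.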

For both physical and logical streaming, the WAL _is_ the queue of events
that were recorded on the master and need to be replayed on the slave.

With logical replication being introduced, it now makes sense to have
actions recorded _only_ in this queue, and this is what the whole RFC
was about.

I recommend that you familiarise yourself a bit with skytools/pgQ to get
a better feel for the things I am talking about. Londiste is just one
application built on pgQ, a general event logging, transport and
transform/replay system (which is what I'd call queueing :) ).

pgQ does have its roots in Slony (and earlier) replication systems, but
it is by no means _only_ a replication system.

The LOG ONLY tables are _not_ needed for pure replication (like Slony),
but they make replication + queueing type solutions like skytools/pgQ
much more efficient, as they do away with the need to maintain the
queued data on the master side, where it will never be needed (just to
repeat this once more).

> Or anything we discussed at the clustering meetings.
>
> And, again, if you didn't want comments, you shouldn't have posted an RFC.
I did want comments and as far as I know I do not see you as hostile :)

I do understand that what you mean by QUEUE (and especially MESSAGE
QUEUE) is different from what I described. You seem to specifically
want an implementation of cooperative consumers for a generic queue.

The answer is yes, it is possible to build this on the WAL, or on the
table-based event logs/queues of Londiste/Slony. It just takes a little
extra management on the receiving side to do the record locking and the
distribution between cooperating consumers.
>> All we're discussing is moving a successful piece of software into
>> core, which has been discussed for years at the international
>> technical meetings we've both been present at. I think an open
>> viewpoint on the feasibility of that would be reasonable, especially
>> when it comes from one of the original designers.
> When I ask you for technical clarification or bring up potential
> problems with a 2Q feature, you consistently treat it as a personal
> attack and are emotionally defensive instead of answering my technical
> questions.  This, in turn, frustrates the heck out of me (and others)
> because we can't get the technical questions answered.  I don't want you
> to justify yourself, I want a clear technical spec.
Currently the "clear tech spec" is just this:

* it works as a table on INSERT, up to inserting the logical WAL record
  describing the insert, but no data is inserted locally

with everything that follows from the local table having no data:

* unique constraints don't make sense
* indexes make no sense
* updates and deletes hit no data
* etc.
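
The spec above can be modelled as follows. This is a toy Python model under the stated assumptions, not a proposed PostgreSQL API; `LogOnlyTable` and `WalRecord` are hypothetical names, and the shape of the WAL record is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

WalRecord = Tuple[str, str, Dict[str, Any]]  # (operation, table, row data)


@dataclass
class LogOnlyTable:
    """Toy model of a log-only table.

    INSERT behaves like a normal insert up to emitting the logical WAL
    record, but no row is stored locally; everything else follows from that.
    """
    name: str
    wal: List[WalRecord] = field(default_factory=list)

    def insert(self, row: Dict[str, Any]) -> None:
        self.wal.append(("INSERT", self.name, dict(row)))  # logged, not stored

    def select(self) -> List[Dict[str, Any]]:
        return []  # no local data to scan, so an index would have nothing to index

    def update(self, **changes: Any) -> int:
        return 0  # updates hit no data

    def delete(self) -> int:
        return 0  # deletes hit no data


events = LogOnlyTable("events")
events.insert({"id": 1, "msg": "a"})
events.insert({"id": 1, "msg": "b"})  # same key: no local row exists to conflict with
```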
 
>
> I'm asking these questions because I'm excited about ReplicationII, and
> I want it to be the best feature it can possibly be.
>
> Or, as we tell many new contributors, "We wouldn't bring up potential
> problems and ask lots of questions if we weren't interested in the feature."
>
> Now, on to the technical questions:
>
>>> QUEUE emphasizes the aspect of logged only table that it accepts
>>> "records" in a certain order, persists these and then quarantees
>>> that they can be read out in exact the same order - all this being
>>> guaranteed by existing WAL mechanisms.
>>>
>>> It is not meant to be a full implementation of application level queuing
>>> system though but just the capture, persisting and distribution parts
>>>
>>> Using this as an "application level queue" needs a set of interface
>>> functions to extract the events and also to keep track of the processed
>>> events. As there is no general consensus what these shoul be (like if
>>> processing same event twice is allowed) this part is left for specific
>>> queue consumer implementations.
> While implementations vary, I think you'll find that the set of
> operations required for a full-featured application queue are remarkably
> similar across projects.  Personally, I've worked with celery, Redis,
> AMQ, and RabbitMQ, as well as a custom solution on top of pgQ.  The
> design, as you've described it, make several of these requirements
> unreasonably convoluted to implement.
As Simon explained, the initial RFC was just about not keeping the
data in a local table if we know it will never be accessed (at least not
for anything except vacuum and delete/truncate).

This is something that made no sense for physical replication.

> It sounds to me like the needs of internal queueing and application
> queueing may be hopelessly divergent.  That was always possible, and
> maybe the answer is to forget about application queueing and focus on
> making this mechanism work for replication and for matviews, the two
> features we *know* we want it for.  Which don't need the application
> queueing features I described AFAIK.
>
>> The two halves of the queue are the TAIL/entry point and the HEAD/exit
>> point. As you point out these could be on the different servers,
>> wherever the logical changes flow to, but could also be on the same
>> server. When the head and tail are on the same server, the MESSAGE
>> QUEUE syntax seems appropriate, but I agree that calling it that when
>> its just a head or just a tail seems slightly misleading.
> Yeah, that's why I was asking for clarification; the way Hannu described
> it, it sounded like it *couldn't* be read on the insert node, but only
> on a replica.
Well, the reading is done the same way as any WAL reading: you
subscribe to the stream and, from that point on, get the records
in LSN order.

It is very hard for me to say for sure whether the walsender->walreceiver combo "reads the events" on the master side or on the slave side.
>
>> We do, I think, want a full queue implementation in core. We also want
>> to allow other queue implementations to interface with Postgres, so we
>> probably want to allow "first half" only as well. Meaning we want both
>> head and tail separately in core code. The question is whether we
>> require both head and tail in core before we allow commit, to which I
>> would say I think adding the tail first is OK, and adding the head
>> later when we know exactly the design.
> I'm just pointing out that some of the requirements of the design for
> the replication queue may conflict with a design for a full-featured
> application queue.
>
> I don't quite follow you on what you mean by "head" vs. "tail".  Explain?
HEAD is the queue producer, where the events go in (any insert on the master).

TAIL (to avoid another word) is where they come out (walreader -> walreceiver moving the events to the slave).

Think of an analogy with a snake feeding on berries used by
an ant colony to get the nutrients in the berries to its nest :)

And there is no processing inside the snake: the work of
distributing said nutrients once they have arrived at the nest has
to be organised by the cooperative colony of ants at that end; the
snake just guarantees that the berries arrive in the same order they
went in.

I guess this organisation of work after the events are delivered is
what you were after when asking about "an application level queue".

>> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?
>
I guess WRITE ONLY tables would get us more publicity, but the name
would not be entirely correct, as the data is readable from the log.


Hannu





