Re: Re: pglogical_output - a general purpose logical decoding output plugin - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: Re: pglogical_output - a general purpose logical decoding output plugin |
Date | |
Msg-id | CAMsr+YGq7mqy+yxpJ=xQf=iBf2ynQ-kD75--mGBsT6tLD7bgWQ@mail.gmail.com Whole thread Raw |
In response to | Re: pglogical_output - a general purpose logical decoding output plugin (Tomasz Rybak <tomasz.rybak@post.pl>) |
Responses |
Re: Re: pglogical_output - a general purpose logical
decoding output plugin
|
List | pgsql-hackers |
On 21 January 2016 at 06:23, Tomasz Rybak <tomasz.rybak@post.pl> wrote:
I reviewed more files:
Thanks.
Can you try to put more whitespace between items? It can be hard to follow at the moment.
pglogical_proto_native.c
+ pq_sendbyte(out, 'N'); /* column name block follows */
+ attname = NameStr(att->attname);
+ len = strlen(attname) + 1;
+ pq_sendint(out, len, 2);
+ pq_sendbytes(out, attname, len); /* data */
Identifier names are limited to 63 - so why we're sending 2 bytes here?
Good question. It should be one byte. I'll need to amend the protocol for that, but I don't think that's a problem this early on.
+ pq_sendbyte(out, 'B'); /* BEGIN */
+
+ /* send the flags field its self */
+ pq_sendbyte(out, flags);
Comment: "flags field its self"? Shouldn't be "... itself"?
Similarly in write_origin; write_insert just says:
itself is an abbreviation of its self.
+ /* send the flags field */
+ pq_sendbyte(out, flags);
Speaking about flags - in most cases they are 0; only for attributes
we might have:
+ flags |= IS_REPLICA_IDENTITY;
I assume that flags field is put into protocol for future needs?
Correct. The protocol specifies (in most places; need to double check all sites) that the lower 4 bits are reserved and must be treated as an ERROR if set. The high 4 bits must be ignored if set and not recognised. That gives us some room to wiggle without bumping the protocol version incompatibly, and lets us use capabilities (via client-supplied parameters) to add extra information on the wire.
+ /* FIXME support whole tuple (O tuple type) */
+ if (oldtuple != NULL)
+ {
+ pq_sendbyte(out, 'K'); /* old key follows */
+ pglogical_write_tuple(out, data, rel, oldtuple);
+ }
I don't fully understand. We are sending whole old tuple here,
so this FIXME should be more about supporting sending just keys.
But then comment "old key follows" is not true. Or am I missing
something here?
In wal_level=logical the tuple that's written is an abbreviated tuple containing data only for the REPLICA IDENTITY fields.
Ideally we'd also be able to support sending the _whole_ old tuple, but this would require the ability to log the whole old tuple in WAL when logging a DELETE or UPDATE into WAL. This isn't so much a FIXME as a logical decoding limitation and wishlist item; I'll amend to that effect.
+ pq_sendbyte(out, 'S'); /* message type field */
+ pq_sendbyte(out, 1); /* startup message version */
For now protocol is 1, but for code readability it might be better
to change this line to:
+ pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM); /* startup message version */
The startup message format isn't the same as the protocol version. Hopefully we'll never have to change it. The reason it's specified is so that if we ever do bump it a decoding plugin can recognise an old client and fall back. Maybe it's BC overkill but I'd kick myself for not doing it if we ever decided to (say) add support for structured json startup options from the client. Working on BDR has taught me that there's no such thing as too much consideration for cross-version compat and negotiation in replication.
I'm happy to create a new define for that an comment to this effect.
Just for the sake of avoiding code repetition:
+ for (i = 0; i < desc->natts; i++)
+ {
+ if (desc->attrs[i]->attisdropped)
+ continue;
+ nliveatts++;
+ }
+ pq_sendint(out, nliveatts, 2);
The exact same code is in write_tuple and write_attrs. I don't know what's
policy for refactoring, but this might be extracted into separate function.
Seems trivial enough not to care, but probably can.
+ else if (att->attlen == -1)
+ {
+ char *data = DatumGetPointer(values[i]);
+
+ /* send indirect datums inline */
+ if (VARATT_IS_EXTERNAL_INDIRECT(values[i]))
+ {
+ struct varatt_indirect redirect;
+ VARATT_EXTERNAL_GET_POINTER(redirect, data);
+ data = (char *) redirect.pointer;
+ }
I really don't like this. We have function parameter "data" and now
are creating new variable with the same name.
I agree. Good catch.
I don't much like the use of 'data' as the param name for the plugin private data and am quite inclined to change that instead, to plugin_private or something.
pglogical_proto_json.c
+ appendStringInfo(out, ", \"origin_lsn\":\"%X/%X\"",
+ (uint32)(txn->origin_lsn >> 32), (uint32)(txn->origin_lsn));
I remember there was discussion on *-hackers recently about %X/%X; I'll
try to find it and check whether it's according to final conclusion.
Thanks.
pglogical_relmetacache.c
First. In pglogical_output.c, in pg_decode_startup we are calling
init_relmetacache. I haven't found call to destroy_relmetacache
and comment says that it must be called at backend shutdown.
Is it guaranteed? Or will cache get freed with its context?
Hm, ok. It must never be called before shutdown, but it's not necessary to call at all. I'll amend it. It's present just in case future Valgrind support wants to call it to explicitly clean up memory.
I'll take a closer look there. I think I changed the lifetime of the relmetacache after figuring out a better way to handle the cache invalidations and it's possible I missed an update in the comments.
+ /* Find cached function info, creating if not found */
+ hentry = (struct PGLRelMetaCacheEntry*) hash_search(RelMetaCache,
+ (void *)(&RelationGetRelid(rel)),
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ Assert(hentry->relid = RelationGetRelid(rel));
+ hentry->is_cached = false;
+ hentry->api_private = NULL;
+ }
+
+ Assert(hentry != NULL);
Shouldn't Assert be just after calling hash_search? We're (if !found)
dereferencing hentry and only after checking whether it's not NULL.
Yeah. It's more of an exit condition, stating "this function never returns non-null hentry" but moving it up is fine. Can do.
I haven't found relevance of relmeta_cache_size attribute.
It's set to value coming from client - but then the only thing that matters
is whether it is 0 or not. I'll try to reread DESIGN.md session about cache,
but for now (quite late and I'm quite tired) it is not clear on the first reading.
It's the first part of a feature. You can turn the relation metadata cache off, set it to unlimited, or (in future, not implemented yet) set a bounded size where a LRU is used to evict the oldest entry.
The LRU approach is more complex since we have to track the LRU and do evictions as well as the cache map its self. Efficiently. I expect this to be a 1.1 or later feature. Having the param as an int now means we don't later land up having to have two params, one to enable the cache and another to bound its size, so once it's fully implemented it'll (IMO) be clearer what's going on.
Thanks again for the detailed code examination. I find it hard to read familiar code closely for review since I see what I expect to see. So it's really, really useful.
pgsql-hackers by date: