JSON decoding plugin - Mailing list pgsql-hackers

From Euler Taveira
Subject JSON decoding plugin
Date
Msg-id 52A5BFAE.1040209@timbira.com.br
Whole thread Raw
Responses Re: JSON decoding plugin
List pgsql-hackers
Hi,

A few months ago, it was proposed [1] that would be interested to have a
json output plugin for logical decoding. Here it is.

Each transaction is a JSON object that can contain xid (optional),
timestamp (optional), and change array. Each change's element is a
command that was decoded and it can contains: kind (I/U/D), schema
(optional), table, columnnames, columntypes (optional), columnvalues,
and oldkeys (only for U/D). columnnames, columntypes and columnvalues
are arrays. oldkeys is an object that contains the following arrays:
keynames, keytypes (optional), and keyvalues.

The JSON objects are serialized if you are decoding a serie of
transactions. Here is an output example:

{
    "xid": 702,
    "change": [
        {
            "kind": "insert",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 2, "test"]
        }
        ,{
            "kind": "update",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 2, "test2"],
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 2]
            }
        }
    ]
}
{
    "xid": 703,
    "change": [
        {
            "kind": "update",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 3, "test2"],
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 2]
            }
        }
    ]
}
{
    "xid": 704,
    "change": [
        {
            "kind": "delete",
            "schema": "public",
            "table": "foo",
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 3]
            }
        }
    ]
}


Some data types was adapted to conform with JSON spec. NAN and Infinity
are not valid JSON symbols so their representation is NULL (as some JSON
implementations). Due to JSON datatype simplicity, I represent the vast
majority of Postgres datatypes as string (However, I admit that we could
mimic the json datatype conversion rules).

The oldkeys treatment follows what was defined by the commit [2]. It uses:

(i) primary key (default behavior);
(ii) unique index (if REPLICA IDENTITY USING INDEX is defined for table);
(iii) full tuple (if REPLICA IDENTITY FULL is defined for table);
(iv) nothing means an error (if REPLICA IDENTITY NOTHING is defined for
table).

The TOAST columns have a special treatment for UPDATEs. If a tuple that
contains a TOAST field is updated, the TOAST field is included iif it is
changed too. It means that unchanged TOAST field are omitted from
columns* arrays. This means less overhead while transmitting,
processing and applying changes.

By design, (i) output plugin doesn't know about aborted transactions and
(ii) subtransactions are reordered into a toplevel transaction and only
the committed pieces are passed to the plugin.

You can test it firing the regression tests (e.g. 'make test') or using
the following steps?

postgresql.conf:
wal_level = logical
max_wal_senders = 2
max_logical_slots = 2

start collecting WAL records:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout
--plugin=json_decoding_plugin --init

[execute some transactions]

start printing decoded transactions:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start

stop collecting WAL records:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop


Comments?


[1]
http://www.postgresql.org/message-id/20131015144848.GC8001@awork2.anarazel.de
[2]
http://www.postgresql.org/message-id/E1VepyF-0005gU-Tv@gemulon.postgresql.org


--
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachment

pgsql-hackers by date:

Previous
From: "MauMau"
Date:
Subject: Re: [bug fix] pg_ctl always uses the same event source
Next
From: "MauMau"
Date:
Subject: Re: Recovery to backup point