Thread: JSON decoding plugin

JSON decoding plugin

From: Euler Taveira
Date: Mon, Dec 9, 2013
Hi,

A few months ago, it was proposed [1] that it would be interesting to have
a JSON output plugin for logical decoding. Here it is.

Each transaction is a JSON object that can contain xid (optional),
timestamp (optional), and a change array. Each element of the change
array is a decoded command and can contain: kind (I/U/D), schema
(optional), table, columnnames, columntypes (optional), columnvalues,
and oldkeys (only for U/D). columnnames, columntypes, and columnvalues
are arrays. oldkeys is an object that contains the following arrays:
keynames, keytypes (optional), and keyvalues.

The JSON objects are emitted one after another if you are decoding a
series of transactions. Here is an output example:

{
    "xid": 702,
    "change": [
        {
            "kind": "insert",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 2, "test"]
        }
        ,{
            "kind": "update",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 2, "test2"],
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 2]
            }
        }
    ]
}
{
    "xid": 703,
    "change": [
        {
            "kind": "update",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 3, "test2"],
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 2]
            }
        }
    ]
}
{
    "xid": 704,
    "change": [
        {
            "kind": "delete",
            "schema": "public",
            "table": "foo",
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 3]
            }
        }
    ]
}


Some data types were adapted to conform to the JSON spec. NaN and Infinity
are not valid JSON symbols, so they are represented as null (as some JSON
implementations do). Due to the simplicity of the JSON type system, I
represent the vast majority of Postgres data types as strings (however, I
admit that we could mimic the json datatype conversion rules).
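
For example, assuming a hypothetical table measurement(id int4, reading
float8, taken_at timestamptz), an inserted row whose reading is NaN would
come out with null for that column, while the timestamp is rendered as a
string:

{
    "kind": "insert",
    "schema": "public",
    "table": "measurement",
    "columnnames": ["id", "reading", "taken_at"],
    "columntypes": ["int4", "float8", "timestamptz"],
    "columnvalues": [1, null, "2013-12-09 10:00:00-03"]
}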

The oldkeys treatment follows what was defined by commit [2]; an example
follows the list. It uses:

(i) primary key (default behavior);
(ii) unique index (if REPLICA IDENTITY USING INDEX is defined for the table);
(iii) full tuple (if REPLICA IDENTITY FULL is defined for the table);
(iv) nothing, which raises an error (if REPLICA IDENTITY NOTHING is defined
for the table).
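
For example, to get case (iii) above, i.e. the full old tuple in oldkeys
(illustrative commands; foo is the table from the sample output):

ALTER TABLE foo REPLICA IDENTITY FULL;
-- or, for case (ii), pick a specific unique index:
ALTER TABLE foo REPLICA IDENTITY USING INDEX foo_pkey;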

TOAST columns get special treatment for UPDATEs. If a tuple that
contains a TOAST field is updated, the TOAST field is included only if
it was changed too; unchanged TOAST fields are omitted from the
columns* arrays. This means less overhead while transmitting,
processing, and applying changes.
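
To illustrate, take a hypothetical table bar(id int4, counter int4,
payload text) where payload is TOASTed: if an UPDATE touches only counter,
the emitted change omits payload from the columns* arrays, and a consumer
would treat it as unchanged:

{
    "kind": "update",
    "schema": "public",
    "table": "bar",
    "columnnames": ["id", "counter"],
    "columntypes": ["int4", "int4"],
    "columnvalues": [1, 10],
    "oldkeys": {
        "keynames": ["id"],
        "keytypes": ["int4"],
        "keyvalues": [1]
    }
}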

By design, (i) the output plugin doesn't know about aborted transactions and
(ii) subtransactions are reordered into the toplevel transaction and only
the committed pieces are passed to the plugin.
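
For instance, in the following (hypothetical) session only the first
INSERT reaches the plugin; the changes made by the rolled-back
subtransaction are discarded before decoding:

BEGIN;
INSERT INTO foo (a, b, c) VALUES (10, 10, 'kept');
SAVEPOINT s1;
INSERT INTO foo (a, b, c) VALUES (11, 11, 'discarded');
ROLLBACK TO SAVEPOINT s1;
COMMIT;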

You can test it by firing the regression tests (e.g. 'make test') or by
using the following steps:

postgresql.conf:
wal_level = logical
max_wal_senders = 2
max_logical_slots = 2

start collecting WAL records:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout
--plugin=json_decoding_plugin --init

[execute some transactions]
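
For instance, the sample output shown earlier could be produced by a
session like this (assuming foo has a primary key on (a, b), which is
what oldkeys reports above):

CREATE TABLE foo (a int4, b int4, c text, PRIMARY KEY (a, b));
BEGIN;
INSERT INTO foo (a, b, c) VALUES (1, 2, 'test');
UPDATE foo SET c = 'test2' WHERE a = 1 AND b = 2;
COMMIT;
UPDATE foo SET b = 3 WHERE a = 1 AND b = 2;
DELETE FROM foo WHERE a = 1 AND b = 3;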

start printing decoded transactions:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start

stop collecting WAL records:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop


Comments?


[1]
http://www.postgresql.org/message-id/20131015144848.GC8001@awork2.anarazel.de
[2]
http://www.postgresql.org/message-id/E1VepyF-0005gU-Tv@gemulon.postgresql.org


--
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: JSON decoding plugin

From: Merlin Moncure
Date: Mon, Dec 9, 2013
On Mon, Dec 9, 2013 at 7:03 AM, Euler Taveira <euler@timbira.com.br> wrote:
> Hi,
>
> A few months ago, it was proposed [1] that it would be interesting to have
> a JSON output plugin for logical decoding. Here it is.
>
> [...]
>
> Comments?

This is pretty neat. A couple of minor questions:
*) Aren't you *en*coding data into JSON, not the other way around (decoding)?
*) Consider generating a long bytea instead of explicitly writing 32kb
of SQL into the patch (sketch below).
*) You've built your own json serializer here.  Maybe some code can be
shared with the json type?
*) Consider removing 'plugin ' from the name of the plugin.
--plugin=json_decoding etc.
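
For the bytea one, something like this could work (made-up table, just
to illustrate):

-- build a large TOASTable value at run time instead of
-- embedding a 32kb literal in the test script
INSERT INTO t (data) VALUES (convert_to(repeat('x', 32768), 'UTF8'));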

merlin



Re: JSON decoding plugin

From: Euler Taveira
Date: Mon, Dec 9, 2013
On 09-12-2013 13:12, Merlin Moncure wrote:
> This is pretty neat. A couple of minor questions:
> *) Aren't you *en*coding data into JSON, not the other way around (decoding)?
>
Yes. The 'decoding' came from the functionality (logical decoding) and
from the POC plugin being named 'test_decoding'. I also think that
'json_decoding' doesn't say much about the module's purpose. I confess
that I don't like the name but can't come up with a good one. Maybe
'wal2json' or 'logrep2json'? Could you suggest something?

> *) Consider generating a long bytea instead of explicitly writing 32kb
> of SQL into the patch (sketch below).
>
I'll consider it for the next version.

> *) You've built your own json serializer here.  Maybe some code can be
> shared with the json type?
>
Same here. I already took a look at the json datatype but decided that I
wouldn't mess with the backend code before having feedback on the
general idea.

> *) Consider removing 'plugin ' from the name of the plugin.
> --plugin=json_decoding etc.
> 
'plugin' was an attempt to produce a unique name (it sucks, but...).


--
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: JSON decoding plugin

From: Merlin Moncure
Date:
On Mon, Dec 9, 2013 at 10:53 AM, Euler Taveira <euler@timbira.com.br> wrote:
> On 09-12-2013 13:12, Merlin Moncure wrote:
> [...] I confess that I don't like the name but can't come up with a
> good one. Maybe 'wal2json' or 'logrep2json'? Could you suggest
> something?

I'm partial to wal2json actually.

merlin