Thread: JSON decoding plugin
Hi,

A few months ago, it was proposed [1] that it would be interesting to have
a JSON output plugin for logical decoding. Here it is.

Each transaction is a JSON object that can contain xid (optional),
timestamp (optional), and a change array. Each change element is a command
that was decoded, and it can contain: kind (I/U/D), schema (optional),
table, columnnames, columntypes (optional), columnvalues, and oldkeys
(only for U/D). columnnames, columntypes, and columnvalues are arrays.
oldkeys is an object that contains the following arrays: keynames,
keytypes (optional), and keyvalues.

The JSON objects are serialized one after another if you are decoding a
series of transactions. Here is an output example:

{
    "xid": 702,
    "change": [
        {
            "kind": "insert",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 2, "test"]
        }
        ,{
            "kind": "update",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 2, "test2"],
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 2]
            }
        }
    ]
}
{
    "xid": 703,
    "change": [
        {
            "kind": "update",
            "schema": "public",
            "table": "foo",
            "columnnames": ["a", "b", "c"],
            "columntypes": ["int4", "int4", "text"],
            "columnvalues": [1, 3, "test2"],
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 2]
            }
        }
    ]
}
{
    "xid": 704,
    "change": [
        {
            "kind": "delete",
            "schema": "public",
            "table": "foo",
            "oldkeys": {
                "keynames": ["a", "b"],
                "keytypes": ["int4", "int4"],
                "keyvalues": [1, 3]
            }
        }
    ]
}

Some data types were adapted to conform to the JSON spec. NaN and Infinity
are not valid JSON symbols, so they are represented as null (as in some
JSON implementations). Given the simplicity of the JSON type system, I
represent the vast majority of Postgres data types as strings (however, I
admit that we could mimic the json datatype conversion rules).

The oldkeys treatment follows what was defined by commit [2]. It uses:

(i) the primary key (default behavior);
(ii) a unique index (if REPLICA IDENTITY USING INDEX is defined for the table);
(iii) the full tuple (if REPLICA IDENTITY FULL is defined for the table);
(iv) nothing, which means an error (if REPLICA IDENTITY NOTHING is defined
for the table).

TOAST columns get special treatment for UPDATEs: if a tuple that contains
a TOAST field is updated, the TOAST field is included only if it was
changed too, i.e. unchanged TOAST fields are omitted from the columns*
arrays. This means less overhead while transmitting, processing, and
applying changes.

By design, (i) the output plugin doesn't know about aborted transactions,
and (ii) subtransactions are reordered into the toplevel transaction and
only the committed pieces are passed to the plugin.

You can test it by firing the regression tests (e.g. 'make test') or by
using the following steps:

postgresql.conf:

wal_level = logical
max_wal_senders = 2
max_logical_slots = 2

start collecting WAL records:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --plugin=json_decoding_plugin --init

[execute some transactions]

start printing decoded transactions:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start

stop collecting WAL records:

$ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop

Comments?
[1] http://www.postgresql.org/message-id/20131015144848.GC8001@awork2.anarazel.de
[2] http://www.postgresql.org/message-id/E1VepyF-0005gU-Tv@gemulon.postgresql.org

--
Euler Taveira                   Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
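For reference, a minimal SQL session that would produce output like the
example above might look as follows (a sketch reconstructed from the sample
JSON, not taken from the mail; the table definition, with a two-column
primary key inferred from the oldkeys arrays, is an assumption):

CREATE TABLE foo (a int4, b int4, c text, PRIMARY KEY (a, b));

-- xid 702: INSERT and UPDATE decoded as a single transaction
BEGIN;
INSERT INTO foo (a, b, c) VALUES (1, 2, 'test');
UPDATE foo SET c = 'test2' WHERE a = 1 AND b = 2;
COMMIT;

-- xid 703: UPDATE touching a key column; oldkeys reports the old key (1, 2)
UPDATE foo SET b = 3 WHERE a = 1 AND b = 2;

-- xid 704: DELETE; only oldkeys is emitted
DELETE FROM foo WHERE a = 1 AND b = 3;

To have the full old tuple emitted instead of just the key (case (iii)
above):

ALTER TABLE foo REPLICA IDENTITY FULL;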
On Mon, Dec 9, 2013 at 7:03 AM, Euler Taveira <euler@timbira.com.br> wrote:
> [...]
> Comments?

This is pretty neat. Couple of minor questions:

*) Aren't you *en*coding data into json, not the other way around
(decoding)?

*) Consider generating a long bytea instead of explicitly writing a 32kB
sql into the patch.

*) You've built your own json serializer here. Maybe some code can be
shared with the json type?

*) Consider removing 'plugin' from the name of the plugin:
--plugin=json_decoding etc.

merlin
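As a concrete sketch of the bytea suggestion (table and column names here
are hypothetical, not from the patch): rather than pasting a 32kB literal
into the test script, the value can be built at run time, e.g.:

CREATE TABLE toasted (id int PRIMARY KEY, payload bytea);

-- build a 32kB bytea value at run time instead of embedding a 32kB literal
INSERT INTO toasted (id, payload)
VALUES (1, convert_to(repeat('x', 32768), 'UTF8'));

(Note that highly compressible data like this may end up compressed inline
rather than stored out of line, so a test targeting TOAST behavior may want
less repetitive content.)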
On 09-12-2013 13:12, Merlin Moncure wrote:
> This is pretty neat. Couple of minor questions:
> *) Aren't you *en*coding data into json, not the other way around
> (decoding)?
>
Yes. The 'decoding' came from the functionality (logical decoding) and
because the POC plugin is named 'test_decoding'. I also think that
'json_decoding' doesn't say much about the module purpose. I confess that
I don't like the name but can't come up with a good one. Maybe 'wal2json'
or 'logrep2json'? Could you suggest something?

> *) Consider generating a long bytea instead of explicitly writing a 32kB
> sql into the patch.
>
I'll consider it for the next version.

> *) You've built your own json serializer here. Maybe some code can be
> shared with the json type?
>
Same here. I already took a look at the json datatype but decided that I
wouldn't mess with the backend code before having feedback on the general
idea.

> *) Consider removing 'plugin' from the name of the plugin:
> --plugin=json_decoding etc.
>
'plugin' was an attempt to produce a unique name (it sucks, but...).

--
Euler Taveira                   Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On Mon, Dec 9, 2013 at 10:53 AM, Euler Taveira <euler@timbira.com.br> wrote:
> On 09-12-2013 13:12, Merlin Moncure wrote:
>> *) Aren't you *en*coding data into json, not the other way around
>> (decoding)?
>>
> Yes. The 'decoding' came from the functionality (logical decoding) and
> because the POC plugin is named 'test_decoding'. I also think that
> 'json_decoding' doesn't say much about the module purpose. I confess that
> I don't like the name but can't come up with a good one. Maybe 'wal2json'
> or 'logrep2json'? Could you suggest something?

I'm partial to wal2json actually.

merlin