Thread: [HACKERS] export import bytea from psql

[HACKERS] export import bytea from psql

From
Pavel Stehule
Date:
Hi

This is topic, that I try to solve some longer time. I would to open discussion before summer commitfest.

The work with binary data or long text fields (multilines) or long not only utf8 encoded XML is not well supported by psql console. Is not hard to use binary protocol and write custom application, but it needs some time and some experience. Buildin support in psql can helps lot of users.

Export
=====
Last time the most conflict point is mapping bytea field to file. Has sense or has not sense to push more bytea fields to one file?  It is implementation detail - just raise or not raise a exception when count > 1. Mapping 1:1 is little bit cleaner, simpler from user perspective - but some use cases are not possible and files should be joined on operation system level. Can we find a agreement?

Design
----------

SELECT xxx
\gstore file -- use text protocol

SELECT xxx
\gbstore file -- use binary protocol

Theoretically we can support more target files in one command. But I am thinking it is over engineering. If somebody need it, then he can write small app in Perl, Python ...

Import
=====

There are more possible ways

1. using psql variables - we just need to write commands \setfrom \setfrombf - this should be very simple implementation. The value can be used more times. On second hand - the loaded variable can hold lot of memory (and it can be invisible for user). Next disadvantage - when SQL commands with this value fails, then the full SQL command can be written to log (there is high risk of log bloating).

2. using psql variables with query variables - first part is same like @1. This needs a parametric queries support. This is partially optimised first variant - the risk of log bloating is reduced.

3. using special gexec command where query parameters can be replaced by specified content. Some like

insert into foo values($1, $2)
\gusefiles xml:~/xx.xml bytea:~/avatar.jpg

This is one step command - so work can be faster. There is zero risk of forgotten content in variable. On second hand the command is more complex.  

The binary import with binary protocol is risky if content doesn't respect a protocol. I don't think we need to support unknown (any) formats and custom data types. Only known formats can be supported text, json, xml (text or binary), bytea (text or binary). Using binary protocol for XML format enforces automatic conversion. 

Opinions? Notes?

Regards

Pavel

Re: [HACKERS] export import bytea from psql

From
"Daniel Verite"
Date:
Pavel Stehule wrote:

> 1. using psql variables - we just need to write commands \setfrom
> \setfrombf - this should be very simple implementation. The value can be
> used more times. On second hand - the loaded variable can hold lot of
> memory (and it can be invisible for user).

This could be simplified by using the variable only for the filename,
and then injecting the contents of the file into the PQexec'd query
as the interpolation of the variable.
We already have::var for verbatim injection:'var' for injection quoted-as-text:"var" for injection quoted-as-identifier

What if we add new forms of dereferencing, for instance
(not necessarily with this exact syntax)::<var> for injecting the quoted-as-text contents of the file pointed to by
var.:{var}same thing but with file contents quoted as binary(PQescapeByteaConn) 

then we could write:

\set img '/path/to/image.png'
insert into image(binary) values(:{img});

We could also go further  in that direction. More new interpolation
syntax could express that a variable is to be passed as a
parameter rather than injected (assuming a parametrized query),
and whether the value is directly the contents or it's a filename
pointing to the contents, and whether its format is binary or text,
and even support an optional oid or typename coupled to
the variable.
That would be a lot of new syntax for interpolation but no new
backslash command and no change to \set itself.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] export import bytea from psql

From
Pavel Stehule
Date:


2017-05-09 23:00 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
        Pavel Stehule wrote:

> 1. using psql variables - we just need to write commands \setfrom
> \setfrombf - this should be very simple implementation. The value can be
> used more times. On second hand - the loaded variable can hold lot of
> memory (and it can be invisible for user).

This could be simplified by using the variable only for the filename,
and then injecting the contents of the file into the PQexec'd query
as the interpolation of the variable.
We already have:
 :var for verbatim injection
 :'var' for injection quoted-as-text
 :"var" for injection quoted-as-identifier

What if we add new forms of dereferencing, for instance
(not necessarily with this exact syntax):
 :<var> for injecting the quoted-as-text contents of the file pointed
  to by var.
 :{var} same thing but with file contents quoted as binary
 (PQescapeByteaConn)

then we could write:

\set img '/path/to/image.png'
insert into image(binary) values(:{img});

It is similar to my first or second proposal - rejected by Tom :(
 

We could also go further  in that direction. More new interpolation
syntax could express that a variable is to be passed as a
parameter rather than injected (assuming a parametrized query),
and whether the value is directly the contents or it's a filename
pointing to the contents, and whether its format is binary or text,
and even support an optional oid or typename coupled to
the variable.
That would be a lot of new syntax for interpolation but no new
backslash command and no change to \set itself.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: [HACKERS] export import bytea from psql

From
"Daniel Verite"
Date:
    Pavel Stehule wrote:

> It is similar to my first or second proposal - rejected by Tom :(

Doesn't it differ? ISTM that in the patches/discussion related to:
https://commitfest.postgresql.org/11/787/
it was proposed to change \set in one way or another,
and also in the point #1 of this present thread:

> > 1. using psql variables - we just need to write commands \setfrom
> > \setfrombf - this should be very simple implementation. The value can be
> > used more times. On second hand - the loaded variable can hold lot of
> > memory (and it can be invisible for user).


My comment is: let's not change \set or even create a variant
of it just for importing verbatim file contents, in text or binary,
because interpolating psql variables differently in the query itself
is all we need.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] export import bytea from psql

From
Pavel Stehule
Date:


2017-05-11 17:16 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
        Pavel Stehule wrote:

> It is similar to my first or second proposal - rejected by Tom :(

Doesn't it differ? ISTM that in the patches/discussion related to:
https://commitfest.postgresql.org/11/787/
it was proposed to change \set in one way or another,
and also in the point #1 of this present thread:

> > 1. using psql variables - we just need to write commands \setfrom
> > \setfrombf - this should be very simple implementation. The value can be
> > used more times. On second hand - the loaded variable can hold lot of
> > memory (and it can be invisible for user).


My comment is: let's not change \set or even create a variant
of it just for importing verbatim file contents, in text or binary,
because interpolating psql variables differently in the query itself
is all we need.

There was 100% agreement on format and then discussion finished without any result.

Regards

Pavel
 


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite