Thread: [PROPOSAL]a new data type 'bytea' for ECPG
Hi, Hackers # This is my first post. I will try to implement a new data type 'bytea' for ECPG. I think that the implementation is not complicated. Does anyone need it ? * Why do I need bytea ? Currently, ECPG program can treat binary data for bytea column with 'char' type of C language, but it must convert from/to escaped format with PQunescapeBytea/ PQescapeBytea(). It forces users to add an unnecessary code and to pay cost for the conversion in runtime. # My PoC will not be able to solve output conversion cost. I think that set/put data for host variable should be more simple. The following is an example of Oracle Pro *C program for RAW type column. VARCHAR raw_data[20]; /* preprocessed to the following * struct * { * unsigned short len; * unsigned char arr[20]; * } raw_data; */ raw_data.len = 10; memcpy(raw_data.arr, data, 10); see also: https://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_04dat.htm#i23305 In ECPG, varchar host variable cannot be used for bytea because it cannot treat '\0' as part of data. If the length is set to 10 and there is '\0' at 3rd byte, ecpglib truncates 3rd byte and later at the following: [src/interfaces/ecpg/ecpglib/execute.c] ecpg_store_input(const int lineno, const bool force_indicator, const struct : switch (var->type) : case ECPGt_varchar: if (!(newcopy = (char *) ecpg_alloc(variable->len + 1, lineno))) return false; !! strncpy(newcopy, variable->arr, variable->len); newcopy[variable->len] = '\0'; I also think that the behavior of varchar host variable should not be changed because of compatibility. Therefore, a new type of host variable 'bytea' is needed. Since ecpglib can distinguish between C string and binary data, it can send binary data to backend directly by using 'paramFormats' argument of PQexecParams(). Unfortunately, the conversion of output data cannot be omitted in ecpglib because libpq doesn't provide like 'paramFormats'. ('resultFormat' means that *all* data from backend is formatted by binary or not.) PQexecParams(PGconn *conn, const char *command, int nParams, const Oid *paramTypes, const char *const *paramValues, const int *paramLengths, !! const int *paramFormats, int resultFormat) * How to use new 'bytea' ? ECPG programmers can use almost same as 'varchar' but cannot use as names. (e.g. connection name, prepared statement name, cursor name and so on) - Can use in Declare Section. exec sql begin declare section; bytea data1[512]; bytea data2[DATA_SIZE]; /* can use macro */ bytea send_data[DATA_NUM][DATA_SIZE]; /* can use two dimensional array */ bytea recv_data[][DATA_SIZE]; /* can use flexible array */ exec sql end declare section; - Can *not* use for name. exec sql begin declare section; bytea conn_name[DATA_SIZE]; exec sql end declare section; exec sql connect to :conn_name; !! error - Conversion is not needed in user program. exec sql begin declare section; bytea send_buf[DATA_SIZE]; bytea recv_buf[DATA_SIZE - 13]; int ind_recv; exec sql end declare section; exec sql create table test (data1 bytea); exec sql truncate test; exec sql insert into test (data1) values (:send_buf); exec sql select data1 into :recv_buf:ind_recv from test; /* ind_recv is set to 13. */ * How to preprocess 'bytea' ? 'bytea' is preprocessed almost same as varchar. The following is preprocessed to the next. exec sql begin declare section; bytea data[DATA_SIZE]; bytea send_data[DATA_NUM][DATA_SIZE]; bytea recv_data[][DATA_SIZE]; exec sql end declare section; struct bytea_1 {int len; char arr[DATA_SIZE]} data; struct bytea_2 {int len; char arr[DATA_SIZE]} send_data[DATA_NUM]; struct bytea_3 {int len; char arr[DATA_SIZE]} *recv_data; Thank you for your consideration. Regards Ryo Matsumura
Hackers No one commented to the proposal, but I'm not discouraged. I attach a patch. Please review or comment to proposal. Note: - The patch cannot yet decode escape format data from backend. - [ecpg/test/expected/sql-bytea.stderr] in patch includes non-ascii data. I explain a little about the patch. Preproc: Almost same as varchar. Ecpglib: - ecpg_build_params() Build two more arrays paramlengths and paramformats for PQexecParams(). If input variable type is bytea, set pramformats to 1(= is binary) and set binary data length to paramlengths. - ecpg_store_input() If input variable type is bytea, copy its binary data to ecpg_alloc-ed area directly. - ecpg_get_data() If output variable type is bytea, decode received results to user area. Encode/decode function is imported from backend/utils/adt/encode.c - ECPGset_desc() Currently ecpglib saves data to internal area(struct descriptor_item) for execution phase, but doesn't save type information that is needed in case of bytea. So I add a member is_binary to descriptor_item structure. Thank you. Regards Ryo Matsumura > -----Original Message----- > From: Matsumura, Ryo [mailto:matsumura.ryo@jp.fujitsu.com] > Sent: Monday, October 1, 2018 5:04 PM > To: pgsql-hackers@lists.postgresql.org > Subject: [PROPOSAL]a new data type 'bytea' for ECPG > > Hi, Hackers > > # This is my first post. > > I will try to implement a new data type 'bytea' for ECPG. > I think that the implementation is not complicated. > Does anyone need it ? > > > * Why do I need bytea ? > > Currently, ECPG program can treat binary data for bytea column with 'char' > type > of C language, but it must convert from/to escaped format with PQunescapeBytea/ > PQescapeBytea(). It forces users to add an unnecessary code and to pay cost > for > the conversion in runtime. > # My PoC will not be able to solve output conversion cost. > > I think that set/put data for host variable should be more simple. > The following is an example of Oracle Pro *C program for RAW type column. > > VARCHAR raw_data[20]; > > /* preprocessed to the following > * struct > * { > * unsigned short len; > * unsigned char arr[20]; > * } raw_data; > */ > > raw_data.len = 10; > memcpy(raw_data.arr, data, 10); > > see also: > > https://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_04dat.htm#i2330 > 5 > > In ECPG, varchar host variable cannot be used for bytea because it cannot treat > '\0' as part of data. If the length is set to 10 and there is '\0' at 3rd byte, > ecpglib truncates 3rd byte and later at the following: > > [src/interfaces/ecpg/ecpglib/execute.c] > ecpg_store_input(const int lineno, const bool force_indicator, const struct > : > switch (var->type) > : > case ECPGt_varchar: > if (!(newcopy = (char *) ecpg_alloc(variable->len + 1, lineno))) > return false; > !! strncpy(newcopy, variable->arr, variable->len); > newcopy[variable->len] = '\0'; > > I also think that the behavior of varchar host variable should not be changed > because of compatibility. > Therefore, a new type of host variable 'bytea' is needed. > > Since ecpglib can distinguish between C string and binary data, it can send > binary data to backend directly by using 'paramFormats' argument of > PQexecParams(). > Unfortunately, the conversion of output data cannot be omitted in ecpglib > because > libpq doesn't provide like 'paramFormats'. > ('resultFormat' means that *all* data from backend is formatted by binary > or not.) > > PQexecParams(PGconn *conn, > const char *command, > int nParams, > const Oid *paramTypes, > const char *const *paramValues, > const int *paramLengths, > !! const int *paramFormats, > int resultFormat) > > > > * How to use new 'bytea' ? > > ECPG programmers can use almost same as 'varchar' but cannot use as names. > (e.g. connection name, prepared statement name, cursor name and so on) > > - Can use in Declare Section. > > exec sql begin declare section; > bytea data1[512]; > bytea data2[DATA_SIZE]; /* can use macro */ > bytea send_data[DATA_NUM][DATA_SIZE]; /* can use two dimensional array > */ > bytea recv_data[][DATA_SIZE]; /* can use flexible array */ > exec sql end declare section; > > - Can *not* use for name. > > exec sql begin declare section; > bytea conn_name[DATA_SIZE]; > exec sql end declare section; > > exec sql connect to :conn_name; !! error > > - Conversion is not needed in user program. > > exec sql begin declare section; > bytea send_buf[DATA_SIZE]; > bytea recv_buf[DATA_SIZE - 13]; > int ind_recv; > exec sql end declare section; > > exec sql create table test (data1 bytea); > exec sql truncate test; > exec sql insert into test (data1) values (:send_buf); > exec sql select data1 into :recv_buf:ind_recv from test; > /* ind_recv is set to 13. */ > > > > * How to preprocess 'bytea' ? > > 'bytea' is preprocessed almost same as varchar. > The following is preprocessed to the next. > > exec sql begin declare section; > bytea data[DATA_SIZE]; > bytea send_data[DATA_NUM][DATA_SIZE]; > bytea recv_data[][DATA_SIZE]; > exec sql end declare section; > > struct bytea_1 {int len; char arr[DATA_SIZE]} data; > struct bytea_2 {int len; char arr[DATA_SIZE]} send_data[DATA_NUM]; > struct bytea_3 {int len; char arr[DATA_SIZE]} *recv_data; > > > Thank you for your consideration. > > > Regards > Ryo Matsumura > >
Attachment
Hi Matsumoro-san, thanks for your effort and apologies for the late reply. > I think that set/put data for host variable should be more simple. > The following is an example of Oracle Pro *C program for RAW type > column. Just to be clear, Oracle can use varchars for binary data, right? > In ECPG, varchar host variable cannot be used for bytea because it > cannot treat > '\0' as part of data. If the length is set to 10 and there is '\0' at > 3rd byte, > ecpglib truncates 3rd byte and later at the following: I've been traveling too much to check, but does the standard say anything about that? > I also think that the behavior of varchar host variable should not be > changed > because of compatibility. > Therefore, a new type of host variable 'bytea' is needed. This I am actually not sure about. I think we should try to stick with the standard and, if it does not comment on it, with what others in the market do to make migrations easier. So far I do not remember any database having a bytea datatype in embedded SQL. Comments anyone? Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Hi Michael Thank you for your comment. # Please call me Ryo. Matsumura is too long. > > I think that set/put data for host variable should be more simple. > > The following is an example of Oracle Pro *C program for RAW type > > column. > > Just to be clear, Oracle can use varchars for binary data, right? I'm sorry. That is my mistake. In Pro*C, the data should be represented as hex format C string. > > In ECPG, varchar host variable cannot be used for bytea because it > > cannot treat > > '\0' as part of data. If the length is set to 10 and there is '\0' at > > 3rd byte, > > ecpglib truncates 3rd byte and later at the following: > > I've been traveling too much to check, but does the standard say > anything about that? bytea as a type of table definition may correspond to BLOB in the standard. If it is true, the standard defines corresponding type in C as the following: ------ struct { long hvn_reserved unsigned long hvn_length char hvn_data[L]; } hvn * hvn is the name of the host variable defined to correspond to the SQL data type ------ > > I also think that the behavior of varchar host variable should not be > > changed > > because of compatibility. > > Therefore, a new type of host variable 'bytea' is needed. > This I am actually not sure about. I think we should try to stick with > the standard and, if it does not comment on it, with what others in the > market do to make migrations easier. So far I do not remember any > database having a bytea datatype in embedded SQL. Maybe Oracle doesn't have it. I found documents of DB2. blob(n) correspond to BLOB in V11. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.apdv.embed.doc/doc/r0006090.html Since I don't have DB2 installation, I cannot confirm typedef of blob(n). But in V9 the following structure correspond to BLOB. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.db2.luw.apdv.routines.doc/doc/c0009190.html struct sqludf_lob { sqluint32 length; /* length in bytes */ char data[1]; /* first byte of lob */ }; It seems that there is no defact and no product following to the standards. I wonder whether bytea should follow to the standard completely or follow to existing varchar for usability. Thank you. Regards Ryo Matsumura
Hi Ryo-san, > # Please call me Ryo. Matsumura is too long. Thanks. > In Pro*C, the data should be represented as hex format C string. Just to clarify, there is no special datatype for binary data? > bytea as a type of table definition may correspond to BLOB in the > standard. Would we prefer to add a blob datatype then? > It seems that there is no defact and no product following to the > standards. > I wonder whether bytea should follow to the standard completely or > follow to existing varchar for usability. Do you see any disadvantage of following the standard? I don't really see where the usability drawback is. In general I would prefer being as close to the standard as reasonably possible. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
From: Michael Meskes [mailto:meskes@postgresql.org] > > bytea as a type of table definition may correspond to BLOB in the > > standard. > > Would we prefer to add a blob datatype then? > > > It seems that there is no defact and no product following to the > > standards. > > I wonder whether bytea should follow to the standard completely or > > follow to existing varchar for usability. > > Do you see any disadvantage of following the standard? I don't really > see where the usability drawback is. In general I would prefer being as > close to the standard as reasonably possible. I think the host variable data type that corresponds to the server-side bytea should be bytea. As the following pages stateor imply, it would be better to create standard-compliant LOB types someday, and use the keyword BLOB in ECPG for thattype. The server-side data types should have the names BLOB, CLOB and NCLOB. Those types should handle data largetthan 1 GB and have the locator feature defined in the SQL standard. Maybe we should also advanced LOB features likeOracle's SecureFiles LOB and SQL Server's FileTables. https://www.postgresql.org/docs/current/static/datatype-binary.html "The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The input format is differentfrom bytea, but the provided functions and operators are mostly the same." BinaryFilesInDB https://wiki.postgresql.org/wiki/BinaryFilesInDB Regards Takayuki Tsunakawa
Hi Michael > > In Pro*C, the data should be represented as hex format C string. > > Just to clarify, there is no special datatype for binary data? I apology for lack of research again. Since it's a little difficult to answer, I explain by samples. The following works. unsigned char buffer[128]; /* It's not needed to declare in DECLARE section. */ exec sql var buffer is raw(128); /* This sematics may be different in ECPG. */ exec sql create table test(c1 raw(128)); exec sql insert into test(c1) values(:buffer); The following cannot be pre-compiled. In this sence, there is no special datatype for binary data. exec sql begin declare section; raw buffer[128]; /* error */ exec sql end declare section; exec sql create table test(c1 raw(128)); exec sql insert into test(c1) values(:buffer); > > bytea as a type of table definition may correspond to BLOB in the > > standard. > > Would we prefer to add a blob datatype then? I think that blob datatype is needed for large binary data *in finally*, but blob datatype and its access methods(*) is not needed for non-large binary data(e.g. use for text data of which encoding is different from database encoding) because of its complexity. My proposal is mainly for non-large binary data. (*) e.g. In Pro*C, OPEN, READ, WRITE, CLOSE, APPEND, and so on. > > It seems that there is no defact and no product following to the > > standards. > > I wonder whether bytea should follow to the standard completely or > > follow to existing varchar for usability. > > Do you see any disadvantage of following the standard? I don't really > see where the usability drawback is. In general I would prefer being as > close to the standard as reasonably possible. I think there is no special reason to follow to existing varchar. I can accept the standard. (Re-implementation is not difficult.) Thank you. Regards Ryo Matsumura > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Saturday, October 27, 2018 3:43 AM > To: Matsumura, Ryo/松村 量 <matsumura.ryo@jp.fujitsu.com>; > pgsql-hackers@lists.postgresql.org > Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG > > Hi Ryo-san, > > > # Please call me Ryo. Matsumura is too long. > > Thanks. > > > In Pro*C, the data should be represented as hex format C string. > > Just to clarify, there is no special datatype for binary data? > > > bytea as a type of table definition may correspond to BLOB in the > > standard. > > Would we prefer to add a blob datatype then? > > > It seems that there is no defact and no product following to the > > standards. > > I wonder whether bytea should follow to the standard completely or > > follow to existing varchar for usability. > > Do you see any disadvantage of following the standard? I don't really > see where the usability drawback is. In general I would prefer being as > close to the standard as reasonably possible. > > Michael > -- > Michael Meskes > Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) > Meskes at (Debian|Postgresql) dot Org > Jabber: michael at xmpp dot meskes dot org > VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL >
> From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com] > > I think the host variable data type that corresponds to the server-side bytea > should be bytea. As the following pages state or imply, it would be better > to create standard-compliant LOB types someday, and use the keyword BLOB in > ECPG for that type. The server-side data types should have the names BLOB, > CLOB and NCLOB. Those types should handle data larget than 1 GB and have the > locator feature defined in the SQL standard. Maybe we should also advanced > LOB features like Oracle's SecureFiles LOB and SQL Server's FileTables. Tsunakawa-san, thanks for your advice. I understand that C type definition of client-side bytea is not constrained by the standard BLOB. What should I do next? For now, I attach a patch that is removed noise(pgindent/typedef.list). P.S. The patch does not support ECPG.bytea in sqltype of "struct sqlvar_struct" because of compatibility. Regards Ryo Matsumura
Attachment
Meskes-san Sorry to bother you, but I hope any comment of yours. Regards Ryo Matsumura > Subject: RE: [PROPOSAL]a new data type 'bytea' for ECPG > > > From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com] > > > > I think the host variable data type that corresponds to the server-side bytea > > should be bytea. As the following pages state or imply, it would be better > > to create standard-compliant LOB types someday, and use the keyword BLOB > in > > ECPG for that type. The server-side data types should have the names BLOB, > > CLOB and NCLOB. Those types should handle data larget than 1 GB and have > the > > locator feature defined in the SQL standard. Maybe we should also advanced > > LOB features like Oracle's SecureFiles LOB and SQL Server's FileTables. > > Tsunakawa-san, thanks for your advice. > I understand that C type definition of client-side bytea is not constrained > by the standard BLOB. > > What should I do next? > For now, I attach a patch that is removed noise(pgindent/typedef.list). > > P.S. > The patch does not support ECPG.bytea in sqltype of "struct sqlvar_struct" > because of compatibility. > > Regards > Ryo Matsumura
Matsumura-san, > Sorry to bother you, but I hope any comment of yours. It is no bother. I'm fine with the patch if it does not work against the standard. I do think, though, we should change the debug output for ecpg_free_params(). The way it is now it prints binary values which we also have in our test suite. I'm afraid this will come back to haunt us. > > The patch does not support ECPG.bytea in sqltype of "struct > > sqlvar_struct" > > because of compatibility. Sorry I do not really understand what you mean. Could you please explain? Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san Tnak you for your comment. > I do think, though, we should change the debug output for > ecpg_free_params(). I try to change about it. Next patch will print binary in hex-format. > > > The patch does not support ECPG.bytea in sqltype of "struct > > > sqlvar_struct" > > > because of compatibility. > > Sorry I do not really understand what you mean. Could you please > explain? I meaned that existing applications that receive data of bytea column with using sqlda will encounter their unknown type(=ECPG.bytea) in sqlvar_struct.sqltype. Regards Ryo Matsumura > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Wednesday, December 5, 2018 8:24 PM > To: Matsumura, Ryo/松村 量 <matsumura.ryo@jp.fujitsu.com>; Tsunakawa, > Takayuki/綱川 貴之 <tsunakawa.takay@jp.fujitsu.com> > Cc: pgsql-hackers@lists.postgresql.org > Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG > > Matsumura-san, > > > Sorry to bother you, but I hope any comment of yours. > > It is no bother. > > I'm fine with the patch if it does not work against the standard. > > I do think, though, we should change the debug output for > ecpg_free_params(). The way it is now it prints binary values which we > also have in our test suite. I'm afraid this will come back to haunt > us. > > > > The patch does not support ECPG.bytea in sqltype of "struct > > > sqlvar_struct" > > > because of compatibility. > > Sorry I do not really understand what you mean. Could you please > explain? > > Michael > -- > Michael Meskes > Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) > Meskes at (Debian|Postgresql) dot Org > Jabber: michael at xmpp dot meskes dot org > VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL > >
Matsumura-san, > > I do think, though, we should change the debug output for > > ecpg_free_params(). > > I try to change about it. Next patch will print binary in hex-format. Thank you. > > > > The patch does not support ECPG.bytea in sqltype of "struct > > > > sqlvar_struct" > > > > because of compatibility. > > > > Sorry I do not really understand what you mean. Could you please > > explain? > > I meaned that existing applications that receive data of bytea column > with using sqlda will encounter their unknown type(=ECPG.bytea) in > sqlvar_struct.sqltype. You mean if they are not recompiled? If so, yes, how else could that be handled? Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san > > > > The patch does not support ECPG.bytea in sqltype of "struct > > > > sqlvar_struct" > > > > because of compatibility. > > > > Sorry I do not really understand what you mean. Could you please > > explain? > > I meaned that existing applications that receive data of bytea column > with using sqlda will encounter their unknown type(=ECPG.bytea) in > sqlvar_struct.sqltype. > > You mean if they are not recompiled? If so, yes, how else could that be > handled? Even if they are recompiled, they will fail. switch (sqlvar_struct.sqltype) { case ECPG.int: break; case ECPG.char: break; /* There is no case for ECPG.bytea */ default: abort(); There is an idea as following, but it seems to be ugly. Implement a parameter for ecpglib. The parameter means whether application want to receive bytea data in binary format or not. Default is "not". # I don't know any ecpglib's parameter like it. In other words, if application uses "bytea" type host variable, ecpglib could know its intent, but in case of sqlda ecpglib could not know it. Regards Ryo Matsumura
Matsumura-san, > > I meaned that existing applications that receive data of bytea > > column > > with using sqlda will encounter their unknown type(=ECPG.bytea) in > > sqlvar_struct.sqltype. > > > > You mean if they are not recompiled? If so, yes, how else could > > that be > > handled? > > Even if they are recompiled, they will fail. > > switch (sqlvar_struct.sqltype) > { > case ECPG.int: break; > case ECPG.char: break; > /* There is no case for ECPG.bytea */ > default: abort(); Sorry, I should have been more precise. I meant if they are not recompiled against the new ecpglib which has a case for ECPG.bytea. > There is an idea as following, but it seems to be ugly. > > Implement a parameter for ecpglib. > The parameter means whether application want to receive > bytea data in binary format or not. Default is "not". > # I don't know any ecpglib's parameter like it. > > In other words, if application uses "bytea" type host variable, > ecpglib could know its intent, but in case of sqlda ecpglib could > not know it. I'm at a loss here. I don't understand what you are trying to say. An incompatible change is ok if we change the version number of the library accordingly. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san Maybe I understand your comments about compatibility. I will try to implement for sqlda. # I am not goot at library version mechanism. # I will learn about it. Regards Ryo Matsumura
Meskes-san I noticed that I was confused. My topic was about adding bytea as new host variable type. The topic *didn't* include that receiving binary format data into SQLDATA descriptor like the following. sqlda_t *sqlda; exec sql create table if not exists test (c1 bytea); exec sql select c1 into descriptor sqlda from test; /* It expects that sqlda->sqlvar[0].sqldata is binary format. */ So, please ignore the following in my mail at 2018-11-12 02:14:58. > P.S. > The patch does not support ECPG.bytea in sqltype of "struct sqlvar_struct" > because of compatibility. The topic included that receiving binary data into Named SQL descriptor with using bytea host variable like the following. It has already been implemented in my patch. exec sql begin declare section; bytea var[128]; exec sql end declare section; exec sql create table if not exists test (c1 bytea); exec sql allocate descriptor odesc; exec sql select c1 into sql descriptor odesc from test; exec sql get descriptor odesc value 1 :var = data; Regards Ryo Matsumura
Meskes-san > > I do think, though, we should change the debug output for > > ecpg_free_params(). > > I try to change about it. Next patch will print binary in hex-format. I implement and attach it. Please review a new patch in this mail. I have a question about ecpg manual when I add article for bytea. I wonder what does the following about VARCHAR mean. 35.4.4. Type Mapping Table 35.1. Mapping Between PostgreSQL Data Types and C Variable Types character(n), varchar(n), text | char[n+1], VARCHAR[n+1] [b] <footnote> [b] declared in ecpglib.h There is no declaration for VARCHAR in ecpglib.h. There is a declaration for ECPGt_varchar in ecpgtype.h, but it may be be unusefull information for users. On the other hand, footnote for 'bool' is usefull because there is C-definition macro in ecpglib.h. <footnote> [c] declared in ecpglib.h if not native <ecpglib.h> #ifndef bool #define bool char #endif I think, if the footnote of VARCHAR is meaningless, I remove it while I add the article for bytea. (I didn't remove in this patch.) Regards Ryo Matsumura
Attachment
Meskes-san Sorry to bother you, but I would be grateful if you would comment to me. Regards Ryo Matsumura > -----Original Message----- > From: Matsumura, Ryo [mailto:matsumura.ryo@jp.fujitsu.com] > Sent: Wednesday, December 19, 2018 7:05 PM > <tsunakawa.takay@jp.fujitsu.com> > Cc: pgsql-hackers@lists.postgresql.org > Subject: RE: [PROPOSAL]a new data type 'bytea' for ECPG > > Meskes-san > > > > I do think, though, we should change the debug output for > > > ecpg_free_params(). > > > > I try to change about it. Next patch will print binary in hex-format. > > I implement and attach it. Please review a new patch in this mail. > > > > I have a question about ecpg manual when I add article for bytea. > I wonder what does the following about VARCHAR mean. > > 35.4.4. Type Mapping > Table 35.1. Mapping Between PostgreSQL Data Types and C Variable Types > > character(n), varchar(n), text | char[n+1], VARCHAR[n+1] [b] > <footnote> > [b] declared in ecpglib.h > > There is no declaration for VARCHAR in ecpglib.h. > There is a declaration for ECPGt_varchar in ecpgtype.h, but it may be > be unusefull information for users. > > On the other hand, footnote for 'bool' is usefull because there is > C-definition macro in ecpglib.h. > > <footnote> > [c] declared in ecpglib.h if not native > > <ecpglib.h> > #ifndef bool > #define bool char > #endif > > I think, if the footnote of VARCHAR is meaningless, I remove it while I add > the article for bytea. (I didn't remove in this patch.) > > Regards > Ryo Matsumura
Matsumura-san, > Sorry to bother you, but I would be grateful if you would comment to me. Sorry, I didn't know you were waiting on a reply by me. > > I have a question about ecpg manual when I add article for bytea. > > I wonder what does the following about VARCHAR mean. > > ... > > I think, if the footnote of VARCHAR is meaningless, I remove it while I add > > the article for bytea. (I didn't remove in this patch.) I have no idea where the footnote comes from, but I agree that it doesn't seem to make sense. The datatype varchar in the C code is handled by the preprocessor and replaced by a struct definition anyway. Feel free to remove. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
On Fri, Feb 01, 2019 at 10:29:11AM +0100, Michael Meskes wrote: > I have no idea where the footnote comes from, but I agree that it doesn't seem > to make sense. The datatype varchar in the C code is handled by the > preprocessor and replaced by a struct definition anyway. > > Feel free to remove. Moved to next CF as the discussion moves on. -- Michael
Attachment
Meskes-san Thank you for your comment. I remove it and attach a new patch. Please review it. I feel sorry for asking you to reveiw without contribution. Regards Ryo Matsumura > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Friday, February 1, 2019 6:29 PM > To: Matsumura, Ryo/松村 量 <matsumura.ryo@jp.fujitsu.com> > Cc: 'Michael Meskes' <meskes@postgresql.org>; Tsunakawa, Takayuki/綱川 貴 > 之 <tsunakawa.takay@jp.fujitsu.com>; pgsql-hackers@lists.postgresql.org > Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG > > Matsumura-san, > > > Sorry to bother you, but I would be grateful if you would comment to me. > > Sorry, I didn't know you were waiting on a reply by me. > > > > I have a question about ecpg manual when I add article for bytea. > > > I wonder what does the following about VARCHAR mean. > > > ... > > > I think, if the footnote of VARCHAR is meaningless, I remove it while I > add > > > the article for bytea. (I didn't remove in this patch.) > > I have no idea where the footnote comes from, but I agree that it doesn't seem > to make sense. The datatype varchar in the C code is handled by the > preprocessor and replaced by a struct definition anyway. > > Feel free to remove. > > Michael > -- > Michael Meskes > Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) > Meskes at (Debian|Postgresql) dot Org > Jabber: michael at xmpp dot meskes dot org > VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL >
Attachment
Matsumura-san, > I remove it and attach a new patch. Please review it. > I feel sorry for asking you to reveiw without contribution. Don't worry. There is one thing that I don't understand right now. YOu change ecpg_store_input() to handle the bytea data type, yet you also change ECPGset_desc() to not use ecpg_store_input() in case of an bytea. This looks odd to me. Can you please explain that to me? Thanks Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san Thank you for your review. > There is one thing that I don't understand right now. YOu > change ecpg_store_input() to handle the bytea data type, yet you also > change ECPGset_desc() to not use ecpg_store_input() in case of an > bytea. This looks odd to me. Can you please explain that to me? I try to explain as follows. I would like to receive your comment. The current architecture of data copying of descriptor walks through the following path. The important point is that it walks through two ecpg_store_input(). step 1. ECPGset_desc Store to descriptor_item with ecpg_store_input(). step 2. ecpg_build_params(setup for tobeinserted) Store to tobeinserted with ecpg_store_input(). step 3. ecpg_build_params(building stmt->param*) Set tobeinserted to stmt->paramvalues. On the other hand, the part of ecpg_build_params(building stmt->param*) for bytea needs two information that are is_binary and binary_length. But, data copying with ecpg_store_input() losts them. There are two ideas to pass the information to part of ecpg_build_params(building stmt->param*). But they are same in terms of copying data without ecpg_store_input() at least ones. I selected Idea-1. Idea-1. step 1. ECPGset_desc Set descriptor_item.is_binary. Memcpy both bytea.length and bytea.array to descriptor_item.data. step 2. ecpg_build_params(setup for tobeinserted) Store bytea.array to tobeinserted with ecpg_store_input(bytea route). Set is_binary(local) from descriptor_item.is_binary. Set binary_length(local) from descriptor_item.data. step 3. ecpg_build_params(building stmt->param*) Set stmt->paramvalues from tobeinserted. Set stmt->formats from is_binary(local). Set stmt->lengths from binary_length(local). Idea-2. step 1. ECPGset_desc Set descriptor_item.is_binary. Set bytea.length to descriptor_item.data_len. (different!) Set bytea.array to descriptor_item.data. (different!) step 2. ecpg_build_params(setup for tobeinserted) Memcpy bytea.array to tobeinserted by using alloc and memcpy whitout store_input. (different!) Set is_binary(local) from descriptor_item.is_binary. Set binary_length(local) from descriptor_item.data_len. (different!) step 3. ecpg_build_params(building stmt->param*) Set stmt->paramvalues with tobeinserted. Set stmt->formats from is_binary(local). Set stmt->lengths from binary_length(local). Regards Ryo Matsumura
Matsumura-san, > I try to explain as follows. I would like to receive your comment. > ... I'm afraid I don't really understand your explanation. Why is handling a bytea so different from handling a varchar? I can see some differences due to its binary nature, but I do not understand why it needs so much special handling for stuff like its length? There is a length field in the structure but instead of using it the data field is used to store both, the length and the data. What am I missing? Please keep in mind that I did not write the descriptor code, so I may very well not see the obvious. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san At first, I find my mistake that the following member is not used in my patch. Sorry... [ecpglib_extern.h] 120 struct descriptor_item 130 int data_len; > Why is handling a bytea so different from handling a varchar? Current architecture: Internal expression of varchar is C-string that includes length information implicitly because the length can be computed by strlen(). ECPGdo(ecpg_build_params) assumes that the data in descriptor is C-string encoded. In other hand, bytea data is binary that doesn't include any length information. And the merit of my proposal is that bytea data can be sent to backend without C-string encodeing overhead. They are different points from varchar. I try to explain current data flow and my ideas. # It may not be simple explanation... Current data flow: > /* exec sql set descriptor idesc value 1 data = :binary_var; */ > { ECPGset_desc(__LINE__, "idesc", 1,ECPGd_data, > ECPGt_bytea,&(binary_var),(long)DATA_SIZE,(long)1,sizeof(struct bytea_1), ECPGd_EODT); Ecpglib stores user data into [struct descriptor_item]. Ecpglib stores only C-string encoded data to 'data' member with ecpg_store_input. Of course, if user specifies length(*), ecpg_store_input strncpy() with the length. (*)len member of struct varchar_1 { int len; char arr[ DATA_SIZE ]; } # desctiptor_item has 'type' and 'length' member. But the above statement doesn't set # these fields because I think they should be set by user explicitly as the following: # exec sql set descriptor idesc value 1 length = 3; # I explain later that the above user statement is ignored in result. > /* exec sql execute ins_stmt using sql descriptor idesc; */ > { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_execute, "ins_stmt", > ECPGt_descriptor, "idesc", 1L, 1L, 1L, > ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EOIT, ECPGt_EORT); ecpg_build_params, the first step of ECPGdo, only strcpy() from descriptor_item.data to tobeinserted by ecpg_store_input because the input [struct variable] for ecpg_store_input is always set type='ECPGt_char'. descriptor_item.type and descriptor_item.length are always not used. # varcharsize member is set to value of strlen(descriptor_item.data) but it's ignored # by ecpg_store_input. In that flow, how user binary data is set to tobeinserted without C-string encoding? The premise are the followings: - The length information set by user must be inform upto ecpg_build_params. - The media of the length information from ECPGset_desc to ECPGdo is only [struct descriptor_item]. My Idea-1 in the previous mail is that: - ECPGset_desc copies whole of the struct(*) to descriptor_item.data and sets type information to descriptor_item.is_binary. (*)bytea_a { int len; char arr[DATA_SIZE]; } - ecpg_build_params calls ecpg_store_input for the descriptor_item.data just as the folowing input variable. execute sql insert into foo values(:binary_var); My Idea-2 is that: - ECPGset_desc copies data to descriptor_item.data, set the length to dscriptor_item.data_len and set type information to descriptor_item.is_binary. - ecpg_build_params only memcpy as folowing without ecpg_store_input: if (descriptor_item.is_binary) memcpy(&tobeinserted, descriptor_item.data, descriptor_item.data_len) Thank you. Ryo Matsumura > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Tuesday, February 12, 2019 11:06 PM > To: Matsumura, Ryo/松村 量 <matsumura.ryo@jp.fujitsu.com> > Cc: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@jp.fujitsu.com>; > pgsql-hackers@lists.postgresql.org > Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG > > Matsumura-san, > > > I try to explain as follows. I would like to receive your comment. > > ... > > I'm afraid I don't really understand your explanation. Why is handling > a bytea so different from handling a varchar? I can see some > differences due to its binary nature, but I do not understand why it > needs so much special handling for stuff like its length? There is a > length field in the structure but instead of using it the data field is > used to store both, the length and the data. What am I missing? > > Please keep in mind that I did not write the descriptor code, so I may > very well not see the obvious. > > Michael > -- > Michael Meskes > Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) > Meskes at (Debian|Postgresql) dot Org > Jabber: michael at xmpp dot meskes dot org > VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL > >
Matsumura-san, > Current architecture: > Internal expression of varchar is C-string that includes length > information implicitly > because the length can be computed by strlen(). > ECPGdo(ecpg_build_params) assumes that the data in descriptor is C- > string encoded. > > In other hand, bytea data is binary that doesn't include any length > information. > And the merit of my proposal is that bytea data can be sent to > backend without > C-string encodeing overhead. They are different points from varchar. Yes, I agree with this. But it does not explain why we cannot just add a length parameter. And it neither explains why we need so many if (!bytea) { thisandthat } else { somethingelse } blocks. I would prefer the integration to be smoother. Hopefully that is possible. > My Idea-2 is that: > - ECPGset_desc copies data to descriptor_item.data, set the length to > dscriptor_item.data_len and set type information to > descriptor_item.is_binary. > - ecpg_build_params only memcpy as folowing without ecpg_store_input: > > if (descriptor_item.is_binary) > memcpy(&tobeinserted, descriptor_item.data, > descriptor_item.data_len) Isn't that a better way then? This looks more smoothly to me. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san > Yes, I agree with this. But it does not explain why we cannot just add > a length parameter. And it neither explains why we need so many if > (!bytea) { thisandthat } else { somethingelse } blocks. I would prefer > the integration to be smoother. Hopefully that is possible. I agree that the special route is ugly, but I cannot remove them completely. I try to implement Idea-2. In same time, I try to move if(bytea) blocks to new function for readability. e.g. move the following to new function set_data_attr(). if (var->type != ECPGt_bytea) desc_item->is_binary = false; else { struct ECPGgeneric_varchar *variable = (struct ECPGgeneric_varchar *) (var->value); desc_item->is_binary = true; desc_item->data_len = variable->len; } ecpg_free(desc_item->data); desc_item->data = (char *) tobeinserted; Regards Ryo Matsumura > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Wednesday, February 13, 2019 9:09 PM > To: Matsumura, Ryo/松村 量 <matsumura.ryo@jp.fujitsu.com> > Cc: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@jp.fujitsu.com>; > pgsql-hackers@lists.postgresql.org > Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG > > Matsumura-san, > > > Current architecture: > > Internal expression of varchar is C-string that includes length > > information implicitly > > because the length can be computed by strlen(). > > ECPGdo(ecpg_build_params) assumes that the data in descriptor is C- > > string encoded. > > > > In other hand, bytea data is binary that doesn't include any length > > information. > > And the merit of my proposal is that bytea data can be sent to > > backend without > > C-string encodeing overhead. They are different points from varchar. > > Yes, I agree with this. But it does not explain why we cannot just add > a length parameter. And it neither explains why we need so many if > (!bytea) { thisandthat } else { somethingelse } blocks. I would prefer > the integration to be smoother. Hopefully that is possible. > > > My Idea-2 is that: > > - ECPGset_desc copies data to descriptor_item.data, set the length to > > dscriptor_item.data_len and set type information to > > descriptor_item.is_binary. > > - ecpg_build_params only memcpy as folowing without ecpg_store_input: > > > > if (descriptor_item.is_binary) > > memcpy(&tobeinserted, descriptor_item.data, > > descriptor_item.data_len) > > Isn't that a better way then? This looks more smoothly to me. > > Michael > -- > Michael Meskes > Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) > Meskes at (Debian|Postgresql) dot Org > Jabber: michael at xmpp dot meskes dot org > VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL > >
Matsumura-san, > I agree that the special route is ugly, but I cannot remove them > completely. > I try to implement Idea-2. In same time, I try to move if(bytea) > blocks to > new function for readability. > > e.g. move the following to new function set_data_attr(). I don't think this will help much, don't bother. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san I attach a new patch. - ECPGset_desc and ecpg_build_params are changed. - implement Idea-2. - get if(is_binary)-block out from main flow to new functions (set_desc_attr, store_input_from_desc). Regards Ryo Matsumura
Attachment
Matsumura-san, > I attach a new patch. > ... Thank you so much. This looks very good. Committed to HEAD. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Meskes-san > This looks very good. Committed to HEAD. Thank you so match. Regards Ryo Matsumura