Thread: ECGP - varchar in struct?
I looked at comp.databases.postgresql.questions and posted a brief form of this question there. I am porting an existing Oracle/ PRO-C Application over to PostgreSQL 7.2.1 (ecpg 2.9.0). This application takes advantage of the fact that PRO-C lets you use 'varchar' as part of 'struct' definitions by re-writing 'varchar' during INCLUDE or DECLARE processing either one. This application can (and presently does) make a lot of use of: struct rowdef { ... /* Column */ varchar nameOFcolumn[maxlen] ; ... /* Column */ }; to define an interface buffer. PRO-C will re-write this, during EXEC SQL INCLUDE processing, as: struct rowdef { ... /* Column */ struct {int len; char arr[maxlen];} nameOFcolumn; ... /* Column */ }; This makes it possible to easily declare an I/O buffer that can hold many rows: EXEC SQL BEGIN DECLARE SECTION; struct rowdef ManyRows[25]; EXEC SQL END DECLARE SECTION; and code can easily refer to members of this buffer: /* Length column instance 7 */ ManyRows [7].nameOFcolumn.len; /* Address of column instance 7 */ ManyRows [7].nameOFcolumn.arr; or send them to other compile units: foo (ManyRows + 7); can run 'foo' on row 7 without 'foo' needing to be able to see the ManyRows DECLARE. The 'foo' compile unit only needs the INCLUDE to provide the 'rowdef' struct layout, since INCLUDE processing takes care of the C re-write that a varchar always needs. ==== It would appear that ecpg will only re-write a varchar if it is found in a DECLARE statement, because the varchar gets past the ecpg preprocessor's INCLUDE as-is ... and, of course, the real C compiler does not know what a 'varchar' is, so it complains. As a PostgreSQL 'newbie', I am not sure if it is more work to fix ecpg or to re-write how the application does things. I see it is a 'yacc' unit that handles the re-writing that ecpg now does, but I do not know 'yacc-ese' and do not know what else would be involved to fix ecpg to work like PRO-C? - Did I miss something here? - Any insights/ advice from the gurus? - Would it be a really big deal to support varchar re-writes during INCLUDE and not just during DECLARE?
On Fri, Jul 26, 2002 at 09:38:25PM -0400, William West wrote: > It would appear that ecpg will only re-write a > varchar if it is found in a DECLARE statement, > because the varchar gets past the ecpg preprocessor's > INCLUDE as-is ... and, of course, the real C compiler > does not know what a 'varchar' is, so it complains. Yes, ecpg only parses stuff inside the declare section. A full syntax checking mode would be a nice add-on (and is listed as todo) but IMO it is not really needed urgently. > As a PostgreSQL 'newbie', I am not sure if it is more > work to fix ecpg or to re-write how the application > does things. I see it is a 'yacc' unit that handles the > re-writing that ecpg now does, but I do not know > 'yacc-ese' and do not know what else would be involved > to fix ecpg to work like PRO-C? Yes, it's quite some work. :-) But if you volunteer I gladly accept patches. :-) > - Did I miss something here? Yes, at first you can list you struct definition inside the declare section and it will work nicely. And second you can use a typedef command that AFAIK Pro*C does not have. This should help you without much work. > varchar re-writes during INCLUDE and not > just during DECLARE? Actually it does not have anything to do with INCLUDE or I misunderstood your problem comletely. Hope this helps. Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
>> As a PostgreSQL 'newbie', I am not sure if it is more >> work to fix ecpg or to re-write how the application >> does things. I see it is a 'yacc' unit that handles the >> re-writing that ecpg now does, but I do not know >> 'yacc-ese' and do not know what else would be involved >> to fix ecpg to work like PRO-C? > Yes, it's quite some work. :-) But if you volunteer I gladly accept > patches. :-) The 'encapsulated C' approach would require few changes to Pro-C based application, and would make it easier to support both Oracle and PostgreSQL from the same code-base, if ecpg had a capabilities closer to Pro-C's. But I am leery of the time it would take for me to continue with that more general and portable approach (first upgrade ecpg and then port the application) ... so I have sort of decided to convert to using the libpq (or maybe libpqeasy + libpq) instead ... at least then it is only one project (albeit much bigger than at first expected). Perhaps later, when I have mastered the libpq interface, I'll reconsider taking you up on that, since the 'encapsulated' approach really would be better from a maintenance point of view :-) >> - Did I miss something here? > Yes, at first you can list you struct definition inside the declare > section and it will work nicely. I need for the struct to define a shape and to be able to *separately* instantiate instances (or arrays-of-instances) of that shape. As I understand it, a DECLARE section will instantiate an instance of the struct each time I INCLUDE it in a compile unit, and I am not sure how I would declare a multiple-rows array? To clarify what I am faced with: The existing application has many different structs, each of which has members for the columns affected by any of a number of different realtime message-driven update accesses. That is, there are a couple of dozen different messages, each needing to revise different columns of the tables defined (one of which has in excess of 60 columns). There are also 'update all' messages that need to INSERT or UPDATE into *all* columns of the 60+ columns table. Each message is backed by a struct that has storage shapes for the column(s) to be updated by that message. The functions that perform message data-to-struct (or data-from-struct) operations are in different compile units than are the functions that issue the SQL operations that exchange data with the database. The application currently does pass-by-reference operations among many functions across many separate compile units. Of course, every compile unit having to DECLARE the struct will give each compile unit its own *instance* of the struct and won't allow the application to do the SQL to load/ unload the struct from/to the database in one function in one compile unit and then pass by reference to/ from other functions in other compile units (as the application currently does). In addition, the application processes some messages that work with many rows from a given table all at once. Being unable to separate 'row shapes' (struct layouts) from instantiation thereof (DECLARE thereof) appears to make it difficult or impossible to instantiate front-end space for arrays-of-rows. It also appears to make it impossible to have ecpg issue SELECTS/ INSERTS/ UPDATES that affect many rows as well as columns with each front-end to back-end SQL exchange (each EXEC SQL) ... something that the present application does a *lot* of and that - at least with Oracle - gives an order of magnitude performance improvement. It appears to me that the libpq layer front-end to back-end interface is able to pass an open-ended number of columns and rows per PQexec() call ... that the problem would be a vectoring/ mapping definition capable of properly delivering PQgetvalue() returns into stuct members. That is why, unless I am still missing something, I am inclined to convert to a direct-to-libpq interface (or maybe a hybrid of libpgeasy plus libpq). Does that seem reasonable, given the present organization of the application?
On Sun, Aug 11, 2002 at 05:14:07PM -0400, William West wrote: > The 'encapsulated C' approach would require few changes to > Pro-C based application, and would make it easier to support > both Oracle and PostgreSQL from the same code-base, if ecpg > had a capabilities closer to Pro-C's. But moving to ECPG's encapsulated approach does not prevent the app from running under Pro*C, i.e. Pro*C can also work with everything listed inside the declare section etc. > But I am leery of the time it would take for me to continue with > that more general and portable approach (first upgrade ecpg and > then port the application) ... so I have sort of decided to convert > to using the libpq (or maybe libpqeasy + libpq) instead ... at > least then it is only one project (albeit much bigger than at > first expected). This certainly will be much more time to port the app. If you have a Pro*C app, porting it to ECPG will almost definitely be less work than proting to libpq. > I need for the struct to define a shape and to be able to > *separately* instantiate instances (or arrays-of-instances) > of that shape. But you can add the definition for each instance. Yes, C doesn't require this, but it does not forbid it either. Also you can use a typedef to define the struct so ecpg does the job of adding the definition. > As I understand it, a DECLARE section will instantiate an > instance of the struct each time I INCLUDE it in a compile unit, What I mean is instaed of struct foo {...}; struct foo a; struct foo b; you can use exec sql begin declare section; struct foo {...} a; exec sql end declare section; exec sql begin declare section; struct foo {...} b; exec sql end declare section; or instead exec sql typedef sf struct foo {...}; exec sql begin declare section; sf a; exec sql end declare section; exec sql begin declare section; sf b; exec sql end declare section; > and I am not sure how I would declare a multiple-rows array? I'm not sure what you mean with this. Also I have to admit that I didn't completely understand your mail in terms of seeing the problem that you cannot solve with ecpg. Maybe a code example would help me understand it better. > It also appears to make it impossible to have ecpg issue SELECTS/ INSERTS/ > UPDATES that affect many rows as well as columns with each front-end to > back-end SQL exchange (each EXEC SQL) ... something that the present > application does a *lot* of and that - at least with Oracle - gives an > order of > magnitude performance improvement. Once again I do not understand that. Do you mean for instance one select that reads several tuples at the same time? > It appears to me that the libpq layer front-end to back-end interface > is able to pass an open-ended number of columns and rows per PQexec() > call ... that the problem would be a vectoring/ mapping definition capable > of > properly delivering PQgetvalue() returns into stuct members. But that's exactly what ecpg does. It just encapsulates the libpq calls. It seems I did not fully understand your question nor answer it accordingly. :-) Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
I really appreciate your time. I know a lot of what I am saying must sound silly, due to how low on the learning curve I am. > exec sql typedef sf struct foo {...}; > > exec sql begin declare section; > sf a; > exec sql end declare section; This looks promising for what I am trying to do. Is 'exec sql typedef' ecpg specific? I do not remember seeing it called out as anything special in the books I have looked at? I have looked at Bruce Momjian's "PostgreSQL - Introduction and Concepts" book, the Geshwinde-Shonig "Developers Handbook" put out by SAMS, and the 'PostgreSQL 7.2 Programmers Guide' from the postgesql.org website. Anyway, I tried: exec sql typedef sf struct foo {int intOne; int intTwo}; and I get: ERROR: parse error, unexpected IDENT at or near "typedef" from ecpg? I am using ecpg 2.9.0 (Postgres 7.2.1). =============== > exec sql begin declare section; > struct foo {...} a; > exec sql end declare section; I can see that I have a very fundamental understanding problem or misconception of some kind, in how one can/ should use structures inside of an ecpg declare section, and appreciate your patience (and suppose I am giving you and others a good laugh). I know I am being prejudiced by what a structure inside a declare section means to Pro*C, which is "here is the layout of some columns that I intend to get from or write to a row". It seems that ecpg does not have any such understanding about the meaning of a struct and its members and that is (I'm sure) one of my biggest problems. To see what I am up against, I tried to make a struct defining two varchar columns, per your example, but then I could not for the life of me figure out how to use the struct. Here is what I setup: exec sql begin declare section; struct foo {varchar col1[15]; varchar col2[20];} bar; exec sql end declare section; and then made a simple two varchar columns table: exec sql create table mytable (col1 varchar(15), col2 varchar(20)); and then checked that this gave me a new row containing Literal1 in col1 and Literal2 in col2: exec sql insert into mytable values ('Literal1', 'Literal2'); That worked, so then I tried to make ecpg work the way Pro*C works (bear with me). I setup 'bar' in the declarations section by: sprintf(bar.col1.arr, "Variable1") bar.col1.len = strlen(bar.col1.arr); sprintf(bar.col2.len, "Variable1") bar.col2.len = strlen(bar.col2.arr); then tried to load a row with Variable1 in col1 and Variable2 in col2 by: exec sql insert into mytable values (:bar); Pro*C would take this to mean to put the present values of bar.col1 and bar.col2 into columns 1 and 2 of a new row added to mytable. This builds cleanly in ecpg, seems to emit a sensible ECPGdo() call, but gives "-201, too many arguments" at run time. I can't figure out what the library wants ... ECPGdo() is emitted with two vectors, one to each of the struct members? I then tried to figure out how to mention the struct members individually in the values() clause: exec sql insert into mytable values (:bar.col1.arr, :bar.col2.arr); But this causes ecpg to issue the error "'bar.col1.arr' is not declared". I can't figure out what ecpg would want, to be able to refer to the individual members of the struct? ============================= So as you can see, I surely have a some kind of very big gap in my understanding of how one would/ could use structures in the declare section of ecpg in ways that even remotely resemble how the existing Pro*C based application now uses structures in the declare section. To make my job even harder, Pro*C uses its understanding of "struct is row"/ "member of struct is column" in order to support arrays-of-rows exchanges between the front-end and the back-end. Pro*C supports an "EXEC SQL FOR :batchsize" construct that allows you to INSERT, UPDATE, or DELETE many rows with one "EXEC SQL". The current application, to exploit this capability, is full of declarations that are roughly equivalent to: exec sql begin declare section; struct foo {varchar col1[15]; varchar col2[20];} bar [ROWS]; exec sql end declare section; where 'ROWS' establishes the maximum batch size. The Pro*C understanding of the meaning of a structure in the declare section (as signifying a row) and its members (as signifying a column) is fundamental to being able to declare space for batched operations via C-Language array declaration syntax. Claims have been made that this can increase an Oracle application's performance by a factor of 100 or more. Indeed, with our application, using a modest ':batchsize' (ROWS value), we observe an increase in performance of a factor of 10 or more over one front-end/ back-end exchange per insert/ delete/ update. I know this is already *VERY* long, and if you got this far, thanks. If you have not lost patience, here is a "simplified as much as possible" example of how the application is presently organized, in an attempt to show why the application needs for 'row shaping' to be independent of 'declare space for row': /* begin row_shape.h */ typedef struct { varchar col1 [15]; ... varchar coln [140]; } row_shape_t; /* end row shape .h */ /* begin multi_row_insert.pc */ === DB INTERFACE COMPILE UNIT === #define ROWS_N 50 exec sql include row_shape.h int multi_row_insert (msg_t *msg_p) { row_shape_t *row_p; exec sql begin declare section; int row_n; row_shape_t rowray [ROWS_N]; exec sql end declare section; for (row_n = 0, row_p = rowray; row_n < ROWS_N; row_n++, row_p++) { if (!load_row_from_msg (msg_p, row_p, row_n)) break; } exec sql begin transaction; exec sql whenever sqlerror goto errexit; /* inserts 'row_n' rows into "table_in_question" */ exec sql for :row_n insertinto table_in_question values (:rowray); exec sql commit work; return row_n; errexit: exec sql rollback work; return -1; } /* end multi_row_insert.pc */ === MESSAGE HANDLING COMPILE UNIT === /* begin load_row_from_msg.pc */ exec sql include row_shape.h /* returns TRUE if message loaded another row_shape_t */ /* returns FALSE if message has no more rows */ bool load_row_from_msg (msg_t *msg_p, row_shape_t *row_p, int row_n) { /* Message indicates how many rows it has in msg_p->rows_n */ if (msg_p->rows_n < row_n) return FALSE; memset (row_p, 0, sizeof (*row_p)); /* Transfers message data for each column of the * current row into the current row_shape_t */ memcpy(row_p->col1.arr, msg_p[row_n]col1_data, msg_p[row_n]col1_data_len); row_p->col1.len = msg_p[row_n]col1_data_len); .... memcpy(row_p->coln.arr, msg_p[row_n]coln_data, msg_p[row_n]coln_data_len); row_p->coln.len = msg_p[row_n]coln_data_len); return TRUE } /* end load_row_from_msg.pc */ ============ Lack of the "exec sql for :batchsize" capability is a big problem to overcome, and inability to treat structs as 'row shapes' is another big problem to overcome ... partly because of the inability to pass rows by reference across different compile units and partly because there is no longer any way to clear all columns in row (prior to insert) by using 'memset()'. The combined effect of no longer being able to pass-by-reference, needing to perform one front-end to back-end exchange for each row insert/ update/ delete, and needing to clear each and every column buffer separately puts us in danger of being unable to handle the necessary realtime message arrival rates. This fear of the effects on performance, along with the scope of application-internal interface changes that seem to be needed, is what is making me inclined to consider, much as I hate it, going to libpq and/ or libpqeasy directly. I really do need to get going due to deadline pressures, and would hate to make a dumb/ bad decision about the basic approach just because of something I am missing or do not correctly understand.
On Mon, Aug 12, 2002 at 09:53:03PM -0400, William West wrote: > I really appreciate your time. I know a lot of what I am saying > must sound silly, due to how low on the learning curve I am. Actually it sounds very reasonable, I just don't fully understand it. > > exec sql typedef sf struct foo {...}; Oops. This is incorrect syntax, must read: exec sql type sf is struct foo {...}; > This looks promising for what I am trying to do. Is > 'exec sql typedef' ecpg specific? I do not remember seeing > it called out as anything special in the books I have looked at? Informix has it also. > exec sql typedef sf struct foo {int intOne; int intTwo}; > > and I get: > > ERROR: parse error, unexpected IDENT at or near "typedef" Sure. My fault. Should be "exec sql type ... is ...;" > Here is what I setup: > ... > exec sql insert into mytable values (:bar); Hmm, that should work. Could you please send me the complete file? > Pro*C would take this to mean to put the present values of > bar.col1 and bar.col2 into columns 1 and 2 of a new row > added to mytable. This builds cleanly in ecpg, seems to emit > a sensible ECPGdo() call, but gives "-201, too many arguments" > at run time. I can't figure out what the library wants ... ECPGdo() > is emitted with two vectors, one to each of the struct members? Yes, that should be the correct ECPGdo() call. > I then tried to figure out how to mention the struct members > individually in the values() clause: > > exec sql insert into mytable values (:bar.col1.arr, :bar.col2.arr); > > But this causes ecpg to issue the error "'bar.col1.arr' is not > declared". I can't figure out what ecpg would want, to be > able to refer to the individual members of the struct? To get them individually you have to use: exec sql insert into mytable values (:bar.col1, :bar.col2); > where 'ROWS' establishes the maximum batch size. The Pro*C > understanding of the meaning of a structure in the declare section > (as signifying a row) and its members (as signifying a column) is > fundamental to being able to declare space for batched operations > via C-Language array declaration syntax. You can use arrays of struct with ecpg as well. Please look into ecpg/test/*.pgc for some examples. > Lack of the "exec sql for :batchsize" capability is a big problem to > overcome, Hmm, looks like a good idea to implement in ecpg. > This fear of the effects on performance, along with the scope of > application-internal > interface changes that seem to be needed, is what is making me inclined to > consider, > much as I hate it, going to libpq and/ or libpqeasy directly. Of course there may be reasons to use libpq, but I will certainly work on minimizing these. As soon as I learn about a missing feature and find time I will implement it. Now that won't help you of course. :-) Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
I have worked out a method to export references to C-Variables in a DECLARE section such that I can pass pointer-to-struct for the 'ecpg' compile unit to other compile units that do message and communications processing (as the original Oracle based package now does). I did this by writing a tool to parse the CREATE TABLE psql SQL directives that setup the database tables, to emit a C-Language structure definition into one file that non-ecpg compile units can '#include', while emitting a separate ecpg syntax DECLARE statement into another file that the ecpg compile unit can EXEC SQL INCLUDE; the two are guaranteed to be synchronized with each other because they both come from the same CREATE TABLE statement via the tool. ==== However, I am having a problem with the SET arguments to UPDATE. I have a DECLARE section that looks like this: EXEC SQL BEGIN DECLARE SECTION; static struct hcs_status_info_dd { varchar h_f242a_opsts_3c [3]; } H_hcs_status_info; EXEC SQL END DECLARE SECTION; which ecpg translates to: /* exec sql begin declare section */ static struct hcs_status_info_dd { struct varchar_h_f242a_opsts_3c { int len; char arr[ 3 ]; } h_f242a_opsts_3c ; } H_hcs_status_info ; /* exec sql end declare section */ (In the real system, the structure has a much larger number of members, because the external message communication and processing compile units need to access a much larger number of columns). ecpg does all the things I wanted for SELECT, INSERT, etc ... but I cannot get the 'SET' arguments needed for UPDATE to work. I have an UPDATE statement that looks like this (again, simplified because the 'real'; system has *many* members-of-structure needing to be SET): EXEC SQL UPDATE hcs_status_info SET f242a_opsts_3c = :H_hcs_status_info.h_f242a_opsts_3c; ecpg translates this (chopped a little to isolate the variable references) into: { ECPGdo(__LINE__, NULL, "update hcs_status_info set f242a_opsts_3c = ? ", ECPGt_varchar, &(H_hcs_status_info.h_f242a_opsts_3c), 3L,1L, sizeof(struct varchar_H_hcs_status_info.h_f242a_opsts_3c), ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EOIT, ECPGt_EORT);} The "&(H_hcs_status_info.h_f242a_opsts_3c)" reference is exactly correct to get to the 'f242a_opsts_3c' column value. But I get an error from gcc (rightly so) because it does not like "sizeof(struct varchar_H_hcs_status_info.h_f242a_opsts_3c)" because the whole structure is not a vachar, only the member-of-structure is a varchar. In fact, the struct name "varchar_H_hcs_status_info" does *not* exist, it is the struct name "varchar_h_f242a_opsts_3c" that *does* exist. I believe the expression (to be correct for C) would need to read something like: "sizeof(struct varchar_h_f242a_opsts_3c)" -or- "sizeof (H_hcs_status_info.h_f242a_opsts_3c)" -?or?- ==== Is there any approach that can get around this? Note that it is far too impractical to make those varchar members separate variables, due to the need to 'lasso' them all into a structure, so the address of the structure can be passed around to other compile units that do communications and message processing. ==== I have attached a tarbun that has the entire .pgc, the '.c' file that I get from ecpg, and a makefile file (upd_hcs_tst.m) that runs .pcg through ecpg, and then gcc the .c file. (See attached file: upd_tst.tar.gz)
Attachment
On Tue, Aug 27, 2002 at 08:11:58PM -0400, William West wrote: > However, I am having a problem with the SET > arguments to UPDATE. > ... This appears to be a bug in ecpg. It should only list the part behind the dot. But I won't be able to look for it for at least another week sorry. Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!