Thread: Proposed new libpq API
I've been thinking about what changes are necessary to the libpq interface to support returning variable type tuples. This was discussed a number of months back, but an exact interface wasn't nailed down.

Let me then put forward the following suggestion, open for comment. The suggestion is very similar to the original Postgres solution to this problem. What I have added is some consideration of how a streaming interface should work, and hopefully I will incorporate that enhancement while I'm at it.

Into libpq will be (re)introduced the concept of a group. Tuples which are returned will be from a finite number of different layouts.

Thus there will be an API PQnfieldsGroup(PGresult, group_num), and similar for PQftypeGroup etc. There will be a PQgroup(PGresult, tuple_num) which will tell you which group any given tuple belongs to.

To support streaming of results a new function PQflush(PGresult, tuple_num) would be introduced. It discards previous results that are cached. PQexec would be changed so that it doesn't absorb the full result set straight away like it does now (*). Instead it would only absorb results on a need-to basis when calling, say, PQgetValue.

Currently you might read results like this...

PGresult *res = PQexec("select * from foo");
for (int i = 0; i < PQntuples(res); i++) {
    printf("%s\n", PQgetValue(res, i, 0));
}

It has the disadvantage that all the results are kept in memory at once. This code would in the future be modified to be...

PGresult *res = PQexec("select * from foo");
for (int i = 0; i < PQntuples(res); i++) {
    printf("%s\n", PQgetValue(res, i, 0));
    PQflush(res);   // NEW NEW
}

Now PQexec doesn't absorb all the results at once. PQgetValue will read them on a need-to basis. PQflush will discard each result through the loop. I could also write...
PGresult *res = PQexec("select * from foo");
for (int i = 0; i < PQntuples(res); i++) {
    printf("%s\n", PQgetValue(res, i, 0));
    if (i % 20 == 0) {
        PQflush(res, -1);
    }
}

In this case the results are cached in chunks of 20. Or I could write...

PGresult *res = PQexec("select * from foo");
for (int i = 0; i < PQntuples(res); i++) {
    printf("%s\n", PQgetValue(res, i, 0));
    PQflush(res, i - 20);
}

In this case the last 20 tuples are kept in memory at any one time as a sliding window. If I try to access something out of range of the current cache I get a NULL result.

Back to the multiple tuple return types issue. psql code may do something like...

int currentGroup = -1, group;
PGresult *res = PQexec(someQuery);
for (int i = 0; i < PQntuples(res); i++) {
    group = PQgroup(res, i);
    if (group != currentGroup) {
        printHeaders(res, group);
        currentGroup = group;
    }
    for (j = 0; j < PQnfieldsGroup(res, group); j++) {
        printf("%s |", PQgetValue(res, i, j));
    }
    printf("\n");
    PQflush(res);
}

printHeaders(PGresult *res, int group)
{
    for (j = 0; j < PQnfieldsGroup(res, group); j++) {
        printf("%s |", PQfnameGroup(res, group, j));
    }
    printf("\n");
}

This would print different result types with appropriate headers...

create table a (aa text);
create table b under a (bb text);
select ** from a;

aa |
----
foo
jar

aa | bb
-------
bar|baz
boo|bon

(*) Assuming that this doesn't unduly affect current behaviour. I can't see that it would, but if it would, another API would be needed: PQexecStream.
Follow up: Where it says...

PGresult *res = PQexec("select * from foo");
for (int i = 0; i < PQntuples(res); i++) {
    printf("%s\n", PQgetValue(res, i, 0));
    PQflush(res);      // NEW NEW
}

It should say...

PGresult *res = PQexec("select * from foo");
for (int i = 0; i < PQntuples(res); i++) {
    printf("%s\n", PQgetValue(res, i, 0));
    PQflush(res, -1);  // NEW NEW
}

The -1 argument signifies "flush everything". A specific number signifies "flush everything below this threshold", where the threshold is a tuple number.
Chris Bitmead wrote:
> I've been thinking about what changes are necessary to the libpq
> interface to support returning variable type tuples. This was
> discussed a number of months back but an exact interface wasn't nailed
> down.

Let me propose an additional possible solution for the most common case needing to return multiple types of tuples, the case of select **: just have a tupletype for each tuple, possibly as an implied field, and return NULL for missing fields (returning nulls is cheap -- each only occupies one bit), so that

SELECT user
UNION
SELECT nextval('myseq');

would return a result with the following structure:

type() | user (text) | nextval(int)
-----------------------------------
t1     | postgres    | NULL
t2     | NULL        | 1

Such a way of returning tuples could possibly make non-OO folks happy too, as the result will still be table-shaped ;)

> Let me then put forward the following suggestion open for comment. The
> suggestion is very similar to the original postgres solution to this
> problem. What I have added is some consideration of how a streaming
> interface should work, and hopefully I will incorporate that
> enhancement while I'm at it.
>
> Into libpq will be (re)introduced the concept of a group. Tuples which
> are returned will be from a finite number of different layouts.
>
> Thus there will be an API PQnfieldsGroup(PGresult, group_num). And
> similar for PQftypeGroup etc. There will be a PQgroup(PGresult,
> tuple_num) which will tell you which group any given tuple belongs to.

Seems good ;). Will the group carry only structure, or will it have some "higher" meaning -- i.e. will rows selected from two different tables with the same structure be in the same group?

----------
Hannu
Hannu Krosing wrote:
> Let me propose an additional possible solution for the most common case
> needing to return multiple types of tuples, the case of select **:
> just have a tupletype for each tuple, possibly as an implied field,
> and return NULL for missing fields (returning nulls is cheap -- each
> only occupies one bit), so that
>
> SELECT user
> UNION
> SELECT nextval('myseq');
>
> would return a result with the following structure:
>
> type() | user (text) | nextval(int)
> -----------------------------------
> t1     | postgres    | NULL
> t2     | NULL        | 1
>
> Such a way of returning tuples could possibly make non-OO folks happy
> too, as the result will still be table-shaped ;)

What is the essence of your suggestion? The libpq interface, the protocol, or the formatting for psql?

The main problem I can see with the way your idea is going is that if a class has a few dozen subclasses, each with a few dozen fields, you could end up with a couple of thousand resulting columns. That, and it doesn't seem very OO.

> Will the group carry only structure or will it have some "higher"
> meaning -- i.e. will rows selected from two different tables with the
> same structure be in the same group?

That is the one thing in my mind I'm not certain of. At the moment I will say that aspect is undefined. Hopefully a clearer answer will emerge once it is actually working.

--
Chris Bitmead
mailto:chris@bitmead.com
http://www.techphoto.org - Photography News, Stuff that Matters
Okay, first thing off the top of my head ... how does this deal with backward compatibility, or have we just blown all old apps out of the water?

On Wed, 5 Jul 2000, Chris Bitmead wrote:
> I've been thinking about what changes are necessary to the libpq
> interface to support returning variable type tuples. This was
> discussed a number of months back but an exact interface wasn't nailed
> down.
>
> Let me then put forward the following suggestion open for comment. The
> suggestion is very similar to the original postgres solution to this
> problem. What I have added is some consideration of how a streaming
> interface should work, and hopefully I will incorporate that
> enhancement while I'm at it.
>
> Into libpq will be (re)introduced the concept of a group. Tuples which
> are returned will be from a finite number of different layouts.
>
> Thus there will be an API PQnfieldsGroup(PGresult, group_num). And
> similar for PQftypeGroup etc. There will be a PQgroup(PGresult,
> tuple_num) which will tell you which group any given tuple belongs to.
>
> To support streaming of results a new function PQflush(PGresult,
> tuple_num) would be introduced. It discards previous results that are
> cached. PQexec would be changed so that it doesn't absorb the full
> result set straight away like it does now (*). Instead it would only
> absorb results on a need-to basis when calling, say, PQgetValue.
>
> [...]

Marc G. Fournier           ICQ#7615664            IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker wrote:
> Okay, first thing off the top of my head ... how does this deal with
> backward compatibility, or have we just blown all old apps out of the
> water?

There's no issue with compatibility, unless you can see one. It's all backwards compatible.

> On Wed, 5 Jul 2000, Chris Bitmead wrote:
> > I've been thinking about what changes are necessary to the libpq
> > interface to support returning variable type tuples. This was
> > discussed a number of months back but an exact interface wasn't
> > nailed down.
> > [...]
Chris Bitmead wrote:
> Hannu Krosing wrote:
> > Let me propose an additional possible solution for the most common
> > case needing to return multiple types of tuples, the case of
> > select **: just have a tupletype for each tuple, possibly as an
> > implied field, and return NULL for missing fields (returning nulls
> > is cheap -- each only occupies one bit), so that
> >
> > SELECT user
> > UNION
> > SELECT nextval('myseq');
> >
> > would return a result with the following structure:
> >
> > type() | user (text) | nextval(int)
> > -----------------------------------
> > t1     | postgres    | NULL
> > t2     | NULL        | 1
> >
> > Such a way of returning tuples could possibly make non-OO folks happy
> > too, as the result will still be table-shaped ;)
>
> What is the essence of your suggestion? The libpq interface, the
> protocol, or the formatting for psql?

I was hoping it would cover all of them, but it may not be that simple on closer inspection ;(

> The main problem I can see with the way your idea is going is that if a
> class has a few dozen subclasses, each with a few dozen fields, you
> could end up with a couple of thousand resulting columns.

Yes. In fact I will end up with that number anyway, only each tuple does not have all of them in the case of returning multiple types of tuples. I still insist that the _overhead_ of returning such columns is quite small, as each null is only one _bit_.

> That and it doesn't seem very OO.

No, it does not, unless we pretend that what "SELECT **" returns is all superobjects which in fact do have all the NULL fields, only they have value NULL :)

OTOH, doing things that way could "hide" the OO-ness from tools that don't like it.

-------

BTW, how does one subscribe to the pgsql-oo@postgresql.org list? I tried, but my response mail said something like "processing your subscription successful, you are _NOT_ subscribed to list". I got the same result with other new lists ;(

----------
Hannu
The Hermit Hacker wrote:
> Okay, first thing off the top of my head ... how does this deal with
> backward compatibility, or have we just blown all old apps out of the
> water?

As I see it, it is compatible until you _need_ the new features, like SELECT ** or a UNION of selects with different structure.

I have also posted a different/additional proposal that hopefully hides this from old apps completely.

---------
Hannu
On Wed, 5 Jul 2000, Chris Bitmead wrote:
> The Hermit Hacker wrote:
> > Okay, first thing off the top of my head ... how does this deal with
> > backward compatibility, or have we just blown all old apps out of
> > the water?
>
> There's no issue with compatibility, unless you can see one. It's all
> backwards compatible.

Okay, I'm definitely missing something then ...

> > > Currently you might read results like this...
> > >
> > > PGresult *res = PQexec("select * from foo");
> > > for (int i = 0; i < PQntuples(res); i++) {
> > >     printf("%s\n", PQgetValue(res, i, 0));
> > > }
> > >
> > > It has the disadvantage that all the results are kept in memory at
> > > once. This code would in the future be modified to be...
> > >
> > > PGresult *res = PQexec("select * from foo");
> > > for (int i = 0; i < PQntuples(res); i++) {
> > >     printf("%s\n", PQgetValue(res, i, 0));
> > >     PQflush(res);  // NEW NEW
> > > }

What is the PQflush() for here? I took it to mean that it was required, but then reading further down, it just sounds like it flushes what's already been used and would be optional?

Doesn't this just do what CURSORs already do then? Run the query, fetch what you need, etc?
> Okay, first thing off the top of my head ... how does this deal with
> backward compatibility, or have we just blown all old apps out of the
> water?

We have a long tradition of blowing old apps out of the water :)

As Chris mentioned, his suggestion closely resembles code which used to be in libpq but was excised due to lack of interest at the time. With the upcoming "query tree redesign", the concept of collections of "tuple sets" will (hopefully) become more clear in the backend, to support, say, distributed databases and also Chris' OO interests.

- Thomas
The Hermit Hacker wrote:
> What is the PQflush() for here? I took it to mean that it was required,
> but then reading further down, it just sounds like it flushes what's
> already been used and would be optional?
>
> Doesn't this just do what CURSORs already do then? Run the query, fetch
> what you need, etc?

There is similarity to cursors, but there is no need to go to the trouble of using a cursor to get a lot of the benefits, chief of which is that you don't have to slurp it all into memory at once. I believe this is how most DBMS interfaces work; with MySQL, for example, you can only fetch the next record, you can't get random access to the whole result set. This means memory usage is very small. Postgres memory usage will be huge. It shouldn't be necessary to resort to cursors to scale.

So what PQflush is proposed to do is limit the amount that is cached. It discards earlier results. If you flush after every sequential access, then you only have to use enough memory for a single record. If you use PQflush you no longer have random access to earlier results.

Other designs are possible, like some interface for getting the next record one at a time and examining it. The idea of this proposal is to make the current random access interface and a streaming interface very interoperable, and to be able to mix and match them together. You can take a current Postgres app, and provided it doesn't actually rely on random access (which I would hazard to say most don't), just by adding the one line of code, PQflush, greatly reduce memory consumption. Or you can mix and match and see a sliding window of the most recent X tuples. Or you can just ignore this and use the current features.
On Thu, 6 Jul 2000, Chris Bitmead wrote:
> The Hermit Hacker wrote:
> > What is the PQflush() for here? I took it to mean that it was
> > required, but then reading further down, it just sounds like it
> > flushes what's already been used and would be optional?
> >
> > Doesn't this just do what CURSORs already do then? Run the query,
> > fetch what you need, etc?
>
> There is similarity to cursors, but there is no need to go to the
> trouble of using a cursor to get a lot of the benefits, chief of which
> is that you don't have to slurp it all into memory at once. I believe
> this is how most DBMS interfaces work; with MySQL, for example, you can
> only fetch the next record, you can't get random access to the whole
> result set. This means memory usage is very small. Postgres memory
> usage will be huge. It shouldn't be necessary to resort to cursors to
> scale.
>
> So what PQflush is proposed to do is limit the amount that is cached.
> It discards earlier results. If you flush after every sequential
> access, then you only have to use enough memory for a single record.
> If you use PQflush you no longer have random access to earlier results.
>
> [...]

Okay, just playing devil's advocate here, that's all ... not against the changes, just want to make sure that all bases are covered ...

One last comment ... when you say 'random access', are you saying that I can't do a 'PQexec()' to get the results for a SELECT, use a for loop to go through those results, and then start from i=0 to go through that loop again without having to do a new SELECT on it?

Marc G. Fournier           ICQ#7615664            IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker wrote:
> One last comment ... when you say 'random access', are you saying that
> I can't do a 'PQexec()' to get the results for a SELECT, use a for loop
> to go through those results, and then start from i=0 to go through that
> loop again without having to do a new SELECT on it?

Random access means that the whole query result is in memory. If you choose to use PQflush then you will no longer be able to go back to 0 and re-iterate. If you don't use PQflush then you can do what you do now, which is go back and iterate through. Using PQflush amounts to declaring that you don't need to do that.
If I were implementing this in C++, I would have the result object return a different generic STL iterator (forward, random access, etc.) depending on how I wanted to access the data. Perhaps you could emulate this in C. I generally don't like the one-interface-fits-all approach; you get a much cleaner and more extensible interface if you introduce a type for each class of behavior being modeled.

T.

--
Timothy H. Keitt
National Center for Ecological Analysis and Synthesis
735 State Street, Suite 300, Santa Barbara, CA 93101
Phone: 805-892-2519, FAX: 805-892-2510
http://www.nceas.ucsb.edu/~keitt/
"Timothy H. Keitt" wrote: > > If I were implementing this in C++, I would have the result object > return a different generic STL iterator (forward, random access, etc.) > depending on how I wanted to access the data. Perhaps you could emulate > this in C. I generally don't like the one-interface-fits-all approach; > you get a much cleaner and extensible interface if you introduce a type > for each class of behavior being modeled. If we want to relagate the current API to the status of "legacy", and build something all-new and well thought out, then this could be done. I'd certainly be willing to do this, but what is the consensus? If I came up with something completely different but better would the rest of the team be happy to make the current interface legacy? Or do we want a compromise (like what Peter Eisentraut suggests perhaps), or do we want something that slots into the current world view with minimum disruption? (what I have suggested).
On Thu, 6 Jul 2000, Chris Bitmead wrote:
> "Timothy H. Keitt" wrote:
> > If I were implementing this in C++, I would have the result object
> > return a different generic STL iterator (forward, random access,
> > etc.) depending on how I wanted to access the data. Perhaps you could
> > emulate this in C. I generally don't like the one-interface-fits-all
> > approach; you get a much cleaner and more extensible interface if you
> > introduce a type for each class of behavior being modeled.
>
> If we want to relegate the current API to the status of "legacy" and
> build something all-new and well thought out, then this could be done.
> I'd certainly be willing to do this, but what is the consensus?

Could we create some sort of libpq2, maintain libpq for a release or two, and then slowly fade it out? Or maybe have a configure switch (--enable-libpq-compat)?
Chris Bitmead wrote:
> "Timothy H. Keitt" wrote:
> > If I were implementing this in C++, I would have the result object
> > return a different generic STL iterator (forward, random access,
> > etc.) depending on how I wanted to access the data. Perhaps you could
> > emulate this in C. I generally don't like the one-interface-fits-all
> > approach; you get a much cleaner and more extensible interface if you
> > introduce a type for each class of behavior being modeled.
>
> If we want to relegate the current API to the status of "legacy" and
> build something all-new and well thought out, then this could be done.
> I'd certainly be willing to do this, but what is the consensus?

Being designed as an extensible RDBMS, Postgres should IMHO also support multiple protocol modules. I would like one that follows standard CLI/ODBC/JDBC conventions; an XML-RPC based one would also be nice.

We could do it by giving the requested protocol at connection startup and then talking to that backend module afterwards, or we could have different protocols listening on different ports.

-----------
Hannu
On Thu, 6 Jul 2000, Chris Bitmead wrote:
> The Hermit Hacker wrote:
> > One last comment ... when you say 'random access', are you saying
> > that I can't do a 'PQexec()' to get the results for a SELECT, use a
> > for loop to go through those results, and then start from i=0 to go
> > through that loop again without having to do a new SELECT on it?
>
> Random access means that the whole query result is in memory. If you
> choose to use PQflush then you will no longer be able to go back to 0
> and re-iterate. If you don't use PQflush then you can do what you do
> now, which is go back and iterate through. If you use PQflush it means
> that you don't need to do that.

Okay, that sounds cool ... since nobody does a PQflush() during a for() iteration now (I don't believe), old apps are fine, and this does add a nice level of functionality as far as memory usage is concerned ...

Marc G. Fournier           ICQ#7615664            IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org