Thread: ECPG FETCH readahead

Re: ECPG FETCH readahead

From

Bruce Momjian

Date:

23 June 2010, 17:42:46

Boszormenyi Zoltan wrote:
> Hi,
> 
> we improved ECPG quite a lot in 9.0 because we worked and
> still working with an Informix to PostgreSQL migration project.
> 
> We came across a pretty big performance problem that can be seen in
> every "naive" application that uses only FETCH 1, FETCH RELATIVE
> or FETCH ABSOLUTE. These are almost the only FETCH variations
> usable in Informix, i.e. it doesn't have the grammar for fetching N rows
> at once. Instead, the Client SDK libraries do caching themselves
> behind the scenes to reduce network turnaround time.

I assume our ecpg version supports >1 fetch values, even in Informix
mode.  Does it make sense to add lots of code to our ecpg then?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + None of us is going to be here forever. +

Re: ECPG FETCH readahead

From

Böszörményi Zoltán

Date:

24 June 2010, 04:27:26

Hi,

2010-06-23 22:42 keltezéssel, Bruce Momjian írta:
> Boszormenyi Zoltan wrote:
>    
>> Hi,
>>
>> we improved ECPG quite a lot in 9.0 because we worked and
>> still working with an Informix to PostgreSQL migration project.
>>
>> We came across a pretty big performance problem that can be seen in
>> every "naive" application that uses only FETCH 1, FETCH RELATIVE
>> or FETCH ABSOLUTE. These are almost the only FETCH variations
>> usable in Informix, i.e. it doesn't have the grammar for fetching N rows
>> at once. Instead, the Client SDK libraries do caching themselves
>> behind the scenes to reduce network turnaround time.
>>      
> I assume our ecpg version supports>1 fetch values, even in Informix
> mode.  Does it make sense to add lots of code to our ecpg then?
>    

I think, yes, it does make sense. Because we are talking
about porting a whole lot of COBOL applications.
The ESQL/C or ECPG connector was already written
the Informix quirks in mind, so it fetches only one record
at a time passing it to the application. And similar performance
is expected from ECPG - which excpectation is not fulfilled
currently because libecpg doesn't do the same caching as
ESQL/C does.

And FYI, I haven't added a whole lot of code, most of the
code changes in the patch is execute.c refactoring.
ECPGdo() was split into several functions, the new parts
are still doing the same things.

I can make the test case much smaller, I only needed
to test crossing the readahead window boundary.
This would also make the patch much smaller.

And this readahead is not on by default, it's only activated
by "ecpg -r fetch_readahead".

Best regards,
Zoltán Böszörményi

Re: ECPG FETCH readahead

From

Heikki Linnakangas

Date:

24 June 2010, 06:04:44

On 24/06/10 10:27, Böszörményi Zoltán wrote:
> And this readahead is not on by default, it's only activated
> by "ecpg -r fetch_readahead".

Is there a reason not to enable it by default? I'm a bit worried that it 
will receive no testing if it's not always on.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: ECPG FETCH readahead

From

Böszörményi Zoltán

Date:

24 June 2010, 06:27:19

2010-06-24 11:04 keltezéssel, Heikki Linnakangas írta:
> On 24/06/10 10:27, Böszörményi Zoltán wrote:
>> And this readahead is not on by default, it's only activated
>> by "ecpg -r fetch_readahead".
>
> Is there a reason not to enable it by default? I'm a bit worried that 
> it will receive no testing if it's not always on.

Because in the first step I wanted to minimize the impact
on regression test stderr results. This is what I mentioned
in the initial mail, I stuck to the original wording of ecpg_log()
messages in the split-up parts of the original ECPGdo() and
ecpg_execute() exactly for this reason. The usual policy for
ecpg_log() is to report the function name where it was issued.

I was also thinking about a new feature for pg_regress,
to compare stdout results of two regression tests automatically
so a difference can be reported as an error. It would be good
for automated testing of features in ECPG that can be toggled,
like auto-prepare and fetch readahead. It might come in handy
in other subsystems, too.

Best regards,
Zoltán Böszörményi

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

24 June 2010, 09:15:55

On Wed, Jun 23, 2010 at 04:42:37PM -0400, Bruce Momjian wrote:
> I assume our ecpg version supports >1 fetch values, even in Informix
> mode.  Does it make sense to add lots of code to our ecpg then?

Yes, it does. The big question that zoltan and I haven't figured out yet, is
whether it makes sense to add all this even for native ecpg mode.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber meskes@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

24 June 2010, 09:19:00

> I think, yes, it does make sense. Because we are talking
> about porting a whole lot of COBOL applications.

COBOL???

> The ESQL/C or ECPG connector was already written
> the Informix quirks in mind, so it fetches only one record
> at a time passing it to the application. And similar performance
> is expected from ECPG - which excpectation is not fulfilled
> currently because libecpg doesn't do the same caching as
> ESQL/C does.

Eh, you are talking about a program you wrote for your customer or they wrote
themselves, right? I simply refuse to add this stuff only to fix this situation
for that one customer of yours if it only hits them. Now the thing to discuss
is how common is this situation.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber meskes@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

24 June 2010, 09:19:32

On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
> Is there a reason not to enable it by default? I'm a bit worried
> that it will receive no testing if it's not always on.

Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber meskes@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

PostgreSQL - Hans-Jürgen Schönig

Date:

24 June 2010, 09:54:13

On Jun 24, 2010, at 2:13 PM, Michael Meskes wrote:

>> I think, yes, it does make sense. Because we are talking
>> about porting a whole lot of COBOL applications.
>
> COBOL???
>

yes, COBOL :).
it is much more common than people think.
it is not the first COBOL request for PostgreSQL hitting my desk.
in our concrete example we are using a C module written with ECPG which is magically attached to tons of COBOL code ...

>> The ESQL/C or ECPG connector was already written
>> the Informix quirks in mind, so it fetches only one record
>> at a time passing it to the application. And similar performance
>> is expected from ECPG - which excpectation is not fulfilled
>> currently because libecpg doesn't do the same caching as
>> ESQL/C does.
>
> Eh, you are talking about a program you wrote for your customer or they wrote
> themselves, right? I simply refuse to add this stuff only to fix this situation
> for that one customer of yours if it only hits them. Now the thing to discuss
> is how common is this situation.
>
> Michael

i think that this cursor issue is a pretty common thing for many codes.
people are usually not aware of the fact that network round trips and parsing which are naturally related to "FETCH 1"
area lot more expensive than fetching one row somewhere deep inside the DB engine.  
out there there are many applications which fetch data row by row. if an app fetches data row by row in PostgreSQL it
willbe A LOT slower than in, say, Informix because most commercial database clients will cache data inside a cursor
behindthe scenes to avoid the problem we try to solve. 

currently we are talking about a performance penalty of factor 5 or so. so - it is not a small thing; it is a big
difference.
regards,
    hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

Re: ECPG FETCH readahead

From

Böszörményi Zoltán

Date:

24 June 2010, 10:10:37

2010-06-24 14:13 keltezéssel, Michael Meskes írta:
>> I think, yes, it does make sense. Because we are talking
>> about porting a whole lot of COBOL applications.
>>      
> COBOL???
>    

Yes, OpenCOBOL...

>> The ESQL/C or ECPG connector was already written
>> the Informix quirks in mind, so it fetches only one record
>> at a time passing it to the application. And similar performance
>> is expected from ECPG - which excpectation is not fulfilled
>> currently because libecpg doesn't do the same caching as
>> ESQL/C does.
>>      
> Eh, you are talking about a program you wrote for your customer or they wrote
> themselves, right? I simply refuse to add this stuff only to fix this situation
> for that one customer of yours if it only hits them. Now the thing to discuss
> is how common is this situation.
>    

The OpenCOBOL database connector was written by them
but the problem is more generic. There are many "naive"
applications (elsewhere, too) using cursors but fetching
one record at a time perhaps for portability reasons.
This patch provides a big performance boost for those.

Best regards,
Zoltán Böszörményi

Re: ECPG FETCH readahead

From

Robert Haas

Date:

14 October 2010, 06:43:20

On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
> On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
>> Is there a reason not to enable it by default? I'm a bit worried
>> that it will receive no testing if it's not always on.
>
> Yes, there is a reason, namely that you don't need it in native mode at all.
> ECPG can read as many records as you want in one go, something ESQL/C
> apparently cannot do. This patch is a workaround for that restriction. I still
> do not really see that this feature gives us an advantage in native mode
> though.

So, who has the next action on this patch?  Does Zoltan need to revise
it, or does Michael need to review it, or where are we?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

14 October 2010, 07:12:42

Hi,

Robert Haas írta:
> On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
>   
>> On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
>>     
>>> Is there a reason not to enable it by default? I'm a bit worried
>>> that it will receive no testing if it's not always on.
>>>       
>> Yes, there is a reason, namely that you don't need it in native mode at all.
>> ECPG can read as many records as you want in one go, something ESQL/C
>> apparently cannot do. This patch is a workaround for that restriction. I still
>> do not really see that this feature gives us an advantage in native mode
>> though.
>>     
>
> So, who has the next action on this patch?  Does Zoltan need to revise
> it, or does Michael need to review it, or where are we?
>   

Michael reviewed it shortly in private and I need to send
a new revision anyway, regardless of his comments.
I will refresh it ASAP.

Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de    http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

29 December 2011, 05:46:53

Hi,

2011-11-16 20:51 keltezéssel, Boszormenyi Zoltan írta:

2010-10-14 11:56 keltezéssel, Boszormenyi Zoltan írta:

Hi,

Robert Haas írta:

On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:

On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:

Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.

Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.

So, who has the next action on this patch?  Does Zoltan need to revise
it, or does Michael need to review it, or where are we?

Michael reviewed it shortly in private and I need to send
a new revision anyway, regardless of his comments.
I will refresh it ASAP.

The ASAP took a little long. The attached patch is in git diff format,
because (1) the regression test intentionally doesn't do ECPGdebug()
so the patch isn't dominated by a 2MB stderr file, so this file is empty
and (2) regular diff cannot cope with empty new files.

Anyway, here is the new version with a new feature.
Implementation details:

- New ecpglib/cursor.c handles the client-side accounting of cursors.
- Functions in ecpglib/execute.c are split into smaller functions so useful operations can be used by the new cursor.c
- ecpg -r fetch_readahead enables readahead by default for all cursors.
- Default readahead size is 256 rows, it can be modified by an environment variable, ECPGFETCHSZ.
- *NEW FEATURE* Readahead can be individually enabled or disabled by ECPG-side grammar: DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ... Without [ NO ] READAHEAD, the default behaviour is used for cursors.
- Since the server and the client may disagree on the cursor position if readahead is used, ECPG preprocessor throws an error if WHERE CURRENT OF is used on such cursors.
- The default assumed behaviour of cursors with readahead is NO SCROLL. If you want readahead and backward scrolling, SCROLL keyword is mandatory.

The regression test creates a table with 513 rows, so it spans 2 full and
1 incomplete readahead window. It reads the table with two cursors, one
with readahead and one without by 1 records forward and backward.
This is repeated with reading 5 records at a time. Then the whole table is
read into a chain of sqlda structures forward and backward. All other
regression tests pass as well.

The original regression tests also pass with these changes, the split of
execute.c was risky in this regard. Now the split is done more cleanly
than in the previous version, the file is not as rearranged as before.

New patch attached against yesterday's GIT tree. Changes:
- documented the new ECPG_INVALID_CURSOR error code
- consistently free everything in error paths in cursor.c


Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de    http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-9.2-git-v5.patch

Re: ECPG FETCH readahead

From

Noah Misch

Date:

02 March 2012, 12:41:33

On Thu, Dec 29, 2011 at 10:46:23AM +0100, Boszormenyi Zoltan wrote:
> 2011-11-16 20:51 keltez?ssel, Boszormenyi Zoltan ?rta:
> > 2010-10-14 11:56 keltez?ssel, Boszormenyi Zoltan ?rta:
> >>> On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
> >>>> On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
> >>>>> Is there a reason not to enable it by default? I'm a bit worried
> >>>>> that it will receive no testing if it's not always on.
> >>>>>       
> >>>> Yes, there is a reason, namely that you don't need it in native mode at all.
> >>>> ECPG can read as many records as you want in one go, something ESQL/C
> >>>> apparently cannot do. This patch is a workaround for that restriction. I still
> >>>> do not really see that this feature gives us an advantage in native mode
> >>>> though.

We yet lack a consensus on whether native ECPG apps should have access to the
feature.  My 2c is to make it available, because it's useful syntactic sugar.
If your program independently processes each row of an arbitrary-length result
set, current facilities force you to add an extra outer loop to batch the
FETCHes for every such code site.  Applications could define macros to
abstract that pattern, but this seems common-enough to justify bespoke
handling.  Besides, minimalists already use libpq directly.

I suggest enabling the feature by default but drastically reducing the default
readahead chunk size from 256 to, say, 5.  That still reduces the FETCH round
trip overhead by 80%, but it's small enough not to attract pathological
behavior on a workload where each row is a 10 MiB document.  I would not offer
an ecpg-time option to disable the feature per se.  Instead, let the user set
the default chunk size at ecpg time.  A setting of 1 effectively disables the
feature, though one could later re-enable it with ECPGFETCHSZ.

> > The ASAP took a little long. The attached patch is in git diff format,
> > because (1) the regression test intentionally doesn't do ECPGdebug()
> > so the patch isn't dominated by a 2MB stderr file, so this file is empty
> > and (2) regular diff cannot cope with empty new files.

Avoid the empty file with fprintf(STDERR, "Dummy non-empty error output\n");

> > - *NEW FEATURE* Readahead can be individually enabled or disabled
> >   by ECPG-side grammar:
> >         DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ...
> >   Without [ NO ] READAHEAD, the default behaviour is used for cursors.

Likewise, this may as well take a chunk size rather than a yes/no.


The patch adds warnings:
error.c: In function `ecpg_raise':
error.c:281: warning: format not a string literal and no format arguments
error.c:281: warning: format not a string literal and no format arguments

The patch adds few comments and no larger comments explaining its higher-level
ideas.  That makes it much harder to review.  In this regard it follows the
tradition of the ECPG code, but let's depart from that tradition for new work.
I mention a few cases below where the need for commentary is acute.

I tested full reads of various "SELECT * FROM generate_series(1, $1)" commands
over a 50ms link, and the patch gives a sound performance improvement.  With
no readahead, a mere N=100 takes 5s to drain.  With readahead enabled, N=10000
takes only 3s.  Performance was quite similar to that of implementing my own
batching with "FETCH 256 FROM cur".  When I kicked up ECPGFETCHSZ (after
fixing its implementation -- see below) past the result set size, performance
was comparable to that of simply passing the query through psql.

> --- a/doc/src/sgml/ecpg.sgml
> +++ b/doc/src/sgml/ecpg.sgml

> @@ -5289,6 +5315,17 @@ while (1)
>      </varlistentry>
>  
>      <varlistentry>
> +     <term>-231 (<symbol>ECPG_INVALID_CURSOR</symbol>)</term>
> +     <listitem>
> +      <para>
> +       The cursor you are trying to use with readahead has not been opened yet (SQLSTATE 34000),
> +       invalid values were passed to libecpg (SQLSTATE 42P11) hor not in prerequisite state, i.e.

Typo.

> --- /dev/null
> +++ b/src/interfaces/ecpg/ecpglib/cursor.c
> @@ -0,0 +1,730 @@

cursor.c contains various >78-col lines.  pgindent has limited ability to
improve those, so please split them.

> +static struct cursor_descriptor *
> +add_cursor(int lineno, struct connection *con, const char *name, bool scrollable, int64 n_tuples, bool *existing)
> +{
> +    struct cursor_descriptor *desc,
> +                *ptr, *prev = NULL;
> +    bool    found = false;
> +
> +    if (!name || name[0] == '\0')
> +    {
> +        if (existing)
> +            *existing = false;
> +        return NULL;
> +    }
> +
> +    ptr = con->cursor_desc;
> +    while (ptr)
> +    {
> +        int ret = strcasecmp(ptr->name, name);
> +
> +        if (ret == 0)
> +        {
> +            found = true;
> +            break;
> +        }
> +        if (ret > 0)
> +            break;
> +
> +        prev = ptr;
> +        ptr = ptr->next;
> +    }

Any reason not to use find_cursor() here?

> +static void
> +del_cursor(struct connection *con, const char *name)
> +{
> +    struct cursor_descriptor *ptr, *prev = NULL;
> +    bool    found = false;
> +
> +    ptr = con->cursor_desc;
> +    while (ptr)
> +    {
> +        int ret = strcasecmp(ptr->name, name);
> +
> +        if (ret == 0)
> +        {
> +            found = true;
> +            break;
> +        }
> +        if (ret > 0)
> +            break;
> +
> +        prev = ptr;
> +        ptr = ptr->next;
> +    }

Any reason not to use find_cursor() here?

> +bool
> +ECPGopen(const int lineno, const int compat, const int force_indicator,
> +        const char *connection_name, const bool questionmarks,
> +        const char *curname, const int st, const char *query, ...)
> +{
> +    va_list        args;
> +    bool        ret, scrollable;
> +    char       *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
> +    struct sqlca_t *sqlca = ECPGget_sqlca();
> +
> +    if (!query)
> +    {
> +        ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
> +        return false;
> +    }
> +    ptr = strstr(query, "for ");
> +    if (!ptr)
> +    {
> +        ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
> +        return false;
> +    }
> +    whold = strstr(query, "with hold ");
> +    dollar0 = strstr(query, "$0");
> +
> +    noscroll = strstr(query, "no scroll ");
> +    scroll = strstr(query, "scroll ");

A query like 'SELECT 1 AS "with hold "' fools these lexical tests.  Capture
that information in the parser rather than attempting to reconstruct it here.

> +    scrollable = (noscroll == NULL) && (scroll != NULL) && (scroll < ptr);
> +
> +    new_query = ecpg_alloc(strlen(curname) + strlen(ptr) + (whold ? 10 : 0) + 32, lineno);
> +    if (!new_query)
> +        return false;
> +    sprintf(new_query, "declare %s %s cursor %s%s",
> +                    (dollar0 && (dollar0 < ptr) ? "$0" : curname),
> +                    (scrollable ? "scroll" : "no scroll"),
> +                    (whold ? "with hold " : ""),
> +                    ptr);
> +
> +    /* Set the fetch size the first time we are called. */
> +    if (fetch_size == 0)
> +    {
> +        char       *fsize_str = getenv("ECPGFETCHSZ");
> +        char       *endptr = NULL;
> +        int        fsize;
> +
> +        if (fsize_str)
> +        {
> +            fsize = strtoul(fsize_str, &endptr, 10);
> +            if (endptr || (fsize < 4))
> +                fetch_size = DEFAULTFETCHSIZE;

"endptr" will never be NULL; use "*endptr".  As it stands, the code always
ignores ECPGFETCHSZ.  An unusable ECPGFETCHSZ should procedure an error, not
silently give no effect.  Why a minimum of 4?

> +            else
> +                fetch_size = fsize;
> +        }
> +        else
> +            fetch_size = DEFAULTFETCHSIZE;
> +    }
> +
> +    va_start(args, query);
> +    ret = ecpg_do(lineno, compat, force_indicator, connection_name, questionmarks, st, new_query, args);
> +    va_end(args);
> +
> +    ecpg_free(new_query);
> +
> +    /*
> +     * If statement went OK, add the cursor and discover the
> +     * number of rows in the recordset. This will slow down OPEN
> +     * but we gain a lot with caching.
> +     */
> +    if (ret /* && sqlca->sqlerrd[2] == 0 */)

Why is the commented code there?

> +    {
> +        struct connection *con = ecpg_get_connection(connection_name);
> +        struct cursor_descriptor *cur;
> +        bool    existing;
> +        int64    n_tuples;
> +
> +        if (scrollable)
> +        {
> +            PGresult   *res;
> +            char       *query;
> +            char       *endptr = NULL;
> +
> +            query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
> +            sprintf(query, "move all in %s", curname);
> +            res = PQexec(con->connection, query);
> +            n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
> +            PQclear(res);
> +            ecpg_free(query);
> +
> +            /* Go back to the beginning of the resultset. */
> +            query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
> +            sprintf(query, "move absolute 0 in %s", curname);
> +            res = PQexec(con->connection, query);
> +            PQclear(res);
> +            ecpg_free(query);
> +        }
> +        else
> +        {
> +            n_tuples = 0;
> +        }

You give this rationale for the above code:

On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
> ECPGopen() also discovers the total number of records in the recordset,
> so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
> didn't report the (possibly estimated) number of rows in the resultset
> is now
> overcome. This slows down OPEN for cursors serving larger datasets
> but it makes possible to position the readahead window using MOVE
> ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
> variants are used by the application. And the caching is more than
> overweighs
> the slowdown in OPEN it seems.

From the documentation for Informix and Oracle, those databases do not
populate sqlerrd[2] this way:
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139

The performance impact will vary widely depending on the query cost per row
and the fraction of rows the application will actually retrieve.  Consider a
complex aggregate returning only a handful of rows.  Consider SELECT * on a
1B-row table with the application ceasing reads after 1000 rows.  Performance
aside, this will yield double execution of any volatile functions involved.
So, I think we ought to diligently avoid this step.  (Failing that, the
documentation must warn about the extra full cursor scan and this feature must
stay disabled by default.)

> +
> +        /* Add the cursor */
> +        cur = add_cursor(lineno, con, curname, scrollable, n_tuples, &existing);
> +
> +        /*
> +         * Report the number of tuples for the [scrollable] cursor.
> +         * The server didn't do it for us.
> +         */
> +        sqlca->sqlerrd[2] = (cur->n_tuples < LONG_MAX ? cur->n_tuples : LONG_MAX);
> +    }
> +
> +    return ret;
> +}
> +
> +static bool
> +ecpg_cursor_execute(struct statement * stmt, struct cursor_descriptor *cur)
> +{
> +    char        tmp[64];
> +    char       *query;
> +    int64        start_pos;
> +
> +    if ((cur->cache_pos >= cur->start_pos) && cur->res && (cur->cache_pos < cur->start_pos + PQntuples(cur->res)))
> +    {
> +        stmt->results = cur->res;
> +        ecpg_free_params(stmt, true, stmt->lineno);
> +        return true;
> +    }

Why does ecpg_cursor_execute() also call ecpg_free_params()?  Offhand, it
seems that ECPGfetch() always takes care of that and is the more appropriate
place, seeing it's the one calling ecpg_build_params().

> --- a/src/interfaces/ecpg/ecpglib/data.c
> +++ b/src/interfaces/ecpg/ecpglib/data.c
> @@ -120,7 +120,7 @@ check_special_value(char *ptr, double *retval, char **endptr)
>  }
>  
>  bool
> -ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
> +ecpg_get_data(const PGresult *results, int var_index, int act_tuple, int act_field, int lineno,
>                enum ECPGttype type, enum ECPGttype ind_type,
>                char *var, char *ind, long varcharsize, long offset,
>                long ind_offset, enum ARRAY_TYPE isarray, enum COMPAT_MODE compat, bool force_indicator)

This function could sure use a block comment such as would be customary in
src/backend.  Compare the one at heap_update(), for example.

> --- a/src/interfaces/ecpg/ecpglib/error.c
> +++ b/src/interfaces/ecpg/ecpglib/error.c
> @@ -268,6 +268,20 @@ ecpg_raise(int line, int code, const char *sqlstate, const char *str)
>                       ecpg_gettext("could not connect to database \"%s\" on line %d"), str, line);
>              break;
>  
> +        case ECPG_INVALID_CURSOR:
> +            if (strcmp(sqlstate, ECPG_SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE) == 0)
> +                snprintf(sqlca->sqlerrm.sqlerrmc, sizeof(sqlca->sqlerrm.sqlerrmc),
> +
> +            /*
> +             * translator: this string will be truncated at 149 characters
> +             * expanded.
> +             */
> +                ecpg_gettext("cursor can only scan forward"));

Every other message works in the line number somehow; this should do the same.

> --- a/src/interfaces/ecpg/ecpglib/execute.c
> +++ b/src/interfaces/ecpg/ecpglib/execute.c

> @@ -1707,46 +1809,20 @@ ecpg_execute(struct statement * stmt)
>  }
>  
>  bool
> -ECPGdo(const int lineno, const int compat, const int force_indicator, const char *connection_name, const bool
questionmarks,const int st, const char *query,...)
 
> +ecpg_do_prologue(int lineno, const int compat, const int force_indicator,
> +        const char *connection_name, const bool questionmarks,
> +        enum ECPG_statement_type statement_type, const char *query,
> +        va_list args, struct statement **stmt_out)

A block comment would especially help this function, considering the name
tells one so little.

> --- a/src/interfaces/ecpg/ecpglib/extern.h
> +++ b/src/interfaces/ecpg/ecpglib/extern.h
> @@ -60,6 +60,12 @@ struct statement
>      bool        questionmarks;
>      struct variable *inlist;
>      struct variable *outlist;
> +    char        *oldlocale;
> +    const char    **dollarzero;
> +    int        ndollarzero;
> +    const char    **param_values;
> +    int        nparams;
> +    PGresult    *results;
>  };

Please comment the members of this struct like we do in most of src/include.
dollarzero has something to do with dynamic cursor names, right?  Does it have
other roles?

> --- a/src/interfaces/ecpg/preproc/ecpg.header
> +++ b/src/interfaces/ecpg/preproc/ecpg.header

> @@ -111,6 +115,26 @@ mmerror(int error_code, enum errortype type, const char *error, ...)
>  }
>  
>  /*
> + * set use_fetch_readahead based on the current cursor
> + * doesn't return if the cursor is not declared
> + */
> +static void
> +set_cursor_readahead(const char *curname)
> +{
> +    struct cursor *ptr;
> +
> +    for (ptr = cur; ptr != NULL; ptr = ptr->next)
> +    {
> +        if (strcmp(ptr->name, curname) == 0)
> +            break;
> +    }
> +    if (!ptr)
> +        mmerror(PARSE_ERROR, ET_FATAL, "cursor \"%s\" does not exist", curname);
> +
> +    use_fetch_readahead = ptr->fetch_readahead;
> +}

Following add_additional_variables(), use strcasecmp() for literal cursor
names and strcmp() for cursor name host variables.

> --- a/src/interfaces/ecpg/preproc/extern.h
> +++ b/src/interfaces/ecpg/preproc/extern.h
> @@ -24,12 +24,17 @@ extern bool autocommit,
>              force_indicator,
>              questionmarks,
>              regression_mode,
> -            auto_prepare;
> +            auto_prepare,
> +            fetch_readahead;
> +extern bool    use_fetch_readahead;

The names of the last two variables don't make clear the difference between
them.  I suggest default_fetch_readahead and current_fetch_readahead.

> --- /dev/null
> +++ b/src/interfaces/ecpg/test/preproc/cursor-readahead.pgc
> @@ -0,0 +1,244 @@
> +#include <stdio.h>
> +#include <malloc.h>

Why <malloc.h>?  I only see this calling free(); use <stdlib.h> instead.

Thanks,
nm

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

04 March 2012, 11:33:54

Hi,

first, thank you for answering and for the review.

2012-03-02 17:41 keltezéssel, Noah Misch írta:
> On Thu, Dec 29, 2011 at 10:46:23AM +0100, Boszormenyi Zoltan wrote:
>> 2011-11-16 20:51 keltez?ssel, Boszormenyi Zoltan ?rta:
>>> 2010-10-14 11:56 keltez?ssel, Boszormenyi Zoltan ?rta:
>>>>> On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
>>>>>> On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
>>>>>>> Is there a reason not to enable it by default? I'm a bit worried
>>>>>>> that it will receive no testing if it's not always on.
>>>>>>>
>>>>>> Yes, there is a reason, namely that you don't need it in native mode at all.
>>>>>> ECPG can read as many records as you want in one go, something ESQL/C
>>>>>> apparently cannot do. This patch is a workaround for that restriction. I still
>>>>>> do not really see that this feature gives us an advantage in native mode
>>>>>> though.
> We yet lack a consensus on whether native ECPG apps should have access to the
> feature.

I don't even remember about any opinion on this matter.
So, at this point  don't know whether it's lack of interest.
We also have a saying "silence means agreement". :-)

>   My 2c is to make it available, because it's useful syntactic sugar.

Thanks, we thought the same.

> If your program independently processes each row of an arbitrary-length result
> set, current facilities force you to add an extra outer loop to batch the
> FETCHes for every such code site.  Applications could define macros to
> abstract that pattern, but this seems common-enough to justify bespoke
> handling.

We have similar opinions.

>   Besides, minimalists already use libpq directly.

Indeed. On the other hand, ECPG provides a safety net with syntax checking
so it's useful for not minimalist types. :-)

> I suggest enabling the feature by default but drastically reducing the default
> readahead chunk size from 256 to, say, 5.


>   That still reduces the FETCH round
> trip overhead by 80%, but it's small enough not to attract pathological
> behavior on a workload where each row is a 10 MiB document.

I see. How about 8? Nice "round" power of 2 value, still small and avoids
87.5% of overhead.

BTW, the default disabled behaviour was to avoid "make check" breakage,
see below.

>   I would not offer
> an ecpg-time option to disable the feature per se.  Instead, let the user set
> the default chunk size at ecpg time.  A setting of 1 effectively disables the
> feature, though one could later re-enable it with ECPGFETCHSZ.

This means all code previously going through ECPGdo() would go through
ECPGopen()/ECPGfetch()/ECPGclose(). This is more intrusive and all
regression tests that were only testing certain features would also
test the readahead feature, too.

Also, the test for WHERE CURRENT OF at ecpg time would have to be done
at runtime, possibly making previously working code fail if ECPGFETCHSZ is enabled.

How about still allowing "NO READAHEAD" cursors that compile into plain ECPGdo()?
This way, ECPGFETCHSZ don't interfere with WHERE CURRENT OF. But this would
mean code changes everywhere where WHERE CURRENT OF is used.

Or how about a new feature in the backend, so ECPG can do   UPDATE/DELETE ... WHERE OFFSET N OF cursor
and the offset of computed from the actual cursor position and the position known
by the application? This way an app can do readahead and do work on rows collected
by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
behind the scenes.

>
>>> The ASAP took a little long. The attached patch is in git diff format,
>>> because (1) the regression test intentionally doesn't do ECPGdebug()
>>> so the patch isn't dominated by a 2MB stderr file, so this file is empty
>>> and (2) regular diff cannot cope with empty new files.
> Avoid the empty file with fprintf(STDERR, "Dummy non-empty error output\n");

Fixed.

>
>>> - *NEW FEATURE* Readahead can be individually enabled or disabled
>>>   by ECPG-side grammar:
>>>         DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ...
>>>   Without [ NO ] READAHEAD, the default behaviour is used for cursors.
> Likewise, this may as well take a chunk size rather than a yes/no.

Done.

> The patch adds warnings:
> error.c: In function `ecpg_raise':
> error.c:281: warning: format not a string literal and no format arguments
> error.c:281: warning: format not a string literal and no format arguments

Fixed.

> The patch adds few comments and no larger comments explaining its higher-level
> ideas.  That makes it much harder to review.  In this regard it follows the
> tradition of the ECPG code, but let's depart from that tradition for new work.
> I mention a few cases below where the need for commentary is acute.

Understood. Adding comments as I go over that code again.

> I tested full reads of various "SELECT * FROM generate_series(1, $1)" commands
> over a 50ms link, and the patch gives a sound performance improvement.  With
> no readahead, a mere N=100 takes 5s to drain.  With readahead enabled, N=10000
> takes only 3s.  Performance was quite similar to that of implementing my own
> batching with "FETCH 256 FROM cur".  When I kicked up ECPGFETCHSZ (after
> fixing its implementation -- see below) past the result set size, performance
> was comparable to that of simply passing the query through psql.
>
>> --- a/doc/src/sgml/ecpg.sgml
>> +++ b/doc/src/sgml/ecpg.sgml
>> @@ -5289,6 +5315,17 @@ while (1)
>>      </varlistentry>
>>
>>      <varlistentry>
>> +     <term>-231 (<symbol>ECPG_INVALID_CURSOR</symbol>)</term>
>> +     <listitem>
>> +      <para>
>> +       The cursor you are trying to use with readahead has not been opened yet (SQLSTATE 34000),
>> +       invalid values were passed to libecpg (SQLSTATE 42P11) hor not in prerequisite state, i.e.
> Typo.

Fixed.

>
>> --- /dev/null
>> +++ b/src/interfaces/ecpg/ecpglib/cursor.c
>> @@ -0,0 +1,730 @@
> cursor.c contains various >78-col lines.  pgindent has limited ability to
> improve those, so please split them.
>
>> +static struct cursor_descriptor *
>> +add_cursor(int lineno, struct connection *con, const char *name, bool scrollable, int64 n_tuples, bool *existing)
>> +{
>> +    struct cursor_descriptor *desc,
>> +                *ptr, *prev = NULL;
>> +    bool    found = false;
>> +
>> +    if (!name || name[0] == '\0')
>> +    {
>> +        if (existing)
>> +            *existing = false;
>> +        return NULL;
>> +    }
>> +
>> +    ptr = con->cursor_desc;
>> +    while (ptr)
>> +    {
>> +        int ret = strcasecmp(ptr->name, name);
>> +
>> +        if (ret == 0)
>> +        {
>> +            found = true;
>> +            break;
>> +        }
>> +        if (ret > 0)
>> +            break;
>> +
>> +        prev = ptr;
>> +        ptr = ptr->next;
>> +    }
> Any reason not to use find_cursor() here?

Because both add_cursor() and del_cursor() needs the "prev" pointer.
I now modified find_cursor() and they use it.

>
>> +static void
>> +del_cursor(struct connection *con, const char *name)
>> +{
>> +    struct cursor_descriptor *ptr, *prev = NULL;
>> +    bool    found = false;
>> +
>> +    ptr = con->cursor_desc;
>> +    while (ptr)
>> +    {
>> +        int ret = strcasecmp(ptr->name, name);
>> +
>> +        if (ret == 0)
>> +        {
>> +            found = true;
>> +            break;
>> +        }
>> +        if (ret > 0)
>> +            break;
>> +
>> +        prev = ptr;
>> +        ptr = ptr->next;
>> +    }
> Any reason not to use find_cursor() here?
>
>> +bool
>> +ECPGopen(const int lineno, const int compat, const int force_indicator,
>> +        const char *connection_name, const bool questionmarks,
>> +        const char *curname, const int st, const char *query, ...)
>> +{
>> +    va_list        args;
>> +    bool        ret, scrollable;
>> +    char       *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
>> +    struct sqlca_t *sqlca = ECPGget_sqlca();
>> +
>> +    if (!query)
>> +    {
>> +        ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
>> +        return false;
>> +    }
>> +    ptr = strstr(query, "for ");
>> +    if (!ptr)
>> +    {
>> +        ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
>> +        return false;
>> +    }
>> +    whold = strstr(query, "with hold ");
>> +    dollar0 = strstr(query, "$0");
>> +
>> +    noscroll = strstr(query, "no scroll ");
>> +    scroll = strstr(query, "scroll ");
> A query like 'SELECT 1 AS "with hold "' fools these lexical tests.

But SELECT 1 AS "with hold" doesn't go through ECPGopen(), it's run by ECPGdo()
so no breakage there. ecpglib functions are not intended to be called from manually
constructed C code.

>   Capture
> that information in the parser rather than attempting to reconstruct it here.

Okay, this makes sense anyway.

>
>> +    scrollable = (noscroll == NULL) && (scroll != NULL) && (scroll < ptr);
>> +
>> +    new_query = ecpg_alloc(strlen(curname) + strlen(ptr) + (whold ? 10 : 0) + 32, lineno);
>> +    if (!new_query)
>> +        return false;
>> +    sprintf(new_query, "declare %s %s cursor %s%s",
>> +                    (dollar0 && (dollar0 < ptr) ? "$0" : curname),
>> +                    (scrollable ? "scroll" : "no scroll"),
>> +                    (whold ? "with hold " : ""),
>> +                    ptr);
>> +
>> +    /* Set the fetch size the first time we are called. */
>> +    if (fetch_size == 0)
>> +    {
>> +        char       *fsize_str = getenv("ECPGFETCHSZ");
>> +        char       *endptr = NULL;
>> +        int        fsize;
>> +
>> +        if (fsize_str)
>> +        {
>> +            fsize = strtoul(fsize_str, &endptr, 10);
>> +            if (endptr || (fsize < 4))
>> +                fetch_size = DEFAULTFETCHSIZE;
> "endptr" will never be NULL; use "*endptr".  As it stands, the code always
> ignores ECPGFETCHSZ.

You're right.

>   An unusable ECPGFETCHSZ should procedure an error, not
> silently give no effect.

Point taken. Which error handling do imagine? abort() or simply returning false
and raise and error in SQLCA?

>   Why a minimum of 4?

I forgot.

>
>> +            else
>> +                fetch_size = fsize;
>> +        }
>> +        else
>> +            fetch_size = DEFAULTFETCHSIZE;
>> +    }
>> +
>> +    va_start(args, query);
>> +    ret = ecpg_do(lineno, compat, force_indicator, connection_name, questionmarks, st, new_query, args);
>> +    va_end(args);
>> +
>> +    ecpg_free(new_query);
>> +
>> +    /*
>> +     * If statement went OK, add the cursor and discover the
>> +     * number of rows in the recordset. This will slow down OPEN
>> +     * but we gain a lot with caching.
>> +     */
>> +    if (ret /* && sqlca->sqlerrd[2] == 0 */)
> Why is the commented code there?

Some leftover from testing, it shouldn't be there.

>
>> +    {
>> +        struct connection *con = ecpg_get_connection(connection_name);
>> +        struct cursor_descriptor *cur;
>> +        bool    existing;
>> +        int64    n_tuples;
>> +
>> +        if (scrollable)
>> +        {
>> +            PGresult   *res;
>> +            char       *query;
>> +            char       *endptr = NULL;
>> +
>> +            query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
>> +            sprintf(query, "move all in %s", curname);
>> +            res = PQexec(con->connection, query);
>> +            n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
>> +            PQclear(res);
>> +            ecpg_free(query);
>> +
>> +            /* Go back to the beginning of the resultset. */
>> +            query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
>> +            sprintf(query, "move absolute 0 in %s", curname);
>> +            res = PQexec(con->connection, query);
>> +            PQclear(res);
>> +            ecpg_free(query);
>> +        }
>> +        else
>> +        {
>> +            n_tuples = 0;
>> +        }
> You give this rationale for the above code:
>
> On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
>> ECPGopen() also discovers the total number of records in the recordset,
>> so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
>> didn't report the (possibly estimated) number of rows in the resultset
>> is now
>> overcome. This slows down OPEN for cursors serving larger datasets
>> but it makes possible to position the readahead window using MOVE
>> ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
>> variants are used by the application. And the caching is more than
>> overweighs
>> the slowdown in OPEN it seems.
> From the documentation for Informix and Oracle, those databases do not
> populate sqlerrd[2] this way:
> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
> http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139

The problem here is that Informix in the field in fact returns the number of rows
in the cursor and the customer we developed this readahead code for relied on this.
Maybe this was eliminated in newer versions of Informix to make it faster.

> The performance impact will vary widely depending on the query cost per row
> and the fraction of rows the application will actually retrieve.  Consider a
> complex aggregate returning only a handful of rows.

Indeed.

>   Consider SELECT * on a
> 1B-row table with the application ceasing reads after 1000 rows.  Performance
> aside, this will yield double execution of any volatile functions involved.
> So, I think we ought to diligently avoid this step.  (Failing that, the
> documentation must warn about the extra full cursor scan and this feature must
> stay disabled by default.)

OK, how about enabling it for Informix-compat mode only, or only via an
environment variable? I agree it should be documented.

>
>> +
>> +        /* Add the cursor */
>> +        cur = add_cursor(lineno, con, curname, scrollable, n_tuples, &existing);
>> +
>> +        /*
>> +         * Report the number of tuples for the [scrollable] cursor.
>> +         * The server didn't do it for us.
>> +         */
>> +        sqlca->sqlerrd[2] = (cur->n_tuples < LONG_MAX ? cur->n_tuples : LONG_MAX);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static bool
>> +ecpg_cursor_execute(struct statement * stmt, struct cursor_descriptor *cur)
>> +{
>> +    char        tmp[64];
>> +    char       *query;
>> +    int64        start_pos;
>> +
>> +    if ((cur->cache_pos >= cur->start_pos) && cur->res && (cur->cache_pos < cur->start_pos + PQntuples(cur->res)))
>> +    {
>> +        stmt->results = cur->res;
>> +        ecpg_free_params(stmt, true, stmt->lineno);
>> +        return true;
>> +    }
> Why does ecpg_cursor_execute() also call ecpg_free_params()?  Offhand, it
> seems that ECPGfetch() always takes care of that and is the more appropriate
> place, seeing it's the one calling ecpg_build_params().

I will look at it.

>> --- a/src/interfaces/ecpg/ecpglib/data.c
>> +++ b/src/interfaces/ecpg/ecpglib/data.c
>> @@ -120,7 +120,7 @@ check_special_value(char *ptr, double *retval, char **endptr)
>>  }
>>
>>  bool
>> -ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
>> +ecpg_get_data(const PGresult *results, int var_index, int act_tuple, int act_field, int lineno,
>>                enum ECPGttype type, enum ECPGttype ind_type,
>>                char *var, char *ind, long varcharsize, long offset,
>>                long ind_offset, enum ARRAY_TYPE isarray, enum COMPAT_MODE compat, bool force_indicator)
> This function could sure use a block comment such as would be customary in
> src/backend.  Compare the one at heap_update(), for example.

OK.

>
>> --- a/src/interfaces/ecpg/ecpglib/error.c
>> +++ b/src/interfaces/ecpg/ecpglib/error.c
>> @@ -268,6 +268,20 @@ ecpg_raise(int line, int code, const char *sqlstate, const char *str)
>>                       ecpg_gettext("could not connect to database \"%s\" on line %d"), str, line);
>>              break;
>>
>> +        case ECPG_INVALID_CURSOR:
>> +            if (strcmp(sqlstate, ECPG_SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE) == 0)
>> +                snprintf(sqlca->sqlerrm.sqlerrmc, sizeof(sqlca->sqlerrm.sqlerrmc),
>> +
>> +            /*
>> +             * translator: this string will be truncated at 149 characters
>> +             * expanded.
>> +             */
>> +                ecpg_gettext("cursor can only scan forward"));
> Every other message works in the line number somehow; this should do the same.

Fixed.

>
>> --- a/src/interfaces/ecpg/ecpglib/execute.c
>> +++ b/src/interfaces/ecpg/ecpglib/execute.c
>> @@ -1707,46 +1809,20 @@ ecpg_execute(struct statement * stmt)
>>  }
>>
>>  bool
>> -ECPGdo(const int lineno, const int compat, const int force_indicator, const char *connection_name, const bool
questionmarks,const int st, const char *query,...) 
>> +ecpg_do_prologue(int lineno, const int compat, const int force_indicator,
>> +        const char *connection_name, const bool questionmarks,
>> +        enum ECPG_statement_type statement_type, const char *query,
>> +        va_list args, struct statement **stmt_out)
> A block comment would especially help this function, considering the name
> tells one so little.

OK.

>
>> --- a/src/interfaces/ecpg/ecpglib/extern.h
>> +++ b/src/interfaces/ecpg/ecpglib/extern.h
>> @@ -60,6 +60,12 @@ struct statement
>>      bool        questionmarks;
>>      struct variable *inlist;
>>      struct variable *outlist;
>> +    char        *oldlocale;
>> +    const char    **dollarzero;
>> +    int        ndollarzero;
>> +    const char    **param_values;
>> +    int        nparams;
>> +    PGresult    *results;
>>  };
> Please comment the members of this struct like we do in most of src/include.

OK.

> dollarzero has something to do with dynamic cursor names, right?  Does it have
> other roles?

Yes, it had other roles. ECPG supports user variables in cases where the
PostgreSQL grammar doesn't. There's this rule:

ECPG: var_valueNumericOnly addon               if ($1[0] == '$')               {                       free($1);
              $1 = mm_strdup("$0");               } 

The "var_value: NumericOnly" case in gram.y can show up in a lot of cases.
This feature was there before the dynamic cursor. You can even use them together
which means more than one $0 placeholders in the statement. E.g.:   FETCH :amount FROM :curname;
gets translated to   FETCH $0 FROM $0;
by ecpg, and both the amount and the cursor name is passed in in user variables.
The value is needed by cursor.c, this is why this "dollarzero" pointer is needed.

>
>> --- a/src/interfaces/ecpg/preproc/ecpg.header
>> +++ b/src/interfaces/ecpg/preproc/ecpg.header
>> @@ -111,6 +115,26 @@ mmerror(int error_code, enum errortype type, const char *error, ...)
>>  }
>>
>>  /*
>> + * set use_fetch_readahead based on the current cursor
>> + * doesn't return if the cursor is not declared
>> + */
>> +static void
>> +set_cursor_readahead(const char *curname)
>> +{
>> +    struct cursor *ptr;
>> +
>> +    for (ptr = cur; ptr != NULL; ptr = ptr->next)
>> +    {
>> +        if (strcmp(ptr->name, curname) == 0)
>> +            break;
>> +    }
>> +    if (!ptr)
>> +        mmerror(PARSE_ERROR, ET_FATAL, "cursor \"%s\" does not exist", curname);
>> +
>> +    use_fetch_readahead = ptr->fetch_readahead;
>> +}
> Following add_additional_variables(), use strcasecmp() for literal cursor
> names and strcmp() for cursor name host variables.

After modifying the grammar to use numeric values for readahead window size,
this function and the "use_fetch_readahead" variable are not needed anymore.

>
>> --- a/src/interfaces/ecpg/preproc/extern.h
>> +++ b/src/interfaces/ecpg/preproc/extern.h
>> @@ -24,12 +24,17 @@ extern bool autocommit,
>>              force_indicator,
>>              questionmarks,
>>              regression_mode,
>> -            auto_prepare;
>> +            auto_prepare,
>> +            fetch_readahead;
>> +extern bool    use_fetch_readahead;
> The names of the last two variables don't make clear the difference between
> them.  I suggest default_fetch_readahead and current_fetch_readahead.

Now fetch_readahead is int, the other one is no more.

>
>> --- /dev/null
>> +++ b/src/interfaces/ecpg/test/preproc/cursor-readahead.pgc
>> @@ -0,0 +1,244 @@
>> +#include <stdio.h>
>> +#include <malloc.h>
> Why <malloc.h>?  I only see this calling free(); use <stdlib.h> instead.

Fixed.

I had to update the code, the previous patch didn't apply cleanly to current GIT.
I will send the new patch some time next week after fixing the "make check"
breakage.

>
> Thanks,
> nm
>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de    http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

04 March 2012, 12:15:44

On Fri, Mar 02, 2012 at 11:41:05AM -0500, Noah Misch wrote:
> We yet lack a consensus on whether native ECPG apps should have access to the
> feature.  My 2c is to make it available, because it's useful syntactic sugar.
> If your program independently processes each row of an arbitrary-length result
> set, current facilities force you to add an extra outer loop to batch the
> FETCHes for every such code site.  Applications could define macros to
> abstract that pattern, but this seems common-enough to justify bespoke
> handling.  Besides, minimalists already use libpq directly.

Sorry, I don't really understand what you're saying here. The program logic
won't change at all when using this feature or what do I misunderstand?

> I suggest enabling the feature by default but drastically reducing the default
> readahead chunk size from 256 to, say, 5.  That still reduces the FETCH round
> trip overhead by 80%, but it's small enough not to attract pathological
> behavior on a workload where each row is a 10 MiB document.  I would not offer
> an ecpg-time option to disable the feature per se.  Instead, let the user set
> the default chunk size at ecpg time.  A setting of 1 effectively disables the
> feature, though one could later re-enable it with ECPGFETCHSZ.

Using 1 to effectively disable the feature is fine with me, but I strongly
object any default enabling this feature. It's farily easy to create cases with
pathological behaviour and this features is not standard by any means. I figure
a normal programmer would expect only one row being transfered when fetching
one.

Other than that, thanks for the great review.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

04 March 2012, 12:35:08

2012-03-04 17:16 keltezéssel, Michael Meskes írta:
> On Fri, Mar 02, 2012 at 11:41:05AM -0500, Noah Misch wrote:
>> We yet lack a consensus on whether native ECPG apps should have access to the
>> feature.  My 2c is to make it available, because it's useful syntactic sugar.
>> If your program independently processes each row of an arbitrary-length result
>> set, current facilities force you to add an extra outer loop to batch the
>> FETCHes for every such code site.  Applications could define macros to
>> abstract that pattern, but this seems common-enough to justify bespoke
>> handling.  Besides, minimalists already use libpq directly.
> Sorry, I don't really understand what you're saying here. The program logic
> won't change at all when using this feature or what do I misunderstand?

The program logic shouldn't change at all. He meant that extra coding effort
is needed if you want manual caching. It requires 2 loops instead of 1 if you use
FETCH N (N>1).

>
>> I suggest enabling the feature by default but drastically reducing the default
>> readahead chunk size from 256 to, say, 5.  That still reduces the FETCH round
>> trip overhead by 80%, but it's small enough not to attract pathological
>> behavior on a workload where each row is a 10 MiB document.  I would not offer
>> an ecpg-time option to disable the feature per se.  Instead, let the user set
>> the default chunk size at ecpg time.  A setting of 1 effectively disables the
>> feature, though one could later re-enable it with ECPGFETCHSZ.
> Using 1 to effectively disable the feature is fine with me,

Something else would be needed. For DML with  WHERE CURRENT OF cursor,
the feature should stay disabled even with the environment variable is set
without adding any decoration to the DECLARE statement. The safe thing
would be to distinguish between uncached (but cachable) and uncachable
cases. A single value cannot work.

>  but I strongly
> object any default enabling this feature. It's farily easy to create cases with
> pathological behaviour and this features is not standard by any means. I figure
> a normal programmer would expect only one row being transfered when fetching
> one.
>
> Other than that, thanks for the great review.
>
> Michael

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de    http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

05 March 2012, 04:25:31

On Sun, Mar 04, 2012 at 05:34:50PM +0100, Boszormenyi Zoltan wrote:
> The program logic shouldn't change at all. He meant that extra coding effort
> is needed if you want manual caching. It requires 2 loops instead of 1 if you use
> FETCH N (N>1).

Ah, thanks for the explanation.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Noah Misch

Date:

05 March 2012, 14:50:24

On Sun, Mar 04, 2012 at 05:16:06PM +0100, Michael Meskes wrote:
> On Fri, Mar 02, 2012 at 11:41:05AM -0500, Noah Misch wrote:
> > I suggest enabling the feature by default but drastically reducing the default
> > readahead chunk size from 256 to, say, 5.  That still reduces the FETCH round
> > trip overhead by 80%, but it's small enough not to attract pathological
> > behavior on a workload where each row is a 10 MiB document.  I would not offer
> > an ecpg-time option to disable the feature per se.  Instead, let the user set
> > the default chunk size at ecpg time.  A setting of 1 effectively disables the
> > feature, though one could later re-enable it with ECPGFETCHSZ.
> 
> Using 1 to effectively disable the feature is fine with me, but I strongly
> object any default enabling this feature. It's farily easy to create cases with
> pathological behaviour and this features is not standard by any means. I figure
> a normal programmer would expect only one row being transfered when fetching
> one.

On further reflection, I agree with you here.  The prospect for queries that
call volatile functions changed my mind; they would exhibit different
functional behavior under readahead.  We mustn't silently give affected
programs different semantics.

Thanks,
nm

Re: ECPG FETCH readahead

From

Noah Misch

Date:

05 March 2012, 14:56:52

On Sun, Mar 04, 2012 at 04:33:32PM +0100, Boszormenyi Zoltan wrote:
> 2012-03-02 17:41 keltez?ssel, Noah Misch ?rta:
> > On Thu, Dec 29, 2011 at 10:46:23AM +0100, Boszormenyi Zoltan wrote:

> > I suggest enabling the feature by default but drastically reducing the default
> > readahead chunk size from 256 to, say, 5. 
> 
> 
> >   That still reduces the FETCH round
> > trip overhead by 80%, but it's small enough not to attract pathological
> > behavior on a workload where each row is a 10 MiB document.
> 
> I see. How about 8? Nice "round" power of 2 value, still small and avoids
> 87.5% of overhead.

Having pondered the matter further, I now agree with Michael that the feature
should stay disabled by default.  See my response to him for rationale.
Assuming that conclusion holds, we can recommended a higher value to users who
enable the feature at all.  Your former proposal of 256 seems fine.

> BTW, the default disabled behaviour was to avoid "make check" breakage,
> see below.
> 
> >   I would not offer
> > an ecpg-time option to disable the feature per se.  Instead, let the user set
> > the default chunk size at ecpg time.  A setting of 1 effectively disables the
> > feature, though one could later re-enable it with ECPGFETCHSZ.
> 
> This means all code previously going through ECPGdo() would go through
> ECPGopen()/ECPGfetch()/ECPGclose(). This is more intrusive and all
> regression tests that were only testing certain features would also
> test the readahead feature, too.

It's a good sort of intrusiveness, reducing the likelihood of introducing bugs
basically unrelated to readahead that happen to afflict only ECPGdo() or only
the cursor.c interfaces.  Let's indeed not have any preexisting test cases use
readahead per se, but having them use the cursor.c interfaces anyway will
build confidence in the new code.  The churn in expected debug output isn't
ideal, but I don't prefer the alternative of segmenting the implementation for
the sake of the test cases.

> Also, the test for WHERE CURRENT OF at ecpg time would have to be done
> at runtime, possibly making previously working code fail if ECPGFETCHSZ is enabled.

Good point.

> How about still allowing "NO READAHEAD" cursors that compile into plain ECPGdo()?
> This way, ECPGFETCHSZ don't interfere with WHERE CURRENT OF. But this would
> mean code changes everywhere where WHERE CURRENT OF is used.

ECPGFETCHSZ should only affect cursors that make no explicit mention of
READAHEAD.  I'm not sure whether that should mean actually routing READHEAD 1
cursors through ECPGdo() or simply making sure that cursor.c achieves the same
outcome; see later for a possible reason to still do the latter.

> Or how about a new feature in the backend, so ECPG can do
>     UPDATE/DELETE ... WHERE OFFSET N OF cursor
> and the offset of computed from the actual cursor position and the position known
> by the application? This way an app can do readahead and do work on rows collected
> by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
> behind the scenes.

That's a neat idea, but I would expect obstacles threatening our ability to
use it automatically for readahead.  You would have to make the cursor a
SCROLL cursor.  We'll often pass a negative offset, making the operation fail
if the cursor query used FOR UPDATE.  Volatile functions in the query will get
more calls.  That's assuming the operation will map internally to something
like MOVE N; UPDATE ... WHERE CURRENT OF; MOVE -N.  You might come up with
innovations to mitigate those obstacles, but those innovations would probably
also apply to MOVE/FETCH.  In any event, this would constitute a substantive
patch in its own right.


One way out of trouble here is to make WHERE CURRENT OF imply READHEAD
1/READHEAD 0 (incidentally, perhaps those two should be synonyms) on the
affected cursor.  If the cursor has some other readahead quantity declared
explicitly, throw an error during preprocessing.

Failing a reasonable resolution, I'm prepared to withdraw my suggestion of
making ECPGFETCHSZ always-usable.  It's nice to have, not critical.

> >> +bool
> >> +ECPGopen(const int lineno, const int compat, const int force_indicator,
> >> +        const char *connection_name, const bool questionmarks,
> >> +        const char *curname, const int st, const char *query, ...)
> >> +{
> >> +    va_list        args;
> >> +    bool        ret, scrollable;
> >> +    char       *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
> >> +    struct sqlca_t *sqlca = ECPGget_sqlca();
> >> +
> >> +    if (!query)
> >> +    {
> >> +        ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
> >> +        return false;
> >> +    }
> >> +    ptr = strstr(query, "for ");
> >> +    if (!ptr)
> >> +    {
> >> +        ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
> >> +        return false;
> >> +    }
> >> +    whold = strstr(query, "with hold ");
> >> +    dollar0 = strstr(query, "$0");
> >> +
> >> +    noscroll = strstr(query, "no scroll ");
> >> +    scroll = strstr(query, "scroll ");
> > A query like 'SELECT 1 AS "with hold "' fools these lexical tests.
> 
> But SELECT 1 AS "with hold" doesn't go through ECPGopen(), it's run by ECPGdo()
> so no breakage there. ecpglib functions are not intended to be called from manually
> constructed C code.

I tried something likeEXEC SQL DECLARE cur CURSOR FOR SELECT * FROM generate_series(1 , $1) AS t("with hold ");
It wrongly generated this backend command:declare cur no scroll cursor with hold for select * from generate_series ( 1
,$1 ) as t ( "with hold " )
 

> >   An unusable ECPGFETCHSZ should procedure an error, not
> > silently give no effect.
> 
> Point taken. Which error handling do imagine? abort() or simply returning false
> and raise and error in SQLCA?

The latter.

> >> +    /*
> >> +     * If statement went OK, add the cursor and discover the
> >> +     * number of rows in the recordset. This will slow down OPEN
> >> +     * but we gain a lot with caching.
> >> +     */
> >> +    if (ret /* && sqlca->sqlerrd[2] == 0 */)
> > Why is the commented code there?
> 
> Some leftover from testing, it shouldn't be there.
> 
> >
> >> +    {
> >> +        struct connection *con = ecpg_get_connection(connection_name);
> >> +        struct cursor_descriptor *cur;
> >> +        bool    existing;
> >> +        int64    n_tuples;
> >> +
> >> +        if (scrollable)
> >> +        {
> >> +            PGresult   *res;
> >> +            char       *query;
> >> +            char       *endptr = NULL;
> >> +
> >> +            query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
> >> +            sprintf(query, "move all in %s", curname);
> >> +            res = PQexec(con->connection, query);
> >> +            n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
> >> +            PQclear(res);
> >> +            ecpg_free(query);
> >> +
> >> +            /* Go back to the beginning of the resultset. */
> >> +            query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
> >> +            sprintf(query, "move absolute 0 in %s", curname);
> >> +            res = PQexec(con->connection, query);
> >> +            PQclear(res);
> >> +            ecpg_free(query);
> >> +        }
> >> +        else
> >> +        {
> >> +            n_tuples = 0;
> >> +        }
> > You give this rationale for the above code:
> >
> > On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
> >> ECPGopen() also discovers the total number of records in the recordset,
> >> so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
> >> didn't report the (possibly estimated) number of rows in the resultset
> >> is now
> >> overcome. This slows down OPEN for cursors serving larger datasets
> >> but it makes possible to position the readahead window using MOVE
> >> ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
> >> variants are used by the application. And the caching is more than
> >> overweighs
> >> the slowdown in OPEN it seems.
> > From the documentation for Informix and Oracle, those databases do not
> > populate sqlerrd[2] this way:
> > http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
> > http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139
> 
> The problem here is that Informix in the field in fact returns the number of rows
> in the cursor and the customer we developed this readahead code for relied on this.
> Maybe this was eliminated in newer versions of Informix to make it faster.
> 
> > The performance impact will vary widely depending on the query cost per row
> > and the fraction of rows the application will actually retrieve.  Consider a
> > complex aggregate returning only a handful of rows.
> 
> Indeed.
> 
> >   Consider SELECT * on a
> > 1B-row table with the application ceasing reads after 1000 rows.  Performance
> > aside, this will yield double execution of any volatile functions involved.
> > So, I think we ought to diligently avoid this step.  (Failing that, the
> > documentation must warn about the extra full cursor scan and this feature must
> > stay disabled by default.)
> 
> OK, how about enabling it for Informix-compat mode only, or only via an
> environment variable? I agree it should be documented.

For a query where backend execution cost dominates the cost of transferring
rows to the client, does Informix take roughly twice the normal time to
execute the query via an ESQL/C cursor?  Is that acceptable overhead for every
"ecpg -C" user?  (FWIW, I've never used Informix-compat mode.)  If not, the
feature deserves its own option.

Whatever the trigger condition, shouldn't this apply independent of whether
readahead is in use for a given cursor?  (This could constitute a reason to
use the cursor.c interfaces for every cursor.)

Does some vendor-neutral standard define semantics for sqlerrd, or has it
propagated by imitation?

This reminds me to mention: your documentation should note that the use of
readahead or the option that enables sqlerrd[2] calculation may change the
outcome of queries calling volatile functions.  See how the DECLARE
documentation page discusses this hazard for SCROLL/WITH HOLD cursors.

> >> --- a/src/interfaces/ecpg/ecpglib/extern.h
> >> +++ b/src/interfaces/ecpg/ecpglib/extern.h
> >> @@ -60,6 +60,12 @@ struct statement
> >>      bool        questionmarks;
> >>      struct variable *inlist;
> >>      struct variable *outlist;
> >> +    char        *oldlocale;
> >> +    const char    **dollarzero;
> >> +    int        ndollarzero;
> >> +    const char    **param_values;
> >> +    int        nparams;
> >> +    PGresult    *results;
> >>  };
> > Please comment the members of this struct like we do in most of src/include.
> 
> OK.
> 
> > dollarzero has something to do with dynamic cursor names, right?  Does it have
> > other roles?
> 
> Yes, it had other roles. ECPG supports user variables in cases where the
> PostgreSQL grammar doesn't. There's this rule:
> 
> ECPG: var_valueNumericOnly addon
>                 if ($1[0] == '$')
>                 {
>                         free($1);
>                         $1 = mm_strdup("$0");
>                 }
> 
> The "var_value: NumericOnly" case in gram.y can show up in a lot of cases.
> This feature was there before the dynamic cursor. You can even use them together
> which means more than one $0 placeholders in the statement. E.g.:
>     FETCH :amount FROM :curname;
> gets translated to
>     FETCH $0 FROM $0;
> by ecpg, and both the amount and the cursor name is passed in in user variables.
> The value is needed by cursor.c, this is why this "dollarzero" pointer is needed.

Thanks for that explanation; the situation is clearer to me now.

nm

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

06 March 2012, 02:08:07

2012-03-05 19:56 keltezéssel, Noah Misch írta:
>
> Having pondered the matter further, I now agree with Michael that the feature
> should stay disabled by default.  See my response to him for rationale.
> Assuming that conclusion holds, we can recommended a higher value to users who
> enable the feature at all.  Your former proposal of 256 seems fine.

OK.

>
>> BTW, the default disabled behaviour was to avoid "make check" breakage,
>> see below.
>>
>>>   I would not offer
>>> an ecpg-time option to disable the feature per se.  Instead, let the user set
>>> the default chunk size at ecpg time.  A setting of 1 effectively disables the
>>> feature, though one could later re-enable it with ECPGFETCHSZ.
>> This means all code previously going through ECPGdo() would go through
>> ECPGopen()/ECPGfetch()/ECPGclose(). This is more intrusive and all
>> regression tests that were only testing certain features would also
>> test the readahead feature, too.
> It's a good sort of intrusiveness, reducing the likelihood of introducing bugs
> basically unrelated to readahead that happen to afflict only ECPGdo() or only
> the cursor.c interfaces.  Let's indeed not have any preexisting test cases use
> readahead per se, but having them use the cursor.c interfaces anyway will
> build confidence in the new code.  The churn in expected debug output isn't
> ideal, but I don't prefer the alternative of segmenting the implementation for
> the sake of the test cases.

I see.

>
>> Also, the test for WHERE CURRENT OF at ecpg time would have to be done
>> at runtime, possibly making previously working code fail if ECPGFETCHSZ is enabled.
> Good point.
>
>> How about still allowing "NO READAHEAD" cursors that compile into plain ECPGdo()?
>> This way, ECPGFETCHSZ don't interfere with WHERE CURRENT OF. But this would
>> mean code changes everywhere where WHERE CURRENT OF is used.
> ECPGFETCHSZ should only affect cursors that make no explicit mention of
> READAHEAD.  I'm not sure whether that should mean actually routing READHEAD 1
> cursors through ECPGdo() or simply making sure that cursor.c achieves the same
> outcome; see later for a possible reason to still do the latter.
>
>> Or how about a new feature in the backend, so ECPG can do
>>     UPDATE/DELETE ... WHERE OFFSET N OF cursor
>> and the offset of computed from the actual cursor position and the position known
>> by the application? This way an app can do readahead and do work on rows collected
>> by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
>> behind the scenes.
> That's a neat idea, but I would expect obstacles threatening our ability to
> use it automatically for readahead.  You would have to make the cursor a
> SCROLL cursor.  We'll often pass a negative offset, making the operation fail
> if the cursor query used FOR UPDATE.  Volatile functions in the query will get
> more calls.  That's assuming the operation will map internally to something
> like MOVE N; UPDATE ... WHERE CURRENT OF; MOVE -N.  You might come up with
> innovations to mitigate those obstacles, but those innovations would probably
> also apply to MOVE/FETCH.  In any event, this would constitute a substantive
> patch in its own right.

I was thinking along the lines of a Portal keeping the ItemPointerData
for each tuple in the last FETCH statement. The WHERE OFFSET N OF cursor
would treat the offset value relative to the tuple order returned by FETCH.
So, OFFSET 0 OF == CURRENT OF and other values of N are negative.
This way, it doesn't matter if the cursor is SCROLL, NO SCROLL or havethe default behaviour with "SCROLL in some
cases".Then ECPGopen() 
doesn't have to play games with the DECLARE statement. Only ECPGfetch()
needs to play with MOVE statements, passing different offsets to the backend,
not what the application passed.

> One way out of trouble here is to make WHERE CURRENT OF imply READHEAD
> 1/READHEAD 0 (incidentally, perhaps those two should be synonyms) on the
> affected cursor.  If the cursor has some other readahead quantity declared
> explicitly, throw an error during preprocessing.

I played with this idea a while ago, from a different point of view.
If the ECPG code had the DECLARE mycur, DML ... WHERE CURRENT OF mycur
and OPEN mycur in exactly this order, i.e. WHERE CURRENT OF appears in
a standalone function between DECLARE and the first OPEN for the cursor,
then ECPG disabled readahead automatically for that cursor and for that
cursor only. But this requires effort on the user of ECPG and can be very
fragile. Code cleanup with reordering functions can break previously
working code.

> Failing a reasonable resolution, I'm prepared to withdraw my suggestion of
> making ECPGFETCHSZ always-usable.  It's nice to have, not critical.
>
>>>> +bool
>>>> +ECPGopen(const int lineno, const int compat, const int force_indicator,
>>>> +        const char *connection_name, const bool questionmarks,
>>>> +        const char *curname, const int st, const char *query, ...)
>>>> +{
>>>> +    va_list        args;
>>>> +    bool        ret, scrollable;
>>>> +    char       *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
>>>> +    struct sqlca_t *sqlca = ECPGget_sqlca();
>>>> +
>>>> +    if (!query)
>>>> +    {
>>>> +        ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
>>>> +        return false;
>>>> +    }
>>>> +    ptr = strstr(query, "for ");
>>>> +    if (!ptr)
>>>> +    {
>>>> +        ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
>>>> +        return false;
>>>> +    }
>>>> +    whold = strstr(query, "with hold ");
>>>> +    dollar0 = strstr(query, "$0");
>>>> +
>>>> +    noscroll = strstr(query, "no scroll ");
>>>> +    scroll = strstr(query, "scroll ");
>>> A query like 'SELECT 1 AS "with hold "' fools these lexical tests.
>> But SELECT 1 AS "with hold" doesn't go through ECPGopen(), it's run by ECPGdo()
>> so no breakage there. ecpglib functions are not intended to be called from manually
>> constructed C code.
> I tried something like
>     EXEC SQL DECLARE cur CURSOR FOR SELECT * FROM generate_series(1 , $1) AS t("with hold ");
> It wrongly generated this backend command:
>     declare cur no scroll cursor with hold for select * from generate_series ( 1 , $1 ) as t ( "with hold " )

Ah, ok. The grammar test in ecpg is better.

>>>   An unusable ECPGFETCHSZ should procedure an error, not
>>> silently give no effect.
>> Point taken. Which error handling do imagine? abort() or simply returning false
>> and raise and error in SQLCA?
> The latter.
>
>>>> +    /*
>>>> +     * If statement went OK, add the cursor and discover the
>>>> +     * number of rows in the recordset. This will slow down OPEN
>>>> +     * but we gain a lot with caching.
>>>> +     */
>>>> +    if (ret /* && sqlca->sqlerrd[2] == 0 */)
>>> Why is the commented code there?
>> Some leftover from testing, it shouldn't be there.

Actually, now I remember. It was a wishful thinking on my part
that whenever PG supports returning a number of rows at opening
the cursor, correct or estimated, this code shouldn't be executed,
just accept what the backend has given us.

>>
>>>> +    {
>>>> +        struct connection *con = ecpg_get_connection(connection_name);
>>>> +        struct cursor_descriptor *cur;
>>>> +        bool    existing;
>>>> +        int64    n_tuples;
>>>> +
>>>> +        if (scrollable)
>>>> +        {
>>>> +            PGresult   *res;
>>>> +            char       *query;
>>>> +            char       *endptr = NULL;
>>>> +
>>>> +            query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
>>>> +            sprintf(query, "move all in %s", curname);
>>>> +            res = PQexec(con->connection, query);
>>>> +            n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
>>>> +            PQclear(res);
>>>> +            ecpg_free(query);
>>>> +
>>>> +            /* Go back to the beginning of the resultset. */
>>>> +            query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
>>>> +            sprintf(query, "move absolute 0 in %s", curname);
>>>> +            res = PQexec(con->connection, query);
>>>> +            PQclear(res);
>>>> +            ecpg_free(query);
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            n_tuples = 0;
>>>> +        }
>>> You give this rationale for the above code:
>>>
>>> On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
>>>> ECPGopen() also discovers the total number of records in the recordset,
>>>> so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
>>>> didn't report the (possibly estimated) number of rows in the resultset
>>>> is now
>>>> overcome. This slows down OPEN for cursors serving larger datasets
>>>> but it makes possible to position the readahead window using MOVE
>>>> ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
>>>> variants are used by the application. And the caching is more than
>>>> overweighs
>>>> the slowdown in OPEN it seems.
>>> From the documentation for Informix and Oracle, those databases do not
>>> populate sqlerrd[2] this way:
>>> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
>>> http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139
>> The problem here is that Informix in the field in fact returns the number of rows
>> in the cursor and the customer we developed this readahead code for relied on this.
>> Maybe this was eliminated in newer versions of Informix to make it faster.
>>
>>> The performance impact will vary widely depending on the query cost per row
>>> and the fraction of rows the application will actually retrieve.  Consider a
>>> complex aggregate returning only a handful of rows.
>> Indeed.
>>
>>>   Consider SELECT * on a
>>> 1B-row table with the application ceasing reads after 1000 rows.  Performance
>>> aside, this will yield double execution of any volatile functions involved.
>>> So, I think we ought to diligently avoid this step.  (Failing that, the
>>> documentation must warn about the extra full cursor scan and this feature must
>>> stay disabled by default.)
>> OK, how about enabling it for Informix-compat mode only, or only via an
>> environment variable? I agree it should be documented.
> For a query where backend execution cost dominates the cost of transferring
> rows to the client, does Informix take roughly twice the normal time to
> execute the query via an ESQL/C cursor?  Is that acceptable overhead for every
> "ecpg -C" user?  (FWIW, I've never used Informix-compat mode.)  If not, the
> feature deserves its own option.
>
> Whatever the trigger condition, shouldn't this apply independent of whether
> readahead is in use for a given cursor?

I guess so.

>   (This could constitute a reason to
> use the cursor.c interfaces for every cursor.)

Indeed.

> Does some vendor-neutral standard define semantics for sqlerrd, or has it
> propagated by imitation?

No idea.

> This reminds me to mention: your documentation should note that the use of
> readahead or the option that enables sqlerrd[2] calculation may change the
> outcome of queries calling volatile functions.  See how the DECLARE
> documentation page discusses this hazard for SCROLL/WITH HOLD cursors.

OK.

>
>>>> --- a/src/interfaces/ecpg/ecpglib/extern.h
>>>> +++ b/src/interfaces/ecpg/ecpglib/extern.h
>>>> @@ -60,6 +60,12 @@ struct statement
>>>>      bool        questionmarks;
>>>>      struct variable *inlist;
>>>>      struct variable *outlist;
>>>> +    char        *oldlocale;
>>>> +    const char    **dollarzero;
>>>> +    int        ndollarzero;
>>>> +    const char    **param_values;
>>>> +    int        nparams;
>>>> +    PGresult    *results;
>>>>  };
>>> Please comment the members of this struct like we do in most of src/include.
>> OK.
>>
>>> dollarzero has something to do with dynamic cursor names, right?  Does it have
>>> other roles?
>> Yes, it had other roles. ECPG supports user variables in cases where the
>> PostgreSQL grammar doesn't. There's this rule:
>>
>> ECPG: var_valueNumericOnly addon
>>                 if ($1[0] == '$')
>>                 {
>>                         free($1);
>>                         $1 = mm_strdup("$0");
>>                 }
>>
>> The "var_value: NumericOnly" case in gram.y can show up in a lot of cases.
>> This feature was there before the dynamic cursor. You can even use them together
>> which means more than one $0 placeholders in the statement. E.g.:
>>     FETCH :amount FROM :curname;
>> gets translated to
>>     FETCH $0 FROM $0;
>> by ecpg, and both the amount and the cursor name is passed in in user variables.
>> The value is needed by cursor.c, this is why this "dollarzero" pointer is needed.
> Thanks for that explanation; the situation is clearer to me now.
>
> nm
>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de    http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Noah Misch

Date:

06 March 2012, 07:07:16

On Tue, Mar 06, 2012 at 07:07:41AM +0100, Boszormenyi Zoltan wrote:
> 2012-03-05 19:56 keltez?ssel, Noah Misch ?rta:
> >> Or how about a new feature in the backend, so ECPG can do
> >>     UPDATE/DELETE ... WHERE OFFSET N OF cursor
> >> and the offset of computed from the actual cursor position and the position known
> >> by the application? This way an app can do readahead and do work on rows collected
> >> by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
> >> behind the scenes.
> > That's a neat idea, but I would expect obstacles threatening our ability to
> > use it automatically for readahead.  You would have to make the cursor a
> > SCROLL cursor.  We'll often pass a negative offset, making the operation fail
> > if the cursor query used FOR UPDATE.  Volatile functions in the query will get
> > more calls.  That's assuming the operation will map internally to something
> > like MOVE N; UPDATE ... WHERE CURRENT OF; MOVE -N.  You might come up with
> > innovations to mitigate those obstacles, but those innovations would probably
> > also apply to MOVE/FETCH.  In any event, this would constitute a substantive
> > patch in its own right.
> 
> I was thinking along the lines of a Portal keeping the ItemPointerData
> for each tuple in the last FETCH statement. The WHERE OFFSET N OF cursor
> would treat the offset value relative to the tuple order returned by FETCH.
> So, OFFSET 0 OF == CURRENT OF and other values of N are negative.
> This way, it doesn't matter if the cursor is SCROLL, NO SCROLL or have
>  the default behaviour with "SCROLL in some cases". Then ECPGopen()
> doesn't have to play games with the DECLARE statement. Only ECPGfetch()
> needs to play with MOVE statements, passing different offsets to the backend,
> not what the application passed.

That broad approach sounds promising.  The main other consideration that comes
to mind is a plan to limit resource usage for a cursor that reads, say, 1B
rows.  However, I think attempting to implement this now will significantly
decrease the chance of getting the core patch features committed now.

> > One way out of trouble here is to make WHERE CURRENT OF imply READHEAD
> > 1/READHEAD 0 (incidentally, perhaps those two should be synonyms) on the
> > affected cursor.  If the cursor has some other readahead quantity declared
> > explicitly, throw an error during preprocessing.
> 
> I played with this idea a while ago, from a different point of view.
> If the ECPG code had the DECLARE mycur, DML ... WHERE CURRENT OF mycur
> and OPEN mycur in exactly this order, i.e. WHERE CURRENT OF appears in
> a standalone function between DECLARE and the first OPEN for the cursor,
> then ECPG disabled readahead automatically for that cursor and for that
> cursor only. But this requires effort on the user of ECPG and can be very
> fragile. Code cleanup with reordering functions can break previously
> working code.

Don't the same challenges apply to accurately reporting an error when the user
specifies WHERE CURRENT OF for a readahead cursor?

Re: ECPG FETCH readahead

From

Robert Haas

Date:

15 March 2012, 17:59:21

On Tue, Mar 6, 2012 at 6:06 AM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, Mar 06, 2012 at 07:07:41AM +0100, Boszormenyi Zoltan wrote:
>> 2012-03-05 19:56 keltez?ssel, Noah Misch ?rta:
>> >> Or how about a new feature in the backend, so ECPG can do
>> >>     UPDATE/DELETE ... WHERE OFFSET N OF cursor
>> >> and the offset of computed from the actual cursor position and the position known
>> >> by the application? This way an app can do readahead and do work on rows collected
>> >> by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
>> >> behind the scenes.
>> > That's a neat idea, but I would expect obstacles threatening our ability to
>> > use it automatically for readahead.  You would have to make the cursor a
>> > SCROLL cursor.  We'll often pass a negative offset, making the operation fail
>> > if the cursor query used FOR UPDATE.  Volatile functions in the query will get
>> > more calls.  That's assuming the operation will map internally to something
>> > like MOVE N; UPDATE ... WHERE CURRENT OF; MOVE -N.  You might come up with
>> > innovations to mitigate those obstacles, but those innovations would probably
>> > also apply to MOVE/FETCH.  In any event, this would constitute a substantive
>> > patch in its own right.
>>
>> I was thinking along the lines of a Portal keeping the ItemPointerData
>> for each tuple in the last FETCH statement. The WHERE OFFSET N OF cursor
>> would treat the offset value relative to the tuple order returned by FETCH.
>> So, OFFSET 0 OF == CURRENT OF and other values of N are negative.
>> This way, it doesn't matter if the cursor is SCROLL, NO SCROLL or have
>>  the default behaviour with "SCROLL in some cases". Then ECPGopen()
>> doesn't have to play games with the DECLARE statement. Only ECPGfetch()
>> needs to play with MOVE statements, passing different offsets to the backend,
>> not what the application passed.
>
> That broad approach sounds promising.  The main other consideration that comes
> to mind is a plan to limit resource usage for a cursor that reads, say, 1B
> rows.  However, I think attempting to implement this now will significantly
> decrease the chance of getting the core patch features committed now.
>
>> > One way out of trouble here is to make WHERE CURRENT OF imply READHEAD
>> > 1/READHEAD 0 (incidentally, perhaps those two should be synonyms) on the
>> > affected cursor.  If the cursor has some other readahead quantity declared
>> > explicitly, throw an error during preprocessing.
>>
>> I played with this idea a while ago, from a different point of view.
>> If the ECPG code had the DECLARE mycur, DML ... WHERE CURRENT OF mycur
>> and OPEN mycur in exactly this order, i.e. WHERE CURRENT OF appears in
>> a standalone function between DECLARE and the first OPEN for the cursor,
>> then ECPG disabled readahead automatically for that cursor and for that
>> cursor only. But this requires effort on the user of ECPG and can be very
>> fragile. Code cleanup with reordering functions can break previously
>> working code.
>
> Don't the same challenges apply to accurately reporting an error when the user
> specifies WHERE CURRENT OF for a readahead cursor?

I think we need either an updated version of this patch that's ready
for commit real soon now, or we need to postpone it to 9.3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

24 March 2012, 06:51:15

2012-03-15 21:59 keltezéssel, Robert Haas írta:
> I think we need either an updated version of this patch that's ready for commit real
> soon now, or we need to postpone it to 9.3.

Sorry for the delay, I had been busy with other tasks and I rewrote this code
to better cope with unknown result size, scrollable cursors and negative
cursor positions.

I think all points raised by Noah is addressed: per-cursor readahead window size,
extensive comments, documentation and not enabling result set size discovery.

Also, I noticed this in passing:

static void
free_variable(struct variable * var)
{
         struct variable *var_next;

         if (var == NULL)
                 return;
         var_next = var->next;
         ecpg_free(var);

         while (var_next)
         {
                 var = var_next;
                 var_next = var->next;
                 ecpg_free(var);
         }
}

I rewrote this as below to eliminate manual unrolling of the loop,
they are equivalent:

static void
free_variable(struct variable * var)
{
         struct variable *var_next;

         while (var)
         {
                 var_next = var->next;
                 ecpg_free(var);
                 var = var_next;
         }
}


The problem with WHERE CURRENT OF is solved by a little more grammar
and ecpglib code, which effectively does a MOVE ABSOLUTE N before
executing the DML with WHERE CURRENT OF clause. No patching of the
backend. This way, the new ECPG caching code is compatible with older
servers but obviously reduces the efficiency of caching.

Attached are two patches, the first one is the feature patch.

The second patch makes all cursor statements go through the new
caching functions with 1 tuple readahead window size. It only changes
the preprocessed code and stderr logs of the regression tests affected,
not their results.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

Re: ECPG FETCH readahead

From

Noah Misch

Date:

28 March 2012, 21:43:42

On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote:
> Sorry for the delay, I had been busy with other tasks and I rewrote this code
> to better cope with unknown result size, scrollable cursors and negative
> cursor positions.
>
> I think all points raised by Noah is addressed: per-cursor readahead window size,
> extensive comments, documentation and not enabling result set size discovery.

The new comments are a nice improvement; thanks.

> The problem with WHERE CURRENT OF is solved by a little more grammar
> and ecpglib code, which effectively does a MOVE ABSOLUTE N before
> executing the DML with WHERE CURRENT OF clause. No patching of the
> backend. This way, the new ECPG caching code is compatible with older
> servers but obviously reduces the efficiency of caching.

Good plan.

> diff -dcrpN postgresql.orig/doc/src/sgml/ecpg.sgml postgresql/doc/src/sgml/ecpg.sgml
> *** postgresql.orig/doc/src/sgml/ecpg.sgml    2012-03-12 09:24:31.699560098 +0100
> --- postgresql/doc/src/sgml/ecpg.sgml    2012-03-24 10:15:00.538924601 +0100
> *************** EXEC SQL COMMIT;
> *** 454,459 ****
> --- 454,479 ----
>      details.
>     </para>
>   
> +   <para>
> +    ECPG may use cursor readahead to improve performance of programs
> +    that use single-row FETCH statements. Option <literal>-R number</literal>
> +    option for ECPG modifies the default for all cursors from <literal>NO READAHEAD</literal>

This sentence duplicates the word "option".

> +    to <literal>READAHEAD number</literal>. Explicit <literal>READAHEAD number</literal> or
> +    <literal>NO READAHEAD</literal> turns cursor readahead on (with <literal>number</literal>
> +    as the size of the readahead window) or off for the specified cursor,
> +    respectively. For cursors using a non-0 readahead window size is 256 rows,

The number 256 is no longer present in your implementation.

> +    the window size may be modified by setting the <literal>ECPGFETCHSZ</literal>
> +    environment variable to a different value.

I had in mind that DECLARE statements adorned with READAHEAD syntax would
always behave precisely as written, independent of ECPGFETCHSZ or "ecpg -R".
Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value
passed to "ecpg -R" if provided, and finally a value of 1 (no readahead) in
the absence of both ECPGFETCHSZ and "ecpg -R".  Did you do it differently for
a particular reason?  I don't particularly object to what you've implemented,
but I'd be curious to hear your thinking.

> +   </para>
> + 
> +   <para>
> +    <command>UPDATE</command> or <command>DELETE</command> with the
> +    <literal>WHERE CURRENT OF</literal> clause, cursor readahead may or may not
> +    actually improve performance, as <command>MOVE</command> statement has to be
> +    sent to the backend before the DML statement to ensure correct cursor position
> +    in the backend.
> +   </para>

This sentence seems to be missing a word near its beginning.

> *************** DECLARE <replaceable class="PARAMETER">c
> *** 6639,6649 ****
>         </listitem>
>        </varlistentry>
>       </variablelist>
>   
>       <para>
> !      For the meaning of the cursor options,
> !      see <xref linkend="sql-declare">.
>       </para>
>      </refsect1>
>   
>      <refsect1>
> --- 6669,6728 ----
>         </listitem>
>        </varlistentry>
>       </variablelist>
> +    </refsect1>
> + 
> +    <refsect1>
> +     <title>Cursor options</title>
>   
>       <para>
> !      For the meaning of other cursor options, see <xref linkend="sql-declare">.
>       </para>
> + 
> +     <variablelist>
> +      <varlistentry>
> +       <term><literal>READAHEAD number</literal></term>   
> +       <term><literal>NO READAHEAD</literal></term>   
> +        <listitem>
> +         <para>
> +          <literal>READAHEAD number</literal> makes the ECPG preprocessor and
> +          runtime library use a client-side cursor accounting and data readahead
> +          during <command>FETCH</command>. This improves performance for programs
> +          that use single-row <command>FETCH</command> statements.
> +         </para>
> + 
> +         <para>
> +          <literal>NO READAHEAD</literal> disables data readahead in case
> +          <parameter>-R number</parameter> is used for compiling the file.
> +         </para>
> +        </listitem>
> +      </varlistentry>

One of the new sections about readahead should somehow reference the hazard
around volatile functions.

> + 
> +      <varlistentry>
> +       <term><literal>OPEN RETURNS LAST ROW POSITION</literal></term>
> +       <term><literal>OPEN RETURNS 0 FOR LAST ROW POSITION</literal></term>
> +        <listitem>
> +         <para>
> +          When the cursor is opened, it's possible to discover the size of the result set
> +          using <command>MOVE ALL</command> which traverses the result set and move
> +          the cursor back to the beginning using <command>MOVE ABSOLUTE 0</command>.
> +          The size of the result set is returned in sqlca.sqlerrd[2].
> +         </para>
> + 
> +         <para>
> +          This slows down opening the cursor from the application point of view
> +          but may also have side effects. If the cursor query contains volatile function
> +          calls with side effects, they will be evaluated twice because of traversing
> +          the result set this way during <command>OPEN</command>.
> +         </para>
> + 
> +         <para>
> +          The default is not to discover.
> +         </para>

I mildly oppose having syntax to enable this per-cursor.  Readahead is a
generally handy feature that I might use in new programs, but this feature is
more of a hack for compatibility with some old Informix version.  For new
code, authors should do their own MOVEs instead of using this option.

The patch also adds an undocumented ECPGOPENRETURNSRESULTSETSIZE environment
variable to control this.  I see no use for such a thing, because a program's
dependency on sqlerrd[2] is fixed at build time.  If a program does not
reference sqlerrd[2] for a cursor, there's no benefit from choosing at runtime
to populate that value anyway.  If a program does reference it, any option
needed to have it set correctly had better be set at build time and apply to
every run of the final program.

IOW, I suggest having only the "ecpg"-time option to enable this behavior.
Thoughts?

> +        </listitem>  
> +      </varlistentry>
> + 
> +     </variablelist>
> + 
>      </refsect1>
>   
>      <refsect1>
> diff -dcrpN postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml postgresql/doc/src/sgml/ref/ecpg-ref.sgml
> *** postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml    2011-08-07 11:29:16.003256959 +0200
> --- postgresql/doc/src/sgml/ref/ecpg-ref.sgml    2012-03-24 10:13:08.679284063 +0100
> *************** PostgreSQL documentation
> *** 171,176 ****
> --- 171,196 ----
>       </varlistentry>
>   
>       <varlistentry>
> +      <term><option>-R <replaceable>number</replaceable></option></term>
> +      <listitem>
> +       <para>
> +        Turn on cursor readahead by default using <replaceable>number</replaceable>
> +        as readahead window size.
> +       </para>
> +      </listitem>
> +     </varlistentry>
> + 
> +     <varlistentry>
> +      <term><option>--detect-cursor-resultset-size</option></term>
> +      <listitem>
> +       <para>
> +        Detect the cursor result set size during <command>OPEN</command> and
> +        return it in sqlca.sqlerrd[2].

Make reference to this option in the ecpg-sqlca section, where sqlerrd[2] has
its primary documentation.

> +       </para>
> +      </listitem>
> +     </varlistentry>
> + 
> +     <varlistentry>
>        <term><option>-t</option></term>
>        <listitem>
>         <para>


I did not re-review the code changes in the same detail as before, but nothing
caught my attention when scanning through them.  If some particular section
has changed in a tricky way and deserves a close look, let me know.

The documentation-cosmetic and policy questions I raise above shouldn't entail
large-scale patch churn, so I've marked the patch Ready for Committer.

Thanks,
nm

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

29 March 2012, 08:00:33

2012-03-29 02:43 keltezéssel, Noah Misch írta:
> On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote:
>> Sorry for the delay, I had been busy with other tasks and I rewrote this code
>> to better cope with unknown result size, scrollable cursors and negative
>> cursor positions.
>>
>> I think all points raised by Noah is addressed: per-cursor readahead window size,
>> extensive comments, documentation and not enabling result set size discovery.
> The new comments are a nice improvement; thanks.
>
>> The problem with WHERE CURRENT OF is solved by a little more grammar
>> and ecpglib code, which effectively does a MOVE ABSOLUTE N before
>> executing the DML with WHERE CURRENT OF clause. No patching of the
>> backend. This way, the new ECPG caching code is compatible with older
>> servers but obviously reduces the efficiency of caching.
> Good plan.
>
>> diff -dcrpN postgresql.orig/doc/src/sgml/ecpg.sgml postgresql/doc/src/sgml/ecpg.sgml
>> *** postgresql.orig/doc/src/sgml/ecpg.sgml    2012-03-12 09:24:31.699560098 +0100
>> --- postgresql/doc/src/sgml/ecpg.sgml    2012-03-24 10:15:00.538924601 +0100
>> *************** EXEC SQL COMMIT;
>> *** 454,459 ****
>> --- 454,479 ----
>>       details.
>>      </para>
>>
>> +<para>
>> +    ECPG may use cursor readahead to improve performance of programs
>> +    that use single-row FETCH statements. Option<literal>-R number</literal>
>> +    option for ECPG modifies the default for all cursors from<literal>NO READAHEAD</literal>
> This sentence duplicates the word "option".

Fixed.

>
>> +    to<literal>READAHEAD number</literal>. Explicit<literal>READAHEAD number</literal>  or
>> +<literal>NO READAHEAD</literal>  turns cursor readahead on (with<literal>number</literal>
>> +    as the size of the readahead window) or off for the specified cursor,
>> +    respectively. For cursors using a non-0 readahead window size is 256 rows,
> The number 256 is no longer present in your implementation.

Indeed. Oversight, that part of the sentence is deleted.

>
>> +    the window size may be modified by setting the<literal>ECPGFETCHSZ</literal>
>> +    environment variable to a different value.
> I had in mind that DECLARE statements adorned with READAHEAD syntax would
> always behave precisely as written, independent of ECPGFETCHSZ or "ecpg -R".
> Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value
> passed to "ecpg -R" if provided, and finally a value of 1 (no readahead) in
> the absence of both ECPGFETCHSZ and "ecpg -R".  Did you do it differently for
> a particular reason?  I don't particularly object to what you've implemented,
> but I'd be curious to hear your thinking.

What I had in mind was:

NO READAHEAD == READAHEAD 0. This will translate into 1 tuple sized
readahead window that cannot be modified by ECPGFETCHSZ.

READAHEAD 1 also means uncached by default but ECPGFETCHSZ may
modify the readahead window size.

Values greater than 1 means cached by default with the specified window size
but it can also be overridden (even disabled) just in case. This part can be debated,
I agree. This is in add_cursor():

-----8<-----8<-----8<-----8<-----8<-----8<-----
         desc->cachable = (ra_size > 0);
         /*
          * If this cursor is allowed to be cached, the readahead
          * windows is also allowed to be overridden (possibly
          * even disabled) by default_fetch_size if it's set.
          */
         if (desc->cachable)
                 desc->ra_size = (default_fetch_size > 0 ?
                                                 default_fetch_size :
                                                 ra_size);
         else
                 desc->ra_size = 1;
-----8<-----8<-----8<-----8<-----8<-----8<-----

The check (default_fetch_size > 0) can be modified so it reads
(default_fetch_size > ra_size) so the override can only happen upwards.

This part is policy that can be debated and modified accordingly,
it doesn't affect the internals of the caching code.

>
>> +</para>
>> +
>> +<para>
>> +<command>UPDATE</command>  or<command>DELETE</command>  with the
>> +<literal>WHERE CURRENT OF</literal>  clause, cursor readahead may or may not
>> +    actually improve performance, as<command>MOVE</command>  statement has to be
>> +    sent to the backend before the DML statement to ensure correct cursor position
>> +    in the backend.
>> +</para>
> This sentence seems to be missing a word near its beginning.

Sounds like a corner case to me, but I am not a native english speaker.
Which one sounds better:

    UPDATE or DELETE with the WHERE CURRENT OF clause...

or

    UPDATE or DELETE statements with the WHERE CURRENT OF clause...
?

I went with the second. The committer can change the wording in any way
he wants.

>
>> *************** DECLARE<replaceable class="PARAMETER">c
>> *** 6639,6649 ****
>>          </listitem>
>>         </varlistentry>
>>        </variablelist>
>>
>>        <para>
>> !      For the meaning of the cursor options,
>> !      see<xref linkend="sql-declare">.
>>        </para>
>>       </refsect1>
>>
>>       <refsect1>
>> --- 6669,6728 ----
>>          </listitem>
>>         </varlistentry>
>>        </variablelist>
>> +</refsect1>
>> +
>> +<refsect1>
>> +<title>Cursor options</title>
>>
>>        <para>
>> !      For the meaning of other cursor options, see<xref linkend="sql-declare">.
>>        </para>
>> +
>> +<variablelist>
>> +<varlistentry>
>> +<term><literal>READAHEAD number</literal></term>
>> +<term><literal>NO READAHEAD</literal></term>
>> +<listitem>
>> +<para>
>> +<literal>READAHEAD number</literal>  makes the ECPG preprocessor and
>> +          runtime library use a client-side cursor accounting and data readahead
>> +          during<command>FETCH</command>. This improves performance for programs
>> +          that use single-row<command>FETCH</command>  statements.
>> +</para>
>> +
>> +<para>
>> +<literal>NO READAHEAD</literal>  disables data readahead in case
>> +<parameter>-R number</parameter>  is used for compiling the file.
>> +</para>
>> +</listitem>
>> +</varlistentry>
> One of the new sections about readahead should somehow reference the hazard
> around volatile functions.

Done.

>> +
>> +<varlistentry>
>> +<term><literal>OPEN RETURNS LAST ROW POSITION</literal></term>
>> +<term><literal>OPEN RETURNS 0 FOR LAST ROW POSITION</literal></term>
>> +<listitem>
>> +<para>
>> +          When the cursor is opened, it's possible to discover the size of the result set
>> +          using<command>MOVE ALL</command>  which traverses the result set and move
>> +          the cursor back to the beginning using<command>MOVE ABSOLUTE 0</command>.
>> +          The size of the result set is returned in sqlca.sqlerrd[2].
>> +</para>
>> +
>> +<para>
>> +          This slows down opening the cursor from the application point of view
>> +          but may also have side effects. If the cursor query contains volatile function
>> +          calls with side effects, they will be evaluated twice because of traversing
>> +          the result set this way during<command>OPEN</command>.
>> +</para>
>> +
>> +<para>
>> +          The default is not to discover.
>> +</para>
> I mildly oppose having syntax to enable this per-cursor.  Readahead is a
> generally handy feature that I might use in new programs, but this feature is
> more of a hack for compatibility with some old Informix version.  For new
> code, authors should do their own MOVEs instead of using this option.
>
> The patch also adds an undocumented ECPGOPENRETURNSRESULTSETSIZE environment
> variable to control this.  I see no use for such a thing, because a program's
> dependency on sqlerrd[2] is fixed at build time.  If a program does not
> reference sqlerrd[2] for a cursor, there's no benefit from choosing at runtime
> to populate that value anyway.  If a program does reference it, any option
> needed to have it set correctly had better be set at build time and apply to
> every run of the final program.
>
> IOW, I suggest having only the "ecpg"-time option to enable this behavior.
> Thoughts?

I wanted to make it available for non-Informix mode and a way to
disable it if the command line option enables it.

But the extra environment variable seems to be silly indeed.
I deleted that part of the code.

>> +</listitem>
>> +</varlistentry>
>> +
>> +</variablelist>
>> +
>>       </refsect1>
>>
>>       <refsect1>
>> diff -dcrpN postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml postgresql/doc/src/sgml/ref/ecpg-ref.sgml
>> *** postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml    2011-08-07 11:29:16.003256959 +0200
>> --- postgresql/doc/src/sgml/ref/ecpg-ref.sgml    2012-03-24 10:13:08.679284063 +0100
>> *************** PostgreSQL documentation
>> *** 171,176 ****
>> --- 171,196 ----
>>        </varlistentry>
>>
>>        <varlistentry>
>> +<term><option>-R<replaceable>number</replaceable></option></term>
>> +<listitem>
>> +<para>
>> +        Turn on cursor readahead by default using<replaceable>number</replaceable>
>> +        as readahead window size.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +
>> +<varlistentry>
>> +<term><option>--detect-cursor-resultset-size</option></term>
>> +<listitem>
>> +<para>
>> +        Detect the cursor result set size during<command>OPEN</command>  and
>> +        return it in sqlca.sqlerrd[2].
> Make reference to this option in the ecpg-sqlca section, where sqlerrd[2] has
> its primary documentation.

Done.

>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +
>> +<varlistentry>
>>         <term><option>-t</option></term>
>>         <listitem>
>>          <para>
>
> I did not re-review the code changes in the same detail as before, but nothing
> caught my attention when scanning through them.  If some particular section
> has changed in a tricky way and deserves a close look, let me know.

Well, the extra comments should explain everything. The rewrite did not change
the execute.c function split, only comments were added. The changes in cursor.c
were mainly about being able to properly deal with negative cursor positions,
i.e. traversing the cursor backwards, e.g. with FETCH LAST or FETCH ABSOLUTE -N.

> The documentation-cosmetic and policy questions I raise above shouldn't entail
> large-scale patch churn, so I've marked the patch Ready for Committer.

Thanks. The patch with the above changes is attached.
Also attached the second patch from the previous mail,
in case the committer wants to consider it.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

2012-03-29 19:03 keltezéssel, Noah Misch írta:
> To me, it feels too magical to make READAHEAD 1 and READAHEAD 0 differ only in
> whether ECPGFETCHSZ can override.  I'm willing to go with whatever consensus
> arises on this topic, though.

I now removed this distinction because the grammar does it so
the value 0 doesn't even reach ECPGopen().

>> This part is policy that can be debated and modified accordingly,
>> it doesn't affect the internals of the caching code.
> Agreed.
>
>>>> +</para>
>>>> +
>>>> +<para>
>>>> +<command>UPDATE</command>   or<command>DELETE</command>   with the
>>>> +<literal>WHERE CURRENT OF</literal>   clause, cursor readahead may or may not
>>>> +    actually improve performance, as<command>MOVE</command>   statement has to be
>>>> +    sent to the backend before the DML statement to ensure correct cursor position
>>>> +    in the backend.
>>>> +</para>
>>> This sentence seems to be missing a word near its beginning.
>> Sounds like a corner case to me, but I am not a native english speaker.
>> Which one sounds better:
>>
>>     UPDATE or DELETE with the WHERE CURRENT OF clause...
>>
>> or
>>
>>     UPDATE or DELETE statements with the WHERE CURRENT OF clause...
>> ?
> Here is the simplest change to make the original grammatical:
>
>    Given an UPDATE or DELETE with the WHERE CURRENT OF clause, ...
>
> I might write the whole thing this way:
>
>    To execute an UPDATE or DELETE statement bearing the WHERE CURRENT OF
>    clause, ECPG may first execute an implicit MOVE to synchronize the server
>    cursor position with the local cursor position.  This can reduce or
>    eliminate the performance benefit of readahead for affected cursors.

OK, I used your text.

>>> one of the new sections about readahead should somehow reference the hazard
>>> around volatile functions.
>> Done.
> I don't see the mention in your latest patch.  You do mention it for the
> sqlerrd[2] compatibility stuff.

sqlerrd[2] compatibility stuff? I mentioned it in section "ecpg-sqlca", this is the main
documentation section, not the compatibility one AFAIK. Anyway, I now reference the volatile
function hazard in the first paragraphs added to section "ecpg-cursors".


> I can understand the desire to have a way to disable it.  Perhaps you have a
> bunch of old Informix code, so you enable the global option and later discover
> some expensive cursors that don't use sqlerrd[2].  It would be nice to locally
> suppress the behavior for those cursors.
>
> Still, we're looking at dedicated ECPG syntax, quite visible even to folks
> with no interest in Informix.  We have eschewed littering our syntax with
> compatibility aids, and I like it that way.  IMO, an option to the "ecpg"
> preprocessor is an acceptable level of muddle to provide this aid.  A DECLARE
> CURSOR syntax extension goes too far.

I have now deleted this grammar extension.

Attached is the new core feature patch. Summary of changes:
- documentation changes according to your query
- cursor.c doesn't distinguish between 0 and 1, this is the job of the grammar.
- I saved some MOVE statements in case the cursor position in the backend
   is what we want at the moment. This means READAHEAD 1 is at the same level
   of performance like without caching, there are no extra implicit MOVEs.
   WHERE CURRENT OF also avoids MOVEs in this case. The caching code is now
   clearly faster in case when there are a lot of MOVEs without actual FETCHes.
   This is only an if () condition around the previously coded MOVE block in
   ecpg_cursor_execute(). It doesn't change the semantics, only omits the
   MOVE in case it's not needed.

I also refreshed the second patch that drives all cursors with the new
cursor functions. Because the runtime doesn't handle 0, 1 is the default
readahead window size. Now, NO READAHEAD would mean READAHEAD 1
but ECPGFETCHSZ can also override it. This means that NO READAHEAD
makes no sense, so it's removed from the grammar and the documentation
is changed accordingly.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Hi,

2012-04-10 16:55 keltezéssel, Michael Meskes írta:
> On Tue, Apr 10, 2012 at 10:37:22AM -0400, Noah Misch wrote:
>>> Only a non-decorated cursor can be overridden, even if
>>> a different default readahead window size is specified with
>>> e.g. "ecpg -R 8". If ECPGFETCHSZ is not present, 8 will be used,
>>> if ECPGFETCHSZ is present, its value will be used. ECPGopen()
>>> will need an extra bool argument to distinguish this.
>>>
>>> Is this acceptable? Noah, Michael?
>> Sounds perfect.
> Fine by me.
>
> Michael

you commented on "two new options were added and they should
be suboptions to -r". I looked at "man getopt_long" to see what I can do
about the "-R" option and there seems to be a getsubopt() call which is
an extension to getopt_long. My manpage under Fedora 16 says this:

NAME
        getsubopt - parse suboption arguments from a string

SYNOPSIS
        #include <stdlib.h>

        int getsubopt(char **optionp, char * const *tokens, char **valuep);

    Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

        getsubopt():
            _XOPEN_SOURCE >= 500 || _XOPEN_SOURCE && _XOPEN_SOURCE_EXTENDED
            || /* Since glibc 2.12: */ _POSIX_C_SOURCE >= 200809L

I wonder whether the manual parsing of "-r" suboptions may be rewritten
using this function or PostgreSQL supports systems without the above
X/Open or POSIX support levels.

Anyway, to make it possible to rewrite using the above call, I modified "-R"
and it's now "-r readahead=number". Documentation is adjusted.

With the above, it would be possible to use a comma separated list of "-r"
suboptions, e.g. "-r prepare,questionmarks,readahead=16" in one option.

Summary of other changes:
- The result set size detection is a suboption of "-r", documentation is adjusted.
- Only undecorated cursors use ECPGFETCHSZ, documentation is adjusted
- "ecpg --help says ...default 0 (disabled)..." fixed.
- Comment in cursor-readahead.pgc is fixed.
- New regression test that exercises ECPGFETCHSZ=8 and a "non-readahead"
   cursor. The stderr file shows the "fetch forward 8" executed by the runtime.
- Also added a note to the documentation about a possible performance trap
   if a previously written ECPG application uses its own custom readahead via
   multi-row FETCH statements.

This patch should be applied over the two patches I last sent.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-fixes-v3.patch.gz

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

15 April 2012, 23:47:05

On Tue, Apr 10, 2012 at 07:56:35PM +0200, Boszormenyi Zoltan wrote:
> With the above, it would be possible to use a comma separated list of "-r"
> suboptions, e.g. "-r prepare,questionmarks,readahead=16" in one option.

Yes, that sounds like a good plan. But of course it's outside the scope of this
patch, so we can add this later on.

> - Also added a note to the documentation about a possible performance trap
>   if a previously written ECPG application uses its own custom readahead via
>   multi-row FETCH statements.

I didn't know that before you send this patch. Noah, did you?

Frankly, I don't like this at all. If I got it right that means a FETCH N is
essantially computed as N times FETCH 1 unless you either add a non-standard
option to the DECLARE statement or you add a command-line option to ecpg. Did I
get that right?

If so we would deliberately make ecpglib work incorrectly and remove
performance. Why is that? I'm interested in what others think, but to me that
sounds like a show-stopper.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

16 April 2012, 01:27:22

2012-04-16 04:46 keltezéssel, Michael Meskes írta:
> On Tue, Apr 10, 2012 at 07:56:35PM +0200, Boszormenyi Zoltan wrote:
>> With the above, it would be possible to use a comma separated list of "-r"
>> suboptions, e.g. "-r prepare,questionmarks,readahead=16" in one option.
> Yes, that sounds like a good plan. But of course it's outside the scope of this
> patch, so we can add this later on.
>
>> - Also added a note to the documentation about a possible performance trap
>>    if a previously written ECPG application uses its own custom readahead via
>>    multi-row FETCH statements.
> I didn't know that before you send this patch. Noah, did you?
>
> Frankly, I don't like this at all. If I got it right that means a FETCH N is
> essantially computed as N times FETCH 1 unless you either add a non-standard
> option to the DECLARE statement or you add a command-line option to ecpg. Did I
> get that right?

Yes, just like when the readahead window set to 256, FETCH 1024
will iterate through 4 windows or FETCH 64 iterates through the
same window 4 times. This is the idea behind the "readahead window".

> If so we would deliberately make ecpglib work incorrectly and remove
> performance. Why is that? I'm interested in what others think, but to me that
> sounds like a show-stopper.

How about allowing the readahead window to be resized for the
non-decorated case if the runtime encounters FETCH N and N is
greater than the previous window?

>
> Michael
>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

16 April 2012, 13:09:11

On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:
> Yes, just like when the readahead window set to 256, FETCH 1024
> will iterate through 4 windows or FETCH 64 iterates through the
> same window 4 times. This is the idea behind the "readahead window".

Really? It's definitely not the idea behind FETCH 1024. Using the same window 4
times for FETCH 64 is the idea though, I agree.

> How about allowing the readahead window to be resized for the
> non-decorated case if the runtime encounters FETCH N and N is
> greater than the previous window?

To be resized by what? IMO a FETCH N should always be a FETCH N, no matter
what, i.e. if the readahead window is larger, use it, but even if it's smaller
we should still fetch N at the same time.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

16 April 2012, 14:19:52

2012-04-16 18:04 keltezéssel, Michael Meskes írta:
> On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:
>> Yes, just like when the readahead window set to 256, FETCH 1024
>> will iterate through 4 windows or FETCH 64 iterates through the
>> same window 4 times. This is the idea behind the "readahead window".
> Really? It's definitely not the idea behind FETCH 1024. Using the same window 4
> times for FETCH 64 is the idea though, I agree.

OK. I would like to stretch your agreement a little. :-)

Can we state that caching means that if the cache should serve
the incoming request(s) until the request spills out of it?

If your answer to the above is "yes", then please consider this case:
- readahead window is 256 (via ECPGFETCHSZ)
- FETCH 64 was executed twice, so you are in the middle of the cache
- FETCH 1024 is requested

So, if I understand you correctly, you expect this scenario:
- set a "one-time" readahead window size ( N - # of rows that can be served  = 1024 - 128 = 768) so the next FETCH by
theruntime will fullfill  this request fully 
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 768 will be executed
- all subsequent requests use the old readahead size

>> How about allowing the readahead window to be resized for the
>> non-decorated case if the runtime encounters FETCH N and N is
>> greater than the previous window?
> To be resized by what?

By the new FETCH request. Instead of the above, I imagined this:
- the runtime notices that the new request is larger than the current  readahead window size, modifies the readahead
windowsize upwards,  so the next FETCH will use it 
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 1024 will be executed and the remaining  768 rows will be served from the new cache
- all subsequent requests use the new readahead size, 1024


>   IMO a FETCH N should always be a FETCH N, no matter
> what

This part of your statement contradicts with caching. :-)

> , i.e. if the readahead window is larger, use it, but even if it's smaller
> we should still fetch N at the same time.

So, there can be occasional one-time larger requests but
smaller ones should apply the set window size, right?

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

17 April 2012, 00:52:32

On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:
> OK. I would like to stretch your agreement a little. :-)
> ...

Yeah, you got a point here.

> By the new FETCH request. Instead of the above, I imagined this:
> - the runtime notices that the new request is larger than the current
>   readahead window size, modifies the readahead window size upwards,
>   so the next FETCH will use it
> - serve the request's first 128 rows from the current cache
> - for the 129th row, FETCH 1024 will be executed and the remaining
>   768 rows will be served from the new cache

That means window size goes up to 1024-128 for that one case?

> - all subsequent requests use the new readahead size, 1024

Sounds reasonable to me.

> So, there can be occasional one-time larger requests but
> smaller ones should apply the set window size, right?

Yes. I do agree that FETCH N cannot fetch N all the time, but please make it
work like what you suggested to make sure people don't have to recompile.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

17 April 2012, 01:03:45

2012-04-17 05:52 keltezéssel, Michael Meskes írta:
> On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:
>> OK. I would like to stretch your agreement a little. :-)
>> ...
> Yeah, you got a point here.
>
>> By the new FETCH request. Instead of the above, I imagined this:
>> - the runtime notices that the new request is larger than the current
>>    readahead window size, modifies the readahead window size upwards,
>>    so the next FETCH will use it
>> - serve the request's first 128 rows from the current cache
>> - for the 129th row, FETCH 1024 will be executed and the remaining
>>    768 rows will be served from the new cache
> That means window size goes up to 1024-128 for that one case?

I listed two scenarios.
1. occasional bump of the readahead window for large requests,   for smaller requests it uses the originally set size
2. permanent bump of the readahead window for large requests   (larger than previously seen), all subsequent requests
use  the new size 

Both can be implemented easily, which one do you prefer?
If you always use very large requests, 1) behaves like 2)

>
>> - all subsequent requests use the new readahead size, 1024
> Sounds reasonable to me.
>
>> So, there can be occasional one-time larger requests but
>> smaller ones should apply the set window size, right?
> Yes. I do agree that FETCH N cannot fetch N all the time, but please make it
> work like what you suggested to make sure people don't have to recompile.
>
> Michael


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

17 April 2012, 01:48:41

On Tue, Apr 17, 2012 at 06:02:34AM +0200, Boszormenyi Zoltan wrote:
> I listed two scenarios.
> 1. occasional bump of the readahead window for large requests,
>    for smaller requests it uses the originally set size
> 2. permanent bump of the readahead window for large requests
>    (larger than previously seen), all subsequent requests use
>    the new size
>
> Both can be implemented easily, which one do you prefer?
> If you always use very large requests, 1) behaves like 2)

I'd say let's go for #2. #1 is probably more efficient but not what the
programmer asked us to do. After all it's easy to increase the window size
accordingly if you want so as a programmer.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

23 April 2012, 11:05:05

Hi,

2012-04-17 06:48 keltezéssel, Michael Meskes írta:
> On Tue, Apr 17, 2012 at 06:02:34AM +0200, Boszormenyi Zoltan wrote:
>> I listed two scenarios.
>> 1. occasional bump of the readahead window for large requests,
>>     for smaller requests it uses the originally set size
>> 2. permanent bump of the readahead window for large requests
>>     (larger than previously seen), all subsequent requests use
>>     the new size
>>
>> Both can be implemented easily, which one do you prefer?
>> If you always use very large requests, 1) behaves like 2)
> I'd say let's go for #2. #1 is probably more efficient but not what the
> programmer asked us to do. After all it's easy to increase the window size
> accordingly if you want so as a programmer.
>
> Michael

OK, I will implement #2. Another question popped up: what to do
with FETCH ALL? The current readahead window size or temporarily
bumping it to say some tens of thousands can be used. We may not
know how much is the "all records". This, although lowers performance,
saves memory.

Please, don't apply this patch yet. I discovered a rather big hole
that can confuse the cursor position tracking if you do this:

DECLARE mycur;
MOVE ABSOLUTE n IN mycur;
MOVE BACKWARD m IN mycur;

If (n+m) is greater, but (n-m) is smaller than the number
of rows in the cursor, the backend's and the caching code's
ideas about where the cursor is will differ. I need to fix this
before it can be applied.

That will also need a new round of review. Sorry for that.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Michael Meskes

Date:

24 April 2012, 05:17:33

> OK, I will implement #2. Another question popped up: what to do
> with FETCH ALL? The current readahead window size or temporarily
> bumping it to say some tens of thousands can be used. We may not
> know how much is the "all records". This, although lowers performance,
> saves memory.

I would say doing a large fetch in two or three batches won't cost too much in
terms of performance.

> Please, don't apply this patch yet. I discovered a rather big hole
> that can confuse the cursor position tracking if you do this:
> ...
> That will also need a new round of review. Sorry for that.

No problem, better to find it now instead of after release.

Anyway, I moved the patch to 2012-next (I hope I did it correctly) so 2012-1
can be closed. Let's try to get this patch done in the next commit fest.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

17 August 2013, 10:08:45

Hi,

I am restarting this old thread... :-)

2012-04-24 10:17 keltezéssel, Michael Meskes írta:
>> OK, I will implement #2. Another question popped up: what to do
>> with FETCH ALL? The current readahead window size or temporarily
>> bumping it to say some tens of thousands can be used. We may not
>> know how much is the "all records". This, although lowers performance,
>> saves memory.
> I would say doing a large fetch in two or three batches won't cost too much in
> terms of performance.
>
>> Please, don't apply this patch yet. I discovered a rather big hole
>> that can confuse the cursor position tracking if you do this:
>> ...
>> That will also need a new round of review. Sorry for that.
> No problem, better to find it now instead of after release.
>
> Anyway, I moved the patch to 2012-next (I hope I did it correctly) so 2012-1
> can be closed. Let's try to get this patch done in the next commit fest.
>
> Michael

I had time to look into this patch of mine again after about 1.5 years.
Frankly, this time was too long to remember every detail of the patch
and looking at parts of the patch as a big entity was confusing.

So I started fresh and to make review easier, I broke the patch up
into small pieces that all build on each other. I have also fixed quite
a few bugs, mostly in my code, but some in the ECPG parser and
the regression tests as well.

I have put the broken up patchset into a GIT tree of mine at GitHub:
https://github.com/zboszor/ecpg-readahead/
but the huge compressed patch is also attached for reference.
It was generated with

$ git diff 221e92f64c6e136e550ec2592aac3ae0d4623209..870922676e6ae0faa4ebbf94b92e0b97ec418e16

ECPG regression tests are now Valgrind-clean except two of them
but both are pre-existing bugs.

1. ecpg/test/compat_informix/rfmtlong.pgc points out a problem in
    ecpg/compatlib/informix.c

==5036== 1 errors in context 1 of 4:
==5036== Invalid read of size 4
==5036==    at 0x4E3453C: rfmtlong (informix.c:941)
==5036==    by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22)
==5036==    by 0x4006BE: main (rfmtlong.pgc:45)
==5036==  Address 0x60677d8 is 24 bytes inside a block of size 25 alloc'd
==5036==    at 0x4C28409: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5036==    by 0x4E34268: rfmtlong (informix.c:783)
==5036==    by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22)
==5036==    by 0x4006BE: main (rfmtlong.pgc:45)

The same error is reported 4 times.

2. ecpg_add_mem() seems to leak memory:

==5463== 256 bytes in 16 blocks are definitely lost in loss record 1 of 1
==5463==    at 0x4C2A121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5463==    by 0x4E3E153: ecpg_alloc (memory.c:21)
==5463==    by 0x4E3E212: ecpg_add_mem (memory.c:110)
==5463==    by 0x4E3542B: ecpg_store_result (execute.c:409)
==5463==    by 0x4E37E5A: ecpg_process_output (execute.c:1777)
==5463==    by 0x4E38CCA: ecpg_do (execute.c:2137)
==5463==    by 0x4E38D8A: ECPGdo (execute.c:2159)
==5463==    by 0x400A82: fn (alloc.pgc:51)
==5463==    by 0x5152C52: start_thread (pthread_create.c:308)
==5463==    by 0x545C13C: clone (clone.S:113)

The last two issue we talked about in this thread are also implemented:
- permanently raise the readahead window if the application sends a
   bigger FETCH command, and
- temporarily raise the readahead window for FETCH ALL commands

The cursor position tracking was completely rewritten, so the client side
properly follows the cursor position known by the backend and doesn't
skip MOVE statements where it shouldn't. The previously known
bug is completely eliminated this way.

Please, review that patch.

Thanks in advance and best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-9.4-v16.patch.bz2

New ECPG idea, was: Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

17 August 2013, 11:02:56

<div class="moz-cite-prefix">2013-08-17 12:08 keltezéssel, Boszormenyi Zoltan írta:<br /></div><blockquote
cite="mid:520F4B90.2010800@cybertec.at"type="cite">Hi, <br /><br /> I am restarting this old thread... :-) <br /><br />
2012-04-2410:17 keltezéssel, Michael Meskes írta: <br /><blockquote type="cite"><blockquote type="cite">OK, I will
implement#2. Another question popped up: what to do <br /> with FETCH ALL? The current readahead window size or
temporarily<br /> bumping it to say some tens of thousands can be used. We may not <br /> know how much is the "all
records".This, although lowers performance, <br /> saves memory. <br /></blockquote> I would say doing a large fetch in
twoor three batches won't cost too much in <br /> terms of performance. <br /><br /><blockquote type="cite">Please,
don'tapply this patch yet. I discovered a rather big hole <br /> that can confuse the cursor position tracking if you
dothis: <br /> ... <br /> That will also need a new round of review. Sorry for that. <br /></blockquote> No problem,
betterto find it now instead of after release. <br /><br /> Anyway, I moved the patch to 2012-next (I hope I did it
correctly)so 2012-1 <br /> can be closed. Let's try to get this patch done in the next commit fest. <br /><br />
Michael<br /></blockquote><br /> I had time to look into this patch of mine again after about 1.5 years. <br />
Frankly,this time was too long to remember every detail of the patch <br /> and looking at parts of the patch as a big
entitywas confusing. <br /><br /> So I started fresh and to make review easier, I broke the patch up <br /> into small
piecesthat all build on each other. I have also fixed quite <br /> a few bugs, mostly in my code, but some in the ECPG
parserand <br /> the regression tests as well. <br /><br /> I have put the broken up patchset into a GIT tree of mine
atGitHub: <br /><a class="moz-txt-link-freetext"
href="https://github.com/zboszor/ecpg-readahead/">https://github.com/zboszor/ecpg-readahead/</a><br/> but the huge
compressedpatch is also attached for reference. <br /> It was generated with <br /><br /> $ git diff
221e92f64c6e136e550ec2592aac3ae0d4623209..870922676e6ae0faa4ebbf94b92e0b97ec418e16<br/><br /> ECPG regression tests are
nowValgrind-clean except two of them <br /> but both are pre-existing bugs. <br /><br /> 1.
ecpg/test/compat_informix/rfmtlong.pgcpoints out a problem in <br />    ecpg/compatlib/informix.c <br /><br /> ==5036==
1errors in context 1 of 4: <br /> ==5036== Invalid read of size 4 <br /> ==5036==    at 0x4E3453C: rfmtlong
(informix.c:941)<br /> ==5036==    by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22) <br /> ==5036==    by 0x4006BE:
main(rfmtlong.pgc:45) <br /> ==5036==  Address 0x60677d8 is 24 bytes inside a block of size 25 alloc'd <br />
==5036==   at 0x4C28409: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) <br /> ==5036==    by
0x4E34268:rfmtlong (informix.c:783) <br /> ==5036==    by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22) <br />
==5036==   by 0x4006BE: main (rfmtlong.pgc:45) <br /><br /> The same error is reported 4 times. <br /><br /> 2.
ecpg_add_mem()seems to leak memory: <br /><br /> ==5463== 256 bytes in 16 blocks are definitely lost in loss record 1
of1 <br /> ==5463==    at 0x4C2A121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) <br />
==5463==   by 0x4E3E153: ecpg_alloc (memory.c:21) <br /> ==5463==    by 0x4E3E212: ecpg_add_mem (memory.c:110) <br />
==5463==   by 0x4E3542B: ecpg_store_result (execute.c:409) <br /> ==5463==    by 0x4E37E5A: ecpg_process_output
(execute.c:1777)<br /> ==5463==    by 0x4E38CCA: ecpg_do (execute.c:2137) <br /> ==5463==    by 0x4E38D8A: ECPGdo
(execute.c:2159)<br /> ==5463==    by 0x400A82: fn (alloc.pgc:51) <br /> ==5463==    by 0x5152C52: start_thread
(pthread_create.c:308)<br /> ==5463==    by 0x545C13C: clone (clone.S:113) <br /><br /> The last two issue we talked
aboutin this thread are also implemented: <br /> - permanently raise the readahead window if the application sends a
<br/>   bigger FETCH command, and <br /> - temporarily raise the readahead window for FETCH ALL commands <br /><br />
Thecursor position tracking was completely rewritten, so the client side <br /> properly follows the cursor position
knownby the backend and doesn't <br /> skip MOVE statements where it shouldn't. The previously known <br /> bug is
completelyeliminated this way. <br /><br /> Please, review that patch. <br /></blockquote><br /> I have another idea to
makeECPG building on this huge patch.<br /><br /> Currently, UPDATE/DELETE WHERE CURRENT OF has to issue a MOVE<br />
beforethe command in case the cursor positions known by the application<br /> and the backend are different.<br /><br
/>My idea builds on the fact that UPDATE/DELETE RETURNING is present<br /> in all supported back branches.<br /><br />
Amini-parser only understanding SELECT, UPDATE and DELETE should<br /> be added to ecpglib, so<br /><br />     DECLARE
cursorCURSOR FOR SELECT ...<br /><br /> and<br /><br />     PREPARE prepared_stmt FROM :query;<br />     DECLARE cursor
CURSORFOR prepared_stmt;<br /><br /> can be analyzed and tweaked behind the application's back.<br /><br /> This is
neededto detect whether a query is a simple updatable<br /> scan of a table, and returning errors early to the
applicationif it's not,<br /> without actually sending the UPDATE/DELETE WHERE CURRENT OF<br /> query to the
backend.<br/><br /> For the purpose of WHERE CURRENT OF, I would add a ctid<br /> column at the end of the targelist
thatis treated like "resjunk"<br /> in the backend when returning data to the application.<br /><br /> So, SELECTs
wouldreturn the ctid information of the tuples.<br /> The cursor query was a FETCH N with abs(N)>1 because of<br />
thereadahead. For this reason, the cursor positions known<br /> by the application and the backend are different.<br
/><br/> The extra MOVE can be eliminated by replacing<br /><br />     UPDATE table SET ... WHERE CURRENT OF cursor;<br
/><br/> with<br /><br />     UPDATE table SET ... WHERE ctid='...' RETURNING ctid;<br /><br /> Of course, if the
originalquery contained RETURNING, the<br /> ctid field would be appended in resjunk fashion.<br /><br /> The new ctid
canbe saved back into the cache using PQsetvalue()<br /> and the (position ; new ctid) pair into a hash, both can be<br
/>looked up in case the application wants to modify the<br /> same tuple again. Some protection is needed against
growing<br/> the hash too big. But usually[*] programs don't go back<br /> to modify the same record twice.<br /><br />
[*]This is only an educated guess.<br /><br /> How about it?<br /><br /> Best regards,<br /> Zoltán Böszörményi<br
/><br/><blockquote cite="mid:520F4B90.2010800@cybertec.at" type="cite"><br /> Thanks in advance and best regards, <br
/>Zoltán Böszörményi <br /><br /><br /><fieldset class="mimeAttachmentHeader"></fieldset><br /><pre wrap="">
 
</pre></blockquote><br /><br /><pre class="moz-signature" cols="90">-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: <a class="moz-txt-link-freetext" href="http://www.postgresql-support.de">http://www.postgresql-support.de</a>
<aclass="moz-txt-link-freetext" href="http://www.postgresql.at/">http://www.postgresql.at/</a>
 
</pre>

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

04 September 2013, 08:06:59

2013-08-17 12:08 keltezéssel, Boszormenyi Zoltan írta:
>
> I have put the broken up patchset into a GIT tree of mine at GitHub:
> https://github.com/zboszor/ecpg-readahead/
> but the huge compressed patch is also attached for reference.

I merged current PG GIT HEAD in the above tree and fixed a merge conflict
caused by commit 673b527534893a4a8adb3cdef52fc645c13598ce

The huge patch is attached for reference.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-9.4-v17.patch.bz2

Re: ECPG FETCH readahead

From

Peter Eisentraut

Date:

07 September 2013, 01:25:10

On Wed, 2013-09-04 at 10:06 +0200, Boszormenyi Zoltan wrote:
> 2013-08-17 12:08 keltezéssel, Boszormenyi Zoltan írta:
> >
> > I have put the broken up patchset into a GIT tree of mine at GitHub:
> > https://github.com/zboszor/ecpg-readahead/
> > but the huge compressed patch is also attached for reference.
>
> I merged current PG GIT HEAD in the above tree and fixed a merge conflict
> caused by commit 673b527534893a4a8adb3cdef52fc645c13598ce
>
> The huge patch is attached for reference.

The documentation doesn't build:

openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
openjade:ecpg.sgml:477:40: start tag was here
openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
openjade:ecpg.sgml:477:20: start tag was here
openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
openjade:ecpg.sgml:473:81: start tag was here
openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
openjade:ecpg.sgml:473:56: start tag was here

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

07 September 2013, 18:33:03

2013-09-07 03:25 keltezéssel, Peter Eisentraut írta:
> On Wed, 2013-09-04 at 10:06 +0200, Boszormenyi Zoltan wrote:
>> 2013-08-17 12:08 keltezéssel, Boszormenyi Zoltan írta:
>>> I have put the broken up patchset into a GIT tree of mine at GitHub:
>>> https://github.com/zboszor/ecpg-readahead/
>>> but the huge compressed patch is also attached for reference.
>> I merged current PG GIT HEAD in the above tree and fixed a merge conflict
>> caused by commit 673b527534893a4a8adb3cdef52fc645c13598ce
>>
>> The huge patch is attached for reference.
> The documentation doesn't build:
>
> openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
> openjade:ecpg.sgml:477:40: start tag was here
> openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
> openjade:ecpg.sgml:477:20: start tag was here
> openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
> openjade:ecpg.sgml:473:81: start tag was here
> openjade:ecpg.sgml:478:8:E: end tag for "LITERAL" omitted, but OMITTAG NO was specified
> openjade:ecpg.sgml:473:56: start tag was here

Thanks, fixed.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-9.4-v18.patch.bz2

Re: ECPG FETCH readahead

From

Peter Eisentraut

Date:

10 September 2013, 01:04:16

You need to update the dblink regression tests.

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

10 September 2013, 04:52:13

2013-09-10 03:04 keltezéssel, Peter Eisentraut írta:
> You need to update the dblink regression tests.

Done.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-9.4-v19.patch.bz2

Proposal: UPDATE/DELETE ... WHERE OFFSET n OF cursor_name, was: Re: New ECPG idea, was: Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

16 September 2013, 13:51:52

Hi,

2013-08-17 13:02 keltezéssel, Boszormenyi Zoltan írta:
[snip, discussion of WHERE CURRENT OF in the ECPG client lib]

I had a second thought about it and the client side caching and
parser behind the application's back seems to be an overkill.

Instead, I propose a different solution, which is a logical extension of
FETCH { FORWARD | BACKWARD } N, which is a PostgreSQL extension.

The proposed solution would be:

UPDATE / DELETE ... WHERE OFFSET SignedIconst OF cursor_name

I imagine that FETCH would keep the array of TIDs/ItemPointerDatas
of the last FETCH statement.

The argument to OFFSET would be mostly in negative terms,
with 0 being equivalent of WHERE CURRENT OF.

E.g.:

FETCH 2 FROM mycur; -- fetches two rows
UPDATE mytab SET ... WHERE OFFSET -1 OF mycur; -- updates the first row
UPDATE mytab SET ... WHERE OFFSET 0 OF mycur; -- updates current row

or

FETCH 3 FROM mycur; -- fetches two rows, reaches end of the cursor
UPDATE mytab SET ... WHERE OFFSET -2 OF mycur; -- updates the first row
UPDATE mytab SET ... WHERE OFFSET -1 OF mycur; -- updates the second row
UPDATE mytab SET ... WHERE OFFSET 0 OF mycur; -- throws an error like WHERE CURRENT OF

or

FETCH 3 FROM mycur; -- fetches two rows, reaches end of the cursor
MOVE BACKWARD 2 IN mycur;
UPDATE mytab SET ... WHERE OFFSET 0 OF mycur; -- updates the first row (now current)
UPDATE mytab SET ... WHERE OFFSET 1 OF mycur; -- updates the second row

The cached array can be kept valid until the next FETCH statement,
even if  moves out of the interval of the array, except in case the
application changes the sign of the cursor position, e.g. previously it used
MOVE ABSOLUTE with positive numbers and suddenly it switches to
backward scanning with MOVE ABSOLUTE <negative number> or vice-versa.

This would solve the only source of slowdown in the client side cursor caching
in ECPG present in my current ECPG cursor readahead patch, there would be
no more MOVE + UPDATE/DELETE WHERE CURRENT OF. On the other hand,
exploiting this proposed feature in ECPG would make it incompatible with older
servers unless it detects the server version it connects to and uses the current
method.

Comments?

Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Alvaro Herrera

Date:

10 October 2013, 22:17:11

Boszormenyi Zoltan escribió:
> 2013-09-10 03:04 keltezéssel, Peter Eisentraut írta:
> >You need to update the dblink regression tests.
> 
> Done.

Dude, this is an humongous patch.  I was shocked by it initially, but on
further reading, I observed that it's only a huge patch which also does
some mechanical changes to test output.  I think it'd be better to split
the part that's responsible for the changed lines in test output
mentioning "ecpg_process_output".  That should be a reasonably small
patch which changes ecpg_execute slightly and adds the new function, is
followed by the enormous resulting mechanical changes in test output.
It should be possible to commit that relatively quickly.  Then there's
the rest of the patch, which would adds a huge pile of new code.

I think there are some very minor changes to backend code as well --
would it make sense to post that as a separate piece?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

13 October 2013, 11:37:31

2013-10-11 00:16 keltezéssel, Alvaro Herrera írta:
> Boszormenyi Zoltan escribió:
>> 2013-09-10 03:04 keltezéssel, Peter Eisentraut írta:
>>> You need to update the dblink regression tests.
>> Done.
> Dude, this is an humongous patch.

You *know* that the patch is available in pieces at https://github.com/zboszor/ecpg-readahead
right? You must have read the new "initial" announcement at
http://archives.postgresql.org/message-id/520F4B90.2010800@cybertec.at

You can review and merge them one by one.
The transformation of ecpg_execute() and friends starts at
2d04ff83984c63c34e55175317e3e431eb58e00c

>    I was shocked by it initially, but on
> further reading, I observed that it's only a huge patch which also does
> some mechanical changes to test output.  I think it'd be better to split
> the part that's responsible for the changed lines in test output
> mentioning "ecpg_process_output".

It's 1e0747576e96aae3ec8c60c46baea130aaf8916e in the above repository.

Also, another huge regression test patch is where the cursor readahead
is enabled unconditionally: 27e43069082b29cb6fa4d3414e6484ec7fb80cbe

>    That should be a reasonably small
> patch which changes ecpg_execute slightly and adds the new function, is
> followed by the enormous resulting mechanical changes in test output.
> It should be possible to commit that relatively quickly.  Then there's
> the rest of the patch, which would adds a huge pile of new code.
>
> I think there are some very minor changes to backend code as well --
> would it make sense to post that as a separate piece?

It's 2ad207e6371a33d6a985c76ac066dd51ed5681cb
Also, cd881f0363b1aff1508cfa347e8df6b4981f0ee7 to fix the dblink regression test.

Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

11 November 2013, 09:18:33

2013-10-11 00:16 keltezéssel, Alvaro Herrera írta:
> Boszormenyi Zoltan escribió:
>> 2013-09-10 03:04 keltezéssel, Peter Eisentraut írta:
>>> You need to update the dblink regression tests.
>> Done.
> Dude, this is an humongous patch.  I was shocked by it initially, but on
> further reading, I observed that it's only a huge patch which also does
> some mechanical changes to test output.  I think it'd be better to split
> the part that's responsible for the changed lines in test output
> mentioning "ecpg_process_output".  That should be a reasonably small
> patch which changes ecpg_execute slightly and adds the new function, is
> followed by the enormous resulting mechanical changes in test output.
> It should be possible to commit that relatively quickly.  Then there's
> the rest of the patch, which would adds a huge pile of new code.
>
> I think there are some very minor changes to backend code as well --
> would it make sense to post that as a separate piece?

I had to rebase the patch against current (today morning's) GIT, since
there were a few changes against ECPG in the meantime.

The old contents of my GIT repository was removed so you need to
clone it fresh. https://github.com/zboszor/ecpg-readahead.git
I won't post the humongous patch again, since sending a 90KB
compressed file to everyone on the list is rude. You can pull
the commits individually from the above repository. For the same
reason, I won't add the DECLARE CURSOR command tag change
separately since this is also part of this big feature.

I have reordered some patches, like some independent bug fixes
against the ECPG parser and regression tests. The backend change
is also added early.

Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/

Re: ECPG FETCH readahead

From

Noah Misch

Date:

12 November 2013, 06:02:06

On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
> The old contents of my GIT repository was removed so you need to
> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
> I won't post the humongous patch again, since sending a 90KB
> compressed file to everyone on the list is rude.

Patches of that weight show up on a regular basis.  I don't think it's rude.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

12 November 2013, 06:16:20

2013-11-12 07:01 keltezéssel, Noah Misch írta:
> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>> The old contents of my GIT repository was removed so you need to
>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>> I won't post the humongous patch again, since sending a 90KB
>> compressed file to everyone on the list is rude.
> Patches of that weight show up on a regular basis.  I don't think it's rude.

OK, here it is.

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ecpg-cursor-readahead-9.4-v20.patch.bz2

ECPG fixes, was: Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

20 November 2013, 13:42:04

2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>> The old contents of my GIT repository was removed so you need to
>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>> I won't post the humongous patch again, since sending a 90KB
>>> compressed file to everyone on the list is rude.
>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>
> OK, here it is.

I have rebased the patchset after "ecpg: Split off mmfatal() from mmerror()"
since it caused merge conflicts.

It's at the usual place again, you need to clone it from scratch if you are
interested in looking at git diff/log

I have removed some previous ecpg_log() debug output and
the total patch size is not so huge any more but I am going to post
the split-up set in parts.

Attached is the first few patches that are strictly generic ECPG fixes.
They can be applied independently and obvious enough.

Subsequent patches will come as reply to this email.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

Modify the DECLARE CURSOR command tag depending on the scrollable flag

From

Boszormenyi Zoltan

Date:

20 November 2013, 13:47:55

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

Attached is the patch that modified the command tag returned by
the DECLARE CURSOR command. It returns "DECLARE SCROLL CURSOR"
or "DECLARE NO SCROLL CURSOR" depending on the cursor's
scrollable flag that can be determined internally even if neither is
asked explicitly.

It is expected by the ECPG cursor readahead code.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

08.patch

ECPG infrastructure changes part 1, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

20 November 2013, 13:53:57

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

Infrastructure changes in ecpglib/execute.c to split up
ECPGdo and ecpg_execute() and expose the parts as
functions internal to ecpglib.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ECPG infrastructure changes, part 2, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

20 November 2013, 13:56:56

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

ecpg_log() fixes after part 1 that produces a lot of regression test changes.
This patch is over 200K in itself so I send it separately and compressed.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

17.patch.gz

ECPG infrastructure changes, part 3, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

20 November 2013, 14:00:31

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

Further ecpglib/execute.c changes.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ECPG infrastructure changes, part 4, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

20 November 2013, 14:13:46

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

This is another, semi independent subfeature of ECPG readahead.
It's about 150K by itself, so I send it compressed.

The purpose of this patch is to track (sub-)transactions and cursors
in ecpglib to reduce network turnaround and speed up the application
in certain cases. E.g. cursors are discarded upon ROLLBACK TO
SAVEPOINT and ecpglib needs to know about it. When an unknown
savepoint or cursor name is sent, ecpglib would not send the command
to the server in an open transaction after this patch. Instead, it flips
a "client-side error" flag and returns the same error the backend
would in this case.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

23.patch.gz

ECPG infrastructure changes, part 5, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

20 November 2013, 14:21:55

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

Patch 24 slightly speeds up ECPGsetcommit() at runtime, shifting
the strcmp() cost to the ecpg preprocessor. It also avoids the
confusion coming from a manually crafted C source that sends
a string different from "on" and "off".

Patch 25 is another subfeature of the ECPG readahead code.
ECPGopen() gets the user-specified (or unspecified) scrollable flag.
ECPGfetch() gets the amount of tuples to fetch separately as
a fixed value, which may or may not be also present in the variable
argument list. It makes it unnecessary to parse the FETCH/MOVE
query and extract the amount of tuples that way.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

ECPG FETCH readahead, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

20 November 2013, 14:25:58

2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>> The old contents of my GIT repository was removed so you need to
>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>> I won't post the humongous patch again, since sending a 90KB
>>>> compressed file to everyone on the list is rude.
>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>
>> OK, here it is.
>
> ...
> Subsequent patches will come as reply to this email.

$SUBJECT, the real thing. Needs all previous patches.
Compared the the previous incarnation, a lot of debugging
ecpg_log() lines are removed. The newly introduced regression
tests have much smaller .stderr outputs because of this,
the patch is about 800K smaller in total.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

26.patch.gz

Followup patches for ECPG readahead, was: Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

20 November 2013, 14:30:45

2013-11-20 15:25 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>>> The old contents of my GIT repository was removed so you need to
>>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>>> I won't post the humongous patch again, since sending a 90KB
>>>>> compressed file to everyone on the list is rude.
>>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>>
>>> OK, here it is.
>>
>> ...
>> Subsequent patches will come as reply to this email.
>

Followup patches/subfeatures for the ECPG readahead code.

The last (31st) patch drives all cursors through the readahead code
and changes a lot of regression tests. It's over 100K so I send it compressed.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

2013-11-20 14:53 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>>> The old contents of my GIT repository was removed so you need to
>>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>>> I won't post the humongous patch again, since sending a 90KB
>>>>> compressed file to everyone on the list is rude.
>>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>>
>>> OK, here it is.
>>
>> ...
>> Subsequent patches will come as reply to this email.
>
> Infrastructure changes in ecpglib/execute.c to split up
> ECPGdo and ecpg_execute() and expose the parts as
> functions internal to ecpglib.

Rebased after killing the patch that changed the DECLARE CURSOR command tag.
All infrastructure patches are attached, some of them compressed.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

Re: ECPG FETCH readahead, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

28 November 2013, 14:24:23

2013-11-20 15:25 keltezéssel, Boszormenyi Zoltan írta:
> 2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
>> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>>> The old contents of my GIT repository was removed so you need to
>>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>>> I won't post the humongous patch again, since sending a 90KB
>>>>> compressed file to everyone on the list is rude.
>>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>>
>>> OK, here it is.
>>
>> ...
>> Subsequent patches will come as reply to this email.
>
> $SUBJECT, the real thing. Needs all previous patches.
> Compared the the previous incarnation, a lot of debugging
> ecpg_log() lines are removed. The newly introduced regression
> tests have much smaller .stderr outputs because of this,
> the patch is about 800K smaller in total.

Rebased after killing the patch that changed the DECLARE CURSOR command tag.
The ECPG readahead patch and all the small following patches are attached.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

2013-12-03 16:48 keltezéssel, Antonin Houska írta:
> 22.patch
> --------
>
> tuples_left > 0
>
> instead of just
>
> tuples_left
>
> seems to me safer in for-loops. Not sure if there's a convention for
> this though.

I decided not to change this. The "tuples_left" variable starts at
a non-negative value and decreased by one in every iteration.
The previous code also used the same "test for non-zero" test with
the variable "ntuples".

> 23.patch
> --------
>
> git apply --verbose ~/cybertec/patches/ecpq/23.patch
> /home/anton/cybertec/patches/ecpq/23.patch:494: space before tab in indent.
>                          /*------
> /home/anton/cybertec/patches/ecpq/23.patch:495: space before tab in indent.
>                             translator: this string will be truncated at
> 149 characters expanded.  */
> /home/anton/cybertec/patches/ecpq/23.patch:4019: trailing whitespace.
>          exec sql close :curname;

I fixed this in this patch and all subsequent ones.

> Tests - 23.patch
> ----------------
>
> src/interfaces/ecpg/test/sql/cursorsubxact.pgc
>
>
>          /*
>           * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT
>           * is used with an already existing name.
>           */
>
> Shouldn't it be "... if a CURSOR is used with an already existing
> name?". Or just "... implicit RELEASE SAVEPOINT after an error"?
> I'd also appreciate a comment where exactly the savepoint is
> (implicitly) released.

I have already answered this in my previous answer.

All patches are attached again for completeness.

> 23.patch and 24.patch
> ---------------------
>
> SO_MAJOR_VERSION and also interfaces/ecpg/include/ecpglib.h is changed
>
> Thus all client applications probably need to be preprocessed & compiled
> against the PG 9.4. I don't know how this is usually enforced. I'm
> mentioning it for the sake of completeness.
>
> // Antonin Houska (Tony)
>
>
> On 11/28/2013 03:21 PM, Boszormenyi Zoltan wrote:
>> 2013-11-20 14:53 keltezéssel, Boszormenyi Zoltan írta:
>>> 2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta:
>>>> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta:
>>>>> 2013-11-12 07:01 keltezéssel, Noah Misch írta:
>>>>>> On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote:
>>>>>>> The old contents of my GIT repository was removed so you need to
>>>>>>> clone it fresh. https://github.com/zboszor/ecpg-readahead.git
>>>>>>> I won't post the humongous patch again, since sending a 90KB
>>>>>>> compressed file to everyone on the list is rude.
>>>>>> Patches of that weight show up on a regular basis.  I don't think it's rude.
>>>>> OK, here it is.
>>>> ...
>>>> Subsequent patches will come as reply to this email.
>>> Infrastructure changes in ecpglib/execute.c to split up
>>> ECPGdo and ecpg_execute() and expose the parts as
>>> functions internal to ecpglib.
>> Rebased after killing the patch that changed the DECLARE CURSOR command tag.
>> All infrastructure patches are attached, some of them compressed.
>>
>> Best regards,
>> Zoltán Böszörményi
>>
>>
>>
>>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

Re: ECPG FETCH readahead, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

04 December 2013, 13:53:41

2013-11-28 15:23 keltezéssel, Boszormenyi Zoltan írta:
> Rebased after killing the patch that changed the DECLARE CURSOR command tag.
> The ECPG readahead patch and all the small following patches are attached.

Fixed the extra spaces that "git apply" complains about.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

Re: Review: ECPG infrastructure changes part 1, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

06 December 2013, 13:02:35

2013-12-04 14:51 keltezéssel, Boszormenyi Zoltan írta:
> 2013-12-03 16:48 keltezéssel, Antonin Houska írta:
>
>> Tests - 23.patch
>> ----------------
>>
>> src/interfaces/ecpg/test/sql/cursorsubxact.pgc
>>
>>
>>          /*
>>           * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT
>>           * is used with an already existing name.
>>           */
>>
>> Shouldn't it be "... if a CURSOR is used with an already existing
>> name?". Or just "... implicit RELEASE SAVEPOINT after an error"?
>> I'd also appreciate a comment where exactly the savepoint is
>> (implicitly) released.
>
> I have already answered this in my previous answer.

And I was wrong in that. The comments in the test were rearranged
a little and the fact in the above comment is now actually tested.

Some harmless unused variables were also removed and an
uninitialized variable usage was fixed. Because of these and the above
changes a lot of patches need to be rebased.

All patches are attached again for completeness.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Attachment

Re: ECPG FETCH readahead, was: Re: ECPG fixes

From

Boszormenyi Zoltan

Date:

06 December 2013, 13:05:39

Rebased patches after the regression test and other details were fixed
in the infrastructure part.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/

Hi,

Alvaro Herrera wrote:
> Boszormenyi Zoltan escribió:
>> Rebased patches after the regression test and other details were fixed
>> in the infrastructure part.
>
> This thread started in 2010, and various pieces have been applied
> already and some others have changed in nature.  Would you please post a
> new patchset, containing rebased patches that still need application, in
> a new email thread to be linked in the commitfest entry?

I hope Thunderbird did the right thing and didn't include the reference
message ID when I told it to "edit as new". So supposedly this is a new
thread but with all the cc: addresses kept.

I have rebased all remaining patches and kept the numbering.
All the patches from 18 to 25 that were previously in the
"ECPG infrastructure" CF link are included here.

There is no functional change.

Best regards,
Zoltán Böszörményi

Attachment

Re: ECPG FETCH readahead

From

Boszormenyi Zoltan

Date:

16 April 2014, 08:55:22

2014-01-18 18:08 keltezéssel, Boszormenyi Zoltan írta:
> Hi,
>
>
> Alvaro Herrera wrote:
>> Boszormenyi Zoltan escribió:
>>> Rebased patches after the regression test and other details were fixed
>>> in the infrastructure part.
>>
>> This thread started in 2010, and various pieces have been applied
>> already and some others have changed in nature.  Would you please post a
>> new patchset, containing rebased patches that still need application, in
>> a new email thread to be linked in the commitfest entry?
>
> I hope Thunderbird did the right thing and didn't include the reference
> message ID when I told it to "edit as new". So supposedly this is a new
> thread but with all the cc: addresses kept.
>
> I have rebased all remaining patches and kept the numbering.
> All the patches from 18 to 25 that were previously in the
> "ECPG infrastructure" CF link are included here.
>
> There is no functional change.

Because of the recent bugfixes in the ECPG area, the patchset
didn't apply cleanly anymore. Rebased with no functional change.

Best regards,
Zoltán Böszörményi

2014-04-24 15:19 keltezéssel, Boszormenyi Zoltan írta:
> 2014-04-24 14:50 keltezéssel, Michael Meskes írta:
>> Thanks an awful lot Antonin.
>>
>>> Committer availability might well be the issue, but missing review
>>> probably too.
>> Yes, you're right. If my taks is mostly one last glance and a commit I will make time 
>> for that.
>>
>>> Whether this review is enough to move the patch to "ready for committer"
>>> - I tend to let the next CFM decide. (I don't find it productive to
>>> ignite another round of discussion about kinds of reviews - already saw
>>> some.)
>> I saw some remarks in your review that Zoltan wants to address. Once I got the
>> updated version I'll have a look at it.
>>
>> Zoltan, could you send a new version by end of day tomorrow? I'll be sitting on
>> a plane for a longer time again on Saturday. :)
>
> I will try to.

Unfortunately, I won't make the deadline because of life
(I had to attend a funeral today) and because Antonin has
opened a can of worms with this comment:

> * How about a regression test for the new ECPGcursor_dml() function?

There are some aspects that may need a new discussion.

The SQL standard wants an "updatable cursor" for positioned DML
(i.e. UPDATE/DELETE with the WHERE CURRENT OF clause)
This means passing FOR UPDATE in the query.

FOR UPDATE + SCROLL cursor is an impossible combination,
ERROR is thrown when DECLARE is executed. This combination can
(and should?) be detected in the ECPG preprocessor and it would
prevent runtime errors. It's not implemented at the moment.

Fortunately, a previous discussion resulted in explicitly passing
NO SCROLL for cursors where SCROLL is not specified, it's in 25.patch

I intend to extend it a little for SQL standard compliance with
embedded SQL. FOR UPDATE should also implicitly mean NO SCROLL.
Both the FOR UPDATE + explicit SCROLL and the explicit SCROLL +
usage of positioned DML can be detected in the preprocessor and
they should throw an error. Then the regression test would really make
sense.

But these checks in ECPG would still leave a big hole, and it's the other
DECLARE variant with the query passed in a prepared statement with
"EXEC SQL PREPARE prep_stmt FROM :query_string;"

Plugging this hole would require adding a simplified syntax checker to
libecpg that only checks the SelectStmt or re-adding the backend code
to tell the application the cursor's scrollable (and perhaps the updatable)
property.

I must have forgotten but surely this was the reason for changing the
DECLARE command tag in the first place which was shot down already.
So, only the other choice remains, the syntax checker in ecpglib.

I think implementing it would make the caching code miss 9.4, since
it adds a whole new set of code but the Perl magic for the ECPG syntax
checker may be mostly reusable here.

Best regards,
Zoltán Böszörményi