Thread: Re: [PATCHES] updated patch for selecting large results sets in psql using cursors

<chrisnospam@1006.org> writes:
> here comes the latest version (version 7) of the patch to handle large
> result sets with psql.  As previously discussed, a cursor is used
> for SELECT queries when \set FETCH_COUNT some_value > 0

Wait a minute.  What I thought we had agreed to was a patch to make
commands sent with \g use a cursor.  This patch changes SendQuery
so that *every* command executed via psql is treated this way.
That's a whole lot bigger behavioral change than I think is warranted.

            regards, tom lane

Re: [PATCHES] updated patch for selecting large results sets in psql using cursors

From
Peter Eisentraut
Date:
Tom Lane wrote:
> Wait a minute.  What I thought we had agreed to was a patch to make
> commands sent with \g use a cursor.  This patch changes SendQuery
> so that *every* command executed via psql is treated this way.

That's what I remembered.  I don't think we want to introduce a
difference between ; and \g.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> Wait a minute.  What I thought we had agreed to was a patch to make
>> commands sent with \g use a cursor.  This patch changes SendQuery
>> so that *every* command executed via psql is treated this way.

> That's what I remembered.  I don't think we want to introduce a 
> difference between ; and \g.

Have we measured the performance impact, then?  The last time I profiled
psql, GetVariable was already a hotspot, and this introduces another
call of it into the basic query loop whether you use the feature or not.
        regards, tom lane


Re: [PATCHES] updated patch for selecting large results sets in

From
Chris Mair
Date:
On Mon, 2006-08-28 at 13:45 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Tom Lane wrote:
> > > Wait a minute.  What I thought we had agreed to was a patch to make
> > > commands sent with \g use a cursor.  This patch changes SendQuery
> > > so that *every* command executed via psql is treated this way.
> 
> > That's what I remembered.  I don't think we want to introduce a 
> > difference between ; and \g.
> 
> Have we measured the performance impact, then?  The last time I profiled
> psql, GetVariable was already a hotspot, and this introduces another
> call of it into the basic query loop whether you use the feature or not.
> 
>             regards, tom lane

Hi,

after agreeing on using a \set variable, I proposed to have it influence
"\g" as well as ";", because I thought that would be the most expected
behaviour. IMHO I'm with Peter, that introducing a difference between
"\g" and ";" would go against the principle of least surprise.

Performance-wise I took for granted without checking that GetVariable's
running time would be negligible.

[looks at the code]

I see it's it's basically two function calls with a loop over a linked
list of values (in the order of 10) doing strcmps and one strtol.
It is not quite clear to me what the impact of this is. I could
imagine it would show up only if you perform lots of trivial queries
through psql. I'm going to benchmark something now and report back.

Anyway, regardless the benchmark, I feel it's somehow not clean to have
a variable introduce a difference between "\g" and ";".

[goes benchmarking...]

Bye, Chris.


-- 

Chris Mair
http://www.1006.org




Re: [PATCHES] updated patch for selecting large results

From
Chris Mair
Date:
> Performance-wise I took for granted without checking that GetVariable's
> running time would be negligible.
> 
> [looks at the code]
> 
> I see it's it's basically two function calls with a loop over a linked
> list of values (in the order of 10) doing strcmps and one strtol.
> It is not quite clear to me what the impact of this is. I could
> imagine it would show up only if you perform lots of trivial queries
> through psql. I'm going to benchmark something now and report back.
> 
> Anyway, regardless the benchmark, I feel it's somehow not clean to have
> a variable introduce a difference between "\g" and ";".
> 
> [goes benchmarking...]

Ok,
so I ran a file containing 1 million lines of "select 1;" through
psql (discarding the output). 5 runs each with the patch and with the
patch removed (the if() in SendQuery commented).

These are the results in seconds user time of psql on a Pentium M 2.0
GHz (real time was longer, since the postmaster process was on the same
machine).
patch | count |   avg   |      stddev
-------+-------+---------+-------------------f     |     5 | 16.6722 | 0.359759919946455t     |     5 | 17.2762 |
0.259528803796329

The conclusion is that, yes, the overhead is measurable, albeit with
a very synthetic benchmark (of the type MySQL wins ;).

Basically I'm loosing 0.6 usec on each query line (when FETCH_COUNT
is not there and therefore psql need to scan the whole variables list
in GetVariable() for nothing).

Not sure if that's acceptable (I'd say yes, but then again, I'm
not a cycle counter type of programmer *cough* Java *cough* ;)...

Opinions?

Bye, Chris.


-- 

Chris Mair
http://www.1006.org




Re: [PATCHES] updated patch for selecting large results sets in

From
Tom Lane
Date:
Chris Mair <chrisnospam@1006.org> writes:
> The conclusion is that, yes, the overhead is measurable, albeit with
> a very synthetic benchmark (of the type MySQL wins ;).

OK, so about 3 or 4% overhead added to extremely short queries.
That's not enough to kill this patch, but it's still annoying ...
and as I mentioned, there are some existing calls of GetVariable
that are executed often enough to be a problem too.

It strikes me that having to do GetVariable *and* strtol and so on
for these special variables is pretty silly; the work should be done
during the infrequent times they are set, rather than the frequent
times they are read.  I propose that we add the equivalent of a GUC
assign_hook to psql's variable facility, attach an assign hook function
to each special variable, and have the assign hook transpose the
value into an internal variable that can be read with no overhead.
If we do that then the cost of the FETCH_COUNT patch will be
unmeasurable, and I think we'll see a few percent savings overall in
psql runtime from improving the existing hotspot uses of GetVariable.

Barring objections, I'll hack on this this evening ...
        regards, tom lane


Re: [PATCHES] updated patch for selecting large results

From
Chris Mair
Date:
> > The conclusion is that, yes, the overhead is measurable, albeit with
> > a very synthetic benchmark (of the type MySQL wins ;).
> 
> OK, so about 3 or 4% overhead added to extremely short queries.

More accurately, that 3 or 4% overhead is added to about all queries
(we're talking *psql*-only running time).

It's just that for anything but short queries, psql running time
totally dwarfs regarding to postmaster running time anyway.

> That's not enough to kill this patch, but it's still annoying ...
> and as I mentioned, there are some existing calls of GetVariable
> that are executed often enough to be a problem too.
> 
> It strikes me that having to do GetVariable *and* strtol and so on
> for these special variables is pretty silly; the work should be done
> during the infrequent times they are set, rather than the frequent
> times they are read.  I propose that we add the equivalent of a GUC
> assign_hook to psql's variable facility, attach an assign hook function
> to each special variable, and have the assign hook transpose the
> value into an internal variable that can be read with no overhead.
> If we do that then the cost of the FETCH_COUNT patch will be
> unmeasurable, and I think we'll see a few percent savings overall in
> psql runtime from improving the existing hotspot uses of GetVariable.
> 
> Barring objections, I'll hack on this this evening ...

Might help.

Take into account the strtol is not critical at all for FETCH_COUNT,
since when it's actually set, we're supposed to retrieving big data
where a strtol doesn't matter anyway. The overhead comes from scanning
the linked list for nothing in the normal case (when it's not set).

I don't know how the other uses factor in here, but I see it's called
at least twice more on average calls to SendQuery.

Bye,
Chris.



-- 

Chris Mair
http://www.1006.org




Re: [PATCHES] updated patch for selecting large results sets

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Tom Lane wrote:
> > Wait a minute.  What I thought we had agreed to was a patch to make
> > commands sent with \g use a cursor.  This patch changes SendQuery
> > so that *every* command executed via psql is treated this way.
>
> That's what I remembered.  I don't think we want to introduce a
> difference between ; and \g.

I am confused.  I assume \g and ; should be affected, like Peter says.
Tom, what *every* command are you talking about?  You mean \d?

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Bruce Momjian <bruce@momjian.us> writes:
> Peter Eisentraut wrote:
>> Tom Lane wrote:
>>> Wait a minute.  What I thought we had agreed to was a patch to make
>>> commands sent with \g use a cursor.

> I am confused.  I assume \g and ; should be affected, like Peter says.
> Tom, what *every* command are you talking about?  You mean \d?

Like I said, I thought we were intending to modify \g's behavior only;
that was certainly the implication of the discussion of "\gc".

            regards, tom lane

Re: [PATCHES] updated patch for selecting large results sets

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Peter Eisentraut wrote:
> >> Tom Lane wrote:
> >>> Wait a minute.  What I thought we had agreed to was a patch to make
> >>> commands sent with \g use a cursor.
>
> > I am confused.  I assume \g and ; should be affected, like Peter says.
> > Tom, what *every* command are you talking about?  You mean \d?
>
> Like I said, I thought we were intending to modify \g's behavior only;
> that was certainly the implication of the discussion of "\gc".

OK, got it.  I just don't see the value to doing \g and not ;. I think
the \gc case was a hack when he didn't have \set.  Now that we have
\set, we should be consistent.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [PATCHES] updated patch for selecting large results sets

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> OK, got it.  I just don't see the value to doing \g and not ;. I think
> the \gc case was a hack when he didn't have \set.  Now that we have
> \set, we should be consistent.

I'm willing to accept this if we can make sure we aren't adding any
overhead --- see my proposal elsewhere in the thread for fixing that.

            regards, tom lane

Re: [PATCHES] updated patch for selecting large results sets in

From
Chris Mair
Date:
> > > I am confused.  I assume \g and ; should be affected, like Peter says.
> > > Tom, what *every* command are you talking about?  You mean \d?
> >
> > Like I said, I thought we were intending to modify \g's behavior only;
> > that was certainly the implication of the discussion of "\gc".

At some point you OKed the "\g and ;" proposal.
I admit I was quick adding the "and ;" part, but it seemed so natural
once we agreed on using a variable.


> OK, got it.  I just don't see the value to doing \g and not ;. I think
> the \gc case was a hack when he didn't have \set.  Now that we have
> \set, we should be consistent.

I agree with this statement.

If we have a variable that switches just between two versions of \g,
we could have gone with using \u (or whatever) in the first place.

In the mean time I have been converted by the variable camp, and
I think the variable should change "\g" and ";" together, consistently.

If we find we can't live with the performance overhead of that
if(FETCH_COUNT), it is still not clear why we would be better
off moving it into the \g code path only.

Is it because presumably \g is used less often in existing psql scripts?

Bye, Chris.



--

Chris Mair
http://www.1006.org



Re: [PATCHES] updated patch for selecting large results sets

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > OK, got it.  I just don't see the value to doing \g and not ;. I think
> > the \gc case was a hack when he didn't have \set.  Now that we have
> > \set, we should be consistent.
>
> I'm willing to accept this if we can make sure we aren't adding any
> overhead --- see my proposal elsewhere in the thread for fixing that.

Right, if \g has overhead, I don't want people to start using ; because
it is faster.  That is the kind of behavior that makes us look sloppy.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +