[Fwd: Re: [HACKERS] Performance problem with large insert statements] - Mailing list pgsql-jdbc

From Barry Lind
Subject [Fwd: Re: [HACKERS] Performance problem with large insert statements]
Date
Msg-id 3CB31004.6010108@xythos.com
Whole thread Raw
List pgsql-jdbc
David,

Here is the email thread that explains what I think is the issue you are
seeing.

thanks,
--Barry



-------- Original Message --------
Subject: Re: [HACKERS] Performance problem with large insert statements
Date: Mon, 03 Dec 2001 20:31:35 -0800
From: Barry Lind <barry@xythos.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
CC: pgsql-hackers@postgresql.org
References: <3C05D361.8030009@xythos.com> <25924.1007437751@sss.pgh.pa.us>

Tom,

Thanks for looking into this.  What you found is about what I expected.
   What this means to me is that there should be a new todo item along
the lines:
    * allow binding query args over FE/BE protocol

Generally what makes the statement large is the values being
inserted/updated or being specified in the where clause.  If it were
possible from the client declare a cursor with bind variables (i.e. $1,
$2, ...), then later bind in the values and execute the cursor, it
should be possible to work around this problem.

For example the Oracle SQL*Net protocol (Oracle's equivalent to
postgres' FE/BE) has seven distinct operations that can be done on a
cursor from the client:  open, bind, describe, parse, execute, fetch,
close.  This perhaps is a little bit more control than you need, but you
get the general idea.

Currently the FE/BE protocol has two modes for executing queries:  1)
send the entire query as a string and get back the entire result set in
one go, or 2) use cursors in sql, which still sends the entire query in
one string, but allows you to get back a subset of the results.  It
would be nice to have even more control than either of these two options
offer.

It seems to me that the server has all the functionality to do this,
because plpgsql supports it in 7.2.  It just isn't can't be done from
the client.

I have been contemplating creating some SPI functions that could be
called from the client that would implement this type of functionality
and then enhancing the JDBC driver to use them instead of the regular
query execution of the FE/BE protocol.  (i.e. one function to "declare"
a cursor with $1 placeholders, a second function to bind values and
execute the cursor, and then using standard fetch sql statement to get
results, therefore it could be done without changing the actual
protocol).  This could also support the client explicitly "caching"
commonly used cursors and just rebinding/reexecuting them to avoid
having to reparse/replan commonly used queries.

If this all isn't too off the wall, I would like to see something along
these lines added to the todo list and perhapes this email thread along
with.  At a minimum I would like to hear others opinions on the subject.

thanks,
--Barry





Tom Lane wrote:

 > Barry Lind <barry@xythos.com> writes:
 >
 >>In looking at some performance issues (I was trying to look at the
 >>overhead of toast) I found that large insert statements were very slow.
 >>...
 >>I tried to run gprof to see where the time was going, but for some
 >>reason the gprof output on my gmon.out file doesn't have any timing
 >>information (all times are reported as 0.0) and I haven't been able to
 >>figure out why yet.
 >>
 >
 > That seems to be a common disease among Linuxen; dunno why.  gprof
 > works fine for me on HPUX.  I got around to reproducing this today,
 > and what I find is that the majority of the backend time is going into
 > simple scanning of the input statement:
 >
 > Each sample counts as 0.01 seconds.
 >   %   cumulative   self              self     total
 >  time   seconds   seconds    calls  ms/call  ms/call  name
 >  31.24     11.90    11.90                             _mcount
 >  19.51     19.33     7.43    10097     0.74     1.06  base_yylex
 >   7.48     22.18     2.85 21953666     0.00     0.00
appendStringInfoChar
 >   5.88     24.42     2.24      776     2.89     2.89  pglz_compress
 >   4.36     26.08     1.66 21954441     0.00     0.00  pq_getbyte
 >   3.57     27.44     1.36  7852141     0.00     0.00  addlit
 >   3.26     28.68     1.24     1552     0.80     0.81  scanstr
 >   2.84     29.76     1.08      779     1.39     7.18  pq_getstring
 >   2.31     30.64     0.88    10171     0.09     0.09  _doprnt
 >   2.26     31.50     0.86      776     1.11     1.11  byteain
 >   2.07     32.29     0.79                             msquadloop
 >   1.60     32.90     0.61  7931430     0.00     0.00  memcpy
 >   1.18     33.35     0.45                             chunks
 >   1.08     33.76     0.41    46160     0.01     0.01  strlen
 >   1.08     34.17     0.41                             encore
 >   1.05     34.57     0.40     8541     0.05     0.05  XLogInsert
 >   0.89     34.91     0.34                             appendStringInfo
 >
 > 60% of the call graph time is accounted for by these two areas:
 >
 > index % time    self  children    called     name
 >                 7.43    3.32   10097/10097       yylex [14]
 > [13]    41.0    7.43    3.32   10097         base_yylex [13]
 >                 1.36    0.61 7852141/7852141     addlit [28]
 >                 1.24    0.01    1552/1552        scanstr [30]
 >                 0.02    0.03    3108/3108        ScanKeywordLookup [99]
 >                 0.00    0.02    2335/2335        yy_get_next_buffer [144]
 >                 0.02    0.00     776/781         strtol [155]
 >                 0.00    0.01     777/3920        MemoryContextStrdup
[108]
 >                 0.00    0.00       1/1
base_yy_create_buffer [560]
 >                 0.00    0.00    4675/17091       isupper [617]
 >                 0.00    0.00    1556/1556
yy_get_previous_state [671]
 >                 0.00    0.00     779/779         yywrap [706]
 >                 0.00    0.00       1/2337
base_yy_load_buffer_state [654]
 > -----------------------------------------------
 >                 1.08    4.51     779/779         pq_getstr [17]
 > [18]    21.4    1.08    4.51     779         pq_getstring [18]
 >                 2.85    0.00 21953662/21953666
appendStringInfoChar [20]
 >                 1.66    0.00 21954441/21954441     pq_getbyte [29]
 > -----------------------------------------------
 >
 > While we could probably do a little bit to speed up pg_getstring and its
 > children, it's not clear that we can do anything about yylex, which is
 > flex output code not handmade code, and is probably well-tuned already.
 >
 > Bottom line: feeding huge strings through the lexer is slow.
 >
 >             regards, tom lane
 >
 >



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org




pgsql-jdbc by date:

Previous
From: David Bernhill
Date:
Subject: Re: raw data broken in 7.2 driver?
Next
From: Barry Lind
Date:
Subject: Re: raw data broken in 7.2 driver?