Thread: repeated pointless memmove() calls in pqReadData()

repeated pointless memmove() calls in pqReadData()

From
Brent Ewing
Date:
22 May 2001

Dear Sir;

First, thank you for PostGres. It is great!

Second, I am puzzled by repeated, apparently pointless calls to memmove() in
the function pqReadData, which is in `src/interfaces/libpq/fe-misc.c'.

I am running and looking at PostGres version 7.1.1 built on a Alpha running
Digital UNIX. I use the Digital compilers to build PostGres.

I created a table using the `script'

-- Create genome_seq table
--
-- genome_seq contains information extracted from NCBI H sapiens genome contigs
--
--   table label
--   -----------
--   accession
--   version
--   organism
--   chromosome
--   comments
--   sequence
--   length
--
CREATE TABLE genome_seq (
accession               text,
version                 text,
organism                text,
chromosome              text,
comments                text,
sequence                text,
length                  int4
)
--
GENOME_SEQ.TABLE (END)

The `sequence' can be a long string. I have sequence strings of tens of millions
of characters.

I create a file of the rows and load the data using the `COPY FROM' SQL command
for performance as suggested in the documents. (I believe this is irrelevant.)

I am using PostGres and the libpq library calls to get the sequence quickly. It
worked well for sequences of 150k characters but bogged down for the 10M
character strings, taking a minute or so, which seemed excessive to me.

I used top to watch system activity and noticed that the the client was getting
essentially all the CPU cycles, and the backend was getting essentially nothing.
I did not understand why the client needs to spend a lot of time with long text
strings so I repeated this with the `psql' interface and found the same
phenomenon, which makes sense because it uses libpq too.

I ran both my program and psql under dbx and interrupted it repeatedly while
it ate up CPU cycles and found the function stack looked like

  genbank_db=# select sequence from genome_seq where version = 'NT_009151.3';

  Interrupt
  signal Interrupt at >*[memmove, 0x3ff81499998]  subq    r5, 0x8, r5
  (dbx) where
  >  0 memmove(0x0, 0x0, 0x0, 0x0, 0x0) [0x3ff81499998]
     1 pqReadData(0x0, 0x0, 0x3ffbffe8880, 0x14000c400, 0x0) [0x3ffbffeb16c]
     2 PQgetResult(0x3ffbffe8880, 0x14000c400, 0x0, 0x100400000, 0x3ffbffe8aa0) [0x3ffbffe887c]
     3 PQexec(0x14000a4c0, 0x0, 0x140000a00, 0x14000e3c0, 0x0) [0x3ffbffe8a9c]
     4 SendQuery(0x0, 0x14000eb40, 0x12000b410, 0x14000eb40, 0x3b) [0x120009024]
     5 MainLoop(0x14000d320, 0x3e00000000, 0x14000eb40, 0x100000000, 0x1) [0x12000b42c]
     6 main(0x0, 0x0, 0x0, 0x0, 0x100000002) [0x12000d364]
  (dbx)

which puzzled me.

I added diagnostics to pqReadData() so the function looks like

  .
  .
  .
  int
  pqReadData(PGconn *conn)
  {
          int                     someread = 0;
          int                     nread;

  static int mncount = 0;
  static long onumbyte = 0;
  fprintf( stderr,
           "pqReadData: start\n" );

          if (conn->sock < 0)
          {
                  printfPQExpBuffer(&conn->errorMessage,
                                                    "pqReadData() -- connection not open\n");
                  return -1;
          }

          /* Left-justify any data in the buffer to make room */
          if (conn->inStart < conn->inEnd)
          {
  /*
             if( conn->inStart != 0 )
             {
  */
                  memmove(conn->inBuffer, conn->inBuffer + conn->inStart,
                                  conn->inEnd - conn->inStart);
  ++mncount;
  fprintf( stderr,
  "pqReadData: count: %d memmove( dst: %ld  src: %ld  numbyt: %ld  )  del_numbyt: %ld  inStart: %ld\n",
  mncount,
  (long)conn->inBuffer,
  (long)( conn->inBuffer + conn->inStart ),
  (long)( conn->inEnd - conn->inStart ),
  (long)( conn->inEnd - conn->inStart ) - onumbyte,
  (long)( conn->inStart ) );

  onumbyte = conn->inEnd - conn->inStart;
  /*
             }
  */

                  conn->inEnd -= conn->inStart;
                  conn->inCursor -= conn->inStart;
                  conn->inStart = 0;
          }
  .
  .
  .
  .


I start up psql and make a query

  select sequence from genome_seq where version = 'NT_009151.3';

and the resulting stderr looks like

startup: begin: stderr
pqReadData: start
pqReadData: start
pqReadData: start
pqReadData: start
pqReadData: start
pqReadData: count: 1 memmove( dst: 5368891392  src: 5368891421  numbyt: 8163  )  del_numbyt: 8163  inStart: 29
pqReadData: start
pqReadData: count: 2 memmove( dst: 5368891392  src: 5368891392  numbyt: 16355  )  del_numbyt: 8192  inStart: 0
src/interfaces/libpq/fe-misc.c: realloc mem
pqReadData: start
pqReadData: count: 3 memmove( dst: 5381845472  src: 5381845472  numbyt: 24547  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 4 memmove( dst: 5381845472  src: 5381845472  numbyt: 32739  )  del_numbyt: 8192  inStart: 0
src/interfaces/libpq/fe-misc.c: realloc mem
pqReadData: start
pqReadData: count: 5 memmove( dst: 5381878272  src: 5381878272  numbyt: 40931  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 6 memmove( dst: 5381878272  src: 5381878272  numbyt: 49123  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 7 memmove( dst: 5381878272  src: 5381878272  numbyt: 57315  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 8 memmove( dst: 5381878272  src: 5381878272  numbyt: 65507  )  del_numbyt: 8192  inStart: 0
src/interfaces/libpq/fe-misc.c: realloc mem
pqReadData: start
pqReadData: count: 9 memmove( dst: 5381943840  src: 5381943840  numbyt: 73699  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 10 memmove( dst: 5381943840  src: 5381943840  numbyt: 81891  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 11 memmove( dst: 5381943840  src: 5381943840  numbyt: 90083  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 12 memmove( dst: 5381943840  src: 5381943840  numbyt: 98275  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 13 memmove( dst: 5381943840  src: 5381943840  numbyt: 106467  )  del_numbyt: 8192  inStart: 0

.
.
. (many lines omitted)
.
pqReadData: start
pqReadData: count: 1562 memmove( dst: 5398590208  src: 5398590208  numbyt: 12804067  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1563 memmove( dst: 5398590208  src: 5398590208  numbyt: 12812259  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1564 memmove( dst: 5398590208  src: 5398590208  numbyt: 12820451  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1565 memmove( dst: 5398590208  src: 5398590208  numbyt: 12828643  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1566 memmove( dst: 5398590208  src: 5398590208  numbyt: 12836835  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1567 memmove( dst: 5398590208  src: 5398590208  numbyt: 12845027  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1568 memmove( dst: 5398590208  src: 5398590208  numbyt: 12853219  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1569 memmove( dst: 5398590208  src: 5398590208  numbyt: 12861411  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1570 memmove( dst: 5398590208  src: 5398590208  numbyt: 12869603  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1571 memmove( dst: 5398590208  src: 5398590208  numbyt: 12877795  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1572 memmove( dst: 5398590208  src: 5398590208  numbyt: 12885987  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1573 memmove( dst: 5398590208  src: 5398590208  numbyt: 12894179  )  del_numbyt: 8192  inStart: 0
pqReadData: start
pqReadData: count: 1574 memmove( dst: 5398590208  src: 5398590208  numbyt: 12902371  )  del_numbyt: 8192  inStart: 0



Incidentally, the length of sequence is 12908986 characters.

It appears to me that the memmove is moving data from and to the
same memory locations ~1500 times incrementing the number of bytes
moved by 8K each time.

I tried skipping the memmove when the conn->inStart == 0
and found that the output sequence is identical to when it is not
skipped, as I expected.

I appreciate your consideration.

        Best Wishes,
        Brent Ewing

--


--------------------------------
  Brent Ewing
  bge@u.washington.edu
  [206] 616 6040

  University of Washington
  Box 352145
  Seattle, Washington 98195-2145
--------------------------------

Re: repeated pointless memmove() calls in pqReadData()

From
Tom Lane
Date:
Brent Ewing <bge@u.washington.edu> writes:
> Second, I am puzzled by repeated, apparently pointless calls to memmove() in
> the function pqReadData, which is in `src/interfaces/libpq/fe-misc.c'.

Hm.  That was never a problem back when we had an 8K row-size limit ;-)
but I can see that it might get to be an issue for retrieving huge rows.
Thanks for the suggestion.  I've applied the attached patch to current
sources.

            regards, tom lane


*** fe-misc.c~    Sat Mar 31 18:13:30 2001
--- fe-misc.c    Mon May 28 11:29:51 2001
***************
*** 400,413 ****
      /* Left-justify any data in the buffer to make room */
      if (conn->inStart < conn->inEnd)
      {
!         memmove(conn->inBuffer, conn->inBuffer + conn->inStart,
!                 conn->inEnd - conn->inStart);
!         conn->inEnd -= conn->inStart;
!         conn->inCursor -= conn->inStart;
!         conn->inStart = 0;
      }
      else
          conn->inStart = conn->inCursor = conn->inEnd = 0;

      /*
       * If the buffer is fairly full, enlarge it. We need to be able to
--- 400,419 ----
      /* Left-justify any data in the buffer to make room */
      if (conn->inStart < conn->inEnd)
      {
!         if (conn->inStart > 0)
!         {
!             memmove(conn->inBuffer, conn->inBuffer + conn->inStart,
!                     conn->inEnd - conn->inStart);
!             conn->inEnd -= conn->inStart;
!             conn->inCursor -= conn->inStart;
!             conn->inStart = 0;
!         }
      }
      else
+     {
+         /* buffer is logically empty, reset it */
          conn->inStart = conn->inCursor = conn->inEnd = 0;
+     }

      /*
       * If the buffer is fairly full, enlarge it. We need to be able to