Thread:

From
Ian Grant
Date:
Hi,

I sent this message to pgsql-bugs@postresql.org a couple of weeks ago and I
got an automated response from pgsql-bugs-owner@hub.org indicating that I'd
been subscribed to this list, but I've not seen any traffic from it all since
then.  Either there are really very few bugs in PostgreSQL (congratulations!)
or something is wrong with my subscription.  Any (non-automatic) feedback
would be welcomed.

Here's the message again:

I'm using V 7.0 on a Linux machine and I believe I have found a bug in the
large object interface provided by libpq.  The code below will reproduce it, I
hope.  Basically it creates a large object, writes six 'a' characters to it,
then closes it.  Then, in another transaction, it opens the object, seeks to
position 1 from the start, writes a 'b', then seeks to position 3 from the
start and writes another 'b'. Then it closes the object and COMMITs the
transaction.  Finally, in a further separate transaction, it calls lo_export
to write out the resulting object to a file testloseek.c.lobj  I find this
file, instead of containing the string 'ababaa' as expected, contains
'^@b^@baa' where ^@ is ASCII NUL.

Compile with something like

       gcc -o testloseek testloseek.c -lpq

The program sets the PQtrace to STDOUT and writes messages to STDERR, so run
it with STDOUT redirected to a log file.

This is a C version of a basic regression test of guile-pg, my Guile language
bindings for libpq.  You may recall I reported a similar bug a year or so ago,
and I believed it was then fixed by Tatsuo, after a couple of iterations.  I'm
sorry to be the bearer of bad news ...

Please reply to me directly since I'm not on the list.

Thanks
Ian

#include <stdio.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"

void exec_cmd(PGconn *conn, char *str);

main (int argc, char *argv[])
{
   PGconn *conn;
   int lobj_fd;
   char buf[256];
   int ret, i;
   Oid lobj_id;

   conn = PQconnectdb("dbname=test");
   if (PQstatus(conn) != CONNECTION_OK) {
      fprintf(stderr, "Can't connect to backend.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   exec_cmd(conn, "BEGIN TRANSACTION");
   PQtrace (conn, stdout);
   if ((lobj_id = lo_creat(conn, INV_READ | INV_WRITE)) < 0) {
      fprintf(stderr, "Can't create lobj.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   fprintf(stderr, "lo_creat() returned OID %ld.\n", lobj_id);
   if ((lobj_fd = lo_open(conn, lobj_id, INV_READ | INV_WRITE)) < 0) {
      fprintf(stderr, "Can't open lobj.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   fprintf(stderr, "lo_open returned fd = %d.\n", lobj_fd);
   if ((ret = lo_write(conn, lobj_fd, "aaaaaa", 6)) != 6) {
      fprintf(stderr, "Can't write lobj.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   ret = lo_close(conn, lobj_fd);
   printf("lo_close returned %d.\n", ret);
   exec_cmd(conn, "END TRANSACTION");

   exec_cmd(conn, "BEGIN TRANSACTION");
   if ((lobj_fd = lo_open(conn, lobj_id, INV_READ | INV_WRITE)) < 0) {
      fprintf(stderr, "Can't open lobj.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   fprintf(stderr, "lo_open returned fd = %d.\n", lobj_fd);
   if (ret)
      fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn));
   if ((ret = lo_lseek(conn, lobj_fd, 1, 0)) != 1) {
      fprintf(stderr, "error (%d) lseeking in large object.\n", ret);
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   if ((ret = lo_write(conn, lobj_fd, "b", 1)) != 1) {
      fprintf(stderr, "Can't write lobj.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   if ((ret = lo_lseek(conn, lobj_fd, 3, 0)) != 3) {
      fprintf(stderr, "error (%d) lseeking in large object.\n", ret);
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   if ((ret = lo_write(conn, lobj_fd, "b", 1)) != 1) {
      fprintf(stderr, "Can't write lobj.\n");
      fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   ret = lo_close(conn, lobj_fd);
   printf("lo_close returned %d.\n", ret);
   if (ret)
      fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn));
   PQuntrace(conn);
   exec_cmd(conn, "END TRANSACTION");

   exec_cmd(conn, "BEGIN TRANSACTION");
   ret = lo_export(conn, lobj_id, "testloseek.c.lobj");
   printf("lo_export returned %d.\n", ret);
   if (ret != 1)
      fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn));
   exec_cmd(conn, "END TRANSACTION");
   exit(0);
}

void exec_cmd(PGconn *conn, char *str)
{
   PGresult *res;

   if ((res = PQexec(conn, str)) == NULL) {
      fprintf(stderr, "Error executing %s.\n", str);
      fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn));
      exit(1);
   }
   if (PQresultStatus(res) != PGRES_COMMAND_OK) {
      fprintf(stderr, "Error executing %s.\n", str);
      fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn));
      PQclear(res);
      exit(1);
   }
   PQclear(res);
}

--
Ian Grant, Computer Lab., New Museums Site, Pembroke Street, Cambridge
Phone: +44 1223 334420          Personal e-mail: iang at pobox dot com

Re: large object seek/write bug

From
Tom Lane
Date:
Ian Grant <Ian.Grant@cl.cam.ac.uk> writes:
> I'm using V 7.0 on a Linux machine and I believe I have found a bug in the
> large object interface provided by libpq.

Thank you for the carefully developed test case.  The bug is actually
in the backend, not in libpq: there's an ancient hack in inv_write
that looks at rel->rd_nblocks to decide if the large object is empty.
Unfortunately rd_nblocks isn't maintained very carefully, so the test
might mistakenly consider a recently-created LO to be empty, leading
to the Wrong Thing.  But that hack is no longer necessary, since the
index bug it was trying to work around was swatted ages ago; diking
out the check suffices to fix it.

Another error I noticed while testing the fix is that inv_read doesn't
cope gracefully with "holes" in the large object, ie, ranges never
written because of a seek and write past the previous end of file.
With the attached patch, reads of "holes" reliably return zeroes,
as is considered standard behavior for Unix files.  (The garbage data
you saw was actually from this error, although the case would not have
been triggered if not for the first bug.)

I believe the attached patch fixes these problems, but I haven't been
able to wring it out very thoroughly because I don't have applications
that do random seeks and writes in large objects.  If you could bang
on it a little more and report back, that'd be great.

            regards, tom lane


*** src/backend/storage/large_object/inv_api.c.orig    Wed Apr 12 13:15:37 2000
--- src/backend/storage/large_object/inv_api.c    Thu Jun 15 02:10:27 2000
***************
*** 185,190 ****
--- 185,191 ----
      retval->idesc = RelationGetDescr(indr);
      retval->offset = retval->lowbyte = retval->highbyte = 0;
      ItemPointerSetInvalid(&(retval->htid));
+     retval->flags = 0;

      if (flags & INV_WRITE)
      {
***************
*** 233,238 ****
--- 234,240 ----
      retval->idesc = RelationGetDescr(indrel);
      retval->offset = retval->lowbyte = retval->highbyte = 0;
      ItemPointerSetInvalid(&(retval->htid));
+     retval->flags = 0;

      if (flags & INV_WRITE)
      {
***************
*** 371,384 ****
      if (whence == SEEK_CUR)
      {
          offset += obj_desc->offset;        /* calculate absolute position */
-         return inv_seek(obj_desc, offset, SEEK_SET);
      }
!
!     /*
!      * if you seek past the end (offset > 0) I have no clue what happens
!      * :-(                  B.L.     9/1/93
!      */
!     if (whence == SEEK_END)
      {
          /* need read lock for getsize */
          if (!(obj_desc->flags & IFS_RDLOCK))
--- 373,380 ----
      if (whence == SEEK_CUR)
      {
          offset += obj_desc->offset;        /* calculate absolute position */
      }
!     else if (whence == SEEK_END)
      {
          /* need read lock for getsize */
          if (!(obj_desc->flags & IFS_RDLOCK))
***************
*** 389,396 ****
          offset += _inv_getsize(obj_desc->heap_r,
                                 obj_desc->hdesc,
                                 obj_desc->index_r);
-         return inv_seek(obj_desc, offset, SEEK_SET);
      }

      /*
       * Whenever we do a seek, we turn off the EOF flag bit to force
--- 385,392 ----
          offset += _inv_getsize(obj_desc->heap_r,
                                 obj_desc->hdesc,
                                 obj_desc->index_r);
      }
+     /* now we can assume that the operation is SEEK_SET */

      /*
       * Whenever we do a seek, we turn off the EOF flag bit to force
***************
*** 414,422 ****
       * stores the offset of the last byte that appears on it, and we have
       * an index on this.
       */
-
-
-     /* right now, just assume that the operation is SEEK_SET */
      if (obj_desc->iscan != (IndexScanDesc) NULL)
      {
          d = Int32GetDatum(offset);
--- 410,415 ----
***************
*** 424,430 ****
      }
      else
      {
-
          ScanKeyEntryInitialize(&skey, 0x0, 1, F_INT4GE,
                                 Int32GetDatum(offset));

--- 417,422 ----
***************
*** 487,495 ****

          /* copy the data from this block into the buffer */
          d = heap_getattr(&tuple, 2, obj_desc->hdesc, &isNull);
          ReleaseBuffer(buffer);

!         fsblock = (struct varlena *) DatumGetPointer(d);

          off = obj_desc->offset - obj_desc->lowbyte;
          ncopy = obj_desc->highbyte - obj_desc->offset + 1;
--- 479,505 ----

          /* copy the data from this block into the buffer */
          d = heap_getattr(&tuple, 2, obj_desc->hdesc, &isNull);
+         fsblock = (struct varlena *) DatumGetPointer(d);
          ReleaseBuffer(buffer);

!         /*
!          * If block starts beyond current seek point, then we are looking
!          * at a "hole" (unwritten area) in the object.  Return zeroes for
!          * the "hole".
!          */
!         if (obj_desc->offset < obj_desc->lowbyte)
!         {
!             int        nzeroes = obj_desc->lowbyte - obj_desc->offset;
!
!             if (nzeroes > (nbytes - nread))
!                 nzeroes = (nbytes - nread);
!             MemSet(buf, 0, nzeroes);
!             buf += nzeroes;
!             nread += nzeroes;
!             obj_desc->offset += nzeroes;
!             if (nread >= nbytes)
!                 break;
!         }

          off = obj_desc->offset - obj_desc->lowbyte;
          ncopy = obj_desc->highbyte - obj_desc->offset + 1;
***************
*** 535,548 ****
          Buffer        buffer;

          /*
!          * Fetch the current inversion file system block.  If the class
!          * storing the inversion file is empty, we don't want to do an
!          * index lookup, since index lookups choke on empty files (should
!          * be fixed someday).
           */

!         if ((obj_desc->flags & IFS_ATEOF)
!             || obj_desc->heap_r->rd_nblocks == 0)
              tuple.t_data = NULL;
          else
              inv_fetchtup(obj_desc, &tuple, &buffer);
--- 545,555 ----
          Buffer        buffer;

          /*
!          * Fetch the current inversion file system block.  We can skip
!          * the work if we already know we are at EOF.
           */

!         if (obj_desc->flags & IFS_ATEOF)
              tuple.t_data = NULL;
          else
              inv_fetchtup(obj_desc, &tuple, &buffer);

Re: large object seek/write bug

From
Ian Grant
Date:
> Thank you for the carefully developed test case.  The bug is actually
> in the backend, not in libpq: ... snip ...
>
> I believe the attached patch fixes these problems, but I haven't been
> able to wring it out very thoroughly because I don't have applications
> that do random seeks and writes in large objects.  If you could bang
> on it a little more and report back, that'd be great.

Hi Tom.  Thanks for the response and the quick fix.  I'll write some more test cases now.  I just thought that whilst
thispart of the backend is still fresh in your mind you might consider implementing lo_truncate (the lo_ analog of the
unixtruncate system call.)  At present there is no way to reduce the size of a large object except by copying to a new
one(and then we still can't delete the old one, or can we do that now?) 

Cheers
Ian
--
Ian Grant, Computer Lab., New Museums Site, Pembroke Street, Cambridge
Phone: +44 1223 334420          Personal e-mail: iang at pobox dot com

Re: large object seek/write bug

From
Tom Lane
Date:
Ian Grant <Ian.Grant@cl.cam.ac.uk> writes:
> I just thought that whilst this part of the backend is still fresh in
> your mind you might consider implementing lo_truncate (the lo_ analog
> of the unix truncate system call.)

I'll leave that for someone else (Denis Perchine seems to be
interested).  Bugs are one thing, but new features are another,
and large objects aren't high on my personal priority list...

            regards, tom lane