Thread: A little COPY speedup

A little COPY speedup

From
Heikki Linnakangas
Date:
One complaint we've heard from clients trying out EDB or PostgreSQL is
that loading data is slower than on other DBMSs.

I ran oprofile on a COPY FROM to get an overview of where the CPU time
is spent. To my amazement, the function at the top of the list was
PageAddItem with 16% of samples.

On every row, PageAddItem will scan all the line pointers on the target
page, just to see that they're all in use, and create a new line
pointer. That adds up, especially with narrow tuples like what I used in
the test.

Attached is a fix for that. It adds a flag to each heap page that
indicates that "there isn't any free line pointers on this page, so
don't bother trying". Heap pages haven't had any heap-specific per-page
data before, so this patch adds a HeapPageOpaqueData-struct that's
stored in the special space.

My simple test case of a COPY FROM of 10000000 tuples took 19.6 s
without the patch, and 17.7 s with the patch applied. Your mileage may vary.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.228
diff -c -r1.228 heapam.c
*** src/backend/access/heap/heapam.c    9 Feb 2007 03:35:33 -0000    1.228
--- src/backend/access/heap/heapam.c    1 Mar 2007 16:27:38 -0000
***************
*** 3291,3296 ****
--- 3291,3297 ----
      Relation    reln;
      Buffer        buffer;
      Page        page;
+     HeapPageOpaque opq;

      if (record->xl_info & XLR_BKP_BLOCK_1)
          return;
***************
*** 3300,3305 ****
--- 3301,3307 ----
      if (!BufferIsValid(buffer))
          return;
      page = (Page) BufferGetPage(buffer);
+     opq = (HeapPageOpaque) PageGetSpecialPointer(page);

      if (XLByteLE(lsn, PageGetLSN(page)))
      {
***************
*** 3327,3332 ****
--- 3329,3337 ----

      PageRepairFragmentation(page, NULL);

+     /* clear the hint flag since we just freed some line pointers */
+     opq->hpo_flags &= ~HP_NOFREELINEPOINTERS;
+
      PageSetLSN(page, lsn);
      PageSetTLI(page, ThisTimeLineID);
      MarkBufferDirty(buffer);
Index: src/backend/access/heap/hio.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/hio.c,v
retrieving revision 1.65
diff -c -r1.65 hio.c
*** src/backend/access/heap/hio.c    5 Feb 2007 04:22:18 -0000    1.65
--- src/backend/access/heap/hio.c    1 Mar 2007 16:44:47 -0000
***************
*** 33,51 ****
                       HeapTuple tuple)
  {
      Page        pageHeader;
!     OffsetNumber offnum;
      ItemId        itemId;
      Item        item;

-     /* Add the tuple to the page */
      pageHeader = BufferGetPage(buffer);

      offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
!                          tuple->t_len, InvalidOffsetNumber, LP_USED);

      if (offnum == InvalidOffsetNumber)
          elog(PANIC, "failed to add tuple to page");

      /* Update tuple->t_self to the actual position where it was stored */
      ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);

--- 33,70 ----
                       HeapTuple tuple)
  {
      Page        pageHeader;
!     OffsetNumber offnum, maxoff;
      ItemId        itemId;
      Item        item;
+     HeapPageOpaque opq;

      pageHeader = BufferGetPage(buffer);
+     opq = (HeapPageOpaque)PageGetSpecialPointer(pageHeader);
+     maxoff = PageGetMaxOffsetNumber(pageHeader);
+
+     /* If we know there's no free line pointers, don't waste cycles
+      * searching for one. The flag is set when there definitely isn't
+      * any free line pointers on the page, but the absence of the flag
+      * doesn't mean anything. There might still not be any free line
+      * pointers left. We'll set the flag to save work for future inserts
+      * when that happens.
+      */
+     if(opq->hpo_flags & HP_NOFREELINEPOINTERS)
+         offnum = OffsetNumberNext(maxoff);
+     else
+         offnum = InvalidOffsetNumber;
+
+     /* Add the tuple to the page */

      offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
!                          tuple->t_len, offnum, LP_USED);

      if (offnum == InvalidOffsetNumber)
          elog(PANIC, "failed to add tuple to page");

+     if(offnum > maxoff)
+         opq->hpo_flags |= HP_NOFREELINEPOINTERS;
+
      /* Update tuple->t_self to the actual position where it was stored */
      ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);

***************
*** 309,315 ****
               BufferGetBlockNumber(buffer),
               RelationGetRelationName(relation));

!     PageInit(pageHeader, BufferGetPageSize(buffer), 0);

      if (len > PageGetFreeSpace(pageHeader))
      {
--- 328,335 ----
               BufferGetBlockNumber(buffer),
               RelationGetRelationName(relation));

!     PageInit(pageHeader, BufferGetPageSize(buffer),
!              sizeof(HeapPageOpaqueData));

      if (len > PageGetFreeSpace(pageHeader))
      {
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.346
diff -c -r1.346 vacuum.c
*** src/backend/commands/vacuum.c    15 Feb 2007 23:23:22 -0000    1.346
--- src/backend/commands/vacuum.c    1 Mar 2007 16:29:03 -0000
***************
*** 2416,2425 ****
--- 2416,2428 ----
                          maxoff;
              int            uncnt;
              int            num_tuples = 0;
+             HeapPageOpaque opq;

              buf = ReadBuffer(onerel, vacpage->blkno);
              LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
              page = BufferGetPage(buf);
+             opq = (HeapPageOpaque) PageGetSpecialPointer(page);
+
              maxoff = PageGetMaxOffsetNumber(page);
              for (offnum = FirstOffsetNumber;
                   offnum <= maxoff;
***************
*** 2453,2458 ****
--- 2456,2463 ----
              START_CRIT_SECTION();

              uncnt = PageRepairFragmentation(page, unused);
+             if(uncnt > 0)
+                 opq->hpo_flags &= ~HP_NOFREELINEPOINTERS;

              MarkBufferDirty(buf);

***************
*** 2907,2912 ****
--- 2912,2918 ----
      Page        page = BufferGetPage(buffer);
      ItemId        itemid;
      int            i;
+     HeapPageOpaque opq = (HeapPageOpaque) PageGetSpecialPointer(page);

      /* There shouldn't be any tuples moved onto the page yet! */
      Assert(vacpage->offsets_used == 0);
***************
*** 2920,2925 ****
--- 2926,2933 ----
      }

      uncnt = PageRepairFragmentation(page, unused);
+     if(uncnt > 0)
+         opq->hpo_flags &= ~HP_NOFREELINEPOINTERS;

      MarkBufferDirty(buffer);

Index: src/backend/commands/vacuumlazy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.85
diff -c -r1.85 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c    21 Feb 2007 22:47:45 -0000    1.85
--- src/backend/commands/vacuumlazy.c    1 Mar 2007 16:28:02 -0000
***************
*** 588,593 ****
--- 588,594 ----
      int            uncnt;
      Page        page = BufferGetPage(buffer);
      ItemId        itemid;
+     HeapPageOpaque opq = (HeapPageOpaque) PageGetSpecialPointer(page);

      START_CRIT_SECTION();

***************
*** 605,610 ****
--- 606,613 ----
      }

      uncnt = PageRepairFragmentation(page, unused);
+     if(uncnt > 0)
+        opq->hpo_flags &= ~HP_NOFREELINEPOINTERS;

      MarkBufferDirty(buffer);

Index: src/include/access/htup.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/htup.h,v
retrieving revision 1.92
diff -c -r1.92 htup.h
*** src/include/access/htup.h    27 Feb 2007 23:48:09 -0000    1.92
--- src/include/access/htup.h    1 Mar 2007 16:49:11 -0000
***************
*** 18,23 ****
--- 18,38 ----
  #include "storage/relfilenode.h"

  /*
+  * Heap page special space. At the end of the page, we store some per-page
+  * information that's specific to heapam. At the moment there's just one flag.
+  */
+ typedef struct HeapPageOpaqueData
+ {
+     uint8 hpo_flags;
+ } HeapPageOpaqueData;
+
+ typedef HeapPageOpaqueData *HeapPageOpaque;
+
+ /* Bits defined in hpo_flags */
+ #define HP_NOFREELINEPOINTERS    (1 << 0)    /* all line pointers are in use */
+
+
+ /*
   * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
   * The key limit on this value is that the size of the fixed overhead for
   * a tuple, plus the size of the null-values bitmap (at 1 bit per column),
***************
*** 317,331 ****
  /*
   * MaxHeapTupleSize is the maximum allowed size of a heap tuple, including
   * header and MAXALIGN alignment padding.  Basically it's BLCKSZ minus the
!  * other stuff that has to be on a disk page.  Since heap pages use no
!  * "special space", there's no deduction for that.
   *
   * NOTE: we do not need to count an ItemId for the tuple because
   * sizeof(PageHeaderData) includes the first ItemId on the page.  But beware
   * of assuming that, say, you can fit 2 tuples of size MaxHeapTupleSize/2
   * on the same page.
   */
! #define MaxHeapTupleSize  (BLCKSZ - MAXALIGN(sizeof(PageHeaderData)))

  /*
   * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
--- 332,346 ----
  /*
   * MaxHeapTupleSize is the maximum allowed size of a heap tuple, including
   * header and MAXALIGN alignment padding.  Basically it's BLCKSZ minus the
!  * other stuff that has to be on a disk page.
   *
   * NOTE: we do not need to count an ItemId for the tuple because
   * sizeof(PageHeaderData) includes the first ItemId on the page.  But beware
   * of assuming that, say, you can fit 2 tuples of size MaxHeapTupleSize/2
   * on the same page.
   */
! #define MaxHeapTupleSize  (BLCKSZ - MAXALIGN(sizeof(PageHeaderData)) \
!                                   - MAXALIGN(sizeof(HeapPageOpaqueData)))

  /*
   * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can

Re: A little COPY speedup

From
"Pavan Deolasee"
Date:
Heikki Linnakangas wrote:
>
> Attached is a fix for that. It adds a flag to each heap page that
> indicates that "there isn't any free line pointers on this page, so
> don't bother trying". Heap pages haven't had any heap-specific
> per-page data before, so this patch adds a HeapPageOpaqueData-struct
> that's stored in the special space.
>
I would really like this change. I was thinking on similar lines to optimize
some of the HOT code paths

Thanks,
Pavan

Re: A little COPY speedup

From
Andrew Dunstan
Date:
Heikki Linnakangas wrote:
> One complaint we've heard from clients trying out EDB or PostgreSQL is
> that loading data is slower than on other DBMSs.
>
> I ran oprofile on a COPY FROM to get an overview of where the CPU time
> is spent. To my amazement, the function at the top of the list was
> PageAddItem with 16% of samples.
>
> On every row, PageAddItem will scan all the line pointers on the
> target page, just to see that they're all in use, and create a new
> line pointer. That adds up, especially with narrow tuples like what I
> used in the test.
>
> Attached is a fix for that. It adds a flag to each heap page that
> indicates that "there isn't any free line pointers on this page, so
> don't bother trying". Heap pages haven't had any heap-specific
> per-page data before, so this patch adds a HeapPageOpaqueData-struct
> that's stored in the special space.
>
> My simple test case of a COPY FROM of 10000000 tuples took 19.6 s
> without the patch, and 17.7 s with the patch applied. Your mileage may
> vary.

What is the speedup with less narrow tuples? 10% improvement is good but
not stellar.

cheers

andrew

Re: A little COPY speedup

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> On every row, PageAddItem will scan all the line pointers on the target
> page, just to see that they're all in use, and create a new line
> pointer. That adds up, especially with narrow tuples like what I used in
> the test.
> Attached is a fix for that.

This has been proposed before, and rejected before.  IIRC the previous
patch was quite a lot less invasive than this one (it didn't require
making special space on heap pages).  I don't recall why it wasn't
accepted.

            regards, tom lane

Re: A little COPY speedup

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> On every row, PageAddItem will scan all the line pointers on the target
>> page, just to see that they're all in use, and create a new line
>> pointer. That adds up, especially with narrow tuples like what I used in
>> the test.
>> Attached is a fix for that.
>
> This has been proposed before, and rejected before.  IIRC the previous
> patch was quite a lot less invasive than this one (it didn't require
> making special space on heap pages).  I don't recall why it wasn't
> accepted.

Ahh, found that thread:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00609.php

The main differences between that patch and mine is that
- the previous patch used an offset to the first free line pointer, and
I used just a flag.
- the previous patch stored the offset in the page header, and I used
the special space

I think using the special space is a cleaner approach; the field is only
meaningful in heap pages. However, now that I think of it, if we could
squeeze the flag into one of the existing fields in the page header, we
could put it there without decreasing the amount of space available for
tuples. We could use the unused pd_tli field, as you suggested later in
that thread.

At the end of the thread, Bruce added the patch to his hold-queue, but I
couldn't find a trace of it after that so I'm not clear why it was
rejected in the end. This comment (by you) seems most relevant:

> I tried making a million-row table with just two int4 columns and then
> duplicating it with CREATE TABLE AS SELECT.  In this context gprof
> shows PageAddItem as taking 7% of the runtime, which your patch knocks
> down to 1.5%.  This seems to be about the best possible real-world case,
> though (the wider the rows, the fewer times PageAddItem can loop), and
> so I'm still unconvinced that there's a generic gain here.  Adding an
> additional word to page headers has a very definite cost --- we can
> assume about a .05% increase in net I/O demands across *every*
> application, whether they do a lot of inserts or not --- and so a
> patch that provides a noticeable improvement in only a very small set
> of circumstances is going to have to be rejected.

I believe the PageAddItem overhead has become more noticeable since then
because of other improvements to COPY. In 8.3, we're also going to
reduce the tuple length (combocids and the varvarlen thing), so we can
fit more tuples per page, again making it slightly more significant.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: A little COPY speedup

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> At the end of the thread, Bruce added the patch to his hold-queue, but I
> couldn't find a trace of it after that so I'm not clear why it was
> rejected in the end. This comment (by you) seems most relevant:

I believe we concluded that the distributed cost of enlarging the page
headers would probably outweigh a gain that seemed to have fairly narrow
application.  I wouldn't be surprised if the tradeoffs have changed
since then, but I'm still loath to increase the overhead space.  If we
can do it without that, the argument to reject gets much weaker.

As you say, pd_tli is not really pulling its weight, but I'm also loath
to remove it, as in a multi-timeline situation the page LSN is really
not well defined if you don't know which timeline it refers to.

Now we'd only need 16 bits to store the last-used offset, or a flags
field if you'd prefer that, so one possible compromise is to store only
the 16 least significant bits of TLI (which ought to be enough to
disambiguate in any real-world situation), and insert the new field
where the MSBs had been.

I'm not sure whether I like your flag approach better than the
last-used-offset one.  The previous patch probably buys some teeny
amount more performance, but the flag seems more robust (noting in
passing that neither patch attempts to WAL-log its changes, so we really
need to treat the values as hints).  And a change as sketched here would
leave us 15 free bits for future expansion, which might be nice.

            regards, tom lane

Re: A little COPY speedup

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> As you say, pd_tli is not really pulling its weight, but I'm also loath
> to remove it, as in a multi-timeline situation the page LSN is really
> not well defined if you don't know which timeline it refers to.
>
> Now we'd only need 16 bits to store the last-used offset, or a flags
> field if you'd prefer that, so one possible compromise is to store only
> the 16 least significant bits of TLI (which ought to be enough to
> disambiguate in any real-world situation), and insert the new field
> where the MSBs had been.

Sounds good to me. It's nice to keep the TLI for debugging/forensics
purposes even if it's not used at the moment, but 16 bits is enough for
that.

Another possibility would be to use the unused bits in
pd_upper/lower/special, but that requires more bit-trickery.

> I'm not sure whether I like your flag approach better than the
> last-used-offset one.  The previous patch probably buys some teeny
> amount more performance, but the flag seems more robust (noting in
> passing that neither patch attempts to WAL-log its changes, so we really
> need to treat the values as hints).  And a change as sketched here would
> leave us 15 free bits for future expansion, which might be nice.

I'll post a patch along those lines.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: A little COPY speedup

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> I'm not sure whether I like your flag approach better than the
>> last-used-offset one.  The previous patch probably buys some teeny
>> amount more performance, but the flag seems more robust (noting in
>> passing that neither patch attempts to WAL-log its changes, so we really
>> need to treat the values as hints).  And a change as sketched here would
>> leave us 15 free bits for future expansion, which might be nice.
>
> I'll post a patch along those lines.

Here it is.

I'm not fond of the macro names for the flag, but couldn't think of
anything shorter yet descriptive.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.228
diff -c -r1.228 heapam.c
*** src/backend/access/heap/heapam.c    9 Feb 2007 03:35:33 -0000    1.228
--- src/backend/access/heap/heapam.c    1 Mar 2007 21:29:14 -0000
***************
*** 3327,3332 ****
--- 3327,3335 ----

      PageRepairFragmentation(page, NULL);

+     /* clear the hint flag since we just freed some line pointers */
+     HeapPageClearNoFreeLinePointers(page);
+
      PageSetLSN(page, lsn);
      PageSetTLI(page, ThisTimeLineID);
      MarkBufferDirty(buffer);
Index: src/backend/access/heap/hio.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/hio.c,v
retrieving revision 1.65
diff -c -r1.65 hio.c
*** src/backend/access/heap/hio.c    5 Feb 2007 04:22:18 -0000    1.65
--- src/backend/access/heap/hio.c    1 Mar 2007 21:33:14 -0000
***************
*** 33,51 ****
                       HeapTuple tuple)
  {
      Page        pageHeader;
!     OffsetNumber offnum;
      ItemId        itemId;
      Item        item;

-     /* Add the tuple to the page */
      pageHeader = BufferGetPage(buffer);

      offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
!                          tuple->t_len, InvalidOffsetNumber, LP_USED);

      if (offnum == InvalidOffsetNumber)
          elog(PANIC, "failed to add tuple to page");

      /* Update tuple->t_self to the actual position where it was stored */
      ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);

--- 33,65 ----
                       HeapTuple tuple)
  {
      Page        pageHeader;
!     OffsetNumber offnum, maxoff;
      ItemId        itemId;
      Item        item;

      pageHeader = BufferGetPage(buffer);
+     maxoff = PageGetMaxOffsetNumber(pageHeader);
+
+     /* If we know there's no free line pointers, don't waste cycles
+      * searching for one. We'll set the flag to save work for future
+      * inserts when that happens.
+      */
+     if(HeapPageNoFreeLinePointers(pageHeader))
+         offnum = OffsetNumberNext(maxoff);
+     else
+         offnum = InvalidOffsetNumber;
+
+     /* Add the tuple to the page */

      offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
!                          tuple->t_len, offnum, LP_USED);

      if (offnum == InvalidOffsetNumber)
          elog(PANIC, "failed to add tuple to page");

+     if(offnum > maxoff)
+         HeapPageSetNoFreeLinePointers(pageHeader);
+
      /* Update tuple->t_self to the actual position where it was stored */
      ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);

Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.346
diff -c -r1.346 vacuum.c
*** src/backend/commands/vacuum.c    15 Feb 2007 23:23:22 -0000    1.346
--- src/backend/commands/vacuum.c    1 Mar 2007 21:41:20 -0000
***************
*** 2453,2458 ****
--- 2453,2460 ----
              START_CRIT_SECTION();

              uncnt = PageRepairFragmentation(page, unused);
+             if(uncnt > 0)
+                 HeapPageClearNoFreeLinePointers(page);

              MarkBufferDirty(buf);

***************
*** 2920,2925 ****
--- 2922,2929 ----
      }

      uncnt = PageRepairFragmentation(page, unused);
+     if(uncnt > 0)
+         HeapPageClearNoFreeLinePointers(page);

      MarkBufferDirty(buffer);

Index: src/backend/commands/vacuumlazy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.85
diff -c -r1.85 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c    21 Feb 2007 22:47:45 -0000    1.85
--- src/backend/commands/vacuumlazy.c    1 Mar 2007 21:27:15 -0000
***************
*** 605,610 ****
--- 605,612 ----
      }

      uncnt = PageRepairFragmentation(page, unused);
+     if(uncnt > 0)
+         HeapPageClearNoFreeLinePointers(page);

      MarkBufferDirty(buffer);

Index: src/include/access/htup.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/htup.h,v
retrieving revision 1.92
diff -c -r1.92 htup.h
*** src/include/access/htup.h    27 Feb 2007 23:48:09 -0000    1.92
--- src/include/access/htup.h    1 Mar 2007 22:13:25 -0000
***************
*** 17,22 ****
--- 17,42 ----
  #include "storage/itemptr.h"
  #include "storage/relfilenode.h"

+ /* We use the pd_amprivate field in the page header to store flags.
+  * At the moment, there's just one flag.
+  */
+
+ /* HP_NOFREELINEPOINTERS flag should be set as a hint when there definitely
+  * isn't any free line pointers left on the page. However, absence of the
+  * flag doesn't necessarily mean that there is a free line pointer.
+  */
+ #define HP_NOFREELINEPOINTERS    (1 << 0)
+
+ #define HeapPageNoFreeLinePointers(page) \
+         (((PageHeader)page)->pd_amprivate & HP_NOFREELINEPOINTERS)
+
+ #define HeapPageSetNoFreeLinePointers(page) \
+         (((PageHeader)page)->pd_amprivate |= HP_NOFREELINEPOINTERS)
+
+ #define HeapPageClearNoFreeLinePointers(page) \
+         (((PageHeader)page)->pd_amprivate &= ~HP_NOFREELINEPOINTERS)
+
+
  /*
   * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
   * The key limit on this value is that the size of the fixed overhead for
Index: src/include/storage/bufpage.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/storage/bufpage.h,v
retrieving revision 1.71
diff -c -r1.71 bufpage.h
*** src/include/storage/bufpage.h    21 Feb 2007 20:02:17 -0000    1.71
--- src/include/storage/bufpage.h    1 Mar 2007 22:14:37 -0000
***************
*** 68,74 ****
   * space"; each AM has an "opaque" structure defined somewhere that is
   * stored as the page trailer.    an access method should always
   * initialize its pages with PageInit and then set its own opaque
!  * fields.
   */

  typedef Pointer Page;
--- 68,76 ----
   * space"; each AM has an "opaque" structure defined somewhere that is
   * stored as the page trailer.    an access method should always
   * initialize its pages with PageInit and then set its own opaque
!  * fields.  In addition to the special space, there's a 16-bit
!  * pd_amprivate field in the page header itself reserved for AM-specific
!  * use.
   */

  typedef Pointer Page;
***************
*** 90,95 ****
--- 92,98 ----
   *
   *        pd_lsn        - identifies xlog record for last change to this page.
   *        pd_tli        - ditto.
+  *        pd_amprivate - reserved for AM-specific use
   *        pd_lower    - offset to start of free space.
   *        pd_upper    - offset to end of free space.
   *        pd_special    - offset to start of special space.
***************
*** 98,105 ****
   * The LSN is used by the buffer manager to enforce the basic rule of WAL:
   * "thou shalt write xlog before data".  A dirty buffer cannot be dumped
   * to disk until xlog has been flushed at least as far as the page's LSN.
!  * We also store the TLI for identification purposes (it is not clear that
!  * this is actually necessary, but it seems like a good idea).
   *
   * The page version number and page size are packed together into a single
   * uint16 field.  This is for historical reasons: before PostgreSQL 7.3,
--- 101,109 ----
   * The LSN is used by the buffer manager to enforce the basic rule of WAL:
   * "thou shalt write xlog before data".  A dirty buffer cannot be dumped
   * to disk until xlog has been flushed at least as far as the page's LSN.
!  * We also store the 16 least significant bits of the TLI for identification
!  * purposes (it is not clear that this is actually necessary, but it seems
!  * like a good idea).
   *
   * The page version number and page size are packed together into a single
   * uint16 field.  This is for historical reasons: before PostgreSQL 7.3,
***************
*** 119,125 ****
      /* XXX LSN is member of *any* block, not only page-organized ones */
      XLogRecPtr    pd_lsn;            /* LSN: next byte after last byte of xlog
                                   * record for last change to this page */
!     TimeLineID    pd_tli;            /* TLI of last change */
      LocationIndex pd_lower;        /* offset to start of free space */
      LocationIndex pd_upper;        /* offset to end of free space */
      LocationIndex pd_special;    /* offset to start of special space */
--- 123,131 ----
      /* XXX LSN is member of *any* block, not only page-organized ones */
      XLogRecPtr    pd_lsn;            /* LSN: next byte after last byte of xlog
                                   * record for last change to this page */
!     uint16        pd_tli;            /* least significant bits of the TLI
!                                  * corresponding the LSN. */
!     uint16        pd_amprivate;    /* am-specific information */
      LocationIndex pd_lower;        /* offset to start of free space */
      LocationIndex pd_upper;        /* offset to end of free space */
      LocationIndex pd_special;    /* offset to start of special space */
***************
*** 304,313 ****
  #define PageSetLSN(page, lsn) \
      (((PageHeader) (page))->pd_lsn = (lsn))

  #define PageGetTLI(page) \
      (((PageHeader) (page))->pd_tli)
  #define PageSetTLI(page, tli) \
!     (((PageHeader) (page))->pd_tli = (tli))

  /* ----------------------------------------------------------------
   *        extern declarations
--- 310,320 ----
  #define PageSetLSN(page, lsn) \
      (((PageHeader) (page))->pd_lsn = (lsn))

+ /* NOTE: only the 16 least significant bits are stored */
  #define PageGetTLI(page) \
      (((PageHeader) (page))->pd_tli)
  #define PageSetTLI(page, tli) \
!     (((PageHeader) (page))->pd_tli = (tli) & 0x0000FFFF)

  /* ----------------------------------------------------------------
   *        extern declarations

Re: A little COPY speedup

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I'll post a patch along those lines.

> Here it is.

> I'm not fond of the macro names for the flag, but couldn't think of
> anything shorter yet descriptive.

Let's reverse the sense of the flag bit; this seems a good idea since
the initial state should be 0 = there are no free pointers.  Also I'd
go with PD_ as the prefix, for consistency with the field names.
This brings us to PD_HAS_FREE_LINE_POINTERS or some abbreviation thereof
(maybe PD_HAS_FREE_LINES is sufficient).

I'd also go with "pd_flags" as the field name; "amprivate" seems to
imply way too much about what we might later use the flags for.

Barring objections, I'll tweak this as above and apply.

            regards, tom lane

Re: A little COPY speedup

From
Tom Lane
Date:
I wrote:
> Barring objections, I'll tweak this as above and apply.

I've applied the attached modified version of this patch.  It seemed
better to me to centralize the handling of this flag bit in PageAddItem
and PageRepairFragmentation, instead of having it in the callers as you
did.  This means that the bit applies to all pages not only heap pages,
but at least for the moment that has no downside.  I note that GIN
indexes do use PageAddItem with offsetNumber = InvalidOffsetNumber,
so they will get some performance benefit too.

            regards, tom lane


Index: doc/src/sgml/storage.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/storage.sgml,v
retrieving revision 1.14
diff -c -r1.14 storage.sgml
*** doc/src/sgml/storage.sgml    31 Jan 2007 20:56:19 -0000    1.14
--- doc/src/sgml/storage.sgml    2 Mar 2007 00:48:25 -0000
***************
*** 427,434 ****
    The first 20 bytes of each page consists of a page header
    (PageHeaderData). Its format is detailed in <xref
    linkend="pageheaderdata-table">. The first two fields track the most
!   recent WAL entry related to this page. They are followed by three 2-byte
!   integer fields
    (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>,
    and <structfield>pd_special</structfield>). These contain byte offsets
    from the page start to the start
--- 427,434 ----
    The first 20 bytes of each page consists of a page header
    (PageHeaderData). Its format is detailed in <xref
    linkend="pageheaderdata-table">. The first two fields track the most
!   recent WAL entry related to this page. Next is a 2-byte field
!   containing flag bits. This is followed by three 2-byte integer fields
    (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>,
    and <structfield>pd_special</structfield>). These contain byte offsets
    from the page start to the start
***************
*** 437,448 ****
    The last 2 bytes of the page header,
    <structfield>pd_pagesize_version</structfield>, store both the page size
    and a version indicator.  Beginning with
!   <productname>PostgreSQL</productname> 8.1 the version number is 3;
    <productname>PostgreSQL</productname> 8.0 used version number 2;
    <productname>PostgreSQL</productname> 7.3 and 7.4 used version number 1;
    prior releases used version number 0.
!   (The basic page layout and header format has not changed in these versions,
!   but the layout of heap row headers has.)  The page size
    is basically only present as a cross-check; there is no support for having
    more than one page size in an installation.

--- 437,449 ----
    The last 2 bytes of the page header,
    <structfield>pd_pagesize_version</structfield>, store both the page size
    and a version indicator.  Beginning with
!   <productname>PostgreSQL</productname> 8.3 the version number is 4;
!   <productname>PostgreSQL</productname> 8.1 and 8.2 used version number 3;
    <productname>PostgreSQL</productname> 8.0 used version number 2;
    <productname>PostgreSQL</productname> 7.3 and 7.4 used version number 1;
    prior releases used version number 0.
!   (The basic page layout and header format has not changed in most of these
!   versions, but the layout of heap row headers has.)  The page size
    is basically only present as a cross-check; there is no support for having
    more than one page size in an installation.

***************
*** 470,478 ****
    </row>
    <row>
     <entry>pd_tli</entry>
!    <entry>TimeLineID</entry>
!    <entry>4 bytes</entry>
!    <entry>TLI of last change</entry>
    </row>
    <row>
     <entry>pd_lower</entry>
--- 471,485 ----
    </row>
    <row>
     <entry>pd_tli</entry>
!    <entry>uint16</entry>
!    <entry>2 bytes</entry>
!    <entry>TimeLineID of last change (only its lowest 16 bits)</entry>
!   </row>
!   <row>
!    <entry>pd_flags</entry>
!    <entry>uint16</entry>
!    <entry>2 bytes</entry>
!    <entry>Flag bits</entry>
    </row>
    <row>
     <entry>pd_lower</entry>
Index: src/backend/storage/page/bufpage.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/page/bufpage.c,v
retrieving revision 1.71
diff -c -r1.71 bufpage.c
*** src/backend/storage/page/bufpage.c    21 Feb 2007 20:02:17 -0000    1.71
--- src/backend/storage/page/bufpage.c    2 Mar 2007 00:48:26 -0000
***************
*** 39,44 ****
--- 39,45 ----
      /* Make sure all fields of page are zero, as well as unused space */
      MemSet(p, 0, pageSize);

+     /* p->pd_flags = 0;                     done by above MemSet */
      p->pd_lower = SizeOfPageHeaderData;
      p->pd_upper = pageSize - specialSize;
      p->pd_special = pageSize - specialSize;
***************
*** 73,78 ****
--- 74,80 ----
      /* Check normal case */
      if (PageGetPageSize(page) == BLCKSZ &&
          PageGetPageLayoutVersion(page) == PG_PAGE_LAYOUT_VERSION &&
+         (page->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
          page->pd_lower >= SizeOfPageHeaderData &&
          page->pd_lower <= page->pd_upper &&
          page->pd_upper <= page->pd_special &&
***************
*** 165,178 ****
      else
      {
          /* offsetNumber was not passed in, so find a free slot */
!         /* look for "recyclable" (unused & deallocated) ItemId */
!         for (offsetNumber = 1; offsetNumber < limit; offsetNumber++)
          {
!             itemId = PageGetItemId(phdr, offsetNumber);
!             if (!ItemIdIsUsed(itemId) && ItemIdGetLength(itemId) == 0)
!                 break;
          }
-         /* if no free slot, we'll put it at limit (1st open slot) */
      }

      if (offsetNumber > limit)
--- 167,193 ----
      else
      {
          /* offsetNumber was not passed in, so find a free slot */
!         /* if no free slot, we'll put it at limit (1st open slot) */
!         if (PageHasFreeLinePointers(phdr))
!         {
!             /* look for "recyclable" (unused & deallocated) ItemId */
!             for (offsetNumber = 1; offsetNumber < limit; offsetNumber++)
!             {
!                 itemId = PageGetItemId(phdr, offsetNumber);
!                 if (!ItemIdIsUsed(itemId) && ItemIdGetLength(itemId) == 0)
!                     break;
!             }
!             if (offsetNumber >= limit)
!             {
!                 /* the hint is wrong, so reset it */
!                 PageClearHasFreeLinePointers(phdr);
!             }
!         }
!         else
          {
!             /* don't bother searching if hint says there's no free slot */
!             offsetNumber = limit;
          }
      }

      if (offsetNumber > limit)
***************
*** 413,425 ****
          pfree(itemidbase);
      }

      return (nline - nused);
  }

  /*
   * PageGetFreeSpace
   *        Returns the size of the free (allocatable) space on a page,
!  *        deducted by the space needed for a new line pointer.
   */
  Size
  PageGetFreeSpace(Page page)
--- 428,446 ----
          pfree(itemidbase);
      }

+     /* Set hint bit for PageAddItem */
+     if (nused < nline)
+         PageSetHasFreeLinePointers(page);
+     else
+         PageClearHasFreeLinePointers(page);
+
      return (nline - nused);
  }

  /*
   * PageGetFreeSpace
   *        Returns the size of the free (allocatable) space on a page,
!  *        reduced by the space needed for a new line pointer.
   */
  Size
  PageGetFreeSpace(Page page)
Index: src/include/catalog/catversion.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/catversion.h,v
retrieving revision 1.388
diff -c -r1.388 catversion.h
*** src/include/catalog/catversion.h    20 Feb 2007 17:32:17 -0000    1.388
--- src/include/catalog/catversion.h    2 Mar 2007 00:48:26 -0000
***************
*** 53,58 ****
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    200702202

  #endif
--- 53,58 ----
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    200703011

  #endif
Index: src/include/storage/bufpage.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/bufpage.h,v
retrieving revision 1.71
diff -c -r1.71 bufpage.h
*** src/include/storage/bufpage.h    21 Feb 2007 20:02:17 -0000    1.71
--- src/include/storage/bufpage.h    2 Mar 2007 00:48:26 -0000
***************
*** 90,95 ****
--- 90,96 ----
   *
   *        pd_lsn        - identifies xlog record for last change to this page.
   *        pd_tli        - ditto.
+  *        pd_flags    - flag bits.
   *        pd_lower    - offset to start of free space.
   *        pd_upper    - offset to end of free space.
   *        pd_special    - offset to start of special space.
***************
*** 98,105 ****
   * The LSN is used by the buffer manager to enforce the basic rule of WAL:
   * "thou shalt write xlog before data".  A dirty buffer cannot be dumped
   * to disk until xlog has been flushed at least as far as the page's LSN.
!  * We also store the TLI for identification purposes (it is not clear that
!  * this is actually necessary, but it seems like a good idea).
   *
   * The page version number and page size are packed together into a single
   * uint16 field.  This is for historical reasons: before PostgreSQL 7.3,
--- 99,107 ----
   * The LSN is used by the buffer manager to enforce the basic rule of WAL:
   * "thou shalt write xlog before data".  A dirty buffer cannot be dumped
   * to disk until xlog has been flushed at least as far as the page's LSN.
!  * We also store the 16 least significant bits of the TLI for identification
!  * purposes (it is not clear that this is actually necessary, but it seems
!  * like a good idea).
   *
   * The page version number and page size are packed together into a single
   * uint16 field.  This is for historical reasons: before PostgreSQL 7.3,
***************
*** 119,125 ****
      /* XXX LSN is member of *any* block, not only page-organized ones */
      XLogRecPtr    pd_lsn;            /* LSN: next byte after last byte of xlog
                                   * record for last change to this page */
!     TimeLineID    pd_tli;            /* TLI of last change */
      LocationIndex pd_lower;        /* offset to start of free space */
      LocationIndex pd_upper;        /* offset to end of free space */
      LocationIndex pd_special;    /* offset to start of special space */
--- 121,129 ----
      /* XXX LSN is member of *any* block, not only page-organized ones */
      XLogRecPtr    pd_lsn;            /* LSN: next byte after last byte of xlog
                                   * record for last change to this page */
!     uint16        pd_tli;            /* least significant bits of the TimeLineID
!                                  * containing the LSN */
!     uint16        pd_flags;        /* flag bits, see below */
      LocationIndex pd_lower;        /* offset to start of free space */
      LocationIndex pd_upper;        /* offset to end of free space */
      LocationIndex pd_special;    /* offset to start of special space */
***************
*** 130,140 ****
  typedef PageHeaderData *PageHeader;

  /*
   * Page layout version number 0 is for pre-7.3 Postgres releases.
   * Releases 7.3 and 7.4 use 1, denoting a new HeapTupleHeader layout.
   * Release 8.0 uses 2; it changed the HeapTupleHeader layout again.
   * Release 8.1 uses 3; it redefined HeapTupleHeader infomask bits.
!  * Release 8.3 uses 4; it changed the HeapTupleHeader layout again.
   */
  #define PG_PAGE_LAYOUT_VERSION        4

--- 134,157 ----
  typedef PageHeaderData *PageHeader;

  /*
+  * pd_flags contains the following flag bits.  Undefined bits are initialized
+  * to zero and may be used in the future.
+  *
+  * PD_HAS_FREE_LINES is set if there are any not-LP_USED line pointers before
+  * pd_lower.  This should be considered a hint rather than the truth, since
+  * changes to it are not WAL-logged.
+  */
+ #define PD_HAS_FREE_LINES    0x0001    /* are there any unused line pointers? */
+
+ #define PD_VALID_FLAG_BITS    0x0001    /* OR of all valid pd_flags bits */
+
+ /*
   * Page layout version number 0 is for pre-7.3 Postgres releases.
   * Releases 7.3 and 7.4 use 1, denoting a new HeapTupleHeader layout.
   * Release 8.0 uses 2; it changed the HeapTupleHeader layout again.
   * Release 8.1 uses 3; it redefined HeapTupleHeader infomask bits.
!  * Release 8.3 uses 4; it changed the HeapTupleHeader layout again, and
!  * added the pd_flags field (by stealing some bits from pd_tli).
   */
  #define PG_PAGE_LAYOUT_VERSION        4

***************
*** 299,313 ****
       ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \
        / sizeof(ItemIdData)))

  #define PageGetLSN(page) \
      (((PageHeader) (page))->pd_lsn)
  #define PageSetLSN(page, lsn) \
      (((PageHeader) (page))->pd_lsn = (lsn))

  #define PageGetTLI(page) \
      (((PageHeader) (page))->pd_tli)
  #define PageSetTLI(page, tli) \
!     (((PageHeader) (page))->pd_tli = (tli))

  /* ----------------------------------------------------------------
   *        extern declarations
--- 316,342 ----
       ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \
        / sizeof(ItemIdData)))

+ /*
+  * Additional macros for access to page headers
+  */
  #define PageGetLSN(page) \
      (((PageHeader) (page))->pd_lsn)
  #define PageSetLSN(page, lsn) \
      (((PageHeader) (page))->pd_lsn = (lsn))

+ /* NOTE: only the 16 least significant bits are stored */
  #define PageGetTLI(page) \
      (((PageHeader) (page))->pd_tli)
  #define PageSetTLI(page, tli) \
!     (((PageHeader) (page))->pd_tli = (uint16) (tli))
!
! #define PageHasFreeLinePointers(page) \
!     (((PageHeader) (page))->pd_flags & PD_HAS_FREE_LINES)
! #define PageSetHasFreeLinePointers(page) \
!     (((PageHeader) (page))->pd_flags |= PD_HAS_FREE_LINES)
! #define PageClearHasFreeLinePointers(page) \
!     (((PageHeader) (page))->pd_flags &= ~PD_HAS_FREE_LINES)
!

  /* ----------------------------------------------------------------
   *        extern declarations

Re: A little COPY speedup

From
"Simon Riggs"
Date:
On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote:

> I ran oprofile on a COPY FROM to get an overview of where the CPU time
> is spent. To my amazement, the function at the top of the list was
> PageAddItem with 16% of samples.

Excellent.

I'm slightly worried though since that seems to have changed from 8.2,
which I oprofiled over Christmas. I can't recall what's changed though.
Was this just on one system? Is it possible there is an effect on one
system and not another?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: A little COPY speedup

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote:
>
>> I ran oprofile on a COPY FROM to get an overview of where the CPU time
>> is spent. To my amazement, the function at the top of the list was
>> PageAddItem with 16% of samples.
>
> Excellent.
>
> I'm slightly worried though since that seems to have changed from 8.2,
> which I oprofiled over Christmas. I can't recall what's changed though.
> Was this just on one system? Is it possible there is an effect on one
> system and not another?

Well, there's one big change: your patch to suppress WAL logging on
tables created in the same transaction. I ran the test again, this time
creating the table in a separate transaction:

samples  %        image name               app name
symbol name
5480     17.0366  postgres                 postgres
XLogInsert
3684     11.4531  postgres                 postgres
PageAddItem
3580     11.1298  libc-2.3.6.so            postgres                 memcpy
2498      7.7660  postgres                 postgres                 DoCopy
1265      3.9327  postgres                 postgres
LWLockAcquire
1210      3.7617  postgres                 postgres
CopyReadLine
1042      3.2394  postgres                 postgres
LWLockRelease
1038      3.2270  postgres                 postgres
heap_formtuple
1033      3.2115  libc-2.3.6.so            postgres
____strtol_l_internal
875       2.7203  postgres                 postgres                 hash_any

The profile will probably look somewhat different depending on your
data, encoding etc.

All the page locking related functions account for ~10% in total,
including the LWLockAcquire/Release, Pin/UnBuffer, hash_any and so on.
And then there's all the memcpying...

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: A little COPY speedup

From
"Simon Riggs"
Date:
On Fri, 2007-03-02 at 10:09 +0000, Heikki Linnakangas wrote:

> Well, there's one big change: your patch to suppress WAL logging on
> tables created in the same transaction.

OK, just checking thats what you meant.

> All the page locking related functions account for ~10% in total,
> including the LWLockAcquire/Release, Pin/UnBuffer, hash_any and so on.
> And then there's all the memcpying...

I think its a great spot that PageAddItem() was so bad. I realise I
didn't actually look at what it was doing, just looked at ways to avoid
doing it on each individual call to the block for each row.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: A little COPY speedup

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> I'm slightly worried though since that seems to have changed from 8.2,
> which I oprofiled over Christmas.

If you were testing a case with wider rows than Heikki tested, you'd see
less impact --- the cost of the old way was O(N^2) in the number of
tuples that fit on a page, so the behavior gets rapidly worse as you get
down to smaller tuple sizes.  (Come to think of it, the cmin/cmax
collapse would be a factor here too.)

            regards, tom lane

Re: A little COPY speedup

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> I'm slightly worried though since that seems to have changed from 8.2,
>> which I oprofiled over Christmas.
>
> If you were testing a case with wider rows than Heikki tested, you'd see
> less impact --- the cost of the old way was O(N^2) in the number of
> tuples that fit on a page, so the behavior gets rapidly worse as you get
> down to smaller tuple sizes.  (Come to think of it, the cmin/cmax
> collapse would be a factor here too.)

Or larger block sizes of course. A 32kb block would be 16x as bad which starts
to be pretty serious.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: A little COPY speedup

From
"Simon Riggs"
Date:
On Fri, 2007-03-02 at 16:25 +0000, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > "Simon Riggs" <simon@2ndquadrant.com> writes:
> >> I'm slightly worried though since that seems to have changed from 8.2,
> >> which I oprofiled over Christmas.
> >
> > If you were testing a case with wider rows than Heikki tested, you'd see
> > less impact --- the cost of the old way was O(N^2) in the number of
> > tuples that fit on a page, so the behavior gets rapidly worse as you get
> > down to smaller tuple sizes.  (Come to think of it, the cmin/cmax
> > collapse would be a factor here too.)
>
> Or larger block sizes of course. A 32kb block would be 16x as bad which starts
> to be pretty serious.

Well, I was only using 8kb blocks.

But I think the message is clear: we need to profile lots of different
combinations. I was using a 2 col table with integer, char(100).

IIRC there are issues with delimiter handling when we have lots of
columns in the input on COPY FROM, and num of cols on COPY TO. I've not
looked at those recently though.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: A little COPY speedup

From
Andrew Dunstan
Date:
Simon Riggs wrote:
>
> IIRC there are issues with delimiter handling when we have lots of
> columns in the input on COPY FROM, and num of cols on COPY TO. I've not
> looked at those recently though.
>
>

What sort of issues? Anything that breaks on this has catastrophic
consequences.

cheers

andrew


Re: A little COPY speedup

From
"Simon Riggs"
Date:
On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote:
> Simon Riggs wrote:
> >
> > IIRC there are issues with delimiter handling when we have lots of
> > columns in the input on COPY FROM, and num of cols on COPY TO. I've not
> > looked at those recently though.
> >
> >
>
> What sort of issues? Anything that breaks on this has catastrophic
> consequences.

We were talking about performance, not data integrity....

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: A little COPY speedup

From
Andrew Dunstan
Date:
Simon Riggs wrote:
> On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote:
>
>> Simon Riggs wrote:
>>
>>> IIRC there are issues with delimiter handling when we have lots of
>>> columns in the input on COPY FROM, and num of cols on COPY TO. I've not
>>> looked at those recently though.
>>>
>>>
>>>
>> What sort of issues? Anything that breaks on this has catastrophic
>> consequences.
>>
>
> We were talking about performance, not data integrity....
>
>

OK. I'm still curious to know what the issues are with delimiter handling.

cheers

andrew

Re: A little COPY speedup

From
"Simon Riggs"
Date:
On Fri, 2007-03-02 at 12:09 -0500, Andrew Dunstan wrote:

> OK. I'm still curious to know what the issues are with delimiter handling.

Rumours only.

Feedback from someone else looking to the problem last year. IIRC there
was a feeling that if we didn't have to search for delimiters the COPY
FROM input parsing could be easier.

Vague recollection that the COPY TO uses the slower API for getting heap
attributes, but didn't seem to show up when I profiled few months back.

I'm not personally proposing to take those thoughts any further.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: A little COPY speedup

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Feedback from someone else looking to the problem last year. IIRC there
> was a feeling that if we didn't have to search for delimiters the COPY
> FROM input parsing could be easier.

Of what use is the above comment?  You have to parse the input into
fields somehow.

"It can be arbitrarily fast ... if it doesn't have to get the right answer."

            regards, tom lane

Re: A little COPY speedup

From
"Simon Riggs"
Date:
On Fri, 2007-03-02 at 13:24 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Feedback from someone else looking to the problem last year. IIRC there
> > was a feeling that if we didn't have to search for delimiters the COPY
> > FROM input parsing could be easier.
>
> Of what use is the above comment?

To me, none. Andrew asked me why I was vague - I explained.

If I'd written it on IRC it would be wisdom, on email not at all.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: A little COPY speedup

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> Feedback from someone else looking to the problem last year. IIRC there
>> was a feeling that if we didn't have to search for delimiters the COPY
>> FROM input parsing could be easier.
>
> Of what use is the above comment?  You have to parse the input into
> fields somehow.

Well those two requirements aren't inconsistent if you're using fixed-width
input text files. Not that I'm enamoured of such file formats, just saying.

The only file formats I ever wanted when I was doing this kind of stuff is tab
separated, all the others (comma separated and (egads) *pipe* separated?!) are
just disasters.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: A little COPY speedup

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Of what use is the above comment?  You have to parse the input into
>> fields somehow.

> Well those two requirements aren't inconsistent if you're using fixed-width
> input text files. Not that I'm enamoured of such file formats, just saying.

[ shrug... ]  And of course, shoving a large quantity of padding spaces
through the software has zero cost.

            regards, tom lane

Re: A little COPY speedup

From
Andrew Dunstan
Date:
Gregory Stark wrote:
>
>
> The only file formats I ever wanted when I was doing this kind of stuff is tab
> separated, all the others (comma separated and (egads) *pipe* separated?!) are
> just disasters.
>
>



Others of us have to operate in a world where we don't get to choose the
format of data we receive. Having Postgres work with common data
interchange formats is a Good Thing (tm).

(I also have my own opinion of the disutility of tab as a separator.)

cheers

andrew