BTScanOpaqueData size slows down tests - Mailing list pgsql-hackers

From Andres Freund
Subject BTScanOpaqueData size slows down tests
Msg-id kgz63a4hp6s22egd47mlgngkjsz44t6wgojzlzi67zgrx2mzl3@dntq6nrahdgr
List pgsql-hackers
Hi,

I was a bit annoyed at test times just now. Ran a profile on the entire
regression tests in a cassert -Og build.

Unsurprisingly most of the time is spent in AllocSetCheck(). I was mildly
surprised to see how expensive the new compact attribute checks are.

What I was more surprised to realize is how much of the time is spent in
freeing and allocating BTScanOpaqueData.

+    6.94%  postgres         postgres                                  [.] AllocSetCheck
-    4.96%  postgres         libc.so.6                                 [.] __memset_evex_unaligned_erms
   - 1.94% memset@plt
      - 1.12% _int_malloc
         - 1.11% malloc
            - 0.90% AllocSetAllocLarge
               - AllocSetAlloc
                  - 0.77% palloc
                     - 0.63% btbeginscan
                        - index_beginscan_internal
                           - 0.63% index_beginscan
                              - 0.61% systable_beginscan
                                 + 0.22% SearchCatCacheMiss
                                 + 0.07% ScanPgRelation
                                 + 0.05% RelationBuildTupleDesc
                                 + 0.04% findDependentObjects
                                   0.03% GetNewOidWithIndex
                                 + 0.02% deleteOneObject
                                 + 0.02% shdepDropDependency
                                 + 0.02% DeleteComments
                                 + 0.02% SearchCatCacheList
                                 + 0.02% DeleteSecurityLabel
                                 + 0.02% DeleteInitPrivs
                     + 0.04% text_to_cstring
                     + 0.02% cstring_to_text_with_len
                     + 0.02% datumCopy
                     + 0.02% tuplesort_begin_batch
                  + 0.11% palloc_extended
                  + 0.01% AllocSetRealloc
            + 0.20% AllocSetAllocFromNewBlock
      + 0.82% _int_free_merge_chunk
   - 1.90% __memset_evex_unaligned_erms
      - 1.82% wipe_mem
         - 1.33% AllocSetFree
            - 1.33% pfree
               + 0.73% btendscan
               + 0.22% freedfa
               + 0.06% ExecAggCopyTransValue
               + 0.04% freenfa
               + 0.03% enlarge_list
               + 0.03% ExecDropSingleTupleTableSlot
               + 0.02% xmlconcat
               + 0.01% RemoveLocalLock
               + 0.01% errcontext_msg
               + 0.01% IndexScanEnd
                 0.01% heap_free_minimal_tuple
         + 0.49% AllocSetReset
        0.02% palloc0
        0.01% PageInit
      + 0.01% wipe_mem
   + 0.59% alloc_perturb
   + 0.46% asm_exc_page_fault
   + 0.03% asm_sysvec_apic_timer_interrupt
   + 0.02% wipe_mem


Looking at the size of BTScanOpaqueData I am less surprised:

        /* --- cacheline 1 boundary (64 bytes) --- */
        char *                     currTuples;           /*    64     8 */
        char *                     markTuples;           /*    72     8 */
        int                        markItemIndex;        /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        BTScanPosData              currPos __attribute__((__aligned__(8))); /*    88 13632 */
        /* --- cacheline 214 boundary (13696 bytes) was 24 bytes ago --- */
        BTScanPosData              markPos __attribute__((__aligned__(8))); /* 13720 13632 */

        /* size: 27352, cachelines: 428, members: 17 */
        /* sum members: 27340, holes: 4, sum holes: 12 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
        /* last cacheline: 24 bytes */
} __attribute__((__aligned__(8)));

Allocating, zeroing and freeing about 27kB (27352 bytes) of memory for every
syscache miss, yeah, that's gonna hurt.


The reason BTScanPosData is that large is that it stores
MaxTIDsPerBTreePage * sizeof(BTScanPosItem) bytes of items:
        BTScanPosItem              items[1358] __attribute__((__aligned__(2))); /*    48 13580 */
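
The arithmetic behind those pahole numbers can be checked with a small sketch. The stand-in types below are assumptions that only mirror the sizes implied by the output above (13580 / 1358 = 10 bytes per item); the field layout is my recollection of nbtree.h, and none of this is the real PostgreSQL header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 6-byte heap TID: block number in two halves plus offset. */
typedef struct
{
    uint16_t block_hi;
    uint16_t block_lo;
    uint16_t offset;
} TidStandIn;

/* Stand-in for BTScanPosItem; field names are an assumption. */
typedef struct
{
    TidStandIn heapTid;      /* heap TID the index entry refers to */
    uint16_t   indexOffset;  /* position of the index tuple on the page */
    uint16_t   tupleOffset;  /* offset into the tuple workspace */
} ScanPosItemStandIn;

#define ITEMS_PER_POS 1358   /* items[] length from the pahole output */
#define ITEMS_OFFSET  48     /* offset of items[] inside BTScanPosData */
#define FIXED_FIELDS  88     /* offset of currPos inside BTScanOpaqueData */

/* Round n up to the next multiple of align. */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Bytes taken by one items[] array. */
size_t items_bytes(void)
{
    assert(sizeof(ScanPosItemStandIn) == 10);
    return ITEMS_PER_POS * sizeof(ScanPosItemStandIn);
}

/* Bytes taken by one BTScanPosData, padded to its 8-byte alignment. */
size_t scan_pos_bytes(void)
{
    return round_up(ITEMS_OFFSET + items_bytes(), 8);
}

/* Bytes taken by the whole opaque struct: fixed fields, currPos, markPos. */
size_t scan_opaque_bytes(void)
{
    return FIXED_FIELDS + 2 * scan_pos_bytes();
}
```

Under these assumptions the two position structs account for 27264 of the 27352 bytes, so nearly all of the allocation is the two items[] arrays.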


Could we perhaps allocate BTScanPosData->items dynamically if more than a
handful of items are needed?
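
A minimal sketch of what that could look like, assuming a small inline array that spills to the heap once it fills up. All names here (ScanPos, INLINE_ITEMS, scanpos_append and so on) are hypothetical, and malloc/realloc stand in for palloc/repalloc; this is not a patch against nbtree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_ITEMS 8       /* hypothetical size of the "handful" */
#define MAX_ITEMS    1358    /* upper bound, as in items[1358] */

typedef struct { uint16_t fields[5]; } Item;   /* 10-byte stand-in item */

typedef struct
{
    int   nitems;                      /* items currently stored */
    int   capacity;                    /* slots available in items */
    Item *items;                       /* inline_items or a heap array */
    Item  inline_items[INLINE_ITEMS];  /* used while the scan stays small */
} ScanPos;

/* Start out pointing at the inline storage; no allocation needed.
 * Note: copying a ScanPos by value would leave items pointing into the
 * source struct, so copies would need to fix up the pointer. */
void scanpos_init(ScanPos *pos)
{
    assert(pos != NULL);
    pos->nitems = 0;
    pos->capacity = INLINE_ITEMS;
    pos->items = pos->inline_items;
}

/* Append one item, growing onto the heap on demand.
 * Returns 0 on success, -1 if full or out of memory. */
int scanpos_append(ScanPos *pos, Item item)
{
    if (pos->nitems == pos->capacity)
    {
        int   newcap;
        Item *grown;

        if (pos->capacity >= MAX_ITEMS)
            return -1;
        newcap = pos->capacity * 2;
        if (newcap > MAX_ITEMS)
            newcap = MAX_ITEMS;

        if (pos->items == pos->inline_items)
        {
            /* First spill: move the inline items to a heap array. */
            grown = malloc(newcap * sizeof(Item));
            if (grown != NULL)
                memcpy(grown, pos->inline_items, pos->nitems * sizeof(Item));
        }
        else
            grown = realloc(pos->items, newcap * sizeof(Item));

        if (grown == NULL)
            return -1;
        pos->items = grown;
        pos->capacity = newcap;
    }
    pos->items[pos->nitems++] = item;
    return 0;
}

/* Free any heap array and return to the empty inline state. */
void scanpos_release(ScanPos *pos)
{
    if (pos->items != pos->inline_items)
        free(pos->items);
    scanpos_init(pos);
}
```

With these made-up constants the struct is under 100 bytes, so a scan that only ever returns a few items never touches the large array.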


And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
only when mark/restore are used?
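
A rough sketch of the lazy variant, assuming markPos becomes a pointer that stays NULL until the first mark. The types and function names (BigPos, ScanOpaque, scan_mark, scan_restore) are hypothetical stand-ins rather than the nbtree API, and malloc/free stand in for palloc/pfree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint16_t fields[5]; } Item;   /* 10-byte stand-in item */

typedef struct
{
    int  nitems;        /* number of valid entries in items[] */
    Item items[1358];   /* large array, as in BTScanPosData */
} BigPos;

typedef struct
{
    BigPos  currPos;    /* always present */
    BigPos *markPos;    /* NULL until the first mark request */
} ScanOpaque;

/* Begin a scan without allocating or zeroing any mark state. */
ScanOpaque *scan_begin(void)
{
    ScanOpaque *so = malloc(sizeof(ScanOpaque));

    if (so != NULL)
    {
        so->currPos.nitems = 0;
        so->markPos = NULL;
    }
    return so;
}

/* Remember the current position, allocating markPos on first use.
 * Only the header and the valid items are copied. */
int scan_mark(ScanOpaque *so)
{
    assert(so != NULL);
    if (so->markPos == NULL)
    {
        so->markPos = malloc(sizeof(BigPos));
        if (so->markPos == NULL)
            return -1;
    }
    memcpy(so->markPos, &so->currPos,
           offsetof(BigPos, items) + so->currPos.nitems * sizeof(Item));
    return 0;
}

/* Go back to the marked position; fails if nothing was marked. */
int scan_restore(ScanOpaque *so)
{
    assert(so != NULL);
    if (so->markPos == NULL)
        return -1;
    memcpy(&so->currPos, so->markPos,
           offsetof(BigPos, items) + so->markPos->nitems * sizeof(Item));
    return 0;
}

/* End the scan; free(NULL) is a no-op if no mark was ever taken. */
void scan_end(ScanOpaque *so)
{
    if (so == NULL)
        return;
    free(so->markPos);
    free(so);
}
```

In this sketch a scan that never marks a position pays for one position struct instead of two, which would roughly halve the per-scan allocation.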


I wouldn't be surprised if this is an issue not just for tests, but also for a
few real workloads.

Greetings,

Andres Freund


