Re: A qsort template - Mailing list pgsql-hackers

From John Naylor
Subject Re: A qsort template
Date
Msg-id CAFBsxsEdMW0Jon5s3PMm-HtipoNfN8bK++HDOFNfaFnxfd+7PQ@mail.gmail.com
Whole thread Raw
In response to Re: A qsort template  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: A qsort template
List pgsql-hackers
On Mon, Jun 28, 2021 at 8:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:

[v4 patchset]

Hi Thomas,

(Sorry for the delay -- I have some time to put into this now.)

> The lower numbered patches are all things that are reused in many
> places, and in my humble opinion improve the notation and type safety
> and code deduplication generally when working with common types

I think 0001-0003 have had enough review previously to commit them, as
they are mostly notational. There's a small amount of bitrot, but not
enough to change the conclusions any. Also 0011 with the missing
#undef.

On Thu, Aug 5, 2021 at 7:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> If somebody wants to get a sense of what the size hit is from all of
> these specializations, I can recommend the diff feature of bloaty:
>
> https://github.com/google/bloaty/blob/master/doc/using.md#size-diffs
>
> Obviously you'd approach this by building postgres without the patch,
> and diffing that baseline to postgres with the patch. And possibly
> variations of the patch, with less or more sort specializations.

Thanks, that's a neat feature! For 0001-0003, the diff shows +700
bytes in memory, so pretty small:

$ bloaty -s vm src/backend/postgres -- src/backend/postgres.orig
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.0%    +608  +0.0%    +608    .text
  +0.0%     +64  +0.0%     +64    .eh_frame
  +0.0%     +24  +0.0%     +24    .dynsym
  +0.0%     +14  +0.0%     +14    .dynstr
  +0.0%      +2  +0.0%      +2    .gnu.version
  +0.0%     +58  [ = ]       0    .debug_abbrev
  +0.1%     +48  [ = ]       0    .debug_aranges
  +0.0% +1.65Ki  [ = ]       0    .debug_info
  +0.0%    +942  [ = ]       0    .debug_line
  +0.1%     +26  [ = ]       0    .debug_line_str
  +0.0%    +333  [ = ]       0    .debug_loclists
  -0.0%     -23  [ = ]       0    .debug_rnglists
  +0.0%     +73  [ = ]       0    .debug_str
  -1.0%      -4  [ = ]       0    .shstrtab
  +0.0%     +20  [ = ]       0    .strtab
  +0.0%     +24  [ = ]       0    .symtab
  +131% +3.30Ki  [ = ]       0    [Unmapped]
  +0.0% +7.11Ki  +0.0%    +712    TOTAL

[back to Thomas]

> I tried to measure a speedup in vacuum, but so far I have not.  I did
> learn some things though:  While doing that with an uncorrelated index
> and a lot of deleted tuples, I found that adding more
> maintenance_work_mem doesn't help beyond a few MB, because then cache
> misses dominate to the point where it's not better than doing multiple
> passes (and this is familiar to me from work on hash joins).  If I
> turned on huge pages on Linux and set min_dynamic_shared_memory so
> that the parallel DSM used by vacuum lives in huge pages, then
> parallel vacuum with a large maintenance_work_mem starts to do much
> better than non-parallel vacuum by improving the TLB misses (as with
> hash joins).  I thought that was quite interesting!  Perhaps
> bsearch_itemptr might help with correlated indexes with a lot of
> deleted indexes (so not dominated by cache misses), though?
>
> (I wouldn't be suprised if someone comes up with a much better idea
> than bsearch for that anyway...  a few ideas have been suggested.)

That's interesting about the (un)correlated index having such a large
effect on cache hit rate! By now there has been some discussion and a
benchmark for dead tuple storage [1]. bit there doesn't seem to be
recent activity on that thread. We might consider adding the ItemPtr
comparator work I did in [2] for v15 if we don't have any of the other
proposals in place by feature freeze. My concern there is the speedups
I observed were observed when the values were comfortably in L2 cache,
IIRC. That would need wider testing.

That said, I think what I'll do next is test the v3-0001 standalone
patch with tuplesort specializations for more data types. I already
have a decent test script that I can build on for this. (this is the
one currently in CI)

Then, I want to do at least preliminary testing of the qsort boundary
parameters.

Those two things should be doable for this commitfest.

[1] https://www.postgresql.org/message-id/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFBsxsG_c24CHKA3cWrOP1HynWGLOkLb8hyZfsD9db5g-ZPagA%40mail.gmail.com

-- 
John Naylor
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Windows vs recovery tests
Next
From: Tomas Vondra
Date:
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path