Re: NAMEDATALEN increase because of non-latin languages - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: NAMEDATALEN increase because of non-latin languages
Date
Msg-id CAEze2WjyrWF_1tsYF0ijrZ_aEKhwCtdpeCfRpQUYnDqGXC1DPw@mail.gmail.com
Whole thread Raw
In response to Re: NAMEDATALEN increase because of non-latin languages  (Andres Freund <andres@anarazel.de>)
Responses Re: NAMEDATALEN increase because of non-latin languages
List pgsql-hackers
On Thu, 19 Aug 2021 at 13:44, Andres Freund <andres@anarazel.de> wrote:
>
> > Another fun thing --- and, I fear, another good argument against just
> > raising NAMEDATALEN --- is what about TupleDescs, which last I checked
> > used an array of fixed-width pg_attribute images.  But maybe we could
> > replace that with an array of pointers.  Andres already did a lot of
> > the heavy code churn required to hide that data structure behind
> > TupleDescAttr() macros, so changing the representation should be much
> > less painful than it would once have been.
>
> I was recently wondering if we shouldn't go to a completely bespoke
> datastructure for TupleDesc->attrs, rather than reusing FormData_pg_attribute.
>
> Right now every attribute uses nearly two cachelines (112 bytes). Given how
> frequent a task tuple [de]forming is, and how often it's a bottleneck,
> increasing the cache efficiency of tupledescs would worth quite a bit of
> effort - I do see tupledesc attr cache misses in profiles. A secondary benefit
> would be that we do create a lot of short-lived descs in the executor,
> slimming those down obviously would be good on its own. A third benefit would
> be that we could get rid of attcacheoff in pg_attribute, that always smelled
> funny to me.
>
> One possible way to structure such future tupledescs would be to have multiple
> arrays in struct TupleDescData. With an array of just the data necessary for
> [de]forming at the place ->attrs is, and other stuff in one or more separate
> arrays. The other option could perhaps be omitted for some tupledescs or
> computed lazily.
>
> For deforming we just need attlen (2byte), attbyval (1 byte), attalign (1byte)
> and optionally attcacheoff (4 byte), for forming we also need attstorage (1
> byte). Naively that ends up being 12 bytes - 5 attrs / cacheline is a heck of
> a lot better than ~0.5.

I tried to implement this 'compact attribute access descriptor' a few
months ago in my effort to improve btree index performance.

I abandoned the idea at the time as I didn't find any measurable
difference for the (limited!) tests I ran, where the workload was
mainly re-indexing, select * into, and similar items while
benchmarking reindexing in the 'pp-complete' dataset. But, seeing that
there might be interest outside this effort on a basis seperate from
just plain performance, I'll share the results.

Attached is the latest version of my patch that I could find; it might
be incorrect or fail, as this is something I sent to myself between 2
of my systems during development of the patch. Also, attached as .txt,
as I don't want any CFBot coverage on this (this is not proposed for
inclusion, it is just a show of work, and might be basis for future
work).

The patch allocates an array of 'TupleAttrAlignData'-structs at the
end of the attrs-array, fills it with the correct data upon
TupleDesc-creation, and uses this TupleAttrAlign-data for constructing
and destructing tuples.

One main difference from what you described was that I used a union
for storing attbyval and attstorage, as the latter is only applicable
to attlen < 0, and the first only for attlen >= 0. This keeps the
whole structure in 8 bytes, whilst also being useable in both tuple
forming and deforming.

I hope this can is useful, otherwise sorry for the noise.


Kind regards,

Matthias van de Meent

Attachment

pgsql-hackers by date:

Previous
From: Dipesh Pandit
Date:
Subject: Re: .ready and .done files considered harmful
Next
From: Andres Freund
Date:
Subject: Re: NAMEDATALEN increase because of non-latin languages