Re: NAMEDATALEN increase because of non-latin languages - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: NAMEDATALEN increase because of non-latin languages |
Date | |
Msg-id | CAEze2WjyrWF_1tsYF0ijrZ_aEKhwCtdpeCfRpQUYnDqGXC1DPw@mail.gmail.com |
In response to | Re: NAMEDATALEN increase because of non-latin languages (Andres Freund <andres@anarazel.de>) |
Responses | Re: NAMEDATALEN increase because of non-latin languages |
List | pgsql-hackers |
On Thu, 19 Aug 2021 at 13:44, Andres Freund <andres@anarazel.de> wrote:
>
> > Another fun thing --- and, I fear, another good argument against just
> > raising NAMEDATALEN --- is what about TupleDescs, which last I checked
> > used an array of fixed-width pg_attribute images. But maybe we could
> > replace that with an array of pointers. Andres already did a lot of
> > the heavy code churn required to hide that data structure behind
> > TupleDescAttr() macros, so changing the representation should be much
> > less painful than it would once have been.
>
> I was recently wondering if we shouldn't go to a completely bespoke
> datastructure for TupleDesc->attrs, rather than reusing FormData_pg_attribute.
>
> Right now every attribute uses nearly two cachelines (112 bytes). Given how
> frequent a task tuple [de]forming is, and how often it's a bottleneck,
> increasing the cache efficiency of tupledescs would worth quite a bit of
> effort - I do see tupledesc attr cache misses in profiles. A secondary benefit
> would be that we do create a lot of short-lived descs in the executor,
> slimming those down obviously would be good on its own. A third benefit would
> be that we could get rid of attcacheoff in pg_attribute, that always smelled
> funny to me.
>
> One possible way to structure such future tupledescs would be to have multiple
> arrays in struct TupleDescData. With an array of just the data necessary for
> [de]forming at the place ->attrs is, and other stuff in one or more separate
> arrays. The other option could perhaps be omitted for some tupledescs or
> computed lazily.
>
> For deforming we just need attlen (2byte), attbyval (1 byte), attalign (1byte)
> and optionally attcacheoff (4 byte), for forming we also need attstorage (1
> byte). Naively that ends up being 12 bytes - 5 attrs / cacheline is a heck of
> a lot better than ~0.5.

I tried to implement this 'compact attribute access descriptor' a few months ago in my effort to improve btree index performance.
I abandoned the idea at the time, as I didn't find any measurable difference in the (limited!) tests I ran. The workload was mainly reindexing, SELECT * INTO, and similar operations on the 'pp-complete' dataset. But since there might be interest in this outside that effort, on grounds separate from plain performance, I'll share the results.

Attached is the latest version of my patch that I could find; it might be incorrect or fail, as it is something I sent to myself between two of my systems during development. It is attached as .txt because I don't want any CFBot coverage on it: it is not proposed for inclusion, it is just a show of work, and it might serve as a basis for future work.

The patch allocates an array of 'TupleAttrAlignData' structs at the end of the attrs array, fills it with the correct data upon TupleDesc creation, and uses this TupleAttrAlign data for forming and deforming tuples.

One main difference from what you described is that I used a union for storing attbyval and attstorage, as the latter is only applicable to attlen < 0 and the former only to attlen >= 0. This keeps the whole structure at 8 bytes, while still being usable for both tuple forming and deforming.

I hope this is useful; otherwise, sorry for the noise.

Kind regards,

Matthias van de Meent