Re: ResourceOwner refactoring - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: ResourceOwner refactoring |
Date | |
Msg-id | 20231025180729.dmsohuolcv4d5rqa@awork3.anarazel.de Whole thread Raw |
In response to | Re: ResourceOwner refactoring (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: ResourceOwner refactoring
|
List | pgsql-hackers |
Hi, On 2023-10-25 15:43:36 +0300, Heikki Linnakangas wrote: > On 10/07/2023 22:14, Andres Freund wrote: > > > /* > > > - * Initially allocated size of a ResourceArray. Must be power of two since > > > - * we'll use (arraysize - 1) as mask for hashing. > > > + * Size of the fixed-size array to hold most-recently remembered resources. > > > */ > > > -#define RESARRAY_INIT_SIZE 16 > > > +#define RESOWNER_ARRAY_SIZE 32 > > > > That's 512 bytes, pretty large to be searched sequentially. It's somewhat sad > > that the array needs to include 8 byte of ResourceOwnerFuncs... > > Yeah. Early on I considered having a RegisterResourceKind() function that > you need to call first, and then each resource kind could be assigned a > smaller ID. If we wanted to keep the old mechanism of separate arrays for > each different resource kind, that'd be the way to go. But overall this > seems simpler, and the performance is decent despite that linear scan. Realistically, on a 64bit platform, the compiler / we want 64bit alignment for ResourceElem->item, so making "kind" smaller isn't going to do much... > > Due to 'narr' being before the large 'arr', 'nhash' is always on a separate > > cacheline. I.e. the number of cachelines accessed in happy paths is higher > > than necessary, also relevant because there's more dependencies on narr, nhash > > than on the contents of those. > > > > I'd probably order it so that both of the large elements (arr, locks) are at > > the end. > > > > It's hard to know with changes this small, but it does seem to yield a small > > and repeatable performance benefit (pipelined pgbench -S). > > Swapped the 'hash' and 'arr' parts of the struct, so that 'arr' and 'locks' > are at the end. I don't remember what the layout was then, but now there are 10 bytes of holes on x86-64 (and I think most other 64bit architectures). struct ResourceOwnerData { ResourceOwner parent; /* 0 8 */ ResourceOwner firstchild; /* 8 8 */ ResourceOwner nextchild; /* 16 8 */ const char * name; /* 24 8 */ _Bool releasing; /* 32 1 */ _Bool sorted; /* 33 1 */ /* XXX 6 bytes hole, try to pack */ ResourceElem * hash; /* 40 8 */ uint32 nhash; /* 48 4 */ uint32 capacity; /* 52 4 */ uint32 grow_at; /* 56 4 */ uint32 narr; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ ResourceElem arr[32]; /* 64 512 */ /* --- cacheline 9 boundary (576 bytes) --- */ uint32 nlocks; /* 576 4 */ /* XXX 4 bytes hole, try to pack */ LOCALLOCK * locks[15]; /* 584 120 */ /* size: 704, cachelines: 11, members: 14 */ /* sum members: 694, holes: 2, sum holes: 10 */ }; While somewhat annoying from a human pov, it seems like it would make sense to rearrange a bit? At least moving nlocks into the first cacheline would be good (not that we guarantee cacheline alignment rn, though it sometimes works out to be aligned). We also can make narr, nlocks uint8s. With that narrowing and reordering, we end up with 688 bytes. Not worth for most structs, but here perhaps worth doing? Moving the lock length to the start of the struct would make sense to keep things that are likely to be used in a "stalling" manner at the start of the struct / in one cacheline. It's not too awful to have it be in this order: struct ResourceOwnerData { ResourceOwner parent; /* 0 8 */ ResourceOwner firstchild; /* 8 8 */ ResourceOwner nextchild; /* 16 8 */ const char * name; /* 24 8 */ _Bool releasing; /* 32 1 */ _Bool sorted; /* 33 1 */ uint8 narr; /* 34 1 */ uint8 nlocks; /* 35 1 */ uint32 nhash; /* 36 4 */ uint32 capacity; /* 40 4 */ uint32 grow_at; /* 44 4 */ ResourceElem arr[32]; /* 48 512 */ /* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */ ResourceElem * hash; /* 560 8 */ LOCALLOCK * locks[15]; /* 568 120 */ /* size: 688, cachelines: 11, members: 14 */ /* last cacheline: 48 bytes */ }; Requires rephrasing a few comments to document that the lenghts are separate from the array / hashtable / locks, but otherwise... This reliably shows a decent speed improvement in my stress test [1], on the order of 5%. At that point, the first array entry fits into the first cacheline. If we were to move parent, firstchild, nextchild, name further down the struct, three array entries would be on the first line. Just using the array presumably is a very common case, so that might be worth it? I got less clear performance results with this one, and it's also quite possible it could hurt, if resowners aren't actually "used", just released. Therefore it's probably not worth it for now. Greetings, Andres Freund [1] pgbench running DO $$ BEGIN FOR i IN 1..5000 LOOP PERFORM abalance FROM pgbench_accounts WHERE aid = 17;END LOOP;END;$$;
pgsql-hackers by date: