Re: Fast Default WIP patch for discussion - Mailing list pgsql-hackers
From | Serge Rielau |
---|---|
Subject | Re: Fast Default WIP patch for discussion |
Date | |
Msg-id | 2C974F24-F02C-4E32-BAB6-895B546D8D47@rielau.com Whole thread Raw |
In response to | Re: Fast Default WIP patch for discussion (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Fast Default WIP patch for discussion
Re: Fast Default WIP patch for discussion |
List | pgsql-hackers |
> On Oct 28, 2016, at 5:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Oct 21, 2016 at 7:15 PM, Serge Rielau <serge@rielau.com> wrote: >> Some key design points requiring discussion: >> 1. Storage of the “exist” (working name) default >> Right now the patch stores the default value in its binary form as it >> would be in the tuple into a BYTEA. >> It would be feasible to store the pretty printed literal, however this >> requires calling the io functions when a >> tuple descriptor is built. > > A pretty-printed literal is a terrible idea because input functions > need not be immutable (e.g. timestamptz). I did not know that! Interesting. > I would have expected a > pg_node_tree column containing a CONST node, but your bytea thing > might be OK. I was debating following the example given by the current DEFAULT. But figured it’s overkill given that no-one user will ever have to look at it. And in the end a the pretty printed CONST node is no more readable. > >> 3. Delayed vs. early expansion of the tuples. >> To avoid having to decide when to copy tuple descriptors with or without >> constraints and defaults >> I have opted to expand the tuple at the core entry points. >> How do I know I have them all? An omission means wrong results! > > Early expansion seems right. "How do I know I have them all?" - and > not only all of the current ones but all future ones - seems like a > core sticking point for this patch. Yes, there is a certain amount of inherent, hard to control, risk towards the future that we must be willing to accept. > >> 4. attisnull() >> This routine is used in many places, but to give correct result sit must >> now be accompanied >> by the tuple descriptor. This becomes moderately messy and it’s not >> always clear where to get that. >> Interestingly most usages are related to catalog lookups. >> Assuming we have no intention to support fast default for catalog tables >> we could keep using the >> existing attisnull() api for catalog lookups and use a new version (name >> tbd) for user tables. > > attisnull is not a thing. There's heap_attisnull and slot_attisnull. My apologies. Obviously slot_attisnull() is a non issue since slots have tuple descriptors. It’s heap_attisnull() I am struggling with. It’s rather popular. Yet, in the large majority of cases exist defaults do not apply, so digging up the descriptor is rather wasteful. >> 5. My head hurts looking at the PK/FK code - it’s not always clear which >> tuple descriptor belongs >> to which tuple > > I suggest ibuprofen and a stiff upper lip. Sound advise. >> 6. Performance of the expansion code. >> The current code needs to walk all defaults and then start padding by >> filling in values. >> But the outcome is always the same. We will produce the same payload and >> the name null map. >> It would be feasible to cache an “all defaults tuple”, remember the >> offsets (and VARWIDTH, HASNULL) >> for each attribute and then simply splice the short and default tuples >> together. >> This ought to be faster, but the meta data to cache is not insignificant >> and the expansion code is messy enough >> without this already. > > You could experiment to figure out if it makes any difference. I think my first experiment will be to measure the performance impact of “expressed” vs. "non expressed" defaults with the current design. If that is considered excessive I will explore the more complex alternative. > >> 7. Grooming >> Obviously we can remove all exist defaults for a table from pg_attribute >> whenever the table is rewrittem. >> That’s easy. > > But might cause the table to expand a lot, which would suck. Well, a this point I must point back to the original goal which is O(1) ADD COLUMN performance. The footprint of a rewritten table is no worse after this patch. The reduced footprint of a table due to a rewrite avoided on ADD COLUMN is boon one must not rely upon. The existing tuple layout would support an enhancement where trailing defaults may be avoided. But I believe we would be better served with a more general approach to table compresion. > >> But could we/should we keep track of the short tuples and either >> eliminate them or drop exist defaults once they >> become obsolete because there is no tuple short enough for them to >> matter. > > I wouldn't mess with it. *phew* > >> 8. Do we need to worry about toasted defaults? > > Presumably. Time for me to dig into that then. Cheers Serge RIelau salesforce.com
pgsql-hackers by date: