Re: tableam vs. TOAST - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: tableam vs. TOAST |
Date | |
Msg-id | CA+TgmoZFUK3M9ShVuphVT16t=d3TpUP7d9axo-e9DjJe70WBSQ@mail.gmail.com Whole thread Raw |
In response to | Re: tableam vs. TOAST (Andres Freund <andres@anarazel.de>) |
Responses |
Re: tableam vs. TOAST
Re: tableam vs. TOAST Re: tableam vs. TOAST |
List | pgsql-hackers |
On Fri, Aug 2, 2019 at 6:42 PM Andres Freund <andres@anarazel.de> wrote: > Hm, those all include writing, right? And for read-only we don't expect > any additional overhead, correct? The write overhead is probably too > large show a bit of function call overhead - but if so, it'd probably be > on unlogged tables? And with COPY, because that utilizes multi_insert, > which means more toasting in a shorter amount of time? Yes and yes. I guess we could test the unlogged case and with COPY, but in any realistic case you're still looking for a tiny CPU overhead in a sea of I/O costs. Even if an extremely short COPY on an unlogged table regresses slightly, we do not normally reject patches that improve code quality on the grounds that they add function call overhead in a few places. Code like this hard to optimize and maintain; as you remarked yourself, there are multiple opportunities to do this stuff better that are hard to see in the current structure. > .oO(why does everyone attach attachements out of order? Is that > a gmail thing?) Must be. > I wonder if toasting.c should be moved too? I mean, we could, but I don't really see a reason. It'd just be moving it from one fairly-generic place to another, and I'd rather minimize churn. > trailing whitespace after "#ifndef DETOAST_H ". Will fix. > Hm. This leaves toast_insert_or_update() as a name. That makes it sound > like it's generic toast code, rather than heap specific? I'll rename it to heap_toast_insert_or_update(). But I think I'll put that in 0004 with the other renames. > It's definitely nice how a lot of repetitive code has been deduplicated. > Also makes it easier to see how algorithmically expensive > toast_insert_or_update() is :(. Yep. > Shouldn't this condition be the other way round? I had to fight pretty hard to stop myself from tinkering with the algorithm -- this can clearly be done better, but I wanted to make it match the existing structure as far as possible. It also only needs to be tested once, not on every loop iteration, so if we're going to start changing things, we should go further than just swapping the order of the tests. For now I prefer to draw a line in the sand and change nothing. > Couldn't most of these be handled via colflags, instead of having > numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED, > TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but the > size check ought to boil down to a single mask test? I'm not really seeing how more flags would significantly simplify this logic, but I might be missing something. > I wonder if a better prefix wouldn't be toasting_... I'm open to other votes, but I think it's toast_tuple is more specific than toasting_ and thus better. > > +/* > > + * Information about one column of a tuple being toasted. > > + * > > + * NOTE: toast_action[i] can have these values: > > + * ' ' default handling > > + * 'p' already processed --- don't touch it > > + * 'x' incompressible, but OK to move off > > + * > > + * NOTE: toast_attr[i].tai_size is only made valid for varlena attributes with > > + * toast_action[i] different from 'p'. > > + */ > > +typedef struct > > +{ > > + struct varlena *tai_oldexternal; > > + int32 tai_size; > > + uint8 tai_colflags; > > +} ToastAttrInfo; > > I think that comment is outdated? Oops. Will fix. > > +/* > > + * Flags indicating the overall state of a TOAST operation. > > + * > > + * TOAST_NEEDS_DELETE_OLD indicates that one or more old TOAST datums need > > + * to be deleted. > > + * > > + * TOAST_NEEDS_FREE indicates that one or more TOAST values need to be freed. > > + * > > + * TOAST_HAS_NULLS indicates that nulls were found in the tuple being toasted. > > + * > > + * TOAST_NEEDS_CHANGE indicates that a new tuple needs to built; in other > > + * words, the toaster did something. > > + */ > > +#define TOAST_NEEDS_DELETE_OLD 0x0001 > > +#define TOAST_NEEDS_FREE 0x0002 > > +#define TOAST_HAS_NULLS 0x0004 > > +#define TOAST_NEEDS_CHANGE 0x0008 > > I'd make these enums. They're more often accessible in a debugger... Ugh, I hate that style. Abusing enums to make flag bits makes my skin crawl. I always wondered what the appeal was -- I guess now I know. Blech. > I'm quite unconvinced that making the chunk size specified by the AM is > a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in > pg_control etc. It seems a bit dangerous to let AMs provide the size, > without being very clear that any change of the value will make data > inaccessible. It'd be different if the max were only used during > toasting. I was actually thinking about proposing that we rip TOAST_MAX_CHUNK_SIZE out of pg_control. No real effort has been made here to make this something that users could configure, and I don't know of a good reason to configure it. It also seems pretty out of place in a world where there are multiple AMs floating around -- why should heap, and only heap, be checked there? Granted it does have some pride of place, but still. > I think the *size* checks should be weakened so we check: > 1) After each chunk, whether the already assembled chunks exceed the > expected size. > 2) After all chunks have been collected, check that the size is exactly > what we expect. > > I don't think that removes a meaningful amount of error > checking. Missing tuples etc get detected by the chunk_ids not being > consecutive. The overall size is still verified. > > The obvious problem with this is the slice fetching logic. For slices > with an offset of 0, it's obviously trivial to implement. For the higher > slice logic, I'd assume we'd have to fetch the first slice by estimating > where the start chunk is based on the current suggest chunk size, and > restarting the scan earlier/later if not accurate. In most cases it'll > be accurate, so we'd not loose efficiency. I don't feel entirely convinced that there's any rush to do all of this right now, and the more I change the harder it is to make sure that I haven't broken anything. How strongly do you feel about this stuff? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: