Thread: My "TOAST slicing" patch -explanation
After reading the summary of Peter's comments on the patches in the list, I thought I would provide a little more explanation, as he suggests. At present, functions that take text arguments retrieve them via the PG_GETARG macros. This reconstructs the whole value into memory (potentially time-consuming for a large value). This may be unnecessary for some applications, where only a specified part of the value is needed e.g. the header data from an image. Because a TOASTed value is stored in equal-size chunks, it is straightforward (for an uncompressed value) to determine which chunk(s) contain a given substring. My patch adds accessor methods similar to the existing macros which fetch the minimal number of chunks to satisfy a particular substring request. (In the case of a compressed value, "minimal number" == "all of them"). In other words, instead of writing PG_GETARG_TEXT_P(0) and receiving back a large allocated value, you can say PG_GETARG_TEXT_P_SLICE(0,0,500) to receive only the first 500 bytes. This has a minimal performance impact -an indexscan is already used to retrieve TOAST chunks -the extra accessor routines just specify the second key as well as the first. I have rewritten text_substr and bytea_substr to use these methods. I have also added an ALTER TABLE x ALTER COLUMN y SET STORAGE command to provide a simple way to change attstorage for a column. (The macros above will be most efficient with uncompressed values.) Finally, I am aware of the following items which are not covered by the patch: 1) Efficient updating of parts of a value. This is not trivial[1]. 2) Should the large object interface be handled via TOAST?[2] As for the applications, well, it foes allow you to stream MP3s out of the database reasonably efficiently :-) Any questions, please ask... John [1] And I don't know how it might be used. Maybe if anyone has any ideas about substring assignment (someone did point me in the direction of an OVERLAY() function) or its applications I can think about that. [2] AIUI, the underlying mechanism is similar, but implemented separately.
John Gray <jgray@azuli.co.uk> writes: > Finally, I am aware of the following items which are not covered by the > patch: > 1) Efficient updating of parts of a value. This is not trivial[1]. Actually, based on subsequent discussion I now understand that efficient updating of parts of a TOASTed value is impossible, if by that you mean rewriting only the modified part. This is so because TOAST does not use MVCC, really: it relies on MVCC for the owning tuple to determine visibility of a tuple value. The only safe way to update a TOAST item is to rewrite the whole thing with a new TOAST id number and then update the owning tuple to reference that new id. Despite this, it'd be a really good idea to offer functions that allow applications to write part of a large TOASTed value. Even if it can't be as efficient as we'd like, we could still eliminate pushing the rest of the value back and forth to the client. > 2) Should the large object interface be handled via TOAST?[2] Probably not, given the above facts. We do have MVCC behavior for partial updates of large objects, and we shouldn't lose it. Having said all that, I think John's patch for substring extraction is fine in concept. I haven't looked at it in detail, but I think we should review it and expect to apply it (possibly with some cleanups). regards, tom lane
On Sun, 2002-02-24 at 23:52, Tom Lane wrote: > John Gray <jgray@azuli.co.uk> writes: > > Finally, I am aware of the following items which are not covered by the > > patch: > > > 1) Efficient updating of parts of a value. This is not trivial[1]. > > Actually, based on subsequent discussion I now understand that efficient > updating of parts of a TOASTed value is impossible, if by that you mean > rewriting only the modified part. This is so because TOAST does not > use MVCC, really: it relies on MVCC for the owning tuple to determine > visibility of a tuple value. Do TOAST tables participate in WAL ? > The only safe way to update a TOAST item > is to rewrite the whole thing with a new TOAST id number and then > update the owning tuple to reference that new id. Can't we still devise some way to reuse the chunks that did not change ? With some kind of double indirection and associated bookkeeping perhaps? > Despite this, it'd be a really good idea to offer functions that allow > applications to write part of a large TOASTed value. Even if it can't > be as efficient as we'd like, we could still eliminate pushing the rest > of the value back and forth to the client. I guess this can be already done with creative use of substring() and || > > 2) Should the large object interface be handled via TOAST?[2] > > Probably not, given the above facts. We do have MVCC behavior for > partial updates of large objects, and we shouldn't lose it. It would feel "cleaner" to have one representation for LOs - can't TOAST just be made to participate in MVCC? We could restict WAL of LOs to UPDATES only (and force fsync on TOAST FILE after INSERT) just to conserve log space. ---------------- Hannu
Hannu Krosing <hannu@krosing.net> writes: > Do TOAST tables participate in WAL ? Sure. regards, tom lane