Thread: ALTER TABLE uses a bistate but not for toast tables
ATRewriteTable() calls table_tuple_insert() with a bistate, to avoid clobbering
and polluting the buffers.

But heap_insert() then calls
heap_prepare_insert() >
heap_toast_insert_or_update >
toast_tuple_externalize >
toast_save_datum >
heap_insert(toastrel, toasttup, mycid, options, NULL /* without bistate:( */);

I came up with this patch.  I'm not sure but maybe it should be implemented at
the tableam layer and not inside heap.  Maybe the BulkInsertState should have a
2nd strategy buffer for toast tables.

CREATE TABLE t(i int, a text, b text, c text,d text,e text,f text,g text);
INSERT INTO t SELECT 0, array_agg(a),array_agg(a),array_agg(a),array_agg(a),array_agg(a),array_agg(a)
  FROM generate_series(1,999)n, repeat(n::text,99)a, generate_series(1,99)b GROUP BY b;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
ALTER TABLE t ALTER i TYPE smallint;
SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM pg_buffercache b
  JOIN pg_class c ON c.oid=b.relfilenode GROUP BY 2 ORDER BY 1 DESC LIMIT 9;

Without this patch:

postgres=# SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM pg_buffercache b JOIN pg_class c ON c.oid=b.relfilenode GROUP BY 2 ORDER BY 1 DESC LIMIT 9;
 10283 | pg_toast_55759 | 8967

With this patch:
  1418 | pg_toast_16597 | 1418

--
Justin
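For context, the rewrite side drives the bistate roughly like this (a simplified
sketch of the ATRewriteTable() flow, names abbreviated, not the literal
tablecmds.c code):

/* Simplified sketch of how ATRewriteTable() uses the bulk-insert state. */
BulkInsertState bistate = GetBulkInsertState();

while (table_scan_getnextslot(scan, ForwardScanDirection, oldslot))
{
    /* ... evaluate the transform expressions into newslot ... */

    /* The main relation's pages go through the BAS_BULKWRITE ring buffer,
     * and the bistate keeps the current target page pinned ... */
    table_tuple_insert(newrel, newslot, mycid, ti_options, bistate);

    /* ... but when the tuple needs toasting, the nested heap_insert() on
     * the toast relation is called with bistate = NULL, so toast pages are
     * read and dirtied through shared buffers with no ring buffer at all. */
}

FreeBulkInsertState(bistate);
table_finish_bulk_insert(newrel, ti_options);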
Hi,
On 6/22/22 4:38 PM, Justin Pryzby wrote:
ATRewriteTable() calls table_tuple_insert() with a bistate, to avoid clobbering
and polluting the buffers.

But heap_insert() then calls
heap_prepare_insert() >
heap_toast_insert_or_update >
toast_tuple_externalize >
toast_save_datum >
heap_insert(toastrel, toasttup, mycid, options, NULL /* without bistate:( */);
Good catch!
I came up with this patch.
+ /* Release pin after main table, before switching to write to toast table */
+ if (bistate)
+ ReleaseBulkInsertStatePin(bistate);
I'm not sure we should release and reuse the main table's bistate here: it looks
like, with the patch, ReadBufferBI() on the main relation won't have the desired
block already pinned anymore (and so would need to perform another read).
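For reference, ReadBufferBI()'s fast path looks roughly like this (a simplified
sketch of the hio.c logic, not the exact code); releasing the pin between every
main-table and toast insertion means the main relation always falls through to
the slower path at the bottom:

/* Simplified sketch of ReadBufferBI() (hio.c): reuse the pinned buffer
 * kept in the bistate when it already holds the target block. */
if (bistate->current_buf != InvalidBuffer)
{
    if (BufferGetBlockNumber(bistate->current_buf) == targetBlock)
    {
        /* Still pinned from the previous insertion: no lookup, no read. */
        IncrBufferRefCount(bistate->current_buf);
        return bistate->current_buf;
    }
    /* Different block: drop the old pin first. */
    ReleaseBuffer(bistate->current_buf);
    bistate->current_buf = InvalidBuffer;
}

/* Slow path: look the block up (and possibly read it) via the ring buffer. */
buffer = ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
                            RBM_NORMAL, bistate->strategy);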
What do you think about creating earlier a new dedicated bistate for the toast table?
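i.e. roughly the following shape, so the main relation never gives up its pinned
page (a rough sketch of the idea only, not patch code; how the second bistate
would reach toast_save_datum() is left open):

/* Rough sketch of the idea: give toast its own bulk-insert state. */
BulkInsertState main_bistate  = GetBulkInsertState();
BulkInsertState toast_bistate = GetBulkInsertState();

/* Main-heap insertions keep their own pinned target page ... */
heap_insert(rel, tup, mycid, options, main_bistate);

/* ... while toast chunks get a separate ring buffer and pinned page,
 * instead of re-using (and un-pinning) the main table's bistate. */
heap_insert(toastrel, toasttup, mycid, options, toast_bistate);

FreeBulkInsertState(toast_bistate);
FreeBulkInsertState(main_bistate);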
+ if (bistate)
+ {
+ table_finish_bulk_insert(toastrel, options); // XXX
I think it's too early, as it looks to me that at this stage we may not have
finished the whole bulk insert yet.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Sep 07, 2022 at 10:48:39AM +0200, Drouvot, Bertrand wrote:
> + if (bistate)
> + {
> + table_finish_bulk_insert(toastrel, options); // XXX
>
> I think it's too early, as it looks to me that at this stage we may not
> have finished the whole bulk insert yet.

Yeah, that feels fishy.  Not sure what's the idea behind the XXX comment,
either.

I have marked this patch as RwF, following the lack of reply.
--
Michael
On Wed, Sep 07, 2022 at 10:48:39AM +0200, Drouvot, Bertrand wrote:
> Hi,
>
> On 6/22/22 4:38 PM, Justin Pryzby wrote:
> > ATRewriteTable() calls table_tuple_insert() with a bistate, to avoid clobbering
> > and polluting the buffers.
> >
> > But heap_insert() then calls
> > heap_prepare_insert() >
> > heap_toast_insert_or_update >
> > toast_tuple_externalize >
> > toast_save_datum >
> > heap_insert(toastrel, toasttup, mycid, options, NULL /* without bistate:( */);
>
> What do you think about creating earlier a new dedicated bistate for the
> toast table?

Yes, but I needed to think about what data structure to put it in...

Here, I created a 2nd bistate for toast whenever creating a bistate for
heap.  That avoids the need to add arguments to tableam's
table_tuple_insert(), in addition to the 6 other functions in the call
stack.

I also updated rewriteheap.c to handle the same problem in CLUSTER:

postgres=# DROP TABLE t; CREATE TABLE t AS SELECT i, repeat((5555555+i)::text, 123456)t FROM generate_series(1,9999)i;
postgres=# VACUUM FULL VERBOSE t ; SELECT COUNT(1), datname, coalesce(c.relname,b.relfilenode::text), d.relname FROM pg_buffercache b LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) LEFT JOIN pg_class d ON d.reltoastrelid=c.oid LEFT JOIN pg_database db ON db.oid=b.reldatabase GROUP BY 2,3,4 ORDER BY 1 DESC LIMIT 22;

Unpatched:
 5000 | postgres | pg_toast_96188840 | t
=> 40MB of shared buffers

Patched:
 2048 | postgres | pg_toast_17097 | t

Note that a similar problem seems to exist in COPY ... but I can't see
how to fix that one.

--
Justin
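For illustration, "a 2nd bistate for toast whenever creating a bistate for heap"
could be pictured along these lines (a hypothetical sketch only; the
toast_bistate field and GetToastBulkInsertState() helper are invented names,
and how the state reaches toast_save_datum() is glossed over):

/* Hypothetical sketch -- invented names, not the actual patch. */
typedef struct BulkInsertStateData
{
    BufferAccessStrategy strategy;   /* BAS_BULKWRITE ring for the main rel */
    Buffer      current_buf;         /* pinned insertion target page */
    /* ... other existing fields ... */

    struct BulkInsertStateData *toast_bistate;  /* lazily-created state used
                                                 * when writing toast chunks */
} BulkInsertStateData;

static BulkInsertState
GetToastBulkInsertState(BulkInsertState bistate)
{
    if (bistate == NULL)
        return NULL;                 /* non-bulk insert: unchanged behavior */
    if (bistate->toast_bistate == NULL)
        bistate->toast_bistate = GetBulkInsertState();
    return bistate->toast_bistate;
}

/* ... so the toast insertion could become: */
heap_insert(toastrel, toasttup, mycid, options,
            GetToastBulkInsertState(bistate));

/* GetBulkInsertState() would initialize toast_bistate to NULL, and
 * FreeBulkInsertState() would also have to release it. */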
Hi!
Found this discussion through our experiments with TOAST; I'd have to check
it under [1].

I'm not sure what behavior is expected when the main table is unpinned, a
bulk insert into the TOAST table is in progress, and a second query with a
heavy bulk insert into the same TOAST table comes in?
Thank you!
On Sun, Nov 27, 2022 at 11:15 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Sep 07, 2022 at 10:48:39AM +0200, Drouvot, Bertrand wrote:
> Hi,
>
> On 6/22/22 4:38 PM, Justin Pryzby wrote:
> > ATRewriteTable() calls table_tuple_insert() with a bistate, to avoid clobbering
> > and polluting the buffers.
> >
> > But heap_insert() then calls
> > heap_prepare_insert() >
> > heap_toast_insert_or_update >
> > toast_tuple_externalize >
> > toast_save_datum >
> > heap_insert(toastrel, toasttup, mycid, options, NULL /* without bistate:( */);
>
> What do you think about creating earlier a new dedicated bistate for the
> toast table?
Yes, but I needed to think about what data structure to put it in...
Here, I created a 2nd bistate for toast whenever creating a bistate for
heap. That avoids the need to add arguments to tableam's
table_tuple_insert(), in addition to the 6 other functions in the call
stack.
I also updated rewriteheap.c to handle the same problem in CLUSTER:
postgres=# DROP TABLE t; CREATE TABLE t AS SELECT i, repeat((5555555+i)::text, 123456)t FROM generate_series(1,9999)i;
postgres=# VACUUM FULL VERBOSE t ; SELECT COUNT(1), datname, coalesce(c.relname,b.relfilenode::text), d.relname FROM pg_buffercache b LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) LEFT JOIN pg_class d ON d.reltoastrelid=c.oid LEFT JOIN pg_database db ON db.oid=b.reldatabase GROUP BY 2,3,4 ORDER BY 1 DESC LIMIT 22;
Unpatched:
5000 | postgres | pg_toast_96188840 | t
=> 40MB of shared buffers
Patched:
2048 | postgres | pg_toast_17097 | t
Note that a similar problem seems to exist in COPY ... but I can't see
how to fix that one.
--
Justin
Hi Justin,

This patch went stale quite some time ago; CFbot does not seem to have any
history of a successful apply attempt, nor do we have any successful build
history (which was introduced some time ago already).

Are you planning on rebasing this patch?

Kind regards,

Matthias van de Meent
@cfbot: rebased
@cfbot: rebased
> On Mon, Jul 15, 2024 at 03:43:24PM GMT, Justin Pryzby wrote:
> @cfbot: rebased

Hey Justin,

Thanks for rebasing.  To help with review, could you also describe the
current status of the patch?  I have to admit, currently the commit
message doesn't tell much, and looks more like notes for the future you.

The patch numbering is somewhat confusing as well, should it be v5 now?

From what I understand, the new patch does address the review feedback,
but you want to do more, something with copy to / copy from?

Since it's in the performance category, I'm also curious how much
overhead this shaves off.  I mean, I get that the bulk insert strategy
helps with buffer usage, as you've implied in the thread -- but what
does it look like in benchmark numbers?
On Tue, Nov 19, 2024 at 03:45:19PM +0100, Dmitry Dolgov wrote:
> > On Mon, Jul 15, 2024 at 03:43:24PM GMT, Justin Pryzby wrote:
> > @cfbot: rebased
>
> Thanks for rebasing.  To help with review, could you also describe the
> current status of the patch?  I have to admit, currently the commit
> message doesn't tell much, and looks more like notes for the future you.

The patch does what it aims to do and AFAIK in a reasonable way.  I'm not
aware of any issue with it.  It's, uh, waiting for review.

I'm happy to expand on the message to describe something like design
choices, but the goal here is really simple: why should wide column values
escape the intention of the ring buffer?  AFAICT it's fixing an omission.
If you have a question, please ask; that would help to indicate what needs
to be explained.

> The patch numbering is somewhat confusing as well, should it be v5 now?

The filename was 0001-WIP-use-BulkInsertState-for-toast-tuples-too.patch.
I guess you're referring to the previous filename: v4-*.  That shouldn't be
so confusing -- I just didn't specify a version, either by choice or by
omission.

> From what I understand, the new patch does address the review feedback,
> but you want to do more, something with copy to / copy from?

If I were to do more, it'd be for a future patch, if the current patch
were to ever progress.

> Since it's in the performance category, I'm also curious how much
> overhead this shaves off.  I mean, I get that the bulk insert strategy
> helps with buffer usage, as you've implied in the thread -- but what
> does it look like in benchmark numbers?

The intent of using a bistate isn't to help the performance of the process
using the bistate.  Rather, the intent is to avoid harming the performance
of other processes.  If anything, I expect it could slow down the process
using bistate -- the same as for non-toast data.

https://www.postgresql.org/message-id/CA%2BTgmobC6RD2N8kbPPTvATpUY1kisY2wJLh2jsg%3DHGoCp2RiXw%40mail.gmail.com

--
Justin
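For anyone skimming: the "ring buffer" here is what a bistate sets up, roughly
(a simplified sketch of GetBulkInsertState() from heapam.c; newer fields
elided):

/* Simplified sketch of GetBulkInsertState() (heapam.c); some fields elided.
 * Inserts done through a bistate use a small BAS_BULKWRITE buffer ring
 * instead of competing for all of shared_buffers -- which is why the
 * benefit shows up as less cache pollution for other processes rather
 * than as a speedup for the inserting process itself. */
BulkInsertState
GetBulkInsertState(void)
{
    BulkInsertState bistate;

    bistate = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
    bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);  /* small ring of buffers */
    bistate->current_buf = InvalidBuffer;                  /* pinned target page */
    /* ... newer fields related to relation extension elided ... */
    return bistate;
}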
> On Wed, Nov 20, 2024 at 06:43:58AM -0600, Justin Pryzby wrote:
>
> > Thanks for rebasing.  To help with review, could you also describe the
> > current status of the patch?  I have to admit, currently the commit
> > message doesn't tell much, and looks more like notes for the future you.
>
> The patch does what it aims to do and AFAIK in a reasonable way.  I'm
> not aware of any issue with it.  It's, uh, waiting for review.
>
> I'm happy to expand on the message to describe something like design
> choices, but the goal here is really simple: why should wide column
> values escape the intention of the ring buffer?  AFAICT it's fixing an
> omission.  If you have a question, please ask; that would help to
> indicate what needs to be explained.

Here is what I see in the commit message:

DONE: ALTER, CLUSTER
TODO: copyto, copyfrom?

slot_getsomeattrs
slot_deform_heap_tuple
fetchatt
heap_getnextslot => heapgettup => heapgetpage => ReadBufferExtended
initscan
table_beginscan
table_scan_getnextslot
RelationCopyStorageUsingBuffer
ReadBufferWithoutRelcache

(gdb) bt
#0  table_open (relationId=relationId@entry=16390, lockmode=lockmode@entry=1) at table.c:40
#1  0x000056444cb23d3c in toast_fetch_datum (attr=attr@entry=0x7f67933cc6cc) at detoast.c:372
#2  0x000056444cb24217 in detoast_attr (attr=attr@entry=0x7f67933cc6cc) at detoast.c:123
#3  0x000056444d07a4c8 in pg_detoast_datum_packed (datum=datum@entry=0x7f67933cc6cc) at fmgr.c:1743
#4  0x000056444d042c8d in text_to_cstring (t=0x7f67933cc6cc) at varlena.c:224
#5  0x000056444d0434f9 in textout (fcinfo=<optimized out>) at varlena.c:573
#6  0x000056444d078f10 in FunctionCall1Coll (flinfo=flinfo@entry=0x56444e4706b0, collation=collation@entry=0, arg1=arg1@entry=140082828592844) at fmgr.c:1124
#7  0x000056444d079d7f in OutputFunctionCall (flinfo=flinfo@entry=0x56444e4706b0, val=val@entry=140082828592844) at fmgr.c:1561
#8  0x000056444ccb1665 in CopyOneRowTo (cstate=cstate@entry=0x56444e470898, slot=slot@entry=0x56444e396d20) at copyto.c:975
#9  0x000056444ccb2c7d in DoCopyTo (cstate=cstate@entry=0x56444e470898) at copyto.c:891
#10 0x000056444ccab4c2 in DoCopy (pstate=pstate@entry=0x56444e396bb0, stmt=stmt@entry=0x56444e3759b0, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffc212a6310) at copy.c:308

cluster:
heapam_relation_copy_for_cluster
reform_and_rewrite_tuple
rewrite_heap_tuple
raw_heap_insert

This gave me the impression that the patch is deeply WIP, and that it
doesn't make any sense to review it.  I can imagine chances are good that
I'm not alone in getting such an impression, and that you lose potential
reviewers.  Thus, shaping up a meaningful commit message might be helpful.

> > Since it's in the performance category, I'm also curious how much
> > overhead this shaves off.  I mean, I get that the bulk insert strategy
> > helps with buffer usage, as you've implied in the thread -- but what
> > does it look like in benchmark numbers?
>
> The intent of using a bistate isn't to help the performance of the
> process using the bistate.  Rather, the intent is to avoid harming the
> performance of other processes.  If anything, I expect it could slow
> down the process using bistate -- the same as for non-toast data.
>
> https://www.postgresql.org/message-id/CA%2BTgmobC6RD2N8kbPPTvATpUY1kisY2wJLh2jsg%3DHGoCp2RiXw%40mail.gmail.com

Right, but the question is still there: how much does it bring?  My point
is that if you demonstrate "under this and that load, we get so and so
many percent boost", this will hopefully attract more attention to the
patch.