making update/delete of inheritance trees scale better - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | making update/delete of inheritance trees scale better |
Date | |
Msg-id | CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com Whole thread Raw |
Responses |
Re: making update/delete of inheritance trees scale better
|
List | pgsql-hackers |
Here is a sketch for implementing the design that Tom described here: https://www.postgresql.org/message-id/flat/357.1550612935%40sss.pgh.pa.us In short, we would like to have only one plan for ModifyTable to get tuples out of to update/delete, not N for N child result relations as is done currently. I suppose things are the way they are because creating a separate plan for each result relation makes the job of ModifyTable node very simple, which is currently this: 1. Take the plan's output tuple, extract the tupleid of the tuple to update/delete in the currently active result relation, 2. If delete, go to 3, else if update, filter out the junk columns from the above tuple 3. Call ExecUpdate()/ExecDelete() on the result relation with the new tuple, if any If we make ModifyTable do a little more work for the inheritance case, we can create only one plan but without "expanding" the targetlist. That is, it will contain entries only for attributes that are assigned values in the SET clause. This makes the plan reusable across result relations, because all child relations must have those attributes, even though the attribute numbers might be different. Anyway, the work that ModifyTable will now have to do is this: 1. Take the plan's output tuple, extract tupleid of the tuple to update/delete and "tableoid" 2. Select the result relation to operate on using the tableoid 3. If delete, go to 4, else if update, fetch the tuple identified by tupleid from the result relation and fill in the unassigned columns using that "old" tuple, also filtering out the junk columns 4. Call ExecUpdate()/ExecDelete() on the result relation with the new tuple, if any I do think that doing this would be worthwhile even if we may be increasing ModifyTable's per-row overhead slightly, because planning overhead of the current approach is very significant, especially for partition trees with beyond a couple of thousand partitions. As to how bad the problem is, trying to create a generic plan for `update foo set ... where key = $1`, where foo has over 2000 partitions, causes OOM even on a machine with 6GB of memory. The one plan shared by all result relations will be same as the one we would get if the query were SELECT, except it will contain junk attributes such as ctid needed to identify tuples and a new "tableoid" junk attribute if multiple result relations will be present due to inheritance. One major way in which this targetlist differs from the current per-result-relation plans is that it won't be passed through expand_targetlist(), because the set of unassigned attributes may not be unique among children. As mentioned above, those missing attributes will be filled by ModifyTable doing some extra work, whereas previously they would have come with the plan's output tuple. For child result relations that are foreign tables, their FDW adds junk attribute(s) to the query’s targetlist by updating it in-place (AddForeignUpdateTargets). However, as the child tables will no longer get their own parsetree, we must use some hack around this interface to obtain the foreign table specific junk attributes and add them to the original/parent query’s targetlist. Assuming that all or most of the children will belong to the same FDW, we will end up with only a handful such junk columns in the final targetlist. I am not sure if it's worthwhile to change the API of AddForeignUpdateTargets to require FDWs to not scribble on the passed-in parsetree as part of this patch. As for how ModifyTable will create the new tuple for updates, I have decided to use a ProjectionInfo for each result relation, which projects a full, *clean* tuple ready to be put back into the relation. When projecting, plan’s output tuple serves as OUTER tuple and the old tuple fetched to fill unassigned attributes serves as SCAN tuple. By having this ProjectionInfo also serve as the “junk filter”, we don't need JunkFilters. The targetlist that this projection computes is same as that of the result-relation-specific plan. Initially, I thought to generate this "expanded" targetlist in ExecInitModifyTable(). But as it can be somewhat expensive, doing it only once in the planner seemed like a good idea. These per-result-relations targetlists are carried in the ModifyTable node. To identify the result relation from the tuple produced by the plan, “tableoid” junk column will be used. As the tuples for different result relations won’t necessarily come out in the order in which result relations are laid out in the ModifyTable node, we need a way to map the tableoid value to result relation indexes. I have decided to use a hash table here. A couple of things that I didn't think very hard what to do about now, but may revisit later. * We will no longer be able use DirectModify APIs to push updates to remote servers for foreign child result relations * Over in [1], I have said that we get run-time pruning for free for ModifyTable because the plan we are using is same as that for SELECT, although now I think that I hadn't thought that through. With the PoC patch that I have: prepare q as update foo set a = 250001 where a = $1; set plan_cache_mode to 'force_generic_plan'; explain execute q(1); QUERY PLAN -------------------------------------------------------------------- Update on foo (cost=0.00..142.20 rows=40 width=14) Update on foo_1 Update on foo_2 foo Update on foo_3 foo Update on foo_4 foo -> Append (cost=0.00..142.20 rows=40 width=14) Subplans Removed: 3 -> Seq Scan on foo_1 (cost=0.00..35.50 rows=10 width=14) Filter: (a = $1) (9 rows) While it's true that we will never have to actually update foo_2, foo_3, and foo_4, ModifyTable still sets up its ResultRelInfos, which ideally it shouldn't. Maybe we'll need to do something about that after all. I will post the patch shortly. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA%2BHiwqGXmP3-S9y%3DOQHyJyeWnZSOmcxBGdgAMWcLUOsnPTL88w%40mail.gmail.com
pgsql-hackers by date: