Thread: update non-indexed value is slow if some non-related index/fk are enabled

update non-indexed value is slow if some non-related index/fk are enabled

From
Philippe Doussot
Date:

Hi all,

I'm look for some hint about this behaviour :


One UPDATE of one boolean value which is non-indexed take some time ( for many tuples .. 3 Millions ).

But if we disable all indexes/fk the UPDATE is 10x faster and do 10x less shared hit .

I don't understand why disabling all index from the table speed up the update because the boolean column is not indexed


For exemple , disabling the index was done like this :

UPDATE pg_index SET indisready=false WHERE indexrelid in (620809,620837,620839,620841,620854,618764,620855,620790,620790,620840);


In both UPDATE with or without index enabled/disable, the PLAN is the same.


Postgres 9.5 :
Explain with index : slow
Update on public.t_acte_entite (cost=0.00..52924.83 rows=2758149 width=45) (actual time=120908.338..120908.338 rows=0 loops=1)
Buffers: shared hit=101546408 read=164106 dirtied=171996 written=9529
-> Seq Scan on public.t_acte_entite (cost=0.00..52924.83 rows=2758149 width=45) (actual time=0.028..1783.946 rows=3033768 loops=1)
Output: id_acte_entite, fk_acte, fk_rubrique, fk_client, fk_salarie, fk_cycldet, fk_planning, act_user_modified, fk_ssiad_visite, fk_ssiad_visite_planning, true, acte_realise, acte_realise_modifie, ctid
Filter: ((NOT t_acte_entite.acte_prevu) AND ((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR (t_acte_entite.fk_ssiad_visite IS NOT NULL)))
Rows Removed by Filter: 16115
Buffers: shared hit=10 read=22416 written=9441
Planning time: 0.133 ms
Execution time: 120908.387 ms

Explain with index disabled :
Update on public.t_acte_entite (cost=0.00..52924.83 rows=2757021 width=45) (actual time=9007.353..9007.353 rows=0 loops=1)
Buffers: shared hit=9145912 read=44740 dirtied=44620 written=60
-> Seq Scan on public.t_acte_entite (cost=0.00..52924.83 rows=2757021 width=45) (actual time=2.608..1121.776 rows=3033768 loops=1)
Output: id_acte_entite, fk_acte, fk_rubrique, fk_client, fk_salarie, fk_cycldet, fk_planning, act_user_modified, fk_ssiad_visite, fk_ssiad_visite_planning, true, acte_realise, acte_realise_modifie, ctid
Filter: ((NOT t_acte_entite.acte_prevu) AND ((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR (t_acte_entite.fk_ssiad_visite IS NOT NULL)))
Rows Removed by Filter: 16115
Buffers: shared hit=1 read=22425 written=60
Planning time: 5.835 ms
Execution time: 9007.400 ms

You can see that the Buffers: shared hit= drop from 101 Millions to 10Million without the indexes
But as the query plan only do a Sec Scan and don't acces an index I'm little surprised.


Same on postgres 13 :

//without explain to know how many update are done :
test=# update t_acte_entite set acte_prevu = true where (fk_ssiad_visite_planning is not null or fk_ssiad_visite is not null);
UPDATE 3049406

//First explain ( with index enabled )
test=# explain (analyse, COSTS, buffers, verbose, FORMAT JSON) update t_acte_entite set acte_prevu = true where (fk_ssiad_visite_planning is not null or fk_ssiad_visite is not null);
[
  {
    "Plan": {
      "Node Type": "ModifyTable",
      "Operation": "Update",
      "Parallel Aware": false,
      "Relation Name": "t_acte_entite",
      "Schema": "public",
      "Alias": "t_acte_entite",
      "Startup Cost": 0.00,
      "Total Cost": 87603.92,
      "Plan Rows": 2748711,
      "Plan Width": 46,
      "Actual Startup Time": 74072.048,
      "Actual Total Time": 74072.049,
      "Actual Rows": 0,
      "Actual Loops": 1,
      "Shared Hit Blocks": 89724572,
      "Shared Read Blocks": 115370,
      "Shared Dirtied Blocks": 101855,
      "Shared Written Blocks": 54351,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Member",
          "Parallel Aware": false,
          "Relation Name": "t_acte_entite",
          "Schema": "public",
          "Alias": "t_acte_entite",
          "Startup Cost": 0.00,
          "Total Cost": 87603.92,
          "Plan Rows": 2748711,
          "Plan Width": 46,
          "Actual Startup Time": 0.047,
          "Actual Total Time": 1040.544,
          "Actual Rows": 3049406,
          "Actual Loops": 1,
          "Output": ["id_acte_entite", "fk_acte", "fk_rubrique", "fk_client", "fk_salarie", "fk_cycldet", "fk_planning", "act_user_modified", "fk_ssiad_visite", "fk_ssiad_visite_planning", "true", "acte_realise", "acte_realise_modifie", "ctid"],
          "Filter": "((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR (t_acte_entite.fk_ssiad_visite IS NOT NULL))",
          "Rows Removed by Filter": 477,
          "Shared Hit Blocks": 4334,
          "Shared Read Blocks": 53032,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 14709,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 0
        }
      ]
    },
    "Planning": {
      "Shared Hit Blocks": 94,
      "Shared Read Blocks": 12,
      "Shared Dirtied Blocks": 2,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 1.007,
    "Triggers": [
    ],
    "Execution Time": 74075.229
  }
]

//Second explain ( with index enabled )
test=# explain (analyse, COSTS, buffers, verbose, FORMAT JSON) update t_acte_entite set acte_prevu = true where (fk_ssiad_visite_planning is not null or fk_ssiad_visite is not null);
[
  {
    "Plan": {
      "Node Type": "ModifyTable",
      "Operation": "Update",
      "Parallel Aware": false,
      "Relation Name": "t_acte_entite",
      "Schema": "public",
      "Alias": "t_acte_entite",
      "Startup Cost": 0.00,
      "Total Cost": 91080.72,
      "Plan Rows": 2782747,
      "Plan Width": 46,
      "Actual Startup Time": 76641.220,
      "Actual Total Time": 76641.221,
      "Actual Rows": 0,
      "Actual Loops": 1,
      "Shared Hit Blocks": 94719974,
      "Shared Read Blocks": 122224,
      "Shared Dirtied Blocks": 104959,
      "Shared Written Blocks": 42452,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Member",
          "Parallel Aware": false,
          "Relation Name": "t_acte_entite",
          "Schema": "public",
          "Alias": "t_acte_entite",
          "Startup Cost": 0.00,
          "Total Cost": 91080.72,
          "Plan Rows": 2782747,
          "Plan Width": 46,
          "Actual Startup Time": 48.873,
          "Actual Total Time": 1128.069,
          "Actual Rows": 3049406,
          "Actual Loops": 1,
          "Output": ["id_acte_entite", "fk_acte", "fk_rubrique", "fk_client", "fk_salarie", "fk_cycldet", "fk_planning", "act_user_modified", "fk_ssiad_visite", "fk_ssiad_visite_planning", "true", "acte_realise", "acte_realise_modifie", "ctid"],
          "Filter": "((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR (t_acte_entite.fk_ssiad_visite IS NOT NULL))",
          "Rows Removed by Filter": 477,
          "Shared Hit Blocks": 4033,
          "Shared Read Blocks": 56444,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 7256,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 0
        }
      ]
    },
    "Planning": {
      "Shared Hit Blocks": 101,
      "Shared Read Blocks": 5,
      "Shared Dirtied Blocks": 2,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 0.902,
    "Triggers": [
    ],
    "Execution Time": 76644.337
  }
]

//now disabling indexes
test=# UPDATE pg_index set indisready=false WHERE indrelid = ( SELECT oid FROM pg_class WHERE relname='t_acte_entite' );                     
UPDATE 9

//first run with index disabled
test=# explain (analyse, COSTS, buffers, verbose, FORMAT JSON) update t_acte_entite set acte_prevu = true where (fk_ssiad_visite_planning is not null or fk_ssiad_visite is not null);
[
  {
    "Plan": {
      "Node Type": "ModifyTable",
      "Operation": "Update",
      "Parallel Aware": false,
      "Relation Name": "t_acte_entite",
      "Schema": "public",
      "Alias": "t_acte_entite",
      "Startup Cost": 0.00,
      "Total Cost": 75577.00,
      "Plan Rows": 2794251,
      "Plan Width": 46,
      "Actual Startup Time": 12513.195,
      "Actual Total Time": 12513.195,
      "Actual Rows": 0,
      "Actual Loops": 1,
      "Shared Hit Blocks": 9228453,
      "Shared Read Blocks": 54260,
      "Shared Dirtied Blocks": 44853,
      "Shared Written Blocks": 27665,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Member",
          "Parallel Aware": false,
          "Relation Name": "t_acte_entite",
          "Schema": "public",
          "Alias": "t_acte_entite",
          "Startup Cost": 0.00,
          "Total Cost": 75577.00,
          "Plan Rows": 2794251,
          "Plan Width": 46,
          "Actual Startup Time": 0.049,
          "Actual Total Time": 807.352,
          "Actual Rows": 3049406,
          "Actual Loops": 1,
          "Output": ["id_acte_entite", "fk_acte", "fk_rubrique", "fk_client", "fk_salarie", "fk_cycldet", "fk_planning", "act_user_modified", "fk_ssiad_visite", "fk_ssiad_visite_planning", "true", "acte_realise", "acte_realise_modifie", "ctid"],
          "Filter": "((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR (t_acte_entite.fk_ssiad_visite IS NOT NULL))",
          "Rows Removed by Filter": 477,
          "Shared Hit Blocks": 13018,
          "Shared Read Blocks": 31830,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 13868,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 0
        }
      ]
    },
    "Planning": {
      "Shared Hit Blocks": 147,
      "Shared Read Blocks": 6,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 1.286,
    "Triggers": [
    ],
    "Execution Time": 12516.468
  }
]

//second run
test=# explain (analyse, COSTS, buffers, verbose, FORMAT JSON) update t_acte_entite set acte_prevu = true where (fk_ssiad_visite_planning is not null or fk_ssiad_visite is not null);
[
  {
    "Plan": {
      "Node Type": "ModifyTable",
      "Operation": "Update",
      "Parallel Aware": false,
      "Relation Name": "t_acte_entite",
      "Schema": "public",
      "Alias": "t_acte_entite",
      "Startup Cost": 0.00,
      "Total Cost": 75577.00,
      "Plan Rows": 2794251,
      "Plan Width": 46,
      "Actual Startup Time": 18273.687,
      "Actual Total Time": 18273.688,
      "Actual Rows": 0,
      "Actual Loops": 1,
      "Shared Hit Blocks": 9214814,
      "Shared Read Blocks": 43530,
      "Shared Dirtied Blocks": 51243,
      "Shared Written Blocks": 55114,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Member",
          "Parallel Aware": false,
          "Relation Name": "t_acte_entite",
          "Schema": "public",
          "Alias": "t_acte_entite",
          "Startup Cost": 0.00,
          "Total Cost": 75577.00,
          "Plan Rows": 2794251,
          "Plan Width": 46,
          "Actual Startup Time": 50.822,
          "Actual Total Time": 959.917,
          "Actual Rows": 3049406,
          "Actual Loops": 1,
          "Output": ["id_acte_entite", "fk_acte", "fk_rubrique", "fk_client", "fk_salarie", "fk_cycldet", "fk_planning", "act_user_modified", "fk_ssiad_visite", "fk_ssiad_visite_planning", "true", "acte_realise", "acte_realise_modifie", "ctid"],
          "Filter": "((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR (t_acte_entite.fk_ssiad_visite IS NOT NULL))",
          "Rows Removed by Filter": 477,
          "Shared Hit Blocks": 1324,
          "Shared Read Blocks": 43524,
          "Shared Dirtied Blocks": 9507,
          "Shared Written Blocks": 19397,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 0
        }
      ]
    },
    "Planning": {
      "Shared Hit Blocks": 0,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 0.192,
    "Triggers": [
    ],
    "Execution Time": 18276.726
  }
]
test=#


If no one has an idea I will try to make a simple reproducer with a ligth schema

Thanks

Philippe

Re: update non-indexed value is slow if some non-related index/fk are enabled

From
"David G. Johnston"
Date:
On Friday, September 3, 2021, Philippe Doussot <philippe.doussot@up.coop> wrote:

One UPDATE of one boolean value which is non-indexed take some time ( for many tuples .. 3 Millions ).

But if we disable all indexes/fk the UPDATE is 10x faster and do 10x less shared hit .

I don't understand why disabling all index from the table speed up the update because the boolean column is not indexed


Index entries point to physical records.  You just deleted one physical record and added another.  The indexes need to be updated with that information.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, September 3, 2021, Philippe Doussot <philippe.doussot@up.coop>
> wrote:
>> I don't understand why disabling all index from the table speed up the
>> update because the boolean column is not indexed

> Index entries point to physical records.  You just deleted one physical
> record and added another.  The indexes need to be updated with that
> information.

Yeah.  The OP's mental model is apparently update-in-place, but that's
not how Postgres does things.

The index-update overhead is avoided if the update is "HOT", which
requires that (a) no indexed column changes and (b) there is room
on the same page for the new copy of the row.  Ensuring (b) requires
running with a fairly low fill-factor, which bloats your table and
thereby creates its own costs.  Still, that might be worth doing
depending on your particular circumstances.

            regards, tom lane



Re: update non-indexed value is slow if some non-related index/fk are enabled

From
rob stone
Date:

On Sun, 2021-09-05 at 11:21 -0400, Tom Lane wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Friday, September 3, 2021, Philippe Doussot <
> > philippe.doussot@up.coop>
> > wrote:
> > > I don't understand why disabling all index from the table speed
> > > up the
> > > update because the boolean column is not indexed
> 
> > Index entries point to physical records.  You just deleted one
> > physical
> > record and added another.  The indexes need to be updated with that
> > information.
> 
> Yeah.  The OP's mental model is apparently update-in-place, but
> that's
> not how Postgres does things.
> 
> The index-update overhead is avoided if the update is "HOT", which
> requires that (a) no indexed column changes and (b) there is room
> on the same page for the new copy of the row.  Ensuring (b) requires
> running with a fairly low fill-factor, which bloats your table and
> thereby creates its own costs.  Still, that might be worth doing
> depending on your particular circumstances.
> 
>                         regards, tom lane
> 
> 
If the DDL for that table had the column defined like this:-

my_boolean  BOOLEAN,

instead of:-

my_boolean  BOOLEAN NOT NULL DEFAULT FALSE/TRUE, (whichever is
convenient)

then that column would contain either 'f' or 't' on insert instead of
null.

Then even if a fillfactor was not specified for that table, an update
of that single column (which does not appear in an index) would merely
swap the values.
Surely that would write it back in place?

Also, having boolean columns containing a null makes it difficult for
the getter's of that table deciding if 'null' is true or false.

Just an observation.

Rob





Re: update non-indexed value is slow if some non-related index/fk are enabled

From
Philippe Doussot
Date:
On 06/09/2021 10:21, rob stone wrote:

On Sun, 2021-09-05 at 11:21 -0400, Tom Lane wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Friday, September 3, 2021, Philippe Doussot <
philippe.doussot@up.coop>
wrote:
I don't understand why disabling all index from the table speed
up the
update because the boolean column is not indexed
Index entries point to physical records.  You just deleted one
physical
record and added another.  The indexes need to be updated with that
information.
Yeah.  The OP's mental model is apparently update-in-place, but
that's
not how Postgres does things.

The index-update overhead is avoided if the update is "HOT", which
requires that (a) no indexed column changes and (b) there is room
on the same page for the new copy of the row.  Ensuring (b) requires
running with a fairly low fill-factor, which bloats your table and
thereby creates its own costs.  Still, that might be worth doing
depending on your particular circumstances.

                        regards, tom lane


If the DDL for that table had the column defined like this:-

my_boolean  BOOLEAN,

instead of:-

my_boolean  BOOLEAN NOT NULL DEFAULT FALSE/TRUE, (whichever is
convenient)

then that column would contain either 'f' or 't' on insert instead of
null.

Then even if a fillfactor was not specified for that table, an update
of that single column (which does not appear in an index) would merely
swap the values.
Surely that would write it back in place?

Also, having boolean columns containing a null makes it difficult for
the getter's of that table deciding if 'null' is true or false.

Just an observation.

Rob

Yes my columns are NOT NULL DEFAULT FALSE

but the update always change the ctid ( new row in page )

I whas hopping the same optimisation as you: Write in place.

For boolean it is maybe doable because the value is fixed in size but for variable length ..

I was also expecting no row rewrite if value don't change .. easy for boolean but not for bigger fields


Philippe


Re: update non-indexed value is slow if some non-related index/fk are enabled

From
Philippe Doussot
Date:
On 05/09/2021 17:21, Tom Lane wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Friday, September 3, 2021, Philippe Doussot <philippe.doussot@up.coop>
wrote:
I don't understand why disabling all index from the table speed up the
update because the boolean column is not indexed
Index entries point to physical records.  You just deleted one physical
record and added another.  The indexes need to be updated with that
information.
Yeah.  The OP's mental model is apparently update-in-place, but that's
not how Postgres does things.

The index-update overhead is avoided if the update is "HOT", which
requires that (a) no indexed column changes and (b) there is room
on the same page for the new copy of the row.  Ensuring (b) requires
running with a fairly low fill-factor, which bloats your table and
thereby creates its own costs.  Still, that might be worth doing
depending on your particular circumstances.
			regards, tom lane
Hi,

Thank you both David and Tom for this precious information.

Yes, I was thinking that the update was made in-place because a boolean true or false take the same size :).
I did't expect the ctid was changed.

The main reason is effectivly
> (b) there is (no) room on the same page for the new copy of the row.

I now see that the default TABLE fillfactor is 100. No room for update in same page by default (our case).

The CTID is changing for each update (even if I UPDATE many time the same boolean column with same value false, false, false .. maybe a room for optimisation here )

And with some room (lower fillfactor) I see the HOT working well with the help from https://habr.com/en/company/postgrespro/blog/483768/ , Thanks Егор Рогов @erogov


Disabing index during update obviously put index out of sync with the new row location which require an reindex.

I can now better explain to my team why this update is slow ( without lower fillfactor).
Why it is quick with index disabled.
Why disabling index without reindexing it after enabling it is a very bad idea .. the ctid as changed (without HOT update) and index is out-of-date.

Many thanks

Philippe


Re: update non-indexed value is slow if some non-related index/fk are enabled

From
"David G. Johnston"
Date:
On Monday, September 6, 2021, Philippe Doussot <philippe.doussot@up.coop> wrote:
I whas hopping the same optimisation as you: Write in place.



How exactly would you expect “update-in-place” to work given the nature of MVCC?

David J.

Re: update non-indexed value is slow if some non-related index/fk are enabled

From
Philippe Doussot
Date:

On 06/09/2021 16:13, David G. Johnston wrote:
On Monday, September 6, 2021, Philippe Doussot <philippe.doussot@up.coop> wrote:
I whas hopping the same optimisation as you: Write in place.



How exactly would you expect “update-in-place” to work given the nature of MVCC?

David J.

Yes right, it is probably not possible due to concurrency.

I don't know enough about it.

My expectations was naive


Again, thanks David for pointing me to the fact that the raw record was new on each update.


Philippe

Re: update non-indexed value is slow if some non-related index/fk are enabled

From
"Peter J. Holzer"
Date:
On 2021-09-06 07:13:07 -0700, David G. Johnston wrote:
> On Monday, September 6, 2021, Philippe Doussot <philippe.doussot@up.coop>
> wrote:
>
>     I whas hopping the same optimisation as you: Write in place.
>
> How exactly would you expect “update-in-place” to work given the nature of
> MVCC?

Some databases (e.g. Oracle) store the old row in a separate place
(called an UNDO segment in Oracle) and then overwrite the row in place.
When another transaction tries to access the (not yet committed) row or
when the transaction is rolled the old row is retrieved from the undo
segment.

There are pros and cons to both approaches. Personally, I had less
problems with PostgreSQL's approach than with Oracle's.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment