Re: Insert query performance - Mailing list pgsql-general

From David Rowley
Subject Re: Insert query performance
Date
Msg-id CAApHDvpeXfhQkM8R9EbHriCdA92PQjsT4jP-cUpoFhrn=USu9g@mail.gmail.com
Whole thread Raw
In response to Re: Insert query performance  (sud <suds1434@gmail.com>)
List pgsql-general
On Tue, 20 Aug 2024 at 19:09, sud <suds1434@gmail.com> wrote:
> However, my initial understanding of "having the FK index will improve the insert performance in the child table" is
notaccurate it seems. Rather as you mentioned it may negatively impact the loading/insert performance because it has to
nowupdate the additional index in each insert. In case of insert into child table, to ensure if the child row is
alreadypresent in the parent ,  it just scans the parent by the Primary key of the parent table (which is be default
indexed)and thus it doesn't need an index in the child table foreign keys or having an index in the foreign key in the
childtable won't help the constraint validation faster. Please correct me if my understanding is wrong here. 

If you think about what must happen when you insert into the
referencing table, the additional validation that the foreign key must
do is check that a corresponding record exists in the referenced
table. An index on the referencing table does not help speed that up.

> Additionally as you mentioned "explain analyze" will show a section on how much time it really takes for the
constraintvalidation , I can see that section now. But it seems it will really need that INSERT statement to be
executedand that we can't really do in production as that will physically insert data into the table. So do you mean to
justdo the "explain analyze" for the INSERT query and capture the plan and then do the rollback?  And in our case it's
arow by row insert happening , so we will see if we can club/sum that "constraint validation" time for a handful if
insertsomehow to get a better idea on the percentage of time we really spent in the constraint validation. 

I'd recommend performing a schema-only dump of your production
database and experimenting well away from production. See pg_dump
--schema-only.

I also recommend not leaving performance to chance and testing the
impact of index vs no index away from production with some data loaded
that is representative of your production data (or use the production
data if it's available and small enough to manage).  Use pgbench to
see what impact having the index on the referencing table has on
performance on inserts into that table vs what improvements you gain
from having the index when there's cascading delete from the
referenced table.

You might also want to look into auto_explain [1].  You can load this
into a single session and set auto_explain.log_min_duration = 0,
auto_explain.log_analyze = on and auto_explain.log_nested_statements =
on. That should give you the plan for the cascade DELETE query that's
executed by the trigger when you perform the DELETE on the referenced
table. (Also see the note about auto_explain.log_timing)

David

[1] https://www.postgresql.org/docs/15/auto-explain.html



pgsql-general by date:

Previous
From: Muhammad Ikram
Date:
Subject: Re: Insert query performance
Next
From: Greg Sabino Mullane
Date:
Subject: Re: Planet Postgres and the curse of AI