Thread: Insert query performance

Insert query performance

From

sud

Date:

19 August 2024, 10:48:18

Hello All,

In a version 15.4 postgres database, Is it possible that , if we have two big range partition tables with foreign key relationships between them, insert into the child table can cause slowness if we don't have foreign key index present in the child table? Basically it need to make sure the new row already added to parent partition table or not.

And if there is any possible way(example query tracing etc) to get the underlying system queries which gets triggered as part of the main insert query? For example in above scenario, postgres must be executing some query to check if the incoming row to the child table already exists in the parent table or not?

Regards

Sud

Re: Insert query performance

From

Muhammad Ikram

Date:

19 August 2024, 10:55:32

Hi Sud,

Please make following change in your postgresql.conf file

log_statement = 'all'

Regards,

Ikram

On Mon, Aug 19, 2024 at 12:48 PM sud <suds1434@gmail.com> wrote:

Hello All,
In a version 15.4 postgres database, Is it possible that , if we have two big range partition tables with foreign key relationships between them, insert into the child table can cause slowness if we don't have foreign key index present in the child table? Basically it need to make sure the new row already added to parent partition table or not.

And if there is any possible way(example query tracing etc) to get the underlying system queries which gets triggered as part of the main insert query? For example in above scenario, postgres must be executing some query to check if the incoming row to the child table already exists in the parent table or not?

Regards
Sud

Muhammad Ikram

Re: Insert query performance

From

David Rowley

Date:

19 August 2024, 14:03:32

On Mon, 19 Aug 2024 at 19:48, sud <suds1434@gmail.com> wrote:
> In a version 15.4 postgres database, Is it possible that , if we have two big range partition tables with foreign key
relationshipsbetween them, insert into the child table can cause slowness if we don't have foreign key index present in
thechild table? Basically it need to make sure the new row already added to parent partition table or not. 

Having an index on the referencing columns is only useful for DELETEs
and UPDATEs affecting the foreign key column(s).  For INSERTs to the
referencing table, technically having indexes there would only slow
down inserts due to the additional overhead of having to maintain the
index, however, the overhead of having the index might be fairly
minuscule when compared to performing a CASCADE UPDATE or DELETE to
the referencing table when the DDL is performed on the referenced
table.

> And if there is any possible way(example query tracing etc) to get the underlying system queries which gets triggered
aspart of the main insert query? For example in above scenario, postgres must be executing some query to check if the
incoming row to the child table already exists in the parent table or not? 

EXPLAIN ANALYZE will list the time it took to execute the foreign key
trigger in the "Trigger for constraint" section.

David

Re: Insert query performance

From

sud

Date:

20 August 2024, 10:09:01

On Mon, Aug 19, 2024 at 4:33 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Mon, 19 Aug 2024 at 19:48, sud <suds1434@gmail.com> wrote:
> In a version 15.4 postgres database, Is it possible that , if we have two big range partition tables with foreign key relationships between them, insert into the child table can cause slowness if we don't have foreign key index present in the child table? Basically it need to make sure the new row already added to parent partition table or not.

Having an index on the referencing columns is only useful for DELETEs
and UPDATEs affecting the foreign key column(s). For INSERTs to the
referencing table, technically having indexes there would only slow
down inserts due to the additional overhead of having to maintain the
index, however, the overhead of having the index might be fairly
minuscule when compared to performing a CASCADE UPDATE or DELETE to
the referencing table when the DDL is performed on the referenced
table.

> And if there is any possible way(example query tracing etc) to get the underlying system queries which gets triggered as part of the main insert query? For example in above scenario, postgres must be executing some query to check if the incoming row to the child table already exists in the parent table or not?

EXPLAIN ANALYZE will list the time it took to execute the foreign key
trigger in the "Trigger for constraint" section.

David

Thank you so much David.

If I get it correct , the index on the foreign key mainly helps improve the deletes/updates performance of the parent table , if the same FK column gets impacted from the parent table. (This might be the reason why our detach partition in the parent table runs long and never completes as we have no foreign key indexed).

However, my initial understanding of "having the FK index will improve the insert performance in the child table" is not accurate it seems. Rather as you mentioned it may negatively impact the loading/insert performance because it has to now update the additional index in each insert. In case of insert into child table, to ensure if the child row is already present in the parent , it just scans the parent by the Primary key of the parent table (which is be default indexed) and thus it doesn't need an index in the child table foreign keys or having an index in the foreign key in the child table won't help the constraint validation faster. Please correct me if my understanding is wrong here.

Additionally as you mentioned "explain analyze" will show a section on how much time it really takes for the constraint validation , I can see that section now. But it seems it will really need that INSERT statement to be executed and that we can't really do in production as that will physically insert data into the table. So do you mean to just do the "explain analyze" for the INSERT query and capture the plan and then do the rollback? And in our case it's a row by row insert happening , so we will see if we can club/sum that "constraint validation" time for a handful if insert somehow to get a better idea on the percentage of time we really spent in the constraint validation.

Re: Insert query performance

From

sud

Date:

20 August 2024, 10:19:12

On Mon, Aug 19, 2024 at 1:25 PM Muhammad Ikram <mmikram@gmail.com> wrote:

Hi Sud,

Please make following change in your postgresql.conf file

log_statement = 'all'

Will this put all the internal sql query or the recursive query entries in the pg_stats_statement view which we can analyze then? And also to debug issues in the production will it be a good idea to set it for a few times and then turn it off or it can have significant performance overhead.

Re: Insert query performance

From

Muhammad Ikram

Date:

20 August 2024, 13:59:13

It will record all statements in logs. If you are concerned about query times then you may use pg_stat_statements.

Muhammad Ikram

On Tue, 20 Aug 2024 at 12:19, sud <suds1434@gmail.com> wrote:

On Mon, Aug 19, 2024 at 1:25 PM Muhammad Ikram <mmikram@gmail.com> wrote:
Hi Sud,

Please make following change in your postgresql.conf file

log_statement = 'all'

Will this put all the internal sql query or the recursive query entries in the pg_stats_statement view which we can analyze then? And also to debug issues in the production will it be a good idea to set it for a few times and then turn it off or it can have significant performance overhead.

Re: Insert query performance

From

David Rowley

Date:

20 August 2024, 14:43:13

On Tue, 20 Aug 2024 at 19:09, sud <suds1434@gmail.com> wrote:
> However, my initial understanding of "having the FK index will improve the insert performance in the child table" is
notaccurate it seems. Rather as you mentioned it may negatively impact the loading/insert performance because it has to
nowupdate the additional index in each insert. In case of insert into child table, to ensure if the child row is
alreadypresent in the parent , it just scans the parent by the Primary key of the parent table (which is be default
indexed)and thus it doesn't need an index in the child table foreign keys or having an index in the foreign key in the
childtable won't help the constraint validation faster. Please correct me if my understanding is wrong here.

If you think about what must happen when you insert into the
referencing table, the additional validation that the foreign key must
do is check that a corresponding record exists in the referenced
table. An index on the referencing table does not help speed that up.

> Additionally as you mentioned "explain analyze" will show a section on how much time it really takes for the
constraintvalidation , I can see that section now. But it seems it will really need that INSERT statement to be
executedand that we can't really do in production as that will physically insert data into the table. So do you mean to
justdo the "explain analyze" for the INSERT query and capture the plan and then do the rollback? And in our case it's
arow by row insert happening , so we will see if we can club/sum that "constraint validation" time for a handful if
insertsomehow to get a better idea on the percentage of time we really spent in the constraint validation.

I'd recommend performing a schema-only dump of your production
database and experimenting well away from production. See pg_dump
--schema-only.

I also recommend not leaving performance to chance and testing the
impact of index vs no index away from production with some data loaded
that is representative of your production data (or use the production
data if it's available and small enough to manage). Use pgbench to
see what impact having the index on the referencing table has on
performance on inserts into that table vs what improvements you gain
from having the index when there's cascading delete from the
referenced table.

You might also want to look into auto_explain [1]. You can load this
into a single session and set auto_explain.log_min_duration = 0,
auto_explain.log_analyze = on and auto_explain.log_nested_statements =
on. That should give you the plan for the cascade DELETE query that's
executed by the trigger when you perform the DELETE on the referenced
table. (Also see the note about auto_explain.log_timing)

David

[1] https://www.postgresql.org/docs/15/auto-explain.html