Re: zabbix on postgresql - very slow delete of events - Mailing list pgsql-performance
From | Kristian Ejvind |
---|---|
Subject | Re: zabbix on postgresql - very slow delete of events |
Date | |
Msg-id | 4D1A9A2F-EC62-4580-B310-FC8FDD4519C5@resurs.se Whole thread Raw |
In response to | Re: zabbix on postgresql - very slow delete of events (Maxim Boguk <maxim.boguk@gmail.com>) |
Responses |
Re: zabbix on postgresql - very slow delete of events
|
List | pgsql-performance |
Hi Maxim
Thanks for your advice, and let me start with your second email, which I'll copy here:
=====
Hi Kristian,
After comparing structure of zabbix tables with same in my zabbix installation I found one very weird difference.
Why type of events.eventid had been changed from default bigint to numeric?
I suspect that the difference between events.eventid (numeric) type and event_recovery.*_eventid (bigint) types might lead to inability of use index during foreign key checks.
Anyway it will be clearly visible on the pg_stat_xact_user_tables results (I now expect to see 3 sequential scan on event_recovery and may be on some other tables as well).
Kind Regards,
Maxim
=====
Well spotted! On closer examination it seems that data types are wrong in several places. I suspect that this comes
from the time when our Zabbix ran on a MySQL database, which was converted over to PostgreSQL a few years
ago. I agree this discrepancy is suspicious and I will continue to examine it.
Regarding your ideas in the email below, I can say that 1) is not valid, disk latency is in the range of a few ms.
This is the output from your recommended query, which seems to verify your suspicions.
zabbix_34=# begin; delete from zabbix.events where eventid = 7123123; select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc; rollback;
Time: 0.113 ms
Time: 4798.189 ms (00:04.798)
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
--------+------------+----------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------
41940 | zabbix | event_recovery | 3 | 35495224 | 0 | 0 | 0 | 0 | 1 | 0
41675 | zabbix | alerts | 1 | 544966 | 1 | 0 | 0 | 0 | 0 | 0
42573 | zabbix | problem | 2 | 13896 | 0 | 0 | 0 | 0 | 0 | 0
41943 | zabbix | event_tag | 1 | 22004 | 0 | 0 | 0 | 0 | 0 | 0
41649 | zabbix | acknowledges | 1 | 47 | 0 | 0 | 0 | 0 | 0 | 0
41951 | zabbix | events | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0
260215 | zabbix | event_suppress | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(7 rows)
Time: 2.857 ms
Time: 0.162 ms
Regards
Kristian
| ||||||||
| ||||||||
| ||||||||
From: Maxim Boguk <maxim.boguk@gmail.com>
Date: Wednesday, 24 July 2019 at 15:55
To: Kristian Ejvind <Kristian.Ejvind@resurs.se>
Cc: "pgsql-performance@lists.postgresql.org" <pgsql-performance@lists.postgresql.org>
Subject: Re: zabbix on postgresql - very slow delete of events
Hi Kristian,
If you look for explain analyze results for delete,
you will see that 99% of time query spent on the foreign key triggers checks.
In the same time the database have indexes on foreign key side in place.
I recommend try this:
\timing on
BEGIN;
delete from zabbix.events where eventid = [some testing id];
select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc;
ABORT;
And provide result of the last query and how long delete runs.
It might help us understand whats going on.
Currently I have 3 ideas:
1)very very slow and overloaded IO subsystem
2)a lot of stuff being delete by ON DELETE CASCADE
3)some locking prevent foreign key checks run fast
On Wed, Jul 24, 2019 at 11:12 AM Kristian Ejvind <Kristian.Ejvind@resurs.se> wrote:
Hi.
Well, the events table has both a primary key and foreign keys referencing it, which is not possible
on a partitioned table in postgresql 10. How did you work around this issue?
On the other hand, if we can get the deletion of rows from the events table run at normal speed, I
can't imagine we would have a problem with it in a long time. After all, although our Zabbix installation
definitely is larger than "small", it's still far from "large".
I think I would need assistance with debugging why postgresql behaves like it does.
Is there a defect with deleting data from a table that has multiple foreign keys referencing it from a certain table?
Is there a problem with the query optimizer that chooses the wrong plan when working on the foreign key constraints?
How do I inspect how the db works on the deletion of rows from the referencing tables?
Regards
Kristian
?On 2019-07-23, 16:33, "Kenneth Marshall" <ktm@rice.edu> wrote:
On Tue, Jul 23, 2019 at 01:41:53PM +0000, Kristian Ejvind wrote:
> Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
> and that has been running fine for a year. Performance was vastly improved. But since you
> can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.
>
> Regards
> Kristian
Hi Kristian,
Why are you not partitioning the events and alerts tables as well? That
would eliminate this problem and you already have the infrastructure in
place to support the management since you are using it for the history
and trends tables.
Regards,
Ken
Resurs Bank AB
Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations
Ekslingan 8
Box 222 09, SE-25467 Helsingborg
Direkt Tfn:
Mobil: +46 728571483
Vxl: +46 42 382000
Fax:
E-post: Kristian.Ejvind@resurs.se
Webb: http://www.resursbank.se
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"??????, ?? ??? ?????????? ??? ?? ??????, ?? ?????? ??? ??-???????? ?????? ????? ? ??? ????? ??? ????"
Attachment
pgsql-performance by date: