Thread: zabbix on postgresql - very slow delete of events

zabbix on postgresql - very slow delete of events

From

Kristian Ejvind

Date:

23 July 2019, 08:07:55

This will be a rather lengthy post, just to give the full (I hope) picture. We're using Zabbix for monitoring and I'm having problems

understanding why the deletion of rows in the events table is so slow.

Zabbix: 4.2 (never mind the name of the db - it is 4.2)

new values per second: ~400

hosts: ~600

items: ~45000

OS: CentOS Linux release 7.6.1810 (Core)

Postgresql was installed from the yum repo on postgresql.org

zabbix_34=> select version();

version

---------------------------------------------------------------------------------------------------------

PostgreSQL 10.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit

(1 row)

The database is analyzed + vacuumed nightly. The server runs Zabbix and the database, has 16 GB memory, 4 vCPUs (modern hardware).

Some parameters:

shared_buffers = 3GB

work_mem = 10MB (I also tested with work_mem = 128MB - no difference)

effective_cache_size = 6 GB

effective_io_concurrency = 40

checkpoint_timeout = 5 min (default)

max_wal_size = 1 GB (default)

checkpoint_completion_target = 0.8

pg_wal is already on a separate device.

events table: ~25 million rows / 2.9 GB

event_recovery table: ~12 million rows / 550 MB

alerts table: ~600000 rows / 530 MB

Generally the database is quite snappy and shows no indication of problems. But now I've seen that housekeeping of events is

very slow - a single (normally hourly) run can take more than one day to finish, so events keep stacking up in the table. A typical slow

delete statement, from the postgres log:

postgresql-10-20190717-031404.log:2019-07-17 03:37:43 CEST [80965]: [4-1] user=zabbix,db=zabbix_34,app=[unknown],client=[local]: LOG: duration: 27298798.930 ms statement: delete from events where (eventid between 5580621 and 5580681 or eventid between 5580689 and 5580762 or eventid between 5580769 and 5580844 or eventid between 5580851 and 5580867 or eventid between 5580869 and 5580926 or eventid between 5580933 and 5580949 or eventid between 5580963 and 5581024

--- 8< --- a lot of similar eventids snipped away -----

or eventid between 5586799 and 5586839 or eventid in (5581385,5581389,5581561,5581563,5581564,5581580,5 581582,5581584,5581585,5581635))

I've analyzed the deletion of a single row in events. First, some table information:

zabbix_34=> \d events

Table "zabbix.events"

Column | Type | Collation | Nullable | Default

--------------+-------------------------+-----------+----------+-----------------------

eventid | numeric | | not null |

source | bigint | | not null | '0'::bigint

object | bigint | | not null | '0'::bigint

objectid | numeric | | not null | '0'::numeric

clock | bigint | | not null | '0'::bigint

value | bigint | | not null | '0'::bigint

acknowledged | bigint | | not null | '0'::bigint

ns | bigint | | not null | '0'::bigint

name | character varying(2048) | | not null | ''::character varying

severity | integer | | not null | 0

Indexes:

"idx_29337_primary" PRIMARY KEY, btree (eventid)

"events_1" btree (source, object, objectid, clock)

"events_2" btree (source, object, clock)

"events_clk_3" btree (clock)

Referenced by:

TABLE "acknowledges" CONSTRAINT "c_acknowledges_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE

TABLE "alerts" CONSTRAINT "c_alerts_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE

TABLE "alerts" CONSTRAINT "c_alerts_5" FOREIGN KEY (p_eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "event_recovery" CONSTRAINT "c_event_recovery_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "event_recovery" CONSTRAINT "c_event_recovery_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "event_recovery" CONSTRAINT "c_event_recovery_3" FOREIGN KEY (c_eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "event_suppress" CONSTRAINT "c_event_suppress_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "event_tag" CONSTRAINT "c_event_tag_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "problem" CONSTRAINT "c_problem_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

TABLE "problem" CONSTRAINT "c_problem_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE

zabbix_34=> \d event_recovery

Table "zabbix.event_recovery"

Column | Type | Collation | Nullable | Default

---------------+--------+-----------+----------+---------

eventid | bigint | | not null |

r_eventid | bigint | | not null |

c_eventid | bigint | | |

correlationid | bigint | | |

userid | bigint | | |

Indexes:

"event_recovery_pkey" PRIMARY KEY, btree (eventid)

"event_recovery_1" btree (r_eventid)

"event_recovery_2" btree (c_eventid)

Foreign-key constraints:

"c_event_recovery_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

"c_event_recovery_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE

"c_event_recovery_3" FOREIGN KEY (c_eventid) REFERENCES events(eventid) ON DELETE CASCADE

zabbix_34=> \d alerts

Table "zabbix.alerts"

Column | Type | Collation | Nullable | Default

---------------+-------------------------+-----------+----------+-----------------------

alertid | numeric | | not null |

actionid | numeric | | not null |

eventid | numeric | | not null |

userid | numeric | | |

clock | bigint | | not null | '0'::bigint

mediatypeid | numeric | | |

sendto | character varying(1024) | | not null | ''::character varying

subject | character varying(255) | | not null | ''::character varying

message | text | | not null | ''::text

status | bigint | | not null | '0'::bigint

retries | bigint | | not null | '0'::bigint

error | character varying(2048) | | not null | ''::character varying

esc_step | bigint | | not null | '0'::bigint

alerttype | bigint | | not null | '0'::bigint

p_eventid | bigint | | |

acknowledgeid | bigint | | |

Indexes:

"idx_29120_primary" PRIMARY KEY, btree (alertid)

"alerts_1" btree (actionid)

"alerts_2" btree (clock)

"alerts_3" btree (eventid)

"alerts_4" btree (status)

"alerts_5" btree (mediatypeid)

"alerts_6" btree (userid)

"alerts_7" btree (p_eventid)

Foreign-key constraints:

"c_alerts_1" FOREIGN KEY (actionid) REFERENCES actions(actionid) ON UPDATE RESTRICT ON DELETE CASCADE

"c_alerts_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE

"c_alerts_3" FOREIGN KEY (userid) REFERENCES users(userid) ON UPDATE RESTRICT ON DELETE CASCADE

"c_alerts_4" FOREIGN KEY (mediatypeid) REFERENCES media_type(mediatypeid) ON UPDATE RESTRICT ON DELETE CASCADE

"c_alerts_5" FOREIGN KEY (p_eventid) REFERENCES events(eventid) ON DELETE CASCADE

"c_alerts_6" FOREIGN KEY (acknowledgeid) REFERENCES acknowledges(acknowledgeid) ON DELETE CASCADE

Let's look at what's in the tables for event 7123123:

zabbix_34=> select * from events where eventid=7123123;

---------+--------+--------+----------+------------+-------+--------------+---------+--------------------------------------+----------

7123123 | 3 | 0 | 27562 | 1525264196 | 1 | 0 | 1980875 | Cannot calculate trigger expression. | 0

(1 row)

zabbix_34=> select * from event_recovery where eventid=7123123;

eventid | r_eventid | c_eventid | correlationid | userid

---------+-----------+-----------+---------------+--------

7123123 | 7124371 | | |

(1 row)

zabbix_34=> select * from alerts where eventid=7123123;

ttype | p_eventid | acknowledgeid

---------+----------+---------+--------+-------+-------------+--------+---------+---------+--------+---------+-------+----------+-----

------+-----------+---------------

(0 rows)

All these queries execute well below 1 ms, using indexes.

Let's delete one row. See explain results here: https://explain.depesz.com/s/aycf . 5 seconds to delete a single row, wow!

This shows that it is the foreign key constraints on event_recovery and alerts that take a lot of time.

But why? I far as I can see, the delete is fully CPU bound during execution.

Deleting the corresponding row directly from event_recovery or alerts executes in less than 0.1 ms.

Any ideas?

I've observed that alerts and event_recovery tables both have more than one foreign key that references events, if that matters.

Regards

Kristian Ejvind

Linux System Administrator

IT Operations | Technical Operations

Resurs Bank

Ekslingan 8

Box 222 09, SE-25467 Helsingborg

Mobil:	+46 728571483
Växel:	+46 42 38 20 00
E-post:	Kristian.Ejvind@resurs.se
Webb:	www.resursbank.se

Attachment

image8741bc.PNG

Re: zabbix on postgresql - very slow delete of events

From

Kenneth Marshall

Date:

23 July 2019, 12:58:03

On Tue, Jul 23, 2019 at 08:07:55AM +0000, Kristian Ejvind wrote:
> Hi
> 
> This will be a rather lengthy post, just to give the full (I hope) picture. We're using Zabbix for monitoring and I'm
havingproblems

> understanding why the deletion of rows in the events table is so slow.
> 
> Zabbix: 4.2 (never mind the name of the db - it is 4.2)
> new values per second: ~400
> hosts: ~600
> items: ~45000
> 

Hi Kristian,

Time series databases like Zabbix work poorly with the Housekeeper
service. We had many similar sorts of problems as our Zabbix usage
grew. Once we partitioned the big tables, turned off the Housekeeper,
and cleaned up by dropping partitions instead everything worked much,
much, much better. When we started using partitioning, we used the
old inheiritance style. Now you can use the native partitioning.

Regards,
Ken

Re: zabbix on postgresql - very slow delete of events

From

Kristian Ejvind

Date:

23 July 2019, 13:41:53

Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
and that has been running fine for a year. Performance was vastly improved. But since you
can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.

Regards
Kristian


?On 2019-07-23, 14:58, "Kenneth Marshall" <ktm@rice.edu> wrote:

    Hi Kristian,

    Time series databases like Zabbix work poorly with the Housekeeper
    service. We had many similar sorts of problems as our Zabbix usage
    grew. Once we partitioned the big tables, turned off the Housekeeper,
    and cleaned up by dropping partitions instead everything worked much,
    much, much better. When we started using partitioning, we used the
    old inheiritance style. Now you can use the native partitioning.

    Regards,
    Ken






Resurs Bank AB
Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations

Ekslingan 8
Box 222 09, SE-25467 Helsingborg

Direkt Tfn:
Mobil: +46 728571483
Vxl: +46 42 382000
Fax:
E-post: Kristian.Ejvind@resurs.se
Webb: http://www.resursbank.se

Re: zabbix on postgresql - very slow delete of events

From

Kenneth Marshall

Date:

23 July 2019, 14:33:00

On Tue, Jul 23, 2019 at 01:41:53PM +0000, Kristian Ejvind wrote:
> Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
> and that has been running fine for a year. Performance was vastly improved. But since you
> can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.
> 
> Regards
> Kristian

Hi Kristian,

Why are you not partitioning the events and alerts tables as well? That
would eliminate this problem and you already have the infrastructure in
place to support the management since you are using it for the history
and trends tables.

Regards,
Ken

Re: zabbix on postgresql - very slow delete of events

From

Kristian Ejvind

Date:

24 July 2019, 08:11:59

Hi.

Well, the events table has both a primary key and foreign keys referencing it, which is not possible
on a partitioned table in postgresql 10. How did you work around this issue?

On the other hand, if we can get the deletion of rows from the events table run at normal speed, I
can't imagine we would have a problem with it in a long time. After all, although our Zabbix installation
definitely is larger than "small", it's still far from "large".

I think I would need assistance with debugging why postgresql behaves like it does.
Is there a defect with deleting data from a table that has multiple foreign keys referencing it from a  certain table?
Is there a problem with the query optimizer that chooses the wrong plan when working on the foreign key constraints?
How do I inspect how the db works on the deletion of rows from the referencing tables?

Regards
Kristian

?On 2019-07-23, 16:33, "Kenneth Marshall" <ktm@rice.edu> wrote:

    On Tue, Jul 23, 2019 at 01:41:53PM +0000, Kristian Ejvind wrote:
    > Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
    > and that has been running fine for a year. Performance was vastly improved. But since you
    > can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.
    >
    > Regards
    > Kristian

    Hi Kristian,

    Why are you not partitioning the events and alerts tables as well? That
    would eliminate this problem and you already have the infrastructure in
    place to support the management since you are using it for the history
    and trends tables.

    Regards,
    Ken

Resurs Bank AB
Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations

Ekslingan 8
Box 222 09, SE-25467 Helsingborg

Direkt Tfn:
Mobil: +46 728571483
Vxl: +46 42 382000
Fax:
E-post: Kristian.Ejvind@resurs.se
Webb: http://www.resursbank.se

Re: zabbix on postgresql - very slow delete of events

From

Maxim Boguk

Date:

24 July 2019, 13:54:26

Hi Kristian,

If you look for explain analyze results for delete,
you will see that 99% of time query spent on the foreign key triggers checks.
In the same time the database have indexes on foreign key side in place.


I recommend try this:

\timing on
BEGIN;
delete from zabbix.events where eventid = [some testing id];
select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc;
ABORT;

And provide result of the last query and how long delete runs.
It might help us understand whats going on.

Currently I have 3 ideas:
1)very very slow and overloaded IO subsystem
2)a lot of stuff being delete by ON DELETE CASCADE 
3)some locking prevent foreign key checks run fast

On Wed, Jul 24, 2019 at 11:12 AM Kristian Ejvind <Kristian.Ejvind@resurs.se> wrote:

Hi.

Well, the events table has both a primary key and foreign keys referencing it, which is not possible
on a partitioned table in postgresql 10. How did you work around this issue?

On the other hand, if we can get the deletion of rows from the events table run at normal speed, I
can't imagine we would have a problem with it in a long time. After all, although our Zabbix installation
definitely is larger than "small", it's still far from "large".

I think I would need assistance with debugging why postgresql behaves like it does.
Is there a defect with deleting data from a table that has multiple foreign keys referencing it from a certain table?
Is there a problem with the query optimizer that chooses the wrong plan when working on the foreign key constraints?
How do I inspect how the db works on the deletion of rows from the referencing tables?

Regards
Kristian

?On 2019-07-23, 16:33, "Kenneth Marshall" <ktm@rice.edu> wrote:

On Tue, Jul 23, 2019 at 01:41:53PM +0000, Kristian Ejvind wrote:
> Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
> and that has been running fine for a year. Performance was vastly improved. But since you
> can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.
>
> Regards
> Kristian

Hi Kristian,

Why are you not partitioning the events and alerts tables as well? That
would eliminate this problem and you already have the infrastructure in
place to support the management since you are using it for the history
and trends tables.

Regards,
Ken

Resurs Bank AB
Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations

Ekslingan 8
Box 222 09, SE-25467 Helsingborg

Direkt Tfn:
Mobil: +46 728571483
Vxl: +46 42 382000
Fax:
E-post: Kristian.Ejvind@resurs.se
Webb: http://www.resursbank.se

Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

Re: zabbix on postgresql - very slow delete of events

From

Maxim Boguk

Date:

24 July 2019, 14:05:34

All these queries execute well below 1 ms, using indexes.

Let's delete one row. See explain results here: https://explain.depesz.com/s/aycf . 5 seconds to delete a single row, wow!
This shows that it is the foreign key constraints on event_recovery and alerts that take a lot of time.
But why? I far as I can see, the delete is fully CPU bound during execution.

Deleting the corresponding row directly from event_recovery or alerts executes in less than 0.1 ms.

Any ideas?

I've observed that alerts and event_recovery tables both have more than one foreign key that references events, if that matters.

Hi Kristian,

After comparing structure of zabbix tables with same in my zabbix installation I found one very weird difference.

Why type of events.eventid had been changed from default bigint to numeric?

I suspect that the difference between events.eventid (numeric) type and event_recovery.*_eventid (bigint) types might lead to inability of use index during foreign key checks.

Anyway it will be clearly visible on the pg_stat_xact_user_tables results (I now expect to see 3 sequential scan on event_recovery and may be on some other tables as well).

Kind Regards,

Maxim

Attachment

image8741bc.PNG

Re: zabbix on postgresql - very slow delete of events

From

Kristian Ejvind

Date:

24 July 2019, 15:12:18

Hi Maxim

Thanks for your advice, and let me start with your second email, which I'll copy here:

=====

Hi Kristian,

After comparing structure of zabbix tables with same in my zabbix installation I found one very weird difference.

Why type of events.eventid had been changed from default bigint to numeric?

I suspect that the difference between events.eventid (numeric) type and event_recovery.*_eventid (bigint) types might lead to inability of use index during foreign key checks.

Anyway it will be clearly visible on the pg_stat_xact_user_tables results (I now expect to see 3 sequential scan on event_recovery and may be on some other tables as well).

Kind Regards,

Maxim

=====

Well spotted! On closer examination it seems that data types are wrong in several places. I suspect that this comes

from the time when our Zabbix ran on a MySQL database, which was converted over to PostgreSQL a few years

ago. I agree this discrepancy is suspicious and I will continue to examine it.

Regarding your ideas in the email below, I can say that 1) is not valid, disk latency is in the range of a few ms.

This is the output from your recommended query, which seems to verify your suspicions.

zabbix_34=# begin; delete from zabbix.events where eventid = 7123123; select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc; rollback;

Time: 0.113 ms

Time: 4798.189 ms (00:04.798)

--------+------------+----------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------

41940 | zabbix | event_recovery | 3 | 35495224 | 0 | 0 | 0 | 0 | 1 | 0

41675 | zabbix | alerts | 1 | 544966 | 1 | 0 | 0 | 0 | 0 | 0

42573 | zabbix | problem | 2 | 13896 | 0 | 0 | 0 | 0 | 0 | 0

41943 | zabbix | event_tag | 1 | 22004 | 0 | 0 | 0 | 0 | 0 | 0

41649 | zabbix | acknowledges | 1 | 47 | 0 | 0 | 0 | 0 | 0 | 0

41951 | zabbix | events | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0

260215 | zabbix | event_suppress | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0

(7 rows)

Time: 2.857 ms

Time: 0.162 ms

Regards

Kristian

Kristian Ejvind

Linux System Administrator

IT Operations | Technical Operations

Resurs Bank

Ekslingan 8

Box 222 09, SE-25467 Helsingborg

Mobil:	+46 728571483
Växel:	+46 42 38 20 00
E-post:	Kristian.Ejvind@resurs.se
Webb:	www.resursbank.se

From: Maxim Boguk <maxim.boguk@gmail.com>
Date: Wednesday, 24 July 2019 at 15:55
To: Kristian Ejvind <Kristian.Ejvind@resurs.se>
Cc: "pgsql-performance@lists.postgresql.org" <pgsql-performance@lists.postgresql.org>
Subject: Re: zabbix on postgresql - very slow delete of events

Hi Kristian,

If you look for explain analyze results for delete,

you will see that 99% of time query spent on the foreign key triggers checks.

In the same time the database have indexes on foreign key side in place.

I recommend try this:

\timing on

BEGIN;

delete from zabbix.events where eventid = [some testing id];

select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc;

ABORT;

And provide result of the last query and how long delete runs.

It might help us understand whats going on.

Currently I have 3 ideas:

1)very very slow and overloaded IO subsystem

2)a lot of stuff being delete by ON DELETE CASCADE

3)some locking prevent foreign key checks run fast

On Wed, Jul 24, 2019 at 11:12 AM Kristian Ejvind <Kristian.Ejvind@resurs.se> wrote:

Hi.

Well, the events table has both a primary key and foreign keys referencing it, which is not possible
on a partitioned table in postgresql 10. How did you work around this issue?

On the other hand, if we can get the deletion of rows from the events table run at normal speed, I
can't imagine we would have a problem with it in a long time. After all, although our Zabbix installation
definitely is larger than "small", it's still far from "large".

I think I would need assistance with debugging why postgresql behaves like it does.
Is there a defect with deleting data from a table that has multiple foreign keys referencing it from a certain table?
Is there a problem with the query optimizer that chooses the wrong plan when working on the foreign key constraints?
How do I inspect how the db works on the deletion of rows from the referencing tables?

Regards
Kristian

?On 2019-07-23, 16:33, "Kenneth Marshall" <ktm@rice.edu> wrote:

On Tue, Jul 23, 2019 at 01:41:53PM +0000, Kristian Ejvind wrote:
> Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
> and that has been running fine for a year. Performance was vastly improved. But since you
> can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.
>
> Regards
> Kristian

Hi Kristian,

Why are you not partitioning the events and alerts tables as well? That
would eliminate this problem and you already have the infrastructure in
place to support the management since you are using it for the history
and trends tables.

Regards,
Ken

Resurs Bank AB
Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations

Ekslingan 8
Box 222 09, SE-25467 Helsingborg

Direkt Tfn:
Mobil: +46 728571483
Vxl: +46 42 382000
Fax:
E-post: Kristian.Ejvind@resurs.se
Webb: http://www.resursbank.se

Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"??????, ?? ??? ?????????? ??? ?? ??????, ?? ?????? ??? ??-???????? ?????? ????? ? ??? ????? ??? ????"

Attachment

imageec8d1d.PNG

Re: zabbix on postgresql - very slow delete of events

From

Maxim Boguk

Date:

24 July 2019, 17:16:24

On Wed, Jul 24, 2019 at 6:12 PM Kristian Ejvind <Kristian.Ejvind@resurs.se> wrote:

Hi Maxim

Thanks for your advice, and let me start with your second email, which I'll copy here:

=====
Hi Kristian,

After comparing structure of zabbix tables with same in my zabbix installation I found one very weird difference.
Why type of events.eventid had been changed from default bigint to numeric?

I suspect that the difference between events.eventid (numeric) type and event_recovery.*_eventid (bigint) types might lead to inability of use index during foreign key checks.
Anyway it will be clearly visible on the pg_stat_xact_user_tables results (I now expect to see 3 sequential scan on event_recovery and may be on some other tables as well).

Kind Regards,
Maxim
=====

Well spotted! On closer examination it seems that data types are wrong in several places. I suspect that this comes
from the time when our Zabbix ran on a MySQL database, which was converted over to PostgreSQL a few years
ago. I agree this discrepancy is suspicious and I will continue to examine it.

Regarding your ideas in the email below, I can say that 1) is not valid, disk latency is in the range of a few ms.
This is the output from your recommended query, which seems to verify your suspicions.

zabbix_34=# begin; delete from zabbix.events where eventid = 7123123; select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc; rollback;
Time: 0.113 ms
Time: 4798.189 ms (00:04.798)
relid | schemaname |    relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
--------+------------+----------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------
41940 | zabbix     | event_recovery |        3 |     35495224 |        0 |             0 |         0 |         0 |         1 |             0
41675 | zabbix     | alerts         |        1 |       544966 |        1 |             0 |         0 |         0 |         0 |             0
42573 | zabbix     | problem        |        2 |        13896 |        0 |             0 |         0 |         0 |         0 |             0
41943 | zabbix     | event_tag      |        1 |        22004 |        0 |             0 |         0 |         0 |         0 |             0
41649 | zabbix     | acknowledges   |        1 |           47 |        0 |             0 |         0 |         0 |         0 |             0
41951 | zabbix     | events         |        0 |            0 |        1 |             1 |         0 |         0 |         1 |             0
260215 | zabbix     | event_suppress |        1 |            0 |        0 |             0 |         0 |         0 |         0 |             0

Hi Kristian,

This result definitely proves that indexes not used during foreign key checks (see that non-zero seq_scan counters for linked tables).

Only possible reason (IMHO) that wrong usage numeric in place of bigint.

I recommend change types of events.eventid (and any other similar fields) to bigint.

It should resolve your performance issues with deletes on events table (as additional bonus - bigint a lot faster and compact type than numeric).

--

Re: zabbix on postgresql - very slow delete of events

From

Kristian Ejvind

Date:

13 August 2019, 11:08:21

Hi.

Just a short message, confirming that after we've altered the tables to have matching

types, deletes now take 1 ms, instead of 5 sec. Indexes are being used now.

Thanks for assistance.

Regards

Kristian

ps. would be nice with some warnings or indications in analyze output when this happens.

Kristian Ejvind

Linux System Administrator

IT Operations | Technical Operations

Resurs Bank

Ekslingan 8

Box 222 09, SE-25467 Helsingborg

Mobil:	+46 728571483
Växel:	+46 42 38 20 00
E-post:	Kristian.Ejvind@resurs.se
Webb:	www.resursbank.se

From: Maxim Boguk <maxim.boguk@gmail.com>
Date: Wednesday, 24 July 2019 at 19:17
To: Kristian Ejvind <Kristian.Ejvind@resurs.se>
Cc: "pgsql-performance@lists.postgresql.org" <pgsql-performance@lists.postgresql.org>
Subject: Re: zabbix on postgresql - very slow delete of events

On Wed, Jul 24, 2019 at 6:12 PM Kristian Ejvind <Kristian.Ejvind@resurs.se> wrote:

Hi Maxim

Thanks for your advice, and let me start with your second email, which I'll copy here:

=====
Hi Kristian,

After comparing structure of zabbix tables with same in my zabbix installation I found one very weird difference.
Why type of events.eventid had been changed from default bigint to numeric?

I suspect that the difference between events.eventid (numeric) type and event_recovery.*_eventid (bigint) types might lead to inability of use index during foreign key checks.
Anyway it will be clearly visible on the pg_stat_xact_user_tables results (I now expect to see 3 sequential scan on event_recovery and may be on some other tables as well).

Kind Regards,
Maxim
=====

Well spotted! On closer examination it seems that data types are wrong in several places. I suspect that this comes
from the time when our Zabbix ran on a MySQL database, which was converted over to PostgreSQL a few years
ago. I agree this discrepancy is suspicious and I will continue to examine it.

Regarding your ideas in the email below, I can say that 1) is not valid, disk latency is in the range of a few ms.
This is the output from your recommended query, which seems to verify your suspicions.

zabbix_34=# begin; delete from zabbix.events where eventid = 7123123; select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc; rollback;
Time: 0.113 ms
Time: 4798.189 ms (00:04.798)
relid | schemaname |    relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
--------+------------+----------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------
41940 | zabbix     | event_recovery |        3 |     35495224 |        0 |             0 |         0 |         0 |         1 |             0
41675 | zabbix     | alerts         |        1 |       544966 |        1 |             0 |         0 |         0 |         0 |             0
42573 | zabbix     | problem        |        2 |        13896 |        0 |             0 |         0 |         0 |         0 |             0
41943 | zabbix     | event_tag      |        1 |        22004 |        0 |             0 |         0 |         0 |         0 |             0
41649 | zabbix     | acknowledges   |        1 |           47 |        0 |             0 |         0 |         0 |         0 |             0
41951 | zabbix     | events         |        0 |            0 |        1 |             1 |         0 |         0 |         1 |             0
260215 | zabbix     | event_suppress |        1 |            0 |        0 |             0 |         0 |         0 |         0 |             0

Hi Kristian,

This result definitely proves that indexes not used during foreign key checks (see that non-zero seq_scan counters for linked tables).

Only possible reason (IMHO) that wrong usage numeric in place of bigint.

I recommend change types of events.eventid (and any other similar fields) to bigint.

It should resolve your performance issues with deletes on events table (as additional bonus - bigint a lot faster and compact type than numeric).

Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"??????, ?? ??? ?????????? ??? ?? ??????, ?? ?????? ??? ??-???????? ?????? ????? ? ??? ????? ??? ????"

Attachment

imagec5a681.PNG