Best practices to manage custom statistics - Mailing list pgsql-general

From Moreno Andreo
Subject Best practices to manage custom statistics
Date
Msg-id a426f23c-8812-f4e2-ac4c-6a773d11b993@evolu-s.it
Whole thread Raw
Responses Re: Best practices to manage custom statistics  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi,
     I'm checking if there's a best way to obtain stastistics based on
my database tables

Here's the scenario.
First of all, technical details:
- Postgresql 9.1, Ubuntu 12 on a 4 core, 32 GB machine with 600 GB disk
migrating to Postgresql 9.5.3, Debian 8 on a 8-core, 52 GB machine with
2 TB disk.
- 350 databases, 350 users, every user connects to his own database and
his teammates' (max 10 in total) so each user can connect to max 10
databases at a time


My application needs to achieve a certain number of statistics (how many
records are in a certain state, how many are in another state) to send
back to user.
This is obtained, at the moment, with a select count(*) from ..... (that
involves 4 joins on 4 tables) to be run run every 20 secs from each
client connected to the cluster (ATM about 650 clients configured, about
200 concurrent) to each database it has rights to connect.

I noticed that in some cases, especially when working with not-so-small
datasets (200k rows x 95 cols), and sometines returning not-so-small
datasets (10k rows) the query performs not so well, but the worst thing
is that it raises overall server load (I/O) and bandwidth usage. While
bandwidth is not a problem (I have spikes at 20 Mbps while "normal"
traffic speed is at about 3Mbps, but I have 1 Gbps available), server
load *is* a main problem, because in high-access periods
(summer/holidays) I see my server load go up to 22-25 on a 4-core
machine, and users call complaining for timeouts and slowness.

Even if I'm migrating to a better instance, I'm still trying to
"normalize" this feature.
I can start looking at indices (I'm not quite sure that those fields in
WHERE clause are all indexed), but I don't think it would boost its
performance.

I thought about having a table, say, 'tbl_counters', like this

CREATE TABLE tbl_counters{
uuid coduser,
int counter1,
int counter2,
....
int counterx
};
updated by trigger (when a value in a table is INSERTed/UPDATEd/DELETEd
it fires a function that increments/decrements values for counter x at
user y).
Just to avoid eventual trigger skipping its update, one time a day (say
at 2 am) a cron performs the above "monster query" for every database
and adjusts, if necessary, counter values for each user reflecting real
values.

In your experience, would this approach help me lower server load?
Are there any other approach I can try?

If more details are needed, just ask.

Thanks in advance and sorry for the long message (but I had to explain
such a complex thing)
Moreno.-



pgsql-general by date:

Previous
From: Karl Czajkowski
Date:
Subject: resolution order for foreign key actions?
Next
From: "David G. Johnston"
Date:
Subject: Re: Running on Docker, AWS with Data Stored on EBS