Re: [SPAM] Re: [SPAM] Re: Best practices to manage custom statistics - Mailing list pgsql-general

From Moreno Andreo
Subject Re: [SPAM] Re: [SPAM] Re: Best practices to manage custom statistics
Date
Msg-id b0373392-564f-5676-b0fc-7e41bf15061e@evolu-s.it
In response to Re: [SPAM] Re: Best practices to manage custom statistics  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: [SPAM] Re: [SPAM] Re: Best practices to manage custom statistics  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi Adrian,
     First of all: I've now seen that not all the fields touched by the
WHERE clauses in my queries are indexed. Time to try some indexes and
EXPLAIN a bit....
(I must be blind... I've checked it multiple times....)
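Just to fix ideas, I'll be trying something along these lines (table and
column names here are made up, not my real schema):

    -- index a column that appears in the WHERE clause
    CREATE INDEX records_user_id_idx ON records (user_id);

    -- then check whether the planner actually uses it
    EXPLAIN ANALYZE
    SELECT count(*) FROM records WHERE user_id = 42;
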
Scroll down for the rest of the thread.....

On 23/11/2016 16:59, Adrian Klaver wrote:
> On 11/23/2016 05:24 AM, Moreno Andreo wrote:
>> On 15/11/2016 18:19, Adrian Klaver wrote:
>>> On 11/15/2016 07:39 AM, Moreno Andreo wrote:
>>>> Sorry for the late reply... I'm having some quite rough days....
>>>>
>>>> On 08/11/2016 21:28, Adrian Klaver wrote:
>>>>> On 11/08/2016 12:13 PM, Moreno Andreo wrote:
>>>>>> [...]
>>>>>>
>>>>>> In your experience, would this approach help me lower server load?
>>>>>> Are there other approaches I can try?
>>>>>
>
>>>
>>> So does the user need only their own data, or do they need the other
>>> users' data also?
>> They can also download their group mates' data (stats, numbers) if
>> they want. Numbers can be confusing because every user can have more
>> than one workplace.
>>>
>
>>>
>>> How can they be working with 'old' data? The queries you are running
>>> are compiling stats on data that exist at the time they are run and at
>>> any point in time between stats runs the user is working with current
>>> data regardless of what the last stats say.
>> Since we are on ADO.NET (with Npgsql) and we don't keep connections open
>> (query, fill a structure and disconnect), in the time slice between two
>> updates they will have both data and stats that aren't "real" at that
>> moment...
>> I'll try to give an example.
>> You and I are teammates, each working at his own place (several miles
>> apart). You read data from my database and get both stats and work
>> data. Then I make a change. The change won't be visible to you until
>> you refresh the data (that used to mean pressing the "Update" button;
>> now it's automated every 60 seconds).
>> Hope it's clearer now...
>>
>
> Aah, I get it now. You are refreshing forms every 60 seconds for 350
> users, each with their own database. Actually worse than that, as there
> is user overlap across databases (up to 10 per user), so the same
> database can be hit multiple times at a given refresh. Seems physics
> is at work here, as you have already pointed out. Namely, fetching all
> that data at regular intervals taxes the bandwidth as well as the
> CPU/storage.
Exactly. At the moment the bottleneck is I/O (we're running on a VM over
RAID-5 with 15k RPM SAS drives); in the next few weeks I plan to set up a
test server with SSDs ("attached SSD" on Google Cloud Platform). That's
pretty expensive, but the data sheets show a very good IOPS rate (rising
as size, and price, rise). CPU is not a problem (almost never over
70-80%, average about 40% on a 4-core server).
>
> High-level solutions that come to mind to spread the load out:
>
> 1) Replication to spread data across multiple machines.
> Or just split the databases over multiple non-replicated Postgres
> instances on separate machines
Already in place, but only for the owner's database. The other users'
databases can't be "downloaded", essentially for privacy reasons.
>
> 2) Caching results from each individual database so subsequent calls
> for the information do not touch the database. You already touched on
> this with your counter table.
Yes, and I'm all ears for suggestions on what to do and on any hidden
caveats...
I've heard that stored procedures (and triggers, I suppose) are faster
to execute than the same "operation" sent from outside, because the
backend doesn't have to parse it each time, and in this case it would be
executed only when needed.
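Just to make the idea concrete, something like this trigger-maintained
counter table is what I have in mind (table and column names are made
up):

    -- cached per-user counter, kept up to date by a trigger
    -- (assumes one row per user is seeded in advance)
    CREATE TABLE stats_cache (
        user_id      integer PRIMARY KEY,
        record_count bigint NOT NULL DEFAULT 0
    );

    CREATE OR REPLACE FUNCTION bump_stats() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE stats_cache SET record_count = record_count + 1
             WHERE user_id = NEW.user_id;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE stats_cache SET record_count = record_count - 1
             WHERE user_id = OLD.user_id;
        END IF;
        RETURN NULL;  -- result of an AFTER ROW trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER records_stats
    AFTER INSERT OR DELETE ON records
    FOR EACH ROW EXECUTE PROCEDURE bump_stats();

That way the clients would read one small row instead of re-running the
whole stats query every 60 seconds.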
>
> 3) Spreading out the refresh interval. Not sure if the 60-second
> interval is synced across users. Still, maybe setting different refresh
> intervals and/or changing them on the fly when load increases. A
> back-off equation, so to speak.
I'm afraid that when load gets extremely high (I've seen top report a
5-minute load average of 45 on a 4-core machine; it was even hard to
type in the PuTTY console!!) no refresh will get through anyway. True,
under those conditions we aren't getting updates as it is, but my goal
is to find a way to reduce the overall query load and arrive at a good
(V-)hardware design, so I can get better performance at lower cost.
"Pushing" data would be the best way... but I need to have a look to
LISTEN/NOTIFY, and what it means with Npgsql and JDBC (one thing I
didn't specify is that between user and database we have a web server,
in LAN with PgSQL server, hosting web services for the user to interact
with database. Maybe web server can "proxy" connections (keep them open)
and the listen/notify method can be achieved (yes, it's a bit of hard
work to rewrite all connection policies, but if it's necessary...)
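From what I've read so far, the server side would look roughly like this
(channel and table names are invented):

    -- signal listeners whenever the table changes
    CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('data_changed', TG_TABLE_NAME);
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER records_notify
    AFTER INSERT OR UPDATE OR DELETE ON records
    FOR EACH STATEMENT EXECUTE PROCEDURE notify_change();

    -- each (kept-open) web server connection subscribes with:
    LISTEN data_changed;

and the web server would refetch (and push to the clients) only when a
notification actually arrives, instead of polling every 60 seconds.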

I think a combination of the last two would be the best option.
Tell me what you think.....

>
>>>
>>> Except when they are wrong :) Still, been there.
>> Don't tell me.... :-)
>>>
>
>>>>>>
>>>>>> Thanks in advance and sorry for the long message (but I had to
>>>>>> explain
>>>>>> such a complex thing)
>>>>>> Moreno.-
>
>
>



