Re: how could select id=xx so slow? - Mailing list pgsql-performance

From Yan Chunlu
Subject Re: how could select id=xx so slow?
Date
Msg-id CAOA66tFRfXGFvNaU2hvBq1d285QeiFcJR-0jru+bS=TBJ=nZ6A@mail.gmail.com
Whole thread Raw
In response to Re: how could select id=xx so slow?  (Ants Aasma <ants@cybertec.at>)
List pgsql-performance
huge thanks for the patient explanations, I think you are right, it is really related to the IO.
I monitor the IO using iostat -x and found the utilize part reach 100% frequently, postgresql is the only service running on that machine, so I think it is either checkpoint or queries caused the problem. 

and I agree that checkpoint may not the problem, I guess I need to tackle those damn queries.
currently the data dir(pgsql/data/base) used 111GB disk space, some tables has tens of millions records. could that cause the query heavy disk IO?  when should I split the data to other machines(aka sharding)? 


and you are right the machine has 16GB memory and commodity 500GB disk. 
kernel: Linux adams 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011 x86_64 GNU/Linux

by "new kernel" which version do you mean? 

and about those IO intensive queries,  I can only tell the time used from slow query log, is there anything like "explain analyze" that shows specific information about IO usage?




On Wed, Jul 11, 2012 at 7:59 PM, Ants Aasma <ants@cybertec.at> wrote:
On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springrider@gmail.com> wrote:
> I have logged one day data and found the checkpoint is rather
> frequently(detail: https://gist.github.com/3088338). Not sure if it is
> normal, but the average time of checkpoint is about 100sec~200sec, it seems
> related with my settings:
>
> 574 checkpoint_segments = 64
> 575 wal_keep_segments = 5000
>
> I set checkpoint_segments as a very large value which is because otherwise
> the slave server always can not follow the master, should I lower that
> value?
>
> or the slow query is about something else?  thanks!

Some things to notice from the checkpoints log:
* All chcekpoints are triggered by checkpoint_timeout, using up only a
couple log files
* Checkpoints write out around 40MB of buffers
* The write out period is spread out nicely like it's supposed to but
the sync phase is occasionally taking a very long time (more than 2
minutes)

This looks like something (not necessarily the checkpoint sync itself)
is overloading the IO system. You might want to monitor the IO load
with iostat and correlate it with the checkpoints and slow queries to
find the culprit. It's also possible that something else is causing
the issues.

If the cause is checkpoints, just making them less frequent might make
the problem worse. I'm assuming you have 16GB+ of RAM because you have
4GB of shared_buffers. Just making checkpoint_timeout longer will
accumulate a larger number of dirty buffers that will clog up the IO
queues even worse. If you are on Linux, lowering
dirty_expire_centisecs or dirty_background_bytes might help to spread
the load out but will make overall throughput worse.

On the otherhand, if the I/O overload is from queries (more likely
because some checkpoints sync quickly) there are no easy tuning
answers. Making queries less IO intensive is probably the best you can
do. From the tuning side, newer Linux kernels handle I/O fairness a
lot better, and you could also try tweaking the I/O scheduler to
achieve better throughput to avoid congestion or at least provide
better latency for trivial queries. And of course its always possible
to throw more hardware at the problem and upgrade the I/O subsystem.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

pgsql-performance by date:

Previous
From: "ktm@rice.edu"
Date:
Subject: Re: DELETE vs TRUNCATE explanation
Next
From: "Midge Brown"
Date:
Subject: Re: moving tables