Thread: Queries seldomly take >4s while normally take <1ms?

Queries seldomly take >4s while normally take <1ms?

From
Christian Hammers
Date:
Hello

I have a setup with one master and two slaves which are used by a closed
source application. The database is asked the same query, a stored procedure,
with different parameters about 4 million times per second at a peak rate of
150 times per second using 10 parallel connections. The slaves are decent
Dell servers with 64GB RAM with dual hexacore CPUs and RAID10.

Usually this stored procedure takes <1ms as it basically just does two
selects against a GIST index for a prefix_range type. Seldomly though,
about 1-3 times per day, one of these queries takes up to 4000ms!
All those queries also runs in <1ms when executed manually some times later.
Queries with similar parameters (who I supposed to use the same area of the
index) also continues to run fast during that time. Queries with different
paramers which are running parallel on different threads take <1ms, too,
so it's not a general "load problem".

Cronjobs and other applications seem quiet during that time, there is
no peak in any of our monitoring graphs. Automatic vacuum/analyze log
entries on the master are not near the timestamps in question.

So my problem seems not the query itself nor the way I indexed my data
but what could it be? Some strange effects with streaming replication
or cache invalidation?

Apologies for not giving you reproducible problem but maybe you
still have some ideas as I'm just curious as I've never seem such an
effect during my MySQL years :-) The queries contain obvious customer
data so I'm reluctant to give examples but again I doubt that
an explain plan will help if only 1 out of 4E6 queries takes too long.

bye,

-christian-


Re: Queries seldomly take >4s while normally take <1ms?

From
Greg Williamson
Date:
Christian --

<original text snip because this POS editor won't let me properly edit>

postgres version ?

type of replication ?

changes from postgres config defaults ?

Do they happen more at peak usage, semi regularly or sporadically ?

Possibly some sporadic postgres process such as checkpoints of autovac processes kicking off. Do your logs show
anything? 

HTH,

Greg W.



Re: Queries seldomly take >4s while normally take <1ms?

From
Christian Hammers
Date:
Hello

On Tue, 9 Apr 2013 03:53:13 -0700 (PDT)
Greg Williamson <gwilliamson39@yahoo.com> wrote:

> Christian --
>
> <original text snip because this POS editor won't let me properly edit>
>
> postgres version ?

9.2.3

> type of replication ?

As written, one master does streaming replication to two slaves.

> changes from postgres config defaults ?

max_connections = 1000                  # (change requires restart)
shared_buffers = 20GB                   # min 128kB
wal_level = hot_standby                 # minimal, archive, or hot_standby
archive_mode = on               # allows archiving to be done
archive_command = 'test ! -f /srv/postgresql-data/archivedir/%f && cp %p /srv/postgresql-data/archivedir/%f'
#command to use to archive a logfile segment 
max_wal_senders = 3             # max number of walsender processes
hot_standby = on                        # "on" allows queries during recovery
max_standby_archive_delay = 1h          # max delay before canceling queries
max_standby_streaming_delay = 1h        # max delay before canceling queries
hot_standby_feedback = on               # send info from standby to prevent
effective_cache_size = 1024MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = '/var/log/postgresql/'  # directory where log files are written,
log_filename = 'postgresql-9.2-data.log'                # log file name pattern,
log_file_mode = 0640                    # creation mode for log files,
log_rotation_age = 0                    # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_min_messages = notice
log_min_duration_statement = 500        # -1 is disabled, 0 logs all statements
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%t [%p] %u@%d '      # special values:
log_statement = 'all'                   # none, ddl, mod, all
log_timezone = 'Europe/Berlin'
track_activities = on
track_counts = on
track_io_timing = on
datestyle = 'iso, dmy'
timezone = 'Europe/Berlin'
lc_messages = 'en_GB.UTF-8'                     # locale for system error message
default_text_search_config = 'pg_catalog.english'
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = all

effective_cache_size is way too low, I noticed, can that be the cause?

> Do they happen more at peak usage, semi regularly or sporadically ?

Sporadically. Not at a specific hour or minute-of-hour. Not only at peak
times either. Sometimes at 05:xx in the morning, sometings at 10:xxh.

> Possibly some sporadic postgres process such as checkpoints of
> autovac processes kicking off. Do your logs show anything ?

As far as I understood, Auto-Vacuum and Auto-Cleaning only happen on
the master and their results get to the slaves via streaming
replication. I therefore checked the master logs and, as written,
all occurences of Auto-* are at least half an hour before or after.

> HTH,
>
> Greg W.

bye,

-christian-


Re: Queries seldomly take >4s while normally take <1ms?

From
Kevin Grittner
Date:
Christian Hammers <ch@lathspell.de> wrote:

> 9.2.3

You really need to think about 9.2.4 Real Soon Now; there's a
security fix that you probably should not wait on.

> max_connections = 1000                  # (change requires restart)
> shared_buffers = 20GB                  # min 128kB

Those are both potential causes.  For max_connections, see this:

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

Maybe you happened to have enough users hit the enter key at the
same moment to cause a process holding a lock to be starved of
cycles or something similar.

One problem with a large shared_buffers setting is that PostgreSQL
can accumulate a very large number of dirty pages and flush them to
the OS all at once.  This can overwhelm the storage system and
cause exactly the kind of symptoms you're seeing.

> effective_cache_size is way too low, I noticed, can that be the cause?

Probably not.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Queries seldomly take >4s while normally take <1ms?

From
Christian Hammers
Date:
On Tue, 9 Apr 2013 07:25:16 -0700 (PDT)
Kevin Grittner <kgrittn@ymail.com> wrote:

> Christian Hammers <ch@lathspell.de> wrote:
>
> > 9.2.3
>
> You really need to think about 9.2.4 Real Soon Now; there's a
> security fix that you probably should not wait on.

Is scheduled (no access from outside to that network segment at least)

> > max_connections = 1000                  # (change requires restart)
> > shared_buffers = 20GB                  # min 128kB
>
> Those are both potential causes.  For max_connections, see this:
>
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
> Maybe you happened to have enough users hit the enter key at the
> same moment to cause a process holding a lock to be starved of
> cycles or something similar.

The application connects permanently with a fixed number of only 20
connections.

> One problem with a large shared_buffers setting is that PostgreSQL
> can accumulate a very large number of dirty pages and flush them to
> the OS all at once.  This can overwhelm the storage system and
> cause exactly the kind of symptoms you're seeing.

I have pretty big changes during early night hours on the master and
then almost only read-only accesses during the day. As checkpoint_timeout
is at 5min, there should not be any significant amount of dirty pages
during daytime, right?
Where would I verity this, with pg_stats_bgwriter.buffers_checkpoint and the
Linux I/O graphs?

bye,

-christian-


Re: Queries seldomly take >4s while normally take <1ms?

From
Scott Marlowe
Date:
One of the most common causes I've seen for this is linux's vm.*dirty* settings to get in the way. Like so many linux kernel "optimizations" this one looks good on paper but gives at best middling improvements with occasional io storms that block everything else.  On big mem machines doing a lot of writing IO I just set these to 0. Also tend to turn off swap as well as it's known to get in the way as well.

settings for /etc/sysctl.conf
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0



On Tue, Apr 9, 2013 at 3:41 AM, Christian Hammers <ch@lathspell.de> wrote:
Hello

I have a setup with one master and two slaves which are used by a closed
source application. The database is asked the same query, a stored procedure,
with different parameters about 4 million times per second at a peak rate of
150 times per second using 10 parallel connections. The slaves are decent
Dell servers with 64GB RAM with dual hexacore CPUs and RAID10.

Usually this stored procedure takes <1ms as it basically just does two
selects against a GIST index for a prefix_range type. Seldomly though,
about 1-3 times per day, one of these queries takes up to 4000ms!
All those queries also runs in <1ms when executed manually some times later.
Queries with similar parameters (who I supposed to use the same area of the
index) also continues to run fast during that time. Queries with different
paramers which are running parallel on different threads take <1ms, too,
so it's not a general "load problem".

Cronjobs and other applications seem quiet during that time, there is
no peak in any of our monitoring graphs. Automatic vacuum/analyze log
entries on the master are not near the timestamps in question.

So my problem seems not the query itself nor the way I indexed my data
but what could it be? Some strange effects with streaming replication
or cache invalidation?

Apologies for not giving you reproducible problem but maybe you
still have some ideas as I'm just curious as I've never seem such an
effect during my MySQL years :-) The queries contain obvious customer
data so I'm reluctant to give examples but again I doubt that
an explain plan will help if only 1 out of 4E6 queries takes too long.

bye,

-christian-


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
To understand recursion, one must first understand recursion.

Re: Queries seldomly take >4s while normally take <1ms?

From
Steven Schlansker
Date:
On Apr 9, 2013, at 11:25 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> One of the most common causes I've seen for this is linux's vm.*dirty* settings to get in the way. Like so many linux
kernel"optimizations" this one looks good on paper but gives at best middling improvements with occasional io storms
thatblock everything else.  On big mem machines doing a lot of writing IO I just set these to 0. Also tend to turn off
swapas well as it's known to get in the way as well. 
>
> settings for /etc/sysctl.conf
> vm.dirty_background_ratio = 0
> vm.dirty_ratio = 0
>

I'll +1 on the "you have to tune your Linux install" advice.

I found the "PostgreSQL 9.0 High Performance" book to be worth its weight in gold.  A few days spent with the book and
researchon mailing lists improved our PostgreSQL performance multiple times over, and responsiveness under load by
ordersof magnitude. 

http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/184951030X

>
>
> On Tue, Apr 9, 2013 at 3:41 AM, Christian Hammers <ch@lathspell.de> wrote:
> Hello
>
> I have a setup with one master and two slaves which are used by a closed
> source application. The database is asked the same query, a stored procedure,
> with different parameters about 4 million times per second at a peak rate of
> 150 times per second using 10 parallel connections. The slaves are decent
> Dell servers with 64GB RAM with dual hexacore CPUs and RAID10.
>
> Usually this stored procedure takes <1ms as it basically just does two
> selects against a GIST index for a prefix_range type. Seldomly though,
> about 1-3 times per day, one of these queries takes up to 4000ms!
> All those queries also runs in <1ms when executed manually some times later.
> Queries with similar parameters (who I supposed to use the same area of the
> index) also continues to run fast during that time. Queries with different
> paramers which are running parallel on different threads take <1ms, too,
> so it's not a general "load problem".
>
> Cronjobs and other applications seem quiet during that time, there is
> no peak in any of our monitoring graphs. Automatic vacuum/analyze log
> entries on the master are not near the timestamps in question.
>
> So my problem seems not the query itself nor the way I indexed my data
> but what could it be? Some strange effects with streaming replication
> or cache invalidation?
>
> Apologies for not giving you reproducible problem but maybe you
> still have some ideas as I'm just curious as I've never seem such an
> effect during my MySQL years :-) The queries contain obvious customer
> data so I'm reluctant to give examples but again I doubt that
> an explain plan will help if only 1 out of 4E6 queries takes too long.
>
> bye,
>
> -christian-
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
> --
> To understand recursion, one must first understand recursion.



Re: Queries seldomly take >4s while normally take <1ms?

From
Scott Marlowe
Date:
On Tue, Apr 9, 2013 at 12:37 PM, Steven Schlansker <steven@likeness.com> wrote:

On Apr 9, 2013, at 11:25 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> One of the most common causes I've seen for this is linux's vm.*dirty* settings to get in the way. Like so many linux kernel "optimizations" this one looks good on paper but gives at best middling improvements with occasional io storms that block everything else.  On big mem machines doing a lot of writing IO I just set these to 0. Also tend to turn off swap as well as it's known to get in the way as well.
>
> settings for /etc/sysctl.conf
> vm.dirty_background_ratio = 0
> vm.dirty_ratio = 0
>

I'll +1 on the "you have to tune your Linux install" advice.

I found the "PostgreSQL 9.0 High Performance" book to be worth its weight in gold.  A few days spent with the book and research on mailing lists improved our PostgreSQL performance multiple times over, and responsiveness under load by orders of magnitude.

http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/184951030X

Yep.  That's probably the single most useful performance tuning book anyone working with dbs can own. Even if you don't run postgresql, the hardware tuning and testing section is fantastic.