Thread: EXPLAIN SELECT .. does not return

EXPLAIN SELECT .. does not return

From
David Link
Date:
Hi, This has become a major problem for us.  Thank you in advance for
your help.

OS: SUSE Linux 2.6.5-7.191-bigsmp
PostgreSQL: 7.4.8
Application: ModPerl Web application using DBI.pm
Database size: 100 Gb, 1025 Tables.

Problem: EXPLAIN SELECT ... does not return.

Description:

The Application uses an EXPLAIN cost to determine whether a client's dynamic
request for data is too demanding for the server so it can gracefully deny
them.  (Currently, anything over cost=0.00..500000.00).

The system gets about 3000 page requests a day.

Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return.  As a result they seem to stay churning in the
system.  Once that happens other queries build up and the performance of the
whole database server grinds to a halt.  Postgresql never dies, but
eventually, user requests start timing out.

This happens on average two or three times a week.  I kill an offending
process and all's well again.  I have not been able to identify with
certainty
an offending SQL statement.

Config params, that have changed from default:

tcpip_socket = true
max_connections = 200
shared_buffers = 2000
sort_mem = 1048576
vacuum_mem = 65536
max_fsm_pages = 100000
max_fsm_relations = 1000
max_files_per_process = 1000
fsync = false
wal_sync_method = fsync
wal_buffers = 800
checkpoint_segments = 30
commit_delay = 100
commit_siblings = 50
effective_cache_size = 1000
random_page_cost = 4
geqo = true
geqo_threshold = 14
default_statistics_target = 100
from_collapse_limit = 13
join_collapse_limit = 13

Note: we load lumps of data ea. week.  Then primarily it is a readonly
database.


Re: EXPLAIN SELECT .. does not return

From
Jaime Casanova
Date:
On 12/5/05, David Link <dlink@soundscan.com> wrote:
> Hi, This has become a major problem for us.  Thank you in advance for
> your help.
>
> OS: SUSE Linux 2.6.5-7.191-bigsmp
> PostgreSQL: 7.4.8
> Application: ModPerl Web application using DBI.pm
> Database size: 100 Gb, 1025 Tables.
>
> Problem: EXPLAIN SELECT ... does not return.
>
> Description:
>
> The Application uses an EXPLAIN cost to determine whether a client's dynamic
> request for data is too demanding for the server so it can gracefully deny
> them.  (Currently, anything over cost=0.00..500000.00).
>
> The system gets about 3000 page requests a day.
>
> Certain SQL Queries, I believe those with many table joins, when run as
> EXPLAIN plans, never return.  As a result they seem to stay churning in the
> system.  Once that happens other queries build up and the performance of the
> whole database server grinds to a halt.  Postgresql never dies, but
> eventually, user requests start timing out.
>
> This happens on average two or three times a week.  I kill an offending
> process and all's well again.  I have not been able to identify with
> certainty
> an offending SQL statement.
>
> Config params, that have changed from default:
>
> tcpip_socket = true
> max_connections = 200
> shared_buffers = 2000
> sort_mem = 1048576
> vacuum_mem = 65536
> max_fsm_pages = 100000
> max_fsm_relations = 1000
> max_files_per_process = 1000
> fsync = false
> wal_sync_method = fsync
> wal_buffers = 800
> checkpoint_segments = 30
> commit_delay = 100
> commit_siblings = 50
> effective_cache_size = 1000
> random_page_cost = 4
> geqo = true
> geqo_threshold = 14
> default_statistics_target = 100
> from_collapse_limit = 13
> join_collapse_limit = 13
>
> Note: we load lumps of data ea. week.  Then primarily it is a readonly
> database.
>
>

when you have thoses cases, you can take a look in pg_stats_activity
to find the offending query...

or simply logs all queries

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: EXPLAIN SELECT .. does not return

From
Tom Lane
Date:
David Link <dlink@soundscan.com> writes:
> Certain SQL Queries, I believe those with many table joins, when run as
> EXPLAIN plans, never return.

I'd guess that one or all of these settings are excessive:

> geqo_threshold = 14
> from_collapse_limit = 13
> join_collapse_limit = 13

Keep in mind that the planning cost is exponential in these limits,
eg geqo_threshold = 14 probably allows planning times about 14 times
greater than geqo_threshold = 13.

While I'm looking:

> shared_buffers = 2000

That seems extremely low for modern machines.

> sort_mem = 1048576

That, on the other hand, is almost certainly way too high for a system-wide
setting.  You're promising you have 1Gb available for *each* sort.

> max_fsm_pages = 100000

And this way too low for a 100Gb database, unless most of the tables
never see any UPDATEs or DELETEs.

> wal_buffers = 800

Seems a bit high, especially considering you have fsync disabled and
thus there is no benefit whatever to buffering WAL.

> commit_delay = 100
> commit_siblings = 50

Have you measured any benefit to having this turned on?

All in all it looks like your configuration settings were chosen by
throwing darts :-(

            regards, tom lane

Re: EXPLAIN SELECT .. does not return

From
David Link
Date:
Jaime Casanova wrote:

>when you have thoses cases, you can take a look in pg_stats_activity
>to find the offending query...
>
>or simply logs all queries
>
>
Thanks for the advice.  I also turned on stat_command_string


Re: EXPLAIN SELECT .. does not return

From
David Link
Date:
Tom Lane wrote:

>David Link <dlink@soundscan.com> writes:
>
>
>>Certain SQL Queries, I believe those with many table joins, when run as
>>EXPLAIN plans, never return.
>>
>>
>
>I'd guess that one or all of these settings are excessive:
>
>
>
>>geqo_threshold = 14
>>from_collapse_limit = 13
>>join_collapse_limit = 13
>>
>>
>
>Keep in mind that the planning cost is exponential in these limits,
>eg geqo_threshold = 14 probably allows planning times about 14 times
>greater than geqo_threshold = 13.
>
>While I'm looking:
>
>
>
>>shared_buffers = 2000
>>
>>
>
>That seems extremely low for modern machines.
>
>
>
>>sort_mem = 1048576
>>
>>
>
>That, on the other hand, is almost certainly way too high for a system-wide
>setting.  You're promising you have 1Gb available for *each* sort.
>
>
>
>>max_fsm_pages = 100000
>>
>>
>
>And this way too low for a 100Gb database, unless most of the tables
>never see any UPDATEs or DELETEs.
>
>
>
>>wal_buffers = 800
>>
>>
>
>Seems a bit high, especially considering you have fsync disabled and
>thus there is no benefit whatever to buffering WAL.
>
>
>
>>commit_delay = 100
>>commit_siblings = 50
>>
>>
>
>Have you measured any benefit to having this turned on?
>
>All in all it looks like your configuration settings were chosen by
>throwing darts :-(
>
>            regards, tom lane
>
>
>
Thanks for your reply, Tom.  Different folks have made different
suggestions.  Can you suggest more reasonable values for these?  But
more importantly, do you think the problem I am having is due to these
configuration short comings?

Thanks much.
David




Re: EXPLAIN SELECT .. does not return

From
"Joshua D. Drake"
Date:
>>
>>
> Thanks for your reply, Tom.  Different folks have made different
> suggestions.  Can you suggest more reasonable values for these?  But
> more importantly, do you think the problem I am having is due to these
> configuration short comings?
>
> Thanks much.
David take a look at:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Joshua D. Drake



> David
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


Re: EXPLAIN SELECT .. does not return

From
Tom Lane
Date:
David Link <dlink@soundscan.com> writes:
> more importantly, do you think the problem I am having is due to these
> configuration short comings?

Yeah, the planning thresholds ...

            regards, tom lane