Thread: Query kills machine.

Query kills machine.

From
Stef
Date:
Hi all,

I've attached the full query as query.sql.

I'm using postgres 7.3.4 on Linux version 2.4.26-custom
( /proc/sys/vm/overcommit_memory  = 0  this time )

free :
             total       used       free     shared    buffers     cached
Mem:       1810212    1767384      42828          0       5604    1663908
-/+ buffers/cache:      97872    1712340
Swap:       505912     131304     374608

After I rebuilt the database, the query was fast (28255.12 msec).
After one night's worth of inserts into the tables the query selects from,
the query suddenly uses up all resources, the kernel starts swapping,
and I haven't seen the query actually finish once this happens. I did
VACUUM ANALYZE and REINDEX, but that doesn't help.

I've attached the EXPLAIN ANALYZE output from before this happens, and
the EXPLAIN plan from a run where the query doesn't finish.

The one noticeable difference was that before, it used merge joins, and
after, it used hash joins.

When the query was slow, I tried SET enable_hashjoin TO off for this
query, and the query finished relatively quickly again (316245.16 msec).
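
For reference, that override can be applied per session so it doesn't
affect other backends; a minimal sketch (the query itself is in the
attached query.sql):

```sql
-- Disable hash joins for the current session only; other backends
-- keep the default planner behaviour.
SET enable_hashjoin TO off;

-- ... run the problem query from query.sql here ...

-- Restore the default afterwards.
SET enable_hashjoin TO on;
```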

I attached the output of that explain analyze as well, as well as the postgres
settings.

Can anyone shed some light on what's happening here? I can't figure it out.

Kind Regards
Stefan

Attachment

Re: Query kills machine.

From
Christopher Kings-Lynne
Date:
For starters,


> ------------------------------------------------------------------------
>
> shared_buffers = 110592
> wal_buffers = 400
> sort_mem = 30720
> vacuum_mem = 10240
> checkpoint_segments = 30
> commit_delay = 5000
> commit_siblings = 100
> effective_cache_size = 201413

Try more like this:

shared_buffers = 30000
wal_buffers = <default>
sort_mem = 4096
vacuum_mem = 10240
checkpoint_segments = 30
commit_delay = <default>
commit_siblings = <default>
effective_cache_size = 100000
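
For context, shared_buffers is measured in 8 KB disk pages (the default
block size), so the two values differ a lot in absolute terms; a quick
sanity check of the arithmetic:

```sql
-- shared_buffers is counted in 8 KB pages:
SELECT 110592 * 8 / 1024 AS old_mb,  -- 864 MB, nearly half the RAM
       30000  * 8 / 1024 AS new_mb;  -- 234 MB
```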

Chris


Re: Query kills machine.

From
Stef
Date:
Christopher Kings-Lynne mentioned :
=> sort_mem = 4096

Reducing sort_mem to 4096 seems to make it run in a reasonable time
again. Any idea why? The database does a whole lot of huge sorts
every day, so I thought upping this parameter would help.

A couple of queries do seem to run slower now that I reduced
the sort_mem.
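
One option, if the slower queries are the known big nightly sorts:
since sort_mem can be changed per session, the batch job could raise it
just for itself while the global default stays low. A sketch, assuming
the nightly job uses its own connection:

```sql
-- Keep the global default low (e.g. 4096 KB in postgresql.conf), and
-- raise it only in the session that runs the nightly batch sorts.
SET sort_mem = 30720;   -- value in kilobytes, this session only

-- ... run the large sorting queries here ...

RESET sort_mem;         -- back to the postgresql.conf default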

The shared_buffers setting still makes a significant difference when I increase it.

Kind Regards
Stefan


Re: Query kills machine.

From
"Magnus Naeslund(pg)"
Date:
Stef wrote:

 > Christopher Kings-Lynne mentioned :
 > => sort_mem = 4096
 >
 > Reducing sort_mem to 4096 seems to make it run in a reasonable time
 > again. Any idea why? The database does a whole lot of huge sorts
 > every day, so I thought upping this parameter would help.
 >
 > A couple of queries do seem to run slower now that I reduced
 > the sort_mem.
 > The shared buffers still makes a significant difference when I
 > increase it.
 >

Well, you have to take into account that sort_mem is not the total
memory allocated for sorting: it is allocated per connection, and in
complex expressions several times over.

So if you sort a lot, it can push your operating system off the cliff,
and it might start reaping things that shouldn't be reaped and start swapping.

If that happens _everything_ on that box will get slow...

Shared buffers on the other hand is only allocated once.

Regards,
Magnus
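
A rough back-of-the-envelope illustrates the point (all numbers here
are hypothetical, not from the actual workload): with sort_mem = 30720
and a handful of concurrent backends each running a plan with several
sort steps, the worst case already exceeds this machine's 1.8 GB of RAM.

```sql
-- Worst-case sort memory, with hypothetical numbers:
-- 30720 KB per sort step * 3 sort steps per query * 20 backends.
-- sort_mem is a per-sort-step limit, not a server-wide one.
SELECT 30720 * 3 * 20 / 1024 AS worst_case_mb;  -- 1800 MB
```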


Re: Query kills machine.

From
Tom Lane
Date:
Stef <svb@ucs.co.za> writes:
> Reducing sort_mem to 4096 seems to make it run in a reasonable time
> again. Any idea why? The database does a whole lot of huge sorts
> every day, so I thought upping this parameter would help.

Not if you haven't got the RAM to support it :-(

Another thing you might look at is ANALYZEing the tables again after
you've loaded all the new data.  The row-count estimates seem way off
in these plans.  You might need to increase the statistics target,
too, to get better plans.

            regards, tom lane
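
Tom's two suggestions might look like this in practice (the table and
column names are placeholders, not from the actual schema):

```sql
-- Re-gather statistics after the nightly load; the row-count estimates
-- drive the planner's merge-vs-hash join choice.
ANALYZE;                       -- or: VACUUM ANALYZE;

-- Raise the statistics target for a column with skewed data
-- (hypothetical names), then re-ANALYZE that table.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;
```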

Re: Query kills machine.

From
Stef
Date:
Tom Lane mentioned :
=> Not if you haven't got the RAM to support it :-(
=>
=> Another thing you might look at is ANALYZEing the tables again after
=> you've loaded all the new data.  The row-count estimates seem way off
=> in these plans.  You might need to increase the statistics target,
=> too, to get better plans.

Thanks Tom, Christopher and Magnus!

I tested this and found the right sort_mem setting for my situation.
I'm now testing a new default_statistics_target setting; that's
something I had never considered.

Kind Regards
Stefan