Thread: How to speed up the first-time-searching in pgsql?

How to speed up the first-time-searching in pgsql?

From
zxo102 ouyang
Date:
Hi all,
    I have a table which has more than 10millions records in pgsql which is running on window 2003. During night, nobody search the database. In morning, when people start to the application, it will take more than 30 seconds to get database back. After several times of same searching, the results can be returned in 8 seconds.
    Is this related to some kinds of process priority setting in window 2003? If so, how do I set the pgsql processes in  highest priority?
 
Thanks in advance.
 
ouyang

Re: How to speed up the first-time-searching in pgsql?

From
John R Pierce
Date:
zxo102 ouyang wrote:
> Hi all,
>     I have a table which has more than 10millions records in pgsql
> which is running on window 2003. During night, nobody search the
> database. In morning, when people start to the application, it will
> take more than 30 seconds to get database back. After several times of
> same searching, the results can be returned in 8 seconds.
>     Is this related to some kinds of process priority setting in
> window 2003? If so, how do I set the pgsql processes in  highest priority?

are there other applications running on this same server?   my first
guess is, other programs are doing disk IO when the postgres database is
idle, and pushing the postgres stuff out of the in-memory cache.
Windows is pretty aggressive about reclaiming idle memory, and there's
not much you can do about it.

maybe schedule something to run in the AM shortly before the regular
database users show up which will do queries that force a full table
scan on each table, this will cause all the imporant data to swap back in.

Re: How to speed up the first-time-searching in pgsql?

From
Scott Marlowe
Date:
On Tue, May 26, 2009 at 5:57 PM, zxo102 ouyang <zxo102@gmail.com> wrote:
> Hi all,
>     I have a table which has more than 10millions records in pgsql which is
> running on window 2003. During night, nobody search the database.
> In morning, when people start to the application, it will take more than 30
> seconds to get database back. After several times of same searching, the
> results can be returned in 8 seconds.
>     Is this related to some kinds of process priority setting in window
> 2003? If so, how do I set the pgsql processes in  highest priority?

There are two types of "look how we've optimized our OS and now your
database server runs like crap" common scenarios.  The most common is
that the one where the OS has simply stopped caching your database
files because other things are happening.  Not much you can do about
that one.  The other is the VM in your OS slowly swapping out all of
pgsql's shared_buffers because they appear idle, making even more room
to cache files on the machine for processes that are not pgsql.  Which
is why the other poster has asked what other things this server does.

If the OS is busy swapping out idle chunks of memory for more cache,
you can stop it in linux by adjusting the vm.swappiness setting.  No
clue how to do that in windows, but googling on swappiness and windows
might help.

It's a good idea to put a database on its own server for these
reasons.  Also, in the morning, have a cron job crank up that does
"select * from mybigtable" for each big table to load it into cache.

This is possibly made worse if you've lowered your random_page_cost to
near 1, and have effective_cache_size cranked up.  Those settings are
likely right for your setup, but first thing in the morning they're
wrong.  Actual random page cost really is 10 or more, and the
effective cache size means nothing because the kernel cache is full of
stuff that's NOT pgsql files.  In which case the tendency towards
index access and not seq scan is really gonna cost you.  Hence the
need for the select * from bigtable queries to prime the pump.

Re: How to speed up the first-time-searching in pgsql?

From
Greg Smith
Date:
On Tue, 26 May 2009, Scott Marlowe wrote:

> Also, in the morning, have a cron job crank up that does "select * from
> mybigtable" for each big table to load it into cache.

Just to clarify:  on 8.3 and later versions, doing this doesn't do what
some people expect.  Sequential scans like that will continuously re-use a
256KB section of the PostgreSQL shared_buffers space, so this won't cause
all of that to get paged back in if the problem is related to it being
swapped out.  It will pass everything through the OS buffer cache though
and prime it usefully, which might be all that's actually needed.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: How to speed up the first-time-searching in pgsql?

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> On Tue, 26 May 2009, Scott Marlowe wrote:
>> Also, in the morning, have a cron job crank up that does "select * from
>> mybigtable" for each big table to load it into cache.

> Just to clarify:  on 8.3 and later versions, doing this doesn't do what
> some people expect.  Sequential scans like that will continuously re-use a
> 256KB section of the PostgreSQL shared_buffers space, so this won't cause
> all of that to get paged back in if the problem is related to it being
> swapped out.  It will pass everything through the OS buffer cache though
> and prime it usefully, which might be all that's actually needed.

Bearing in mind that this is a Windows server ... I seem to recall that
the conventional wisdom is still to keep shared_buffers relatively small
on Windows.  So priming the OS cache is exactly what it's about.
(Keeping that down should also help avoid the other scenario Scott was
worried about, where shared memory itself gets paged out.)

            regards, tom lane

Re: How to speed up the first-time-searching in pgsql?

From
Scott Marlowe
Date:
On Tue, May 26, 2009 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> On Tue, 26 May 2009, Scott Marlowe wrote:
>>> Also, in the morning, have a cron job crank up that does "select * from
>>> mybigtable" for each big table to load it into cache.
>
>> Just to clarify:  on 8.3 and later versions, doing this doesn't do what
>> some people expect.  Sequential scans like that will continuously re-use a
>> 256KB section of the PostgreSQL shared_buffers space, so this won't cause
>> all of that to get paged back in if the problem is related to it being
>> swapped out.  It will pass everything through the OS buffer cache though
>> and prime it usefully, which might be all that's actually needed.
>
> Bearing in mind that this is a Windows server ... I seem to recall that
> the conventional wisdom is still to keep shared_buffers relatively small
> on Windows.  So priming the OS cache is exactly what it's about.
> (Keeping that down should also help avoid the other scenario Scott was
> worried about, where shared memory itself gets paged out.)

Yeah, I thought it was pretty obvious I was talking OS cache up there.

Re: How to speed up the first-time-searching in pgsql?

From
zxo102 ouyang
Date:
Hi there,
Thanks for your suggestions. I do have an application running on the machine all the time. In fact, the application keeps writing real-time monitoring data into the database. Based on my understanding of your messages, I can't do anything to speed up the first-time-searching. Probably I can give a waiting process bar to the users and let them wait for the results.

Thanks for your help.

ouyang

2009/5/27 Scott Marlowe <scott.marlowe@gmail.com>
On Tue, May 26, 2009 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> On Tue, 26 May 2009, Scott Marlowe wrote:
>>> Also, in the morning, have a cron job crank up that does "select * from
>>> mybigtable" for each big table to load it into cache.
>
>> Just to clarify:  on 8.3 and later versions, doing this doesn't do what
>> some people expect.  Sequential scans like that will continuously re-use a
>> 256KB section of the PostgreSQL shared_buffers space, so this won't cause
>> all of that to get paged back in if the problem is related to it being
>> swapped out.  It will pass everything through the OS buffer cache though
>> and prime it usefully, which might be all that's actually needed.
>
> Bearing in mind that this is a Windows server ... I seem to recall that
> the conventional wisdom is still to keep shared_buffers relatively small
> on Windows.  So priming the OS cache is exactly what it's about.
> (Keeping that down should also help avoid the other scenario Scott was
> worried about, where shared memory itself gets paged out.)

Yeah, I thought it was pretty obvious I was talking OS cache up there.

Re: How to speed up the first-time-searching in pgsql?

From
Richard Huxton
Date:
zxo102 ouyang wrote:
> Hi there,
> Thanks for your suggestions. I do have an application running on the machine
> all the time. In fact, the application keeps writing real-time monitoring
> data into the database. Based on my understanding of your messages, I can't
> do anything to speed up the first-time-searching. Probably I can give a
> waiting process bar to the users and let them wait for the results.

No, I think you missed the detail on some of the answers. There are
limitations (as discussed in the answers), but the simple suggestion in
the first answer will probably help a lot.

Set up a scheduled task to run a big search of the database an hour
before people start work. This can be as simple as a .BAT file running
"SELECT * FROM big_table" triggered by Windows' Task Scheduler.

--
   Richard Huxton
   Archonet Ltd

Re: How to speed up the first-time-searching in pgsql?

From
"Tim Bruce - Postgres"
Date:
On Thu, May 28, 2009 01:23, Richard Huxton wrote:
> zxo102 ouyang wrote:
>> Hi there,
>> Thanks for your suggestions. I do have an application running on the
>> machine
>> all the time. In fact, the application keeps writing real-time
>> monitoring
>> data into the database. Based on my understanding of your messages, I
>> can't
>> do anything to speed up the first-time-searching. Probably I can give a
>> waiting process bar to the users and let them wait for the results.
>
> No, I think you missed the detail on some of the answers. There are
> limitations (as discussed in the answers), but the simple suggestion in
> the first answer will probably help a lot.
>
> Set up a scheduled task to run a big search of the database an hour
> before people start work. This can be as simple as a .BAT file running
> "SELECT * FROM big_table" triggered by Windows' Task Scheduler.
>
> --
>    Richard Huxton
>    Archonet Ltd
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

As a point of note - if you're running Windows XP / Windows NT or higher
(i.e., not Win 9x) you should use .CMD files instead of traditional .BAT
files.  While they accomplish the same thing, the .BAT files run in a
"shared" 16-bit environment as opposed to the 32-bit (or 64-bit) memory
space.  Any application that crashes in the 16-bit environment will crash
or cause instability in other applications running in the same 16-bit
environment (since it's shared).

Functionally, there is no difference (other than expanded functions) in
the .CMD scripts.  All .BAT commands work the same in .CMD - you just need
to rename your .BAT files to .CMD files.

Tim
--
Timothy J. Bruce