Re: Performance Bottleneck - Mailing list pgsql-performance

From Martin Foster
Subject Re: Performance Bottleneck
Date
Msg-id 4111109D.7020006@ethereal-realms.org
Whole thread Raw
In response to Performance Bottleneck  (Martin Foster <martin@ethereal-realms.org>)
List pgsql-performance
Gaetano Mendola wrote:

> Martin Foster wrote:
>
>> Gaetano Mendola wrote:
>>
>>> Martin Foster wrote:
>>>
>>>> I run a Perl/CGI driven website that makes extensive use of
>>>> PostgreSQL (7.4.3) for everything from user information to
>>>> formatting and display of specific sections of the site.   The
>>>> server itself, is a dual processor AMD Opteron 1.4Ghz w/ 2GB Ram and
>>>> 2 x 120GB hard drives mirrored for redundancy running under FreeBSD
>>>> 5.2.1 (AMD64).
>>>>
>>>> Recently loads on the site have increased during peak hours to the
>>>> point of showing considerable loss in performance.    This can be
>>>> observed when connections move from the 120 concurrent connections
>>>> to PostgreSQL to roughly 175 or more.     Essentially, the machine
>>>> seems to struggle to keep up with continual requests and slows down
>>>> respectively as resources are tied down.
>>>>
>>>> Code changes have been made to the scripts to essentially back off
>>>> in high load working environments which have worked to an extent.
>>>> However, as loads continue to increase the database itself is not
>>>> taking well to the increased traffic taking place.
>>>>
>>>> Having taken a look at 'Tuning PostgreSQL for Performance'
>>>> (http://www.varlena.com/GeneralBits/Tidbits/perf.html) using it as
>>>> best I could in order to set my settings.    However, even with
>>>> statistics disabled and ever setting tweaked things still consider
>>>> to deteriorate.
>>>>
>>>> Is there anything anyone can recommend in order to give the system a
>>>> necessary speed boost?   It would seem to me that a modest dataset
>>>> of roughly a Gig combined with that type of hardware should be able
>>>> to handle substantially more load then what it is.  Can anyone
>>>> provide me with clues as where to pursue?    Would disabling 'fsync'
>>>> provide more performance if I choose that information may be lost in
>>>> case of a crash?
>>>>
>>>> If anyone needs access to logs, settings et cetera.   Please ask, I
>>>> simply wish to test the waters first on what is needed. Thanks!
>>>
>>>
>>>
>>>
>>> Tell us about your tipical queries, show us your configuration file.
>>> The access are only in read only mode or do you have concurrent writers
>>> and readers ? During peak hours your processors are tied to 100% ?
>>> What say the vmstat and the iostat ?
>>>
>>> May be you are not using indexes some where, or may be yes but the
>>> planner is not using it... In two words we needs other informations
>>> in order to help you.
>>>
>>>
>>>
>>> Regards
>>> Gaetano Mendola
>>>
>>>
>>
>> I included all the files in attachments, which will hopefully cut down
>> on any replied to Emails.    As for things like connection pooling,
>> the web server makes use of Apache::DBI to pool the connections for
>> the Perl scripts being driven on that server.    For the sake of being
>> thorough, a quick 'apachectl status' was thrown in when the database
>> was under a good load.
>
>
> Let start from your postgres configuration:
>
> shared_buffers = 8192    <==== This is really too small for your
> configuration
> sort_mem = 2048
>
> wal_buffers = 128    <==== This is really too small for your configuration
>
> effective_cache_size = 16000
>
> change this values in:
>
> shared_buffers = 50000
> sort_mem = 16084
>
> wal_buffers = 1500
>
> effective_cache_size = 32000
>
>
> to bump up the shm usage you have to configure your OS in order to be
> allowed to use that ammount of SHM.
>
> This are the numbers that I feel good for your HW, the second step now is
> analyze your queries
>
>> The queries themselves are simple, normally drawing information from
>> one table with few conditions or in the most complex cases using joins
>> on two table or sub queries.   These behave very well and always have,
>> the problem is that these queries take place in rather large amounts
>> due to the dumb nature of the scripts themselves.
>
>
> Show us the explain analyze on that queries, how many rows the tables are
> containing, the table schema could be also usefull.
>
>
>
> regards
> Gaetano Mendola
>

I will look into moving up those values and seeing how they interact
with the system once I get back from work.    Since it was requested, I
have a visual representation of an older schema, one that was used under
MySQL.  Note that all of the timestamps are now properly set to
LOCALTIME on PostgreSQL.

http://prdownloads.sourceforge.net/ethereal-realms/ethereal-3_0_0.png?download

The amount of rows for tables of note are as follows:
  Puppeteer 1606
  Puppet    33176
  Realm     83
  Post      36156
  Audit     61961

The post table is continually cleared of old information since the
nature of the information is time very critical and archiving would only
hinder performance.    As a result, this will vary wildly based on time
of day since users (Puppeteers) tend to post more during peak hours.

NOTE:   The scripts make use of different schema's with the same
         information in order to virtualize the script in order
         to support more then one site on the same hardware.

On a side note, this would be a normal post-authentication session once
in realm for getting new posts:
  * Script is executed and schema is determined through stored procedure;
  * Formatting information is fetched from Tag and RealmDesign as needed;
  * Script will retrieve stored parameters in the Param table;
  * Script will decode, analyze and authenticate against Puppeteer;
  * Script will scan the Puppet and Post tables to generate posts;
      * Sub-query to determine ignored puppeteers/users;
      * Sub-query to determine ignored puppets/handles; and
  * Loop above if necessary until expiry of script delaying
    the execution of the script from 5 to 25 seconds.

This should provide an idea on that portion.   of course the flow
changes when one posts, but is handled by a different script instance as
is authentication et cetera.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



pgsql-performance by date:

Previous
From: Martin Foster
Date:
Subject: Re: Performance Bottleneck
Next
From: Pierre-Frédéric Caillaud
Date:
Subject: Re: Performance Bottleneck