Re: sytem log audit/reporting and psql - Mailing list pgsql-performance

From Andreas Haumer
Subject Re: sytem log audit/reporting and psql
Msg-id 4637734A.6070802@xss.co.at
In response to sytem log audit/reporting and psql  (Fei Liu <fei.liu@aepnetworks.com>)
List pgsql-performance

Hi!

Fei Liu wrote:
[...]
>
> Now here are my questions:

That is a lot of questions, and some of them are not related
to pgsql-performance, or even to PostgreSQL.

I'll try to answer some of them, because I'm currently experimenting
with partitioned tables, too.


> 2) Is partitioning a good approach to speed up log query/view? The user
> comment in partitioning in pgsql manual seems to indicate partitioning
> may be slower than non-partitioned table under certain circumstances.

You can look at table partitioning under several points of view.
My experience with table partitioning under the aspect of "performance" is:

*) It is a benefit for large tables only, and the definition of
   "large" depends on your data.
   In tests for an application we are developing, table partitioning
   did not pay off for tables with fewer than about 10 million rows
   (the threshold may be even higher for other workloads).

*) The performance benefit depends on your queries. Some queries get
   a big improvement, but some queries might even run significantly
   slower.

*) Depending on the way you setup your system, inserts can be much
   slower with partitioned tables (e.g. if you are using triggers
   to automatically create your partitions on demand)

> 3) How to avoid repetitive log entry scanning since my cron job script
> is run daily but logrotate runs weekly? This means everytime my script
> will be parsing duplicate entries.

This has nothing to do with PostgreSQL, but I wrote something similar
years ago. Here's what I did and what you could do:

Remember the last line of your logfile in some external file (or even in the
database). On the next run, read the logfile again line by line and skip all
lines until you have found the line you saved on the last run.
* If you find the line that way, just start parsing the logfile beginning
  at the next line.
* If you cannot find your line and you reach EOF, start parsing again at
  the beginning of the logfile.
* If this is your first run and you don't have a line stored yet, start
  parsing at the beginning of the logfile.

When you are finished, remember the last line of the logfile again in the
same place.
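
The resume logic above can be sketched in Python. This is a minimal
sketch: the function names and the choice to key on the literal line
text are my assumptions, not anything from the original script; for
rotated logs, remembering a byte offset plus an inode check would be
more robust than matching line content.

```python
def parse_new_entries(log_path, state_path):
    """Return only the lines appended since the previous run.

    state_path stores the last line processed on the previous run.
    If that line is not found (e.g. the log was rotated away),
    the whole file is parsed from the beginning.
    """
    try:
        with open(state_path) as f:
            last_seen = f.read()
    except FileNotFoundError:
        last_seen = None              # first run: parse everything

    pending = []                      # lines after the remembered one
    found = last_seen is None
    last_line = None
    with open(log_path) as log:
        for line in log:
            last_line = line
            if found:
                pending.append(line)
            elif line == last_seen:
                found = True          # resume right after this line

    if not found:                     # saved line vanished: reparse all
        with open(log_path) as log:
            pending = log.readlines()
            last_line = pending[-1] if pending else None

    if last_line is not None:         # remember where we stopped
        with open(state_path, "w") as f:
            f.write(last_line)
    return pending
```

Note that matching on line content resumes at the *first* occurrence of
the saved line, so duplicate log lines can make it resume too early;
that is one reason a byte offset is the safer key in practice.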


> 6) What are the best approaches to analyze postgresql query performance
> and how to improve postgresql query performance?

Here are some general recommendations for performance testing from
my experience:

*) Test with real data. For table partitioning this means you have
   to create really large datasets for your tests to be meaningful.
   If possible, write a small program to generate your test data,
   or use some other means to create your test database.
   You also need time: creating a test database and importing 100
   million rows of test data can take several hours or even days!
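
   A dataset of that size is easiest to produce as a flat file and bulk
   load with COPY. Here is a minimal sketch of such a generator; the
   column layout (timestamp, host, message) and all names in it are
   made up for illustration, not taken from the original poster's schema:

```python
import csv
import random
from datetime import datetime, timedelta

def generate_log_rows(n, start=datetime(2007, 1, 1)):
    """Yield n synthetic syslog-like rows: (timestamp, host, message)."""
    hosts = ["web1", "web2", "db1"]
    for i in range(n):
        ts = start + timedelta(seconds=i)
        yield (ts.isoformat(sep=" "),
               random.choice(hosts),
               "sshd[%d]: session opened" % random.randint(100, 99999))

def write_copy_file(path, n):
    """Write n rows as CSV, suitable for loading with COPY ... CSV."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(generate_log_rows(n))
```

   For 100 million rows you would call this in chunks and load each
   chunk separately, so a failed import does not cost you the whole run.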

*) Test with the real queries from your application!
   Testing with just a few simple standard queries will almost
   certainly not give you numbers representative of the performance
   you will see in your application later on.
   Look at the thread "Query performance problems with partitioned tables"
   I started on pgsql-performance just yesterday to see what I mean!

*) Use "EXPLAIN ANALYZE" and look at the "cost" and "actual time" numbers
   this gives you. It also will show you the query plan used by PostgreSQL
   when executing your query. You sometimes might be surprised what is going
   on behind the scenes...

*) If you are just using some stopwatch to time your queries be aware
   of other factors which might significantly influence your test:
   Caching, other jobs running on the machine in parallel, cosmic rays, ...

*) Before running your tests you should always try to get to some well
   defined starting point (this might even mean rebooting your server
   before running each test) and you should always repeat each test
   several times and then calculate a mean value (and standard deviation
   to see how "good" your results are...)
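
   Computing the mean and sample standard deviation of repeated timing
   runs is only a few lines, e.g. with Python's statistics module (the
   function name here is my own, purely illustrative):

```python
import statistics

def summarize_runs(times_ms):
    """Return (mean, sample standard deviation) of repeated timings.

    A deviation that is large relative to the mean suggests caching
    or concurrent load is distorting the measurements.
    """
    mean = statistics.mean(times_ms)
    stdev = statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0
    return mean, stdev
```

   For example, summarize_runs([120.0, 118.0, 122.0]) gives a mean of
   120.0 ms with a standard deviation of 2.0 ms.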

*) Document your test setup and procedure as well as your results
   (otherwise two days later you won't remember which test obtained
   what result)

HTH

- andreas

--
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
