Re: sytem log audit/reporting and psql - Mailing list pgsql-performance
From:           Andreas Haumer
Subject:        Re: sytem log audit/reporting and psql
Msg-id:         4637734A.6070802@xss.co.at
In response to: sytem log audit/reporting and psql (Fei Liu <fei.liu@aepnetworks.com>)
List:           pgsql-performance
Hi!

Fei Liu wrote:
[...]
>
> Now here are my questions:

That is a lot of questions, and some of them are not related to
pgsql-performance or even PostgreSQL. I'll try to answer some of them,
because I'm currently experimenting with partitioned tables, too.

> 2) Is partitioning a good approach to speed up log query/view? The user
> comment on partitioning in the pgsql manual seems to indicate that
> partitioning may be slower than a non-partitioned table under certain
> circumstances.

You can look at table partitioning from several points of view. My
experience with table partitioning under the aspect of "performance" is:

*) It is a benefit for large tables only, and the definition of "large"
depends on your data. I did some testing for an application we are
developing, and it showed that table partitioning does not seem to make
sense for tables with fewer than 10 million rows (perhaps even more).

*) The performance benefit depends on your queries. Some queries get a
big improvement, but some queries might even run significantly slower.

*) Depending on how you set up your system, inserts can be much slower
with partitioned tables (e.g. if you are using triggers to automatically
create your partitions on demand).

> 3) How to avoid repetitive log entry scanning since my cron job script
> is run daily but logrotate runs weekly? This means every time my script
> will be parsing duplicate entries.

This has nothing to do with Postgres, but I wrote something similar
years ago. Here's what I did and what you could do:

Remember the last line of your logfile in some external file (or even
the database). Then on the next run you can read the logfile again line
by line and skip all lines until you have found the line you saved on
the last run.

* If you find the line that way, just start parsing the logfile
beginning at the next line.
* If you cannot find your line and you reach EOF (the file has probably
been rotated in the meantime), start parsing again at the beginning of
the logfile.
* If this is your first run and you don't have a line stored yet, start
parsing at the beginning of the logfile.

When you are finished, you have to remember the last line of the logfile
again at some place.

> 6) What are the best approaches to analyze postgresql query performance
> and how to improve postgresql query performance?

Here are some general recommendations for performance testing from my
experience:

*) Test with real data. For table partitioning this means you have to
create really large datasets to make your tests useful. You should write
a small program to generate your test data if possible, or use some
other means to create your test database (a sketch follows below). You
also need time: creating a test database and importing 100 million rows
of test data will take several hours or even days!
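One such means, as a minimal sketch: generate_series() can produce
synthetic rows directly on the server. The table and column names here
(syslog, ts, host, message) are made up purely for illustration:

    -- Made-up log table, for illustration only
    CREATE TABLE syslog (
        ts      timestamptz NOT NULL,
        host    text        NOT NULL,
        message text
    );

    -- Fill it with 10 million rows of synthetic data,
    -- one row per second starting 2007-01-01
    INSERT INTO syslog (ts, host, message)
    SELECT TIMESTAMPTZ '2007-01-01' + (i || ' seconds')::interval,
           'host' || (i % 100),
           'test message ' || i
    FROM generate_series(1, 10000000) AS s(i);

    -- Update the planner statistics after the bulk load
    ANALYZE syslog;

Even this takes a while for really large row counts, but it saves you
the round trip through an external data generator and loader.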
*) Test with the real queries from your application! Testing with just a
few simple standard queries will almost certainly not be sufficient to
predict the performance you will see in your application later on. Look
at the thread "Query performance problems with partitioned tables" I
started on pgsql-performance just yesterday to see what I mean!

*) Use EXPLAIN ANALYZE and look at the "cost" and "actual time" numbers
it gives you. It also shows you the query plan PostgreSQL uses when
executing your query. You sometimes might be surprised what is going on
behind the scenes... (see the sketch below)
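For example, a minimal sketch of inheritance-based partitioning on the
made-up syslog table from above; whether the planner actually skips the
irrelevant partitions is exactly the kind of thing EXPLAIN ANALYZE will
show you:

    -- One child table per month, each with a CHECK constraint
    -- (builds on the made-up syslog table sketched above)
    CREATE TABLE syslog_2007_04 (
        CHECK (ts >= TIMESTAMPTZ '2007-04-01'
           AND ts <  TIMESTAMPTZ '2007-05-01')
    ) INHERITS (syslog);

    CREATE TABLE syslog_2007_05 (
        CHECK (ts >= TIMESTAMPTZ '2007-05-01'
           AND ts <  TIMESTAMPTZ '2007-06-01')
    ) INHERITS (syslog);

    -- Let the planner use the CHECK constraints to prune partitions
    SET constraint_exclusion = on;

    EXPLAIN ANALYZE
    SELECT count(*) FROM syslog
    WHERE ts >= TIMESTAMPTZ '2007-05-01'
      AND ts <  TIMESTAMPTZ '2007-05-08';

With constraint exclusion working, the plan should touch only the parent
table and syslog_2007_05; if all partitions show up as scanned, that is
the kind of surprise the numbers will reveal. (In a real setup you would
also route inserts into the matching child table, e.g. with the kind of
trigger mentioned above.)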
*) If you are just using a stopwatch to time your queries, be aware of
other factors which might significantly influence your results: caching,
other jobs running on the machine in parallel, cosmic rays, ...

*) Before running your tests you should always try to get to some
well-defined starting point (this might even mean rebooting your server
before running each test), and you should always repeat each test
several times and then calculate a mean value (and a standard deviation,
to see how "good" your results are...).

*) Document your test setup and procedure as well as your results
(otherwise, two days later you won't remember which test produced which
result).

HTH

- andreas

--
Andreas Haumer         | mailto:andreas@xss.co.at
*x Software + Systeme  | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71