Thread: Need for speed
Hello,

one of our services is click counting for online advertising. We do this by importing Apache log files every five minutes. This results in a lot of insert and delete statements. At the same time our customers shall be able to do online reporting.

We have a box with
Linux Fedora Core 3, Postgres 7.4.2
Intel(R) Pentium(R) 4 CPU 2.40GHz
2 SCSI 76GB disks (15.000RPM, 2ms)

I did put pg_xlog on another file system on other discs.

Still when several users are online the reporting gets very slow. Queries can take more than 2 min.

I need some ideas on how to improve performance by some orders of magnitude. I already thought of a box with the whole database on a RAM disk. So really, any idea is welcome.

Ulrich

--
Ulrich Wisser / System Developer
RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden
Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769
________________________________________________________________
http://www.relevanttraffic.com
Ulrich Wisser wrote:
> Hello,
>
> one of our services is click counting for on line advertising. We do
> this by importing Apache log files every five minutes. This results in a
> lot of insert and delete statements. At the same time our customers
> shall be able to do on line reporting.
>
> I need some ideas how to improve performance in some orders of
> magnitude. I already thought of a box with the whole database on a ram
> disc. So really any idea is welcome.

So what's the problem - poor query plans? CPU saturated? I/O saturated? Too much context-switching?

What makes it worse - adding another reporting user, or importing another logfile?

--
Richard Huxton
Archonet Ltd
Ulrich Wisser wrote:
> Hello,
>
> one of our services is click counting for on line advertising. We do
> this by importing Apache log files every five minutes. This results in a
> lot of insert and delete statements. At the same time our customers
> shall be able to do on line reporting.

What are you deleting? I can see having a lot of updates and inserts, but I'm trying to figure out what the deletes would be. Is it just that you completely refill the table based on the apache log, rather than only appending? Or are you deleting old rows?

> We have a box with
> Linux Fedora Core 3, Postgres 7.4.2
> Intel(R) Pentium(R) 4 CPU 2.40GHz
> 2 scsi 76GB disks (15.000RPM, 2ms)
>
> I did put pg_xlog on another file system on other discs.
>
> Still when several users are on line the reporting gets very slow.
> Queries can take more then 2 min.

If it only gets slow when you have multiple clients it sounds like your select speed is the issue, more than conflicting with your insert/deletes.

> I need some ideas how to improve performance in some orders of
> magnitude. I already thought of a box with the whole database on a ram
> disc. So really any idea is welcome.

How much ram do you have in the system? It sounds like you only have 1 CPU, so there is a lot you can do to make the box scale. A dual Opteron (possibly a dual motherboard with dual core (but only fill one for now)), with 16GB of ram, and an 8-drive RAID10 system would perform quite a bit faster.

How big is your database on disk? Obviously it isn't very large if you are thinking of holding everything in RAM (and only have 76GB of disk storage to put it in anyway). If your machine only has 512M, an easy solution would be to put in a bunch more memory.

In general, your hardware is pretty low in overall specs. So if you are willing to throw money at the problem, there is a lot you can do.

Alternatively, turn on statement logging, and then post the queries that are slow.
This mailing list is pretty good at fixing poor queries.

One thing you are probably hitting is a lot of sequential scans on the main table.

If you are doing mostly inserting, make sure you are in a transaction, and think about doing a COPY.

There is a lot more that can be said; we just need to have more information about what you want.

John
=:->

> Ulrich
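To make John's transaction point concrete: here is a minimal sketch (not from the thread; sqlite3 from the Python stdlib stands in for PostgreSQL so the snippet is self-contained). Wrapping a batch of inserts in a single transaction means one sync at commit instead of one per row; with PostgreSQL, COPY is faster still.

```python
import sqlite3

# In-memory database as a stand-in for the click-counting table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (ts TEXT, url TEXT)")

# Fake log data resembling a five-minute Apache import.
rows = [("2005-08-16 17:39:%02d" % (i % 60), "/ad/%d" % i)
        for i in range(10000)]

# One transaction for the whole batch: committed once at the end of
# the `with` block, rather than row-by-row in autocommit mode.
with conn:
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM clicks").fetchone()[0]
print(count)
```

With psycopg or JDBC against a real PostgreSQL server the same structure applies; only the driver calls differ.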
On Tue, 2005-08-16 at 17:39 +0200, Ulrich Wisser wrote:
> Hello,
>
> one of our services is click counting for on line advertising. We do
> this by importing Apache log files every five minutes. This results in a
> lot of insert and delete statements. At the same time our customers
> shall be able to do on line reporting.
>
> We have a box with
> Linux Fedora Core 3, Postgres 7.4.2
> Intel(R) Pentium(R) 4 CPU 2.40GHz

This is not a good CPU for this workload. Try an Opteron or Xeon. Also of major importance is the amount of memory. If possible, you would like to have memory larger than the size of your database.

> 2 scsi 76GB disks (15.000RPM, 2ms)

If you decide your application is I/O bound, here's an obvious place for improvement. More disks == faster.

> I did put pg_xlog on another file system on other discs.

Did that have a beneficial effect?

> Still when several users are on line the reporting gets very slow.
> Queries can take more then 2 min.

Is this all the time or only during the insert?

> I need some ideas how to improve performance in some orders of
> magnitude. I already thought of a box with the whole database on a ram
> disc. So really any idea is welcome.

You don't need a RAM disk, just a lot of RAM. Your operating system will cache disk contents in memory if possible. You have a very small configuration, so more CPU, more memory, and especially more disks will probably all yield improvements.
Are you calculating aggregates, and if so, how are you doing it? (I ask the question from experience of a similar application where I found that my aggregating PL/pgSQL triggers were bogging the system down, and changed them to scheduled jobs instead.)

Alex Turner
NetEconomist

On 8/16/05, Ulrich Wisser <ulrich.wisser@relevanttraffic.se> wrote:
> Hello,
>
> one of our services is click counting for on line advertising. We do
> this by importing Apache log files every five minutes. This results in a
> lot of insert and delete statements. At the same time our customers
> shall be able to do on line reporting.
>
> We have a box with
> Linux Fedora Core 3, Postgres 7.4.2
> Intel(R) Pentium(R) 4 CPU 2.40GHz
> 2 scsi 76GB disks (15.000RPM, 2ms)
>
> I did put pg_xlog on another file system on other discs.
>
> Still when several users are on line the reporting gets very slow.
> Queries can take more then 2 min.
>
> I need some ideas how to improve performance in some orders of
> magnitude. I already thought of a box with the whole database on a ram
> disc. So really any idea is welcome.
>
> Ulrich
>
> --
> Ulrich Wisser / System Developer
>
> RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden
> Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769
> ________________________________________________________________
> http://www.relevanttraffic.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
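The trigger-versus-scheduled-job idea Alex describes can be sketched as follows (a hypothetical illustration, not code from the thread; sqlite3 stands in for PostgreSQL so it is self-contained). Instead of maintaining a summary row on every insert, a periodic job rebuilds the hourly aggregates in a single pass:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clicks (customer TEXT, ts TEXT);
CREATE TABLE clicks_hourly (customer TEXT, hour TEXT, views INTEGER);
""")

# 60 clicks in one hour, 30 in the next, for an assumed customer "acme".
with conn:
    conn.executemany("INSERT INTO clicks VALUES (?, ?)",
        [("acme", "2005-08-16 17:%02d:00" % m) for m in range(60)] +
        [("acme", "2005-08-16 18:%02d:00" % m) for m in range(30)])

def aggregate_job(conn):
    # The scheduled job: rebuild the summary table in one pass.
    # A production job might instead aggregate only rows added
    # since the last run.
    with conn:
        conn.execute("DELETE FROM clicks_hourly")
        conn.execute("""
            INSERT INTO clicks_hourly
            SELECT customer, substr(ts, 1, 13) AS hour, count(*)
            FROM clicks GROUP BY customer, hour
        """)

aggregate_job(conn)
rows = conn.execute(
    "SELECT hour, views FROM clicks_hourly ORDER BY hour").fetchall()
print(rows)
```

Reports then read the small `clicks_hourly` table rather than scanning the raw clicks.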
On Tue, 16 Aug 2005, Ulrich Wisser wrote:
> Still when several users are on line the reporting gets very slow.
> Queries can take more then 2 min.

Could you show an example of such a query and the output of EXPLAIN ANALYZE on that query (preferably done when the database is slow)?

It's hard to say what is wrong without more information.

--
/Dennis Björklund
Hello,

thanks for all your suggestions.

I can see that the Linux system is 90% waiting for disc io. At that time all my queries are *very* slow. My scsi raid controller and disc are already the fastest available. The query plan uses indexes and "vacuum analyze" is run once a day.

To avoid aggregating too many rows, I already made some aggregation tables which will be updated after the import from the Apache logfiles. That did help, but only to a certain level.

I believe the biggest problem is disc io. Reports for very recent data are quite fast; these are used very often and are therefore already in the cache. But reports can contain (and regularly do) very old data. In that case the whole system slows down. To me this sounds like the recent data is flushed out of the cache and now all data for all queries has to be fetched from disc.

My machine has 2GB memory, please find postgresql.conf below.

Ulrich

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 20000          # min 16, at least max_connections*2, 8KB each
sort_mem = 4096                 # min 64, size in KB
vacuum_mem = 8192               # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 50000           # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 3000        # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = false                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
wal_buffers = 128               # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 16        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000
Ulrich Wisser <ulrich.wisser@relevanttraffic.se> writes:
> My machine has 2GB memory, please find postgresql.conf below.

> max_fsm_pages = 50000          # min max_fsm_relations*16, 6 bytes each

FWIW, that index I've been groveling through in connection with your other problem contains an astonishingly large amount of dead space --- almost 50%. I suspect that you need a much larger max_fsm_pages setting, and possibly more-frequent vacuuming, in order to keep a lid on the amount of wasted space.

			regards, tom lane
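For concreteness, Tom's advice might translate into a postgresql.conf change along these lines (the numbers are assumptions sized for a ~22GB database with heavy delete churn, not values given in the thread; 7.4-era parameters):

```ini
# Hypothetical postgresql.conf adjustments. 500000 pages * 6 bytes is
# only ~3MB of shared memory, and lets the free space map actually
# track dead pages across a 22GB database instead of overflowing.
max_fsm_pages = 500000          # was 50000
max_fsm_relations = 10000       # was 3000
```

More-frequent vacuuming could be as simple as running the stock `vacuumdb --analyze` utility from cron every hour or so on the heavily churned click tables, rather than once a day.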
On Wed, 2005-08-17 at 11:15 +0200, Ulrich Wisser wrote:
> Hello,
>
> thanks for all your suggestions.
>
> I can see that the Linux system is 90% waiting for disc io. At that time
> all my queries are *very* slow. My scsi raid controller and disc are
> already the fastest available.

What RAID controller? Initially you said you have only 2 disks, and since you have your xlog on a separate spindle, I assume you have 1 disk for the xlog and 1 for the data. Even so, if you have a RAID, I'm going to further assume you are using RAID 1, since no sane person would use RAID 0. In those cases you are getting the performance of a single disk, which is never going to be very impressive. You need a RAID.

Please be more precise when describing your system to this list.

-jwb
Ulrich, > I believe the biggest problem is disc io. Reports for very recent data > are quite fast, these are used very often and therefor already in the > cache. But reports can contain (and regulary do) very old data. In that > case the whole system slows down. To me this sounds like the recent data > is flushed out of the cache and now all data for all queries has to be > fetched from disc. How large is the database on disk? > My machine has 2GB memory, please find postgresql.conf below. hmmmm ... effective_cache_size? random_page_cost? cpu_tuple_cost? etc. -- Josh Berkus Aglio Database Solutions San Francisco
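For reference, plausible 7.4-era starting points for the parameters Josh mentions on a 2GB machine might look like this (assumed values for illustration, not recommendations from the thread; effective_cache_size is counted in 8KB pages):

```ini
# Hypothetical starting points for a 2GB box running PostgreSQL 7.4.
effective_cache_size = 150000   # ~1.2GB, roughly what the OS cache can hold
random_page_cost = 3            # default is 4; lower if most reads hit cache
sort_mem = 8192                 # per-sort KB; watch the total across backends
```

The default effective_cache_size of 1000 (8MB) tells the planner almost nothing is cached, which biases it away from index scans on a machine that actually has gigabytes of OS cache.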
At 05:15 AM 8/17/2005, Ulrich Wisser wrote:
>Hello,
>
>thanks for all your suggestions.
>
>I can see that the Linux system is 90% waiting for disc io.

A clear indication that you need to improve your HD IO subsystem if possible.

>At that time all my queries are *very* slow.

To be more precise, your server performance at that point is essentially equal to your HD IO subsystem performance.

> My scsi raid controller and disc are already the fastest available.

Oh, REALLY? This is the description of the system you gave us:

"We have a box with
Linux Fedora Core 3, Postgres 7.4.2
Intel(R) Pentium(R) 4 CPU 2.40GHz
2 scsi 76GB disks (15.000RPM, 2ms)"

That is far, Far, FAR from "the fastest available" in terms of SW, OS, CPU host, _or_ HD subsystem. The "fastest available" means:

1= you should be running PostgreSQL 8.0.3
2= you should be running the latest stable 2.6 based kernel
3= you should be running an Opteron based server
4= Fibre Channel HDs are slightly higher performance than SCSI ones.
5= (and this is the big one) YOU NEED MORE SPINDLES AND A HIGHER END RAID CONTROLLER.

Your description of your workload was:

"one of our services is click counting for on line advertising. We do this by importing Apache log files every five minutes. This results in a lot of insert and delete statements. At the same time our customers shall be able to do on line reporting."

There are two issues here:

1= your primary usage is OLTP-like, but you are also expecting to do reports against the same schema that is supporting your OLTP-like usage. Bad Idea. Schemas that are optimized for reporting and other data-mining-like operations are pessimal for OLTP-like applications and vice versa. You need two schemas: one optimized for lots of inserts and deletes (OLTP-like), and one optimized for reporting (data-mining-like).

2= 2 spindles, even 15K rpm spindles, is minuscule. Real enterprise class RAID subsystems have at least 10-20x that many spindles, usually split into 6-12 sets dedicated to different groups of tables in the DB. Putting xlog on its own dedicated spindles is just the first step.

The absolute "top of the line" for RAID controllers is something based on Fibre Channel from Xyratex (who make the RAID engines for EMC and NetApp), Engenio (the enterprise division of LSI Logic, who sell mostly to IBM; Apple has a server based on an Engenio card), or Dot Hill (who bought Chaparral among others). I suspect you can't afford them even if they would do business with you. The ante for an FC-based RAID subsystem in this class is in the ~$32K to ~$128K range, even if you buy direct from the actual RAID HW manufacturer rather than an OEM like EMC, IBM, or NetApp who will 2x or 4x the price. OTOH, these subsystems will provide OLTP or OLTP-like DB apps with performance that is head-and-shoulders better than anything else to be found. Numbers like 50K-200K IOPS. You get what you pay for.

In the retail commodity market where you are more realistically going to be buying, the current best RAID controllers are probably the Areca cards ( www.areca.us ). They come darn close to saturating the Real World Peak Bandwidth of a 64b 133MHz PCI-X bus and have better IOPS numbers than their commodity brethren. However, _none_ of the commodity RAID cards have IOPS numbers anywhere near as high as those mentioned above.

>To avoid aggregating to many rows, I already made some aggregation
>tables which will be updated after the import from the Apache
>logfiles. That did help, but only to a certain level.
>
>I believe the biggest problem is disc io. Reports for very recent
>data are quite fast, these are used very often and therefor already
>in the cache. But reports can contain (and regulary do) very old
>data. In that case the whole system slows down. To me this sounds
>like the recent data is flushed out of the cache and now all data
>for all queries has to be fetched from disc.

I completely agree. Hopefully my above suggestions make sense and are of use to you.

>My machine has 2GB memory,

...and while we are at it, OLTP-like apps benefit less from RAM than data mining ones, but still, 2GB of RAM is just not that much for a real DB server...

Ron Peacetree
On 8/17/05, Ron <rjpeace@earthlink.net> wrote:
> At 05:15 AM 8/17/2005, Ulrich Wisser wrote:
> >Hello,
> >
> >thanks for all your suggestions.
> >
> >I can see that the Linux system is 90% waiting for disc io.
...
> 1= your primary usage is OLTP-like, but you are also expecting to do
> reports against the same schema that is supporting your OLTP-like
> usage. Bad Idea. Schemas that are optimized for reporting and other
> data mining like operation are pessimal for OLTP-like applications
> and vice versa. You need two schemas: one optimized for lots of
> inserts and deletes (OLTP-like), and one optimized for reporting
> (data-mining like).

Ulrich,

If you meant that your disc/scsi system is already the fastest available *with your current budget*, then following Ron's advice I quoted above will be a good step.

I have some systems very similar to yours. What I do is import in batches and then immediately pre-process the batch data into tables optimized for quick queries. For example, if your reports frequently need to find the total number of views per hour for each customer, create a table whose data contains just the totals for each customer for each hour of the day. This will make it a tiny fraction of the size, allowing it to fit largely in RAM for the query and making the indexes more efficient.

This is a tricky job, but if you do it right, your company will be a big success and buy you more hardware to work with. Of course, they'll also ask you to create dozens of new reports, but that's par for the course.

Even if you have the budget for more hardware, I feel that creating an effective db structure is a much more elegant solution than throwing more hardware at it. (I admit, sometimes it's cheaper to throw more hardware.)

If you have particular queries that are too slow, posting the explain analyze for each on the list should garner some help.

--
Matthew Nuzum
www.bearfruit.org
> Ulrich Wisser wrote: > > > > one of our services is click counting for on line advertising. We do > > this by importing Apache log files every five minutes. This results in a > > lot of insert and delete statements. ... > If you are doing mostly inserting, make sure you are in a transaction, Well, yes, but you may need to make sure that a single transaction doesn't have too many inserts in it. I was having a performance problem when doing transactions with a huge number of inserts (tens of thousands), and I solved the problem by putting a simple counter in the loop (in the Java import code, that is) and doing a commit every 100 or so inserts. -Roger > John > > > Ulrich
>> Ulrich Wisser wrote: >> > >> > one of our services is click counting for on line advertising. We do >> > this by importing Apache log files every five minutes. This results in a >> > lot of insert and delete statements. > ... >> If you are doing mostly inserting, make sure you are in a transaction, > > Well, yes, but you may need to make sure that a single transaction > doesn't have too many inserts in it. I was having a performance > problem when doing transactions with a huge number of inserts (tens > of thousands), and I solved the problem by putting a simple counter > in the loop (in the Java import code, that is) and doing a commit > every 100 or so inserts. Are you sure that was an issue with PostgreSQL? I have certainly observed that issue with Oracle, but NOT with PostgreSQL. I have commonly done data loads where they loaded 50K rows at a time, the reason for COMMITting at that point being "programming paranoia" at the possibility that some data might fail to load and need to be retried, and I'd rather have less fail... It would seem more likely that the issue would be on the Java side; it might well be that the data being loaded might bloat JVM memory usage, and that the actions taken at COMMIT time might keep the size of the Java-side memory footprint down. -- (reverse (concatenate 'string "moc.liamg" "@" "enworbbc")) http://cbbrowne.com/info/ If we were meant to fly, we wouldn't keep losing our luggage.
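For concreteness, the commit-every-N-rows pattern Roger describes looks like this (a hypothetical sketch in Python rather than Java, with sqlite3 from the stdlib standing in for PostgreSQL so it is self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # take manual control of transactions
conn.execute("CREATE TABLE clicks (n INTEGER)")

BATCH = 100   # commit cadence from the loop counter idea above
commits = 0

conn.execute("BEGIN")
for i in range(1000):
    conn.execute("INSERT INTO clicks VALUES (?)", (i,))
    if (i + 1) % BATCH == 0:
        # Bound the size of any one transaction (and of whatever the
        # client side is buffering) by committing every BATCH rows.
        conn.execute("COMMIT")
        commits += 1
        conn.execute("BEGIN")
conn.execute("COMMIT")  # flush any final partial batch

print(commits)
```

As Chris notes, whether the win comes from the database or from the client side's memory behavior is worth measuring before settling on a batch size.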
RRS (http://rrs.decibel.org) might be of use in this case. On Tue, Aug 16, 2005 at 01:59:53PM -0400, Alex Turner wrote: > Are you calculating aggregates, and if so, how are you doing it (I ask > the question from experience of a similar application where I found > that my aggregating PGPLSQL triggers were bogging the system down, and > changed them so scheduled jobs instead). > > Alex Turner > NetEconomist > > On 8/16/05, Ulrich Wisser <ulrich.wisser@relevanttraffic.se> wrote: > > Hello, > > > > one of our services is click counting for on line advertising. We do > > this by importing Apache log files every five minutes. This results in a > > lot of insert and delete statements. At the same time our customers > > shall be able to do on line reporting. > > > > We have a box with > > Linux Fedora Core 3, Postgres 7.4.2 > > Intel(R) Pentium(R) 4 CPU 2.40GHz > > 2 scsi 76GB disks (15.000RPM, 2ms) > > > > I did put pg_xlog on another file system on other discs. > > > > Still when several users are on line the reporting gets very slow. > > Queries can take more then 2 min. > > > > I need some ideas how to improve performance in some orders of > > magnitude. I already thought of a box with the whole database on a ram > > disc. So really any idea is welcome. 
> > > > Ulrich > > > > > > > > -- > > Ulrich Wisser / System Developer > > > > RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden > > Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769 > > ________________________________________________________________ > > http://www.relevanttraffic.com > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com 512-569-9461
Hello,

I realize I need to be much more specific. Here is a more detailed description of my hardware and system design.

Pentium 4 2.4GHz
Memory 4x DIMM DDR 1GB PC3200 400MHZ CAS3, KVR
Motherboard chipset 'I865G', two IDE channels on board
2x SEAGATE BARRACUDA 7200.7 80GB 7200RPM ATA/100 (software raid 1, system, swap, pg_xlog)
ADAPTEC SCSI RAID 2100S ULTRA160 32MB 1-CHANNEL
2x SEAGATE CHEETAH 15K.3 73GB ULTRA320 68-PIN WIDE (raid 1, /var/lib/pgsql)

Database size on disc is 22GB (without pg_xlog).

Please find my postgresql.conf below.

Putting pg_xlog on the IDE drives gave about 10% performance improvement. Would faster disks give more performance?

What my application does:

Every five minutes a new logfile will be imported. Depending on the source of the request it will be imported into one of three "raw click" tables. (These hold data from two months back, to be able to verify customer complaints.)

For reporting I have a set of tables. These contain data from the last two years. My app deletes all entries from today and reinserts updated data calculated from the raw data tables. The queries contain no joins, only aggregates. I have several indexes to speed up different kinds of queries.

My problems occur when one user does a report that contains too much old data. In that case all cache mechanisms fail and disc io is the limiting factor. If one query contains so much data that a full table scan is needed, I do not care if it takes two minutes to answer. But all other queries with less data (at the same time) still have to be fast. I can not stop users doing that kind of reporting. :(

I need more speed in orders of magnitude. Will more disks / more memory do that trick? Money is of course a limiting factor but it doesn't have to be real cheap.
Ulrich # ----------------------------- # PostgreSQL configuration file # ----------------------------- #--------------------------------------------------------------------------- # CONNECTIONS AND AUTHENTICATION #--------------------------------------------------------------------------- # - Connection Settings - tcpip_socket = true max_connections = 100 # note: increasing max_connections costs about 500 bytes of shared # memory per connection slot, in addition to costs from shared_buffers # and max_locks_per_transaction. #superuser_reserved_connections = 2 #port = 5432 #unix_socket_directory = '' #unix_socket_group = '' #unix_socket_permissions = 0777 # octal #virtual_host = '' # what interface to listen on; defaults to any #rendezvous_name = '' # defaults to the computer name # - Security & Authentication - #authentication_timeout = 60 # 1-600, in seconds #ssl = false #password_encryption = true #krb_server_keyfile = '' #db_user_namespace = false #--------------------------------------------------------------------------- # RESOURCE USAGE (except WAL) #--------------------------------------------------------------------------- # - Memory - shared_buffers = 20000 # min 16, at least max_connections*2, 8KB each sort_mem = 4096 # min 64, size in KB vacuum_mem = 8192 # min 1024, size in KB # - Free Space Map - max_fsm_pages = 200000 # min max_fsm_relations*16, 6 bytes each max_fsm_relations = 10000 # min 100, ~50 bytes each # - Kernel Resource Usage - #max_files_per_process = 1000 # min 25 #preload_libraries = '' #--------------------------------------------------------------------------- # WRITE AHEAD LOG #--------------------------------------------------------------------------- # - Settings - fsync = false # turns forced synchronization on or off #wal_sync_method = fsync # the default varies across platforms: # fsync, fdatasync, open_sync, or open_datasync wal_buffers = 128 # min 4, 8KB each # - Checkpoints - checkpoint_segments = 16 # in logfile segments, min 1, 
16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Enabling -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 11
#geqo_effort = 1
#geqo_generations = 0
#geqo_pool_size = 0             # default based on tables in statement,
                                # range 128-1024
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs

#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Syslog -

syslog = 2                      # range 0-2; 0=stdout; 1=both; 2=syslog
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'

# - When to Log -

client_min_messages = info      # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   log, info, notice, warning, error

log_min_messages = info         # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, log, fatal,
                                #   panic

log_error_verbosity = verbose   # terse, default, or verbose messages

log_min_error_statement = info  # Values in order of increasing severity:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, panic(off)

log_min_duration_statement = 1000 # Log all statements whose
                                # execution time exceeds the value, in
                                # milliseconds.  Zero prints all queries.
                                # Minus-one disables.

silent_mode = false             # DO NOT USE without Syslog!

# - What to Log -

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
log_connections = true
#log_duration = false
#log_pid = false
#log_statement = false
#log_timestamp = false
#log_hostname = false
#log_source_port = false

#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
#log_statement_stats = false

# - Query/Index Statistics Collector -

#stats_start_collector = true
#stats_command_string = false
#stats_block_level = false
#stats_row_level = false
#stats_reset_on_server_start = true

#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'   # schema names
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
#statement_timeout = 0          # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown             # actually, defaults to TZ environment setting
#australian_timezones = false
#extra_float_digits = 0         # min -15, max 2
#client_encoding = sql_ascii    # actually, defaults to database encoding

# These settings are initialized by initdb -- they may be changed
lc_messages = 'en_US'           # locale for system error message strings
lc_monetary = 'en_US'           # locale for monetary formatting
lc_numeric = 'en_US'            # locale for number formatting
lc_time = 'en_US'               # locale for time formatting

# - Other Defaults -

#explain_pretty_print = true
#dynamic_library_path = '$libdir'
#max_expr_depth = 10000         # min 10

#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000        # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each

#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = true
#regex_flavor = advanced        # advanced, extended, or basic
#sql_inheritance = true

# - Other Platforms & Clients -

#transform_null_equals = false
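With syslog = 2 and log_min_duration_statement = 1000 as set above, every statement slower than one second ends up in syslog under LOCAL0. A quick way to mine that log for the worst reporting queries (a sketch; the syslog path and exact line format vary by distribution, so a stand-in log file is used here):

```shell
# Stand-in for the syslog file LOCAL0 is routed to (the real path,
# e.g. /var/log/messages on Fedora, is an assumption about the setup).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Aug 25 10:00:01 db postgres[4711]: [2-1] LOG:  duration: 1510.233 ms  statement: SELECT count(*) FROM clicks WHERE day = current_date;
Aug 25 10:03:12 db postgres[4712]: [3-1] LOG:  duration: 124000.871 ms  statement: SELECT customer, count(*) FROM clicks GROUP BY customer;
EOF

# Strip the syslog prefix and rank statements by duration, slowest first.
grep -o 'duration: [0-9.]* ms  statement: .*' "$LOG" \
  | sort -t' ' -k2,2 -rn \
  | head -5
```

The two-minute reporting queries float to the top, which gives concrete statements to EXPLAIN rather than guessing.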
On Thu, 25 Aug 2005 09:10:37 +0200
Ulrich Wisser <ulrich.wisser@relevanttraffic.se> wrote:

> Pentium 4 2.4GHz
> Memory 4x DIMM DDR 1GB PC3200 400MHZ CAS3, KVR
> Motherboard chipset 'I865G', two IDE channels on board
> 2x SEAGATE BARRACUDA 7200.7 80GB 7200RPM ATA/100
>   (software raid 1, system, swap, pg_xlog)
> ADAPTEC SCSI RAID 2100S ULTRA160 32MB 1-CHANNEL
> 2x SEAGATE CHEETAH 15K.3 73GB ULTRA320 68-PIN WIDE
>   (raid 1, /var/lib/pgsql)
>
> Database size on disc is 22GB. (without pg_xlog)
>
> Please find my postgresql.conf below.
>
> Putting pg_xlog on the IDE drives gave about 10% performance
> improvement. Would faster disks give more performance?

Faster as in RPM on your pg_xlog partition probably won't make much of
a difference.  However, if you can get a drive with better overall
write performance, then it would be a benefit.

Another thing to consider on this setup is whether or not you're
hitting swap often and/or logging to that same IDE RAID set.  For
optimal insertion benefit you want the heads of your disks to be used
essentially only for pg_xlog.  If you're having to jump around the disk
in the following manner:

  write to pg_xlog
  read from swap
  write syslog data
  write to pg_xlog
  ...
  ...

you probably aren't getting anywhere near the benefit you could.  One
thing you could easily try is to break your IDE RAID set and put
OS/swap on one disk and pg_xlog on the other.

> If one query contains so much data, that a full table scan is needed,
> I do not care if it takes two minutes to answer. But all other
> queries with less data (at the same time) still have to be fast.
>
> I can not stop users doing that kind of reporting. :(
>
> I need more speed in orders of magnitude. Will more disks / more
> memory do that trick?

More disks and more memory always help.  Since you say these queries
are mostly on not-often-used data, I would lean toward more disks in
your SCSI RAID-1 setup rather than maxing out available RAM, given the
size of your database.
---------------------------------
  Frank Wiles <frank@wiles.org>
  http://www.wiles.org
---------------------------------
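Frank's "break the RAID set" suggestion boils down to the usual symlink trick for relocating pg_xlog. A sketch of the mechanics, done here in scratch directories so nothing real is touched; in production the stand-ins would be the real data directory and the dedicated disk's mount point (both assumptions about this setup), and the postmaster must be stopped first:

```shell
# Scratch stand-ins -- the real paths would be something like
# /var/lib/pgsql/data and the dedicated pg_xlog disk's mount point.
DATADIR=$(mktemp -d)    # stands in for the PostgreSQL data directory
XLOGDISK=$(mktemp -d)   # stands in for the dedicated spindle

# Fake a WAL segment so there is something to move.
mkdir -p "$DATADIR/pg_xlog"
echo 'wal' > "$DATADIR/pg_xlog/000000010000000000000001"

# With the postmaster stopped: copy the WAL over, keep the old
# directory as a fallback, and symlink pg_xlog to the other spindle.
cp -a "$DATADIR/pg_xlog/." "$XLOGDISK/"
mv "$DATADIR/pg_xlog" "$DATADIR/pg_xlog.old"
ln -s "$XLOGDISK" "$DATADIR/pg_xlog"

ls -l "$DATADIR/pg_xlog/"
```

Once the server has been restarted and verified, the pg_xlog.old copy can be removed.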
At 03:10 AM 8/25/2005, Ulrich Wisser wrote:

>I realize I need to be much more specific. Here is a more detailed
>description of my hardware and system design.
>
>Pentium 4 2.4GHz
>Memory 4x DIMM DDR 1GB PC3200 400MHZ CAS3, KVR
>Motherboard chipset 'I865G', two IDE channels on board

First suggestion: get better server HW.  An AMD Opteron based dual
processor board is the current best in terms of price/performance
ratio, _particularly_ for DB applications like the one you have
described.  Such mainboards cost ~$400-$500.  RAM will cost about
$75-$150/GB.  Opteron 2xx are ~$200-$700 apiece.  So a 2P AMD system
can be had for as little as ~$850 + the cost of the RAM you need.  In
the worst case, where you need 24GB of RAM (~$3600), the total comes in
at ~$4450.  As you can see from the numbers, buying only the RAM you
actually need can save you a great deal of money.

Given what little you said about how much of your DB is frequently
accessed, I'd suggest buying a server based around the 2P 16 DIMM slot
IWill DK88 mainboard (Tyan has announced a 16 DIMM slot mainboard, but
I do not think it is actually being sold yet).  Then fill it with the
minimum amount of RAM that will allow the "working set" of the DB to be
cached in RAM.

In the worst case, where DB access is essentially uniform and
essentially random, you will need 24GB of RAM to hold the 22GB DB + OS
+ etc.  That worst case is _rare_.  Usually DBs have a working set that
is smaller than the entire DB.  You want to keep that working set in
RAM.  If you can't identify the working set, buy enough RAM to hold the
entire DB.

In particular, you want to make sure that any frequently accessed
read-only tables or indexes are kept in RAM.  The "read only" part is
very important.  Tables (and their indexes) that are frequently written
to _have_ to access the HD.  Therefore you get much less out of having
them in RAM.

Read-only tables and their indexes can be loaded into tmpfs at boot
time, thereby keeping them out of the way of the file system buffer
cache.  tmpfs does not save data if the host goes down, so it is very
important that you ONLY use this trick with read-only tables.  The
other half of the trick is to make sure that the file system buffer
cache does _not_ cache whatever you have loaded into tmpfs.

>2x SEAGATE BARRACUDA 7200.7 80GB 7200RPM ATA/100
>(software raid 1, system, swap, pg_xlog)
>ADAPTEC SCSI RAID 2100S ULTRA160 32MB 1-CHANNEL
>2x SEAGATE CHEETAH 15K.3 73GB ULTRA320 68-PIN WIDE
>(raid 1, /var/lib/pgsql)

Second suggestion: you need a MUCH better IO subsystem.  In fact, given
that you have described this system as being primarily OLTP-like, this
is more important than the above server HW.  Best would be to upgrade
everything, but if you are strapped for cash, upgrade the IO subsystem
first.

You need many more spindles and a decent RAID card or cards.  You want
15Krpm (best) or 10Krpm HDs.  As long as all of the HDs are at least
10Krpm, more spindles is more important than faster spindles.  If it's
a choice between more 10Krpm discs or fewer 15Krpm discs, buy the
10Krpm discs.  Get the spindle count as high as your RAID cards can
handle.

Whatever RAID cards you get should have as much battery backed write
buffer as possible.  In the commodity market, presently the highest
performance RAID cards I know of, and the ones that support the largest
battery backed write buffer, are made by Areca.

>Database size on disc is 22GB. (without pg_xlog)

Find out what the working set, ie the most frequently accessed portion,
of this 22GB is and you will know how much RAM is worth having.  4GB is
definitely too little!

>Please find my postgresql.conf below.

Third suggestion: make sure you are running a 2.6 based kernel and at
least PG 8.0.3.  Helping beta test PG 8.1 might be an option for you as
well.

>Putting pg_xlog on the IDE drives gave about 10% performance
>improvement. Would faster disks give more performance?
>
>What my application does:
>
>Every five minutes a new logfile will be imported. Depending on the
>source of the request it will be imported in one of three "raw click"
>tables. (data from two months back, to be able to verify customer
>complaints) For reporting I have a set of tables. These contain data
>from the last two years. My app deletes all entries from today and
>reinserts updated data calculated from the raw data tables.

The raw data tables seem to be read only?  If so, you should buy enough
RAM to load them into tmpfs at boot time and have them be completely
RAM resident, in addition to having enough RAM for the OS to cache an
appropriate amount of the rest of the DB.

>The queries contain no joins, only aggregates. I have several indexes
>to speed different kinds of queries.
>
>My problems occur when one user does a report that contains too
>much old data. In that case all cache mechanisms will fail and disc
>io is the limiting factor.
>
>If one query contains so much data, that a full table scan is
>needed, I do not care if it takes two minutes to answer. But all
>other queries with less data (at the same time) still have to be fast.

HDs can only do one thing at once.  If they are in the middle of a full
table scan, everything else that requires HD access is going to wait
until it is done.

At some point, looking at your DB schema and queries will be worth it
for optimization purposes.  Right now, your HW is so underpowered
compared to the demands you are placing on it that there's little point
to SW tuning.

>I can not stop users doing that kind of reporting. :(
>
>I need more speed in orders of magnitude. Will more disks / more
>memory do that trick?

If you do the right things with them ;)

>Money is of course a limiting factor but it doesn't have to be real cheap.
>
>Ulrich
>
># -----------------------------
># PostgreSQL configuration file
># -----------------------------
>
>#---------------------------------------------------------------------------
># CONNECTIONS AND AUTHENTICATION
>#---------------------------------------------------------------------------
>
># - Connection Settings -
>
>tcpip_socket = true
>max_connections = 100
>        # note: increasing max_connections costs about 500 bytes of shared
>        # memory per connection slot, in addition to costs from
>        # shared_buffers and max_locks_per_transaction.
>#superuser_reserved_connections = 2
>#port = 5432
>#unix_socket_directory = ''
>#unix_socket_group = ''
>#unix_socket_permissions = 0777 # octal
>#virtual_host = ''              # what interface to listen on; defaults to any
>#rendezvous_name = ''           # defaults to the computer name
>
># - Security & Authentication -
>
>#authentication_timeout = 60    # 1-600, in seconds
>#ssl = false
>#password_encryption = true
>#krb_server_keyfile = ''
>#db_user_namespace = false
>
>#---------------------------------------------------------------------------
># RESOURCE USAGE (except WAL)
>#---------------------------------------------------------------------------
>
># - Memory -
>
>shared_buffers = 20000          # min 16, at least max_connections*2, 8KB each
>sort_mem = 4096                 # min 64, size in KB

4MB seems small.  Find out how much memory you usually need for a sort,
and how many sorts you are usually doing at once, to set this to a sane
size.
>vacuum_mem = 8192               # min 1024, size in KB
>
># - Free Space Map -
>
>max_fsm_pages = 200000          # min max_fsm_relations*16, 6 bytes each
>max_fsm_relations = 10000       # min 100, ~50 bytes each
>
># - Kernel Resource Usage -
>
>#max_files_per_process = 1000   # min 25
>#preload_libraries = ''
>
>#---------------------------------------------------------------------------
># WRITE AHEAD LOG
>#---------------------------------------------------------------------------
>
># - Settings -
>
>fsync = false                   # turns forced synchronization on or off
>#wal_sync_method = fsync        # the default varies across platforms:
>                                # fsync, fdatasync, open_sync, or
>                                # open_datasync

I hope you have a battery backed write buffer!

>wal_buffers = 128               # min 4, 8KB each

There might be a better value for you to use.  I'll hold off on looking
at the rest of this...

># - Checkpoints -
>
>checkpoint_segments = 16        # in logfile segments, min 1, 16MB each
>#checkpoint_timeout = 300       # range 30-3600, in seconds
>#checkpoint_warning = 30        # 0 is off, in seconds
>#commit_delay = 0               # range 0-100000, in microseconds
>#commit_siblings = 5            # range 1-1000
>
>#---------------------------------------------------------------------------
># QUERY TUNING
>#---------------------------------------------------------------------------
>
># - Planner Method Enabling -
>
>#enable_hashagg = true
>#enable_hashjoin = true
>#enable_indexscan = true
>#enable_mergejoin = true
>#enable_nestloop = true
>#enable_seqscan = true
>#enable_sort = true
>#enable_tidscan = true
>
># - Planner Cost Constants -
>
>#effective_cache_size = 1000    # typically 8KB each
>#random_page_cost = 4           # units are one sequential page fetch cost
>#cpu_tuple_cost = 0.01          # (same)
>#cpu_index_tuple_cost = 0.001   # (same)
>#cpu_operator_cost = 0.0025     # (same)
>
># - Genetic Query Optimizer -
>
>#geqo = true
>#geqo_threshold = 11
>#geqo_effort = 1
>#geqo_generations = 0
>#geqo_pool_size = 0             # default based on tables in statement,
>                                # range 128-1024
>#geqo_selection_bias = 2.0      # range 1.5-2.0
>
># - Other Planner Options -
>
>#default_statistics_target = 10 # range 1-1000
>#from_collapse_limit = 8
>#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs
>
>#---------------------------------------------------------------------------
># ERROR REPORTING AND LOGGING
>#---------------------------------------------------------------------------
>
># - Syslog -
>
>syslog = 2                      # range 0-2; 0=stdout; 1=both; 2=syslog
>syslog_facility = 'LOCAL0'
>syslog_ident = 'postgres'
>
># - When to Log -
>
>client_min_messages = info      # Values, in order of decreasing detail:
>                                #   debug5, debug4, debug3, debug2, debug1,
>                                #   log, info, notice, warning, error
>
>log_min_messages = info         # Values, in order of decreasing detail:
>                                #   debug5, debug4, debug3, debug2, debug1,
>                                #   info, notice, warning, error, log, fatal,
>                                #   panic
>
>log_error_verbosity = verbose   # terse, default, or verbose messages
>
>log_min_error_statement = info  # Values in order of increasing severity:
>                                #   debug5, debug4, debug3, debug2, debug1,
>                                #   info, notice, warning, error, panic(off)
>
>log_min_duration_statement = 1000 # Log all statements whose
>                                # execution time exceeds the value, in
>                                # milliseconds.  Zero prints all queries.
>                                # Minus-one disables.
>
>silent_mode = false             # DO NOT USE without Syslog!
>
># - What to Log -
>
>#debug_print_parse = false
>#debug_print_rewritten = false
>#debug_print_plan = false
>#debug_pretty_print = false
>log_connections = true
>#log_duration = false
>#log_pid = false
>#log_statement = false
>#log_timestamp = false
>#log_hostname = false
>#log_source_port = false
>
>#---------------------------------------------------------------------------
># RUNTIME STATISTICS
>#---------------------------------------------------------------------------
>
># - Statistics Monitoring -
>
>#log_parser_stats = false
>#log_planner_stats = false
>#log_executor_stats = false
>#log_statement_stats = false
>
># - Query/Index Statistics Collector -
>
>#stats_start_collector = true
>#stats_command_string = false
>#stats_block_level = false
>#stats_row_level = false
>#stats_reset_on_server_start = true
>
>#---------------------------------------------------------------------------
># CLIENT CONNECTION DEFAULTS
>#---------------------------------------------------------------------------
>
># - Statement Behavior -
>
>#search_path = '$user,public'   # schema names
>#check_function_bodies = true
>#default_transaction_isolation = 'read committed'
>#default_transaction_read_only = false
>#statement_timeout = 0          # 0 is disabled, in milliseconds
>
># - Locale and Formatting -
>
>#datestyle = 'iso, mdy'
>#timezone = unknown             # actually, defaults to TZ environment setting
>#australian_timezones = false
>#extra_float_digits = 0         # min -15, max 2
>#client_encoding = sql_ascii    # actually, defaults to database encoding
>
># These settings are initialized by initdb -- they may be changed
>lc_messages = 'en_US'           # locale for system error message strings
>lc_monetary = 'en_US'           # locale for monetary formatting
>lc_numeric = 'en_US'            # locale for number formatting
>lc_time = 'en_US'               # locale for time formatting
>
># - Other Defaults -
>
>#explain_pretty_print = true
>#dynamic_library_path = '$libdir'
>#max_expr_depth = 10000         # min 10
>
>#---------------------------------------------------------------------------
># LOCK MANAGEMENT
>#---------------------------------------------------------------------------
>
>#deadlock_timeout = 1000        # in milliseconds
>#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each
>
>#---------------------------------------------------------------------------
># VERSION/PLATFORM COMPATIBILITY
>#---------------------------------------------------------------------------
>
># - Previous Postgres Versions -
>
>#add_missing_from = true
>#regex_flavor = advanced        # advanced, extended, or basic
>#sql_inheritance = true
>
># - Other Platforms & Clients -
>
>#transform_null_equals = false
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: Don't 'kill -9' the postmaster
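Ron's sort_mem comment boils down to simple arithmetic: sort_mem is allocated per sort, per backend, so the theoretical worst case with the posted settings is roughly max_connections times sort_mem. A back-of-envelope sketch (ignoring that a single query can run several sorts at once, which makes the real ceiling higher):

```shell
SORT_MEM_KB=4096   # sort_mem from the posted config (KB)
MAX_CONN=100       # max_connections from the posted config

# Worst case: every backend running one sort at the same time.
echo "worst-case sort memory: $(( SORT_MEM_KB * MAX_CONN / 1024 )) MB"
# → worst-case sort memory: 400 MB
```

400MB out of 4GB is safe, which is why the comment is about whether 4MB per sort is *enough* for the reporting aggregates, not whether it is too much.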
On Thu, 2005-08-25 at 11:16 -0400, Ron wrote:
> ># - Settings -
> >
> >fsync = false                 # turns forced synchronization on or off
> >#wal_sync_method = fsync      # the default varies across platforms:
> >                              # fsync, fdatasync, open_sync, or
>
> I hope you have a battery backed write buffer!

Battery backed write buffer will do nothing here, because the OS is
taking its sweet time flushing to the controller's battery backed write
buffer!

Isn't the reason for battery backed controller cache to make fsync()s
fast?

-K
I have found that while the OS may flush to the controller fast with
fsync=true, the controller does as it pleases (it has a BBU, so I'm not
too worried), so you get great performance because your controller is
determining the read/write sequence outside of what is being demanded
by an fsync.
Alex Turner
NetEconomist
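One way to see what Kelly and Alex are describing is to time synchronous writes directly: a loop of small synced writes is roughly what the WAL does per commit, and with a battery-backed controller cache absorbing the flushes it completes far faster than raw disk rotational latency would allow. A rough probe, assuming GNU dd with oflag=dsync is available:

```shell
# Time 200 synced 8 kB writes to one file -- a crude stand-in for 200
# WAL commits. On a bare 7200rpm disk expect several seconds; behind a
# battery backed write cache (or on fsync=false-style setups) it is
# nearly instant.
F=$(mktemp)
start=$(date +%s)
i=0
while [ $i -lt 200 ]; do
  # One 8 kB write forced to "stable" storage before dd returns.
  dd if=/dev/zero of="$F" bs=8k count=1 conv=notrunc oflag=dsync 2>/dev/null
  i=$((i+1))
done
end=$(date +%s)
echo "200 synced 8k writes took $((end - start))s"
```

Comparing the elapsed time against the drive's rotational latency makes it obvious whether something between the OS and the platters is caching the flushes.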