Thread: Re: University Masters Project
> Bruce,
>
> The replacement of the existing client/server communication project
> with CORBA looks very interesting, I would love to get involved with
> something like that. Is there anyone working on it at the moment?
> What area of it would you like me to look into, any ideas of how I
> could turn a project like that into a good Thesis? If you can give me
> some pointers I'll go and speak to my tutor about it all.

[CC'ing to hackers for comments.]

Well, one idea is to create a server that listens on a certain port for
CORBA requests, sends them to a backend for processing, and returns the
result.

The other idea is to replace our current communication system, which
uses single-character flags and data, with a CORBA model. See the
developer's documentation for details on that.

I think the first one is clearly good; the second may suffer from
performance problems, or it may not be worth changing all our
interfaces to handle a new protocol.

--
Bruce Momjian                        |  http://www.op.net/~candle
maillist@candle.pha.pa.us            |  (610) 853-3000
  + If your life is a hard drive,    |  830 Blythe Avenue
  + Christ can be your backup.       |  Drexel Hill, Pennsylvania 19026
On Thu, 2 Sep 1999, Bruce Momjian wrote:

> > Bruce,
> >
> > The replacement of the existing client/server communication project
> > with CORBA looks very interesting, I would love to get involved with
> > something like that. Is there anyone working on it at the moment?
> > What area of it would you like me to look into, any ideas of how I
> > could turn a project like that into a good Thesis? If you can give
> > me some pointers I'll go and speak to my tutor about it all.
>
> [CC'ing to hackers for comments.]
>
> Well, one idea is to create a server that listens on a certain port
> for CORBA requests, sends them to a backend for processing, and
> returns the result.
>
> The other idea is to replace our current communication system, which
> uses single-character flags and data, with a CORBA model. See the
> developer's documentation for details on that.
>
> I think the first one is clearly good; the second may suffer from
> performance problems, or it may not be worth changing all our
> interfaces to handle a new protocol.

I'm curious as to whether there is a way of testing that without too
much trouble?  Even the investigation of *that* might make for a thesis
in itself?

Marc G. Fournier                 ICQ#7615664            IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org         secondary: scrappy@{freebsd|postgresql}.org
Hello,

I am using Postgres extensively for a number of projects.  I am
extremely happy with its performance and flexibility.  I am trying to
optimize the system; currently I run the postmaster with the following
setting:

    postmaster -i -B 2048 -o '-S 2048'

I have a couple of large(?) tables which I would like to keep in memory
(cached) so that searches are performed as fast as possible.  Is it
possible to 'pin' the tables and their indexes in memory?

Are there any other options/values which would yield better performance?

Thanks,
-Edwin S. Ramirez-
> I have a couple of large(?) tables which I would like to keep in
> memory (cached) so that searches are performed as fast as possible.
> Is it possible to 'pin' the tables and their indexes in memory?

Not explicitly. We rely on the OS to do that.

> Are there any other options/values which would yield better
> performance?

By default, the backend "fsyncs" for every query. You can disable this,
which would then allow the tables to hang around in memory until the OS
decides to flush to disk. Not everyone should do this, since there is a
(small) risk that if your computer crashes after some updates but
before things are flushed then the db might become inconsistent. afaik
we have never had an unambiguous report that this has actually happened
(but others might remember differently). There is already that risk to
some extent, but instead of the window being O(1sec) it becomes
O(30sec).

Run the backend by adding '-o -F' (or just '-F' to your existing list
of "-o" options).

                      - Thomas

--
Thomas Lockhart                      lockhart@alumni.caltech.edu
South Pasadena, California
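For example, folding -F into the invocation Edwin posted above (the -B
and -S values are simply his existing settings, not recommendations)
would look something like:

    postmaster -i -B 2048 -o '-S 2048 -F'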
Edwin Ramirez <ramirez@doc.mssm.edu> writes:
> I have a couple of large(?) tables which I would like to keep in
> memory (cached) so that searches are performed as fast as possible.
> Is it possible to 'pin' the tables and their indexes in memory?

If the tables are being touched often, then they will stay in buffer
cache of their own accord.  I doubt that pinning them would improve
performance --- if they do get swapped out, it'd be because some other
table(s) need to be accessed now, and if you did have these tables
pinned you'd be taking a large hit in access performance for those
other tables because of inadequate buffer space.  LRU buffering policy
really works pretty well, so I don't think you need to worry about it.

> currently I run the postmaster with the following setting:
>     postmaster -i -B 2048 -o '-S 2048'
> Are there any other options/values which would yield better
> performance?

If you have a reliable OS and power source, consider -o -F (no fsync).
This usually makes for a very substantial performance improvement, and
it can only hurt if your machine goes down without having performed
all the writes the kernel was told to do.

			regards, tom lane
Dear All,

Yes I agree with you that something like that might make a thesis in
itself, and it definitely sounds interesting.

I really need to sit down and go through PostgreSQL so that I
understand how it all works, so that I can ask questions without
wasting everyone's time, as I'm sure a lot of the questions I currently
have will be answered in the documentation.  I start Uni in 4 weeks'
time, by which point I hope to have the basics of PostgreSQL and its
architecture; that, along with guidance from my tutor, should give me a
good base to start the project on.

I'll keep you all informed of my progress with this over the next few
weeks, and of my University's response to my request to work on a
project of this nature.

Who should I direct my correspondence to, as I don't want to start
filling up people's email boxes with unnecessary email?

Regards

Mark Proctor
Brunel University
Email : M.Proctor@bigfoot.com
ICQ : 8106598
> Dear All,
>
> Yes I agree with you that something like that might make a thesis in
> itself, and it definitely sounds interesting.
>
> I really need to sit down and go through PostgreSQL so that I
> understand how it all works, so that I can ask questions without
> wasting everyone's time, as I'm sure a lot of the questions I
> currently have will be answered in the documentation.  I start Uni in
> 4 weeks' time, by which point I hope to have the basics of PostgreSQL
> and its architecture; that, along with guidance from my tutor, should
> give me a good base to start the project on.
>
> I'll keep you all informed of my progress with this over the next few
> weeks, and of my University's response to my request to work on a
> project of this nature.
>
> Who should I direct my correspondence to, as I don't want to start
> filling up people's email boxes with unnecessary email?

Hackers list is fine.

--
Bruce Momjian                        |  http://www.op.net/~candle
maillist@candle.pha.pa.us            |  (610) 853-3000
  + If your life is a hard drive,    |  830 Blythe Avenue
  + Christ can be your backup.       |  Drexel Hill, Pennsylvania 19026
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> there is a (small) risk that if your computer crashes after some
> updates but before things are flushed then the db might become
> inconsistent. afaik we have never had an unambiguous report that this
> has actually happened (but others might remember differently). There
> is already that risk to some extent, but instead of the window being
> O(1sec) it becomes O(30sec).

I believe we use fsync not so much to reduce the time window where you
could lose a supposedly-committed update as to ensure that writes are
performed in a known order.  With fsync enabled, the data-file pages
touched by an update query will hit the disk before the pg_log entry
saying the transaction is committed hits the disk.  If you crash
somewhere during that sequence, the transaction appears uncommitted and
there is no loss of consistency.  (We assume here that writing a single
page to disk is an atomic operation, which is only sort-of true, but
it's the best we can do atop a Unix kernel.  Other than that, there is
no "window" for possible inconsistency.)

Without fsync, the kernel writes the pages to disk in whatever order it
finds convenient, so following a crash there might be a pg_log entry
saying transaction N was committed, when in fact only some of
transaction N's tuples made it to disk.  Then you see an inconsistent
database: some of the transaction's updates are there, some are not.
This might be relatively harmless, or deadly, depending on your
application logic and just what the missing updates are.

Another risk without fsync is that a client application might have been
told that the transaction was committed, when in fact it gets lost due
to a crash moments later, before pg_log gets physically updated.
Again, the possible consequences would depend on your application.

The total number of writes performed without fsync is usually way less
than with, since we tend to write certain pages (esp. pg_log) over and
over --- the kernel will reduce that to one physical disk write per
sync interval (~30sec) unless we force its hand with fsync.  That's
where most of the performance improvement comes from.

If you have a reliable kernel and reliable hardware/power supply, then
you might as well turn off fsync.  A crash in Postgres itself would not
cause a problem --- the writes are out there in the kernel's disk
buffers, and the only issue is whether you trust the platform to get
the data onto stable storage.

			regards, tom lane
If I do a large search, the first time is about three times slower than
any subsequent overlapping (same data) searches.  I would like to
always get the higher performance.

How are the buffers that I specify to the postmaster used?  Will
increasing this number improve things?

The issue that I am encountering is that no matter how much memory I
have on a computer, the performance is not improving.  I am willing to
fund a project to implement a Postgres-specific, user-configurable
cache.

Any ideas?
-Edwin S. Ramirez-

Tom Lane wrote:
> Edwin Ramirez <ramirez@doc.mssm.edu> writes:
> > I have a couple of large(?) tables which I would like to keep in
> > memory (cached) so that searches are performed as fast as possible.
> > Is it possible to 'pin' the tables and their indexes in memory?
>
> If the tables are being touched often, then they will stay in buffer
> cache of their own accord.  I doubt that pinning them would improve
> performance --- if they do get swapped out, it'd be because some
> other table(s) need to be accessed now, and if you did have these
> tables pinned you'd be taking a large hit in access performance for
> those other tables because of inadequate buffer space.  LRU buffering
> policy really works pretty well, so I don't think you need to worry
> about it.
>
> > currently I run the postmaster with the following setting:
> >     postmaster -i -B 2048 -o '-S 2048'
> > Are there any other options/values which would yield better
> > performance?
>
> If you have a reliable OS and power source, consider -o -F (no
> fsync).  This usually makes for a very substantial performance
> improvement, and it can only hurt if your machine goes down without
> having performed all the writes the kernel was told to do.
>
>			regards, tom lane
> If I do a large search, the first time is about three times slower
> than any subsequent overlapping (same data) searches.  I would like
> to always get the higher performance.
>
> How are the buffers that I specify to the postmaster used?  Will
> increasing this number improve things?
>
> The issue that I am encountering is that no matter how much memory I
> have on a computer, the performance is not improving.  I am willing
> to fund a project to implement a Postgres-specific, user-configurable
> cache.
>
> Any ideas?
> -Edwin S. Ramirez-

I think that the fact you are seeing an improvement already shows a
good level of caching.  What happens the first time is that it must
read the data off the disc.  After that, the data comes from memory IF
it is cached.  Disc reads will always be slower with current disc
technology.

I would imagine (I'm not an expert, but this is through observation)
that if you drastically increase the number of shared memory buffers,
then when you start up your front-end and simply do a select * from
the tables, it may even keep them all in memory from the start.

M Simms
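As a rough sketch of the warm-up Michael describes (with "mydb" and
"bigtable" standing in for your actual database and table names, and a
larger -B value that assumes your memory and kernel limits allow it):

    postmaster -i -B 8192 -o '-S 2048' &     # many more shared buffers than the default 64
    psql mydb -c 'select * from bigtable'    # read every page once so it lands in the buffer cache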
Michael Simms <grim@argh.demon.co.uk> writes:
>> If I do a large search, the first time is about three times slower
>> than any subsequent overlapping (same data) searches.  I would like
>> to always get the higher performance.

> What happens the first time is that it must read the data off the
> disc.  After that, the data comes from memory IF it is cached.  Disc
> reads will always be slower with current disc technology.

There is that effect, but I suspect Edwin may also be seeing another
effect.  When a tuple is first inserted or modified, it is written into
the table with a marker saying (in effect) "Inserted by transaction
NNN, not committed yet".  To find out whether the tuple is really any
good, you have to go and consult pg_log to see if that transaction got
committed.  Obviously, that's slow, so the first subsequent transaction
that does so and finds that NNN really did get committed will rewrite
the disk page with the tuple's state changed to "Known committed".

So, the first select after an update transaction will spend additional
cycles checking pg_log and marking committed tuples.  In effect, it's
doing the last phase of the update.  We could instead force the update
to do all its own housekeeping, but the overall result wouldn't be any
faster; probably it'd be slower.

> I would imagine (I'm not an expert, but this is through observation)
> that if you drastically increase the number of shared memory buffers,
> then when you start up your front-end and simply do a select * from
> the tables, it may even keep them all in memory from the start.

The default buffer space (64 disk pages) is not very large --- use a
larger -B setting if you have the memory to spare.

			regards, tom lane
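One way to watch the effect Tom describes is to time the same query
twice after an update (with "mydb", "bigtable", and the column "flag"
as made-up stand-ins):

    psql mydb -c 'update bigtable set flag = 1'
    time psql mydb -c 'select count(*) from bigtable'  # first select: also consults pg_log and rewrites tuple markers
    time psql mydb -c 'select count(*) from bigtable'  # repeat: commit status is already recorded on the pages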
I believe that disk pages are 1k on Linux systems; that would mean that
I am allocating 3M when using "postmaster -i -B 3096 -o -S 2048", and
2M for sorting.  That is very low.  However, some of the postgres
processes have memory segments larger than 3M (see bottom).

> I would imagine (I'm not an expert, but this is through observation)
> that if you drastically increase the number of shared memory buffers,
> then when you start up your front-end and simply do a select * from
> the tables, it may even keep them all in memory from the start.

That's basically what I tried to do, but I am unable to specify a very
large number (it complained when I tried -B > ~3900).  Do these buffers
contain the actual table data?

I understand that the OS is buffering the data read from disk, but
postgres is competing with all the other processes on the system.  I
think that if postgres had a dedicated (user-configurable) cache, like
Oracle, then users could configure the system/postgres better.

  4:29pm  up 83 days, 23:42,  5 users,  load average: 0.00, 0.01, 0.00
75 processes: 74 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.1% user,  1.1% system,  0.0% nice, 98.7% idle
Mem:  128216K av,  98812K used,  29404K free,  67064K shrd,  18536K buff
Swap:  80288K av,  22208K used,  58080K free,               14924K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT LIB %CPU %MEM   TIME COMMAND
16633 postgres   0   0 26536 1384  1284 S      0  0.0  1.0   0:02 postmaster
18190 postgres   0   0 27708 3432  2720 S      0  0.0  2.6   0:00 postmaster
18303 postgres   0   0 27444 2728  2196 S      0  0.0  2.1   0:00 postmaster
18991 postgres   0   0 27472 2908  2392 S      0  0.0  2.2   0:00 postmaster
19154 postgres   0   0 27408 2644  2140 S      0  0.0  2.0   0:06 postmaster
19155 postgres   0   0 27428 2712  2188 S      0  0.0  2.1   0:00 postmaster
19157 postgres   0   0 27840  10M 10144 S      0  0.0  8.6   0:08 postmaster
19282 postgres   0   0 27560 3332  2732 S      0  0.0  2.5   0:11 postmaster
19335 postgres   0   0 27524 3112  2528 S      0  0.0  2.4   0:03 postmaster
19434 postgres   0   0 27416 2700  2192 S      0  0.0  2.1   0:00 postmaster
Edwin Ramirez <ramirez@doc.mssm.edu> writes:
> I believe that disk pages are 1k on Linux systems; that would mean
> that I am allocating 3M when using "postmaster -i -B 3096 -o -S
> 2048", and 2M for sorting.  That is very low.

No, buffers are 8K apiece (unless you've changed the BLCKSZ constant in
config.h).  So -B 3096 means 24 meg of buffer space.  The -S number is
indeed measured in kilobytes, however.

> However, some of the postgres processes have memory segments larger
> than 3M (see bottom).

'top' does not show shared memory segments AFAIK, and the buffer area
is a shared memory segment.  Try "ipcs -m -a" to see what's going on in
shared memory.

> That's basically what I tried to do, but I am unable to specify a
> very large number (it complained when I tried -B > ~3900).

You're probably running into a configuration limit of your kernel ---
at a guess, your kernel is configured not to give out shared memory
segments exceeding 32MB.

> I understand that the OS is buffering the data read from disk, but
> postgres is competing with all the other processes on the system.  I
> think that if postgres had a dedicated (user-configurable) cache,
> like Oracle, then users could configure the system/postgres better.

The shared-buffer cache does serve that purpose...

			regards, tom lane
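To check both the arithmetic and the kernel limit Tom suspects,
something like the following should work on a Linux system of that era
(ipcs is portable; the /proc path is Linux-specific):

    expr 3096 \* 8192               # 25362432 bytes, i.e. roughly 24MB of buffer space
    ipcs -m -a                      # show the shared memory segment the postmaster allocated
    cat /proc/sys/kernel/shmmax     # the kernel's per-segment ceiling, in bytes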