Thread: Re: University Masters Project

Re: University Masters Project

From: Bruce Momjian

> Bruce,
> 
> The replacement of the existing client/server communication project with
> CORBA looks very interesting, I would love to get involved with something
> like that. Is there anyone working on it at the moment? What area of it
> would you like me to look into, any ideas of how I could turn a project like
> that into a good Thesis? If you can give me some pointers I'll go and speak
> to my tutor about it all.


[CC'ing to hackers for comments.]

Well, one idea is to create a server that listens on a certain port for
CORBA requests, sends them to a backend for processing, and returns the
result.
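
To make the first idea concrete, the gateway's main loop might look
roughly like the sketch below. It is only a sketch: PQexec() and
PQclear() are the existing libpq calls, but the orb, conn, and corba_*
names are hypothetical stand-ins for whatever ORB gets chosen.

    /* Sketch of the gateway loop -- the corba_* names are invented. */
    for (;;)
    {
        Request  *req = corba_accept_request(orb);  /* wait on the port   */
        PGresult *res = PQexec(conn, req->sql);     /* existing libpq API */
        corba_send_reply(req, res);                 /* marshal rows back  */
        PQclear(res);
    }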

The other idea is to replace our current communication system, which uses
single-character flags and data, with a CORBA model.  See the developer's
documentation for details on that.

I think the first one is clearly good; the second may suffer from
performance problems, or it may not be worth changing all our interfaces
to handle a new protocol.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] Re: University Masters Project

From: The Hermit Hacker

On Thu, 2 Sep 1999, Bruce Momjian wrote:

> > Bruce,
> > 
> > The replacement of the existing client/server communication project with
> > CORBA looks very interesting, I would love to get involved with something
> > like that. Is there anyone working on it at the moment? What area of it
> > would you like me to look into, any ideas of how I could turn a project like
> > that into a good Thesis? If you can give me some pointers I'll go and speak
> > to my tutor about it all.
> 
> 
> [CC'ing to hackers for comments.]
> 
> Well, one idea is to create a server that listens on a certain port for
> CORBA requests, sends them to a backend for processing, and returns the
> result.
> 
> The other idea is to replace our current communication system, which uses
> single-character flags and data, with a CORBA model.  See the developer's
> documentation for details on that.
> 
> I think the first one is clearly good; the second may suffer from
> performance problems, or it may not be worth changing all our interfaces
> to handle a new protocol.

I'm curious as to whether there is a way of testing that without too much
trouble?  Even the investigation of *that* might make for the thesis in
itself? 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Postgres Performance

From: Edwin Ramirez

Hello,

I am using Postgres extensively for a number of projects.  I am
extremely happy with its performance and flexibility.  I am trying to
optimize the system; currently I run the postmaster with the following
setting:

    postmaster -i -B 2048 -o '-S 2048'

I have a couple of large(?) tables which I would like to keep in
memory (cached) so that searches are performed as fast as possible.

Is it possible to 'pin' the tables and their indexes in memory?
Are there any other options/values which would yield better performance?

Thanks,
-Edwin S. Ramirez-


Re: [HACKERS] Postgres Performance

From: Thomas Lockhart

> I have a couple of large(?) tables which I would like to keep in
> memory (cached) so that searches are performed as fast as possible.
> Is it possible to 'pin' the tables and their indexes in memory?

Not explicitly. We rely on the OS to do that.

> Are there any other options/values which would yield better performance?

By default, the backend "fsyncs" for every query. You can disable
this, which would then allow the tables to hang around in memory until
the OS decides to flush them to disk. Not everyone should do this, since
there is a (small) risk that if your computer crashes after some
updates but before things are flushed, then the db might become
inconsistent. AFAIK we have never had an unambiguous report that this
has actually happened (but others might remember differently). There
is already that risk to some extent, but instead of the window being
O(1 sec) it becomes O(30 sec).

Run the backend by adding '-o -F' (or just '-F' to your existing list
of "-o" options). 
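
For example, starting from the invocation quoted above, the full
command would be:

    postmaster -i -B 2048 -o '-S 2048 -F'
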
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres Performance

From: Tom Lane

Edwin Ramirez <ramirez@doc.mssm.edu> writes:
> I have a couple of large(?) tables which I would like to keep in
> memory (cached) so that searches are performed as fast as possible.
> Is it possible to 'pin' the tables and their indexes in memory?

If the tables are being touched often, then they will stay in buffer
cache of their own accord.  I doubt that pinning them would improve
performance --- if they do get swapped out it'd be because some other
table(s) need to be accessed now, and if you did have these tables
pinned you'd be taking a large hit in access performance for those other
tables because of inadequate buffer space.  LRU buffering policy really
works pretty well, so I don't think you need to worry about it.

> currently I run the postmaster with the following setting: 
>     postmaster -i -B 2048 -o '-S 2048'
> Are there any other options/values which would yield better performance?

If you have a reliable OS and power source, consider -o -F (no fsync).
This usually makes for a very substantial performance improvement, and
it can only hurt if your machine goes down without having performed
all the writes the kernel was told to do.
        regards, tom lane


RE: [HACKERS] Re: University Masters Project

From: "Mark Proctor"

Dear All,

Yes, I agree with you that something like that might make a thesis in
itself, and it definitely sounds interesting.

I really need to sit down and go through PostgreSQL so that I understand how
it all works and can ask questions without wasting everyone's time, as I'm
sure a lot of the questions I currently have are answered in the
documentation. I start Uni in 4 weeks' time, by which point I hope to have
the basics of PostgreSQL and its architecture; that, along with guidance
from my tutor, should give me a good base to start the project on.

I'll keep you all informed of my progress with this over the next few weeks,
and of my University's response to my request to work on a project of this
nature.

Who should I direct my correspondence to? I don't want to start filling
up people's mailboxes with unnecessary email.

Regards

Mark Proctor
Brunel University
Email : M.Proctor@bigfoot.com
ICQ : 8106598

-----Original Message-----
From:    The Hermit Hacker [mailto:scrappy@hub.org]
Sent:    Friday, September 03, 1999 12:19 PM
To:    Bruce Momjian
Cc:    mark@polar-digital.com; PostgreSQL-development
Subject:    Re: [HACKERS] Re: University Masters Project

On Thu, 2 Sep 1999, Bruce Momjian wrote:

> > Bruce,
> >
> > The replacement of the existing client/server communication project with
> > CORBA looks very interesting, I would love to get involved with something
> > like that. Is there anyone working on it at the moment? What area of it
> > would you like me to look into, any ideas of how I could turn a project like
> > that into a good Thesis? If you can give me some pointers I'll go and speak
> > to my tutor about it all.
>
>
> [CC'ing to hackers for comments.]
>
> Well, one idea is to create a server that listens on a certain port for
> CORBA requests, sends them to a backend for processing, and returns the
> result.
>
> The other idea is to replace our current communication system, which uses
> single-character flags and data, with a CORBA model.  See the developer's
> documentation for details on that.
>
> I think the first one is clearly good; the second may suffer from
> performance problems, or it may not be worth changing all our interfaces
> to handle a new protocol.

I'm curious as to whether there is a way of testing that without too much
trouble?  Even the investigation of *that* might make for the thesis in
itself?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org




Re: [HACKERS] Re: University Masters Project

From: Bruce Momjian

> Dear All,
> 
> Yes, I agree with you that something like that might make a thesis in
> itself, and it definitely sounds interesting.
> 
> I really need to sit down and go through PostgreSQL so that I understand how
> it all works and can ask questions without wasting everyone's time, as I'm
> sure a lot of the questions I currently have are answered in the
> documentation. I start Uni in 4 weeks' time, by which point I hope to have
> the basics of PostgreSQL and its architecture; that, along with guidance
> from my tutor, should give me a good base to start the project on.
> 
> I'll keep you all informed of my progress with this over the next few weeks,
> and of my University's response to my request to work on a project of this
> nature.
> 
> Who should I direct my correspondence to? I don't want to start filling
> up people's mailboxes with unnecessary email.

Hackers list is fine.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] Postgres Performance

From: Tom Lane

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> there is a (small) risk that if your computer crashes after some
> updates but before things are flushed, then the db might become
> inconsistent. AFAIK we have never had an unambiguous report that this
> has actually happened (but others might remember differently). There
> is already that risk to some extent, but instead of the window being
> O(1 sec) it becomes O(30 sec).

I believe we use fsync not so much to reduce the time window where you
could lose a supposedly-committed update as to ensure that writes are
performed in a known order.  With fsync enabled, the data-file pages
touched by an update query will hit the disk before the pg_log entry
saying the transaction is committed hits the disk.  If you crash
somewhere during that sequence, the transaction appears uncommitted
and there is no loss of consistency.  (We assume here that writing
a single page to disk is an atomic operation, which is only sort-of
true, but it's the best we can do atop a Unix kernel.  Other than that,
there is no "window" for possible inconsistency.)
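
At the POSIX level, the ordering argument amounts to something like
this sketch (illustrative only, not the actual buffer-manager code):

    #include <unistd.h>

    /* Illustrative only: force the data pages out before the commit
     * record, so a crash can never expose a commit whose data pages
     * are missing. */
    void commit_in_order(int data_fd, const void *page, size_t blcksz,
                         int log_fd, const void *rec, size_t reclen)
    {
        write(data_fd, page, blcksz);
        fsync(data_fd);             /* data pages reach disk first    */
        write(log_fd, rec, reclen);
        fsync(log_fd);              /* only now is the commit durable */
    }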

Without fsync, the kernel writes the pages to disk in whatever order
it finds convenient, so following a crash there might be a pg_log entry
saying transaction N was committed, when in fact only some of
transaction N's tuples made it to disk.  Then you see an inconsistent
database: some of the transaction's updates are there, some are not.
This might be relatively harmless, or deadly, depending on your
application logic and just what the missing updates are.

Another risk without fsync is that a client application might have been
told that the transaction was committed, when in fact it gets lost due to
a crash moments later before pg_log gets physically updated.  Again, the
possible consequences would depend on your application.

The total number of writes performed without fsync is usually way less
than with, since we tend to write certain pages (esp. pg_log) over and
over --- the kernel will reduce that to one physical disk write every
sync interval (~ 30sec) unless we force its hand with fsync.  That's
where most of the performance improvement comes from.

If you have a reliable kernel and reliable hardware/power supply, then
you might as well turn off fsync.  A crash in Postgres itself would
not cause a problem --- the writes are out there in the kernel's disk
buffers, and the only issue is do you trust the platform to get the
data onto stable storage.
        regards, tom lane



Re: [HACKERS] Postgres Performance

From: Edwin Ramirez

If I do a large search, the first time is about three times slower than
any subsequent overlapping (same data) searches.  I would like to always
get the higher performance.

How are the buffers that I specify to the postmaster used?
Will increasing this number improve things?

The issue that I am encountering is that no matter how much memory I
have on a computer, the performance is not improving.  I am willing to
fund a project to implement a postgres specific, user configurable
cache.

Any ideas?
-Edwin S. Ramirez-

Tom Lane wrote:
> 
> Edwin Ramirez <ramirez@doc.mssm.edu> writes:
> > I have a couple of large(?) tables which I would like to keep in
> > memory (cached) so that searches are performed as fast as possible.
> > Is it possible to 'pin' the tables and their indexes in memory?
> 
> If the tables are being touched often, then they will stay in buffer
> cache of their own accord.  I doubt that pinning them would improve
> performance --- if they do get swapped out it'd be because some other
> table(s) need to be accessed now, and if you did have these tables
> pinned you'd be taking a large hit in access performance for those other
> tables because of inadequate buffer space.  LRU buffering policy really
> works pretty well, so I don't think you need to worry about it.
> 
> > currently I run the postmaster with the following setting:
> >       postmaster -i -B 2048 -o '-S 2048'
> > Are there any other options/values which would yield better performance?
> 
> If you have a reliable OS and power source, consider -o -F (no fsync).
> This usually makes for a very substantial performance improvement, and
> it can only hurt if your machine goes down without having performed
> all the writes the kernel was told to do.
> 
>                         regards, tom lane
> 
> ************


Re: [HACKERS] Postgres Performance

From: Michael Simms

> 
> If I do a large search, the first time is about three times slower than
> any subsequent overlapping (same data) searches.  I would like to always
> get the higher performance.
> 
> How are the buffers that I specify to the postmaster used?
> Will increasing this number improve things?
> 
> The issue that I am encountering is that no matter how much memory I
> have on a computer, the performance is not improving.  I am willing to
> fund a project to implement a postgres specific, user configurable
> cache.
> 
> Any ideas?
> -Edwin S. Ramirez-

I think the fact that you are seeing an improvement already shows a good
level of caching.

What happens the first time is that it must read the data off the disc. After
that the data comes from memory IF it is cached. Disc reads will always be
slower with current disc technology.

I would imagine (I'm not an expert, but through observation) that if you
drastically increase the number of shared memory buffers and then, when you
start up your front-end, simply do a select * from the tables, it may even
keep them all in memory from the start.
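
For example, a warm-up pass right after startup might look like this
(the table and database names are hypothetical):

    psql -c 'SELECT * FROM bigtable' mydb
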
                    M Simms


Re: [HACKERS] Postgres Performance

From: Tom Lane

Michael Simms <grim@argh.demon.co.uk> writes:
>> If I do a large search, the first time is about three times slower than
>> any subsequent overlapping (same data) searches.  I would like to always
>> get the higher performance.

> What happens the first time is that it must read the data off the disc. After
> that the data comes from memory IF it is cached. Disc reads will always be
> slower with current disc technology.

There is that effect, but I suspect Edwin may also be seeing another
effect.  When a tuple is first inserted or modified, it is written into
the table with a marker saying (in effect) "Inserted by transaction NNN,
not committed yet".  To find out whether the tuple is really any good,
you have to go and consult pg_log to see if that transaction got
committed.  Obviously, that's slow, so the first subsequent transaction
that does so and finds that NNN really did get committed will rewrite
the disk page with the tuple's state changed to "Known committed".

So, the first select after an update transaction will spend additional
cycles checking pg_log and marking committed tuples.  In effect, it's
doing the last phase of the update.  We could instead force the update
to do all its own housekeeping, but the overall result wouldn't be any
faster; probably it'd be slower.
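
In code terms, the effect is roughly the sketch below; the names are
invented for illustration, and this is not the actual Postgres source:

    /* Sketch only -- invented names, not real Postgres code. */
    typedef struct { unsigned xmin; unsigned infomask; } TupleHdr;
    #define KNOWN_COMMITTED 0x0001

    extern int  pg_log_committed(unsigned xid);  /* hypothetical pg_log probe */
    extern void mark_page_dirty(void *page);     /* hypothetical */

    int tuple_visible(TupleHdr *tup, void *page)
    {
        if (!(tup->infomask & KNOWN_COMMITTED))
        {
            if (!pg_log_committed(tup->xmin))
                return 0;                      /* inserter aborted or still open  */
            tup->infomask |= KNOWN_COMMITTED;  /* later readers can skip pg_log   */
            mark_page_dirty(page);             /* the first SELECT pays this write */
        }
        return 1;
    }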

> I would imagine (I'm not an expert, but through observation) that if
> you drastically increase the number of shared memory buffers and then,
> when you start up your front-end, simply do a select * from the tables,
> it may even keep them all in memory from the start.

The default buffer space (64 disk pages) is not very large --- use
a larger -B setting if you have the memory to spare.
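
For scale, with the default 8K buffer size:

    64 buffers   * 8K =   512K   (the default)
    2048 buffers * 8K = 16384K   (the -B 2048 quoted above, about 16M)
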
        regards, tom lane


Re: [HACKERS] Postgres Performance

From: Edwin Ramirez

I believe that disk pages are 1K on Linux systems; that would mean that
I am allocating 3M when using "postmaster -i -B 3096 -o -S 2048", plus 2M
for sorting.  That is very low.
However, some of the postgres processes have memory segments larger
than 3M (see bottom).

> I would imagine (Im not an expert, but through observation) that if
> you drasticly increase the number of shared memory buffers, then when
> you startup your front-end simply do a select * from the tables, it
> may even keep them all in memory from the start.

That's basically what I tried to do, but I am unable to specify a very
large number (it complained when I tried -B > ~3900).  Do these buffers
contain the actual table data?
I understand that the OS is buffering the data read from disk, but
postgres is competing with all the other processes on the system.  I
think that if postgres had a dedicated (user configurable) cache, like
Oracle, then users could configure the system/postgres better.


 4:29pm  up 83 days, 23:42,  5 users,  load average: 0.00, 0.01, 0.00
75 processes: 74 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.1% user,  1.1% system,  0.0% nice, 98.7% idle
Mem:  128216K av,  98812K used,  29404K free,  67064K shrd,  18536K buff
Swap:  80288K av,  22208K used,  58080K free                14924K cached
  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
16633 postgres   0   0 26536 1384  1284 S       0  0.0  1.0   0:02 postmaster
18190 postgres   0   0 27708 3432  2720 S       0  0.0  2.6   0:00 postmaster
18303 postgres   0   0 27444 2728  2196 S       0  0.0  2.1   0:00 postmaster
18991 postgres   0   0 27472 2908  2392 S       0  0.0  2.2   0:00 postmaster
19154 postgres   0   0 27408 2644  2140 S       0  0.0  2.0   0:06 postmaster
19155 postgres   0   0 27428 2712  2188 S       0  0.0  2.1   0:00 postmaster
19157 postgres   0   0 27840  10M 10144 S       0  0.0  8.6   0:08 postmaster
19282 postgres   0   0 27560 3332  2732 S       0  0.0  2.5   0:11 postmaster
19335 postgres   0   0 27524 3112  2528 S       0  0.0  2.4   0:03 postmaster
19434 postgres   0   0 27416 2700  2192 S       0  0.0  2.1   0:00 postmaster


Re: [HACKERS] Postgres Performance

From: Tom Lane

Edwin Ramirez <ramirez@doc.mssm.edu> writes:
> I believe that disk pages are 1K on Linux systems; that would mean that
> I am allocating 3M when using "postmaster -i -B 3096 -o -S 2048", plus 2M
> for sorting.  That is very low.

No, buffers are 8K apiece (unless you've changed the BLCKSZ constant in
config.h).  So -B 3096 means 24 meg of buffer space.  The -S number is
indeed measured in kilobytes, however.
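
Worked out:

    3096 buffers * 8192 bytes = 25362432 bytes, or about 24M
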

>     However, some of the postgres processes have memory segments larger
> than 3M (see bottom).

'top' does not show shared memory segments AFAIK, and the buffer area is
a shared memory segment.  Try "ipcs -m -a" to see what's going on in
shared memory.

> That's basically what I tried to do, but I am unable to specify a very
> large number (it complained when I tried -B > ~3900).

You're probably running into a configuration limit of your kernel ---
at a guess, your kernel is configured not to give out shared memory
segments exceeding 32Mb.
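
That squares with the arithmetic: 3900 buffers * 8K is roughly 30M, and
the rest of the shared segment would push the total past 32M.  On Linux,
if the kernel is recent enough, the limit can be inspected and raised via
/proc without a rebuild (the value below is just an illustration; the
echo must be run as root):

    cat /proc/sys/kernel/shmmax               # current limit, in bytes
    echo 67108864 > /proc/sys/kernel/shmmax   # raise it to 64M
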

> I understand that the OS is buffering the data read from disk, but
> postgres is competing with all the other processes on the system.  I
> think that if postgres had a dedicated (user configurable) cache, like
> Oracle, then users could configure the system/postgres better.

The shared-buffer cache does serve that purpose...
        regards, tom lane