Thread: RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From
"Mikheev, Vadim"
Date:
> > BUT, do we know for sure that sleep(0) is not optimized in 
> > the library to just return? 
> 
> We can only do our best here. I think guessing whether other backends
> are _about_ to commit is pretty shaky, and sleeping every time is a
> waste.  This seems the cleanest.

A long time ago you, Bruce, gave me a gift: a book about transaction
processing (thanks again -:)). Sleeping before fsync at commit is
described there as a standard technique, and the reason is clear.
Men, the cost of fsync is very high! { write (64 bytes) + fsync() }
takes ~ 1/50 sec. Yes, an additional 1/200 sec or so results in worse
performance when there is only one backend running, but it greatly
increases overall performance for 100 simultaneous backends. I.e., this
delay is a trade-off to gain better scalability.
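
As a rough illustration of that trade-off, the arithmetic can be sketched
in C. This is a back-of-the-envelope model, not a measurement: it assumes
fsyncs are fully serialized, that every backend in a batch shares one
fsync, and it uses only the ~1/50 sec and ~1/200 sec figures quoted above.
The function name is hypothetical.

```c
/* Commits per second when `batch` commits share a single fsync.
 * All times are in microseconds.  Assumption: one fsync at a time,
 * and every commit in the batch is covered by that one fsync. */
long commits_per_sec(long batch, long fsync_us, long delay_us)
{
    return batch * 1000000L / (fsync_us + delay_us);
}
```

Under these assumptions, commits_per_sec(1, 20000, 0) gives 50 and
commits_per_sec(1, 20000, 5000) gives 40, so a lone backend loses about a
fifth of its throughput. With 100 backends, one fsync per commit still
yields 50 commits/sec in total, while the best case of all 100 sharing one
delayed fsync, commits_per_sec(100, 20000, 5000), gives 4000.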

I agree that it must be configurable, small or probably 0 by
default, that it should use the approximate # of simultaneously running
backends for guessing (postmaster could maintain this number in shmem
and backends could just read it without any locking - an exact number
is not required), and that it should be well described as a tuning
parameter in the documentation.
Anyway, I object to sleep(0).

Vadim
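
A minimal sketch of the configurable delay described in this message,
assuming two hypothetical tunables (commit_delay_us and
min_active_backends are names invented for illustration). The backend
count is passed in as a parameter here; in the proposal it would be an
approximate value read from shared memory without locking.

```c
/* Hypothetical tunables; neither name comes from the thread. */
static long commit_delay_us = 0;       /* 0 by default: never sleep */
static int  min_active_backends = 5;   /* below this, sleeping cannot pay off */

/* How long to sleep before fsync, given an approximate count of
 * running backends.  The count may be slightly stale; an exact
 * number is not required for this guess. */
long pre_fsync_delay_us(int approx_active_backends)
{
    if (commit_delay_us <= 0)
        return 0;                      /* feature disabled */
    if (approx_active_backends < min_active_backends)
        return 0;                      /* too few backends to share an fsync */
    return commit_delay_us;
}
```

With the default of 0 a single backend never pays the extra latency; an
administrator who expects many concurrent commits would raise the delay.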


Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> A long time ago you, Bruce, gave me a gift: a book about transaction
> processing (thanks again -:)). Sleeping before fsync at commit is
> described there as a standard technique, and the reason is clear.
> Men, the cost of fsync is very high! { write (64 bytes) + fsync() }
> takes ~ 1/50 sec. Yes, an additional 1/200 sec or so results in worse
> performance when there is only one backend running, but it greatly
> increases overall performance for 100 simultaneous backends. I.e., this
> delay is a trade-off to gain better scalability.

> I agree that it must be configurable, small or probably 0 by
> default, that it should use the approximate # of simultaneously running
> backends for guessing (postmaster could maintain this number in shmem
> and backends could just read it without any locking - an exact number
> is not required), and that it should be well described as a tuning
> parameter in the documentation.
> Anyway, I object to sleep(0).

Good points.  Another idea that Bruce and I kicked around on the phone
was to make the pre-fsync delay be self-adjusting; that is, it'd
automatically move up and down based on system load.  For example,
you could keep track of the time since the last xact commit, and guess
that the time to the next one will be similar.  If that's greater than
your intended sleep delay, forget the sleep and just fsync.  But the
shorter the time since the last commit, the longer you should be willing
to delay.  This'd need some experimentation to get right, but it seems a
lot better than asking the dbadmin to pick a value.
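
One way the self-adjusting idea could look in C. This is only a sketch:
the linear rule (sleep for whatever part of the intended delay has not
already elapsed since the last commit) is an assumption chosen for
simplicity, and the function name is hypothetical. The paragraph above
only requires that a shorter gap since the last commit leads to a longer
delay, and that a gap larger than the intended delay means no sleep.

```c
/* Pick a pre-fsync delay from the time since the previous commit.
 * Both arguments and the result are in microseconds. */
long adaptive_delay_us(long since_last_commit_us, long max_delay_us)
{
    if (since_last_commit_us < 0)
        since_last_commit_us = 0;          /* guard against clock skew */
    if (since_last_commit_us >= max_delay_us)
        return 0;                          /* commits are rare: just fsync */
    return max_delay_us - since_last_commit_us;
}
```

For example, with a 5000 us ceiling, a commit seen 1000 us ago yields a
4000 us delay, while a commit seen 10000 us ago yields no delay at all.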

Another thing that should happen is that once someone fsyncs, all the
other backends waiting should be awoken immediately, instead of waiting
for their delays to time out.  Not sure how doable this is --- there's
no wait-for-semaphore-with-timeout in SysV IPC, is there?  Perhaps we
can distinguish the first waiter (the guy who will ultimately do the
fsync, he's just hoping for some passengers) from the rest, who see
that someone's already waiting for fsync and just wait for him to do it.
Those other guys don't do a time wait, they sleep on a semaphore that
the first waiter will release once he's done the fsync.
        regards, tom lane
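
The first-waiter/passenger scheme can be sketched with POSIX threads. Note
the substitutions: threads stand in for backends, a pthread condition
variable stands in for the semaphore the passengers sleep on, and a
counter stands in for the real fsync() call. All names and the 2 ms delay
are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  fsync_done = PTHREAD_COND_INITIALIZER;
static long written = 0;         /* commit records handed to the kernel */
static long synced = 0;          /* records covered by a finished fsync */
static int  leader_waiting = 0;  /* is some first waiter already delaying? */
static int  fsync_calls = 0;     /* simulated fsync() count */
static long delay_ns = 2000000;  /* 2 ms, an arbitrary illustrative value */

static void *backend_commit(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    long my_record = ++written;
    while (synced < my_record) {
        if (leader_waiting) {
            /* Passenger: no timed wait, sleep until the leader is done. */
            pthread_cond_wait(&fsync_done, &lock);
        } else {
            /* First waiter: delay a little, hoping for passengers. */
            struct timespec delay = {0, delay_ns};
            leader_waiting = 1;
            pthread_mutex_unlock(&lock);
            nanosleep(&delay, NULL);
            pthread_mutex_lock(&lock);
            long upto = written;   /* everything written so far is covered */
            fsync_calls++;         /* a real backend would fsync the log here */
            synced = upto;
            leader_waiting = 0;
            pthread_cond_broadcast(&fsync_done);  /* wake all passengers now */
        }
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Run up to 64 simulated backends; returns the number of fsyncs used. */
int run_backends(int n)
{
    pthread_t tid[64];
    int started = 0;
    if (n > 64)
        n = 64;
    for (int i = 0; i < n; i++)
        if (pthread_create(&tid[started], NULL, backend_commit, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return fsync_calls;
}
```

A backend that arrives after the leader has taken its snapshot is not
covered by that fsync; it loops and becomes the next first waiter. The
number of fsyncs therefore depends on timing and lies anywhere between 1
and the number of backends.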


Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From
Bruce Momjian
Date:
> > sleep(3) should conform to POSIX specification, if anyone has the
> > reference they can check it to see what the effect of sleep(0)
> > should be.
> 
>   Yes, but Posix also specifies sched_yield() which rather explicitly
> allows a process to yield its timeslice.  No idea how well that is
> supported.

OK, I have a new idea.

There are two parts to transaction commit.  The first is writing all
dirty buffers or log changes to the kernel, and the second is the fsync
of the log file.
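
The two parts can be seen in a small C sketch that uses only standard
POSIX calls. The temporary file under /tmp and the 64-byte dummy record
are assumptions for illustration; a real backend would write to the log
file it already has open.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write one 64-byte record and force it to disk.
 * Returns 0 on success, -1 on failure. */
int write_and_fsync_record(void)
{
    char path[] = "/tmp/commit_sketch_XXXXXX";  /* hypothetical scratch file */
    char record[64];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    memset(record, 'x', sizeof record);

    /* Part 1: hand the data to the kernel (cheap, may stay in cache). */
    int ok = write(fd, record, sizeof record) == (ssize_t) sizeof record;

    /* Part 2: force it to stable storage (the expensive step). */
    ok = ok && fsync(fd) == 0;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Only the second step is worth sharing between backends: one fsync() of
the log covers every record written to it before the call.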

I suggest having a per-backend shared memory byte that has the following
values:
	START_LOG_WRITE
	WAIT_ON_FSYNC
	NOT_IN_COMMIT
	backend_number_doing_fsync

I suggest that when each backend starts a commit, it sets its byte to
START_LOG_WRITE.  When it gets ready to fsync, it checks all backends. 
If all are NOT_IN_COMMIT, it does fsync and continues.

If one or more are in START_LOG_WRITE, it waits until no one is in
START_LOG_WRITE.  It then checks all WAIT_ON_FSYNC, and if it is the
lowest backend in WAIT_ON_FSYNC, marks all others with its backend
number, and does fsync.  It then clears all backends with its number to
NOT_IN_COMMIT.  Other backends will see they are not the lowest
WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
so they can then continue, knowing their data was synced.

This allows a single backend not to sleep, and allows multiple backends
to bunch up only when they are all about to commit.

The reason backend numbers are written is so other backends entering the
commit code will not interfere with the backends performing fsync.
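
The decision each backend makes at fsync time can be sketched in C as
follows. This is a single-threaded model of the status bytes only: it
ignores the races a real shared-memory version would have to handle, it
assumes the caller has already set its own byte to WAIT_ON_FSYNC (a step
the description leaves implicit), and the numeric encodings, type names
and function names are all invented for illustration.

```c
typedef unsigned char CommitByte;

/* Special values; anything below 253 is the number of the backend
 * that is doing the fsync on this backend's behalf. */
enum { WAIT_ON_FSYNC = 253, START_LOG_WRITE = 254, NOT_IN_COMMIT = 255 };

typedef enum {
    FSYNC_ALONE,        /* nobody else is writing or waiting */
    WAIT_FOR_WRITERS,   /* someone is still in START_LOG_WRITE */
    LEAD_GROUP_FSYNC,   /* lowest-numbered waiter: fsync for the group */
    WAIT_FOR_LEADER     /* a lower-numbered waiter will fsync for us */
} Action;

/* What should backend `me` do now?  Assumes slots[me] == WAIT_ON_FSYNC. */
Action next_step(const CommitByte *slots, int n, int me)
{
    int other_waiters = 0, lowest = me;
    for (int i = 0; i < n; i++) {
        if (i == me)
            continue;
        if (slots[i] == START_LOG_WRITE)
            return WAIT_FOR_WRITERS;
        if (slots[i] == WAIT_ON_FSYNC) {
            other_waiters = 1;
            if (i < lowest)
                lowest = i;
        }
        /* Slots holding a backend number belong to another group. */
    }
    if (!other_waiters)
        return FSYNC_ALONE;
    return lowest == me ? LEAD_GROUP_FSYNC : WAIT_FOR_LEADER;
}

/* Leader marks every current waiter (itself included) with its number. */
void claim_group(CommitByte *slots, int n, int me)
{
    for (int i = 0; i < n; i++)
        if (slots[i] == WAIT_ON_FSYNC)
            slots[i] = (CommitByte) me;
}

/* After the fsync, leader releases exactly the backends it claimed. */
void release_group(CommitByte *slots, int n, int me)
{
    for (int i = 0; i < n; i++)
        if (slots[i] == me)
            slots[i] = NOT_IN_COMMIT;
}
```

Because release_group() only touches slots carrying the leader's number,
a backend that enters WAIT_ON_FSYNC while the fsync is in progress is left
alone and joins the next group instead.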

Comments?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From
Bruce Momjian
Date:
Added to TODO:
* Delay fsync() when other backends are about to commit too






Re: PostgreSQL on WinME?

From
Manuel Cabido
Date:
Hi there,
  I would like to inquire about support for running PostgreSQL on
WinME. If anyone knows how, I would be grateful for advice. I need
to run PostgreSQL on my WinME box.

--
                            Manny C. Cabido
                 ====================================
                  e-mail: manny@tinago.msuiit.edu.ph
                          manny@sun.msuiit.edu.ph
                 =====================================