Thread: Advice: Where could I be of help?

Advice: Where could I be of help?

From: "Curtis Faith"

All,

I'd like to help work on some 7.4 features; however, since you've not seen
my name before, I'm obviously new to the list and the org.

I really like working on speed optimizations and rewrites. I have 15 years
of experience with C++-based systems and databases, and have worked on
commercial database engines (i.e. indexing and query execution systems), SQL
execution and optimization, and various lex- and yacc-based compilers and
parsers. I've generally been able to get code to perform as well as or better
than competitive systems with similar functionality, and usually have been
able to beat other code by 3 to 10x. My Unix experience is reasonable but
I'm not an expert.

Any suggestions for where to start? I don't mind digging into very hairy
code or large problems. I'm willing to run the risk of a patch not being
accepted (for large changes), since I'll make sure whatever I do is well
known to those who will make the accept/deny decision, and that the
approach is approved ahead of time.

Since I'm new here, I'm thinking a problem that would not otherwise get
handled by the experienced group would be the best place to start. Where is
the system especially slow?

I've read the TODOs, and the last five months of the archives for this
list, so I have some general ideas.

I've also had a lot of experience marketing to I.T. organizations, so I'd be
happy to help out on product marketing for PostgreSQL advocacy, i.e.
developing a marketing strategy, press releases, etc.

- Curtis

Curtis Faith
Principal
Galt Capital, LLP

------------------------------------------------------------------
Galt Capital                            http://www.galtcapital.com
12 Wimmelskafts Gade
Post Office Box 7549                           voice: 340.776.0144
Charlotte Amalie,  St. Thomas                    fax: 340.776.0244
United States Virgin Islands  00801             cell: 340.643.5368



Re: Advice: Where could I be of help?

From: Mike Benoit

I'm not a developer, but I know this item on the todo list has been a
major pain in my side for quite a while:

# Make IN/NOT IN have similar performance to EXISTS/NOT EXISTS
[http://momjian.postgresql.org/cgi-bin/pgtodo?exists]

Any time I've attempted to use this feature, the query cost is in the
millions according to "explain", which of course makes it useless to
even execute. :(

I have managed to work around this performance problem, but it sure
would be nice if PGSQL handled such cases better.

There are probably thousands of other todo items you could spend your
time on that would be more useful to more people, but this is just one
suggestion. :)





Re: Advice: Where could I be of help?

From: Bruce Momjian

I would read the developers corner stuff, the developers FAQ, pick a
TODO item, and try a patch.  It's that simple.  Feel free to contact me
for specific advice.  I am on chat at:

    AIM:    bmomjian
    ICQ:    151255111
    Yahoo:  bmomjian
    MSN:    root@candle.pha.pa.us
    IRC:    #postgresql via EFNet


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Advice: Where could I be of help?

From: Neil Conway

"Curtis Faith" <curtis@galtair.com> writes:
> I'd like to help work on some 7.4 features, however, since you've
> not seen my name before, I'm obviously new to the list and the org.

[...]

> Any suggestions for where to start?

Well, I'd suggest working on what you find interesting -- there is
room for improvement in just about every area of the system, so don't
let us dictate how you spend your free time.

That said, the code for hash indexes requires some *major* changes,
and AFAIK none of the core developers are planning on working on it
any time soon (and since the hash index code is somewhat isolated, it
might be a good place to start). There's also plenty of work remaining
to be done for replication -- the pgreplication project could use some
help. You could also improve PL/pgSQL -- there are a bunch of
relatively minor improvements that could be made. You could also try
implementing bitmap indexes, or improving GEQO (the genetic-algorithm
based query optimizer).

HTH,

Neil

-- 
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC



Re: Advice: Where could I be of help?

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I would read the developers corner stuff, the developers FAQ, pick a
> TODO item, and try a patch.  It's that simple.

Yup.  I'd also suggest starting with something relatively small and
localized (the nearby suggestion to fix IN/EXISTS, for example, is
probably not a good first project --- and anyway I was going to work
on that myself this month ;-)).

Neil Conway's thought of working on plpgsql seems a good one to me;
and as he says, there's lots of other possibilities.  What do you
find interesting in the TODO list?
        regards, tom lane


Re: Advice: Where could I be of help?

From: Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I would read the developers corner stuff, the developers FAQ, pick a
> > TODO item, and try a patch.  It's that simple.
> 
> Yup.  I'd also suggest starting with something relatively small and
> localized (the nearby suggestion to fix IN/EXISTS, for example, is
> probably not a good first project --- and anyway I was going to work
> on that myself this month ;-)).

That's good news.  I am getting a little embarrassed because I had to
explain the workarounds to someone this week, twice.

As it stands now, when is EXISTS quicker than IN?  It isn't always,
right?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Advice: Where could I be of help?

From: "Curtis Faith"

tom lane wrote:
> But more globally, I think that our worst problems these days have to do
> with planner misestimations leading to bad plans.  The planner is
> usually *capable* of generating a good plan, but all too often it picks
> the wrong one.  We need work on improving the cost modeling equations
> to be closer to reality.  If that's at all close to your sphere of
> interest then I think it should be #1 priority --- it's both localized,
> which I think is important for a first project, and potentially a
> considerable win.

This seems like a very interesting problem. One approach I thought would
be interesting, and would sidestep the problem of trying to figure out
the right numbers up front, is to make initial guesses based on
statistics gathered during vacuum and general running, and then have the
planner run the "best" plan.

Then, during execution, if the planner turned out to be VERY wrong about
certain assumptions, the execution system could update the stats that led
to those wrong assumptions. That way the system would seek the correct
values automatically. We could also gather the stats that the system
produces for certain actual databases and then use those to make smarter
initial guesses.

I've found that I can never predict costs. I always end up testing
empirically and find myself surprised at the results.

We should be able to make the executor smart enough to keep count of actual
costs (or a statistical approximation) without introducing any significant
overhead.
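
To make that concrete, here is a rough sketch in C of the kind of
feedback hook I have in mind. Everything here is invented for
illustration -- none of these names exist in the tree, and the constants
are guesses that would need tuning:

#include <math.h>

/* One correction factor per statistic the planner consults. */
typedef struct EstimatorStat
{
    double      correction;     /* multiplier applied to future estimates;
                                 * starts at 1.0 */
} EstimatorStat;

#define SMOOTHING   0.2         /* how quickly we believe new evidence */
#define SURPRISE    10.0        /* only react to errors of 10x or worse */

/* Hypothetically called by the executor when a plan node finishes. */
static void
estimator_feedback(EstimatorStat *stat,
                   double estimated_rows, double actual_rows)
{
    double      ratio;

    if (estimated_rows < 1.0)
        estimated_rows = 1.0;
    if (actual_rows < 1.0)
        actual_rows = 1.0;
    ratio = actual_rows / estimated_rows;

    /* Ignore ordinary noise; only VERY wrong estimates trigger feedback. */
    if (ratio < SURPRISE && ratio > 1.0 / SURPRISE)
        return;

    /* Nudge the correction part way toward the observed error, so the
     * system converges instead of oscillating on one bad query. */
    stat->correction *= pow(ratio, SMOOTHING);
}

The damping (the SMOOTHING exponent) matters: reacting fully to a single
bad query would make plans flip back and forth, while a partial nudge
converges over repeated surprises.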

tom lane also wrote:
> There is no "cache flushing".  We have a shared buffer cache management
> algorithm that's straight LRU across all buffers.  There's been some
> interest in smarter cache-replacement code; I believe Neil Conway is
> messing around with an LRU-2 implementation right now.  If you've got
> better ideas we're all ears.

Hmmm, this is the area that I think could lead to huge performance gains.

Consider a simple system with a table tbl_master that gets read by each
process many times, but with very infrequent inserts, and that contains
about 3,000 rows. The single but heavily used index for this table is a
btree of depth three, with twenty 8K pages in the first two levels.

There is another table, tbl_detail, with 10 indexes, that gets very
frequent inserts. It has over 300,000 rows. Some queries result in index
scans over the approximately 5,000 8K pages in the index.

There is a 40M shared cache for this system.

Every time a query which requires the index scan runs, it will blow out
the entire cache, since the scan will load more blocks than the cache
holds (5,000 pages x 8K = 40M, the size of the whole cache). Only blocks
that are accessed while the scan is going will survive. LRU is bad, bad,
bad!

LRU-2 might be better but it seems like it still won't give enough priority
to the most frequently used blocks. I don't see how it would do better for
the above case.

I once implemented a modified cache algorithm that was based on the clock
algorithm for VM page caches. VM paging is similar to databases in that
there is definite locality of reference and certain pages are MUCH more
likely to be requested.

The basic idea was to have a flag in each block that represented the access
time in clock intervals. Imagine a clock hand sweeping across a clock;
every access is like a tiny movement in the clock hand. Blocks that are not
accessed during a sweep are candidates for removal.

My modification was to use access counts to increase the durability of the
more frequently accessed blocks. Each time a block is accessed, its flag is
shifted left (up to a maximum number of shifts - ShiftN) and 1 is added to
it. Every so many cache accesses (and synchronously when the cache is full),
a pass is made over each block, right-shifting the flags (a clock sweep).
This can also be done one block at a time on each access, so the clock is
directly linked to the cache access rate. Any blocks whose flag reaches 0
are placed into a doubly linked list of candidates for removal. New cache
blocks are allocated from the list of candidates. An access of a block on
the candidate list just removes it from the list.
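
In rough C, the scheme looks something like this (a sketch from memory;
these are not PostgreSQL's buffer manager structures, and the candidate
list is simplified to singly linked for brevity):

#define SHIFT_N   8                     /* maximum number of left shifts */
#define FLAG_MAX  ((1U << SHIFT_N) - 1)

typedef struct Block
{
    unsigned int flag;                  /* access-recency counter */
    struct Block *next_cand;            /* link in the candidate list */
    /* ... buffer tag, page contents, etc. ... */
} Block;

static Block *candidates = NULL;        /* blocks eligible for replacement */

/* On every cache hit: shift left (bounded) and add 1.  A frequently hit
 * block saturates at FLAG_MAX, all bits set. */
static void
block_accessed(Block *b)
{
    b->flag = ((b->flag << 1) | 1) & FLAG_MAX;
    /* if b is currently on the candidate list, unlink it here */
}

/* One step of the clock sweep: right-shift the flag, and when it hits
 * zero make the block a candidate for replacement. */
static void
sweep_block(Block *b)
{
    if (b->flag == 0)
        return;                         /* already a candidate */
    b->flag >>= 1;
    if (b->flag == 0)
    {
        b->next_cand = candidates;
        candidates = b;
    }
}

A block hit on every access saturates at FLAG_MAX, so it must sit idle
for ShiftN full sweeps before it can be evicted -- that is the durability
effect described above.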

An index root node page would likely be accessed frequently enough that
all its bits would be set, so it would take ShiftN clock sweeps to become
a removal candidate.

This algorithm increased the cache hit ratio from 40% to about 90% for the
cases I tested when compared to a simple LRU mechanism. The paging ratio is
greatly dependent on the ratio of the actual database size to the cache
size.

The bottom line is that it is very important to keep frequently accessed
blocks in the cache. The top levels of a large btree are accessed many
hundreds of times more frequently than the leaf pages (actually a power of
the number of keys per page: with 100 keys per page, each level is hit
roughly 100 times as often as the level below it). LRU can be the worst
possible algorithm for something like an index or table scan of a large
table, since it flushes a large number of potentially frequently accessed
blocks in favor of ones that are very unlikely to be retrieved again.

tom lane also wrote:
> This is an interesting area.  Keep in mind though that Postgres is a
> portable DB that tries to be agnostic about what kernel and filesystem
> it's sitting on top of --- and in any case it does not run as root, so
> has very limited ability to affect what the kernel/filesystem do.
> I'm not sure how much can be done without losing those portability
> advantages.

The kinds of things I was thinking about should be very portable. I found
that simply writing the cache in order of file system offset results in
greatly improved performance, since it lets the head seek in smaller
increments and much more smoothly, especially with modern disks. Most of
the time the file system will lay a file down as large sequential runs of
bytes on the physical disk, in order. A file might be in a few chunks, but
those chunks will be sequential and fairly large.
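
Concretely, the ordering step is just a sort of the dirty-buffer list
before the write-out loop. A minimal sketch, where DirtyRef and
flush_buffer are invented names for illustration:

#include <stdlib.h>

typedef struct DirtyRef
{
    int         file_id;        /* which relation file */
    long        block_num;      /* block offset within that file */
    int         buf_id;         /* slot in the shared buffer cache */
} DirtyRef;

extern void flush_buffer(int buf_id);   /* hypothetical write-out call */

static int
cmp_dirty(const void *a, const void *b)
{
    const DirtyRef *x = (const DirtyRef *) a;
    const DirtyRef *y = (const DirtyRef *) b;

    if (x->file_id != y->file_id)
        return (x->file_id < y->file_id) ? -1 : 1;
    if (x->block_num != y->block_num)
        return (x->block_num < y->block_num) ? -1 : 1;
    return 0;
}

/* Issue the writes in ascending (file, block) order so the disk head
 * moves in one direction instead of seeking randomly. */
static void
write_dirty_in_order(DirtyRef *refs, size_t n)
{
    size_t      i;

    qsort(refs, n, sizeof(DirtyRef), cmp_dirty);
    for (i = 0; i < n; i++)
        flush_buffer(refs[i].buf_id);
}

Even where the kernel reorders writes again, handing it an
already-ascending stream shouldn't hurt, and it helps on file systems
that do no reordering at all.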

tom lane also wrote:
> Well, not really all that isolated.  The bottom-level index code doesn't
> know whether you're doing INSERT or UPDATE, and would have no easy
> access to the original tuple if it did know.  The original theory about
> this was that the planner could detect the situation where the index(es)
> don't overlap the set of columns being changed by the UPDATE, which
> would be nice since there'd be zero runtime overhead.  Unfortunately
> that breaks down if any BEFORE UPDATE triggers are fired that modify the
> tuple being stored.  So all in all it turns out to be a tad messy to fit
> this in :-(.  I am unconvinced that the impact would be huge anyway,
> especially as of 7.3 which has a shortcut path for dead index entries.

Well, this probably is not the right place to start then.

- Curtis



Re: Advice: Where could I be of help?

From: Tom Lane

"Curtis Faith" <curtis@galtair.com> writes:
> Then during execution if the planner turned out to be VERY wrong about
> certain assumptions the execution system could update the stats that led to
> those wrong assumptions. That way the system would seek the correct values
> automatically.

That has been suggested before, but I'm unsure how to make it work.
There are a lot of parameters involved in any planning decision and it's
not obvious which ones to tweak, or in which direction, if the plan
turns out to be bad.  But if you can come up with some ideas, go to
it!

> Every time a query which requires the index scan runs, it will blow out
> the entire cache, since the scan will load more blocks than the cache
> holds.

Right, that's the scenario that kills simple LRU ...

> LRU-2 might be better but it seems like it still won't give enough priority
> to the most frequently used blocks.

Blocks touched more than once per query (like the upper-level index
blocks) will survive under LRU-2.  Blocks touched once per query won't.
Seems to me that it should be a win.
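
The core replacement rule of LRU-2 is tiny, for what it's worth. A toy
sketch (not Neil's code, obviously; all names invented):

/* Each buffer remembers its last two access times; the eviction victim
 * is the buffer whose *second-to-last* access is oldest.  A buffer
 * touched only once still has penult == 0, so single-touch buffers
 * always lose to recently twice-touched ones --- which is why the
 * upper-level index blocks survive a big scan. */
typedef struct Buf2
{
    unsigned long last;                 /* most recent access time */
    unsigned long penult;               /* the access before that, or 0 */
} Buf2;

static unsigned long access_clock = 0;

static void
buf2_touch(Buf2 *b)
{
    b->penult = b->last;
    b->last = ++access_clock;
}

static int
buf2_pick_victim(Buf2 *bufs, int nbufs)
{
    int         i;
    int         victim = 0;

    for (i = 1; i < nbufs; i++)
    {
        if (bufs[i].penult < bufs[victim].penult ||
            (bufs[i].penult == bufs[victim].penult &&
             bufs[i].last < bufs[victim].last))
            victim = i;
    }
    return victim;
}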

> My modification was to use access counts to increase the durability of the
> more accessed blocks.

You could do it that way too, but I'm unsure whether the extra
complexity will buy anything.  Ultimately, I think an LRU-anything
algorithm is equivalent to a clock sweep for those pages that only get
touched once per some-long-interval: the single-touch guys get recycled
in order of last use, which seems just like a clock sweep around the
cache.  The guys with some amount of preference get excluded from the
once-around sweep.  To determine whether LRU-2 is better or worse than
some other preference algorithm requires a finer grain of analysis than
this.  I'm not a fan of "more complex must be better", so I'd want to see
why it's better before buying into it ...

> The kinds of things I was thinking about should be very portable. I found
> that simply writing the cache in order of the file system offset results in
> very greatly improved performance since it lets the head seek in smaller
> increments and much more smoothly, especially with modern disks.

Shouldn't the OS be responsible for scheduling those writes
appropriately?  Ye good olde elevator algorithm ought to handle this;
and it's at least one layer closer to the actual disk layout than we
are, thus more likely to issue the writes in a good order.  It's worth
experimenting with, perhaps, but I'm pretty dubious about it.

BTW, one other thing that Vadim kept saying we should do is alter the
cache management strategy to retain dirty blocks in memory (ie, give
some amount of preference to as-yet-unwritten dirty pages compared to
clean pages).  There is no reliability cost here since the WAL will let
us reconstruct any dirty pages if we crash before they get written; and
the periodic checkpoints will ensure that we eventually write a dirty
block and thus it will become available for recycling.  This seems like
a promising line of thought that's orthogonal to the basic
LRU-vs-whatever issue.  Nobody's got round to looking at it yet though.
I've got no idea how much preference should be given to a dirty block
--- not infinite, probably, but some.
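
A crude sketch of what finite preference might look like in a clock-sweep
context (illustrative only --- these are not our real buffer headers, and
DIRTY_LIVES is a number pulled out of the air):

#define DIRTY_LIVES 2   /* extra sweeps a dirty page survives; tunable */

typedef struct CBuf
{
    int         usage;      /* ordinary clock usage count */
    int         is_dirty;   /* unwritten changes, recoverable via WAL */
    int         lives;      /* reset to DIRTY_LIVES when page is dirtied */
} CBuf;

/* One clock-sweep visit: returns 1 if the buffer may be recycled now. */
static int
sweep_visit(CBuf *buf)
{
    if (buf->usage > 0)
    {
        buf->usage--;
        return 0;           /* recently used, keep it */
    }
    if (buf->is_dirty && buf->lives > 0)
    {
        buf->lives--;       /* finite preference, not infinite:
                             * a checkpoint will write it eventually */
        return 0;
    }
    return 1;               /* clean, or a dirty page out of lives */
}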
        regards, tom lane


Re: Advice: Where could I be of help?

From: "Curtis Faith"

I wrote:

> > My modification was to use access counts to increase the durability
> > of the more accessed blocks.

tom lane replies:
> You could do it that way too, but I'm unsure whether the extra
> complexity will buy anything.  Ultimately, I think an LRU-anything
> algorithm is equivalent to a clock sweep for those pages that only get
> touched once per some-long-interval: the single-touch guys get recycled
> in order of last use, which seems just like a clock sweep around the
> cache.  The guys with some amount of preference get excluded from the
> once-around sweep.  To determine whether LRU-2 is better or worse than
> some other preference algorithm requires a finer grain of analysis than
> this.  I'm not a fan of "more complex must be better", so I'd want to see
> why it's better before buying into it ...

I'm definitely not a fan of "more complex must be better" either. In
fact, it's surprising how often the real performance problems have easy,
simple fixes, while many person-years are spent solving the issue everyone
"knows" must be causing the performance problems, only to find little
gain.

The key here is empirical testing. If the cache hit ratio for LRU-2 is
much better, then there may be no need here. OTOH, it took less than 30
lines or so of code to do what I described, so I don't consider it too,
too "more complex" :=} We should run a test which includes scanning
indexes (or is "indices" the PostgreSQL convention?) that are three or
more times the size of the cache to see how well LRU-2 works. Is there
any cache performance reporting built into pgsql?

tom lane wrote:
> Shouldn't the OS be responsible for scheduling those writes
> appropriately?  Ye good olde elevator algorithm ought to handle this;
> and it's at least one layer closer to the actual disk layout than we
> are, thus more likely to issue the writes in a good order.  It's worth
> experimenting with, perhaps, but I'm pretty dubious about it.

I wasn't proposing anything other than changing the order of the writes,
not actually ensuring that they get written that way at the level you
describe above. This will help a lot on brain-dead file systems that
can't do this ordering and probably also in cases where the number
of blocks in the cache is very large.

On a related note, while looking at the code, it seems to me that we are
writing out the buffer cache synchronously, so there won't be any
possibility of the file system reordering anyway. This appears to be a
huge performance problem. I've read claims in the archives that the
buffers are written asynchronously, but my reading of the code says
otherwise. Can someone point out my error?

I only see calls that ultimately call FileWrite or write(2), which will
block without an O_NONBLOCK open. I thought one of the main reasons for
having a WAL is so that you can write out the buffers asynchronously.

What am I missing?

I wrote:
> > Then during execution if the planner turned out to be VERY wrong about
> > certain assumptions, the execution system could update the stats that
> > led to those wrong assumptions. That way the system would seek the
> > correct values automatically.

tom lane replied:
> That has been suggested before, but I'm unsure how to make it work.
> There are a lot of parameters involved in any planning decision and it's
> not obvious which ones to tweak, or in which direction, if the plan
> turns out to be bad.  But if you can come up with some ideas, go to
> it!

I'll have to look at the current planner before I can suggest
anything concrete.

- Curtis