Re: Enough RAM for entire Database.. cost aside, is this - Mailing list pgsql-general

From Mike Rylander
Subject Re: Enough RAM for entire Database.. cost aside, is this
Msg-id cc5a5u$15ga$1@news.hub.org
In response to Enough RAM for entire Database.. cost aside, is this going to be fastest?  ("Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk>)
List pgsql-general
<posted & mailed>

Andy B wrote:

> Hello again Mike,
>
> Thanks for the replies! Here's my next salvo!
-- snip --
>
>
> Imagine this scenario. Your application uses five pieces of data - A, B, C,
> D and E. Only three can sit in memory at once.
>
> You want to do two transactions. The first combines A, B and C, so you
> load those, in order.
>
> Once you're finished with that first transaction, you then need to
> combine A, C and D.
>
> Now, an application-driven cache will see immediately that the optimal
> cache replacement is to load D into the slot taken by B. With the OS in
> charge, you'll probably end up with cache thrashing.
>

If all the data fits into memory, then this cache thrashing won't occur,
yes?

> This is simplistic, but it is just this sort of smart caching that can
> lead to dramatic speedups, and I've got quite a bit of experience of
> seeing just that.
>
> I don't want to devalue the linux cache strategy. It is very appropriate
> for almost all situations. I'm talking about a very specific scenario here
> though.
>
> Does that make sense?
>

It does.  And I will support your strategy as soon as there is some
empirical evidence surrounding PG.  I'm all for more speed!

>
> Again, in my scenario, postgres is *relieved* of the task of having to
> decide what data to choose to be in the workspace, since ALL of the data
> is accessible there in my scenario.
>

It's not deciding what to pull into the workspace that costs, it's finding
the target data in the workspace.  As I've said, I'm no expert in caching
strategies (I will leave that to the likes of Tom Lane and Linus... and
you, if you would like to submit a patch :).

> The point about postgres having to shovel through millions of tuples is a
> different point. If I do a query which can only be executed by looking at
> a million tuples, then that is what has to happen, no matter what caching
> strategy you employ. Obviously, if you're shovelling data in and out of
> memory, then the *way* you traverse those tuples becomes important to
> efficiency, as loading sequential stuff off a disk is always faster.
(we both want it all in memory... I never want to hit a disk for a read!)
> If
> it's *all* in the workspace, well it doesn't matter at all then - the
> planner can forget about disks and choose the fastest algorithmic option.
>

Well, I think I have a way for you to test this...:

Test of BIG shared_buffers:
------------------------------
1) inflate shared_buffers as much as possible.
2) deflate effective_cache_size as much as possible.
3) set random_page_cost to 0.01, or even lower (force the planner to think
disk = RAM)
4) for all relevant tables, do SELECT count(*) FROM table;
5) run queries involving many tables using EXPLAIN ANALYZE and collect the
times

Test of currently recommended shared_buffers:
-------------------------------------------------
1) set shared_buffers to 20000 (a goodly setting for a production DB...)
2) set effective_cache_size using current recommendations (see General Bits,
etc.)
3) proceed to steps 4 and 5 above

Unfortunately, I don't have a machine on which to test this at the moment,
but if anyone out there does, I would find the results very interesting!
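To make that recipe concrete, here is a rough sketch of the two runs.  The
"orders" and "customers" tables and all the numbers are placeholders, only
there to illustrate; shared_buffers itself has to go in postgresql.conf and
needs a restart, so only the per-session knobs are shown:

-- Run 1: BIG shared_buffers (postgresql.conf: shared_buffers as large as RAM allows)
SET effective_cache_size = 1000;   -- deflated: tell the planner the OS cache is tiny
SET random_page_cost = 0.01;       -- make the planner treat disk reads like RAM reads

SELECT count(*) FROM orders;       -- step 4: touch every relevant table to warm the buffers
SELECT count(*) FROM customers;

EXPLAIN ANALYZE                    -- step 5: collect timings for a multi-table query
SELECT c.name, sum(o.total)
  FROM customers c JOIN orders o ON o.customer_id = c.id
 GROUP BY c.name;

-- Run 2: recommended shared_buffers (postgresql.conf: shared_buffers = 20000)
SET effective_cache_size = 100000; -- sized to the expected OS cache
SET random_page_cost = 4;          -- stock default
-- then repeat steps 4 and 5 and compare the EXPLAIN ANALYZE timings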

> Again - this is a simplification, but I hope you can see what I'm getting
> at.
>
>> Come now, Nov. 2001?  Most of the issues brought up here have been solved
>> *in the kernel* since then.  (In the general case, with the bevy of I/O
>> schedulers and the interruptible kernel.)
>
> The point is that if your data is sitting in two parts of memory at the
> same time, then that's a waste of memory, and it's extra time copying all
> that stuff around. The date of the article isn't relevant to this, I don't
> think.
>

I understand what you are saying.  My point was simply that the specific
issues raised in the article have been largely addressed in kernel space on
linux.

> They make the point that no matter how efficient the OS kernel is, in the
> sort of scenario I am interested in, it just ends up being an unnecessary
> step.

That is true, and will be until PG implements its own cache memory
management AND on-disk storage.  I'd be willing to bet that any patch that
dramatically increases speed for the majority of use cases would be strongly
considered.

>>
>> Well, Oracle isn't PG, and I can't speak to their buffering and caching
>> implementation.
>
> I'm just saying that these issues are generic issues for any high
> performance database server, regardless of whether it is Oracle,
> Postgresql, Mysql etc. The things they highlight in that article are the
> very things I allude to here for my special scenario. Even if you're a
> postgresql fan (and I chose to use it too), it is worth poking your head
> over the parapet and seeing how others are dealing with these issues too.
>

I understand and agree.  In truth, that was an off-handed comment coming
more from my own ignorance of the issue at hand than from my desire to stay
within my "walls"... the article presents interesting points, and is worth
investigating.

>
>  I've spent years getting data on and off disks quickly, and even
> something as simple as a transfer that isn't aligned to a machine page
> boundary can slow things down by an order of magnitude. It's just a
> question of taking the straightest, most 'hardware optimised' route.
>

I couldn't agree more, though I think at this point we are arguing around
the problem.  We both want all the data in RAM... it's where the data stays
for the majority of its off-disk time we disagree on.

If all the data is in memory, whether in the OS cache or the app cache, then
the question is who is better at looking up that data.

Finding the target data (a complex lookup in the PG shared buffers) will take
longer than transferring disk pages from the OS cache, based on index
information, and then doing a simple, fast scan.  This is the current
working assumption for postgres.

Your contention, correct me if I've gotten this wrong, is that transferring
it from the OS cache will be slower than finding it using a more complex
lookup, but without moving the data from the OS cache to the PG workspace.

In both scenarios all data is in RAM.

I see what you mean about a waste of memory, but if the costing parameters
are tuned well for the use of the specific DB then current evidence with
postgres shows that the former strategy is faster.  I would be interested
to see if the latter scenario could be made to work with the current PG
codebase, or even if there is a simple patch that could speed up the
shared_buffers case.
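For what it's worth, there is a rough way to see where reads are actually
being served from.  A sketch (assuming stats_block_level = true in
postgresql.conf; "orders" is again just a placeholder table):

SELECT relname,
       heap_blks_hit,    -- pages found already sitting in PG's shared_buffers
       heap_blks_read,   -- pages requested from the kernel (OS cache or real disk)
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS pct_from_shared_buffers
  FROM pg_statio_user_tables
 WHERE relname = 'orders';

Of course, heap_blks_read can't tell the OS cache apart from a physical disk
read, which is exactly the blind spot we've been talking about.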

I guess, though, that the main problem I see with the big shared_buffers
idea on linux specifically is that SHMEM is slow on linux.  Rumor has it
that the SHMEM implementation on Solaris is much faster than on linux, so
PG on Solaris may very well benefit from using your approach, but I just
don't see it on linux, which is where this discussion started IIRC.

>>
>> We (well, I) don't want you to shut up.  In fact, I enjoy discussions like
>> this, because it freshens my memory on just why PG rocks so hard! ;)
>>
(sorry, just watched "School of Rock" last night...)

> Cool! I'm glad that you're really into PG, because I'm sort of counting on
> it too, so that makes me feel happier, and it's really good to see the
> newsgroups for the database being so active.
>
> Thanks anyway for the replies. I'm going to do a lot more digging and see
> if I can find anyone who has got their hands dirty trying to do this
> 'everything in RAM' scenario.
>
> I'll come back with anything more interesting that I find.

I will wait with bated breath ;)

--miker

>
> Regards
> Andy

