Re: hstore query: Any better idea than adding more memory? - Mailing list pgsql-performance

From Stephen Frost
Subject Re: hstore query: Any better idea than adding more memory?
Date
Msg-id 20111024004140.GX12765@tamriel.snowman.net
Whole thread Raw
In response to Re: hstore query: Any better idea than adding more memory?  (Stefan Keller <sfkeller@gmail.com>)
List pgsql-performance
* Stefan Keller (sfkeller@gmail.com) wrote:
> >> Adding more memory (say to total of 32 GB) would only postpone the problem.
> > Erm, seems like you're jumping to conclusions here...
>
> Sorry. I actually only wanted to report here what's special in my
> postgresql.conf.

My comment was referring to "postpone the problem".

> No: I simply meant doing the query when one can assume that the query
> result is not yet in the postgres' cache.
> You can check that here online: http://labs.geometa.info/postgisterminal

If it's not in PG's cache, and it's not in the OS's cache, then it's
gotta come from disk. :/

> But then what should the arrow ("->") wants to stand for?

Eh..  I wouldn't read the arrows as meaning all that much. :)  They're
there as a visual aide only, aiui.  Also, explain really shows the
*plan* that PG ended up picking for this query, thinking about it that
way might help.

> I would suggest that the inverse arrow would be more intuitive:

Perhaps, but don't get your hopes up about us breaking explain-reading
applications by changing that. :)

> But there are only periodic updates (currently once a night) and these
> are actually done by 1. truncating the database and 2. bulk loading
> all the stuff, then 3. reindexing.

Well, that would certainly help avoid bloat. :)

> If one tries to completely fit the whole data into memory, then to me
> PostgreSQL features borrowed from in-memory databases become
> interesting.

... huh?  I don't know of any system that's going to be able to make
sure that all your queries perform like in-memory queries when you don't
have enough memory to actually hold it all..

> => Is there anything else than "index-only scans" (planned for 9.2?)
> which could be of interest here?

index-only scans may be able to help with this as it may be able to
reduce the amount of disk i/o that has to be done, and reduce the amount
of memory needed to get everything into memory, but if you don't have
enough memory then you're still going to see a performance difference
between querying data that's cached and data that has to come from disk.

I don't know if index-only scans will, or will not, be able to help with
these specific queries.  I suspect they won't be much help since the
data being returned has to be in the index.  If I remember your query,
you were pulling out data which wasn't actaully in the index that was
being used to filter the result set.  Also, I don't know if we'll have
index-only scans for GIST/GIN indexes in 9.2 or if it won't be available
till a later release.  AIUI, only btree indexes can perform index-only
scans in the currently committed code.

Now, we've also been discussing ways to have PG automatically
re-populate shared buffers and possibly OS cache based on what was in
memory at the time of the last shut-down, but I'm not sure that would
help your case either since you're rebuilding everything every night and
that's what's trashing your buffers (because everything ends up getting
moved around).  You might actually want to consider if that's doing more
harm than good for you.  If you weren't doing that, then the cache
wouldn't be getting destroyed every night..

    Thanks,

        Stephen

Attachment

pgsql-performance by date:

Previous
From: Stefan Keller
Date:
Subject: Re: hstore query: Any better idea than adding more memory?
Next
From: Amitabh Kant
Date:
Subject: Usage of pg_stat_database