Re: hstore query: Any better idea than adding more memory? - Mailing list pgsql-performance

From Stefan Keller
Subject Re: hstore query: Any better idea than adding more memory?
Date
Msg-id CAFcOn2_hZ9Rz0bsgcukhCPx28noChf=qDJgQMY=yxLczmbNM7A@mail.gmail.com
Whole thread Raw
In response to Re: hstore query: Any better idea than adding more memory?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: hstore query: Any better idea than adding more memory?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-performance
Hi Stephen

Thanks for your answer and hints.

2011/10/24 Stephen Frost <sfrost@snowman.net> wrote:
> * Stefan Keller (sfkeller@gmail.com) wrote:
>> Adding more memory (say to total of 32 GB) would only postpone the problem.
> Erm, seems like you're jumping to conclusions here...

Sorry. I actually only wanted to report here what's special in my
postgresql.conf.

>> First time the query lasts about 10 time longer (~ 1010 ms) - but I'd
>> like to get better results already in the first query.
>
> Do you mean first time after a database restart?

No: I simply meant doing the query when one can assume that the query
result is not yet in the postgres' cache.
You can check that here online: http://labs.geometa.info/postgisterminal

>> => 1. When I add the "actual time" from EXPLAIN above, I get 11 + 10 +
>> 10ms which is three times greater than the 11ms reported.  Why?
>
> Because they include the times from the nodes under them.
>
>> => 2. Why does the planner choose to sort first instead of sorting the
>> (smaller)  result query at the end the?
>
> You're reading the explain 'backwards' regarding time..  It *does* do
> the sort last.  Nodes which are indented feed the nodes above them, so
> the bitmap index scan and recheck feed into the sort, hence the sort is
> actually done after.  Can't really work any other way anyway, PG has to
> get the data before it can sort it..

Oh, thanks. I should have realized that.

But then what should the arrow ("->") wants to stand for?
Sort (cost=30819.51...
  ->  Bitmap Heap Scan on osm_point  (cost=313.21...
      ->  Bitmap Index Scan on osm_point_tags_idx

I would suggest that the inverse arrow would be more intuitive:
Sort (cost=30819.51...
  <-  Bitmap Heap Scan on osm_point  (cost=313.21...
      <-  Bitmap Index Scan on osm_point_tags_idx

>> => 3. What could I do to speed up such queries (first time, i.e.
>> without caching) besides simply adding more memory?
>
> There didn't look like anything there that could really be done much
> faster, at the plan level.  It's not uncommon for people to
> intentionally get a box with more memory than the size of their
> database, so everything is in memory.
>
> At the end of the day, if the blocks aren't in memory then PG has to get
> them from disk.  If disk is slow, the query is going to be slow.  Now,
> hopefully, you're hitting this table often enough with similar queries
> that important, common, parts of the table and index are already in
> memory, but there's no magic PG can perform to ensure that.
>
> If there's a lot of updates/changes to this table, you might check if
> there's a lot of bloat (check_postgres works great for this..).
> Eliminating excessive bloat, if there is any, could help with all
> accesses to that table, of course, since it would reduce the amount of
> data which would need to be.

Thanks for the hint.

But there are only periodic updates (currently once a night) and these
are actually done by 1. truncating the database and 2. bulk loading
all the stuff, then 3. reindexing.

If one tries to completely fit the whole data into memory, then to me
PostgreSQL features borrowed from in-memory databases become
interesting.

=> Is there anything else than "index-only scans" (planned for 9.2?)
which could be of interest here?

Stefan

pgsql-performance by date:

Previous
From: Stephen Frost
Date:
Subject: Re: hstore query: Any better idea than adding more memory?
Next
From: Stephen Frost
Date:
Subject: Re: hstore query: Any better idea than adding more memory?