Re: GiST, caching, and consistency - Mailing list pgsql-performance
From | Matthew Wakeling |
---|---|
Subject | Re: GiST, caching, and consistency |
Date | |
Msg-id | alpine.DEB.2.00.0908051104590.18938@aragorn.flymine.org Whole thread Raw |
In response to | Re: GiST, caching, and consistency (Greg Stark <gsstark@mit.edu>) |
Responses |
Re: GiST, caching, and consistency
(Robert Haas <robertmhaas@gmail.com>)
Re: GiST, caching, and consistency (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
On Wed, 5 Aug 2009, Greg Stark wrote: > On Tue, Aug 4, 2009 at 11:56 PM, Robert Haas<robertmhaas@gmail.com> wrote: >> Beats me. It looks like the first few queries are pulling stuff into >> cache, and then after that it settles down, but I'm not sure why it >> takes 5 repetitions to do that. Is the plan changing? > > Yeah, we're just guessing without the explain analyze output. > > But as long as we're guessing, perhaps it's doing a sequential scan on > one of the tables and each query is reading in new parts of the table > until the whole table is in cache. Is this a machine with lots of RAM > but a small setting for shared_buffers? modmine_overlap_test=# explain analyse select count(*) from (select * FROM locatedsequencefeatureoverlappingfeatures limit 1000000) AS a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=478847.24..478847.25 rows=1 width=0) (actual time=27546.424..27546.428 rows=1 loops=1) -> Limit (cost=0.01..466347.23 rows=1000000 width=8) (actual time=0.104..24349.407 rows=1000000 loops=1) -> Nested Loop (cost=0.01..9138533.31 rows=19595985 width=8) (actual time=0.099..17901.571 rows=1000000 loops=1) Join Filter: (l1.subjectid <> l2.subjectid) -> Seq Scan on location l1 (cost=0.00..90092.22 rows=4030122 width=16) (actual time=0.013..11.467 rows=3396 loops=1) -> Index Scan using location_object_bioseg on location l2 (cost=0.01..1.46 rows=35 width=16) (actual time=0.130..3.339 rows=295 loops=3396) Index Cond: ((l2.objectid = l1.objectid) AND (bioseg_create(l1.intermine_start, l1.intermine_end) &&bioseg_create(l2.intermine_start, l2.intermine_end))) Total runtime: 27546.534 ms (8 rows) Time: 27574.164 ms It is certainly doing a sequential scan. So are you saying that it will start a sequential scan from a different part of the table each time, even in the absence of other simultaneous sequential scans? Looks like I'm going to have to remove the limit to get sensible results - I only added that to make the query return in a sensible time for performance testing. Some trivial testing with "select * from location limit 10;" indicates that it starts the sequential scan in the same place each time - but is this different from the above query? To answer your question: shared_buffers = 450MB Machine has 16GB or RAM The location table is 389 MB The location_object_bioseg index is 182 MB Matthew -- What goes up must come down. Ask any system administrator.
pgsql-performance by date: