Thread: Re: BUG #2658: Query not using index
Hi,

Adding DESC to both columns in the ORDER BY did not make the query use
the multikey index. So both

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions ORDER BY assetid, ts DESC;

and

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions ORDER BY assetid DESC, ts DESC;

use the same query plans and both do sequential scans without using
either the (assetid, ts) or (ts) indexes. Any other ideas on how to
make this query use an index? Thanks,

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net

>On Wed, Sep 27, 2006 at 20:56:32 +0000,
>  Graham Davis <gdavis@refractions.net> wrote:
>
>>SELECT assetid, max(ts) AS ts
>>FROM asset_positions
>>GROUP BY assetid;
>>
>>I have an index on (ts), another index on (assetid) and a multikey
>>index on (assetid, ts). I know the assetid index is pointless since
>>the multikey one takes its place, but I put it there while testing
>>just to make sure. The EXPLAIN ANALYZE for this query is:
>>
>>                                   QUERY PLAN
>>-----------------------------------------------------------------------------
>> HashAggregate  (cost=125423.96..125424.21 rows=20 width=12) (actual
>>time=39693.995..39694.036 rows=20 loops=1)
>>   ->  Seq Scan on asset_positions  (cost=0.00..116654.64 rows=1753864
>>width=12) (actual time=20002.362..34724.896 rows=1738693 loops=1)
>> Total runtime: 39694.245 ms
>>(3 rows)
>>
>>You can see it is doing a sequential scan on the table when it should
>>be using the (assetid, ts) index, or at the very least the (ts) index.
>>This query takes about 40 seconds to complete with a table of 1.7
>>million rows. I tested running the query without the GROUP BY as
>>follows:
>>
>>SELECT DISTINCT ON (assetid) assetid, ts
>>FROM asset_positions
>>ORDER BY assetid, ts DESC;
>
>This is almost what you want to do to get an alternative plan. But you
>need to ORDER BY assetid DESC, ts DESC to make use of the multicolumn
>index. If you really need the other output order, reverse it in your
>application or use the above as a subselect in another query that
>orders by assetid ASC.
gdavis@refractions.net (Graham Davis) writes:
> Adding DESC to both columns in the ORDER BY did not make the query
> use the multikey index. So both
>
> SELECT DISTINCT ON (assetid) assetid, ts
> FROM asset_positions ORDER BY assetid, ts DESC;
>
> and
>
> SELECT DISTINCT ON (assetid) assetid, ts
> FROM asset_positions ORDER BY assetid DESC, ts DESC;
>
> use the same query plans and both do sequential scans without using
> either the (assetid, ts) or (ts) indexes. Any other ideas on how to
> make this query use an index? Thanks,

Why do you want to worsen performance by forcing the use of an index?

You are reading through the entire table, after all, and doing so via
a sequential scan is normally the fastest way to do that. An index
scan would only be more efficient if you don't have enough space in
memory to store all assetid values.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/emacs.html
Expect the unexpected.
-- The Hitchhiker's Guide to the Galaxy, page 7023
The asset_positions table has about 1.7 million rows, and this query
takes over 40 seconds to do a sequential scan. Initially I was trying
to get the original query:

SELECT assetid, max(ts) AS ts
FROM asset_positions
GROUP BY assetid;

to use the multikey index, since I read that PostgreSQL 8 added support
for aggregates to use indexes. However, the GROUP BY was causing the
query plan to not use any index (removing the GROUP BY allowed the
query to use the ts index, and it took only 50 ms to run). Since I need
the query to find the max time for EACH asset, I can't just drop the
GROUP BY from my query. So I was trying some alternate ways of writing
the query (as described in the email below) to force the use of one of
these indexes.

40 seconds is much too slow for this query to run and I'm assuming
that the use of an index will make it much faster (as seen when I
removed the GROUP BY clause). Any tips?

Graham.


Chris Browne wrote:

>Why do you want to worsen performance by forcing the use of an index?
>
>You are reading through the entire table, after all, and doing so via
>a sequential scan is normally the fastest way to do that. An index
>scan would only be more efficient if you don't have enough space in
>memory to store all assetid values.

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net
Also, the multikey index of (assetid, ts) would already be sorted, and
that is why using such an index in this case is faster than doing a
sequential scan that does the sorting afterwards.

Graham.

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net
gdavis@refractions.net (Graham Davis) writes:
> 40 seconds is much too slow for this query to run and I'm assuming
> that the use of an index will make it much faster (as seen when I
> removed the GROUP BY clause). Any tips?

Assumptions are dangerous things.

An aggregate like this has *got to* scan the entire table, and given
that that is the case, an index scan is NOT optimal; a seq scan is.

An index scan is just going to be slower.
--
let name="cbbrowne" and tld="linuxdatabases.info"
in String.concat "@" [name;tld];;
http://cbbrowne.com/info/linux.html
"The computer is the ultimate polluter: its feces are
indistinguishable from the food it produces." -- Alan J. Perlis
How come an aggregate like that has to use a sequential scan? I know
that PostgreSQL used to have to do a sequential scan for all
aggregates, but support was added in version 8 so that aggregates
would take advantage of indexes. This is why

SELECT max(ts) AS ts
FROM asset_positions;

uses an index on the ts column and only takes 50 milliseconds. When I
added the GROUP BY it would not use the multikey index or any other
index. Is there just no support for aggregates to use multikey
indexes? Sorry to be so pushy, but I just want to make sure I
understand why the above query can use an index and the following
can't:

SELECT assetid, max(ts) AS ts
FROM asset_positions
GROUP BY assetid;

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net


Chris Browne wrote:

>Assumptions are dangerous things.
>
>An aggregate like this has *got to* scan the entire table, and given
>that that is the case, an index scan is NOT optimal; a seq scan is.
>
>An index scan is just going to be slower.
On Tue, Oct 03, 2006 at 12:13:43 -0700,
  Graham Davis <gdavis@refractions.net> wrote:
> Also, the multikey index of (assetid, ts) would already be sorted and
> that is why using such an index in this case is faster than doing a
> sequential scan that does the sorting afterwards.

That isn't necessarily true. The sequential scan and sort will need a
lot fewer disk seeks, and could run faster than an index scan that has
the disk drives doing seeks for every tuple (in the worst case, where
the on-disk order of tuples doesn't match the order in the index).

If your server is caching most of the blocks, then the index scan
might give better results. You might try disabling sequential scans
to try to coerce the other plan and see what results you get. If it
is substantially faster the other way, then you might want to look at
lowering the random page cost factor. However, since this can affect
other queries, you need to be careful that you don't speed up one
query at the expense of a lot of other queries.
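[For anyone wanting to run the experiment Bruno describes, a minimal
sketch in a psql session; the lowered random_page_cost value is only
illustrative:

    SET enable_seqscan = off;   -- discourage seq scans for this session
    EXPLAIN ANALYZE
    SELECT DISTINCT ON (assetid) assetid, ts
    FROM asset_positions
    ORDER BY assetid DESC, ts DESC;
    SET enable_seqscan = on;    -- restore the default

    -- if the index plan wins, consider e.g. in postgresql.conf:
    -- random_page_cost = 2     -- default is 4

enable_seqscan and random_page_cost are standard planner settings;
the SET form only affects the current session.]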
Graham Davis <gdavis@refractions.net> writes:
> How come an aggregate like that has to use a sequential scan? I know
> that PostgreSQL used to have to do a sequential scan for all
> aggregates, but support was added in version 8 so that aggregates
> would take advantage of indexes.

Not in a GROUP BY context, only for the simple case. Per the comment
in planagg.c:

 * We don't handle GROUP BY, because our current implementations of
 * grouping require looking at all the rows anyway, and so there's not
 * much point in optimizing MIN/MAX.

The problem is that using an index to obtain the maximum value of ts
for a given value of assetid is not the same thing as finding out what
all the distinct values of assetid are.

This could possibly be improved but it would take a considerable
amount more work. It's definitely not in the category of "bug fix".

			regards, tom lane
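[For the simple non-GROUP BY case, the optimization Tom refers to
effectively turns the aggregate into an ordered index probe, roughly
equivalent to:

    -- roughly what the 8.x planner does internally for max(ts)
    SELECT ts FROM asset_positions
    WHERE ts IS NOT NULL
    ORDER BY ts DESC
    LIMIT 1;

which is why the ungrouped max(ts) earlier in the thread returns in
about 50 ms.]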
Thanks Tom, that explains it and makes sense. I guess I will have to
accept this query taking 40 seconds, unless I can figure out another
way to write it so it can use indexes. If there are any more syntax
suggestions, please pass them on. Thanks for the help everyone.

Graham.


Tom Lane wrote:

>Not in a GROUP BY context, only for the simple case. Per the comment
>in planagg.c:
>
> * We don't handle GROUP BY, because our current implementations of
> * grouping require looking at all the rows anyway, and so there's not
> * much point in optimizing MIN/MAX.
>
>The problem is that using an index to obtain the maximum value of ts
>for a given value of assetid is not the same thing as finding out what
>all the distinct values of assetid are.
>
>This could possibly be improved but it would take a considerable
>amount more work. It's definitely not in the category of "bug fix".
>
>			regards, tom lane

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net
Have you looked into a materialized view sort of approach? You could
create a table which had assetid as a primary key, and max_ts as a
column. Then use triggers to keep that table up to date as rows are
added/updated/removed from the main table.

This approach would only make sense if there were far fewer distinct
assetid values than rows in the main table, and would get slow if you
commonly delete rows from the main table or decrease the value for ts
in the row with the highest ts for a given assetid.

-- Mark Lewis

On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote:
> Thanks Tom, that explains it and makes sense. I guess I will have to
> accept this query taking 40 seconds, unless I can figure out another
> way to write it so it can use indexes. If there are any more syntax
> suggestions, please pass them on. Thanks for the help everyone.
>
> Graham.
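[A minimal sketch of what Mark describes, assuming a summary table
keyed by assetid; all object names are illustrative, and the plpgsql
language is assumed to be installed. The guard in the UPDATE also
makes an older, out-of-order position a no-op:

    CREATE TABLE asset_latest (
        assetid integer PRIMARY KEY,
        max_ts  timestamp NOT NULL
    );

    CREATE FUNCTION asset_latest_upd() RETURNS trigger AS $$
    BEGIN
        -- only advance max_ts; an older (out-of-order) row changes nothing
        UPDATE asset_latest
           SET max_ts = NEW.ts
         WHERE assetid = NEW.assetid
           AND max_ts < NEW.ts;
        IF NOT FOUND THEN
            -- first position seen for this assetid, or NEW.ts was older
            INSERT INTO asset_latest (assetid, max_ts)
            SELECT NEW.assetid, NEW.ts
            WHERE NOT EXISTS (SELECT 1 FROM asset_latest
                              WHERE assetid = NEW.assetid);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER asset_latest_trig
        AFTER INSERT ON asset_positions
        FOR EACH ROW EXECUTE PROCEDURE asset_latest_upd();

Two concurrent first inserts for the same assetid could still race on
the INSERT, so this is a sketch rather than a production recipe.]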
The "summary table" approach maintained by triggers is something we are considering, but it becomes a bit more complicated to implement. Currently we have groups of new positions coming in every few seconds or less. They are not guaranteed to be in order. So for instance, a group of positions from today could come in and be inserted, then a group of positions that got lost from yesterday could come in and be inserted afterwards. This means the triggers would have to do some sort of logic to figure out if the newly inserted position is actually the most recent by timestamp. If positions are ever deleted or updated, the same sort of query that is currently running slow will need to be executed in order to get the new most recent position. So there is the possibility that new positions can be inserted faster than the triggers can calculate and maintain the summary table. There are some other complications with maintaining such a summary table in our system too, but I won't get into those. Right now I'm just trying to see if I can get the query itself running faster, which would be the easiest solution for now. Graham. Mark Lewis wrote: >Have you looked into a materialized view sort of approach? You could >create a table which had assetid as a primary key, and max_ts as a >column. Then use triggers to keep that table up to date as rows are >added/updated/removed from the main table. > >This approach would only make sense if there were far fewer distinct >assetid values than rows in the main table, and would get slow if you >commonly delete rows from the main table or decrease the value for ts in >the row with the highest ts for a given assetid. > >-- Mark Lewis > >On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote: > > >>Thanks Tom, that explains it and makes sense. I guess I will have to >>accept this query taking 40 seconds, unless I can figure out another way >>to write it so it can use indexes. If there are any more syntax >>suggestions, please pass them on. Thanks for the help everyone. >> >>Graham. >> >> >>Tom Lane wrote: >> >> >> >>>Graham Davis <gdavis@refractions.net> writes: >>> >>> >>> >>> >>>>How come an aggreate like that has to use a sequential scan? I know >>>>that PostgreSQL use to have to do a sequential scan for all aggregates, >>>>but there was support added to version 8 so that aggregates would take >>>>advantage of indexes. >>>> >>>> >>>> >>>> >>>Not in a GROUP BY context, only for the simple case. Per the comment in >>>planagg.c: >>> >>> * We don't handle GROUP BY, because our current implementations of >>> * grouping require looking at all the rows anyway, and so there's not >>> * much point in optimizing MIN/MAX. >>> >>>The problem is that using an index to obtain the maximum value of ts for >>>a given value of assetid is not the same thing as finding out what all >>>the distinct values of assetid are. >>> >>>This could possibly be improved but it would take a considerable amount >>>more work. It's definitely not in the category of "bug fix". >>> >>> regards, tom lane >>> >>> >>> >>> >> >> -- Graham Davis Refractions Research Inc. gdavis@refractions.net
Hmmm. How many distinct assetids are there?

-- Mark Lewis

On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote:
> The "summary table" approach maintained by triggers is something we
> are considering, but it becomes a bit more complicated to implement.
> Currently we have groups of new positions coming in every few
> seconds or less. They are not guaranteed to be in order.
> [...]
> Right now I'm just trying to see if I can get the query itself
> running faster, which would be the easiest solution for now.
>
> Graham.
Not many. It fluctuates, but there are usually only ever a few hundred
at most. Each assetid has multi-millions of positions though.

Mark Lewis wrote:

>Hmmm. How many distinct assetids are there?
>-- Mark Lewis

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net
Hi,

I wonder how PostgreSQL caches SQL query results. For example:

* does postgres cache in memory a query result that was produced by
  session A?
* does session B use these results?

Best Regards

Adnan DURSUN
A few hundred is quite a lot for the next proposal, and it's kind of
an ugly one, but might as well throw the idea out since you never
know.

Have you considered creating one partial index per assetid? Something
along the lines of "CREATE INDEX asset_index_N ON asset_positions(ts)
WHERE assetid=N"? I'd guess that the planner probably wouldn't be
smart enough to use the partial indexes unless you issued a separate
query for each assetid, but each one of those queries should be really
fast. Of course, this is all assuming that PG knows how to use partial
indexes to satisfy MAX queries; I'm not sure if it does.

-- Mark Lewis

On Tue, 2006-10-03 at 14:35 -0700, Graham Davis wrote:
> Not many. It fluctuates, but there are usually only ever a few
> hundred at most. Each assetid has multi-millions of positions
> though.
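[Spelled out, Mark's idea would look something like this for one
assetid; the value 42 is just an example, and whether the planner will
actually use the partial index for max() is exactly the open question
he raises:

    CREATE INDEX asset_positions_42_ts ON asset_positions (ts)
        WHERE assetid = 42;

    -- issued once per assetid; matches the partial index predicate
    SELECT max(ts) FROM asset_positions WHERE assetid = 42;
]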
Like many decent RDBMSs, the PostgreSQL server allocates its own
shared memory area where data is cached. When receiving a query
request, the Postgres engine first checks its shared memory buffers;
if the data is not found, the engine performs disk I/O to retrieve it
from the PostgreSQL data files and place it in the shared buffer area
before serving it back to the client. Blocks in the shared buffers are
shared by other sessions and can therefore be accessed by other
sessions. PostgreSQL shared buffers are allocated by setting the
postgresql.conf parameter shared_buffers.

Sincerely,

-- Husam

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Adnan DURSUN
Sent: Tuesday, October 03, 2006 2:49 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] PostgreSQL Caching

Hi,

I wonder how PostgreSQL caches SQL query results. For example:

* does postgres cache in memory a query result that was produced by
  session A?
* does session B use these results?

Best Regards

Adnan DURSUN
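[For reference, in 8.1 and earlier shared_buffers is a count of 8 kB
buffers rather than a memory unit, and changing it requires a server
restart; the figure below is only illustrative:

    # postgresql.conf
    shared_buffers = 20000      # 8 kB pages each, so roughly 160 MB
]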
Mark Lewis <mark.lewis@mir3.com> writes:
> Have you considered creating one partial index per assetid?
> Something along the lines of "CREATE INDEX asset_index_N ON
> asset_positions(ts) WHERE assetid=N"? I'd guess that the planner
> probably wouldn't be smart enough to use the partial indexes unless
> you issued a separate query for each assetid, but each one of those
> queries should be really fast.

Actually, a single index on (assetid, ts) is sufficient to handle

	select max(ts) from asset_positions where assetid = constant

The problem is to know what values of "constant" to issue the query
for, and this idea doesn't seem to help with that.

If Graham is willing to assume that the set of assetids changes
slowly, perhaps he could keep a summary table that contains all the
valid assetids (or maybe there already is such a table? is assetid a
foreign key?) and do

	select pk.assetid,
	       (select max(ts) from asset_positions
	        where assetid = pk.assetid)
	from other_table pk;

I'm pretty sure the subselect would be planned the way he wants.

			regards, tom lane
Thanks. I wonder about these:

* When any session updates data that is already in the shared buffers,
does Postgres synchronize the data to both disk and the shared buffers
area immediately?

* Does Postgres cache SQL execution plans in memory for use by other
sessions? For example: when session A executes "SELECT * FROM tab
WHERE col1 = val1 AND col2 = val2", does Postgres save the
parser/optimizer result in memory so another session can avoid
duplicate execution of the parser and optimizer, and thereby save
time? Because an execution plan was created before..

Sincerely

Adnan DURSUN

----- Original Message -----
From: "Tomeh, Husam" <htomeh@firstam.com>
To: "Adnan DURSUN" <a_dursun@hotmail.com>; <pgsql-performance@postgresql.org>
Sent: Wednesday, October 04, 2006 1:11 AM
Subject: Re: [PERFORM] PostgreSQL Caching

> Like many decent RDBMSs, the PostgreSQL server allocates its own
> shared memory area where data is cached. When receiving a query
> request, the Postgres engine first checks its shared memory buffers;
> if the data is not found, the engine performs disk I/O to retrieve it
> from the PostgreSQL data files and place it in the shared buffer area
> before serving it back to the client.
> [...]
>> * When any session updates data that is already in the shared
>> buffers, does Postgres synchronize the data to both disk and the
>> shared buffers area immediately?

Not necessarily true. When a block is modified in the shared buffers,
the modified block is written to the Postgres WAL log. A periodic DB
checkpoint is performed to flush the modified blocks in the shared
buffers to the data files.

>> * Does Postgres cache SQL execution plans in memory for use by
>> other sessions? For example: when session A executes "SELECT * FROM
>> tab WHERE col1 = val1 AND col2 = val2", does Postgres save the
>> parser/optimizer result in memory so another session can avoid
>> duplicate execution of the parser and optimizer, and thereby save
>> time?

Query plans are not stored in the shared buffers and therefore can not
be re-used by other sessions. They're only cached by the connection on
a session level.

Sincerely,

-- Husam
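[Within a single session, plan reuse can be made explicit with a
prepared statement; a sketch using the hypothetical table and columns
from the example above:

    PREPARE find_rows (int, int) AS
        SELECT * FROM tab WHERE col1 = $1 AND col2 = $2;

    -- planned once, executed many times, in this session only
    EXECUTE find_rows (1, 2);
    EXECUTE find_rows (3, 4);

The prepared statement disappears when the session ends, which matches
Husam's point that plans are per-connection.]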
----- Original Message -----
From: "Tomeh, Husam" <htomeh@firstam.com>
To: "Adnan DURSUN" <a_dursun@hotmail.com>; <pgsql-performance@postgresql.org>
Sent: Wednesday, October 04, 2006 4:29 AM
Subject: RE: [PERFORM] PostgreSQL Caching

> Query plans are not stored in the shared buffers and therefore can
> not be re-used by other sessions. They're only cached by the
> connection on a session level.

Ok, I see, thanks.. So that means that a stored object's execution
plan saved earlier is removed from memory after the object is altered
or dropped by any session. Is that true?

And a last one :-) I want to be able to read an execution plan when I
look at one. So, is there any doc on how it should be read?

Sincerely!

Adnan DURSUN
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Adnan DURSUN
>
> I want to be able to read an execution plan when I look at one.
> So, is there any doc on how it should be read?

You are asking how to read the output from EXPLAIN? This page is a
good place to start:

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
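[As a quick orientation, every node in EXPLAIN output follows the same
pattern; taking a line from the plan earlier in this thread:

    Seq Scan on asset_positions  (cost=0.00..116654.64 rows=1753864 width=12)

cost= gives the estimated startup and total cost (in abstract units
where one sequential page fetch is 1.0), rows= the estimated number of
rows the node emits, and width= the estimated average row width in
bytes. EXPLAIN ANALYZE appends an (actual time=... rows=... loops=...)
section with real measurements, and indented "->" entries are child
nodes feeding the node above them, so plans are read innermost-first.]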
On Wed, 2006-10-04 at 07:38 -0500, Dave Dutcher wrote:
> You are asking how to read the output from EXPLAIN? This page is a
> good place to start:
>
> http://www.postgresql.org/docs/8.1/interactive/performance-tips.html

Robert Treat's Explaining Explain presentation from OSCON is also very
good:

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf

Brad.
On Tue, 2006-10-03 at 18:29 -0700, Tomeh, Husam wrote:
> Not necessarily true. When a block is modified in the shared
> buffers, the modified block is written to the Postgres WAL log. A
> periodic DB checkpoint is performed to flush the modified blocks in
> the shared buffers to the data files.

Postgres 8.0 and beyond have a process called bgwriter that
continually flushes dirty buffers to disk, to minimize the work that
needs to be done at checkpoint time.
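[The bgwriter's pace is tunable in postgresql.conf; in 8.1 the knobs
include the following (the values shown are, as far as I know, the
8.1 defaults, listed only for illustration):

    bgwriter_delay = 200            # ms to sleep between rounds
    bgwriter_lru_percent = 1.0      # % of recently-used buffers scanned per round
    bgwriter_lru_maxpages = 5       # max dirty buffers written per round
    bgwriter_all_percent = 0.333    # % of the whole buffer pool scanned per round
    bgwriter_all_maxpages = 5       # max dirty buffers written in that scan
]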