Thread: Page access pattern in query plan using index scan
Suppose I have a table as follows: testdb=> \d person Table "public.person" Column | Type | Modifiers ------------+-------------------------+----------- id | integer | not null age | integer | other_info | character varying(1000) | Indexes: person_pkey primary key btree (id), idx_person_age btree (age) Here is the execution plan for a not very selective query on age: testdb=> explain select * from person where age between 30 and 40; QUERY PLAN -------------------------------------------------------------------------------- Index Scan using idx_person_age on person (cost=0.00..17.08 rows=5 width=524) Index Cond: ((age >= 30) AND (age <= 40)) (2 rows) What is the pattern of access to data pages? I can think of two likely answers: 1) The index is scanned for ages 30 through 40. As each index entry is scanned, a row is retrieved. 2) The index is scanned for ages 30 and 40. The index entries are sorted so that rows on the same page are grouped together, increasing the odds that a needed page will be present in the shared buffers. I'm using 7.3.4, and will be upgrading to 7.4.2 soon. Jack Orenstein
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote: > What is the pattern of access to data pages? I can think of two likely > answers: > > 1) The index is scanned for ages 30 through 40. As each index entry is > scanned, a row is retrieved. This one. There have been noises about doing the second, but it's non trivial and there's no hacker currently working on it. Not to be expected on the next version, I'd think. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Licensee shall have no right to use the Licensed Software for productive or commercial use. (Licencia de StarOffice 6.0 beta)
Alvaro Herrera wrote: > On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote: > > >>What is the pattern of access to data pages? I can think of two likely >>answers: >> >>1) The index is scanned for ages 30 through 40. As each index entry is >>scanned, a row is retrieved. > > > This one. There have been noises about doing the second, but it's non > trivial and there's no hacker currently working on it. Not to be > expected on the next version, I'd think. Can you recommend an application-level workaround that will access pages in the right order? Jack Orenstein
On Wed, Jun 02, 2004 at 11:22:53PM -0400, Jack Orenstein wrote: > Alvaro Herrera wrote: > >On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote: > > > > > >>What is the pattern of access to data pages? I can think of two likely > >>answers: > >> > >>1) The index is scanned for ages 30 through 40. As each index entry is > >>scanned, a row is retrieved. > > > >This one. There have been noises about doing the second, but it's non > >trivial and there's no hacker currently working on it. Not to be > >expected on the next version, I'd think. > > My naive guess about this is that you read the index entries, each one > contains a page number, and you sort by the page number. The set of > index entries could be large, requiring a potentially expensive sort. > I don't know enough about query plan internals to know if this sort of > plan can even be expressed currently, so maybe a major extension would > be needed. And then the optimizer would have to be extended to > consider the two approaches. Is this why you say the problem is > non-trivial, or is there some additional set of issues that I'm > missing? Part of the problem is that the sort is not expressable in the current code. Another part is that this kind of index scan is different and violates some assumptions of the current indexscan, so it's not best in all scenarios. Several planner/optimizer improvements are needed for this to work. This has been discussed several times on the hackers' list. If you want to read them, search for "bitmap indexes" on http://www.pgsql.ru. Note that this seems to be different from what other RDBMSs consider a bitmap index to be. Regarding your other question: I don't know any application workaround. There probably isn't any (unless you know how the tuples are sorted on the heap, which is quite unlikely). -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "No hay cielo posible sin hundir nuestras raíces en la profundidad de la tierra" (Malucha Pinto)
Jack Orenstein <jao@geophile.com> writes: > Alvaro Herrera wrote: > > > On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote: > > > > >What is the pattern of access to data pages? I can think of two likely > > >answers: > > > > > >1) The index is scanned for ages 30 through 40. As each index entry is > > >scanned, a row is retrieved. > > > > This one. There have been noises about doing the second, but it's non > > trivial and there's no hacker currently working on it. Not to be > > expected on the next version, I'd think. I think the main problem is dealing with locking issues. As long as the index scan is using the tuples it sees right away it doesn't care if someone else changes some tuple it hasn't gotten to yet or has already seen. If you read them all and then sort them it's possible that while you're dealing with your local copy that someone else comes along and does something that you would want to know about. Actually I'm not exactly clear on what types of changes would cause problems, and thinking about it now I don't see how this is any different than a materialize or other plan that stores intermediate results. So there's something I'm missing. > Can you recommend an application-level workaround that will access pages > in the right order? Well you could try CLUSTER-ing your table on that index. Alternatively you could try partitioning your table so the optimizer can use a sequential scan to efficiently get the records you want. But there's no native support for partitioning so you would have to roll your own entirely manually. That would mean e.g. having tables cheques_jan, cheques_feb, cheques_mar,... And a union of all the months. Any query that only spanned one month would use the monthly table so it could do a sequential scan of just that month. I don't recommend this unless you have a _lot_ of data that you often deal with in particular chunks. Even then unless you're purging and archiving data in those chunks it's probably still not worth it. -- greg
Greg Stark wrote: >>Can you recommend an application-level workaround that will access pages >>in the right order? > > > Well you could try CLUSTER-ing your table on that index. For my application, the problem with CLUSTER is that it reclusters the entire table. So as the table grows, the cost of CLUSTER goes up. What would be really nice is something like "cluster recent". The idea is to cluster only rows added since the last cluster. This amortizes the time and storage costs of clustering across the invocations of cluster in a much more palatable way. If such a command were used this wisely, (e.g. whenever the table grows by 20%) then I suspect applications would obtain most of the benefit of a sort following an index lookup. (And I'm guessing this is a much simpler development task.) Admittedly, this would not work so well in cases where the index keys change in value. But this would be a huge win in situations (like mine) where the keys never change. Jack Orenstein
Jack Orenstein <jao@geophile.com> writes: > What would be really nice is something like "cluster recent". The idea is to > cluster only rows added since the last cluster. And you would put them where? regards, tom lane