Re: [INTERFACES] need information about storing methods - Mailing list pgsql-general
From | Gene Selkov Jr. |
---|---|
Subject | Re: [INTERFACES] need information about storing methods |
Date | |
Msg-id | 199811111655.KAA26844@antares.mcs.anl.gov Whole thread Raw |
List | pgsql-general |
(copying to GENERAL because that's where it seems to belong) > Hi, > I'm nowadays tryng to use postgresql to store geographical data to use > them with a gis program (GRASS). > The coordinates of each point are stored in a field that is a > point-type. > Anyone can tell me if postgress uses, for this kind of data, a storing > method like HHCODE, poin-quad-trees (or anything better) , or it stores > point-data like anything else? > A query like this one: > " SELECT coordinates FROM table WHERE coordinates @ '((1,1),1)'::circle > " > is executed like anything else, processing evry tuple in the table ? > If so, anyone knows how to implement a more complex storing method? > > Thanks to all, > Christian Saldarini Let's clarify the terms first: The only STORAGE METHOD used by postgres is a regular file or a set of files. Objects are stored and retrieved via some sort of mapping between oids and file pointers. So the answer to the first part of your question is that all types of objects (except large objects) are stored in the same way -- that is, as chunks of data within files. ACCESS METHOD, in postgres lingo, is a way to speed up access to individual objects by recording their placement within the physical files (a.k.a. indexing) in a structured way. It generally takes significantly less steps for a query to traverse an index structure down to individual objects, than it would take to iterate through the hole set. There is a class of access methods whose index structures look like trees (so do the query execution paths). Those implemented in postgres are B-tree and two-dimensional R-tree. Both methods rely on a hierarchical grouping of the individual objects into broader categories, akin to the indexing used by librarians to arrange books in an orderly fashion. Categories in B-tree are numeric intervals and in R-tree, they are box unions (majoritarian boxes). Each step of a query path involves sequential comparison, or evaluation of fitness, of the target object (criterion) against just a few broad meta-objects. Once the decision of fitness is made at a certain level, the query sinks one level below, but this time it compares the target object to only those meta-objects which are confined within the meta-object that satisfied the criterion one level above. That way, it works all the way down to the original objects without making (too many) futile comparisons. Closer to the point, > A query like this one: > " SELECT coordinates FROM table WHERE coordinates @ '((1,1),1)'::circle will be evaluated sequentially, provided 'coordinates' is a point and there is an operator '@' for (point, circle) arguments. The reason is that the built-in version of R-tree can only index a set of objects of the same kind, and these objects are boxes. You can improve your query like this: SELECT * FROM t WHERE p @ '0,0,2,2' AND p::circle @ '((1,1),1)'::circle; where p is a box type column that contains zero-area boxes representing points, like so: (0.603,0.378),(0.603,0.378) This query uses an r-tree index: CREATE INDEX pix ON t USING rtree (p box_ops); to first reduce the set to the majoritarian box, '0,0,2,2', and then scan that box for the points inside the circle: explain SELECT * FROM t WHERE p @ '0,0,2,2' AND p::circle @ '((1,1),1)'::circle; NOTICE: QUERY PLAN: Index Scan using pix on t (cost=85.00 size=501 width=32) For further improvement, read about GiST. It is a generalization of R-tree that allows you build indices for all objectsthat can be grouped based on the concept of containment. GiST allows you to build both exact and lossy indices. Anexact index drives you right to the leaves of the search tree. For objects like polygons, which do not cope well with theconcept of containment, majoritarian entities (e.g., boxes) can be used to build R-trees. Such is the lossy index: itleaves out some of the information about the original object (e.g., the number of polygon vertices and their arrangement),but it still remebers some kind of proximity. It will require additional sequential examination of polygons,but in the end, performance of even a lossy index will typically be better that that of a completely sequentialsearch. I think the above query illustrates the concept of lossy indexing well enough. You might want to see http://wit.mcs.anl.gov/~selkovjr/pggist-patched.tgz (you will find more references to the original source inside) For more sophisticated indexing methods, check out Joe Hellerstein's curriculum: http://epoch.cs.berkeley.edu:8000/personal/jmh/ --Gene
pgsql-general by date: