Services
  • 24×7×365 Technical Support
  • Migration to PostgreSQL
  • High Availability Deployment
  • Database Audit
  • Remote DBA for PostgreSQL
Products
  • Postgres Pro Enterprise
  • Postgres Pro Standard
  • Cloud Solutions
  • Postgres Extensions
Resources
  • Blog
  • Documentation
  • Webinars
  • Videos
  • Presentations
Community
  • Events
  • Training Courses
  • Books
  • Demo Database
  • Mailing List Archives
About
  • Leadership team
  • Partners
  • Customers
  • In the News
  • Press Releases
  • Press Info

Facebook
Postgres Pro
  • Services
    24×7×365 Technical Support Migration to PostgreSQL High Availability Deployment Database Audit Remote DBA for PostgreSQL
  • Products
    Postgres Pro Enterprise Postgres Pro Standard Cloud Solutions Postgres Extensions
  • Resources
    Blog Documentation Webinars Videos Presentations
  • Community
    Events Training Courses Books Demo Database Mailing List Archives
  • About
    Leadership team Partners Customers In the News Press Releases Press Info
Facebook
Downloads
11.13. Using k-NN Algorithm for Optimizing Queries
Prev UpChapter 11. IndexesHome Next

11.13. Using k-NN Algorithm for Optimizing Queries

11.13.1. Examples
11.13.2. See Also

Postgres Pro Enterprise can optimize k-nearest neighbors (k-NN) search for B-tree, GiST, and SP-GiST index types. With these index types, you can use ordering operators that return the rows in the order defined by the ORDER BY clause of the following type:

ORDER BY index_column operator constant

where operator is the supported ordering operator, such as <->. In this case, Postgres Pro Enterprise uses a special scan strategy for k-NN search.

For B-tree indexes, you can use only one ordering operator on the first index column. GiST and SP-GiST do not limit the number of ordering operators you can use for k-NN search. For multicolumn GiST indexes, you can also use multiple ordering operators for each column included into the multicolumn index. For example:

SELECT * FROM points ORDER BY column1 <-> '(0, 0)', column2 <-> '(1, 1)'

where column1 and column2 are index columns of the multicolumn GiST index.

You can run k-NN search for all built-in and custom data types that support the concept of distance. In database queries, k-NN search can provide performance benefits for various use cases: geographic information systems (GIS), classification problems (majority voting), clusterization, looking for similar objects or nearest events, detecting duplicates and typos, trigram matching, multimedia search, and more.

To natively implement k-NN search, Postgres Pro Enterprise uses distance-based priority queue for index traversal. Depending on the index type, the search algorithm will differ:

  • For GiST and SP-GiST index types, the priority queue is based on the pairing heap that can be effectively balanced. Inner nodes of the tree represent the minimum bounding rectangle (MBR) for their children. Leaf nodes store actual tuple entries. Postgres Pro Enterprise uses a heuristic to maintain the order by distance between the queue entries and the target point defined by the search argument. For inner nodes, the minimal distance for the child node is calculated. For leaf nodes, the distance to the tuple entry is used. In the tree nodes, leaf nodes are stored before the inner nodes in the list, so tuple entries are processed first. Each time a result is returned from the queue, it is guaranteed to be the closest one to the target point among the remaining entries. Thus, there is no need to scan all nodes. Once the required k entries are extracted from the queue, search can stop.

  • B-tree indexes use a simpler algorithm. If the target point falls into the scan range, Postgres Pro Enterprise performs bidirectional scan starting from this point, advancing at each step in the direction that includes the nearest point. Otherwise, a simple unidirectional scan in the right direction is used.

k-NN search performance has logarithmic dependence on the data size and linear dependence on k. As k tends to the number of rows in the table, the k-NN search time gradually tends to a naive algorithm that consecutively reads the whole table, sorts the data, and returns the first k entries. Since k-NN approach reduces the number of rows that need to be sorted, you can get significant performance benefits if the total number of rows returned by a query is big, or the rows are long. In the worst-case scenario, when all points are equally far from the target point, you have to traverse the whole index, but performance gain can still be achieved because only k entries need to be read from the table, at random.

Another benefit of using k-NN algorithm is simpler logic: you do not need to know the search “radius” in advance, and can resume the search if required.

11.13.1. Examples

Suppose you have the spots table containing coordinates of multiple points. To find ten closest points to the given target point, you can run the following query:

SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist 
FROM spots ORDER BY dist ASC LIMIT 10;

             coordinates             |       dist       
-------------------------------------+------------------
 (3.57192993164062,6.51727240153665) | 2.08362656457647
 (3.49502563476562,6.49134782128243) | 2.11874164636854
 (3.4393,6.4473)                     | 2.12848814420001
 (3.31787109375,6.50089913799597)    | 2.25438592075067
 (2.6323,6.4779)                     | 2.79109148900569
 (2.66349792480469,6.53159856871478) | 2.79374947392946
 (1.84102535247803,6.27874198581057) |  3.4079762161672
 (1.2255,6.1228)                     | 3.93796014327215
 (1.22772216796875,6.15693947094637) | 3.94570513108469
 (9.6977,4.044503)                   | 4.79388775494473
(10 rows)

If you need to find ten closest points that contain the word "mars" in their names, you can use a multi-column index:

CREATE INDEX pt_fts_idx ON spots USING gist(point, to_tsvector('english',asciiname));

SELECT asciiname,point, (point <->'5.0,5.0'::point) AS dist
FROM spots WHERE to_tsvector('english', asciiname)
@@ to_tsquery('english','mars') ORDER BY dist ASC LIMIT 10;

Suppose you have the events table that describes arbitrary historical events. To select five events that happened close to the launch of the first satellite Sputnik-1 on October 4, 1957, run:

SELECT date, event,  ('1957-10-04'::date <-> date) AS dist 
FROM events ORDER BY dist ASC LIMIT 5;

    date    |                 event                     | dist
------------+-------------------------------------------+------
 1957-10-04 | Gregory T Linteris, astronaut, is born    |    0
            | in New Jersey                             |
 1957-10-04 | USSR launches the first artificial        |    0
            | Earth satellite Sputnik-1                 |
 1957-10-04 | "Leave It to Beaver" sitcom               |    0
            | debuts on CBS                             |
 1957-10-03 | Lorinc Szabo, Hungarian poet,             |    1
            | dies aged 57                              |
 1957-10-05 | All-Stars beat Montreal in 11th           |    1
            | NHL All-Star Game                         |
(5 rows)

Consider an example of fuzzy search on trigrams. Let's create a trigram index on the events table, as follows:

CREATE INDEX events_trgm ON events USING gist(event gist_trgm_ops);

Now let's find three events that best match the following search query:

SELECT date, event, ('jeorge ewashington' <-> event ) AS dist 
FROM events ORDER BY dist ASC LIMIT 3;

    date    |                 event                      |  dist
------------+--------------------------------------------+---------
 1732-02-11 | George Washington                          | 0.458333
 1792-12-05 | George Washington is re-elected U.S.       | 0.674419
            | President                                  |
 1945-05-12 | Jayotis Washington is born                 | 0.714286
(3 rows)

11.13.2. See Also

Index Types
Ordering Operators

Prev Up Next
11.12. Examining Index Usage Home Chapter 12. Full Text Search
pdf epub

This page in other versions:

Postgres Pro Enterprise 17
Postgres Pro Enterprise 16
Postgres Pro Enterprise 15
Postgres Pro Enterprise 14
Postgres Pro Enterprise 13
Postgres Pro Enterprise 12
Postgres Pro Enterprise 11
Postgres Pro Enterprise 10
Postgres Pro Enterprise 9.6
Есть вопросы? Напишите нам!

Соглашаюсь с условиями обработки персональных данных



✖
×
Postgres Pro Enterprise 17 Postgres Pro Enterprise 16 Postgres Pro Enterprise 15 Postgres Pro Enterprise 14 Postgres Pro Enterprise 13 Postgres Pro Enterprise 12 Postgres Pro Enterprise 11 Postgres Pro Enterprise 10 Postgres Pro Enterprise 9.6
Go to Postgres Pro Enterprise 13
Services
  • 24×7×365 Technical Support
  • Migration to PostgreSQL
  • High Availability Deployment
  • Database Audit
  • Remote DBA for PostgreSQL
Products
  • Postgres Pro Enterprise
  • Postgres Pro Standard
  • Cloud Solutions
  • Postgres Extensions
Resources
  • Blog
  • Documentation
  • Webinars
  • Videos
  • Presentations
Community
  • Events
  • Training Courses
  • Books
  • Demo Database
  • Mailing List Archives
About
  • Leadership team
  • Partners
  • Customers
  • In the News
  • Press Releases
  • Press Info
Products
  • Postgres Pro Enterprise
  • Postgres Pro Standard
  • Cloud Solutions
  • Postgres Extensions
Resources
  • Blog
  • Documentation
  • Webinars
  • Videos
  • Presentations
Services
  • 24×7×365 Technical Support
  • Migration to Postgres
  • High Availability Deployment
  • Database Audit
  • Remote DBA for PostgreSQL
Community
  • Events
  • Training Courses
  • Intro Book
  • Demo Database
  • Mailing List Archives
About
  • Leadership team
  • Partners
  • Customers
  • In the News
  • Press Releases
  • Press Info
Contacts
  • Neptune House, Marina Bay, office 207, Gibraltar, GX11 1AA
  • info@postgrespro.com
  • Facebook
Get in touch!



© Postgres Professional Europe Limited, 2015 — 2025
EULA EULA for Cloud Environments Privacy Policy GDPR Compliance Commitment
© Postgres Professional Europe Limited, 2015 — 2025
EULA
EULA for Cloud Environments
Privacy Policy
GDPR Compliance Commitment
By continuing to browse this website, you agree to the use of cookies. Go to Privacy Policy.