Home > mailing lists

GIN Index has O(N^2) complexity for array overlap operator? - Mailing list pgsql-performance

From	Felix Geisendörfer
Subject	GIN Index has O(N^2) complexity for array overlap operator?
Date	September 7, 2018 18:56:30
Msg-id	37D7D0A1-9AB4-41AD-BCE2-D127C891B4DF@felixge.de Whole thread Raw
List	pgsql-performance

Tree view

Hi everybody,

I ran into an issue with using the && array operator on a GIN index of mine. Basically I have a query that looks like this:

SELECT * FROM example WHERE keys && ARRAY[...];

This works fine for a small number of array elements (N), but gets really slow as N gets bigger in what appears to be O(N^2) complexity.

However, from studying the GIN data structure as described by the docs, it seems that the performance for this could be O(N). In fact, it's possible to coerce the query planner into an O(N) plan like this:

SELECT DISTINCT ON (example.id) * FROM unnest(ARRAY[...]) key JOIN example ON keys && ARRAY[key]

In order to illustrate this better, I've created a jupyter notebook that populates an example table, show the query plans for both queries, and most importantly benchmarks them and plots a time vs array size (N) graph.

https://github.com/felixge/pg-slow-gin/blob/master/pg-slow-gin.ipynb

Please help me understand what causes the O(N^2) performance for query 1 and if query 2 is the best way to work around this issue.

Thanks

Felix Geisendörfer

PS: I'm using Postgres 10, but also verified that this problem exists with Postgres 11.

pgsql-performance by date:

From: Jeff Janes
Date: 07 September 2018, 17:32:00
Subject: Re: Multi-second pauses blocking even trivial activity

From: James Coleman
Date: 07 September 2018, 19:17:25
Subject: Partial index plan/cardinality costing

GIN Index has O(N^2) complexity for array overlap operator? - Mailing list pgsql-performance

Previous

Next