Thread: Searching GIN-index (FTS) and sort by timestamp-column

Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
Hi all.
 
I'm doing full-text-search and want do display the results in the order the articles were received (timestamp). I have millions of articles where the newest are the most interesting, and a search may match many articles so doing the sort using some INDEX would be great.
 
Take the following example-schema:
 
create extension if not exists btree_gin;
drop table if EXISTS delivery;
create table delivery(    id BIGSERIAL primary key,    fts_all TSVECTOR not null,    folder_id BIGINT NOT NULL,    received_timestamp TIMESTAMP not null,    message varchar not null
);
create index fts_idx ON delivery using gin(fts_all, folder_id);

CREATE OR REPLACE FUNCTION update_delivery_tsvector_tf() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_all = to_tsvector('simple', NEW.message);
    return NEW;
END;
$$ LANGUAGE PLPGSQL;

CREATE TRIGGER update_delivery_tsvector_t BEFORE INSERT OR UPDATE ON delivery
FOR EACH ROW EXECUTE PROCEDURE update_delivery_tsvector_tf();

insert into delivery(folder_id, received_timestamp, message)
values (1, '2015-01-01', 'Yes hit four')    , (1, '2014-01-01', 'Hi man')    , (2, '2013-01-01', 'Hi man')    , (2, '2013-01-01', 'fish')
;

analyze delivery;

set ENABLE_SEQSCAN to off;

explain analyze SELECT del.id    , del.received_timestamp
FROM delivery del
WHERE 1 = 1
      AND del.fts_all @@ to_tsquery('simple', 'hi:*')      AND del.folder_id = 1
ORDER BY  del.received_timestamp DESC LIMIT 101 OFFSET 0;
 
I use btree_gin extention to make folder_id part of index.
 
I get the following plan (using 9.6 from master):
                                                         QUERY PLAN                                                         
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=5.23..5.23 rows=1 width=16) (actual time=0.042..0.043 rows=2 loops=1)
   ->  Sort  (cost=5.23..5.23 rows=1 width=16) (actual time=0.040..0.040 rows=2 loops=1)
         Sort Key: received_timestamp DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on delivery del  (cost=3.90..5.22 rows=1 width=16) (actual time=0.029..0.030 rows=2 loops=1)
               Recheck Cond: (fts_all @@ '''hi'':*'::tsquery)
               Filter: (folder_id = 1)
               Rows Removed by Filter: 1
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on fts_idx  (cost=0.00..3.90 rows=1 width=0) (actual time=0.018..0.018 rows=3 loops=1)
                     Index Cond: (fts_all @@ '''hi'':*'::tsquery)
 Planning time: 0.207 ms
 Execution time: 0.085 ms
(13 rows)
 
Here is the explain from a real-world query:
 
EXPLAIN ANALYZE SELECT del.entity_id    , del.received_timestamp
FROM origo_email_delivery del
WHERE 1 = 1
      AND del.fts_all @@ to_tsquery('simple', 'andre:*')      AND del.folder_id = 44964
ORDER BY del.received_timestamp DESC LIMIT 101 OFFSET 0;
                                                                        QUERY PLAN                                                                        
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=92260.66..92260.91 rows=101 width=16) (actual time=347.891..347.907 rows=101 loops=1)
   ->  Sort  (cost=92260.66..92291.08 rows=12167 width=16) (actual time=347.888..347.899 rows=101 loops=1)
         Sort Key: received_timestamp DESC
         Sort Method: top-N heapsort  Memory: 29kB
         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=2480.95..91794.77 rows=12167 width=16) (actual time=152.568..346.132 rows=18257 loops=1)
               Recheck Cond: (fts_all @@ '''andre'':*'::tsquery)
               Filter: (folder_id = 44964)
               Rows Removed by Filter: 264256
               Heap Blocks: exact=80871
               ->  Bitmap Index Scan on temp_fts_idx  (cost=0.00..2477.91 rows=309588 width=0) (actual time=134.903..134.903 rows=282513 loops=1)
                     Index Cond: (fts_all @@ '''andre'':*'::tsquery)
 Planning time: 0.530 ms
 Execution time: 347.967 ms
(13 rows)
 
 
1. Why isnt' folder_id part of the index-cond?
2. Is there a way to make it use the (same) index to sort by received_timestamp?
3. Using a GIN-index, is there a way to use the index at all for sorting?
4. It doesn't seem like ts_rank uses the index for sorting either.
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Tom Lane
Date:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

            regards, tom lane


Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På onsdag 16. mars 2016 kl. 14:53:04, skrev Andreas Joseph Krogh <andreas@visena.com>:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?
 
Thanks.
 
This paper talks about ORDER BY optimizations for FTS (starting at slide 6 and 7):
http://www.sai.msu.su/~megera/postgres/talks/Next%20generation%20of%20GIN.pdf
 
This indicates some work is being done in this area.
 
Oleg, if you're listening, do you guys have any exiting news regarding this?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Evgeniy Shishkin
Date:
> On 16 Mar 2016, at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andreas Joseph Krogh <andreas@visena.com> writes:
>> 1. Why isnt' folder_id part of the index-cond?
>
> Because a GIN index is useless for sorting.

I don't see how gin inability to return sorted data relates to index condition.
In fact i tried to reproduce the example,
and if i change folder_id to int from bigint, then index condition with folder_id is used

         Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))


>
>> 2. Is there a way to make it use the (same) index to sort by
>> received_timestamp?
>
> No.
>
>> 3. Using a GIN-index, is there a way to use the index at all for sorting?
>
> No.
>
>> 4. It doesn't seem like ts_rank uses the index for sorting either.
>
> Same reason.
>
>             regards, tom lane
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Evgeniy Shishkin
Date:
> On 16 Mar 2016, at 17:52, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>
>
>> On 16 Mar 2016, at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Andreas Joseph Krogh <andreas@visena.com> writes:
>>> 1. Why isnt' folder_id part of the index-cond?
>>
>> Because a GIN index is useless for sorting.
>
> I don't see how gin inability to return sorted data relates to index condition.
> In fact i tried to reproduce the example,
> and if i change folder_id to int from bigint, then index condition with folder_id is used
>
>         Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
>

Looks like documentation http://www.postgresql.org/docs/9.5/static/btree-gin.html
is lying about supporting int8 type

>
>>
>>> 2. Is there a way to make it use the (same) index to sort by
>>> received_timestamp?
>>
>> No.
>>
>>> 3. Using a GIN-index, is there a way to use the index at all for sorting?
>>
>> No.
>>
>>> 4. It doesn't seem like ts_rank uses the index for sorting either.
>>
>> Same reason.
>>
>>             regards, tom lane
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>



Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Evgeniy Shishkin
Date:
> On 16 Mar 2016, at 18:04, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>
>>
>> On 16 Mar 2016, at 17:52, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>>
>>
>>> On 16 Mar 2016, at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Andreas Joseph Krogh <andreas@visena.com> writes:
>>>> 1. Why isnt' folder_id part of the index-cond?
>>>
>>> Because a GIN index is useless for sorting.
>>
>> I don't see how gin inability to return sorted data relates to index condition.
>> In fact i tried to reproduce the example,
>> and if i change folder_id to int from bigint, then index condition with folder_id is used
>>
>>        Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
>>
>
> Looks like documentation http://www.postgresql.org/docs/9.5/static/btree-gin.html
> is lying about supporting int8 type
>

Uh, it works if i cast to bigint explicitly
      WHERE  del.fts_all @@ to_tsquery('simple', 'hi')
      AND del.folder_id = 1::bigint;
results in
         Index Cond: ((folder_id = '1'::bigint) AND (fts_all @@ '''hi'''::tsquery))

>>
>>>
>>>> 2. Is there a way to make it use the (same) index to sort by
>>>> received_timestamp?
>>>
>>> No.
>>>
>>>> 3. Using a GIN-index, is there a way to use the index at all for sorting?
>>>
>>> No.
>>>
>>>> 4. It doesn't seem like ts_rank uses the index for sorting either.
>>>
>>> Same reason.
>>>
>>>             regards, tom lane
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance



Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På onsdag 16. mars 2016 kl. 16:04:08, skrev Evgeniy Shishkin <itparanoia@gmail.com>:

> On 16 Mar 2016, at 17:52, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>
>
>> On 16 Mar 2016, at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Andreas Joseph Krogh <andreas@visena.com> writes:
>>> 1. Why isnt' folder_id part of the index-cond?
>>
>> Because a GIN index is useless for sorting.
>
> I don't see how gin inability to return sorted data relates to index condition.
> In fact i tried to reproduce the example,
> and if i change folder_id to int from bigint, then index condition with folder_id is used
>
>         Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
>

Looks like documentation http://www.postgresql.org/docs/9.5/static/btree-gin.html
is lying about supporting int8 type
 
Hm, interesting!
 
@Tom: Any idea why BIGINT doesn't work, but INTEGER does?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På onsdag 16. mars 2016 kl. 16:07:56, skrev Evgeniy Shishkin <itparanoia@gmail.com>:

> On 16 Mar 2016, at 18:04, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>
>>
>> On 16 Mar 2016, at 17:52, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>>
>>
>>> On 16 Mar 2016, at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Andreas Joseph Krogh <andreas@visena.com> writes:
>>>> 1. Why isnt' folder_id part of the index-cond?
>>>
>>> Because a GIN index is useless for sorting.
>>
>> I don't see how gin inability to return sorted data relates to index condition.
>> In fact i tried to reproduce the example,
>> and if i change folder_id to int from bigint, then index condition with folder_id is used
>>
>>        Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
>>
>
> Looks like documentation http://www.postgresql.org/docs/9.5/static/btree-gin.html
> is lying about supporting int8 type
>

Uh, it works if i cast to bigint explicitly
      WHERE  del.fts_all @@ to_tsquery('simple', 'hi')
      AND del.folder_id = 1::bigint;
results in
         Index Cond: ((folder_id = '1'::bigint) AND (fts_all @@ '''hi'''::tsquery))
 
Hm, this is quite cranky, but thanks for the heads-up!
 
Tho it looks like it works if prepared, without explicit cast:
 
prepare fish AS
    SELECT del.id        , del.received_timestamp    FROM delivery del    WHERE 1 = 1
          AND del.fts_all @@ to_tsquery('simple', $1)          AND del.folder_id = $2    ORDER BY  del.received_timestamp DESC LIMIT 101 OFFSET 0;

explain analyze execute fish('hi:*', 1);
                                                         QUERY PLAN                                                         
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=9.13..9.13 rows=1 width=16) (actual time=0.047..0.048 rows=2 loops=1)
   ->  Sort  (cost=9.13..9.13 rows=1 width=16) (actual time=0.045..0.045 rows=2 loops=1)
         Sort Key: received_timestamp DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on delivery del  (cost=7.80..9.12 rows=1 width=16) (actual time=0.034..0.034 rows=2 loops=1)
               Recheck Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id = '1'::bigint))
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on fts_idx  (cost=0.00..7.80 rows=1 width=0) (actual time=0.023..0.023 rows=2 loops=1)
                     Index Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id = '1'::bigint))
 Execution time: 0.103 ms
(10 rows)
 
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Tom Lane
Date:
Evgeniy Shishkin <itparanoia@gmail.com> writes:
> Uh, it works if i cast to bigint explicitly

FWIW, the reason for that is that the int8_ops operator class that
btree_gin creates doesn't contain any cross-type operators.  Probably
wouldn't be that hard to fix if somebody wanted to put in the work.

            regards, tom lane


Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På torsdag 17. mars 2016 kl. 18:20:23, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Evgeniy Shishkin <itparanoia@gmail.com> writes:
> Uh, it works if i cast to bigint explicitly

FWIW, the reason for that is that the int8_ops operator class that
btree_gin creates doesn't contain any cross-type operators.  Probably
wouldn't be that hard to fix if somebody wanted to put in the work.

regards, tom lane
 
Thanks for info.
 
Can you explain why it works when using prepared statement without casting? Does the machinary then know the type so the "setParameter"-call uses the correct type?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Tom Lane
Date:
Andreas Joseph Krogh <andreas@visena.com> writes:
> På torsdag 17. mars 2016 kl. 18:20:23, skrev Tom Lane <tgl@sss.pgh.pa.us
>  FWIW, the reason for that is that the int8_ops operator class that
>  btree_gin creates doesn't contain any cross-type operators.  Probably
>  wouldn't be that hard to fix if somebody wanted to put in the work.

> Can you explain why it works when using prepared statement without casting?

If you mean the example

prepare fish AS SELECT del.id , del.received_timestamp FROM delivery del
WHERE 1= 1 AND del.fts_all @@ to_tsquery('simple', $1) AND
del.folder_id = $2 ORDER BY del.received_timestamp DESC LIMIT 101 OFFSET 0;

you didn't provide any type for the parameter $2, so the parser had to
infer a type, and the applicable heuristic here is "same type that's on
the other side of the operator".  So you ended up with "bigint = bigint"
which is in the btree_gin operator class.  If you'd specified the
parameter's type as integer, it would've worked the same as Evgeniy's
example.

            regards, tom lane


Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Jeff Janes
Date:
On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?

I don't see why it would not be possible to create a new execution node type that does an index scan to obtain order (or just to satisfy an equality or range expression), and takes a bitmap (as produced by the FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on doing that.

Cheers,

Jeff

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På lørdag 19. mars 2016 kl. 03:44:55, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?
 
I don't see why it would not be possible to create a new execution node type that does an index scan to obtain order (or just to satisfy an equality or range expression), and takes a bitmap (as produced by the FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on doing that.
 
Isn't this what Postgres Pro are planning? http://postgrespro.com/roadmap/mssearch
 
"Unlike external special-purpose search engines, a full-text search engine built in a DBMS is capable of combining full-text and attributive search criteria in SQL query syntax. It is planned to improve the existing PostgreSQL full-text search engine by extending the functionality of Generalized Inverted Index (GIN) to make it capable of storing extra information required for ranging query results. This search acceleration will allow to go back from external full-text search engines, thus facilitating system administration and use, reducing technology risks, and improving information security."
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Oleg Bartunov
Date:


On Mon, Mar 21, 2016 at 5:41 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På lørdag 19. mars 2016 kl. 03:44:55, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?
 
I don't see why it would not be possible to create a new execution node type that does an index scan to obtain order (or just to satisfy an equality or range expression), and takes a bitmap (as produced by the FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on doing that.
 
Isn't this what Postgres Pro are planning? http://postgrespro.com/roadmap/mssearch
 
"Unlike external special-purpose search engines, a full-text search engine built in a DBMS is capable of combining full-text and attributive search criteria in SQL query syntax. It is planned to improve the existing PostgreSQL full-text search engine by extending the functionality of Generalized Inverted Index (GIN) to make it capable of storing extra information required for ranging query results. This search acceleration will allow to go back from external full-text search engines, thus facilitating system administration and use, reducing technology risks, and improving information security."

This is different feature ! Actually, we already have prototype of what Jeff suggested, we called it bitmap filtering, but failed to find use case where it provides benefits. Teodor will comment this idea more detail.
 
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Andreas Joseph Krogh
Date:
På mandag 21. mars 2016 kl. 16:13:07, skrev Oleg Bartunov <obartunov@gmail.com>:
 
 
On Mon, Mar 21, 2016 at 5:41 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På lørdag 19. mars 2016 kl. 03:44:55, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?
 
I don't see why it would not be possible to create a new execution node type that does an index scan to obtain order (or just to satisfy an equality or range expression), and takes a bitmap (as produced by the FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on doing that.
 
Isn't this what Postgres Pro are planning? http://postgrespro.com/roadmap/mssearch
 
"Unlike external special-purpose search engines, a full-text search engine built in a DBMS is capable of combining full-text and attributive search criteria in SQL query syntax. It is planned to improve the existing PostgreSQL full-text search engine by extending the functionality of Generalized Inverted Index (GIN) to make it capable of storing extra information required for ranging query results. This search acceleration will allow to go back from external full-text search engines, thus facilitating system administration and use, reducing technology risks, and improving information security."
 
This is different feature ! Actually, we already have prototype of what Jeff suggested, we called it bitmap filtering, but failed to find use case where it provides benefits. Teodor will comment this idea more detail.
 
The feature I'm missing is the ability to do FTS (or use GIN in general) and then sort on some other column (also indexed by the same GIN-index, using the btree-gin extention), often of type BIGINT or TIMESTAMP.
Are you planning to work on such a feature for GIN?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Oleg Bartunov
Date:


On Sat, Mar 19, 2016 at 5:44 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Joseph Krogh <andreas@visena.com> writes:
> 1. Why isnt' folder_id part of the index-cond?

Because a GIN index is useless for sorting.

> 2. Is there a way to make it use the (same) index to sort by
> received_timestamp?

No.

> 3. Using a GIN-index, is there a way to use the index at all for sorting?

No.

> 4. It doesn't seem like ts_rank uses the index for sorting either.

Same reason.

regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?

I don't see why it would not be possible to create a new execution node type that does an index scan to obtain order (or just to satisfy an equality or range expression), and takes a bitmap (as produced by the FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on doing that.

Please, find bitmap filtering patch, which we developed several months ago, but failed  to find good use case :( Teodor is here now, so he could answer the questions.
 

Cheers,

Jeff

Attachment

Re: Searching GIN-index (FTS) and sort by timestamp-column

From
Jeff Janes
Date:
On Tue, Mar 22, 2016 at 9:41 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
>
>
> On Sat, Mar 19, 2016 at 5:44 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>>
>> I don't see why it would not be possible to create a new execution node
>> type that does an index scan to obtain order (or just to satisfy an equality
>> or range expression), and takes a bitmap (as produced by the FTS/GIN) to
>> apply as a filter.  But, I don't know of anyone planning on doing that.
>
>
> Please, find bitmap filtering patch, which we developed several months ago,
> but failed  to find good use case :( Teodor is here now, so he could answer
> the questions.

I can't find any benefit because I can't get the new node to ever execute.

I set up this:

create table foo as select md5(random()::text), random() as y from
generate_series(1,10000000);
create index on foo using gin (md5 gin_trgm_ops);
create index on foo (y);
vacuum ANALYZE foo ;

Then when I run this:

explain (analyze,buffers) select y from foo where md5 like '%abcde%'
order by y limit 1

The function "cost_filtered_index(newpath)" never fires.  So the
planner is never even considering this feature.

It seems to be getting short-circuited here:

            if (ipath->indexorderbys == NIL && ipath->indexorderbycols == NIL)
                continue;



I don't know enough about the planner to know where to start on this.

Cheers,

Jeff