Re: Updated RUM-index and support for bigint as part of index - Mailing list pgsql-general

From Andreas Joseph Krogh
Subject Re: Updated RUM-index and support for bigint as part of index
Date
Msg-id VisenaEmail.bf.b23b7ec74f28713.1567b293dcc@tc7-visena
Whole thread Raw
In response to Re: Updated RUM-index and support for bigint as part of index  (Artur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-general
På torsdag 11. august 2016 kl. 19:13:10, skrev Artur Zakirov <a.zakirov@postgrespro.ru>:
On 07.08.2016 11:05, Andreas Joseph Krogh wrote:
> På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov
> <obartunov@gmail.com <mailto:obartunov@gmail.com>>:
>
>     [snip]
>     have you considered <=| and |=> operators ? <=> in ORDER BY works
>     like KNN.
>
>
> I don't get how these operators should work. Neither give me the
> expected results.
>
> Using <=>
>
>
> SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
> origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
> 'andreas:*&jose:*') ORDER BY '2000-01-01' :: TIMESTAMP <=>
> del.received_timestamp LIMIT 10;
>
>  entity_id | folder_id |   received_timestamp
> -----------+-----------+-------------------------
>   1224278 |   1068087 | 2015-08-17 23:53:26
>   1224382 |   1068087 | 2015-08-18 03:07:55
>   1224404 |   1068087 | 2015-08-18 03:49:02
>   1505713 |     48496 | 2015-10-27 14:51:45
>    142132 |     66658 | 2012-12-03 14:14:05.488
>    122565 |     90115 | 2012-11-20 15:41:04.936
>    200744 |     66655 | 2013-01-28 21:47:44.561
>   1445927 |    888665 | 2015-09-29 00:26:56
>    123671 |     83509 | 2012-11-21 14:16:26.448
>   1129928 |     66658 | 2015-05-09 08:39:14.128
> (10 rows)
>
>
> Using <=|
>
> SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
> origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
> 'andreas:*&jose:*') ORDER BY '2000-01-01' :: TIMESTAMP <=|
> del.received_timestamp LIMIT 10;
>
>
>  entity_id | folder_id |   received_timestamp
> -----------+-----------+-------------------------
>   1224278 |   1068087 | 2015-08-17 23:53:26
>   1224382 |   1068087 | 2015-08-18 03:07:55
>   1224404 |   1068087 | 2015-08-18 03:49:02
>   1505713 |     48496 | 2015-10-27 14:51:45
>    142132 |     66658 | 2012-12-03 14:14:05.488
>    122565 |     90115 | 2012-11-20 15:41:04.936
>    200744 |     66655 | 2013-01-28 21:47:44.561
>   1445927 |    888665 | 2015-09-29 00:26:56
>    123671 |     83509 | 2012-11-21 14:16:26.448
>   1129928 |     66658 | 2015-05-09 08:39:14.128
> (10 rows)
>
>
> Neither are ordered by received_timestamp
>
> Can you explain how to get ORDER BY received_timestamp DESC?
>
> Thanks.
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas@visena.com <mailto:andreas@visena.com>
> www.visena.com <https://www.visena.com>
> <https://www.visena.com>
>

Do you need simple ordering by received_timestamp column? Not ordering
by distance between received_timestamp and some date?

Then you can use simple "ORDER BY received_timestamp". For example, we
have data:

=# SELECT * FROM test;
  id |     fts     |        received
----+-------------+-------------------------
   1 | 'andreas':1 | 2015-08-17 23:53:26
   2 | 'andreas':1 | 2015-08-18 03:07:55
   3 | 'andreas':1 | 2015-08-18 03:49:02
   4 | 'andreas':1 | 2012-12-03 14:14:05.488
   5 | 'andreas':1 | 2012-11-20 15:41:04.936
   6 | 'andreas':1 | 2013-01-28 21:47:44.561
   6 | 'andreas':1 | 2015-09-29 00:26:56
   7 | 'andreas':1 | 2012-11-21 14:16:26.448
   8 | 'andreas':1 | 2015-05-09 08:39:14.128
(9 rows)

I created index:

CREATE INDEX rumidx ON test USING rum (fts rum_tsvector_timestamp_ops,
received) WITH (attach = 'received', to = 'fts');

Then we can execute queries:

=# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple',
'andreas') ORDER BY received LIMIT 8;
  id |        received
----+-------------------------
   5 | 2012-11-20 15:41:04.936
   7 | 2012-11-21 14:16:26.448
   4 | 2012-12-03 14:14:05.488
   6 | 2013-01-28 21:47:44.561
   8 | 2015-05-09 08:39:14.128
   1 | 2015-08-17 23:53:26
   2 | 2015-08-18 03:07:55
   3 | 2015-08-18 03:49:02
(8 rows)

=# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple',
'andreas') ORDER BY received DESC LIMIT 8;
  id |        received
----+-------------------------
   6 | 2015-09-29 00:26:56
   3 | 2015-08-18 03:49:02
   2 | 2015-08-18 03:07:55
   1 | 2015-08-17 23:53:26
   8 | 2015-05-09 08:39:14.128
   6 | 2013-01-28 21:47:44.561
   4 | 2012-12-03 14:14:05.488
   7 | 2012-11-21 14:16:26.448
(8 rows)
 
Yes, this gives the correct result, but the whole motivation for using RUM-index is for the query to use the same index for ORDER BY, as it seems to do using the <=> operator.
 
The query you gave above does not the index for sorting AFAIU.
 
Operators <=>, |=>, <=| you can use to order by nearest date to specific
date:
[snip]
 
I hope this is what you want.
 
I still don't understand how my query which had ORDER BY '2000-01-01' :: TIMESTAMP <=> del.received_timestamp
can produce the following ordering:
 
 entity_id | folder_id |   received_timestamp     
-----------+-----------+-------------------------
  1224278 |   1068087 | 2015-08-17 23:53:26
  1224382 |   1068087 | 2015-08-18 03:07:55
  1224404 |   1068087 | 2015-08-18 03:49:02
  1505713 |     48496 | 2015-10-27 14:51:45
   142132 |     66658 | 2012-12-03 14:14:05.488
   122565 |     90115 | 2012-11-20 15:41:04.936
   200744 |     66655 | 2013-01-28 21:47:44.561
  1445927 |    888665 | 2015-09-29 00:26:56
   123671 |     83509 | 2012-11-21 14:16:26.448
  1129928 |     66658 | 2015-05-09 08:39:14.128
 
How can "nearest date to specific date" produce this ordering when the specific date si 2000-01-01?
 
Thanks for explaining.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

pgsql-general by date:

Previous
From: Andreas Joseph Krogh
Date:
Subject: Re: Postgres Pain Points 2 ruby / node language drivers
Next
From: Jeff Janes
Date:
Subject: Re: Postgres Pain Points: 1 pg_hba conf