Re: [GENERAL] Large data and slow queries - Mailing list pgsql-general

From Martijn Tonies (Upscene Productions)
Subject Re: [GENERAL] Large data and slow queries
Msg-id 5D5368E3CC1D4078AE18C506C5CE5867@MARTIJNWS
In response to [GENERAL] Large data and slow queries  (Samuel Williams <space.ship.traveller@gmail.com>)
List pgsql-general
Samuel, others,

Perhaps I'm missing something, but I'd be interested in the reasoning behind
this.

For the column 'what', it seems you have no index covering all values,
only partial indices for specific values of 'what'.

How does this speed up the search? Will PostgreSQL use those indices
instead of a generic index on 'what' and optionally other columns?
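
For illustration, here is roughly what I mean (hypothetical
definitions; the real names and predicates are in the gist):

-- A generic index covering every value of 'what':
CREATE INDEX index_user_event_on_what
    ON user_event (what, created_at);

-- versus a partial index covering only specific values:
CREATE INDEX index_user_event_for_visits
    ON user_event (created_at)
    WHERE what IN ('poll', 'location_change', 'midnight_location');

As far as I know, the planner will only consider the partial index
when the query's WHERE clause provably implies the index predicate.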


With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com


-----Original Message-----
From: Samuel Williams
Sent: Wednesday, April 19, 2017 6:01 AM
To: pgsql-general
Subject: [GENERAL] Large data and slow queries

Hi.

We have 400,000,000 records in a table (soon to be 800,000,000); here
is the schema (\d+):

https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121

We want the following kinds of query to be fast:

SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
"user_event"."what" IN ('poll', 'location_change',
'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
01:23:55') AND (latitude > -37.03079375089291 AND latitude <
-36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
175.0805140220076);

We have a btree index and it appears to be working. However, it's
still pretty slow.
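
For reference, one hypothetical shape for that index (the actual
definition is in the gist above) would be:

CREATE INDEX index_user_event_for_visits_3
    ON user_event (created_at, latitude, longitude)
    WHERE what IN ('poll', 'location_change', 'midnight_location');

With a multicolumn btree like that, only the leading created_at bound
narrows the scan; the latitude/longitude ranges on the trailing
columns are checked as filters on each index entry.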

EXPLAIN ANALYZE gives the following:
https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

I'm thinking that I need to do the following to help:

CLUSTER user_event ON index_user_event_for_visits_3, followed by an
ANALYZE. Our data is mostly time series, but sometimes we get dumps
with historical records.
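
A sketch of what I mean (note that CLUSTER takes an ACCESS EXCLUSIVE
lock and rewrites the whole table, and it's a one-off: new rows won't
stay clustered, so it would need repeating):

CLUSTER user_event USING index_user_event_for_visits_3;
ANALYZE user_event;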

Perhaps add a BRIN index on created_at
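
For example (the index name here is just illustrative; BRIN requires
PostgreSQL 9.5+ and only helps if created_at correlates well with the
physical row order, which the CLUSTER above would improve):

CREATE INDEX index_user_event_created_at_brin
    ON user_event USING brin (created_at);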

I'm wondering if we can use an index to cache all user_ids seen on a
given day. If we change our index to be more discrete, e.g.
created_at::date, would this help? The set union of user_ids for 365
days should be pretty fast.
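
A sketch of that idea (hypothetical index name; note created_at::date
can only be indexed directly if created_at is timestamp without time
zone, since casting timestamptz to date depends on the session time
zone and isn't immutable):

CREATE INDEX index_user_event_day_user
    ON user_event ((created_at::date), user_id)
    WHERE what IN ('poll', 'location_change', 'midnight_location');

That would group entries by day, so collecting the distinct user_ids
for one day becomes a contiguous index range scan.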

I'm open to any ideas or suggestions; ideally we can keep
optimisations within the database rather than adding a layer of
caching on top.

Kind regards,
Samuel




