Thread: Window functions, partitioning, and sorting performance
I have a table called stop_event (a stop event is one bus passing one bus stop at a given time for a given route and direction), and I'd like to get the average interval for each stop/route/direction combination.
A few hundred new events are written to the table once every minute. No rows are ever updated (or deleted, except in development).
stop_event looks like this:
Table "public.stop_event"
Column | Type | Modifiers
-----------+-----------------------------+-----------
stop_time | timestamp without time zone | not null
stop | integer | not null
bus | integer | not null
direction | integer | not null
route | integer | not null
Foreign-key constraints:
"stop_event_direction_id_fkey" FOREIGN KEY (direction) REFERENCES direction(id)
"stop_event_route_fkey" FOREIGN KEY (route) REFERENCES route(id)
"stop_event_stop" FOREIGN KEY (stop) REFERENCES stop(id)
And my query looks like this:

SELECT avg(interval),
       route,
       direction,
       name,
       st_asgeojson(stop_location)::JSON
FROM
  (SELECT (stop_time - (lag(stop_time) OVER w)) AS interval,
          route,
          direction,
          name,
          stop_location
   FROM stop_event
   INNER JOIN stop ON (stop_event.stop = stop.id)
   WINDOW w AS (PARTITION BY route, direction, stop ORDER BY stop_time))
  AS all_intervals
WHERE (interval IS NOT NULL)
GROUP BY route,
         direction,
         name,
         stop_location;
Clearly the bulk of the time is spent sorting the rows in the original table, and then again sorting the results of the subselect. But I'm afraid I don't really know what to do with this information. Is there any way I can speed this up? Is my use of an aggregate key for stop_event causing problems? Would using a synthetic key help?
Thank you for any help you can provide,
-Eli
On Thu, Aug 21, 2014 at 4:29 PM, Eli Naeher <enaeher@gmail.com> wrote:
> Clearly the bulk of the time is spent sorting the rows in the original
> table, and then again sorting the results of the subselect. But I'm afraid I
> don't really know what to do with this information. Is there any way I can
> speed this up?

"Sort Method: external merge Disk: 120976kB"

The obvious first step is to bump up work_mem to avoid the disk-based sort. Try setting it to something like 256MB in your session and see how it performs then. This may also allow the planner to choose a HashAggregate instead of a sort.

It's not always straightforward to tune work_mem correctly; it depends on your hardware, concurrency, and query complexity. Here's some advice:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem_maintainance_work_mem

You could also create an index on (route, direction, stop, stop_time) to avoid the inner sort entirely. And it seems that you could move the "INNER JOIN stop" to the outer query as well; I'm not sure if that will change much.

Try these, and if it's still problematic, report back with a new EXPLAIN ANALYZE.

Regards,
Marti
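Marti's two suggestions, as a concrete sketch (the index name is illustrative, not from the thread):

-- Session-only setting; pick a value suited to your RAM and expected concurrency:
SET work_mem = '256MB';

-- An index matching the window's PARTITION BY + ORDER BY columns, so the
-- inner sort can be satisfied by an index scan instead of an explicit sort:
CREATE INDEX stop_event_window_idx
    ON stop_event (route, direction, stop, stop_time);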
On 08/21/2014 08:29 AM, Eli Naeher wrote:
> With around 1.2 million rows, this takes 20 seconds to run. 1.2 million
> rows is only about a week's worth of data, so I'd like to figure out a
> way to make this faster.

Well, you'll probably be able to reduce the run time a bit, but even with really good hardware and all in-memory processing, you're not going to see significant run-time improvements with that many rows. This is one of the problems that reporting-specific structures, such as fact tables, were designed to address.

Repeatedly processing the same week/month/year's worth of several million rows will just take longer with each iteration as the data size increases. You need to maintain up-to-date aggregates on the metrics you actually want to measure, so that you're only reading the few hundred rows you introduce every update period. Then you can retrieve those kinds of results in a few milliseconds.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
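Shaun's suggestion amounts to keeping a running summary that is folded forward each update period. A minimal sketch, assuming a hypothetical staging source of the newest rows (all names here are illustrative, not from the thread):

-- Summary table keyed by the reporting dimensions:
CREATE TABLE stop_interval_summary (
    route          integer  NOT NULL,
    direction      integer  NOT NULL,
    stop           integer  NOT NULL,
    event_count    bigint   NOT NULL DEFAULT 0,
    total_interval interval NOT NULL DEFAULT '0',
    PRIMARY KEY (route, direction, stop)
);

-- Each update period, aggregate only the new intervals and fold them in:
UPDATE stop_interval_summary s
SET event_count    = s.event_count + n.cnt,
    total_interval = s.total_interval + n.sum_interval
FROM (SELECT route, direction, stop,
             count(*) AS cnt,
             sum(interval) AS sum_interval
      FROM new_intervals            -- hypothetical staging table of fresh rows
      GROUP BY route, direction, stop) AS n
WHERE s.route = n.route AND s.direction = n.direction AND s.stop = n.stop;

-- The per-stop average is then just total_interval / event_count, read
-- from a few hundred summary rows rather than millions of events.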
Upping work_mem did roughly halve the time, but after thinking about Shaun's suggestion, I figured it's better to calculate this stuff once and then store it. So here is how the table looks now:
Table "public.stop_event"
Column | Type | Modifiers
---------------------+-----------------------------+---------------------------------------------------------
stop_time | timestamp without time zone | not null
stop | integer | not null
bus | integer | not null
direction | integer | not null
route | integer | not null
id | bigint | not null default nextval('stop_event_id_seq'::regclass)
previous_stop_event | bigint |
Indexes:
"stop_event_pkey" PRIMARY KEY, btree (id)
"stop_event_previous_stop_event_idx" btree (previous_stop_event)
Foreign-key constraints:
"stop_event_direction_id_fkey" FOREIGN KEY (direction) REFERENCES direction(id)
"stop_event_previous_stop_event_fkey" FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id)
"stop_event_route_fkey" FOREIGN KEY (route) REFERENCES route(id)
"stop_event_stop" FOREIGN KEY (stop) REFERENCES stop(id)
Referenced by:
TABLE "stop_event" CONSTRAINT "stop_event_previous_stop_event_fkey" FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id)
previous_stop_event simply references the previous (by stop_time) stop event for the combination of stop, route, and direction. I have successfully populated this column for my existing test data. However, when I try to do a test self-join using it, Postgres does two seq scans across the whole table, even though I have indexes on both id and previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why those indexes are not being used?
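One way to do that backfill in a single window-function UPDATE (a sketch; not necessarily the exact statement I ran):

UPDATE stop_event se
SET previous_stop_event = w.prev_id
FROM (SELECT id,
             lag(id) OVER (PARTITION BY route, direction, stop
                           ORDER BY stop_time) AS prev_id
      FROM stop_event) AS w
WHERE se.id = w.id
  AND w.prev_id IS NOT NULL;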
Thank you again,
-Eli
Oops, I forgot to include the test self-join query I'm using. It is simply:
SELECT se1.stop_time AS curr, se2.stop_time AS prev
FROM stop_event se1
JOIN stop_event se2 ON se1.previous_stop_event = se2.id;
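The plan at the link above came from running the query under EXPLAIN ANALYZE; to reproduce it, something like:

EXPLAIN (ANALYZE, BUFFERS)
SELECT se1.stop_time AS curr, se2.stop_time AS prev
FROM stop_event se1
JOIN stop_event se2 ON se1.previous_stop_event = se2.id;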
On Thu, Aug 21, 2014 at 7:19 PM, Eli Naeher <enaeher@gmail.com> wrote:
> However, when I try to do a test self-join using it, Postgres does two
> seq scans across the whole table, even though I have indexes on both id
> and previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why
> those indexes are not being used?

Because the planner thinks a seq scan + hash join is going to be faster than incurring the overhead of index scans for other kinds of plans.

You can try out alternative plan types by running 'set enable_hashjoin=off' in your session. If it does turn out to be faster, then it usually means you haven't set the planner tunables right (random_page_cost, effective_cache_size, and possibly cpu_tuple_cost).

Regards,
Marti
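A sketch of the experiment Marti describes (the specific cost values are placeholders, not recommendations):

-- Disable hash joins in this session only, to compare plans:
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT se1.stop_time AS curr, se2.stop_time AS prev
FROM stop_event se1
JOIN stop_event se2 ON se1.previous_stop_event = se2.id;

-- If the index plan wins, tune the cost settings rather than leaving
-- enable_hashjoin off in production:
SET random_page_cost = 1.5;
SET effective_cache_size = '4GB';
RESET enable_hashjoin;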