Incremental aggregate/rollup strategy advice - Mailing list pgsql-general

From Morris de Oryx
Subject Incremental aggregate/rollup strategy advice
Date
Msg-id CAKqncciZm9qc-6K5x+RF2aP3iate9-YX-TMcDyBCwddvEU4uQA@mail.gmail.com
List pgsql-general
I'm researching strategies for incrementally updating aggregate/rollup tables. The problem is how to do so without double-counting changes and without skipping any. I know enough about concurrency issues to ask the question, but not enough about the features and details of Postgres' concurrency management to figure out a 100% reliable solution without some help. And, with concurrency-related stuff, you're either 100% right or you're buggy.

And thanks in advance to anyone who can help out. I'm not good at writing short :( I've tried to put in enough detail to get to the point, which is "how do I find unprocessed records without missing any."

Okay, the setup is that we've got a lot of tables where we would like to do incremental aggregates. To simplify things: these are mostly INSERT-only tables, sometimes with UPDATEs; I'm not worrying about DELETE yet. A couple of strategies I'd like to avoid:

* Full queries will take too long and will scale poorly, so a MATERIALIZED VIEW is unappealing. That leaves rollup tables, since it's possible to update them incrementally.

* We may have multiple aggregates off the same base data, and may change them over time. So, putting some kind of flag field in the source table doesn't really fit. 

* I was thinking about a posting/diff/delta/audit-like table, but that's a pretty "heavy" solution. You need some kind of ON AFTER INSERT/UPDATE statement-level trigger to push over the data that's needed to update the aggregates. Which, again, means the source table needs to know what aggregations are going to take place. Plus, it's just a ton of churn and extra data...when all of the necessary data exists in the source table already.

* I saw one strategy from the folks at CitusData that looks good:

Briefly, they use a bigserial counter which, I guess, is not transaction-bound so that record insertions have a chronological stamp. 1, 2, 3, etc. This is a design familiar to me from other environments and is sometimes called a "concurrency ID." In our case, we need to support UPDATE as well, so I don't think the sequence idea will work (?) To make this more concrete, here's a simplified table with source data:

CREATE TABLE "error_report" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We've got distributed sources, so UUIDs for IDs. 
"error_name" text NOT NULL DEFAULT false,                -- Something we'll summarize by.
"facility_id" uuid NOT NULL DEFAULT NULL,                -- Something we'll summarize by.
"error_dts" timestamptz NOT NULL DEFAULT NULL,           -- Set on the source machine in UTC
"last_updated_dts" timestamptz NOT NULL DEFAULT NULL);   -- Set on Postgres after INSERT or UPDATE.

The idea is that you have a stable number line, either as a number or a timestamp. We use timestamptz and store everything in UTC. Otherwise, it's the same basic idea as what the CitusData folks said: you have an ever-increasing number line so that you can mark where you've processed to. This way, you can fetch unprocessed rows without missing any, without a flag field in the source table, and without an audit table/change queue of any kind.

I've simplified the timestamps below for legibility to spell this out, as it's the crux of my question about Postgres specifics. And, just pretend that these rows are all on page 0...I've faked the ctid values to make the rows easier to keep track of.

ctid   last_updated_dts
(0,1)  2018-09-25 05:53:00 
(0,2)  2018-09-25 05:54:00
(0,3)  2018-09-25 05:55:00
(0,4)  2018-09-26 02:23:00
(0,5)  2018-09-26 03:14:00
(0,6)  2018-09-26 03:15:00
(0,7)  2018-09-28 05:10:00
(0,8)  2018-09-28 05:14:00
(0,9)  2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00

You need a small utility table to hold details about which records you've aggregated or processed.

CREATE TABLE "rollup_status" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We use UUIDs, not necessary here, but it's what we use. 
"rollup_name" text NOT NULL DEFAULT false,               
"last_processed_dts" timestamptz NOT NULL DEFAULT NULL); -- Marks the last timestamp processed.

Now imagine that I've got a rollup_status record

rollup_name         last_processed_dts
error_name_counts   2018-09-26 02:23:00

If I search for rows that were modified after that last_processed_dts value, I get these:

ctid   last_updated_dts
(0,5)  2018-09-26 03:14:00
(0,6)  2018-09-26 03:15:00
(0,7)  2018-09-28 05:10:00
(0,8)  2018-09-28 05:14:00
(0,9)  2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00

Then I update last_processed_dts in the rollup_status record to the max(last_updated_dts) of the rows I just processed:

rollup_name     last_processed_dts
error_name_counts   2018-09-28 05:15:00
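
In SQL, I'm picturing the whole sweep for one rollup as a single transaction, something like the sketch below. Here error_name_counts is an assumed rollup table with a unique constraint on error_name, and I'm only showing the INSERT-only case:

BEGIN;

-- Grab everything past the high-water mark for this rollup.
CREATE TEMP TABLE fresh ON COMMIT DROP AS
SELECT e.*
FROM error_report e
WHERE e.last_updated_dts > (SELECT last_processed_dts
                            FROM rollup_status
                            WHERE rollup_name = 'error_name_counts');

-- Fold the new rows into the rollup table.
INSERT INTO error_name_counts (error_name, error_count)
SELECT error_name, count(*)
FROM fresh
GROUP BY error_name
ON CONFLICT (error_name)
DO UPDATE SET error_count = error_name_counts.error_count + EXCLUDED.error_count;

-- Advance the high-water mark to the newest timestamp just processed.
UPDATE rollup_status
SET last_processed_dts = (SELECT max(last_updated_dts) FROM fresh)
WHERE rollup_name = 'error_name_counts'
AND EXISTS (SELECT 1 FROM fresh);

COMMIT;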

So, I got a chunk of the timeline, recorded how far I went, and processed those records. The beauty part of this technique, if I can get it implemented correctly, is that it doesn't have to block new records. While I'm processing those 5 (or 5K), new records can be added onto the end of error_report and, so long as they have a timestamp later than rollup_status.last_processed_dts, I'll find them on the next sweep. And if I want to add a different rollup? There's no need to do _anything_ to the error_report table. Imagine two new rows have come into the error_report table:

(0,11) 2018-09-28 05:17:00
(0,12) 2018-09-28 05:18:00

The new rollup runs, processes all 12 rows, and now there are two rollup_status entries:

rollup_name         last_processed_dts
error_name_counts   2018-09-28 05:15:00
facility_avgs       2018-09-28 05:18:00

Same timeline, different uses, different last-position-processed stored in rollup_status.last_processed_dts.
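
And standing up a second rollup is just a matter of seeding its row, something along these lines (hypothetical values):

-- Register a new rollup by seeding its high-water mark;
-- '-infinity' means "process everything" on the first sweep.
INSERT INTO rollup_status (rollup_name, last_processed_dts)
VALUES ('facility_avgs', '-infinity');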

Okay, this finally brings me to the Postgres-specific question.

     How do I make sure that rows from transactions that commit after my sweep never carry timestamps earlier than the last_processed_dts I recorded, so that I don't skip them?

I've been digging into this and have seen and considered a few things, but I'm not sure which, if any, will work.

* Use an ON AFTER statement-level trigger with a transition table (cool feature!) to stamp last_updated_dts with clock_timestamp(). That function in particular because, as I've now learned, most timestamp functions in a trigger return the transaction _start_ time, not the _commit_ time. I need the transaction commit time. Which brings us to the next idea.
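
A quick psql session shows the difference I mean between the transaction-start functions and clock_timestamp():

BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(2);
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
-- now() is identical in both results (transaction start time),
-- while clock_timestamp() has advanced by roughly two seconds.
COMMIT;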

* pg_xact_commit_timestamp which, if I understand it correctly, is sort of an additional, invisible system column that stores the transaction commit timestamp as a timestamptz. That sounds perfectly matched to my requirements but:

-- I've failed to find _any_ detailed discussion of this feature, which is unusual with Postgres.

-- I'm not 100% sure it will work. Do I need some kind of lock notification or something while doing the aggregation to button things down?

-- I'm not wild about invisible columns as they're obscure to anyone else.

-- I'm not wild about bolting an extra 8 bytes onto every row in every table. And can this field be indexed? I think it would have to be.

-- I've seen mention that the timestamps aren't retained.

I don't need the timestamp as such, it's just meant to order things along the number line. I'll put a BRIN index on error_report.last_updated_dts (and on similar fields in other tables). I don't think I would need an index on the pg_xact_commit_timestamp value, and it's not directly supported. But if it is necessary, I guess you could build one with a cast AT TIME ZONE to convert the value to something immutable for the index engine.
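
If I've understood the docs, the lookup would read something like the sketch below. It needs track_commit_timestamp = on in postgresql.conf (and a restart), and rows committed before that setting was enabled come back as NULL:

-- Requires track_commit_timestamp = on (postgresql.conf, restart needed).
SELECT id,
       error_name,
       pg_xact_commit_timestamp(xmin) AS committed_at
FROM error_report
WHERE pg_xact_commit_timestamp(xmin) >
      (SELECT last_processed_dts
       FROM rollup_status
       WHERE rollup_name = 'error_name_counts');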

* The xmin system column. I don't think that this would work. While transaction IDs are assigned in increasing order, the transactions aren't going to commit in that order.

* The ctid system column. I've seen this mentioned, but I honestly don't follow how you could use this to reliably detect all new and modified rows.

I will be very grateful for any help or suggestions. I'm out over my skis on some of the details on Postgres specifics, so I won't take corrections the wrong way.
