Partitioning of a dependent table not based on date - Mailing list pgsql-general

From Herouth Maoz
Subject Partitioning of a dependent table not based on date
Msg-id F4C9EC2B-DF6B-4AF3-9AAC-5BD78F95D4DC@unicell.co.il
List pgsql-general
I am currently in the process of creating a huge archive database that contains data from all of our systems, going back for almost a decade.

Most of the tables fall into one of two categories:

1. Static tables, which are rarely updated, such as lookup tables or user lists. I don't intend to partition these; I'll just refresh them periodically from production.
2. Transaction tables, which have a timestamp field and for which I have the data archived in COPY format by month. A monolithic table spanning a decade is of course not feasible, so I am partitioning these by month.

(I don't mean "transaction" in the database sense, but in the sense that the data represents historical activity, e.g. message sent, file downloaded etc.)

I have one table, though, that doesn't fit this pattern. It's a many-to-one table related to one of the transaction tables. So on the one hand it doesn't have a timestamp field, and on the other hand it has accumulated a lot of data over the last decade, so I can't keep it unpartitioned.

CREATE TABLE smart_messaging_copyrights
(
    smd_id              BIGINT          references smart_messaging_data(smd_id)
                                        not null,
    copyright_id        NUMERIC(21)     not null,
    cover_percent       FLOAT           null
);

So the question is: how do I partition this table so that the query planner will not go through all of the partitions when I query it? Queries on this table are, of course, always in conjunction with the smart_messaging_data table, joined on the smd_id field. So I was thinking perhaps to create partitions based on the smd_id field.
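For what it's worth, here is a minimal sketch of what I have in mind, using table inheritance with a CHECK constraint per child (the partition name and range boundaries here are invented):

-- Parent stays empty; each child carries a CHECK on smd_id.
CREATE TABLE smart_messaging_copyrights_part1
(
    CHECK (smd_id >= 1 AND smd_id < 50000)
) INHERITS (smart_messaging_copyrights);

-- With constraint exclusion on, a query that constrains smd_id
-- should only touch the children whose CHECK ranges match.
SET constraint_exclusion = partition;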

The archived COPY files are per-month, just like for the usual transaction tables; they are split according to the timestamp field in the smart_messaging_data table.

I'm thinking of one of two solutions.

Solution one

Restore each file into its own partition, created at first without a check constraint; then query for min(smd_id) and max(smd_id), and add a check constraint accordingly.

My concern is: what happens if there are overlaps between partitions? smd_id may not originally have been assigned strictly in timestamp order, so it could be that the last id for January '13 is, say, 53001, while the first id for February '13 is 52995, leaving a slight overlap between the check constraints of two different partitions.

The advantage is that I keep a correspondence between the partitioning of smart_messaging_data and smart_messaging_copyrights, and that I can process the files one by one.
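In SQL terms, solution one would look roughly like this for each monthly file (the file name, constraint name, and boundary values are all made up):

-- Load one month's file into a fresh child with no CHECK yet.
CREATE TABLE smart_messaging_copyrights_2013_01 ()
    INHERITS (smart_messaging_copyrights);

\copy smart_messaging_copyrights_2013_01 from 'copyrights_2013_01.copy'

-- Find the actual id range that landed in this partition...
SELECT min(smd_id), max(smd_id) FROM smart_messaging_copyrights_2013_01;

-- ...and add the check constraint accordingly.
ALTER TABLE smart_messaging_copyrights_2013_01
    ADD CONSTRAINT copyrights_2013_01_smd_id_check
    CHECK (smd_id >= 1 AND smd_id <= 53001);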

Solution two

Upload all the data from all the files into the database, and then partition it based on smd_id so that there is no check constraint overlap.

This way there is no overlap, but there is no correspondence between the structures of the two tables, and I have to do an initial load that puts this whole huge data set in one place, where I'll have to index and manipulate it before partitioning.
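Roughly, solution two would go something like this (the staging table, partition name, and bounds are invented for illustration):

-- Stage everything in one big table first,
-- \copy-ing each monthly file into it.
CREATE TABLE copyrights_staging (LIKE smart_messaging_copyrights);

CREATE INDEX ON copyrights_staging (smd_id);

-- Then carve out one non-overlapping range per partition.
CREATE TABLE smart_messaging_copyrights_part1
(
    CHECK (smd_id >= 1 AND smd_id < 50000)
) INHERITS (smart_messaging_copyrights);

INSERT INTO smart_messaging_copyrights_part1
    SELECT * FROM copyrights_staging
    WHERE smd_id >= 1 AND smd_id < 50000;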


Does anybody know what problems check constraint overlaps in a partitioned table cause as far as the query planner is concerned? If they just cause both partitions to be included in the query plan, then that's not so bad, and I would prefer the first solution.

Otherwise, can anybody offer a better solution?

Note: this table has not been in active use since January 2014, so after the initial load there will be no need to create new partitions.


TIA,
Herouth

