Re: design partioning scheme for selecting from latest partition - Mailing list pgsql-general

From Francisco Olarte
Subject Re: design partioning scheme for selecting from latest partition
Date
Msg-id CA+bJJbx6CQ2O9oAcSBeGi-puV14SZzc0O6VYAu50ndO5895vEA@mail.gmail.com
Whole thread Raw
In response to design partioning scheme for selecting from latest partition  (Niels Jespersen <NJN@dst.dk>)
Responses SV: design partioning scheme for selecting from latest partition  (Niels Jespersen <NJN@dst.dk>)
List pgsql-general
Niels:

On Mon, Mar 22, 2021 at 3:40 PM Niels Jespersen <NJN@dst.dk> wrote:
...
>     version int not null,
...
> ) partition by list (version);
>
> -- Add tens of partitions
> -- Load millions of rows in each partition
> -- Then I want to be able to do this wothout scanning all partitions for the highest version number.
>
> select s.* from s where s.version = (select max(version) from s);

> I could add an index on the version column. But the only use would be to the newest partition, so that seems a bit
likeoverkill, indexing 100 of milliomns of rows.
 

Without an index, or some caching, you would need to scan partitions.
Potentially several.  Because you could have a partition for (0,3,6),
other for (1,2,8) and a another for 4. Are you sure list is better,
giving your message hints you make them increasing? I mean, even if
you do no tuse all versions, having ranges means you can query the
schema, order the partitions in desceding order in the range, query
each one and it must be either empty or contain the maximum. You could
do this if your lists are increasing too ( i.e, you have 1,2,3 and
10,12,14, and 100,200, but you haven't told that so we cannot assume
it ). With free lists, like the ones I've put above, you may scan
1,2,8 expecting an 8 to find max is a 2, then 0,3,6 expecting a 6 to
find max is a 3 and then 4 and find a max there, and without indexes
or insider knowledge every scan will have to be a full scan.

Even if you can do something like that, without an index you will need
a full scan, or do some trigger magic and keep a cache ( just keep
versio, count(*) on a table and maintain it ). If your partitions are
ordered, you can always keep the last one indexed, or if you know
versions do not decrease, you may keep things cached. This seems to be
the kind of problem where the generic solution is hard but a little
insider knowledge can accelerate it a lot.

Regards.
   Francisco Olarte.



pgsql-general by date:

Previous
From: Edward Donahue III
Date:
Subject: gdal32-libs-3-2-2-13.rhel bad dependency
Next
From: "Saha, Sushanta K"
Date:
Subject: CSV From Oracle with timestamp column getting errors