Hi everyone,
I assume this is not easy with standard PG but I wanted to double check.
I have a column that has a very uneven distribution of values. ~95% of the values will be the same, with a long
tail of a few dozen other values.
I want to have an index over this value. Queries that select the most common value will not use the index, because that
value makes up an overwhelming percentage of the table. This means that ~95% of the disk space and IOPS spent
maintaining the index are "wasted".
I cannot use a hardcoded partial index because:
1) The common value is not known at schema definition time, and may change (very slowly) over time.
2) JDBC uses prepared statements for everything, and the value to be selected is not known at statement prepare time,
so any partial indices are ignored (this is a really, really obnoxious behavior and makes partial indices almost useless
combined with prepared statements, sadly…)
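For concreteness, the kind of setup I mean would look roughly like the sketch below. The table, column, and index
names and the literal 'common' are made up for illustration; they are not my real schema.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE events (
    id     bigserial PRIMARY KEY,
    status text NOT NULL
);

-- Partial index that skips the dominant value. 'common' is hardcoded here,
-- which is exactly what point 1 rules out.
CREATE INDEX events_status_rare_idx
    ON events (status)
    WHERE status <> 'common';

-- Roughly what JDBC sends: the value is a parameter, so when the statement
-- is planned without knowing $1, the planner cannot prove that the index
-- predicate (status <> 'common') holds for the query.
PREPARE find_by_status(text) AS
    SELECT id FROM events WHERE status = $1;

EXECUTE find_by_status('rare');
```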
The table size is expected to approach the 0.5 billion row mark within the next few months, hence my eagerness to save
even seemingly small amounts of per-row costs.
Curious if anyone has a good way to approach this problem.
Thanks,
Steven