Thread: Histogram question.

Histogram question.

From
Jian He
Date:



SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes' AND s.attname = 'seat_no';

return 0.6762. 

SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes' AND s.attname = 'seat_no' AND v > '30C';

return 0.2127

SELECT round( reltuples * ( 0.2127 -- from most common values + (1 - 0.6762 - 0) * (49 / 100.0) -- from histogram )) FROM pg_class WHERE relname = 'boarding_passes';

the above mentioned query, the part I don't understand is 49/100.

Re: Histogram question.

From
Steve Midgley
Date:


On Tue, Apr 5, 2022 at 7:35 AM Jian He <hejian.mark@gmail.com> wrote:



SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes' AND s.attname = 'seat_no';

return 0.6762. 

SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes' AND s.attname = 'seat_no' AND v > '30C';

return 0.2127

SELECT round( reltuples * ( 0.2127 -- from most common values + (1 - 0.6762 - 0) * (49 / 100.0) -- from histogram )) FROM pg_class WHERE relname = 'boarding_passes';

the above mentioned query, the part I don't understand is 49/100.


I believe the exercise is intended to create a set of histograms based on data values over a series of intervals. The 49/100 (if I'm reading the source material correctly) refers to finding all the boarding passes in the lower 49 of 100 intervals. I didn't bother to read what the interval definition is, but I think that's what the "49" is referring to..