SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes'AND s.attname = 'seat_no';
return 0.6762.
SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes'AND s.attname = 'seat_no'AND v > '30C';
return 0.2127
SELECT round( reltuples * ( 0.2127-- from most common values + (1 - 0.6762 - 0) * (49 / 100.0) -- from histogram )) FROM pg_class WHERE relname = 'boarding_passes';
the above mentioned query, the part I don't understand is 49/100.
I believe the exercise is intended to create a set of histograms based on data values over a series of intervals. The 49/100 (if I'm reading the source material correctly) refers to finding all the boarding passes in the lower 49 of 100 intervals. I didn't bother to read what the interval definition is, but I think that's what the "49" is referring to..