Re: Histogram question. - Mailing list pgsql-sql

From Steve Midgley
Subject Re: Histogram question.
Date
Msg-id CAJexoS+NPAb4BLRSRYP=qNwaq3A591D6run1Qwa85mV_10ehYA@mail.gmail.com
Whole thread Raw
In response to Histogram question.  (Jian He <hejian.mark@gmail.com>)
List pgsql-sql


On Tue, Apr 5, 2022 at 7:35 AM Jian He <hejian.mark@gmail.com> wrote:



SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes' AND s.attname = 'seat_no';

return 0.6762. 

SELECT sum(s.most_common_freqs[ array_position((s.most_common_vals::text::text[]),v) ]) FROM pg_stats s, unnest(s.most_common_vals::text::text[]) v WHERE s.tablename = 'boarding_passes' AND s.attname = 'seat_no' AND v > '30C';

return 0.2127

SELECT round( reltuples * ( 0.2127 -- from most common values + (1 - 0.6762 - 0) * (49 / 100.0) -- from histogram )) FROM pg_class WHERE relname = 'boarding_passes';

the above mentioned query, the part I don't understand is 49/100.


I believe the exercise is intended to create a set of histograms based on data values over a series of intervals. The 49/100 (if I'm reading the source material correctly) refers to finding all the boarding passes in the lower 49 of 100 intervals. I didn't bother to read what the interval definition is, but I think that's what the "49" is referring to..

pgsql-sql by date:

Previous
From: Jian He
Date:
Subject: Histogram question.
Next
From: Jian He
Date:
Subject: Does postgresql know the check condition is valid or not. or can check deduce from multiple conditions