Thread: Setting Statistics on Functional Indexes
Hey everyone,

So recently we upgraded to 9.1 and have noticed a ton of our queries got much
worse. It turns out that 9.1 is *way* more optimistic about our functional
indexes, even when they're entirely the wrong path. So after going through the
docs, I see that the normal way to increase stats is to alter columns directly
on a table, or change default_statistics_target itself. But there doesn't seem
to be any way to increase stats for a functional index unless you modify
default_statistics_target.

I did some testing, and for a particularly bad plan, we don't get a good
result until the stats are at 5000 or higher. As you can imagine, this
drastically increases our analyze time, and there's no way we're setting that
system-wide. I tested this by:

SET default_statistics_target = 5000;
ANALYZE my_table;
EXPLAIN SELECT [ugly query];

I only tested 1000, 2000, 3000, 4000, and 5000 before it switched plans. This
is a 30M-row table, and the "good" plan is 100x faster than the bad one.

You can see this behavior yourself with this test case:

CREATE TABLE date_test (
  id SERIAL,
  col1 varchar,
  col2 numeric,
  action_date TIMESTAMP WITHOUT TIME ZONE
);

insert into date_test (col1, col2, action_date)
select 'S:' || ((random()*a.num)::int % 10000),
       (random()*a.num)::int % 15000,
       current_date - (random()*a.num)::int % 1000
  from generate_series(1,10000000) a(num);

create index idx_date_test_action_date_trunc
    on date_test (date_trunc('day', action_date));

create index idx_date_test_col1_col2
    on date_test (col1, col2);

explain analyze
select * from date_test
 where col1 IN ('S:96')
   and col2 = 657
   and date_trunc('day', action_date) >= '2012-10-24'
 order by id desc, action_date;

This seems to cause the problem more consistently when using a value where
col1 and col2 have no matches. In this particular example, I didn't get the
good plan until using 1000 as the default stats target. It can't be a
coincidence that there are 1000 distinct values in the table for that column,
and we get a terrible plan until a statistic is recorded for each and every
one in the functional index so it can exclude itself.

This seems counter-intuitive to pg_stats with default stats at 500:

SELECT attname, n_distinct FROM pg_stats WHERE tablename = 'date_test';

   attname   | n_distinct
-------------+------------
 id          |         -1
 action_date |       1000
 col2        |      14999
 col1        |      10000

SELECT stadistinct FROM pg_statistic
 WHERE starelid = 'idx_date_test_action_date_trunc'::regclass;

 stadistinct
-------------
        1000

Just on pure selectivity, it should prefer the index on col1 and col2.

Anyway, we're getting all the devs to search out that particular functional
index and eradicate it, but that will take a while to get through testing and
deployment. The overriding problem seems to be two-fold:

1. Is there any way to specifically set stats on a functional index?

2. Why is the planner so ridiculously optimistic with functional indexes, even
   in the case of much higher selectivity as reported by pg_stats on the named
   columns?

Thanks!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
Shaun Thomas <sthomas@optionshouse.com> writes:
> 1. Is there any way to specifically set stats on a functional index?

Sure, the same way you would for a table.

regression=# create table foo (f1 int, f2 int);
CREATE TABLE
regression=# create index fooi on foo ((f1 + f2));
CREATE INDEX
regression=# \d fooi
     Index "public.fooi"
 Column |  Type   | Definition
--------+---------+------------
 expr   | integer | (f1 + f2)
btree, for table "public.foo"

regression=# alter index fooi alter column expr set statistics 5000;
ALTER INDEX

The weak spot in this, and the reason this isn't "officially" supported, is
that the column name for an index expression isn't set in stone. But as long
as you check what it's called you can set its target.

> 2. Why is the planner so ridiculously optimistic with functional
> indexes, even in the case of much higher selectivity as reported by
> pg_stats on the named columns?

It's not particularly (not that you've even defined what you think
"optimistic" is, much less mentioned what baseline you're comparing to).
I tried your example on HEAD and I got what seemed pretty decent rowcount
estimates ...

			regards, tom lane
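Since the expression column's name is generated rather than declared, it can
be worth confirming it from the catalogs before (and after) running the
ALTER. A minimal sketch against the example above, using only the standard
pg_attribute columns and assuming the index is still named fooi:

-- See what PostgreSQL actually named the expression column.
SELECT attnum, attname
  FROM pg_attribute
 WHERE attrelid = 'fooi'::regclass
 ORDER BY attnum;

-- Raise the per-column target, re-analyze, then confirm the target stuck.
ALTER INDEX fooi ALTER COLUMN expr SET STATISTICS 5000;
ANALYZE foo;

SELECT attname, attstattarget
  FROM pg_attribute
 WHERE attrelid = 'fooi'::regclass;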
On 10/24/2012 02:11 PM, Tom Lane wrote:

> It's not particularly (not that you've even defined what you think
> "optimistic" is, much less mentioned what baseline you're comparing
> to).

The main flaw with my example is that it's random. But I swear I'm not making
it up! :)

There seems to be a particularly nasty edge case we're triggering, then. Like
I said, it's worse when col1 and col2 don't match anything. In that case, it's
using the trunc index on the date column, which has demonstrably worse
performance. Here are the two EXPLAIN ANALYZE outputs I got before and after
front-loading statistics.

Before the stats increase:

 Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=78.282..78.282 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test  (cost=0.00..9.37 rows=1 width=23) (actual time=78.274..78.274 rows=0 loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 78.317 ms

And then after. I used your unofficial trick to set it to 1000:

alter index idx_date_test_action_date_trunc
      alter column date_trunc set statistics 1000;
analyze date_test;

 Sort  (cost=9.83..9.83 rows=1 width=23) (actual time=0.038..0.038 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_col1_col2 on date_test  (cost=0.00..9.82 rows=1 width=23) (actual time=0.030..0.030 rows=0 loops=1)
         Index Cond: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
         Filter: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
 Total runtime: 0.066 ms

This is on a bone-stock PG 9.1.6 from Ubuntu 12.04 LTS, with
default_statistics_target increased to 500. The only thing I bumped up between
those two query plans was the statistics target on the functional index.

But then I noticed something else. I reverted back to the old 500 default for
everything, and added an index:

create index idx_date_test_action_date_trunc_col1
    on date_test (date_trunc('day', action_date), col1);

I think we can agree that this index would be more selective than the one on
date_trunc by itself. Yet:

 Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=77.055..77.055 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test  (cost=0.00..9.37 rows=1 width=23) (actual time=77.046..77.046 rows=0 loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 77.091 ms

All I have to say about that is: wat.

--
Shaun Thomas
On 10/24/2012 02:31 PM, Shaun Thomas wrote:

> The main flaw with my example is that it's random. But I swear I'm not
> making it up! :)

And then I find a way to make it non-random. Hooray:

CREATE TABLE date_test (
  id SERIAL,
  col1 varchar,
  col2 numeric,
  action_date TIMESTAMP WITHOUT TIME ZONE
);

insert into date_test (col1, col2, action_date)
select 'S:' || (a.num % 10000),
       a.num % 15000,
       current_date - a.num % 1000
  from generate_series(1,10000000) a(num);

create index idx_date_test_action_date_trunc
    on date_test (date_trunc('day', action_date));

create index idx_date_test_col1_col2
    on date_test (col1, col2);

set default_statistics_target = 500;

vacuum analyze date_test;

explain analyze
select * from date_test
 where col1 IN ('S:96')
   and col2 = 657
   and date_trunc('day', action_date) >= '2012-10-24'
 order by id desc, action_date;

 Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=83.418..83.418 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test  (cost=0.00..9.37 rows=1 width=23) (actual time=83.409..83.409 rows=0 loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 83.451 ms

alter index idx_date_test_action_date_trunc
      alter column date_trunc set statistics 1000;
analyze date_test;

 Sort  (cost=9.83..9.83 rows=1 width=23) (actual time=0.077..0.077 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_col1_col2 on date_test  (cost=0.00..9.82 rows=1 width=23) (actual time=0.069..0.069 rows=0 loops=1)
         Index Cond: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
         Filter: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
 Total runtime: 0.105 ms

Then for fun:

create index idx_date_test_action_date_trunc_col1
    on date_test (date_trunc('day', action_date), col1);

alter index idx_date_test_action_date_trunc
      alter column date_trunc set statistics -1;
analyze date_test;

 Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=84.375..84.375 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test  (cost=0.00..9.37 rows=1 width=23) (actual time=84.366..84.366 rows=0 loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 84.410 ms

o_O

--
Shaun Thomas
Shaun Thomas <sthomas@optionshouse.com> writes:
> On 10/24/2012 02:31 PM, Shaun Thomas wrote:
>> The main flaw with my example is that it's random. But I swear I'm not
>> making it up! :)

> And then I find a way to make it non-random. Hooray:

I can't reproduce this. In 9.1 for instance, I get

 Sort  (cost=9.83..9.83 rows=1 width=23) (actual time=0.029..0.029 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_col1_col2 on date_test  (cost=0.00..9.82 rows=1 width=23) (actual time=0.021..0.021 rows=0 loops=1)
         Index Cond: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
         Filter: (date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone)
 Total runtime: 0.086 ms

and those estimates don't change materially with the stats adjustments.
If I drop that index altogether, it goes over to this:

 Sort  (cost=521.83..521.83 rows=1 width=23) (actual time=2.544..2.544 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc_col1 on date_test  (cost=0.00..521.82 rows=1 width=23) (actual time=2.536..2.536 rows=0 loops=1)
         Index Cond: ((date_trunc('day'::text, action_date) >= '2012-10-24 00:00:00'::timestamp without time zone) AND ((col1)::text = 'S:96'::text))
         Filter: (col2 = 657::numeric)
 Total runtime: 2.600 ms

So the planner's conclusions look fairly sane from here. I get about the same
results from HEAD, 9.2 branch tip, or 9.1 branch tip. So I'm wondering exactly
what "9.1" version you're using, and also whether you've got any nondefault
planner cost parameters.

			regards, tom lane
On 10/26/2012 02:35 PM, Tom Lane wrote:

> So I'm wondering exactly what "9.1" version you're using, and also
> whether you've got any nondefault planner cost parameters.

Just a plain old 9.1.6 from Ubuntu 12.04. The only thing I personally changed
was default_statistics_target. Later, I bumped up shared buffers and work mem,
but that just reduced the run time. It still uses the bad index.

But I just noticed the lag in your response. :) It turns out, even though I
was substituting 2012-10-24 or 2012-10-25, what I really meant was
current_date. That does make all the difference, actually. If the date in the
WHERE clause isn't the current date, it comes up with the right plan. Even a
single day in the past makes it work right. It only seems to break on the very
edge.

This should work:

DROP TABLE IF EXISTS date_test;

CREATE TABLE date_test (
  id SERIAL,
  col1 varchar,
  col2 numeric,
  action_date TIMESTAMP WITHOUT TIME ZONE
);

insert into date_test (col1, col2, action_date)
select 'S:' || (a.num % 10000),
       a.num % 15000,
       current_date - a.num % 1000
  from generate_series(1,10000000) a(num);

create index idx_date_test_action_date_trunc
    on date_test (date_trunc('day', action_date));

create index idx_date_test_col1_col2
    on date_test (col1, col2);

set default_statistics_target = 500;

vacuum analyze date_test;

explain analyze
select * from date_test
 where col1 IN ('S:96')
   and col2 = 657
   and date_trunc('day', action_date) >= current_date
 order by id desc, action_date;

 Sort  (cost=9.39..9.39 rows=1 width=23) (actual time=10.679..10.679 rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test  (cost=0.01..9.38 rows=1 width=23) (actual time=10.670..10.670 rows=0 loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= ('now'::text)::date)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 10.713 ms

And if this helps:

foo=# select name, setting from pg_settings where setting != boot_val;

            name            |       setting
----------------------------+---------------------
 application_name           | psql
 archive_command            | (disabled)
 client_encoding            | UTF8
 default_statistics_target  | 500
 default_text_search_config | pg_catalog.english
 lc_collate                 | en_US.UTF-8
 lc_ctype                   | en_US.UTF-8
 lc_messages                | en_US.UTF-8
 lc_monetary                | en_US.UTF-8
 lc_numeric                 | en_US.UTF-8
 lc_time                    | en_US.UTF-8
 log_file_mode              | 0600
 log_line_prefix            | %t
 max_stack_depth            | 2048
 server_encoding            | UTF8
 shared_buffers             | 3072
 ssl                        | on
 transaction_isolation      | read committed
 unix_socket_directory      | /var/run/postgresql
 unix_socket_permissions    | 0777
 wal_buffers                | 96

That's every single setting that's not a default from the compiled PG. Some of
these were obviously modified by Ubuntu, but I didn't touch anything else. I
was trying to produce a clean-room setup to showcase this. But I'm seeing it
everywhere I test, even with sane settings. Our EDB server is doing the same
thing on much beefier hardware and correspondingly increased settings, which
is what prompted me to test it in plain PG.

--
Shaun Thomas
Shaun Thomas <sthomas@optionshouse.com> writes:
> But I just noticed the lag in your response. :) It turns out, even
> though I was substituting 2012-10-24 or 2012-10-25, what I really meant
> was current_date. That does make all the difference, actually.

Ah.  [ pokes at that for awhile... ]  OK, this has nothing to do with
functional indexes, and everything to do with the edge-case behavior of
scalarltsel. What you've got is a histogram whose last entry (corresponding to
the highest observed value of the date) is current_date, and the question is
what we should assume when estimating how many rows have a value >= that. The
answer of course is "one, plus any duplicates" ... but we don't know how many
duplicates there are, and what we do know is that it's not a particularly
large number, because the value isn't present in the most-common-values stats.
So the code there assumes there aren't any dups.

Once you have enough histogram resolution for current_date to show up as the
next-to-last as well as the last histogram entry, then of course the estimate
gets a lot better, since we can now tell that there's at least one histogram
bin's worth of duplicates.

Interestingly, this is a case where the get_actual_variable_range patch
(commit 40608e7f, which appeared in 9.0) makes the results worse. Before that,
there was a (very arbitrary) lower bound on what we'd believe as the
selectivity of a >= condition, but now, when we know the actual upper limit of
the variable, we don't clamp the result that way. I think the clamp must have
been saving you in your previous version, because it more-or-less-accidentally
accounted for the end value not being unique.

So the bottom line is that this is a case where you need a lot of resolution
in the histogram. I'm not sure there's anything good we can do to avoid that.
I spent a bit of time thinking about whether we could use n_distinct to get
some idea of how many duplicates there might be for the endpoint value, but
n_distinct is unreliable enough that I can't develop a lot of faith in such a
thing. Or we could just arbitrarily assume some fraction-of-a-histogram-bin's
worth of duplicates, but that would make the results worse for some people.

			regards, tom lane
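One way to watch this effect directly is to look at the histogram ANALYZE
stored for the index expression. A sketch against the date_test example above;
pg_stats reports expression statistics under the index's name, with the
generated column name as attname, so the exact attname may differ:

SELECT tablename, attname, n_distinct,
       histogram_bounds
  FROM pg_stats
 WHERE tablename = 'idx_date_test_action_date_trunc';

If current_date appears only as the very last histogram bound, the >= estimate
is effectively "the endpoint row plus unknown duplicates"; once the statistics
target is high enough that the same date also shows up as the next-to-last
bound, the planner can see at least a full bin's worth of duplicates and the
estimate improves.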
On Fri, Oct 26, 2012 at 6:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Interestingly, this is a case where the get_actual_variable_range patch
> (commit 40608e7f, which appeared in 9.0) makes the results worse.
> Before that, there was a (very arbitrary) lower bound on what we'd
> believe as the selectivity of a >= condition, but now, when we know the
> actual upper limit of the variable, we don't clamp the result that way.
> I think the clamp must have been saving you in your previous version,
> because it more-or-less-accidentally accounted for the end value not
> being unique.

IIRC, that patch was performing an index query (index_last) to get the real
largest value, right? How many duplicates would you think the planner would
require to choose another (better) plan?

Because once you've accessed that last index page, it would be rather trivial
finding out how many duplicate tids are in that page and, with a small CPU
cost (no disk access if you don't query other index pages) you could verify
the assumption of near-uniqueness.
Claudio Freire <klaussfreire@gmail.com> writes:
> Because once you've accessed that last index page, it would be rather
> trivial finding out how many duplicate tids are in that page and, with
> a small CPU cost (no disk access if you don't query other index pages)
> you could verify the assumption of near-uniqueness.

I thought about that too, but I'm not sure how promising the idea is. In the
first place, it's not clear when to stop counting duplicates, and in the
second, I'm not sure we could get away with not visiting the heap to check for
tuple liveness. There might be a lot of apparent duplicates in the index that
just represent unreaped old versions of a frequently-updated endpoint tuple.
(The existing code is capable of returning a "wrong" answer if the endpoint
tuple is dead, but I don't think it matters much in most cases. I'm less sure
such an argument could be made for dup-counting.)

			regards, tom lane
On Fri, Oct 26, 2012 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> Because once you've accessed that last index page, it would be rather
>> trivial finding out how many duplicate tids are in that page and, with
>> a small CPU cost (no disk access if you don't query other index pages)
>> you could verify the assumption of near-uniqueness.
>
> I thought about that too, but I'm not sure how promising the idea is.
> In the first place, it's not clear when to stop counting duplicates, and
> in the second, I'm not sure we could get away with not visiting the heap
> to check for tuple liveness. There might be a lot of apparent
> duplicates in the index that just represent unreaped old versions of a
> frequently-updated endpoint tuple. (The existing code is capable of
> returning a "wrong" answer if the endpoint tuple is dead, but I don't
> think it matters much in most cases. I'm less sure such an argument
> could be made for dup-counting.)

Would checking the visibility map be too bad? An index page worth of tuples
should also fit within a page in the visibility map.
On Fri, Oct 26, 2012 at 7:04 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Oct 26, 2012 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> Because once you've accessed that last index page, it would be rather
>>> trivial finding out how many duplicate tids are in that page and, with
>>> a small CPU cost (no disk access if you don't query other index pages)
>>> you could verify the assumption of near-uniqueness.
>>
>> I thought about that too, but I'm not sure how promising the idea is.
>> In the first place, it's not clear when to stop counting duplicates, and
>> in the second, I'm not sure we could get away with not visiting the heap
>> to check for tuple liveness. There might be a lot of apparent
>> duplicates in the index that just represent unreaped old versions of a
>> frequently-updated endpoint tuple. (The existing code is capable of
>> returning a "wrong" answer if the endpoint tuple is dead, but I don't
>> think it matters much in most cases. I'm less sure such an argument
>> could be made for dup-counting.)
>
> Would checking the visibility map be too bad? An index page worth of
> tuples should also fit within a page in the visibility map.

Scratch that, they're sorted by tid. So it could be lots of pages in random
order.
On 10/26/2012 04:08 PM, Tom Lane wrote:

> So the bottom line is that this is a case where you need a lot of
> resolution in the histogram. I'm not sure there's anything good
> we can do to avoid that.

I kinda hoped it wouldn't be something like that. For the particularly painful
instance, it was easy to replace the index with a better (if larger) dual
index and drop the bad old one. But in some cases, I'm having to maintain two
indexes that make me sad:

CREATE TABLE activity (
  activity_id SERIAL NOT NULL PRIMARY KEY,
  account_id BIGINT NOT NULL,
  action_date TIMESTAMP WITHOUT TIME ZONE
);

CREATE INDEX idx_activity_action_date_account_id
    ON activity (action_date, account_id);

CREATE INDEX idx_activity_account_id_action_date
    ON activity (account_id, action_date);

Because in the first case, we needed action_date to be first for analytics
that *don't* supply account_id. But in the second case, we need account_id
first, so we can get the most recent action(s) for that account without a very
expensive backwards index scan on the first index.

I know that current_date seems like an edge case, but I can't see how getting
the most recent activity for something is an uncommon activity. Tip tracking
is actually the most frequent pattern in the systems I've seen. Admittedly,
those are almost always high-TPS trading systems.

At this point, I'm almost willing to start putting in optimization fences to
force it along the right path. Which is gross, because that's effectively no
better than Oracle hints. But I also don't like setting my statistics to 5000+
on problematic column/index combos to get the right heuristics, or having
semi-duplicate multi-column indexes to exploit sorting performance.

I mean, I get it. I just wonder if this particular tweak isn't more of a
regression than initially thought.

--
Shaun Thomas
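For reference, the sort of fence being described could look something like the
following, applied to the date_test example from earlier in the thread. This
is only a sketch: it relies on the fact that in 9.1 a WITH query is always
materialized, so the planner resolves the col1/col2 lookup on its own and
never gets the chance to prefer the date_trunc index for the outer predicate:

-- Fence the selective col1/col2 lookup, then filter by date outside it.
WITH candidate AS (
    SELECT *
      FROM date_test
     WHERE col1 IN ('S:96')
       AND col2 = 657
)
SELECT *
  FROM candidate
 WHERE date_trunc('day', action_date) >= current_date
 ORDER BY id DESC, action_date;

The obvious downside, as noted above, is that this bakes a plan choice into
the query text, which is exactly the kind of hinting being complained about.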
Shaun Thomas wrote:

> I know that current_date seems like an edge case, but I can't see
> how getting the most recent activity for something is an uncommon
> activity. Tip tracking is actually the most frequent pattern in the
> systems I've seen.

Yeah, this has been a recurring problem with database statistics in various
products for at least 20 years. For a while I was using a product whose
optimizer engineers referred to it as "data skew" and recommended adding a
"dummy" entry to get a single value out past the maximum end of the range. If
you couldn't stomach the dummy data, they had detailed instructions for
dumping your statistics, tinkering with the end of it to allow for the issue,
and writing it back over the actual statistics gathered. We need a better
answer than that.

> I just wonder if this particular tweak isn't more of a regression
> than initially thought.

It does seem like we have a serious regression in terms of this particular
issue.

-Kevin
On Fri, Oct 26, 2012 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So the bottom line is that this is a case where you need a lot of
> resolution in the histogram. I'm not sure there's anything good
> we can do to avoid that. I spent a bit of time thinking about whether
> we could use n_distinct to get some idea of how many duplicates there
> might be for the endpoint value, but n_distinct is unreliable enough
> that I can't develop a lot of faith in such a thing. Or we could just
> arbitrarily assume some fraction-of-a-histogram-bin's worth of
> duplicates, but that would make the results worse for some people.

I looked at this a bit. It seems to me that the root of this issue is that we
aren't distinguishing (at least, not as far as I can see) between > and >=.
ISTM that if the operator is >, we're doing exactly the right thing, but if
it's >=, we're giving exactly the same estimate that we would give for >. That
doesn't seem right. Worse, I suspect that in this case we're actually giving a
smaller estimate for >= than we would for =, because = would estimate as if we
were searching for an arbitrary non-MCV, while >= acts like > and says, hey,
there's nothing beyond the end.

Shouldn't there be a separate estimator for scalarlesel? Or should the
existing estimator be adjusted to handle the two cases differently?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
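The asymmetry is easy to poke at with the date_test table from earlier in the
thread: comparing the planner's row estimates for the three operators at the
histogram endpoint shows whether >= is being costed like > or like =. A sketch
(the exact estimates will depend on what ANALYZE happened to sample):

-- Compare estimated row counts at the histogram endpoint (current_date).
EXPLAIN SELECT * FROM date_test
 WHERE date_trunc('day', action_date) >  current_date;

EXPLAIN SELECT * FROM date_test
 WHERE date_trunc('day', action_date) >= current_date;

EXPLAIN SELECT * FROM date_test
 WHERE date_trunc('day', action_date) =  current_date;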
On Wed, Nov 14, 2012 at 5:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Shouldn't there be a separate estimator for scalarlesel? Or should
> the existing estimator be adjusted to handle the two cases
> differently?

Wouldn't adding eqsel to scalar(lt|gt)sel work? (saving duplication with
mcv_selectivity)
Robert Haas <robertmhaas@gmail.com> writes:
> Shouldn't there be a separate estimator for scalarlesel? Or should
> the existing estimator be adjusted to handle the two cases
> differently?

Well, it does handle it differently to some extent, in that the operator
itself is invoked when checking the MCV values, so we get the right answer for
those.

The fact that there aren't separate estimators for < and <= is something we
inherited from Berkeley, so I can't give the original rationale for certain,
but I think the notion was that the difference is imperceptible when dealing
with a continuous distribution. The question is whether you think that the "="
case contributes any significant amount to the probability, given that the
bound is not one of the MCV values. (If it is, the MCV check will have
accounted for it, so adding anything would be wrong.) I guess we could add
1/ndistinct or something like that, but I'm not convinced that will really
make the estimates better, mainly because ndistinct is none too reliable
itself.

			regards, tom lane