Thread: Re: attdisbursion
Added to TODO. This will improve VACUUM ANALYZE performance, though I
don't think we have btree comparison functions for all data types,
though we should:

    * change VACUUM ANALYZE to use btree comparison functions, not <,=,> calls

> > > Also, I have idea about using '<' '>' in vacuum:
> > > what if try to use btree BT_ORDER functions which allow
> > > to compare vals for many data types (btXXXcmp functions in
> > > nbtcompare.c).
> >
> > I see, use a btree index to tell us how selective the > or < is? An
> > interesting idea. Isn't there a significant performance problem with
> > this?
>
> Don't use btree index, but use btree functions to compare
> two values of a datatype. You call
>     func_operator = oper("<",...
>                          "="
>                          ">"
> but this's not right way in common case: operators may be
> overloaded.
>
> These functions are stored in catalog.
> To get function for a datatype btree call
>
>     proc = index_getprocid(rel, 1, BTORDER_PROC);
>
> Look @ nbtcompare.c:
>
>  *  These functions are stored in pg_amproc.  For each operator class
>  *  defined on btrees, they compute
>  *
>  *      compare(a, b):
>  *          < 0 if a < b,
>  *          = 0 if a == b,
>  *          > 0 if a > b.
>
> There are functions for INTs, FLOATs, ...
>
> ...But this is not so important thing...
>
> Vadim
>

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Added to TODO. This will improve VACUUM ANALYZE performance, though I
> don't think we have btree comparison functions for all data types,
> though we should:
>     * change VACUUM ANALYZE to use btree comparison functions, not <,=,> calls

There are several places that know more than they should about the
meaning of "<" etc operators.  For example, the parser assumes it
should use "<" and ">" to implement ORDER BY [DESC].  Making VACUUM
not depend on specific names for the ordering operators will not
improve life unless we fix *all* of these places.

Rather than depending on btree to tell us which way is up, maybe the
pg_type row for a type ought to specify the standard ordering operators
for the type directly.

While we are at it we could think about saying that there is just one
"standard ordering operator" for a type and it yields a strcmp-like
result (minus, zero, plus) rather than several ops yielding booleans.
But that'd take a lot of changes in btree and everywhere else...

            regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Added to TODO. This will improve VACUUM ANALYZE performance, though I
> > don't think we have btree comparison functions for all data types,
> > though we should:
> >
> >     * change VACUUM ANALYZE to use btree comparison functions, not <,=,> calls
>
> There are several places that know more than they should about the
> meaning of "<" etc operators.  For example, the parser assumes it
> should use "<" and ">" to implement ORDER BY [DESC].  Making VACUUM
> not depend on specific names for the ordering operators will not
> improve life unless we fix *all* of these places.

Actually, I thought it would be good for performance reasons, not for
portability.  We would call one function per attribute instead of three.

> Rather than depending on btree to tell us which way is up, maybe the
> pg_type row for a type ought to specify the standard ordering operators
> for the type directly.
>
> While we are at it we could think about saying that there is just one
> "standard ordering operator" for a type and it yields a strcmp-like
> result (minus, zero, plus) rather than several ops yielding booleans.
> But that'd take a lot of changes in btree and everywhere else...

The btree comparison functions do just that, returning -1,0,1 like
strcmp, for each type btree supports.

--
  Bruce Momjian
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>>>> * change VACUUM ANALYZE to use btree comparison functions, not <,=,> calls
>>
>> There are several places that know more than they should about the
>> meaning of "<" etc operators.  For example, the parser assumes it
>> should use "<" and ">" to implement ORDER BY [DESC].  Making VACUUM
>> not depend on specific names for the ordering operators will not
>> improve life unless we fix *all* of these places.

> Actually, I thought it would be good for performance reasons, not for
> portability.  We would call one function per attribute instead of three.

No such luck: what VACUUM wants to do is figure out whether the current
value is less than the min-so-far (one "<" call), greater than the
max-so-far (one ">" call), and/or equal to the candidate most-frequent
values it has (one "=" call apiece).  Same number of function calls if
it's using a "compare" function.

I suppose you'd save a little time by only looking up one operator
function per column instead of three, but it's hard to think that'd be
measurable let alone significant.  There's not going to be any per-tuple
savings.

>> While we are at it we could think about saying that there is just one
>> "standard ordering operator" for a type and it yields a strcmp-like
>> result (minus, zero, plus) rather than several ops yielding booleans.
>> But that'd take a lot of changes in btree and everywhere else...

> The btree comparison functions do just that, returning -1,0,1 like
> strcmp, for each type btree supports.

Right, and that's useful for btree because it saves compares, but it
doesn't really help VACUUM noticeably.

After writing the above quote, I realized that you can't really define
a type's ordering just in terms of a strcmp-like operator with no other
baggage.  That might be enough for building a btree index, but in order
to *do* anything with the index, the optimizer and executor have to
understand the relationship of the index ordering to the things that a
user would write in a query, such as "WHERE A >= 12 AND A < 100" or
"ORDER BY column USING >".  So there has to be information relating
these user-available operators to the type's ordering, as well.  (We do
have that, in the form of the pg_amop table entries.  The point is that
you can't get away with much less information than is contained in
pg_amop.)

As far as I can see, the only thing that's really at stake here is not
hardwiring the semantics of the operator names "<", "=", ">" into the
system.  While that'd be nice from a cleanliness/data-type-independence
point of view, it's not clear that it has any real practical
significance.  Any data type designer who didn't make "=" mean equals
ought to be shot anyway ;-).  So upon second thought I think I'd put
this *way* down the to-do list...

            regards, tom lane
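The call-count argument earlier in this message can be checked with a
small sketch. It is not the VACUUM code: ColumnStats, N_CANDIDATES,
observe_with_operators, observe_with_cmp, and count_calls are
hypothetical names, and the fixed pair of most-frequent candidates is an
assumption made only to keep the example short. Every comparison goes
through a counting wrapper so the two approaches can be compared directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_CANDIDATES 2          /* assumed number of most-frequent candidates */

typedef struct ColumnStats
{
    int32_t     min;
    int32_t     max;
    int32_t     candidate[N_CANDIDATES];
    long        candidate_hits[N_CANDIDATES];
    long        calls;          /* comparison calls made so far */
} ColumnStats;

/* Counting wrappers: three boolean operators and one three-way compare. */
static bool lt_counted(ColumnStats *st, int32_t a, int32_t b) { st->calls++; return a < b; }
static bool gt_counted(ColumnStats *st, int32_t a, int32_t b) { st->calls++; return a > b; }
static bool eq_counted(ColumnStats *st, int32_t a, int32_t b) { st->calls++; return a == b; }
static int  cmp_counted(ColumnStats *st, int32_t a, int32_t b) { st->calls++; return (a > b) - (a < b); }

/* Per-value bookkeeping using "<", ">" and "=". */
static void
observe_with_operators(ColumnStats *st, int32_t v)
{
    if (lt_counted(st, v, st->min))
        st->min = v;
    if (gt_counted(st, v, st->max))
        st->max = v;
    for (size_t i = 0; i < N_CANDIDATES; i++)
        if (eq_counted(st, v, st->candidate[i]))
            st->candidate_hits[i]++;
}

/* The same bookkeeping using only the three-way compare. */
static void
observe_with_cmp(ColumnStats *st, int32_t v)
{
    if (cmp_counted(st, v, st->min) < 0)
        st->min = v;
    if (cmp_counted(st, v, st->max) > 0)
        st->max = v;
    for (size_t i = 0; i < N_CANDIDATES; i++)
        if (cmp_counted(st, v, st->candidate[i]) == 0)
            st->candidate_hits[i]++;
}

typedef void (*observe_fn) (ColumnStats *, int32_t);

/* Feed n values through one of the two paths; return the number of calls. */
static long
count_calls(observe_fn observe, const int32_t *values, size_t n)
{
    ColumnStats st;

    assert(values != NULL && n > 0);
    st.min = st.max = values[0];
    for (size_t i = 0; i < N_CANDIDATES; i++)
    {
        st.candidate[i] = values[0];
        st.candidate_hits[i] = 0;
    }
    st.calls = 0;
    for (size_t i = 0; i < n; i++)
        observe(&st, values[i]);
    return st.calls;
}
```

For n input values both paths make n * (2 + N_CANDIDATES) comparison
calls, so switching to a compare function changes only how many
functions are looked up per column, not the per-tuple work.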
> As far as I can see, the only thing that's really at stake here is not
> hardwiring the semantics of the operator names "<", "=", ">" into the
> system.  While that'd be nice from a cleanliness/data-type-independence
> point of view, it's not clear that it has any real practical
> significance.  Any data type designer who didn't make "=" mean equals
> ought to be shot anyway ;-).  So upon second thought I think I'd put
> this *way* down the to-do list...

Removed from TODO list.

--
  Bruce Momjian