Re: 回复: An implementation of multi-key sort - Mailing list pgsql-hackers
From | Yao Wang |
---|---|
Subject | Re: 回复: An implementation of multi-key sort |
Date | |
Msg-id | CACq1W05frccXe1qfPU29aSCB+A_qCRbywJdTX5DphM9ZWPvsxQ@mail.gmail.com Whole thread Raw |
In response to | Re: 回复: An implementation of multi-key sort (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: 回复: An implementation of multi-key sort
Re: 回复: An implementation of multi-key sort Re: 回复: An implementation of multi-key sort |
List | pgsql-hackers |
Hi John, Thanks for your kind message. I talked to Heikki before getting Tomas's response, and he said "no promise but I will take a look". That's why I added his email. I have updated the CF entry and added Tomas as reviewer. Hi Tomas, Again, I'd say a big thank to you. The report and script are really, really helpful. And your ideas are very valuable. Firstly, the expectation of mksort performance: 1. When mksort works well, it should be faster than qsort because it saves the cost of comparing duplicated values every time. 2. When all values are distinct at a particular column, the comparison will finish immediately, and mksort will actually fall back to qsort. For the case, mksort should be equal or a bit slower than qsort because it need to maintain more complex state. Generally, the benefit of mksort is mainly from duplicated values and sort keys: the more duplicated values and sort keys are, the bigger benefit it gets. Analysis on the report in your previous mail -------------------------------------------- 1. It seems the script uses $count to specify the duplicated values: number of repetitions for each value (ndistinct = nrows/count) However, it is not always correct. For type text, the script generates values like this: expr="md5(((i / $count) + random())::text)" But md5() generates totally random values regardless of $count. Some cases of timestamptz have the same problem. For all distinct values, the sort will finish at first depth and fall to qsort actually. 2. Even for the types with correct duplicated setting, the duplicated ratio is very small: e.g. say $nrows = 10000 and $count = 100, only 1% duplicated rows can go to depth 2, and only 0.01% of them can go to depth 3. So it still works on nearly all distinct values. 3. Qsort of PG17 uses kind of specialization for tuple comparator, i.e. it uses specialized functions for different types, e.g. qsort_tuple_unsigned() for unsigned int. The specialized comparators avoid all type related checks and are much faster than regular comparator. That is why we saw 200% or more regression for the cases. Code optimizations I did for mk qsort ------------------------------------- 1. Adapted specialization for tuple comparator. 2. Use kind of "hybrid" sort: when we actually adapt bubble sort due to limited sort items, use bubble sort to check datums since specified depth. 3. Other other optimizations such as pre-ordered check. Analysis on the new report -------------------------- I also did some modifications to your script about the issues of data types, plus an output about distinct value count/distinct ratio, and an indicator for improvement/regression. I attached the new script and a report on a data set with 100,000 rows and 2, 5, 8 columns. 1. Generally, the result match the expectation: "When mksort works well, it should be faster than qsort; when mksort falls to qsort, it should be equal or a bit slower than qsort." 2. For all values of "sequential" (except text type), mksort is a bit slower than qsort because no actual sort is performed due to the "pre-ordered" check. 3. For int and bigint type, mksort became faster and faster when there were more and more duplicated values and sort keys. Improvement of the best cases is about 58% (line 333) and 57% (line 711). 4. For timestamptz type, mksort is a bit slower than qsort because the distinct ratio is always 1 for almost all cases. I think more benefit is available by increasing the duplicated values. 5. For text type, mksort is faster than qsort for all cases, and improvement of the best case is about 160% (line 1510). It is the only tested type in which specialization comparators are disabled. Obviously, text has much better improvement than others. I suppose the cause is about the specialisation comparators: for the types with them, the comparing is too faster so the cost saved by mksort is not significant. Only when saved cost became big enough, mksort can defeat qsort. For other types without specialisation comparators, mksort can defeat qsort completely. It is the "real" performance of mksort. Answers for some other questions you mentioned ---------------------------------------------- Q1: Why are almost all the cases that got better for a single-column sort? A: mksort is enabled only for multi column sort. When there is only one column, qsort works. So we can simply ignore the cases. Q2: Why did the perf become worse by just reversing the sort keys? A: In the example we used, the sort keys are ordered from more duplicated to less. Please see the SQL: update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3; update t1 set c6 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbb' || (c1 % 5)::text; So c6 has most duplicated values, c5 has less, and so on. By the order "C6, c5, c4, ...", mksort can take effect on every sort key. By the reverse order "c2, c3, c4...", mksort almost finished on first sort key (c2) because it has only 1% duplicated values, and fell back to qsort actually. Based on the new code, I reran the example, and got about 141% improvement for order "c6, c5, c4...", and about -4% regression for order "c2, c3, c4...". Q3: Does mksort work effectively only for particular type, e.g. string? A: No, the implementation of mksort does not distinguish data type for special handling. It just calls existing comparators which are also used by qsort. I used long prefix for string just to enlarge the time cost of comparing to amplify the result. The new report shows mksort can work effectively on non-string types and string without long prefix. Q4: Was the algorithm good for a hardware at that time, but the changes (e.g. the growing important of on-CPU caches) made it less relevant? A: As my understanding, the answer is no because the benefit of mksort is from saving cost for duplicated comparison, which is not related to hardware. I suppose the new report can prove it. However, the hardware varying definitely affects the perf, especially considering that the perf different between mksort and qsort is not so big when mksort falls back to qsort. I am not able to test on a wide range of hardwares, so any finding is appreciated. Potential improvement spaces ---------------------------- I tried some other optimizations but didn't add the code finally because the benefit is not very sure and/or the implementation is complex. Just raise them for more discussion if necessary: 1. Use distinct stats info of table to enable mksort It's kind of heuristics: in optimizer, check Form_pg_statistic->stadistinct of a table via pg_statistics. Enable mksort only when it is less than a threshold. The hacked code works, which need to modify a couple of interfaces of optimizer. In addition, a complete solution should consider types and distinct values of all columns, which might be too complex, and the benefit seems not so big. 2. Cache of datum positions e.g. for heap tuple, we need to extract datum position from SortTuple by extract_heaptuple_from_sorttuple() for comparing, which is executed for each datum. By comparison, qsort does it once for each tuple. Theoretically we can create a cache to remember the datum positions to avoid duplicated extracting. The hacked code works, but the improvement seems limited. Not sure if more improvement space is available. 3. Template mechanism Qsort uses kind of template mechanism by macro (see sort_template.h), which avoids cost of runtime type check. Theoretically template mechanism can be applied to mksort, but I am hesitating because it will impose more complexity and the code will become difficult to maintain. Please let me know your opinion, thanks! Yao Wang -- This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
Attachment
pgsql-hackers by date: