Re: Double sorting split patch - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Double sorting split patch
Date
Msg-id CAPpHfdvt-8vkD3p5wUCH0A3fCWtmgKxzG6bp23K4aBvOrS5Vxw@mail.gmail.com
Whole thread Raw
In response to Re: Double sorting split patch  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Double sorting split patch
List pgsql-hackers
Path without allocating extra bytes is attached.
I run some more detailed tests on geonames and two smaller datasets from rtreeportal.org.
Test tables are following:
1) test1 - geonames
2) test2 - California Roads, containing the MBRs of 2,249,727 streets
3) test3 - Tiger Streams, containing the MBRs of 194,971 streams
Scripts for test queries generation and execution are attached (scripts.tar.gz).

Build times are given in the following table:
      New linear Double sorting
test1 277.889630 273.766355
test2  73.262561  75.114079
test3   4.251563   4.425139
As we can see, build times differ insignificantly.

Index sizes are given in the following table:
      New linear Double sorting
test1 572383232  578977792
test2 179929088  178569216
test3  15409152   15532032
As we can see, index sizes differ insignificantly.

Data about index quality testing are in the table test_results of the following structure:
tablename - name of the queried table
count - actual count of rows matching query
nominal_count - count of rows matching query which test generator tried to provide (test generator not always can create test query which return exactly same amount of rows as it was expected)
strategy - split strategy: 1 - new liner split(current), 2 - double sorting(this patch)
hits - number of pages hits for query execution.

Raw data are in the attachment (test_result.sql.gz). Report is below.
test=# select tablename, nominal_count, round(avg(count),2) as avg_count, strategy, round(avg(hits),2) as avg_hits from test_results group by tablename, nominal_count, strategy order by tablename, nominal_count, strategy;
 tablename | nominal_count | avg_count | strategy | avg_hits 
-----------+---------------+-----------+----------+----------
 test1     |             1 |      4.87 |        1 |    19.94
 test1     |             1 |      4.87 |        2 |     6.46
 test1     |            10 |     11.07 |        1 |    23.95
 test1     |            10 |     11.07 |        2 |     7.36
 test1     |           100 |    101.36 |        1 |    43.30
 test1     |           100 |    101.36 |        2 |    10.19
 test1     |          1000 |    998.70 |        1 |    86.48
 test1     |          1000 |    998.70 |        2 |    24.21
 test2     |             1 |      1.32 |        1 |     8.67
 test2     |             1 |      1.32 |        2 |     5.99
 test2     |            10 |     11.32 |        1 |     9.40
 test2     |            10 |     11.32 |        2 |     6.71
 test2     |           100 |    102.93 |        1 |    13.10
 test2     |           100 |    102.93 |        2 |     9.02
 test2     |          1000 |    999.67 |        1 |    32.16
 test2     |          1000 |    999.67 |        2 |    23.51
 test3     |             1 |      0.99 |        1 |     6.03
 test3     |             1 |      0.99 |        2 |     4.32
 test3     |            10 |      9.95 |        1 |     7.52
 test3     |            10 |      9.95 |        2 |     5.09
 test3     |           100 |     99.92 |        1 |    10.73
 test3     |           100 |     99.92 |        2 |     7.73
 test3     |          1000 |    999.75 |        1 |    27.47
 test3     |          1000 |    999.75 |        2 |    22.44
(24 rows)

As we can see, double sorting split outperfoms new linear split in all the presented cases. Difference in test2 and test3 is not so big as it is in test1. It can be explained by fact that test2 and test3 are smaller than test1.

------
With best regards,
Alexander Korotkov.
Attachment

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [REVIEW] prepare plans of embedded sql on function start
Next
From: Magnus Hagander
Date:
Subject: Bug in walsender when calling out to do_pg_stop_backup (and others?)