Thread: database sorting algorithms.

database sorting algorithms.

From
Jian He
Date:

Been self study Database, from database I deep dived into sorting algorithms.

Databases can do in-memory QuickSort. It also has an on-disk MergeSort.

For MergeSort: I follow this tutorial https://youtu.be/6pV2IF0fgKY?t=1108  (around 1 minutes only)

But I am still not fully understanding about nlogn. I understand how many passes it will take, that is logn. 
Yes each pass will sort N elements.
But I still don't get the N stand for in n*logn.
Why does each pass take  n time to sort it?






Re: database sorting algorithms.

From
Francisco Olarte
Date:
Jian He:

On Sat, May 1, 2021 at 9:07 AM Jian He <hejian.mark@gmail.com> wrote:
> Been self study Database, from database I deep dived into sorting algorithms.

Peek a good book, because that is a hairy topic with lot of math and
other previous knowledge required.

> Databases can do in-memory QuickSort. It also has an on-disk MergeSort.

If you ignore top-n and other stuff, just full sorts, databases uses
two types of sorting, internal sorts, which are the ones used when you
can fit your data on ram, and external sorts, when they have to spill
to tape/disk/whatever.

For internal sorts you can use quicksort, heapsort, mergesort, even
insertion sort. Quicksort with fallback to insertion is a popular one
( quicksort and merge sort are divide and conquer algorithms, but once
they have divided enough it is easier to fallback to a simpler
algorithm ).

When the data no longers fit in ram you need to make an external sort.
The classic way to do this is to sort chunks as big as they fit in
ram, write presorted chunks to disk and then merge those chunks. These
is called merge sort too, and is similar to a memory merge sort, but
not the same ( when I've done it I use two way merge for memory merge
sorts, but N-way for disk sort ( current memories are so big I never
need to do more than 1 merge pass since the tape era )

> For MergeSort: I follow this tutorial https://youtu.be/6pV2IF0fgKY?t=1108  (around 1 minutes only)
> But I am still not fully understanding about nlogn. I understand how many passes it will take, that is logn.
> Yes each pass will sort N elements.
> But I still don't get the N stand for in n*logn.

Basically you do logn passes over N elements.

> Why does each pass take  n time to sort it?

I have not time for looking at the video, but basic binary merge sort
is N log N because you have logN passes and each one must merge N
elements. Imagine you do an 8 elements pure merge sort.
1st you split it into 4+4 elements, recurse to sort them and merge
them.  Merging is an N-complexity op because you need to read the 8
elements.
4 elements are split into 2+2, same thing, merge neads four reads, but
you have two merges, so eight reads again.
2 elements are split into 1+1, two reads for merging, but you have
four chungk, so eight again.

Total, 3 passes ( log2(8) ) of eight reads.

Real sorts are more complex, but the N is always present, as you
always need to at least read the full input set to sort.

Note, in external sort you may have N elements, of which only M fit in ram.
You do C=N/M chunks , these need C*MlogM = N logMtime to be sorted
and C*M=N time to be written to and read from disk.
But to merge C chunks you basically do logC comparison for element, so
add N*logC N(logN-logM). If you add appropiate constants and add all
you'll find the final result is O(NlogN).

Francisco Olarte.








Francisco Olarte.



Re: database sorting algorithms.

From
Gavan Schneider
Date:

On 1 May 2021, at 17:06, Jian He wrote:

Been self study Database, from database I deep dived into sorting algorithms.

Databases can do in-memory QuickSort. It also has an on-disk MergeSort.

For MergeSort: I follow this tutorial https://youtu.be/6pV2IF0fgKY?t=1108
(around 1 minutes only)

Also check https://en.wikipedia.org/wiki/Merge_sort

But I am still not fully understanding about nlogn. I understand how many
passes it will take, that is* logn. *
Yes each pass will sort N elements.
But I still don't get the N stand for in nlogn.*

So, answering the question…
The ’n’ refers to the need to do something to each element at least once, so the sort time grows in simple proportion to the size of the list that needs to be sorted. Unfortunately that is not enough work to get the list sorted so extra steps are needed. The log(n) indicates how many extra steps are needed. So the overall performance is proportional to the number of elements (N) multiplied by the log of the number of elements, viz., N * log(N)

Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

Re: database sorting algorithms.

From
Jian He
Date:

Thanks a lot.
I found out about this Youtube video (https://youtu.be/alJswNJ4P3U?t=1852), in case you guys are interested.
This video really clarify about the time complixty of MergeSort.

On Sat, May 1, 2021 at 3:19 PM Gavan Schneider <list.pg.gavan@pendari.org> wrote:

On 1 May 2021, at 17:06, Jian He wrote:

Been self study Database, from database I deep dived into sorting algorithms.

Databases can do in-memory QuickSort. It also has an on-disk MergeSort.

For MergeSort: I follow this tutorial https://youtu.be/6pV2IF0fgKY?t=1108
(around 1 minutes only)

Also check https://en.wikipedia.org/wiki/Merge_sort

But I am still not fully understanding about nlogn. I understand how many
passes it will take, that is* logn. *
Yes each pass will sort N elements.
But I still don't get the N stand for in nlogn.*

So, answering the question…
The ’n’ refers to the need to do something to each element at least once, so the sort time grows in simple proportion to the size of the list that needs to be sorted. Unfortunately that is not enough work to get the list sorted so extra steps are needed. The log(n) indicates how many extra steps are needed. So the overall performance is proportional to the number of elements (N) multiplied by the log of the number of elements, viz., N * log(N)

Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920