
sorting - Why is the cutoff value for switching to insertion sort on small sub-arrays, when optimizing quicksort, system-dependent?


On p. 296 of Sedgewick et al.'s Algorithms, 4th edition, the authors write:

The optimum value of the cutoff M is system-dependent, but any value between 5 and 15 is likely to work well in most situations.

But I don't understand what it means for the cutoff value to be system-dependent. Isn't the performance of an algorithm measured by the number of operations it performs, independent of the speed of the computer's processor?

asked Jan 20 at 7:05 by Kt Student
  • I haven't read the book, so I can't say what the author is aiming at, but optimizing for small values by definition is not related to big O notation. Big O is about asymptotic complexity, i.e., behavior for large inputs. When optimizing for small values, it's all about the constants, which are machine-dependent. – Vincent van der Weele Commented Jan 20 at 7:27
  • @VincentvanderWeele Correct me if I'm wrong, but I think the notation is related to the performance measurement in this case, because when sorting, the whole array is divided into MANY small sub-arrays, which is exactly where the optimization kicks in? – Kt Student Commented Jan 20 at 7:34
  • Yeah, that's why I'm not sure what point the author is trying to make. The thing is, you can choose any constant M and say that the problem for subarrays shorter than M is O(1). This doesn't impact the analysis of the whole algorithm; it stays O(n log n). The definition of O(n log n) is that there exists a constant C such that the runtime is less than C * n log n. The choice of M does have an impact on the constant C. – Vincent van der Weele Commented Jan 20 at 8:04
  • As @VincentvanderWeele said, for small containers it's machine-dependent (and also data-dependent...), and you can't ignore the constants around the O(...) value. To give you a degenerate example: you shouldn't invoke quicksort to sort an array of TWO numbers that never grows... Because depending on your platform, you can sort it in only one CPU instruction (test and swap), so O(1) complexity. – Wisblade Commented Jan 20 at 9:43
  • Due to cache being faster than ram, the range on systems in the last 10 years is more like 16 to 64. – rcgldr Commented Jan 20 at 11:03

1 Answer


I have a 2nd Edition, from 1984. On page 112, under the heading "Small Subfiles" in a discussion of Quicksort:

The second improvement stems from the observation that a recursive program is guaranteed to call itself for many small subfiles, so it should be changed to use a better method when small subfiles are encountered. One obvious way to do this is to change the test at the beginning of the recursive routine from if r>l then to a call on insertion sort (modified to accept parameters defining the subfile to be sorted), that is if r-l <= M then insertion(l,r). Here, M is some parameter whose exact value depends upon the implementation. The value chosen for M need not be the best possible: the algorithm works about the same for M in the range from about 5 to about 25. The reduction in the running time is on the order of 20% for most applications.

There are a couple of points here.

You're right: the performance of an algorithm is measured based on the number of operations it performs, not on the speed of the processor.

As I explained in my answer to a similar question, Quicksort is a hugely complicated algorithm when compared to Insertion sort. There is considerable bookkeeping overhead involved. That is, there are certain fixed costs with Quicksort, regardless of how large or small the subarray you're sorting. As the array gets smaller, the percentage of time spent in overhead increases.

Insertion sort is a very simple sorting algorithm. There is very little bookkeeping overhead. Insertion sort can be faster than Quicksort for small arrays because Insertion sort actually performs fewer operations than Quicksort does when the arrays are very small. The variation (as the text says, from 5 to 25) depends on the exact algorithm implementations.
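To make this concrete, here is a minimal sketch of the hybrid approach in Java (my own illustration, not the book's code; the cutoff value, the Lomuto partition, and the class and method names are just assumptions for the example):

import java.util.Arrays;

public class HybridQuicksort {
    // Cutoff below which small subarrays are handed to insertion sort.
    // The "best" value is machine- and implementation-dependent;
    // the book says anything from about 5 to about 25 works about the same.
    private static final int CUTOFF = 10;

    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        // Small subarray: switch to insertion sort instead of recursing further.
        if (hi - lo < CUTOFF) {
            insertionSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    // Simple Lomuto partition using the last element as the pivot.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    // Insertion sort restricted to a[lo..hi], inclusive.
    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
        sort(a);
        System.out.println(Arrays.toString(a)); // [0, 1, 2, ..., 9]
    }
}

The only change from a textbook quicksort is the size test at the top of the recursive routine; everything said in the comments above about big O still holds, because the cutoff only affects the constant factor.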

Sedgewick's book, as with many other algorithms texts, often blurs the line between the theoretical and the practical. I think it's good to keep the practical in mind, but either the author should make it clear when he's talking about actual performance versus theoretical performance, or the instructor should clarify that distinction in class.
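If you want to see the system dependence for yourself, the simplest thing is to time the sort with different cutoffs on your own machine. Here is a crude, hypothetical benchmark sketch (no JVM warm-up or statistical rigor, just enough to show the trend; the names, input size, and cutoff values are all assumptions):

import java.util.Random;

public class CutoffBenchmark {
    // Hybrid quicksort with the insertion-sort cutoff passed in as a parameter,
    // so several values of M can be timed on the same input.
    static void sort(int[] a, int lo, int hi, int cutoff) {
        if (hi - lo < cutoff) {
            for (int i = lo + 1; i <= hi; i++) {      // insertion sort on a[lo..hi]
                int key = a[i], j = i - 1;
                while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
                a[j + 1] = key;
            }
            return;
        }
        int pivot = a[hi], i = lo;                    // Lomuto partition, last element as pivot
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        sort(a, lo, i - 1, cutoff);
        sort(a, i + 1, hi, cutoff);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 1_000_000;
        int[] original = new int[n];
        for (int i = 0; i < n; i++) original[i] = rnd.nextInt();

        // Time the same random input with several cutoffs. Which one wins
        // depends on your CPU, cache, and JVM; that is the whole point.
        for (int cutoff : new int[]{1, 5, 10, 15, 25, 50}) {
            int[] a = original.clone();
            long start = System.nanoTime();
            sort(a, 0, a.length - 1, cutoff);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("cutoff " + cutoff + ": " + ms + " ms");
        }
    }
}

On one machine the sweet spot might be near 10, on another closer to 30 or more; either way the asymptotic O(n log n) analysis is unchanged, which is exactly the distinction between theoretical and actual performance.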
