I was attempting LeetCode question 128. Longest Consecutive Sequence:
Given an unsorted array of integers
nums
, return the length of the longest consecutive elements sequence.You must write an algorithm that runs in
O(n)
time.Example 1:
Input:
nums = [100,4,200,1,3,2]
Output:4
Explanation: The longest consecutive elements sequence is[1, 2, 3, 4]
. Therefore its length is4
.Constraints:
0 <= nums.length <= 105
-109 <= nums[i] <= 109
My first attempt did sorting followed by counting. This has O(nlogn) time complexity, but it surprisingly gave me 93.93% percentile for time complexity (40ms).
I then re-read the question and realised the answer must be in O(n) time complexity. So I wrote the following code:
def longestConsecutive(self, nums: List[int]) -> int:
s = set(nums)
longest_streak = 0
for num in nums:
if (num - 1) not in s:
current_streak = 1
while (num + 1) in s:
num += 1
current_streak += 1
longest_streak = max(longest_streak, current_streak)
return longest_streak
(I know, it's not a great practice to reuse num variable name in the nested loop, but that's beside the point of the question, I have tested using a separate variable as well with the same result as below)
While this should theoretically be O(n) time complexity, faster than my first solution, this actually resulted in time limit exceeded for a couple of the cases and my code was rejected.
I eventually submitted a passing solution after referring to the solution
class Solution:
def longestConsecutive(self, nums: List[int]) -> int:
nums = set(nums)
longest_streak = 0
for num in nums:
if (num - 1) not in nums:
next_num = num + 1
while next_num in nums:
next_num += 1
longest_streak = max(longest_streak, next_num - num)
return longest_streak
where I identified 2 key differences:
- I reassigned nums to a set in-place instead of a new variable
- I used next_num instead of keeping a current_streak variable
However, both of these changes does not seem like it should have significant impact on the runtime, enough to cross the line between time-limit exceeded into a passing solution. To puzzle me even more, this O(n) solution still performed worse than my sorting solution, ranking only at 75.73% percentile (46 ms).
So my questions are:
- Why does a O(nlogn) algorithm perform faster than O(n) in practice?
- Why is my first O(n) algorithm so slow that it reached time-limit exceeded while my second algorithm with minimal changes could pass?
I was attempting LeetCode question 128. Longest Consecutive Sequence:
Given an unsorted array of integers
nums
, return the length of the longest consecutive elements sequence.You must write an algorithm that runs in
O(n)
time.Example 1:
Input:
nums = [100,4,200,1,3,2]
Output:4
Explanation: The longest consecutive elements sequence is[1, 2, 3, 4]
. Therefore its length is4
.Constraints:
0 <= nums.length <= 105
-109 <= nums[i] <= 109
My first attempt did sorting followed by counting. This has O(nlogn) time complexity, but it surprisingly gave me 93.93% percentile for time complexity (40ms).
I then re-read the question and realised the answer must be in O(n) time complexity. So I wrote the following code:
def longestConsecutive(self, nums: List[int]) -> int:
s = set(nums)
longest_streak = 0
for num in nums:
if (num - 1) not in s:
current_streak = 1
while (num + 1) in s:
num += 1
current_streak += 1
longest_streak = max(longest_streak, current_streak)
return longest_streak
(I know, it's not a great practice to reuse num variable name in the nested loop, but that's beside the point of the question, I have tested using a separate variable as well with the same result as below)
While this should theoretically be O(n) time complexity, faster than my first solution, this actually resulted in time limit exceeded for a couple of the cases and my code was rejected.
I eventually submitted a passing solution after referring to the solution
class Solution:
def longestConsecutive(self, nums: List[int]) -> int:
nums = set(nums)
longest_streak = 0
for num in nums:
if (num - 1) not in nums:
next_num = num + 1
while next_num in nums:
next_num += 1
longest_streak = max(longest_streak, next_num - num)
return longest_streak
where I identified 2 key differences:
- I reassigned nums to a set in-place instead of a new variable
- I used next_num instead of keeping a current_streak variable
However, both of these changes does not seem like it should have significant impact on the runtime, enough to cross the line between time-limit exceeded into a passing solution. To puzzle me even more, this O(n) solution still performed worse than my sorting solution, ranking only at 75.73% percentile (46 ms).
So my questions are:
- Why does a O(nlogn) algorithm perform faster than O(n) in practice?
- Why is my first O(n) algorithm so slow that it reached time-limit exceeded while my second algorithm with minimal changes could pass?
3 Answers
Reset to default 3Why does a O(nlogn) algorithm perform faster than O(n) in practice?
Time complexity does not say anything about actual running times for concrete input sizes. It only says something about how running times will evolve asymptotically as the input size grows large.
In general (unrelated to the actual code you presented), we can imagine a O(
if num-1 not in nums and n+longest_sequence in nums:
. The cost-benefit tradeoff depends on the actual data, but I believe the cost in the worst case is pretty low and the savings in the best could be substantial. – Adrian McCarthy Commented yesterday