python - How to optimize this algorithm to find the longest substring without any duplicate characters

Today I tackled an interesting Leetcode problem: Finding the longest substring without any duplicating characters. This problem introduced me to the dynamic sliding window technique a powerful tool that I can add to my arsenal.

I implemented the solution in which I used a set to track distinct characters and two pointers l and r. When encountering a duplicate, I made the program shrink from the left to the right and removing them with a while loop.

My solution runs in O(n) but it took 6558ms, which is inefficient. There must be a way to optimize it further. Any suggestions?

class Solution(object):
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        longest = set()
        l = 0
        max_num = 0
        # "abcabcbb"
        for r in range(len(s)):
            # add the s[r] if not in arr
            while s[r] in longest:
                longest.remove(s[l])
                l += 1

            if s[r] not in longest:
                longest.add(s[r])
                max_num = max(max_num, (r-l) + 1)
                print(longest)
                print(max_num)
                continue

        return max_num

This problem helped me understand choosing the right data structure and technique to use, and how to expand and shrink a sliding window. Have you encountered this problem before? How would you optimize it?

My solution runs in O(n) but it took 6558ms, which is inefficient. There must be a way to optimize it further. Any suggestions?

class Solution(object):
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        longest = set()
        l = 0
        max_num = 0
        # "abcabcbb"
        for r in range(len(s)):
            # add the s[r] if not in arr
            while s[r] in longest:
                longest.remove(s[l])
                l += 1

            if s[r] not in longest:
                longest.add(s[r])
                max_num = max(max_num, (r-l) + 1)
                print(longest)
                print(max_num)
                continue

        return max_num

Share Improve this question edited 2 days ago ThomasIsCoding 103k9 gold badges37 silver badges102 bronze badges asked Apr 2 at 4:42 Emmanuel Rusiana 394 bronze badges

2 Why do you need if s[r] not in longest? The while loop before this ensures that that's true. – Barmar Commented Apr 2 at 5:11
2 Printing is slow, and probably not part of the problem requirements. What if you just remove the print calls? – user2357112 Commented Apr 2 at 5:17
Can't reproduce, gets accepted in ~230 ms, nowhere near 6558ms. – Kelly Bundy Commented 2 days ago

Add a comment |

4 Answers 4

Sorted by: Reset to default 5

That’s an interesting problem. As mentioned in the comments already, I’d start by removing print statements.

Your intuition using sets is great, but this element lookup in set() inside a while loop, seems to be giving a lot of overhead.

Instead, you could use a heap or a hash map, to store last positions of each character and then look them up as you go further into the array.

You could even use default dict to initialise the char_index.

class Solution(object):
    def lengthOfLongestSubstring(self, s):
        char_index = {}  # hashmap to store last positions of each character
        max_length = 0
        l = 0

        for r, char in enumerate(s):
            if char in char_index and char_index[char] >= l:
                # Move left pointer just after the duplicate's previous index
                l = char_index[char] + 1

            # Update or add the character's latest index
            char_index[char] = r
            
            # Update maximum length found so far
            max_length = max(max_length, r - l + 1)

        return max_length

Some improvements:

Don't increase l with 1, but instead remember where the current character occurred and set l to the index right after it. This way you can eliminate the inner loop. This is what also @kris-szczepaniak has pointed out, suggesting to use a dict.
As the code challenge limits the character range to "English letters, digits, symbols and spaces", which suggests the characters are within the ASCII range, you could use a fixed-size list instead of a set/dict, where the index into that list corresponds to the ASCII code of the character.

Here is how that could look:

    def lengthOfLongestSubstring(self, s: str) -> int:
        # List indices are ASCII codes. List values are indices in s, 
        #     after last occurrence of character that has the ASCII code.
        indexafter = [0] * 128  # ASCII range
        size = 0
        start = 0
        for stop, asc in enumerate(map(ord, s), 1):  # get index *after* current character
            if indexafter[asc] > start:  # character already occurs within window
                start = indexafter[asc]  # shrink window: start after previous occurrence
            indexafter[asc] = stop       # update index to after the current occurrence
            size = max(size, stop - start)
        return size

As noted in comments, you should remove the print() statements as well as the unnecessary test after the while loop.

Which breaks it down to this:

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        left = result = 0
        observed = set()
        for right, c in enumerate(s):
            while c in observed:
                observed.remove(s[left])
                left += 1
            observed.add(c)
            if (r := right - left + 1) > result:
                result = r
        return result

You can use the sliding window algorithm in this way:

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        # Initialize set st, n as sliding wondow pointer, res to store result
        st = set()
        n = 0
        res = 0
        # Iterate through the string s with i as right pointer
        for i in range(len(s)):
            # Handle repeating character
            while s[i] in st:
                st.remove(s[n])
                n += 1
            # Add current character to set
            st.add(s[i])
            # Update result
            res = max(res, i - n + 1)
        # Return result
        return res

Hope this helps

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

python - How to optimize this algorithm to find the longest substring without any duplicate characters - Stack Overflow

4 Answers 4

与本文相关的文章

评论列表(0)