Problem Statement
Given an unsorted integer array nums, return the length of the longest sequence of consecutive integers present in nums. The sequence must be consecutive in value, but the elements can appear in any order in the input array.
LeetCode: 128. Longest Consecutive Sequence
Difficulty: Medium
How the Algorithm Works
💡 One Sentence Rule (Key Insight)
A number starts a consecutive sequence only if its predecessor (num - 1) doesn't exist; use a set for O(1) lookups to identify sequence starts and count forward.
This rule captures the ESSENCE of the algorithm. Every step we take follows from this rule.
Algorithm Overview
The naive approach would sort the array first (O(n log n)) and then scan for consecutive runs. However, we can achieve O(n) time by using a clever hash set approach.
The key insight is that we only need to start counting from numbers that are sequence starts. A number is a sequence start if (num - 1) is not in the array. This prevents us from counting the same sequence multiple times.
By converting the array to a set, we get O(1) membership checks. We then iterate through unique numbers, identify sequence starts, and count consecutive numbers forward.
Step-by-Step Process
When we need to find the longest consecutive sequence:
- Handle edge case: If array is empty, return 0 ← No sequence possible
- Create a set from array ← Enables O(1) lookups and removes duplicates
- For each number in the set:
- Check if it's a sequence start: Is
(num - 1)absent? ← If yes, this starts a new sequence - If it's a start, count forward: Keep checking
(num + 1),(num + 2), etc. ← Count consecutive numbers - Track maximum length found ← Update longest streak
- Check if it's a sequence start: Is
- Return the maximum length ← This is our answer
Why It Works
The algorithm works because of a simple but powerful observation:
- Every consecutive sequence has exactly one start (the smallest number)
- By only counting from sequence starts, we ensure each sequence is counted exactly once
- The inner while loop only runs when we're at a sequence start, so despite appearing nested, the total work is O(n)
Time Complexity Analysis:
- Creating the set: O(n)
- Outer loop: O(n) iterations
- Inner while loop: Runs only for sequence starts, and each number is visited at most once across all iterations
- Total: O(n) time with O(n) space
Input
nums = [100, 4, 200, 1, 3, 2]
Output
4
Explanation: The longest consecutive sequence is [1, 2, 3, 4], which has length 4. Note that 100 and 200 are isolated values that don't form longer sequences.
Solution
def longest_consecutive(nums):
"""Find the longest consecutive sequence in O(n) time."""
if not nums:
return 0
# Convert to set for O(1) lookups
num_set = set(nums)
longest_streak = 0
# Check each number
for num in num_set:
# Only start counting if this is a sequence start
# (i.e., num - 1 is not in the set)
if (num - 1) not in num_set:
current_num = num
current_streak = 1
# Count consecutive numbers forward
while (current_num + 1) in num_set:
current_num += 1
current_streak += 1
# Update maximum length
longest_streak = max(longest_streak, current_streak)
return longest_streak
Complexity Analysis
Time Complexity: O(n) where n is the length of the input array
- Creating the set: O(n)
- Iterating through the set: O(n)
- The inner while loop might seem like it makes this O(n²), but it's not! Each number is visited at most twice (once in the outer loop, once in the inner loop when it's part of a sequence), so the total work is still O(n)
Space Complexity: O(n)
- We store all numbers in a set, which takes O(n) space
Applying the One Sentence Rule
Use a hashmap whenever you need to track relationships, frequencies, or quick lookups to avoid O(n²) nested loops.
For this problem:
- What are we tracking? The presence of numbers for O(1) membership checks
- What nested loop are we avoiding? Comparing every number with every other to find consecutive sequences (O(n²))
- How does the hashmap help? Set (a type of hashmap) gives us O(1) lookup to check if
num - 1ornum + 1exists - What's the key insight? Only count from sequence starts (where
num - 1doesn't exist) to avoid redundant work
Applying the rule:
The hash set approach is perfect here because we need to repeatedly check "does this number exist?" - exactly what hash sets excel at with O(1) time.
Example walkthrough for [100, 4, 200, 1, 3, 2]:
-
Create set:
{100, 4, 200, 1, 3, 2}(order doesn't matter in a set) -
Check 100:
- Is 99 in set? ✗ → This is a sequence start!
- Count forward: 101 in set? ✗
- Sequence length: 1
-
Check 4:
- Is 3 in set? ✓ → NOT a sequence start (3 comes before it)
- Skip counting (we'll count from 1 instead)
-
Check 200:
- Is 199 in set? ✗ → This is a sequence start!
- Count forward: 201 in set? ✗
- Sequence length: 1
-
Check 1:
- Is 0 in set? ✗ → This is a sequence start!
- Count forward: 2 in set? ✓ (streak = 2)
- Count forward: 3 in set? ✓ (streak = 3)
- Count forward: 4 in set? ✓ (streak = 4)
- Count forward: 5 in set? ✗
- Sequence length: 4 ✓ Maximum found!
-
Check 3 and 2: Both have predecessors in the set, so we skip them
Result: The longest consecutive sequence has length 4.
Notice how we only counted sequences from their starts (1, 100, 200) and skipped numbers that weren't starts. This is why the algorithm is O(n) - each number participates in counting at most once.
Key Insights
-
Sequence Start Detection: The check
(num - 1) not in num_setis the crucial optimization. It ensures we only count each sequence once from its smallest element. -
Set vs Sorting: Using a set gives us O(n) time vs O(n log n) for sorting. The set approach also handles duplicates naturally.
-
Seemingly Nested Loops, Actually Linear: While there's a for loop with a while loop inside, the total iterations across both loops equals at most 2n (each number visited at most twice), making it O(n).
-
Hash Set = Unordered Hashmap: A set is essentially a hashmap with only keys (no values), still providing O(1) lookups.
-
Works with Duplicates and Negatives: The problem mentions these as possibilities. Converting to a set automatically handles duplicates, and the algorithm works fine with negative numbers.
-
Why Not Just Sort? Sorting would work but takes O(n log n) time. The challenge specifically asks for O(n) time, making the hash set approach necessary.
Edge Cases Handled
- Empty array: Returns 0 (handled by initial check)
- Single element: Returns 1 (the element itself is a sequence of length 1)
- All duplicates: Set deduplication handles this (e.g.,
[1,1,1]→ set{1}→ length 1) - No consecutive numbers: Returns 1 (each number is its own sequence)
- Negative numbers: Works seamlessly (e.g.,
[-2, -1, 0, 1]→ length 4) - Large gaps: Efficiently skips over gaps (e.g.,
[1, 1000000]→ quickly identifies two sequences of length 1)