Longest Consecutive Sequence: The Hash Set Pattern for O(n) Time

Problem Statement

Given an unsorted integer array nums, return the length of the longest sequence of consecutive integers present in nums. The sequence must be consecutive in value, but the elements can appear in any order in the input array.

LeetCode: 128. Longest Consecutive Sequence

Difficulty: Medium

How the Algorithm Works

💡 One Sentence Rule (Key Insight)

A number starts a consecutive sequence only if its predecessor (num - 1) doesn't exist; use a set for O(1) lookups to identify sequence starts and count forward.

This rule captures the ESSENCE of the algorithm. Every step we take follows from this rule.

Algorithm Overview

The naive approach would sort the array first (O(n log n)) and then scan for consecutive runs. However, we can achieve O(n) time by using a clever hash set approach.

The key insight is that we only need to start counting from numbers that are sequence starts. A number is a sequence start if (num - 1) is not in the array. This prevents us from counting the same sequence multiple times.

By converting the array to a set, we get O(1) membership checks. We then iterate through unique numbers, identify sequence starts, and count consecutive numbers forward.

Step-by-Step Process

When we need to find the longest consecutive sequence:

Handle edge case: If array is empty, return 0 ← No sequence possible
Create a set from array ← Enables O(1) lookups and removes duplicates
For each number in the set:
- Check if it's a sequence start: Is (num - 1) absent? ← If yes, this starts a new sequence
- If it's a start, count forward: Keep checking (num + 1), (num + 2), etc. ← Count consecutive numbers
- Track maximum length found ← Update longest streak
Return the maximum length ← This is our answer

Why It Works

The algorithm works because of a simple but powerful observation:

Every consecutive sequence has exactly one start (the smallest number)
By only counting from sequence starts, we ensure each sequence is counted exactly once
The inner while loop only runs when we're at a sequence start, so despite appearing nested, the total work is O(n)

Time Complexity Analysis:

Creating the set: O(n)
Outer loop: O(n) iterations
Inner while loop: Runs only for sequence starts, and each number is visited at most once across all iterations
Total: O(n) time with O(n) space

Input

nums = [100, 4, 200, 1, 3, 2]

Output

Explanation: The longest consecutive sequence is [1, 2, 3, 4], which has length 4. Note that 100 and 200 are isolated values that don't form longer sequences.

Solution

def longest_consecutive(nums):
    """Find the longest consecutive sequence in O(n) time."""
    if not nums:
        return 0
    
    # Convert to set for O(1) lookups
    num_set = set(nums)
    longest_streak = 0
    
    # Check each number
    for num in num_set:
        # Only start counting if this is a sequence start
        # (i.e., num - 1 is not in the set)
        if (num - 1) not in num_set:
            current_num = num
            current_streak = 1
            
            # Count consecutive numbers forward
            while (current_num + 1) in num_set:
                current_num += 1
                current_streak += 1
            
            # Update maximum length
            longest_streak = max(longest_streak, current_streak)
    
    return longest_streak

Complexity Analysis

Time Complexity: O(n) where n is the length of the input array

Creating the set: O(n)
Iterating through the set: O(n)
The inner while loop might seem like it makes this O(n²), but it's not! Each number is visited at most twice (once in the outer loop, once in the inner loop when it's part of a sequence), so the total work is still O(n)

Space Complexity: O(n)

We store all numbers in a set, which takes O(n) space

Applying the One Sentence Rule

Use a hashmap whenever you need to track relationships, frequencies, or quick lookups to avoid O(n²) nested loops.

For this problem:

What are we tracking? The presence of numbers for O(1) membership checks
What nested loop are we avoiding? Comparing every number with every other to find consecutive sequences (O(n²))
How does the hashmap help? Set (a type of hashmap) gives us O(1) lookup to check if num - 1 or num + 1 exists
What's the key insight? Only count from sequence starts (where num - 1 doesn't exist) to avoid redundant work

Applying the rule:

The hash set approach is perfect here because we need to repeatedly check "does this number exist?" - exactly what hash sets excel at with O(1) time.

Example walkthrough for [100, 4, 200, 1, 3, 2]:

Create set: {100, 4, 200, 1, 3, 2} (order doesn't matter in a set)
Check 100:
- Is 99 in set? ✗ → This is a sequence start!
- Count forward: 101 in set? ✗
- Sequence length: 1
Check 4:
- Is 3 in set? ✓ → NOT a sequence start (3 comes before it)
- Skip counting (we'll count from 1 instead)
Check 200:
- Is 199 in set? ✗ → This is a sequence start!
- Count forward: 201 in set? ✗
- Sequence length: 1
Check 1:
- Is 0 in set? ✗ → This is a sequence start!
- Count forward: 2 in set? ✓ (streak = 2)
- Count forward: 3 in set? ✓ (streak = 3)
- Count forward: 4 in set? ✓ (streak = 4)
- Count forward: 5 in set? ✗
- Sequence length: 4 ✓ Maximum found!
Check 3 and 2: Both have predecessors in the set, so we skip them

Result: The longest consecutive sequence has length 4.

Notice how we only counted sequences from their starts (1, 100, 200) and skipped numbers that weren't starts. This is why the algorithm is O(n) - each number participates in counting at most once.

Key Insights

Sequence Start Detection: The check (num - 1) not in num_set is the crucial optimization. It ensures we only count each sequence once from its smallest element.
Set vs Sorting: Using a set gives us O(n) time vs O(n log n) for sorting. The set approach also handles duplicates naturally.
Seemingly Nested Loops, Actually Linear: While there's a for loop with a while loop inside, the total iterations across both loops equals at most 2n (each number visited at most twice), making it O(n).
Hash Set = Unordered Hashmap: A set is essentially a hashmap with only keys (no values), still providing O(1) lookups.
Works with Duplicates and Negatives: The problem mentions these as possibilities. Converting to a set automatically handles duplicates, and the algorithm works fine with negative numbers.
Why Not Just Sort? Sorting would work but takes O(n log n) time. The challenge specifically asks for O(n) time, making the hash set approach necessary.

Edge Cases Handled

Empty array: Returns 0 (handled by initial check)
Single element: Returns 1 (the element itself is a sequence of length 1)
All duplicates: Set deduplication handles this (e.g., [1,1,1] → set {1} → length 1)
No consecutive numbers: Returns 1 (each number is its own sequence)
Negative numbers: Works seamlessly (e.g., [-2, -1, 0, 1] → length 4)
Large gaps: Efficiently skips over gaps (e.g., [1, 1000000] → quickly identifies two sequences of length 1)