Top 10 Advanced Python DSA Interview Questions and In-Depth Answers

1. How do you find the k-th largest element in an array?

Problem: Find the k-th largest element in an unsorted array.
Solution:

import heapq

def find_kth_largest(nums, k):
    return heapq.nlargest(k, nums)[-1]

# Example
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(find_kth_largest(nums, k))  # Output: 5

Step-by-Step Explanation:

Importing heapq: The heapq module in Python provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.
Defining the Function: The function find_kth_largest(nums, k) takes two arguments:
- nums: A list of numbers.
- k: An integer representing the position of the largest element we want to find.
Using heapq.nlargest():
- The heapq.nlargest(k, nums) function returns a list of the k largest elements from the nums list, sorted in descending order.
- For example, heapq.nlargest(2, [3, 2, 1, 5, 6, 4]) would return [6, 5].
Returning the k-th Largest Element:
- The expression heapq.nlargest(k, nums)[-1] accesses the last element in the list of the k largest elements.
- Since heapq.nlargest(k, nums) returns the k largest elements sorted in descending order, the k-th largest element will be at the -1 index of this list.

Example:

Given nums = [3, 2, 1, 5, 6, 4] and k = 2:
- heapq.nlargest(2, nums) will produce [6, 5].
- The k-th largest element is 5, which is at the -1 index of [6, 5].

Thus, the function correctly identifies the second-largest element in the list, resulting in the output 5.

2. How do you solve the Longest Increasing Subsequence (LIS) problem?

Problem: Find the length of the longest increasing subsequence in an array.
Solution (Using Binary Search):

from bisect import bisect_left

def length_of_lis(nums):
    sub = []
    for num in nums:
        pos = bisect_left(sub, num)
        if pos == len(sub):
            sub.append(num)
        else:
            sub[pos] = num
    return len(sub)

# Example
nums = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(nums))  # Output: 4 ([2, 3, 7, 101])

Step-by-Step Explanation:

Importing bisect_left: This function from the bisect module is used to find the insertion point of an element in a sorted list to maintain order. This is crucial for efficiently finding positions to insert elements.
Function Definition:
- sub List: This list will store the smallest possible tail values for all increasing subsequences of various lengths.
- For Loop: Iterate through each element in the input list nums.
Using bisect_left(sub, num):
- Find Position (pos): The position where num can be inserted to maintain the sorted order of sub.
- Condition if pos == len(sub):
  - Append: If pos equals the length of sub, it means num is greater than all current elements in sub, so num is appended to sub.
- Else Condition:
  - Replace: If pos is less than the length of sub, num replaces the element at position pos. This replacement ensures that sub contains the smallest possible tail values for increasing subsequences of various lengths.
Return Length of sub: The length of sub at the end of the loop gives the length of the Longest Increasing Subsequence (LIS).

Example:

Input: nums = [10, 9, 2, 5, 3, 7, 101, 18]
- Iteration Details:
  - 10: sub becomes [10]
  - 9: sub becomes [9] (replaces 10)
  - 2: sub becomes [2] (replaces 9)
  - 5: sub becomes [2, 5]
  - 3: sub becomes [2, 3] (replaces 5)
  - 7: sub becomes [2, 3, 7]
  - 101: sub becomes [2, 3, 7, 101]
  - 18: sub becomes [2, 3, 7, 18] (replaces 101)

Thus, the length of the Longest Increasing Subsequence is 4, corresponding to the subsequence [2, 3, 7, 101].

3. How do you find the median of two sorted arrays?

Problem: Given two sorted arrays, find the median in $O(\log(\text{min}(n, m)))$ time.
Solution:

def find_median_sorted_arrays(nums1, nums2):
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1

    x, y = len(nums1), len(nums2)
    low, high = 0, x

    while low <= high:
        partition_x = (low + high) // 2
        partition_y = (x + y + 1) // 2 - partition_x

        max_left_x = float('-inf') if partition_x == 0 else nums1[partition_x - 1]
        min_right_x = float('inf') if partition_x == x else nums1[partition_x]

        max_left_y = float('-inf') if partition_y == 0 else nums2[partition_y - 1]
        min_right_y = float('inf') if partition_y == y else nums2[partition_y]

        if max_left_x <= min_right_y and max_left_y <= min_right_x:
            if (x + y) % 2 == 0:
                return (max(max_left_x, max_left_y) + min(min_right_x, min_right_y)) / 2
            else:
                return max(max_left_x, max_left_y)
        elif max_left_x > min_right_y:
            high = partition_x - 1
        else:
            low = partition_x + 1

# Example
nums1 = [1, 3]
nums2 = [2]
print(find_median_sorted_arrays(nums1, nums2))  # Output: 2

Step-by-Step Explanation:

Ensure nums1 is the smaller array:
- If nums1 is longer than nums2, swap them. This ensures the binary search is performed on the smaller array to maintain efficiency.
Initialize variables:
- x and y are the lengths of nums1 and nums2, respectively.
- low and high define the search range in nums1.
Binary Search Loop:
- The loop runs as long as low is less than or equal to high.
- partition_x is the partition index in nums1.
- partition_y is the corresponding partition index in nums2.
Calculate partition values:
- max_left_x: Maximum value on the left side of the partition in nums1.
- min_right_x: Minimum value on the right side of the partition in nums1.
- max_left_y: Maximum value on the left side of the partition in nums2.
- min_right_y: Minimum value on the right side of the partition in nums2.
Check for valid partition:
- If the maximum value on the left side of both partitions is less than or equal to the minimum value on the right side, a valid partition is found.
- If the combined length of nums1 and nums2 is even, return the average of the two middle values.
- If the combined length is odd, return the maximum value of the left partition.
Adjust search range:
- If max_left_x is greater than min_right_y, move the high pointer to the left.
- Otherwise, move the low pointer to the right.

Example:

Input: nums1 = [1, 3] and nums2 = [2]
Iterations:
- Ensure nums1 is the smaller array: already true.
- Initial partitions:
  - partition_x = 1, partition_y = 1
  - max_left_x = 1, min_right_x = 3
  - max_left_y = 2, min_right_y = float('inf')
- Valid partition:
  - Combined length is odd (3), return max(max_left_x, max_left_y) = max(1, 2) = 2.

Thus, the function correctly identifies the median as 2.

4. How do you implement a Segment Tree for range sum queries?

Solution:

class SegmentTree:
    def __init__(self, nums):
        n = len(nums)
        self.nums = nums
        self.tree = [0] * (4 * n)
        self.build(0, 0, n - 1)

    def build(self, node, start, end):
        if start == end:
            self.tree[node] = self.nums[start]
        else:
            mid = (start + end) // 2
            left = 2 * node + 1
            right = 2 * node + 2
            self.build(left, start, mid)
            self.build(right, mid + 1, end)
            self.tree[node] = self.tree[left] + self.tree[right]

    def query(self, node, start, end, l, r):
        if r < start or l > end:
            return 0
        if l <= start and r >= end:
            return self.tree[node]
        mid = (start + end) // 2
        left = 2 * node + 1
        right = 2 * node + 2
        return self.query(left, start, mid, l, r) + self.query(right, mid + 1, end, l, r)

    def update(self, node, start, end, idx, val):
        if start == end:
            self.tree[node] = val
        else:
            mid = (start + end) // 2
            left = 2 * node + 1
            right = 2 * node + 2
            if idx <= mid:
                self.update(left, start, mid, idx, val)
            else:
                self.update(right, mid + 1, end, idx, val)
            self.tree[node] = self.tree[left] + self.tree[right]

# Example
nums = [1, 3, 5, 7, 9, 11]
tree = SegmentTree(nums)
print(tree.query(0, 0, len(nums) - 1, 1, 3))  # Output: 15 (sum of [3, 5, 7])
tree.update(0, 0, len(nums) - 1, 1, 10)
print(tree.query(0, 0, len(nums) - 1, 1, 3))  # Output: 22

Explanation: The segment tree allows efficient range queries and updates.

5. How do you solve the Word Ladder problem?

Problem: Find the shortest transformation sequence length from beginWord to endWord by changing one letter at a time.
Solution:

from collections import deque

def word_ladder_length(beginWord, endWord, wordList):
    wordSet = set(wordList)
    if endWord not in wordSet:
        return 0

    queue = deque([(beginWord, 1)])

    while queue:
        word, length = queue.popleft()
        if word == endWord:
            return length

        for i in range(len(word)):
            for c in 'abcdefghijklmnopqrstuvwxyz':
                next_word = word[:i] + c + word[i + 1:]
                if next_word in wordSet:
                    wordSet.remove(next_word)
                    queue.append((next_word, length + 1))

    return 0

# Example
beginWord = "hit"
endWord = "cog"
wordList = ["hot", "dot", "dog", "lot", "log", "cog"]
print(word_ladder_length(beginWord, endWord, wordList))  # Output: 5

Step-by-Step Explanation

Initialization:
- Convert wordList to a set wordSet for efficient lookup.
- If endWord is not in wordSet, return 0 because it's not possible to transform beginWord to endWord.
BFS Setup:
- Initialize a queue (deque) with a tuple containing beginWord and the initial transformation length 1.
Breadth-First Search (BFS):
- Perform BFS to explore all possible transformations.
- Dequeue the front element (current word and its transformation length).
- If the current word is endWord, return its transformation length as the result.
Generating Next Words:
- For each character position in the current word, try replacing it with every letter from 'a' to 'z'.
- Generate the next_word and check if it exists in wordSet.
- If it does, add next_word to the queue with an incremented length and remove it from wordSet to avoid revisiting.
Return Result:
- If the queue is exhausted and endWord is not found, return 0.

Example Walkthrough

Initial Setup

Input: beginWord = "hit", endWord = "cog", wordList = ["hot", "dot", "dog", "lot", "log", "cog"]
wordSet: {"hot", "dot", "dog", "lot", "log", "cog"}

BFS Steps

Start with queue = deque([("hit", 1)])
First Iteration:
- word = "hit", length = 1
- Generate next words:
  - "hot" (valid, enqueue with length 2, remove from wordSet)
Second Iteration:
- word = "hot", length = 2
- Generate next words:
  - "dot" (valid, enqueue with length 3, remove from wordSet)
  - "lot" (valid, enqueue with length 3, remove from wordSet)
Third Iteration:
- word = "dot", length = 3
- Generate next words:
  - "dog" (valid, enqueue with length 4, remove from wordSet)
Fourth Iteration:
- word = "lot", length = 3
- Generate next words:
  - "log" (valid, enqueue with length 4, remove from wordSet)
Fifth Iteration:
- word = "dog", length = 4
- Generate next words:
  - "cog" (valid, enqueue with length 5, remove from wordSet)
- Since cog is the endWord, return length = 5.

Thus, the output is 5 indicating the shortest transformation sequence from "hit" to "cog" has a length of 5

6. How do you detect and remove a cycle in a directed graph?

Solution:

from collections import defaultdict

def has_cycle(graph):
    def dfs(node):
        if visited[node] == 1:
            return True
        if visited[node] == 2:
            return False

        visited[node] = 1
        for neighbor in graph[node]:
            if dfs(neighbor):
                return True
        visited[node] = 2
        return False

    visited = [0] * len(graph)
    for node in range(len(graph)):
        if not visited[node]:
            if dfs(node):
                return True
    return False

# Example
graph = defaultdict(list, {0: [1], 1: [2], 2: [0]})
print(has_cycle(graph))  # Output: True

Step-by-Step Explanation

Graph Representation:
- The graph is represented using a defaultdict of lists. In the example, the graph is {0: [1], 1: [2], 2: [0]}.
- This means there are directed edges from node 0 to 1, from 1 to 2, and from 2 back to 0, forming a cycle.
DFS Function:
- The dfs function performs a Depth-First Search to detect cycles.
- Visited Array: Tracks the state of each node:
  - 0: Unvisited
  - 1: Visiting (currently in the call stack)
  - 2: Fully visited
- Cycle Detection:
  - If a node is 1 (currently in the call stack), it means a cycle is detected because we encountered a node we've already started visiting.
  - If a node is 2 (fully visited), no cycle is found along that path.
DFS Implementation:
- Visiting: Mark the current node as 1.
- Recursive Call: For each neighbor, recursively call dfs. If a cycle is found during any call, return True.
- Fully Visited: Mark the current node as 2 after all its neighbors are processed.
Cycle Detection Loop:
- Iterate over all nodes in the graph.
- If a node hasn't been visited (visited[node] == 0), call dfs on that node.
- If dfs returns True for any node, a cycle is detected.

Example Execution

Graph: `{0: [1], 1: [2], 2: [0]}`

Initial Visited Array: [0, 0, 0] (All nodes unvisited)

Iteration and DFS Calls

Node 0:
- Call dfs(0):
  - Mark 0 as 1: [1, 0, 0]
  - Call dfs(1) for neighbor 1:
    - Mark 1 as 1: [1, 1, 0]
    - Call dfs(2) for neighbor 2:
      - Mark 2 as 1: [1, 1, 1]
      - Call dfs(0) for neighbor 0:
        dfs(0) returns True because node 0 is already 1, indicating a cycle.

Thus, the output is True because the graph contains a cycle (0 -> 1 -> 2 -> 0).

7. How do you solve the "Maximum Rectangle" problem in a binary matrix?

Solution:

def maximal_rectangle(matrix):
    if not matrix:
        return 0

    def largest_histogram(heights):
        stack, max_area = [], 0
        for i, h in enumerate(heights + [0]):
            while stack and heights[stack[-1]] > h:
                height = heights[stack.pop()]
                width = i if not stack else i - stack[-1] - 1
                max_area = max(max_area, height * width)
            stack.append(i)
        return max_area

    heights = [0] * len(matrix[0])
    max_area = 0
    for row in matrix:
        for i in range(len(row)):
            heights[i] = heights[i] + 1 if row[i] == '1' else 0
        max_area = max(max_area, largest_histogram(heights))
    return max_area

# Example
matrix = [
    ["1", "0", "1", "0", "0"],
    ["1", "0", "1", "1", "1"],
    ["1", "1", "1", "1", "1"],
    ["1", "0", "0", "1", "0"]
]
print(maximal_rectangle(matrix))  # Output: 6

Key Steps:

1. Initial Setup:

The algorithm starts by checking if the input matrix is empty.
It initializes a heights array to store the cumulative heights of 1s for each column. This array represents the "height" of the histogram for each row.

2. Row-by-Row Iteration:

The function processes the matrix row by row.
For each row:
- It updates the heights array:
  - If a cell contains '1', the height for that column increases by 1 (continuing the vertical streak of 1s).
  - If a cell contains '0', the height resets to 0 (ending the streak).

3. Largest Histogram Area:

After updating the heights array for a given row, the function calculates the largest rectangular area in the histogram represented by the updated heights array.
This part uses a stack-based approach to efficiently compute the largest rectangle in a histogram:
- The stack keeps track of indices of increasing heights.
- When a smaller height is encountered, rectangles ending at earlier heights are calculated.

4. Updating the Maximum Area:

For each row, the largest area of the histogram is compared to the previously recorded maximum area.
The global maximum area is updated as needed.

5. Result:

After processing all rows, the function returns the maximum area found across all row histograms.

8. How do you solve the "Sliding Window Maximum" problem?

Problem: Given an array nums and an integer k, find the maximum value in every contiguous subarray of size k.
Solution:

from collections import deque

def max_sliding_window(nums, k):
    dq = deque()  # Stores indices
    result = []

    for i in range(len(nums)):
        # Remove elements out of the window
        if dq and dq[0] < i - k + 1:
            dq.popleft()

        # Remove elements smaller than the current element
        while dq and nums[dq[-1]] < nums[i]:
            dq.pop()

        dq.append(i)

        # Append the maximum element of the window
        if i >= k - 1:
            result.append(nums[dq[0]])

    return result

# Example
nums = [1, 3, -1, -3, 5, 3, 6, 7]
k = 3
print(max_sliding_window(nums, k))  # Output: [3, 3, 5, 5, 6, 7]

Function Explanation

Input:

nums: A list of integers.
k: The size of the sliding window.

Output:

A list of the maximum values within each sliding window of size k.

Key Steps:

Initialize Deque and Result List:
- dq: A deque to store indices of array elements.
- result: A list to store the maximum values for each sliding window.
Iterate Over the List:
- Remove Out-of-Window Indices:
  - If the leftmost index in dq is out of the current window (i.e., dq[0] < i - k + 1), remove it.
- Maintain Deque Order:
  - Remove elements from the deque that are smaller than the current element nums[i] (because they are not useful anymore as they cannot be maximum if current element is bigger).
- Add Current Element Index:
  - Append the current index i to the deque.
- Append Maximum for the Current Window:
  - If i is greater than or equal to k - 1, append nums[dq[0]] to result (this is the maximum for the current window).

Example Execution:

Input: nums = [1, 3, -1, -3, 5, 3, 6, 7], k = 3
Output: [3, 3, 5, 5, 6, 7]

Detailed Steps:

Initial State:
- dq = deque([]), result = []
First Window (i = 0 to i = 2):
- i = 0: dq = deque([0])
- i = 1: dq = deque([1]) (removes 0 because nums[0] < nums[1])
- i = 2: dq = deque([1, 2])
- Append nums[1] (3) to result: result = [3]
Second Window (i = 3):
- dq = deque([1, 2, 3])
- Remove 1 because it is out of the current window: dq = deque([2, 3])
- Append nums[1] (3) to result: result = [3, 3]
Third Window (i = 4):
- dq = deque([4]) (removes 2, 3 because nums[2, 3] < nums[4])
- Append nums[4] (5) to result: result = [3, 3, 5]
Fourth Window (i = 5):
- dq = deque([4, 5])
- Append nums[4] (5) to result: result = [3, 3, 5, 5]
Fifth Window (i = 6):
- dq = deque([6]) (removes 4, 5 because nums[4, 5] < nums[6])
- Append nums[6] (6) to result: result = [3, 3, 5, 5, 6]
Sixth Window (i = 7):
- dq = deque([6, 7])
- Append nums[6] (6) to result: result = [3, 3, 5, 5, 6, 7]

Hence, the final output is [3, 3, 5, 5, 6, 7], representing the maximum values in each sliding window of size k.

9. How do you solve the "Minimum Window Substring" problem?

Problem: Given two strings s and t, find the minimum window in s which contains all characters of t.
Solution:

from collections import Counter

def min_window(s, t):
    if not s or not t:
        return ""

    t_count = Counter(t)
    window_count = Counter()

    l, r = 0, 0
    required = len(t_count)
    formed = 0

    min_len = float('inf')
    min_window = (0, 0)

    while r < len(s):
        char = s[r]
        window_count[char] += 1
        if char in t_count and window_count[char] == t_count[char]:
            formed += 1

        while l <= r and formed == required:
            char = s[l]
            if r - l + 1 < min_len:
                min_len = r - l + 1
                min_window = (l, r)

            window_count[char] -= 1
            if char in t_count and window_count[char] < t_count[char]:
                formed -= 1
            l += 1

        r += 1

    l, r = min_window
    return s[l:r+1] if min_len != float('inf') else ""

# Example
s = "ADOBECODEBANC"
t = "ABC"
print(min_window(s, t))  # Output: "BANC"

Explanation: Use a sliding window and two pointers to efficiently find the smallest substring that satisfies the condition.

10. How do you solve the "Longest Palindromic Substring" problem?

Problem: Find the longest palindromic substring in a given string s.
Solution (Expand Around Center):

def longest_palindromic_substring(s):
    def expand_around_center(left, right):
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        return left + 1, right - 1

    start, end = 0, 0
    for i in range(len(s)):
        # Odd-length palindrome
        l1, r1 = expand_around_center(i, i)
        # Even-length palindrome
        l2, r2 = expand_around_center(i, i + 1)

        if r1 - l1 > end - start:
            start, end = l1, r1
        if r2 - l2 > end - start:
            start, end = l2, r2

    return s[start:end+1]

# Example
s = "babad"
print(longest_palindromic_substring(s))  # Output: "bab" or "aba"

Explanation: Expand around each character (and its adjacent character for even-length palindromes) to find the longest palindrome.