Top 10 Advanced Python DSA Interview Questions and In-Depth Answers
- Get link
- X
- Other Apps
Here are 10 advanced Python interview questions related to Data Structures and Algorithms (DSA), along with in-depth answers:
1. How do you find the k-th largest element in an array?
Problem: Find the k-th largest element in an unsorted array.
Solution:
import heapq
def find_kth_largest(nums, k):
return heapq.nlargest(k, nums)[-1]
# Example
nums = [3, 2, 1, 5, 6, 4]
k = 2
print(find_kth_largest(nums, k)) # Output: 5
Step-by-Step Explanation:
Importing heapq: The
heapq
module in Python provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.Defining the Function: The function
find_kth_largest(nums, k)
takes two arguments:nums
: A list of numbers.k
: An integer representing the position of the largest element we want to find.
Using heapq.nlargest():
The
heapq.nlargest(k, nums)
function returns a list of thek
largest elements from thenums
list, sorted in descending order.For example,
heapq.nlargest(2, [3, 2, 1, 5, 6, 4])
would return[6, 5]
.
Returning the k-th Largest Element:
The expression
heapq.nlargest(k, nums)[-1]
accesses the last element in the list of thek
largest elements.Since
heapq.nlargest(k, nums)
returns thek
largest elements sorted in descending order, thek-th
largest element will be at the-1
index of this list.
Example:
Given
nums = [3, 2, 1, 5, 6, 4]
andk = 2
:heapq.nlargest(2, nums)
will produce[6, 5]
.The k-th largest element is
5
, which is at the-1
index of[6, 5]
.
Thus, the function correctly identifies the second-largest element in the list, resulting in the output 5
.
2. How do you solve the Longest Increasing Subsequence (LIS) problem?
Problem: Find the length of the longest increasing subsequence in an array.
Solution (Using Binary Search):
from bisect import bisect_left
def length_of_lis(nums):
sub = []
for num in nums:
pos = bisect_left(sub, num)
if pos == len(sub):
sub.append(num)
else:
sub[pos] = num
return len(sub)
# Example
nums = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(nums)) # Output: 4 ([2, 3, 7, 101])
Step-by-Step Explanation:
Importing
bisect_left
: This function from thebisect
module is used to find the insertion point of an element in a sorted list to maintain order. This is crucial for efficiently finding positions to insert elements.Function Definition:
sub
List: This list will store the smallest possible tail values for all increasing subsequences of various lengths.For Loop: Iterate through each element in the input list
nums
.
Using
bisect_left(sub, num)
:Find Position (
pos
): The position wherenum
can be inserted to maintain the sorted order ofsub
.Condition
if pos == len(sub)
:Append: If
pos
equals the length ofsub
, it meansnum
is greater than all current elements insub
, sonum
is appended tosub
.
Else Condition:
Replace: If
pos
is less than the length ofsub
,num
replaces the element at positionpos
. This replacement ensures thatsub
contains the smallest possible tail values for increasing subsequences of various lengths.
Return Length of
sub
: The length ofsub
at the end of the loop gives the length of the Longest Increasing Subsequence (LIS).
Example:
Input:
nums = [10, 9, 2, 5, 3, 7, 101, 18]
Iteration Details:
10
:sub
becomes[10]
9
:sub
becomes[9]
(replaces10
)2
:sub
becomes[2]
(replaces9
)5
:sub
becomes[2, 5]
3
:sub
becomes[2, 3]
(replaces5
)7
:sub
becomes[2, 3, 7]
101
:sub
becomes[2, 3, 7, 101]
18
:sub
becomes[2, 3, 7, 18]
(replaces101
)
Thus, the length of the Longest Increasing Subsequence is 4
, corresponding to the subsequence [2, 3, 7, 101]
.
3. How do you find the median of two sorted arrays?
Problem: Given two sorted arrays, find the median in O(log(min(n,m))) time.
Solution:
def find_median_sorted_arrays(nums1, nums2):
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1
x, y = len(nums1), len(nums2)
low, high = 0, x
while low <= high:
partition_x = (low + high) // 2
partition_y = (x + y + 1) // 2 - partition_x
max_left_x = float('-inf') if partition_x == 0 else nums1[partition_x - 1]
min_right_x = float('inf') if partition_x == x else nums1[partition_x]
max_left_y = float('-inf') if partition_y == 0 else nums2[partition_y - 1]
min_right_y = float('inf') if partition_y == y else nums2[partition_y]
if max_left_x <= min_right_y and max_left_y <= min_right_x:
if (x + y) % 2 == 0:
return (max(max_left_x, max_left_y) + min(min_right_x, min_right_y)) / 2
else:
return max(max_left_x, max_left_y)
elif max_left_x > min_right_y:
high = partition_x - 1
else:
low = partition_x + 1
# Example
nums1 = [1, 3]
nums2 = [2]
print(find_median_sorted_arrays(nums1, nums2)) # Output: 2
Step-by-Step Explanation:
Ensure
nums1
is the smaller array:If
nums1
is longer thannums2
, swap them. This ensures the binary search is performed on the smaller array to maintain efficiency.
Initialize variables:
x
andy
are the lengths ofnums1
andnums2
, respectively.low
andhigh
define the search range innums1
.
Binary Search Loop:
The loop runs as long as
low
is less than or equal tohigh
.partition_x
is the partition index innums1
.partition_y
is the corresponding partition index innums2
.
Calculate partition values:
max_left_x
: Maximum value on the left side of the partition innums1
.min_right_x
: Minimum value on the right side of the partition innums1
.max_left_y
: Maximum value on the left side of the partition innums2
.min_right_y
: Minimum value on the right side of the partition innums2
.
Check for valid partition:
If the maximum value on the left side of both partitions is less than or equal to the minimum value on the right side, a valid partition is found.
If the combined length of
nums1
andnums2
is even, return the average of the two middle values.If the combined length is odd, return the maximum value of the left partition.
Adjust search range:
If
max_left_x
is greater thanmin_right_y
, move thehigh
pointer to the left.Otherwise, move the
low
pointer to the right.
Example:
Input:
nums1 = [1, 3]
andnums2 = [2]
Iterations:
Ensure
nums1
is the smaller array: already true.Initial partitions:
partition_x = 1
,partition_y = 1
max_left_x = 1
,min_right_x = 3
max_left_y = 2
,min_right_y = float('inf')
Valid partition:
Combined length is odd (
3
), returnmax(max_left_x, max_left_y) = max(1, 2) = 2
.
Thus, the function correctly identifies the median as 2
.
4. How do you implement a Segment Tree for range sum queries?
Solution:
class SegmentTree:
def __init__(self, nums):
n = len(nums)
self.nums = nums
self.tree = [0] * (4 * n)
self.build(0, 0, n - 1)
def build(self, node, start, end):
if start == end:
self.tree[node] = self.nums[start]
else:
mid = (start + end) // 2
left = 2 * node + 1
right = 2 * node + 2
self.build(left, start, mid)
self.build(right, mid + 1, end)
self.tree[node] = self.tree[left] + self.tree[right]
def query(self, node, start, end, l, r):
if r < start or l > end:
return 0
if l <= start and r >= end:
return self.tree[node]
mid = (start + end) // 2
left = 2 * node + 1
right = 2 * node + 2
return self.query(left, start, mid, l, r) + self.query(right, mid + 1, end, l, r)
def update(self, node, start, end, idx, val):
if start == end:
self.tree[node] = val
else:
mid = (start + end) // 2
left = 2 * node + 1
right = 2 * node + 2
if idx <= mid:
self.update(left, start, mid, idx, val)
else:
self.update(right, mid + 1, end, idx, val)
self.tree[node] = self.tree[left] + self.tree[right]
# Example
nums = [1, 3, 5, 7, 9, 11]
tree = SegmentTree(nums)
print(tree.query(0, 0, len(nums) - 1, 1, 3)) # Output: 15 (sum of [3, 5, 7])
tree.update(0, 0, len(nums) - 1, 1, 10)
print(tree.query(0, 0, len(nums) - 1, 1, 3)) # Output: 22
Explanation: The segment tree allows efficient range queries and updates.
5. How do you solve the Word Ladder problem?
Problem: Find the shortest transformation sequence length from beginWord
to endWord
by changing one letter at a time.
Solution:
from collections import deque
def word_ladder_length(beginWord, endWord, wordList):
wordSet = set(wordList)
if endWord not in wordSet:
return 0
queue = deque([(beginWord, 1)])
while queue:
word, length = queue.popleft()
if word == endWord:
return length
for i in range(len(word)):
for c in 'abcdefghijklmnopqrstuvwxyz':
next_word = word[:i] + c + word[i + 1:]
if next_word in wordSet:
wordSet.remove(next_word)
queue.append((next_word, length + 1))
return 0
# Example
beginWord = "hit"
endWord = "cog"
wordList = ["hot", "dot", "dog", "lot", "log", "cog"]
print(word_ladder_length(beginWord, endWord, wordList)) # Output: 5
Step-by-Step Explanation
Initialization:
Convert
wordList
to a setwordSet
for efficient lookup.If
endWord
is not inwordSet
, return0
because it's not possible to transformbeginWord
toendWord
.
BFS Setup:
Initialize a queue (
deque
) with a tuple containingbeginWord
and the initial transformation length1
.
Breadth-First Search (BFS):
Perform BFS to explore all possible transformations.
Dequeue the front element (current word and its transformation length).
If the current word is
endWord
, return its transformation length as the result.
Generating Next Words:
For each character position in the current word, try replacing it with every letter from 'a' to 'z'.
Generate the
next_word
and check if it exists inwordSet
.If it does, add
next_word
to the queue with an incremented length and remove it fromwordSet
to avoid revisiting.
Return Result:
If the queue is exhausted and
endWord
is not found, return0
.
Example Walkthrough
Initial Setup
Input:
beginWord = "hit"
,endWord = "cog"
,wordList = ["hot", "dot", "dog", "lot", "log", "cog"]
wordSet
:{"hot", "dot", "dog", "lot", "log", "cog"}
BFS Steps
Start with
queue = deque([("hit", 1)])
First Iteration:
word = "hit"
,length = 1
Generate next words:
"hot"
(valid, enqueue with length2
, remove fromwordSet
)
Second Iteration:
word = "hot"
,length = 2
Generate next words:
"dot"
(valid, enqueue with length3
, remove fromwordSet
)"lot"
(valid, enqueue with length3
, remove fromwordSet
)
Third Iteration:
word = "dot"
,length = 3
Generate next words:
"dog"
(valid, enqueue with length4
, remove fromwordSet
)
Fourth Iteration:
word = "lot"
,length = 3
Generate next words:
"log"
(valid, enqueue with length4
, remove fromwordSet
)
Fifth Iteration:
word = "dog"
,length = 4
Generate next words:
"cog"
(valid, enqueue with length5
, remove fromwordSet
)
Since
cog
is theendWord
, returnlength = 5
.
Thus, the output is 5
indicating the shortest transformation sequence from "hit"
to "cog"
has a length of 5
6. How do you detect and remove a cycle in a directed graph?
Solution:
from collections import defaultdict
def has_cycle(graph):
def dfs(node):
if visited[node] == 1:
return True
if visited[node] == 2:
return False
visited[node] = 1
for neighbor in graph[node]:
if dfs(neighbor):
return True
visited[node] = 2
return False
visited = [0] * len(graph)
for node in range(len(graph)):
if not visited[node]:
if dfs(node):
return True
return False
# Example
graph = defaultdict(list, {0: [1], 1: [2], 2: [0]})
print(has_cycle(graph)) # Output: True
Step-by-Step Explanation
Graph Representation:
The graph is represented using a
defaultdict
of lists. In the example, the graph is{0: [1], 1: [2], 2: [0]}
.This means there are directed edges from node
0
to1
, from1
to2
, and from2
back to0
, forming a cycle.
DFS Function:
The
dfs
function performs a Depth-First Search to detect cycles.Visited Array: Tracks the state of each node:
0
: Unvisited1
: Visiting (currently in the call stack)2
: Fully visited
Cycle Detection:
If a node is
1
(currently in the call stack), it means a cycle is detected because we encountered a node we've already started visiting.If a node is
2
(fully visited), no cycle is found along that path.
DFS Implementation:
Visiting: Mark the current node as
1
.Recursive Call: For each neighbor, recursively call
dfs
. If a cycle is found during any call, returnTrue
.Fully Visited: Mark the current node as
2
after all its neighbors are processed.
Cycle Detection Loop:
Iterate over all nodes in the graph.
If a node hasn't been visited (
visited[node] == 0
), calldfs
on that node.If
dfs
returnsTrue
for any node, a cycle is detected.
Example Execution
Graph: {0: [1], 1: [2], 2: [0]}
Initial Visited Array:
[0, 0, 0]
(All nodes unvisited)
Iteration and DFS Calls
Node 0:
Call
dfs(0)
:Mark
0
as1
:[1, 0, 0]
Call
dfs(1)
for neighbor1
:Mark
1
as1
:[1, 1, 0]
Call
dfs(2)
for neighbor2
:Mark
2
as1
:[1, 1, 1]
Call
dfs(0)
for neighbor0
:dfs(0)
returnsTrue
because node0
is already1
, indicating a cycle.
Thus, the output is True
because the graph contains a cycle (0 -> 1 -> 2 -> 0
).
7. How do you solve the "Maximum Rectangle" problem in a binary matrix?
Solution:
def maximal_rectangle(matrix):
if not matrix:
return 0
def largest_histogram(heights):
stack, max_area = [], 0
for i, h in enumerate(heights + [0]):
while stack and heights[stack[-1]] > h:
height = heights[stack.pop()]
width = i if not stack else i - stack[-1] - 1
max_area = max(max_area, height * width)
stack.append(i)
return max_area
heights = [0] * len(matrix[0])
max_area = 0
for row in matrix:
for i in range(len(row)):
heights[i] = heights[i] + 1 if row[i] == '1' else 0
max_area = max(max_area, largest_histogram(heights))
return max_area
# Example
matrix = [
["1", "0", "1", "0", "0"],
["1", "0", "1", "1", "1"],
["1", "1", "1", "1", "1"],
["1", "0", "0", "1", "0"]
]
print(maximal_rectangle(matrix)) # Output: 6
Key Steps:
1. Initial Setup:
- The algorithm starts by checking if the input matrix is empty.
- It initializes a
heights
array to store the cumulative heights of1
s for each column. This array represents the "height" of the histogram for each row.
2. Row-by-Row Iteration:
- The function processes the matrix row by row.
- For each row:
- It updates the
heights
array:- If a cell contains
'1'
, the height for that column increases by 1 (continuing the vertical streak of1
s). - If a cell contains
'0'
, the height resets to 0 (ending the streak).
- If a cell contains
- It updates the
3. Largest Histogram Area:
- After updating the
heights
array for a given row, the function calculates the largest rectangular area in the histogram represented by the updatedheights
array. - This part uses a stack-based approach to efficiently compute the largest rectangle in a histogram:
- The stack keeps track of indices of increasing heights.
- When a smaller height is encountered, rectangles ending at earlier heights are calculated.
4. Updating the Maximum Area:
- For each row, the largest area of the histogram is compared to the previously recorded maximum area.
- The global maximum area is updated as needed.
5. Result:
- After processing all rows, the function returns the maximum area found across all row histograms.
8. How do you solve the "Sliding Window Maximum" problem?
Problem: Given an array nums
and an integer k
, find the maximum value in every contiguous subarray of size k
.
Solution:
from collections import deque
def max_sliding_window(nums, k):
dq = deque() # Stores indices
result = []
for i in range(len(nums)):
# Remove elements out of the window
if dq and dq[0] < i - k + 1:
dq.popleft()
# Remove elements smaller than the current element
while dq and nums[dq[-1]] < nums[i]:
dq.pop()
dq.append(i)
# Append the maximum element of the window
if i >= k - 1:
result.append(nums[dq[0]])
return result
# Example
nums = [1, 3, -1, -3, 5, 3, 6, 7]
k = 3
print(max_sliding_window(nums, k)) # Output: [3, 3, 5, 5, 6, 7]
Function Explanation
Input:
nums
: A list of integers.k
: The size of the sliding window.
Output:
A list of the maximum values within each sliding window of size
k
.
Key Steps:
Initialize Deque and Result List:
dq
: A deque to store indices of array elements.result
: A list to store the maximum values for each sliding window.
Iterate Over the List:
Remove Out-of-Window Indices:
If the leftmost index in
dq
is out of the current window (i.e.,dq[0] < i - k + 1
), remove it.
Maintain Deque Order:
Remove elements from the deque that are smaller than the current element
nums[i]
(because they are not useful anymore as they cannot be maximum if current element is bigger).
Add Current Element Index:
Append the current index
i
to the deque.
Append Maximum for the Current Window:
If
i
is greater than or equal tok - 1
, appendnums[dq[0]]
toresult
(this is the maximum for the current window).
Example Execution:
Input:
nums = [1, 3, -1, -3, 5, 3, 6, 7]
,k = 3
Output:
[3, 3, 5, 5, 6, 7]
Detailed Steps:
Initial State:
dq = deque([])
,result = []
First Window (
i = 0
toi = 2
):i = 0
:dq = deque([0])
i = 1
:dq = deque([1])
(removes0
becausenums[0] < nums[1]
)i = 2
:dq = deque([1, 2])
Append
nums[1]
(3) toresult
:result = [3]
Second Window (
i = 3
):dq = deque([1, 2, 3])
Remove
1
because it is out of the current window:dq = deque([2, 3])
Append
nums[1]
(3) toresult
:result = [3, 3]
Third Window (
i = 4
):dq = deque([4])
(removes2, 3
becausenums[2, 3] < nums[4]
)Append
nums[4]
(5) toresult
:result = [3, 3, 5]
Fourth Window (
i = 5
):dq = deque([4, 5])
Append
nums[4]
(5) toresult
:result = [3, 3, 5, 5]
Fifth Window (
i = 6
):dq = deque([6])
(removes4, 5
becausenums[4, 5] < nums[6]
)Append
nums[6]
(6) toresult
:result = [3, 3, 5, 5, 6]
Sixth Window (
i = 7
):dq = deque([6, 7])
Append
nums[6]
(6) toresult
:result = [3, 3, 5, 5, 6, 7]
Hence, the final output is [3, 3, 5, 5, 6, 7]
, representing the maximum values in each sliding window of size k
.
9. How do you solve the "Minimum Window Substring" problem?
Problem: Given two strings s
and t
, find the minimum window in s
which contains all characters of t
.
Solution:
from collections import Counter
def min_window(s, t):
if not s or not t:
return ""
t_count = Counter(t)
window_count = Counter()
l, r = 0, 0
required = len(t_count)
formed = 0
min_len = float('inf')
min_window = (0, 0)
while r < len(s):
char = s[r]
window_count[char] += 1
if char in t_count and window_count[char] == t_count[char]:
formed += 1
while l <= r and formed == required:
char = s[l]
if r - l + 1 < min_len:
min_len = r - l + 1
min_window = (l, r)
window_count[char] -= 1
if char in t_count and window_count[char] < t_count[char]:
formed -= 1
l += 1
r += 1
l, r = min_window
return s[l:r+1] if min_len != float('inf') else ""
# Example
s = "ADOBECODEBANC"
t = "ABC"
print(min_window(s, t)) # Output: "BANC"
Explanation: Use a sliding window and two pointers to efficiently find the smallest substring that satisfies the condition.
10. How do you solve the "Longest Palindromic Substring" problem?
Problem: Find the longest palindromic substring in a given string s
.
Solution (Expand Around Center):
def longest_palindromic_substring(s):
def expand_around_center(left, right):
while left >= 0 and right < len(s) and s[left] == s[right]:
left -= 1
right += 1
return left + 1, right - 1
start, end = 0, 0
for i in range(len(s)):
# Odd-length palindrome
l1, r1 = expand_around_center(i, i)
# Even-length palindrome
l2, r2 = expand_around_center(i, i + 1)
if r1 - l1 > end - start:
start, end = l1, r1
if r2 - l2 > end - start:
start, end = l2, r2
return s[start:end+1]
# Example
s = "babad"
print(longest_palindromic_substring(s)) # Output: "bab" or "aba"
Explanation: Expand around each character (and its adjacent character for even-length palindromes) to find the longest palindrome.