Solution: Extra Characters in a String

Grokking the Coding Interview: Patterns for Coding Questions

Ask Author

Back to course home

0% completed

Vote For New Content

Solution: Extra Characters in a String

Table of Contents

Contents are not accessible

Problem Statement

Given a string s and an array of words words. Break string s into multiple non-overlapping substrings such that each substring should be part of the words. There are some characters left which are not part of any substring.

Return the minimum number of remaining characters in s, which are not part of any substring after string break-up.

Examples

Example 1:
- Input: s = "amazingracecar", dictionary = ["race", "car"]
- Expected Output: 7
- Justification: The string s can be rearranged to form "racecar", leaving 'a', 'm', 'a', 'z', 'i', 'n', 'g' as extra.
Example 2:
- Input: s = "bookkeeperreading", dictionary = ["keep", "read"]
- Expected Output: 9
- Justification: The words "keep" and "read" can be formed from s, but 'b', 'o', 'o', 'k', 'e', 'r', 'i', 'n', 'g' are extra.
Example 3:
- Input: s = "thedogbarksatnight", dictionary = ["dog", "bark", "night"]
- Expected Output: 6
- Justification: The words "dog", "bark", and "night" can be formed, leaving 't', 'h', 'e', 's', 'a', 't' as extra characters.

Constraints:

1 <= str.length <= 50
1 <= dictionary.length <= 50
1 <= dictionary[i].length <= 50
dictionary[i] and s consists of only lowercase English letters
dictionary contains distinct words

Solution

To solve the "Extra Characters in a String" problem, we utilize Dynamic Programming (DP) to efficiently determine the minimum number of extra characters that cannot be part of any word in the given dictionary. DP helps in breaking down the problem into smaller subproblems, ensuring that each subproblem is solved only once and its result is reused, thereby optimizing the overall computation.

We explore four distinct approaches to implement this DP solution:

Solution 1: Top Down Dynamic Programming with Substring Method
Solution 2: Bottom Up Dynamic Programming with Substring Method
Solution 3: Top Down Dynamic Programming with Trie
Solution 4: Bottom Up Dynamic Programming with Trie

Solution 1: Top Down Dynamic Programming with Substring Method

In the Top Down Dynamic Programming with Substring Method, we recursively explore all possible substrings of the input string s starting from the first character. At each step, we check if the current substring exists in the dictionary. If it does, we recursively solve the remaining part of the string. We use memoization to store the results of subproblems to avoid redundant computations. The goal is to minimize the number of extra characters by maximizing the number of dictionary words used.

This approach systematically breaks down the problem by considering every possible split of the string s. By leveraging memoization, we ensure that each substring is processed only once, significantly reducing the time complexity. This method is intuitive and straightforward, making it easy to implement and understand.

Step-by-Step Algorithm

Initialize Data Structures:
- Create a memoization array memo of size length to store the results for each starting index in the string.
- Convert the dictionary into a HashSet (wordSet) for quick lookup of words.
Recursive Function solve(index, length, s):
- Base Case: If index equals length (end of the string), return 0 because there are no extra characters left.
- Check Memoization: If the result for index is already computed (memo[index] != null), return the cached result.
- Count Extra Characters:
  - Call solve(index + 1, length, s) to treat the current character as extra and move to the next character. Add 1 to account for this extra character.
- Check Substrings:
  - Iterate over possible substrings starting at index:
    - Extract the substring s.substring(index, end + 1) and check if it exists in wordSet.
    - If the substring exists, recursively call solve(end + 1, length, s) to process the rest of the string.
    - Update minExtra with the smaller value between the current minExtra and the result of the recursive call.
- Store Result:
  - Save the computed minExtra value in memo[index] and return it.
Compute the Result:
- Start by calling solve(0, length, s) to process the entire string.
Return Final Output:
- The result from solve(0, length, s) represents the minimum number of extra characters.

Code

Python3

. . . .

Complexity Analysis

Time Complexity

Number of States:
- The recursive function solve is called for each unique starting index, resulting in $O(N)$ states where N is the length of the string s.
Substring Iteration:
- For each state, the algorithm iterates through substrings from index to length, which takes $O(N)$ time in the worst case.
Substring Creation:
- Extracting a substring using s.substring(start, end + 1) takes $O(N)$ time.
Overall Time Complexity:
- Combining the above factors, the total time complexity is $O(N³)$ .

Space Complexity

Memoization Array:
- The memo array stores results for up to N indices, consuming $O(N)$ space.
HashSet for Dictionary:
- The wordSet stores up to K words, each with an average length of M, resulting in $O(M \cdot K)$ space.
Call Stack:
- The recursion depth can reach up to N in the worst case, requiring $O(N)$ space for the call stack.
Overall Space Complexity:
- The total space complexity is $O(N + M \cdot K)$ .

Solution 2: Bottom-Up Dynamic Programming with Substring Method

In this approach, the main idea is to minimize the number of extra characters by leveraging the dictionary words to form substrings from the given string s. We iterate over the string from the end to the beginning, calculating the minimum extra characters needed for every starting index.

We use a DP array (dp) where dp[start] represents the minimum extra characters required for the substring s[start:]. Initially, for every index, we assume the current character is extra. Then, we check all possible substrings starting at the current index. If a substring matches a word in the dictionary, we update dp[start] to reflect the better result. Finally, the value at dp[0] gives the minimum extra characters needed for the entire string.

Compared to the top-down approach, this method avoids recursion and uses iteration, making it straightforward and stack-memory efficient. It iteratively calculates the result starting from the end of the string and moves towards the beginning.

Step-by-Step Algorithm

Convert Dictionary to HashSet:
- Create a HashSet called wordSet from the dictionary for fast lookup of substrings.
Initialize DP Array:
- Create a dp array of size length + 1 (where length is the length of s).
- dp[start] will store the minimum extra characters required for the substring s[start:].
Iterate Backward Over the String:
- Loop from start = length - 1 to 0:
  - Initialize dp[start] = dp[start + 1] + 1.
    - This assumes the current character at start is extra.
Check Substrings:
- For each start, loop over all possible substrings starting at start:
  - This step Tries forming substrings starting from the current index.
  - For end ranging from start to length - 1:
    - Extract substring = s.substring(start, end + 1).
    - If substring exists in wordSet:
      - Update dp[start] = Math.min(dp[start], dp[end + 1]).
Return Final Result:
- The result is stored in dp[0], representing the minimum extra characters required for the entire string.

Algorithm Walkthrough

Input:

String: s = "thedogbarksatnight".
Dictionary: dictionary = ["dog", "bark", "night"].

Initialization:

String Length: length = 18.
Dictionary as HashSet: wordSet = {"dog", "bark", "night"}.
DP Array: dp = [0, 0, ..., 0] (size 19, initialized with 0).

Steps:

Step 1: Start iterating from the end of the string.

At start = 18 (beyond the end of the string):
- dp[18] = 0 (Base case).
At start = 17 (character = 't'):
- Initialize dp[17] = dp[18] + 1 = 1.
- Substrings starting at 17: "t".
  - None of the substrings match wordSet.
- Result: dp[17] = 1.
At start = 16 (character = 'h'):
- Initialize dp[16] = dp[17] + 1 = 2.
- Substrings starting at 16: "h", "ht".
  - None of the substrings match wordSet.
- Result: dp[16] = 2.
At start = 15 (character = 'g'):
- Initialize dp[15] = dp[16] + 1 = 3.
- Substrings starting at 15: "g", "gh", "ght".
  - None of the substrings match wordSet.
- Result: dp[15] = 3.
At start = 14 (character = 'i'):
- Initialize dp[14] = dp[15] + 1 = 4.
- Substrings starting at 14: "i", "ig", "igh", "ight".
  - None of the substrings match wordSet.
- Result: dp[14] = 4.

Step 2: Handle substring "night" at index 13.

At start = 13 (character = 'n'):
- Initialize dp[13] = dp[14] + 1 = 5.
- Substrings starting at 13:
  - "n", "ni", "nig", "nigh", "night".
    - "night" matches wordSet.
    - Update dp[13] = dp[18] = 0.
- Result: dp[13] = 0.
At start = 12 (character = 't'):
- Initialize dp[12] = dp[13] + 1 = 1.
- Substrings starting at 12: "t", "tn", "tni", "tnig", "tnigh".
  - None of the substrings match wordSet.
- Result: dp[12] = 1.

Step 3: Handle substring "bark" at index 6.

At start = 6 (character = 'b'):
- Initialize dp[6] = dp[7] + 1 = 6 + 1 = 7.
- Substrings starting at 6:
  - "b", "ba", "bar", "bark".
    - "bark" matches wordSet.
    - Update dp[6] = dp[10] = 3.
- Result: dp[6] = 3.
At start = 5 (character = 'g'):
- Initialize dp[5] = dp[6] + 1 = 4.
- Substrings starting at 5: "g", "gb", "gba", "gbar", "gbark".
  - None of the substrings match wordSet.
- Result: dp[5] = 4.

Step 4: Handle substring "dog" at index 3.

At start = 3 (character = 'd'):
- Initialize dp[3] = dp[4] + 1 = 5 + 1 = 6.
- Substrings starting at 3:
  - "d", "do", "dog".
    - "dog" matches wordSet.
    - Update dp[3] = dp[6] = 3.
- Result: dp[3] = 3.
At start = 2 (character = 'e'):

Initialize dp[2] = dp[3] + 1 = 4.
Substrings starting at 2: "e", "ed", "edo", "edog", etc.
- None of the substrings match wordSet.
Result: dp[2] = 4.

Step 5: Continue Backward to the Beginning.

At start = 0 (character = 't'):
- Initialize dp[0] = dp[1] + 1 = 5 + 1 = 6.
- Substrings starting at 0:
  - "t", "th", "the", "thed", "thedo", "thedog".
    - No valid substring.
- Result: dp[0] = 6.

Final DP Array:

dp = [6, 5, 4, 3, 5, 4, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0].

Final Result:

Minimum Extra Characters: dp[0] = 6.
Explanation:
- "dog", "bark", and "night" are matched from the dictionary.
- Extra characters are 't', 'h', 'e', 's', 'a', 't'.

Code

Python3

. . . .

Complexity Analysis

Time Complexity

Outer Loop:
- Iterates over all starting indices of the string, contributing $O(N)$ operations.
Inner Loop:
- For each start, the inner loop iterates over all possible substrings starting at start, contributing $O(N)$ operations.
Substring Extraction:
- Extracting substrings with s.substring(start, end + 1) takes $O(N)$ time in the worst case.
Overall Time Complexity:
- Combining all the factors, the time complexity is $O(N³)$ .

Space Complexity

DP Array:
- The dp array of size N + 1 contributes $O(N)$ space.
HashSet for Dictionary:
- Storing the dictionary in a HashSet requires $O(M \cdot K)$ space, where M is the average length of words in the dictionary and K is the number of words.
Overall Space Complexity: $O(N + M \cdot K)$ .

Solution 3: Top Down Dynamic Programming with Trie

This solution uses a top-down dynamic programming approach with a Trie to efficiently determine the number of extra characters in a given substring that cannot form words in the dictionary. The Trie allows us to quickly validate substrings against dictionary words, while dynamic programming ensures that overlapping subproblems are solved only once.

The approach works as follows:

First, a Trie is built from the given dictionary words, where each node in the Trie represents a character in a dictionary word, and the end of a word is marked with a flag.
Then, the string s is processed recursively from left to right using a dp function. At each index, the function either counts the current character as extra and moves to the next index, or it tries to match substrings starting at the current index with words in the Trie. If a match is found, the remaining portion of the string is processed recursively.
Memoization is used to store the results of subproblems for each starting index, ensuring that each state is computed only once.

This approach is optimal because it combines the power of Tries (for fast substring matching) and dynamic programming (to reuse previous computations). By avoiding the repeated recomputation of substring matches, the solution achieves a significant speedup compared to naive methods.

Step-by-Step Algorithm

Build a Trie from the Dictionary:
- Create a root node for the Trie.
- For each word in the dictionary:
  - Traverse through its characters.
  - Add each character as a child node of the current node if it doesn't already exist.
  - Mark the last node of the word as the end of a word.
Initialize the DP Array:
- Create a memo array of size n + 1, where n is the length of the string s.
- Initialize all values in memo to null to indicate uncomputed states.
Recursive DP Function:
- Define a function dp(start) that computes the number of extra characters in the substring starting at index start.
- Base Case:
  - If start == n, return 0, as no characters are left to process.
- Memoization:
  - If memo[start] is not null, return its value to avoid redundant computation.
- Count Extra Characters:
  - Assume the character at start is extra and move to the next index, initializing minExtra as dp(start + 1) + 1.
- Trie Matching:
  - Starting from the root of the Trie, traverse the Trie while iterating through characters of the substring s[start:].
  - If a Trie node doesn't exist for a character, break out of the loop, as there is not valid substring starting from the current index.
  - If a Trie node marks the end of a word, update minExtra with a minimum of minExtra and result returned after recursively processing the remaining substring.
Return the Final Result:
- Call dp(0) to compute the result for the entire string.

Algorithm Walkthrough

Input:

String: s = "thedogbarksatnight".
Dictionary: dictionary = ["dog", "bark", "night"].

Step 1: Trie Construction

Insert "dog" into the Trie:
- Add nodes for 'd', 'o', 'g'.
- Mark 'g' as the end of a word.
Insert "bark" into the Trie:
- Add nodes for 'b', 'a', 'r', 'k'.
- Mark 'k' as the end of a word.
Insert "night" into the Trie:
- Add nodes for 'n', 'i', 'g', 'h', 't'.
- Mark 't' as the end of a word.

Step 2: Start DP Recursion from Index 0

The recursion starts with start = 0.

Call dp(0):
- Base Case:
  - The function calls itself recursively with start = 1:
    - This pattern continues until start = 18.
At start = 18:
- This is the base case: start == length. No characters remain, so return 0.

Step 3: Process Substrings from the End

After reaching the base case, the recursion starts processing substrings backward (from start = 17 to start = 0).

Processing start = 17:
- The character is 't'.
- Initialize minExtra = dp(18) + 1 = 1. Assume 't' is extra.
- No valid substrings start from 't', as it does not match any word in the Trie.
- Store and Return: memo[17] = 1.
Processing start = 16:
- The character is 'h'.
- Initialize minExtra = dp(17) + 1 = 2. Assume 'h' is extra.
- No valid substrings start from 'h', as it does not match any word in the Trie.
- Store and Return: memo[16] = 2.
Processing start = 15:
- The character is 'g'.
- Initialize minExtra = dp(16) + 1 = 3. Assume 'g' is extra.
- No valid substrings start from 'g', as it does not match any word in the Trie.
- Store and Return: memo[15] = 3.

Step 4: Process Substring "night" at start = 13

memo = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 3, 2, 1, 0]

Processing start = 13:
- The character is 'n'.
- Initialize minExtra = dp(14) + 1 = 4. Assume 'n' is extra.
- Trie Traversal:
  - Substrings:
    - "n", "ni", "nig", "nigh", "night".
    - "night" matches in the Trie.
    - Update minExtra = dp(18) = 0.
- Store and Return: memo[13] = 0.

Step 5: Process Substring "bark" at start = 6

memo = [ 0, 0, 0, 0, 0, 0, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]

Processing start = 6:
- The character is 'b'.
- Initialize minExtra = dp(7) + 1 = 6 + 1 = 7. Assume 'b' is extra.
- Trie Traversal:
  - Substrings:
    - "b", "ba", "bar", "bark".
    - "bark" matches in the Trie.
    - Update minExtra = dp(10) = 3.
- Store and Return: memo[6] = 3.

Step 6: Process Substring "dog" at start = 3

memo = [ 0, 0, 0, 0, 5, 4, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]

Processing start = 3:
- The character is 'd'.
- Initialize minExtra = dp(4) + 1 = 5 + 1 = 6. Assume 'd' is extra.
- Trie Traversal:
  - Substrings:
    - "d", "do", "dog".
    - "dog" matches in the Trie.
    - Update minExtra = dp(6) = 3.
- Store and Return: memo[3] = 3.

Step 7: Process Substring from start = 0

Processing start = 0:
- The character is 't'.
- Initialize minExtra = dp(1) + 1 = 5 + 1 = 6. Assume 't' is extra.
- Trie Traversal:
  - Substrings:
    - "t", "th", "the", "thed", "thedo", "thedog".
    - None of substrings are valid.
- Return: memo[0] = 6.

Final Result

Memoization Array:

memo = [ 6, 5, 4, 3, 5, 4, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]

Output: 6, indicating six extra characters ('t', 'h', 'e', 's', 'a', 't').

Code

Python3

. . . .

Complexity Analysis

Time Complexity

DP Method:
- The dp method is called for N + 1 unique states, where N is the length of the string.
- Each state involves iterating over the substring starting at the current index, which takes $O(N)$ time in the worst case.
- Thus, the cost of the dp method is $O(N^2)$ .
Building the Trie:
- Constructing the Trie requires inserting K words, where each word has an average length of M.
- The cost of building the Trie is $O(M \cdot K)$ .

Overall Time Complexity: $O(N^2 + M \cdot K)$ .

Space Complexity

Trie:
- The Trie requires $O(M \cdot K)$ space to store the dictionary words.
Memoization:
- The memo array has a size of N + 1, consuming $O(N)$ space.
Recursive Stack:
- The recursion depth is at most N, adding $O(N)$ stack space.

Overall Space Complexity: $O(N + M \cdot K)$ .

Solution 4: Bottom Up Dynamic Programming with Trie

This solution uses bottom-up dynamic programming with a Trie to efficiently calculate the number of extra characters in a given string that cannot form words from a provided dictionary. The Trie enables fast lookups for dictionary words while avoiding the need to repeatedly recompute substring matches. The dynamic programming (DP) array ensures that the solution is computed incrementally from the end of the string toward the beginning.

Here’s the approach:

Trie Construction: A Trie is built from the dictionary, where each character of a word is represented as a node. Words that exist in the dictionary are marked with an isEnd flag at the last node.
DP Initialization: A DP array dp is created to store the minimum extra characters for substrings starting from different indices.
Iterative Processing: Starting from the last character of the string, each character is assumed to be extra initially. Then, substrings starting from the current index are checked against the Trie. If a substring matches a word in the Trie, the DP value is updated to reflect fewer extra characters.
Result: The final result is stored in dp[0], representing the minimum extra characters required for the entire string.

This approach is optimal because it combines efficient substring matching using the Trie and incremental computation with the DP array.

Step-by-Step Algorithm

Trie Construction:
- Initialize the root of the Trie.
- For each word in the dictionary:
  - Start at the root of the Trie.
  - For each character in the word:
    - If the character doesn’t exist as a child of the current node, create a new node.
    - Move to the child node representing the character.
  - Mark the last node as the end of the word.
Initialize DP Array:
- Create a DP array dp of size n + 1, where n is the length of the string s.
- Set dp[n] = 0 since no extra characters are required after the end of the string.
To calculate the minimum extra characters for each substring starting at every index, we process the string iteratively from the last character to the first:

a. Initialize Extra Characters:
- Assume that the character at the current index start is extra.
- Set dp[start] = dp[start + 1] + 1. This means that the cost for the substring starting at start is at least one more than the cost for the substring starting at start + 1.
b. Traverse the Trie:
- Begin from the root of the Trie to validate substrings starting from start.
- Iterate through each possible end index starting from start to n - 1:
  - Check Trie Validity:
    - If the character s[end] is not a child of the current Trie node, break the loop, as no further substring starting at start can match a dictionary word.
  - Move to the Next Trie Node:
    - If the character s[end] exists in the Trie, move to the child node for s[end].
  - Check for a Complete Word:
    - If the current Trie node marks the end of a word, this substring matches a dictionary word.
    - Update dp[start] = min(dp[start], dp[end + 1]), as including this valid word reduces the number of extra characters needed for the remaining substring.
c. Continue to the Next Start Index:
- Repeat the process for the next start index until the first character is processed. The final result will be stored in dp[0].

Return the Result:
- Return dp[0] as the result, which represents the minimum extra characters for the entire string.

Code

Python3

. . . .

Complexity Analysis

Time Complexity

Dynamic Programming (DP) Operations:
- The dp array is computed for each index start in the string s. For each start, we iterate over potential end values (start to n - 1) to form substrings.
- This results in a total of $O(N²)$ iterations since the outer loop runs N times, and the inner loop runs up to N times in the worst case.
Building the Trie:
- For each word in the dictionary (total of K words), we insert its characters into the Trie. Each word has an average length of M.
- Thus, building the Trie takes $O(M \cdot K)$ .

Overall Time Complexity: $O(N² + M \cdot K)$ .

Space Complexity

Trie Storage:
- The Trie stores K words, each with an average length of M. Each character is represented by a node in the Trie.
- Hence, the Trie requires $O(M \cdot K)$ space.
DP Array:
- The dp array has a size of N + 1 to store results for each index of the string.
- This contributes $O(N)$ space.

Overall Space Complexity: $O(N + M \cdot K)$ .

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible