0% completed
Problem Statement
Given a string s
and an array of words words
. Break string s
into multiple non-overlapping substrings such that each substring should be part of the words.
There are some characters left which are not part of any substring.
Return the minimum number of remaining characters in s
, which are not part of any substring after string break-up.
Examples
-
Example 1:
- Input:
s = "amazingracecar"
,dictionary = ["race", "car"]
- Expected Output:
7
- Justification: The string
s
can be rearranged to form "racecar", leaving 'a', 'm', 'a', 'z', 'i', 'n', 'g' as extra.
- Input:
-
Example 2:
- Input:
s = "bookkeeperreading"
,dictionary = ["keep", "read"]
- Expected Output:
9
- Justification: The words "keep" and "read" can be formed from
s
, but 'b', 'o', 'o', 'k', 'e', 'r', 'i', 'n', 'g' are extra.
- Input:
-
Example 3:
- Input:
s = "thedogbarksatnight"
,dictionary = ["dog", "bark", "night"]
- Expected Output:
6
- Justification: The words "dog", "bark", and "night" can be formed, leaving 't', 'h', 'e', 's', 'a', 't' as extra characters.
- Input:
Constraints:
1 <= str.length <= 50
1 <= dictionary.length <= 50
1 <= dictionary[i].length <= 50
dictionary[i]
ands
consists of only lowercase English lettersdictionary
contains distinct words
Solution
To solve the "Extra Characters in a String" problem, we utilize Dynamic Programming (DP) to efficiently determine the minimum number of extra characters that cannot be part of any word in the given dictionary. DP helps in breaking down the problem into smaller subproblems, ensuring that each subproblem is solved only once and its result is reused, thereby optimizing the overall computation.
We explore four distinct approaches to implement this DP solution:
- Solution 1: Top Down Dynamic Programming with Substring Method
- Solution 2: Bottom Up Dynamic Programming with Substring Method
- Solution 3: Top Down Dynamic Programming with Trie
- Solution 4: Bottom Up Dynamic Programming with Trie
Solution 1: Top Down Dynamic Programming with Substring Method
In the Top Down Dynamic Programming with Substring Method, we recursively explore all possible substrings of the input string s
starting from the first character. At each step, we check if the current substring exists in the dictionary. If it does, we recursively solve the remaining part of the string. We use memoization to store the results of subproblems to avoid redundant computations. The goal is to minimize the number of extra characters by maximizing the number of dictionary words used.
This approach systematically breaks down the problem by considering every possible split of the string s
. By leveraging memoization, we ensure that each substring is processed only once, significantly reducing the time complexity. This method is intuitive and straightforward, making it easy to implement and understand.
Step-by-Step Algorithm
-
Initialize Data Structures:
- Create a memoization array
memo
of sizelength
to store the results for each starting index in the string. - Convert the
dictionary
into aHashSet
(wordSet
) for quick lookup of words.
- Create a memoization array
-
Recursive Function
solve(index, length, s)
:- Base Case: If
index
equalslength
(end of the string), return0
because there are no extra characters left. - Check Memoization: If the result for
index
is already computed (memo[index] != null
), return the cached result. - Count Extra Characters:
- Call
solve(index + 1, length, s)
to treat the current character as extra and move to the next character. Add1
to account for this extra character.
- Call
- Check Substrings:
- Iterate over possible substrings starting at
index
:- Extract the substring
s.substring(index, end + 1)
and check if it exists inwordSet
. - If the substring exists, recursively call
solve(end + 1, length, s)
to process the rest of the string. - Update
minExtra
with the smaller value between the currentminExtra
and the result of the recursive call.
- Extract the substring
- Iterate over possible substrings starting at
- Store Result:
- Save the computed
minExtra
value inmemo[index]
and return it.
- Save the computed
- Base Case: If
-
Compute the Result:
- Start by calling
solve(0, length, s)
to process the entire string.
- Start by calling
-
Return Final Output:
- The result from
solve(0, length, s)
represents the minimum number of extra characters.
- The result from
Code
Complexity Analysis
Time Complexity
-
Number of States:
- The recursive function
solve
is called for each unique starting index, resulting in O(N) states whereN
is the length of the strings
.
- The recursive function
-
Substring Iteration:
- For each state, the algorithm iterates through substrings from
index
tolength
, which takes O(N) time in the worst case.
- For each state, the algorithm iterates through substrings from
-
Substring Creation:
- Extracting a substring using
s.substring(start, end + 1)
takes O(N) time.
- Extracting a substring using
-
Overall Time Complexity:
- Combining the above factors, the total time complexity is O(N³).
Space Complexity
-
Memoization Array:
- The
memo
array stores results for up toN
indices, consuming O(N) space.
- The
-
HashSet for Dictionary:
- The
wordSet
stores up toK
words, each with an average length ofM
, resulting in O(M ⋅ K) space.
- The
-
Call Stack:
- The recursion depth can reach up to
N
in the worst case, requiring O(N) space for the call stack.
- The recursion depth can reach up to
-
Overall Space Complexity:
- The total space complexity is O(N + M ⋅ K).
Solution 2: Bottom-Up Dynamic Programming with Substring Method
In this approach, the main idea is to minimize the number of extra characters by leveraging the dictionary words to form substrings from the given string s
. We iterate over the string from the end to the beginning, calculating the minimum extra characters needed for every starting index.
We use a DP array (dp
) where dp[start]
represents the minimum extra characters required for the substring s[start:]
. Initially, for every index, we assume the current character is extra. Then, we check all possible substrings starting at the current index. If a substring matches a word in the dictionary, we update dp[start]
to reflect the better result. Finally, the value at dp[0]
gives the minimum extra characters needed for the entire string.
Compared to the top-down approach, this method avoids recursion and uses iteration, making it straightforward and stack-memory efficient. It iteratively calculates the result starting from the end of the string and moves towards the beginning.
Step-by-Step Algorithm
-
Convert Dictionary to HashSet:
- Create a
HashSet
calledwordSet
from the dictionary for fast lookup of substrings.
- Create a
-
Initialize DP Array:
- Create a
dp
array of sizelength + 1
(wherelength
is the length ofs
). dp[start]
will store the minimum extra characters required for the substrings[start:]
.
- Create a
-
Iterate Backward Over the String:
- Loop from
start = length - 1
to0
:- Initialize
dp[start] = dp[start + 1] + 1
.- This assumes the current character at
start
is extra.
- This assumes the current character at
- Initialize
- Loop from
-
Check Substrings:
- For each
start
, loop over all possible substrings starting atstart
:- This step Tries forming substrings starting from the current index.
- For
end
ranging fromstart
tolength - 1
:- Extract
substring = s.substring(start, end + 1)
. - If
substring
exists inwordSet
:- Update
dp[start] = Math.min(dp[start], dp[end + 1])
.
- Update
- Extract
- For each
-
Return Final Result:
- The result is stored in
dp[0]
, representing the minimum extra characters required for the entire string.
- The result is stored in
Algorithm Walkthrough
Input:
- String:
s = "thedogbarksatnight"
. - Dictionary:
dictionary = ["dog", "bark", "night"]
.
Initialization:
- String Length:
length = 18
. - Dictionary as HashSet:
wordSet = {"dog", "bark", "night"}
. - DP Array:
dp = [0, 0, ..., 0]
(size 19, initialized with 0).
Steps:
Step 1: Start iterating from the end of the string.
-
At
start = 18
(beyond the end of the string):dp[18] = 0
(Base case).
-
At
start = 17
(character = 't'):- Initialize
dp[17] = dp[18] + 1 = 1
. - Substrings starting at
17
:"t"
.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[17] = 1
.
- Initialize
-
At
start = 16
(character = 'h'):- Initialize
dp[16] = dp[17] + 1 = 2
. - Substrings starting at
16
:"h"
,"ht"
.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[16] = 2
.
- Initialize
-
At
start = 15
(character = 'g'):- Initialize
dp[15] = dp[16] + 1 = 3
. - Substrings starting at
15
:"g"
,"gh"
,"ght"
.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[15] = 3
.
- Initialize
-
At
start = 14
(character = 'i'):- Initialize
dp[14] = dp[15] + 1 = 4
. - Substrings starting at
14
:"i"
,"ig"
,"igh"
,"ight"
.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[14] = 4
.
- Initialize
Step 2: Handle substring "night"
at index 13.
-
At
start = 13
(character = 'n'):- Initialize
dp[13] = dp[14] + 1 = 5
. - Substrings starting at
13
:"n"
,"ni"
,"nig"
,"nigh"
,"night"
."night"
matcheswordSet
.- Update
dp[13] = dp[18] = 0
.
- Result:
dp[13] = 0
.
- Initialize
-
At
start = 12
(character = 't'):- Initialize
dp[12] = dp[13] + 1 = 1
. - Substrings starting at
12
:"t"
,"tn"
,"tni"
,"tnig"
,"tnigh"
.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[12] = 1
.
- Initialize
Step 3: Handle substring "bark"
at index 6.
-
At
start = 6
(character = 'b'):- Initialize
dp[6] = dp[7] + 1 = 6 + 1 = 7
. - Substrings starting at
6
:"b"
,"ba"
,"bar"
,"bark"
."bark"
matcheswordSet
.- Update
dp[6] = dp[10] = 3
.
- Result:
dp[6] = 3
.
- Initialize
-
At
start = 5
(character = 'g'):- Initialize
dp[5] = dp[6] + 1 = 4
. - Substrings starting at
5
:"g"
,"gb"
,"gba"
,"gbar"
,"gbark"
.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[5] = 4
.
- Initialize
Step 4: Handle substring "dog"
at index 3.
-
At
start = 3
(character = 'd'):- Initialize
dp[3] = dp[4] + 1 = 5 + 1 = 6
. - Substrings starting at
3
:"d"
,"do"
,"dog"
."dog"
matcheswordSet
.- Update
dp[3] = dp[6] = 3
.
- Result:
dp[3] = 3
.
- Initialize
-
At
start = 2
(character = 'e'):
- Initialize
dp[2] = dp[3] + 1 = 4
. - Substrings starting at
2
:"e"
,"ed"
,"edo"
,"edog"
, etc.- None of the substrings match
wordSet
.
- None of the substrings match
- Result:
dp[2] = 4
.
Step 5: Continue Backward to the Beginning.
- At
start = 0
(character = 't'):- Initialize
dp[0] = dp[1] + 1 = 5 + 1 = 6
. - Substrings starting at
0
:"t"
,"th"
,"the"
,"thed"
,"thedo"
,"thedog"
.- No valid substring.
- Result:
dp[0] = 6
.
- Initialize
Final DP Array:
dp = [6, 5, 4, 3, 5, 4, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]
.
Final Result:
- Minimum Extra Characters:
dp[0] = 6
. - Explanation:
"dog"
,"bark"
, and"night"
are matched from the dictionary.- Extra characters are
't'
,'h'
,'e'
,'s'
,'a'
,'t'
.
Code
Complexity Analysis
Time Complexity
-
Outer Loop:
- Iterates over all starting indices of the string, contributing O(N) operations.
-
Inner Loop:
- For each
start
, the inner loop iterates over all possible substrings starting atstart
, contributing O(N) operations.
- For each
-
Substring Extraction:
- Extracting substrings with
s.substring(start, end + 1)
takes O(N) time in the worst case.
- Extracting substrings with
-
Overall Time Complexity:
- Combining all the factors, the time complexity is O(N³).
Space Complexity
-
DP Array:
- The
dp
array of sizeN + 1
contributes O(N) space.
- The
-
HashSet for Dictionary:
- Storing the
dictionary
in aHashSet
requires O(M · K) space, whereM
is the average length of words in the dictionary andK
is the number of words.
- Storing the
-
Overall Space Complexity: O(N + M · K).
Solution 3: Top Down Dynamic Programming with Trie
This solution uses a top-down dynamic programming approach with a Trie to efficiently determine the number of extra characters in a given substring that cannot form words in the dictionary. The Trie allows us to quickly validate substrings against dictionary words, while dynamic programming ensures that overlapping subproblems are solved only once.
The approach works as follows:
- First, a Trie is built from the given dictionary words, where each node in the Trie represents a character in a dictionary word, and the end of a word is marked with a flag.
- Then, the string
s
is processed recursively from left to right using a dp function. At each index, the function either counts the current character as extra and moves to the next index, or it tries to match substrings starting at the current index with words in the Trie. If a match is found, the remaining portion of the string is processed recursively. - Memoization is used to store the results of subproblems for each starting index, ensuring that each state is computed only once.
This approach is optimal because it combines the power of Tries (for fast substring matching) and dynamic programming (to reuse previous computations). By avoiding the repeated recomputation of substring matches, the solution achieves a significant speedup compared to naive methods.
Step-by-Step Algorithm
-
Build a Trie from the Dictionary:
- Create a root node for the Trie.
- For each word in the dictionary:
- Traverse through its characters.
- Add each character as a child node of the current node if it doesn't already exist.
- Mark the last node of the word as the end of a word.
-
Initialize the DP Array:
- Create a
memo
array of sizen + 1
, wheren
is the length of the strings
. - Initialize all values in
memo
tonull
to indicate uncomputed states.
- Create a
-
Recursive DP Function:
- Define a function
dp(start)
that computes the number of extra characters in the substring starting at indexstart
. - Base Case:
- If
start == n
, return0
, as no characters are left to process.
- If
- Memoization:
- If
memo[start]
is notnull
, return its value to avoid redundant computation.
- If
- Count Extra Characters:
- Assume the character at
start
is extra and move to the next index, initializingminExtra
asdp(start + 1) + 1
.
- Assume the character at
- Trie Matching:
- Starting from the root of the Trie, traverse the Trie while iterating through characters of the substring
s[start:]
. - If a Trie node doesn't exist for a character, break out of the loop, as there is not valid substring starting from the current index.
- If a Trie node marks the end of a word, update
minExtra
with a minimum ofminExtra
and result returned after recursively processing the remaining substring.
- Starting from the root of the Trie, traverse the Trie while iterating through characters of the substring
- Define a function
-
Return the Final Result:
- Call
dp(0)
to compute the result for the entire string.
- Call
Algorithm Walkthrough
Input:
- String:
s = "thedogbarksatnight"
. - Dictionary:
dictionary = ["dog", "bark", "night"]
.
Step 1: Trie Construction
-
Insert
"dog"
into the Trie:- Add nodes for
'd'
,'o'
,'g'
. - Mark
'g'
as the end of a word.
- Add nodes for
-
Insert
"bark"
into the Trie:- Add nodes for
'b'
,'a'
,'r'
,'k'
. - Mark
'k'
as the end of a word.
- Add nodes for
-
Insert
"night"
into the Trie:- Add nodes for
'n'
,'i'
,'g'
,'h'
,'t'
. - Mark
't'
as the end of a word.
- Add nodes for
Step 2: Start DP Recursion from Index 0
- The recursion starts with
start = 0
.
-
Call
dp(0)
:- Base Case:
- The function calls itself recursively with
start = 1
:- This pattern continues until
start = 18
.
- This pattern continues until
- The function calls itself recursively with
- Base Case:
-
At
start = 18
:- This is the base case:
start == length
. No characters remain, so return0
.
- This is the base case:
Step 3: Process Substrings from the End
After reaching the base case, the recursion starts processing substrings backward (from start = 17
to start = 0
).
-
Processing
start = 17
:- The character is
't'
. - Initialize
minExtra = dp(18) + 1 = 1
. Assume't'
is extra. - No valid substrings start from
't'
, as it does not match any word in the Trie. - Store and Return:
memo[17] = 1
.
- The character is
-
Processing
start = 16
:- The character is
'h'
. - Initialize
minExtra = dp(17) + 1 = 2
. Assume'h'
is extra. - No valid substrings start from
'h'
, as it does not match any word in the Trie. - Store and Return:
memo[16] = 2
.
- The character is
-
Processing
start = 15
:- The character is
'g'
. - Initialize
minExtra = dp(16) + 1 = 3
. Assume'g'
is extra. - No valid substrings start from
'g'
, as it does not match any word in the Trie. - Store and Return:
memo[15] = 3
.
- The character is
Step 4: Process Substring "night"
at start = 13
memo = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 3, 2, 1, 0]
- Processing
start = 13
:- The character is
'n'
. - Initialize
minExtra = dp(14) + 1 = 4
. Assume'n'
is extra. - Trie Traversal:
- Substrings:
"n"
,"ni"
,"nig"
,"nigh"
,"night"
."night"
matches in the Trie.- Update
minExtra = dp(18) = 0
.
- Substrings:
- Store and Return:
memo[13] = 0
.
- The character is
Step 5: Process Substring "bark"
at start = 6
memo = [ 0, 0, 0, 0, 0, 0, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]
- Processing
start = 6
:- The character is
'b'
. - Initialize
minExtra = dp(7) + 1 = 6 + 1 = 7
. Assume'b'
is extra. - Trie Traversal:
- Substrings:
"b"
,"ba"
,"bar"
,"bark"
."bark"
matches in the Trie.- Update
minExtra = dp(10) = 3
.
- Substrings:
- Store and Return:
memo[6] = 3
.
- The character is
Step 6: Process Substring "dog"
at start = 3
memo = [ 0, 0, 0, 0, 5, 4, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]
- Processing
start = 3
:- The character is
'd'
. - Initialize
minExtra = dp(4) + 1 = 5 + 1 = 6
. Assume'd'
is extra. - Trie Traversal:
- Substrings:
"d"
,"do"
,"dog"
."dog"
matches in the Trie.- Update
minExtra = dp(6) = 3
.
- Substrings:
- Store and Return:
memo[3] = 3
.
- The character is
Step 7: Process Substring from start = 0
- Processing
start = 0
:- The character is
't'
. - Initialize
minExtra = dp(1) + 1 = 5 + 1 = 6
. Assume't'
is extra. - Trie Traversal:
- Substrings:
"t"
,"th"
,"the"
,"thed"
,"thedo"
,"thedog"
.- None of substrings are valid.
- Substrings:
- Return:
memo[0] = 6
.
- The character is
Final Result
- Memoization Array:
memo = [ 6, 5, 4, 3, 5, 4, 3, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]
- Output:
6
, indicating six extra characters ('t'
,'h'
,'e'
,'s'
,'a'
,'t'
).
Code
Complexity Analysis
Time Complexity
- DP Method:
- The
dp
method is called forN + 1
unique states, whereN
is the length of the string. - Each state involves iterating over the substring starting at the current index, which takes O(N) time in the worst case.
- Thus, the cost of the
dp
method is O(N^2).
- The
- Building the Trie:
- Constructing the Trie requires inserting
K
words, where each word has an average length ofM
. - The cost of building the Trie is O(M · K).
- Constructing the Trie requires inserting
Overall Time Complexity: O(N^2 + M · K).
Space Complexity
- Trie:
- The Trie requires O(M · K) space to store the dictionary words.
- Memoization:
- The
memo
array has a size ofN + 1
, consuming O(N) space.
- The
- Recursive Stack:
- The recursion depth is at most
N
, adding O(N) stack space.
- The recursion depth is at most
Overall Space Complexity: O(N + M · K).
Solution 4: Bottom Up Dynamic Programming with Trie
This solution uses bottom-up dynamic programming with a Trie to efficiently calculate the number of extra characters in a given string that cannot form words from a provided dictionary. The Trie enables fast lookups for dictionary words while avoiding the need to repeatedly recompute substring matches. The dynamic programming (DP) array ensures that the solution is computed incrementally from the end of the string toward the beginning.
Here’s the approach:
- Trie Construction: A Trie is built from the dictionary, where each character of a word is represented as a node. Words that exist in the dictionary are marked with an
isEnd
flag at the last node. - DP Initialization: A DP array
dp
is created to store the minimum extra characters for substrings starting from different indices. - Iterative Processing: Starting from the last character of the string, each character is assumed to be extra initially. Then, substrings starting from the current index are checked against the Trie. If a substring matches a word in the Trie, the DP value is updated to reflect fewer extra characters.
- Result: The final result is stored in
dp[0]
, representing the minimum extra characters required for the entire string.
This approach is optimal because it combines efficient substring matching using the Trie and incremental computation with the DP array.
Step-by-Step Algorithm
-
Trie Construction:
- Initialize the root of the Trie.
- For each word in the dictionary:
- Start at the root of the Trie.
- For each character in the word:
- If the character doesn’t exist as a child of the current node, create a new node.
- Move to the child node representing the character.
- Mark the last node as the end of the word.
-
Initialize DP Array:
- Create a DP array
dp
of sizen + 1
, wheren
is the length of the strings
. - Set
dp[n] = 0
since no extra characters are required after the end of the string.
- Create a DP array
-
To calculate the minimum extra characters for each substring starting at every index, we process the string iteratively from the last character to the first:
-
a. Initialize Extra Characters:
- Assume that the character at the current index
start
is extra. - Set
dp[start] = dp[start + 1] + 1
. This means that the cost for the substring starting atstart
is at least one more than the cost for the substring starting atstart + 1
.
- Assume that the character at the current index
-
b. Traverse the Trie:
- Begin from the root of the Trie to validate substrings starting from
start
. - Iterate through each possible
end
index starting fromstart
ton - 1
:- Check Trie Validity:
- If the character
s[end]
is not a child of the current Trie node, break the loop, as no further substring starting atstart
can match a dictionary word.
- If the character
- Move to the Next Trie Node:
- If the character
s[end]
exists in the Trie, move to the child node fors[end]
.
- If the character
- Check for a Complete Word:
- If the current Trie node marks the end of a word, this substring matches a dictionary word.
- Update
dp[start] = min(dp[start], dp[end + 1])
, as including this valid word reduces the number of extra characters needed for the remaining substring.
- Check Trie Validity:
- Begin from the root of the Trie to validate substrings starting from
-
c. Continue to the Next Start Index:
- Repeat the process for the next
start
index until the first character is processed. The final result will be stored indp[0]
.
- Repeat the process for the next
- Return the Result:
- Return
dp[0]
as the result, which represents the minimum extra characters for the entire string.
- Return
Code
Complexity Analysis
Time Complexity
-
Dynamic Programming (DP) Operations:
- The
dp
array is computed for each indexstart
in the strings
. For eachstart
, we iterate over potentialend
values (start
ton - 1
) to form substrings. - This results in a total of O(N²) iterations since the outer loop runs
N
times, and the inner loop runs up toN
times in the worst case.
- The
-
Building the Trie:
- For each word in the dictionary (total of
K
words), we insert its characters into the Trie. Each word has an average length ofM
. - Thus, building the Trie takes O(M ⋅ K).
- For each word in the dictionary (total of
Overall Time Complexity: O(N² + M ⋅ K).
Space Complexity
-
Trie Storage:
- The Trie stores
K
words, each with an average length ofM
. Each character is represented by a node in the Trie. - Hence, the Trie requires O(M ⋅ K) space.
- The Trie stores
-
DP Array:
- The
dp
array has a size ofN + 1
to store results for each index of the string. - This contributes O(N) space.
- The
Overall Space Complexity: O(N + M ⋅ K).
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible