Solution: Index Pairs of a String

Grokking Data Structures & Algorithms for Coding Interviews

Ask Author

Back to course home

0% completed

Vote For New Content

Solution: Index Pairs of a String

Table of Contents

Contents are not accessible

Problem Statement

Given a string text and a list of strings words, identify all [i, j] index pairs such that the substring text[i...j] is in words.

These index pairs should be returned in ascending order, first by the start index, then by the end index. Find every occurrence of each word within the text, ensuring that overlapping occurrences are also identified.

Examples

- Input: text = "bluebirdskyscraper", words = ["blue", "bird", "sky"]
- Expected Output: [[0, 3], [4, 7], [8, 10]]
- Justification: The word "blue" is found from index 0 to 3, "bird" from 4 to 7, and "sky" from 8 to 10 in the string.
- Input: text = "programmingisfun", words = ["pro", "is", "fun", "gram"]
- Expected Output: [[0, 2], [3, 6], [11, 12], [13, 15]]
- Justification: "pro" is found from 0 to 2, "gram" from 3 to 6, "is" from 11 to 12, and "fun" from 13 to 15.
- Input: text = "interstellar", words = ["stellar", "star", "inter"]
- Expected Output: [[0, 4], [5, 11]]
- Justification: "inter" is found from 0 to 4, and "stellar" from 5 to 11. "star" is not found.

Constraints:

1 <= text.length <= 100
1 <= words.length <= 20
1 <= words[i].length <= 50
text and words[i] consist of lowercase English letters.
All the strings of words are unique.

Solution

To solve this problem, we will use a trie data structure, which is particularly efficient for managing a set of strings and performing quick searches for patterns within a text. A trie, also known as a prefix tree, allows us to efficiently store and retrieve strings with a common prefix, making it ideal for our purpose of identifying substrings within a given text.

The approach involves two main phases: building the trie with the given list of words and then using it to find the index pairs in the text. The trie's structure enables us to search for each word in the text in an optimized manner, significantly reducing the number of comparisons needed compared to brute force methods.

Step-by-Step Algorithm

Initialize the Trie:
- Create a trie and insert all the words from the given list into it. This is done in the first part of the code.
Search and Record Index Pairs:
- Iterate over each character in the text. This character will act as the starting point for potential matches.
- For each starting character, initiate a pointer p at the root of the trie.
- Then, iterate over the text starting from the current starting point until the end of the text. In each iteration:
  - Check if the current character exists as a child of the current trie node pointed by p.
  - If it does not exist, break out of the inner loop as no further matching is possible from this starting point.
  - If it exists, move the pointer p to this child node.
  - Check if the current node p is a leaf node (indicated by p.isEnd being true). If it is, it means a complete word from the list has been found.
  - Record the start and end indices of this word. The start index is the position of the starting character, and the end index is the current position in the text.
- Continue this process for each character in the text to ensure all occurrences, including overlapping ones, are found.

Algorithm Walkthrough

Using the input text = "programmingisfun", words = ["pro", "is", "fun", "gram"]:

Initialize the Trie with Words:
- "pro", "is", "fun", and "gram" are inserted into the trie.
Search and Record Index Pairs:
- Start at index 0 ('p' in "programmingisfun"):
  - Pointer p is at the root. Finds 'p', moves to 'r', then 'o'. p.isEnd is true at 'o', so record [0, 2].
- At index 1 ('r'), no match is found.
- similarly, At index 2 ('o'), no match is found.
- At index 3 ('g'):
  - Finds 'g', 'r', 'a', 'm'. p.isEnd is true at 'm', so record [3, 6].
- At index 11 ('i'):
  - Finds 'i', 's'. p.isEnd is true at 's', so record [11, 12].
- At index 13 ('s'):
  - Finds 'f', 'u', 'n'. p.isEnd is true at 'n', so record [13, 15].
Compile Results:
- The results [[0, 2], [3, 6], [11, 12], [13, 15]] are compiled into the 2D array ans and returned.

By following these steps, the algorithm efficiently locates all occurrences of the words in the text, including overlapping ones, using the trie structure for optimized searching.

Code

Python3

. . . .

Time and Space Complexity Analysis

Time Complexity:

Building the Trie: $O(N * L)$ , where N is the number of words and L is the average length of these words.
Finding Index Pairs: $O(T^2)$ , where T is the length of the text string.
Overall: $O(N * L + T^2)$

Space Complexity:

Trie Storage: $O(N * L)$ , for storing N words each of average length L.

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible