Solution: Remove Duplicate Letters

Grokking the Coding Interview: Patterns for Coding Questions

Ask Author

Back to course home

0% completed

Vote For New Content

Solution: Remove Duplicate Letters

Table of Contents

Contents are not accessible

Problem Statement

Given a string s, remove all duplicate letters from the input string while maintaining the original order of the letters.

Additionally, the returned string should be the smallest in lexicographical order among all possible results.

A string is in the smallest lexicographical order if it appears first in a dictionary. For example, "abc" is smaller than "acb" because "abc" comes first alphabetically.

Examples:

- Input: "babac"
- Expected Output: "abc"
- Justification:
  - After removing 1 b and 1 a from the input string, we can get bac, and abc strings. The final answer is 'abc', which is the smallest lexicographical string without duplicate letters.
- Input: "zabccde"
- Expected Output: "zabcde"
- Justification: Removing one of the 'c's forms 'zabcde', the smallest string in lexicographical order without duplicates.
- Input: "mnopmn"
- Expected Output: "mnop"
- Justification: Removing the second 'm' and 'n' gives 'mnop', which is the smallest possible string without duplicate characters.

Constraints:

1 <= s.length <= 10<sup>4</sup>
s consists of lowercase English letters.

Solution

To solve the given problem, begin by iterating through the string, and calculate the frequency count for each character. We will use the stack to maintain the order of characters and the set to track the uniqueness of characters.

Next, start traversing the string, and for every character encountered, check if it's already in the 'present' set. If it's not, compare it with the top element of the 'result' stack. If the stack is not empty, and the current character is lexicographically smaller than the stack's top, and the top character of the stack appears again in the string (indicated by a non-zero frequency count), repeatedly pop the stack and remove those elements from the 'present' set. Then, add the current character to the stack and the 'present' set.

This process, facilitated by the stack and set, ensures that the stack is built with unique characters in the smallest lexicographical order. After processing the entire string, pop elements from the stack to construct the final string, thereby obtaining the smallest lexicographical sequence without duplicate characters.

Frequency Count (count):
- Initialize a count dictionary (or hash map in some languages) to store the frequency of each character in the string s.
Character Presence Tracking (present):
- Use a set present to keep track of the characters that have been added to the resultant string. This set prevents duplicate characters in the result.
Building the Result (result):
- Create a stack result to construct the final string.
- For each character c in the string s:
  - If c is not in present, proceed to compare it with the top character of result.
  - While result is not empty, and c is lexicographically smaller than the top character of result, and the frequency of the top character of result is more than 0 (indicating it appears again in the string), pop the top character from result and remove it from present.
  - Push c onto result and add it to present.
  - Decrement the frequency of c in count.
Result Construction:
- The stack result now contains the characters of the final string in reverse order. Pop elements from result to construct the output string in the correct order, from left to right.

This approach works because it ensures that the smallest character is placed first, respecting the original order and removing duplicates. The frequency count ensures that characters are not wrongly discarded.

Algorithm Walkthrough:

Input: "zabccde"

Initialization:
- Calculate the frequency of each character: {'z': 1, 'a': 1, 'b': 1, 'c': 2, 'd': 1, 'e': 1}.
- Initialize an empty stack result and a set present to keep track of characters already in result.
Iteration Over Characters:
- 'z': Not in present. Add 'z' to result, add to present. Decrease frequency of 'z'.
- 'a': Not in present. As 'a' is smaller than 'z', but 'z' won't appear again (frequency is now 0), we keep 'z'. Add 'a' to result, add to present. Decrease frequency of 'a'.
- 'b': Not in present. Add 'b' to result, add to present. Decrease frequency of 'b'.
- First 'c': Not in present. Add 'c' to result, add to present. Decrease frequency of 'c'.
- Second 'c': Already in present. Skip it.
- 'd': Not in present. Add 'd' to result, add to present. Decrease frequency of 'd'.
- 'e': Not in present. Add 'e' to result, add to present. Decrease frequency of 'e'.
Result Construction:
- The result stack now contains ['z', 'a', 'b', 'c', 'd', 'e'].
- Convert the stack to a string to get the final result: "zabcdef".

Expected Output: "zabcdef"

Code

Here is the code for this algorithm:

Python3

. . . .

Complexity Analysis:

Time Complexity: O(N) where N is the length of the string. Each character is visited once.
Space Complexity: O(1) as the extra space used does not depend on the input size. The character count and the stack size are bounded by the character set size (constant).

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible