algorithm - Efficient Approach to Decompressed String Substring Problem

The problem involves decompressing a compressed string and outputting the substring in the range of L and R. For example:

A(BB(CC)) -> ABBCCCCBBCCCC

L = 52, R = 60, ((((((((IMTOOSTRONG)))))))) -> IMTOOSTRONGIMTOOSTRONGIMTOOSTRONG...

MTOOSTRON

Here, the characters inside the parentheses are repeated, and the goal is to decompress the string and then output the substring in the given range [L, R].

The challenge is that the range [L, R] can go up to 1 billion, so decompressing the entire string would result in TLE (Time Limit Exceeded). I can’t figure out how to approach this problem efficiently.

How can I efficiently extract a substring from a nested compressed string without fully decompressing it, given that the range [L, R] can be as large as 1 billion?

I attempted to think about strategies like: Using recursion or a stack-based approach to simulate decompression. Counting the length of the decompressed string without fully expanding it, to decide which parts of the string are relevant for the [L, R] range.

The problem involves decompressing a compressed string and outputting the substring in the range of L and R. For example:

A(BB(CC)) -> ABBCCCCBBCCCC

L = 52, R = 60, ((((((((IMTOOSTRONG)))))))) -> IMTOOSTRONGIMTOOSTRONGIMTOOSTRONG...

MTOOSTRON

Here, the characters inside the parentheses are repeated, and the goal is to decompress the string and then output the substring in the given range [L, R].

How can I efficiently extract a substring from a nested compressed string without fully decompressing it, given that the range [L, R] can be as large as 1 billion?

Share Improve this question edited Nov 23, 2024 at 0:08 asked Nov 20, 2024 at 14:43 장승범 213 bronze badges

1 (I think one of your approaches promising.) – greybeard Commented Nov 20, 2024 at 14:53
1 Put efficient to the back of your mind. What is the simplest thing that could possibly work acceptably? How can you check it works as specified? Check automatically, at that? What are its resource requirements? R being large is no problem when L is small: the output would be large. What about large L? – greybeard Commented Nov 20, 2024 at 15:05
2 Please provide enough code so others can better understand or reproduce the problem. – Community Bot Commented Nov 20, 2024 at 18:26
1 I don't get "the strong example" - 1) does the string start at 1 or at 0? 2) R seems to be interpreted to be inclusive - are the numbers represented in regular decimal? – greybeard Commented Nov 23, 2024 at 4:30
1 For the "strong example", I would expect the result to be "ONGIMTOOS", not "MTOOSTRON". The repeated string "IMTOOSTRONG" has 11 characters, so it repeats at indices 11, 22, 33, 44, 55, ... At index 52, we are three characters before the next repetition, so the resulting substring should start with "ONG". I don't understand why you say it should be "MTOOSTRON". Please clarify. – trincot Commented Nov 23, 2024 at 8:07

| Show 8 more comments

1 Answer 1

Sorted by: Reset to default 1

I would suggest building a tree where the root represents the whole string, and every child represents either a substring represented by parentheses, or a literal string (without parentheses). The leaves of this tree are the literal strings.

Each node would also store the size of the (sub)string it represents (when decompressed).

To represent the repetition factor, you could add another node property that is either 1 or 2. Or, alternatively, you could define the same child node (reference) as a duplicated child of its parent: this creates two edges between the same node pair, so giving that relation a cardinality of 2. Note that this latter approach would turn the tree into a polytree.

Finally define a substring method on nodes which can then use recursion to retrieve the string representations that are relevant for the given range.

Here is an implementation in a JavaScript snippet:

class Node {
    constructor() {
        this.length = 0; // Length of represented string (in uncompressed form)
        this.children = []; // List of strings and child nodes
    }
    addChild(child) { // child could be a Node or a string
        this.length += child.length;
        this.children.push(child);
    }
    substring(start, stop) { // `stop` is excluded from the range
        if (start >= this.length || stop <= 0 || start >= stop) return "";
        let size = stop - start;
        return this.children.map(child => {
            // If child is a string, the native substring method is called here,
            //    otherwise it is a recursive call:
            const s = child.substring(start, start + size);
            start -= child.length;
            return s;
        }).join("");
    }
}

function getTokenIterator(compressed) {
    // Return an iterator over the substrings that either 
    //    are "(", ")" or a series of other characters
    return compressed.match(/[^()]+|./gs).values();
}

function createGraph(tokens) {
    const node = new Node;
    while (true) {
        const token = tokens.next().value;
        if (!token || token === ")") return node;
        if (token === "(") {
            const child = createGraph(tokens);
            // Add child twice to reflect repetition
            node.addChild(child);
            node.addChild(child);
        } else {
            node.addChild(token);
        }
    }
}

function getSlice(compressed, left, right) {
    const tokens = getTokenIterator(compressed);
    const root = createGraph(tokens);
    // As `right` is inclusive in the range, add 1:
    return root.substring(left, right+1);
}

// Example runs
console.log(getSlice("AA(BB(CC))", 3, 8)); // BCCCCB
console.log(getSlice("((((((((IMTOOSTRONG))))))))", 52, 60));

Complexity analysis

Let

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

algorithm - Efficient Approach to Decompressed String Substring Problem - Stack Overflow

1 Answer 1

Complexity analysis

与本文相关的文章

评论列表(0)