javascript - How to cancel a wasm process from within a webworker - Stack Overflow

I have a wasm process (compiled from C++) that processes data inside a web application. Let's say the necessary code looks like this:

std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
    process_data(data[i]);

    if (i % 1000 == 0) {
        bool is_cancelled = check_if_cancelled();
        if (is_cancelled) {
            break;
        }
    }

}

This code basically "runs/processes a query", similar to a SQL query interface.

However, queries may take several minutes to run/process, and at any given time the user may cancel their query. The cancellation would occur in the normal JavaScript/web application, outside of the web worker running the wasm.

My question then is: what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process, so that it knows the query has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker (it needs to stay alive with its stored data, so another query can be run...).

What would be an example way to communicate here between the JavaScript and the worker/wasm/C++ application so that we can know when to exit, and how to do it properly?

Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.


Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:

  1. Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~ 500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...

  2. Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.

  3. Always run a second worker and in the UI just display the most recent. Feasible? No, would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).

  4. Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.

  5. Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require a tremendous amount of rewriting of the parsing functions so that every function supports this (the actual data parsing is handled in several functions -- filter, search, calculate, group by, sort, etc.).

  6. Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.

  7. [Anything else?]

In the bounty then I was wondering three things:

  1. If the above six analyses seem generally valid?
  2. Are there other (perhaps better) approaches I'm missing?
  3. Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.

asked Aug 5, 2019 at 20:02 by David542 (edited Aug 13, 2019 at 21:51)
  • "Using the worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker" - could you create two workers? One which holds this data that needs to 'keep alive' and the other which performs the query processing, which you can optionally terminate? – ColinE Commented Aug 7, 2019 at 8:05
  • @ColinE -- yes of course! Or a shared worker. However the data is quite large (500MB), is there any overhead in transferring the data from one worker to another? – David542 Commented Aug 7, 2019 at 19:16
  • @ColinE by the way, I'm going to award a 300-500 point bounty for this question, so if you have a good answer that will solve the above (with zero performance overhead), please start posting! – David542 Commented Aug 7, 2019 at 19:17
  • are you using a javascript worker threads when calling your c++ code? – Tomer Commented Aug 7, 2019 at 22:00
  • @Tomer -- yes we are. – David542 Commented Aug 7, 2019 at 23:13

2 Answers

For Chrome (only), you may use shared memory (a shared buffer as the wasm memory) and raise a flag in memory when you want to halt. I'm not a big fan of this solution (it is complex and is supported only in Chrome). It also depends on how your query works, and on whether there are places where the lengthy query can check the flag.
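
As a rough illustration of the flag idea, here is a minimal sketch. It assumes SharedArrayBuffer and Atomics are available in the target browser (in current browsers this generally requires a cross-origin isolated page). All helper names (createCancelFlag, requestCancel, resetCancel, isCancelled, runQuery) are hypothetical, and runQuery is only a JavaScript stand-in for the C++ loop; in a real build the flag would have to live in memory that the wasm module can read.

```javascript
// Index of the cancel flag inside the shared Int32Array:
// 0 means "keep running", 1 means "cancellation requested".
const CANCEL_INDEX = 0;

// Hypothetical helper: allocate a one-element shared flag.
function createCancelFlag() {
  const buffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  return new Int32Array(buffer);
}

// Called on the main thread when the user clicks "cancel".
function requestCancel(flag) {
  Atomics.store(flag, CANCEL_INDEX, 1);
}

// Called before starting the next query.
function resetCancel(flag) {
  Atomics.store(flag, CANCEL_INDEX, 0);
}

// Called from the processing side (the role of check_if_cancelled()).
function isCancelled(flag) {
  return Atomics.load(flag, CANCEL_INDEX) === 1;
}

// Stand-in for the long-running loop: checks the flag every
// `checkEvery` rows and returns how many rows were processed.
function runQuery(rows, flag, processRow, checkEvery = 1000) {
  let processed = 0;
  for (let i = 0; i < rows.length; i++) {
    if (i % checkEvery === 0 && isCancelled(flag)) {
      break;
    }
    processRow(rows[i]);
    processed++;
  }
  return processed;
}
```

In this sketch the main thread would create the flag once, send flag.buffer to the worker with postMessage (a SharedArrayBuffer is shared rather than copied), and call requestCancel(flag) from the cancel button handler. Because the flag is read directly from memory, the worker does not need to return to its event loop to notice the cancellation.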

Instead, you should probably call the C++ function multiple times (e.g. for each query) and check whether you should halt after each call (just send a message to the worker to halt).

What I mean by multiple times is to run the query in stages (multiple function calls for a single query). It may not be applicable in your case.

Regardless, AFAIK there is no way to send a signal to a WebAssembly execution (as with kill on Linux). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.

I'm attaching a code snippet that may explain this idea.

worker.js:

// ... init webassembly

onmessage = function (event) {
	// Query received from the main thread; the payload is in event.data.
	const q = event.data;
	const result = ... call webassembly(q);
	postMessage(result);
};

main.js:

const worker = new Worker("worker.js");
// Both flags are reassigned below, so they must be declared with let.
let cancel = false;
let processing = false;

worker.onmessage = function (r) {
	// Fires when the worker has finished processing the query.
	// r.data holds the results of the processing.
	processing = false;

	if (cancel === true) {
		// processing is done, but the result is not required.
		// instead of showing the results, update that the query was canceled.
		cancel = false;
		// ... update UI "canceled".
		return;
	}

	// ... update UI "results r.data".
};

function onCancel() {
	// Occurs when user clicks on the cancel button.
	if (cancel) {
		// sanity test - prevent this in UI.
		throw "already cancelling";
	}
	
	cancel = true;
	
	// ... update UI "canceling".
}

function onQuery(q) {
	if (processing === true) {
		// sanity test - prevent this in UI.
		throw "already processing";
	}
	
	processing = true;
	// Send the query to the worker.
	// When the worker receives the message it will process the query via webassembly.
	worker.postMessage(q);
}

An idea from a user-experience perspective: you may create two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once. (It just means that, behind the scenes, the second worker runs the next query, and once the first worker finishes its cancelled query, cancellation again becomes immediate.)
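
The two-worker idea could be sketched roughly as follows. This is only an illustration under assumptions: createWorkerPair and its run/cancel methods are hypothetical names, each worker is assumed to hold its own copy of the data, and each worker is assumed to post exactly one result message per query.

```javascript
// Hypothetical helper that alternates between two workers.
// Each worker is in one of three states:
//   "idle"     - free to accept a query
//   "busy"     - running a query whose result is wanted
//   "draining" - running a cancelled query whose result is discarded
function createWorkerPair(workers, onResult) {
  const state = workers.map(() => "idle");

  workers.forEach((worker, index) => {
    worker.onmessage = (event) => {
      const wasDraining = state[index] === "draining";
      state[index] = "idle";
      if (!wasDraining) {
        onResult(event.data);
      }
    };
  });

  return {
    // Sends the query to a free worker; returns false if both are blocked.
    run(query) {
      const index = state.indexOf("idle");
      if (index === -1) {
        return false;
      }
      state[index] = "busy";
      workers[index].postMessage(query);
      return true;
    },
    // Marks the running query as cancelled. The worker keeps running
    // until its wasm call returns, but its result is thrown away.
    cancel() {
      const index = state.indexOf("busy");
      if (index !== -1) {
        state[index] = "draining";
      }
    },
  };
}
```

With real workers this would be called as createWorkerPair([new Worker("worker.js"), new Worker("worker.js")], showResults), where showResults is whatever updates the UI. A second cancel while the first worker is still draining leaves no idle worker, which matches the "immediately, once" caveat above.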

Shared Thread

Since the worker and the C++ function it calls share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think a solid option would be to minimize the amount of time the thread is blocked by instead initiating one iteration at a time from the main application.

It would look something like this.

main.js  ->  worker.js  ->  C++ function  ->  worker.js  ->  main.js

Breaking up the Loop

Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory. C++ function then performs one iteration of the loop, increments the variable to keep track of loop position, and immediately breaks.

size_t x = 0; // counter initialized at 0, kept in memory between calls

std::vector<JSONObject> data;
for (size_t i = x; i < data.size(); i++)
{
    process_data(data[i]);

    x++;   // increment counter
    break; // stop function until told to iterate again starting at x
}

When the C++ function returns, the web worker can post a message to main.js to signal that its thread is no longer blocked.

Canceling the Operation

From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)

let continueOperation = true
// here you can set to false at any time since the thread is not blocked here

worker.expensiveThreadBlockingFunction()
// results in one iteration of the loop being iterated until message is received below

worker.onmessage = function(e) {
    if (continueOperation) {
        worker.expensiveThreadBlockingFunction()
        // execute worker function again, ultimately continuing the increment in C++
    } else {
        return false
        // or send message to worker to reset C++ counter to prepare for next execution
    }
}

Continuing the Operation

Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed or needs to continue, so you don't keep blocking the worker thread.
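
The round trip described in this answer could be sketched as below. This is a minimal illustration, not a definitive implementation: createChunkedQuery, createDriver and the message types "step", "reset", "continue" and "done" are all hypothetical, processRow stands in for the wasm call, and a chunk of several rows is used per step on the assumption that one message round trip per row would add noticeable overhead.

```javascript
// Worker side (hypothetical): keeps the loop position between calls
// and processes one chunk of rows per step.
function createChunkedQuery(rows, processRow, chunkSize) {
  let position = 0;
  return {
    step() {
      const end = Math.min(position + chunkSize, rows.length);
      for (; position < end; position++) {
        processRow(rows[position]);
      }
      // Distinct message types tell the main thread whether to continue.
      return position >= rows.length
        ? { type: "done", position }
        : { type: "continue", position };
    },
    reset() {
      position = 0;
    },
  };
}

// Main-thread side (hypothetical): decides after every chunk whether
// to ask the worker for another step.
function createDriver(postToWorker) {
  let cancelled = false;
  return {
    start() {
      cancelled = false;
      postToWorker({ type: "step" });
    },
    cancel() {
      cancelled = true;
    },
    // Call this from worker.onmessage with event.data.
    onWorkerMessage(message) {
      if (cancelled) {
        postToWorker({ type: "reset" }); // prepare for the next query
        return "cancelled";
      }
      if (message.type === "done") {
        return "done";
      }
      postToWorker({ type: "step" });
      return "running";
    },
  };
}
```

In a real page, postToWorker would be (m) => worker.postMessage(m), and the worker's onmessage handler would call step() or reset() depending on the message type and post the return value of step() back. The chunk size is a trade-off: larger chunks mean fewer messages but a longer wait before a cancel takes effect.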
