How can I extract a variable from a script tag in a returned HTML page in JavaScript/TypeScript?
My API request to the server:
const response = await fetch( ... )
The response contains a large HTML page; here is just an example:
<h1>Willkommen auf der Seite für Steam App Daten</h1>
<script type="text/javascript">
var g_rgAppContextData = {
"730": {
"appid": 730,
"name": "Counter-Strike 2",
"icon": "https://cdn.fastly.steamstatic/steamcommunity/public/images/apps/730/8dbc71957312bbd3baea65848b545be9eae2a355.jpg",
"link": "https://steamcommunity/app/730"
}
};
var g_rgCurrency = [];
</script>
I only want to extract the variable g_rgAppContextData and nothing else. I know that I can select the script tag with getElementsByTagName("script"), but what if there are two script tags? And how do I select only the variable?
Asked Jan 18 at 12:26 by UsAA12; edited Jan 19 at 9:37 by Zach Jensz.
1 Answer
Since the pages you want to scrape follow a certain pattern, it seems possible to make a number of simplifying assumptions about the structure of the returned HTML:
- The desired variable is assigned a constant value in JSON format (in particular, member names like "730" are quoted).
- The HTML page contains only one assignment for this variable.
- A semicolon follows immediately after the closing }.
- The member names and string values do not contain the sequence };.
Let me know if these assumptions are not justified in your case.
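To see why the first and last assumptions matter, here is a minimal sketch of what goes wrong when they do not hold. The helper captureLiteral and the variable name data are hypothetical, chosen only for illustration; the pattern lazily captures everything up to the first };.

```typescript
// Hypothetical helper for illustration: lazily capture everything from the
// "{" after "<name> =" up to the first "};".
function captureLiteral(source: string, name: string): string | null {
  // The "s" flag lets "." match newlines as well.
  const match = source.match(new RegExp(name + "\\s*=\\s*(\\{.*?\\});", "s"));
  return match ? match[1] : null;
}

// First assumption violated: unquoted member names are valid JavaScript
// but not valid JSON, so JSON.parse throws.
const unquoted = captureLiteral("var data = { appid: 730 };", "data");
let unquotedParses = true;
try {
  JSON.parse(unquoted === null ? "" : unquoted);
} catch {
  unquotedParses = false;
}

// Last assumption violated: a string value containing "};" ends the
// lazy match too early, so the captured text is cut off.
const truncated = captureLiteral('var data = { "name": "a};b" };', "data");
```

In both cases the capture succeeds but the result is unusable as JSON, which is why the assumptions should be checked against the real pages.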
Under these assumptions, you can extract the variable value with a regular expression and parse it as JSON:
const response = await fetch("...");
const html = await response.text();
const g_rgAppContextData = JSON.parse(
html.match(/g_rgAppContextData\s*=\s*(\{.*?\});/s)[1]
);
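The snippet above throws a TypeError if the pattern does not match, because match() then returns null. A more defensive variant might look like the following sketch. The helper name extractJsonVariable and the shortened sample HTML are assumptions made for illustration, not part of the original answer.

```typescript
// Hypothetical helper: extracts a JSON-formatted object literal assigned to
// `name` from an HTML string. Returns null if nothing usable is found.
function extractJsonVariable(html: string, name: string): unknown {
  // Escape regex metacharacters so the name is matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "s" lets "." match newlines; the lazy ".*?" stops at the first "};".
  const pattern = new RegExp(escaped + "\\s*=\\s*(\\{.*?\\});", "s");
  const match = html.match(pattern);
  if (match === null) {
    return null; // no assignment found
  }
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // the captured text was not valid JSON
  }
}

// Shortened sample modelled on the question's HTML.
const sampleHtml = `<h1>Example</h1>
<script type="text/javascript">
var g_rgAppContextData = {
"730": {
"appid": 730,
"name": "Counter-Strike 2"
}
};
var g_rgCurrency = [];
</script>`;

const appData = extractJsonVariable(sampleHtml, "g_rgAppContextData") as
  Record<string, { appid: number; name: string }> | null;
```

Returning null instead of throwing lets the caller decide how to handle pages that do not follow the expected pattern.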
g_rgAppContextData that you define is globally available as you defined it here, so there is no reason to get the <script> element and "interpret" the text content, if that is what you mean? – chrwahl Commented Jan 18 at 12:31
Use [...document.querySelectorAll("script")].find(elem => elem.innerText.includes('g_rgAppContextData')) instead of tagname. Then parse/extract the object with a regex or evaluate the script, if you trust the source. – Christopher Commented Jan 18 at 12:57
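The querySelectorAll suggestion queries a live document, whereas the HTML returned by fetch is only a string. In a browser, the string could first be parsed with DOMParser. As an alternative that also runs without a DOM, the sketch below scans the string for script blocks with a regular expression and returns the first one that mentions the variable. This swaps in a different technique from the one in the comment. The helper name findScriptContaining and the sample HTML are hypothetical, and the tag matching is deliberately simplified.

```typescript
// Hypothetical helper: returns the body of the first <script> block in an
// HTML string that mentions `needle`, or null if no block does.
function findScriptContaining(html: string, needle: string): string | null {
  // Simplified tag matching; unusual markup may need a real HTML parser.
  const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    if (match[1].indexOf(needle) !== -1) {
      return match[1];
    }
  }
  return null;
}

// Sample with two script tags; only the second defines the variable.
const twoScripts = `<script>var unrelated = 1;</script>
<script type="text/javascript">
var g_rgAppContextData = { "730": { "appid": 730 } };
</script>`;

const scriptText = findScriptContaining(twoScripts, "g_rgAppContextData");
```

The returned script text can then be passed to a regex-based extraction such as the one in the answer, so other script tags on the page do not interfere.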