Extract HTML Page variable from script-tag in Javascript - Stack Overflow

How can I extract a variable from a script tag of a returned HTML page in JavaScript/TypeScript?

My API request to the server: const response = await fetch( ... )

The response contains a large HTML page; here is just an example:

<h1>Willkommen auf der Seite für Steam App Daten</h1>

<script type="text/javascript">
  var g_rgAppContextData = {
    "730": {
      "appid": 730,
      "name": "Counter-Strike 2",
      "icon": "https://cdn.fastly.steamstatic/steamcommunity/public/images/apps/730/8dbc71957312bbd3baea65848b545be9eae2a355.jpg",
      "link": "https://steamcommunity/app/730"
    }
  };
  var g_rgCurrency = [];
</script>

I only want to extract the variable g_rgAppContextData and nothing else. I know that I can select the script tag with getElementsByTagName("script"), but what if there are two script tags? And how do I select only the variable?

asked Jan 18 at 12:26 by UsAA12, edited Jan 19 at 9:37 by Zach Jensz
  • The variable g_rgAppContextData that you define is globally available as you defined it here, so there is no reason to get the <script> element and "interpret" the text content, if that is what you mean? – chrwahl Commented Jan 18 at 12:31
  • @chrwahl I have not defined it; I only get it as a response from the server. That means I have to extract it from the response HTML page. – UsAA12 Commented Jan 18 at 12:43
  • You can make use of [...document.querySelectorAll("script")].find(elem => elem.innerText.includes('g_rgAppContextData')) instead of the tag name. Then parse/extract the object with a regex, or evaluate the script if you trust the source. – Christopher Commented Jan 18 at 12:57
  • @Christopher Yes that's right, thanks for the help! The answer from a user did exactly that and it works as desired. – UsAA12 Commented Jan 18 at 13:22
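
The suggestion in the comments relies on a DOM, which is not available when the HTML arrives only as text from fetch. As a rough sketch of the same idea on a plain string, the hypothetical helper findScriptContaining below scans the script blocks with a regular expression and returns the first one that mentions the variable. The helper name and the sample page are assumptions made for illustration, and a regex is only an approximation of real HTML parsing.

```javascript
// Hypothetical helper (not part of any library): return the text of the first
// <script> block that mentions variableName, or null if no block does.
function findScriptContaining(html, variableName) {
  // g: iterate over all matches, i: ignore tag case, s: let "." span newlines.
  const scriptPattern = /<script\b[^>]*>(.*?)<\/script>/gis;
  for (const match of html.matchAll(scriptPattern)) {
    if (match[1].includes(variableName)) {
      return match[1];
    }
  }
  return null;
}

// Made-up sample page with two script tags.
const pageWithTwoScripts = `
<h1>Example</h1>
<script type="text/javascript">var unrelated = 1;</script>
<script type="text/javascript">
  var g_rgAppContextData = {"730": {"appid": 730}};
  var g_rgCurrency = [];
</script>`;

const scriptText = findScriptContaining(pageWithTwoScripts, "g_rgAppContextData");
// The second block is selected, so it also contains the g_rgCurrency line.
console.log(scriptText.includes("g_rgCurrency"));
```

In a browser, parsing the text with DOMParser and using querySelectorAll as in the comment above would likely be the more robust choice.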
Add a comment  | 

1 Answer


Since the pages you want to scrape follow a certain pattern, it seems possible to make a number of simplifying assumptions about the structure of the returned HTML:

  • The desired variable is assigned a constant value in JSON format (in particular, member names like "730" are quoted).
  • The HTML page contains only one assignment for this variable.
  • A semicolon follows immediately after the closing }.
  • The member names and string values do not contain the sequence };.

Let me know if these assumptions are not justified in your case.
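
To illustrate why the last assumption matters, the sketch below feeds a made-up input whose string value contains }; to a lazy pattern and shows that the captured text is cut short. It then shows one possible alternative, a hypothetical brace-counting scanner named extractBalancedObject, which is not from the original answer. The scanner still assumes the value is JSON (double-quoted strings only) and that the first mention of the name is the assignment.

```javascript
// Made-up input: the string value "a};b" violates the last assumption.
const tricky =
  'var g_rgAppContextData = {"730": {"name": "a};b"}};\nvar g_rgCurrency = [];';

// A lazy match stops at the first "};", which here lies inside the string.
const lazy = tricky.match(/g_rgAppContextData\s*=\s*(\{.*?\});/s)[1];

// Hypothetical alternative: count braces while skipping over JSON strings.
function extractBalancedObject(source, variableName) {
  const nameIndex = source.indexOf(variableName); // simplification: first mention
  if (nameIndex === -1) return null;
  const start = source.indexOf("{", nameIndex);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === "\\") i++; // skip the character after a backslash
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

const balanced = extractBalancedObject(tricky, "g_rgAppContextData");
console.log(lazy);                             // truncated, not valid JSON
console.log(JSON.parse(balanced)["730"].name); // the full string value survives
```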

Under these assumptions, you can extract the variable value with a regular expression and parse it as JSON:

const response = await fetch("...");
const html = await response.text();
const g_rgAppContextData = JSON.parse(
  html.match(/g_rgAppContextData\s*=\s*(\{.*?\});/s)[1]
);
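
Note that the snippet above throws a TypeError if the pattern does not match, because match then returns null. A minimal sketch of a guarded variant follows. The helper name extractJsonVariable and the sample HTML are assumptions for illustration, and the same structural assumptions listed above still apply.

```javascript
// Hypothetical helper: extract a JSON-valued variable from HTML text,
// returning null instead of throwing when nothing usable is found.
function extractJsonVariable(html, variableName) {
  // Escape regex metacharacters in the name ("$" is legal in JS identifiers).
  const escapedName = variableName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Same pattern as above, built dynamically; "s" lets "." span newlines.
  const pattern = new RegExp(escapedName + "\\s*=\\s*(\\{.*?\\});", "s");
  const match = html.match(pattern);
  if (match === null) {
    return null; // no assignment found
  }
  try {
    return JSON.parse(match[1]);
  } catch (error) {
    return null; // the captured text was not valid JSON
  }
}

// Made-up sample modeled on the page in the question.
const sampleHtml = `<script type="text/javascript">
  var g_rgAppContextData = {
    "730": { "appid": 730, "name": "Counter-Strike 2" }
  };
  var g_rgCurrency = [];
</script>`;

const appData = extractJsonVariable(sampleHtml, "g_rgAppContextData");
console.log(appData["730"].name);
console.log(extractJsonVariable(sampleHtml, "g_missingVariable")); // null
```

Returning null keeps the caller in control of what happens when the page layout changes, at the cost of hiding the distinction between "not found" and "not valid JSON".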