
javascript - How to extract/parse HTML using Microdata - Stack Overflow


I am pretty new to Microdata.

I have an HTML string with Microdata. I am trying to figure out whether it's possible to extract the required information dynamically from the Microdata with JS or jQuery. Has anyone done this before?

Example HTML string: I am trying to get the 'content' corresponding to itemprop 'ratingValue' for the item whose itemprop 'name' is 'Blendmagic'

<html>
    <div itemscope itemtype="http://schema/Offer">
        <span itemprop="name">Blendmagic</span>
        <span itemprop="price">$19.95</span>
        <div itemprop="reviews" itemscope itemtype="http://schema/AggregateRating">
            <img src="four-stars.jpg" />
            <meta itemprop="ratingValue" content="4" />
            <meta itemprop="bestRating" content="5" />
            Based on <span itemprop="ratingCount">25</span> user ratings
        </div>
    </div>
    <div itemscope itemtype="http://schema/Offer">
        <span itemprop="name">testMagic</span>
        <span itemprop="price">$10.95</span>
        <div itemprop="reviews" itemscope itemtype="http://schema/AggregateRating">
            <img src="four-stars.jpg" />
            <meta itemprop="ratingValue" content="4" />
            <meta itemprop="bestRating" content="5" />
            Based on <span itemprop="ratingCount">25</span> user ratings
        </div>
    </div>
</html>
Asked May 12, 2015 at 21:02 by Learner; edited May 12, 2015 at 22:51 by unor.
  • you should use document.getItems() in firefox, maybe a polyfill elsewhere. otherwise, you'll be looking at a bunch of yucky attrib-based code... – dandavis Commented May 12, 2015 at 21:48
  • @dandavis Not so simple to compose, though doable. The native browser methods can return results, though composers of semantic HTML often include several different versions. g--gle has sch-ma, which has its own format; vcard, hcard, microformats, data- attributes and microdata could all be included in a semantic web page. Additionally, authors can define their own "vocabulary"; the "sch-ma" vocabulary is an example. Some authors may compose pages differently simply based on how g--gle parses the page, not necessarily on how the page will be parsed by others. – guest271314 Commented May 12, 2015 at 22:00
  • i'm surprised there's not a library for this exact task already. maybe i'll write one. yeah, there's 4-5 formats, but that's only 4-5 formats to "parse"... – dandavis Commented May 12, 2015 at 22:06
  • i started, but i got distracted and now i have to go, but maybe it can help get you going: jsfiddle/j6zsyjzr – dandavis Commented May 12, 2015 at 22:45
  • Couldn’t you just use any of the various JS/jQuery Microdata parsers? If you just need a recommendation of a suitable one, you could ask on Software Recommendations. – unor Commented May 12, 2015 at 22:52

2 Answers

Try beginning at each root itemscope node, filtering its descendant elements that have itemprop attributes, and returning an object result whose items array holds the Microdata items.

This solution is based on the algorithm found in the Microdata specification:

7 Converting HTML to other formats

7.1 JSON

Given a list of nodes nodes in a Document, a user agent must run the following algorithm to extract the microdata from those nodes into a JSON form:

  1. Let result be an empty object.
  2. Let items be an empty array.
  3. For each node in nodes, check if the element is a top-level microdata item, and if it is then get the object for that element and add it to items.
  4. Add an entry to result called "items" whose value is the array items.
  5. Return the result of serializing result to JSON in the shortest possible way (meaning no whitespace between tokens, no unnecessary zero digits in numbers, and only using Unicode escapes in strings for characters that do not have a dedicated escape sequence), and with a lowercase "e" used, when appropriate, in the representation of any numbers. [JSON]

This algorithm returns an object with a single property that is an array, instead of just returning an array, so that it is possible to extend the algorithm in the future if necessary.
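
The envelope and serialization steps above map closely onto JSON.stringify called without a spacing argument, which emits no whitespace between tokens. A minimal sketch, assuming the items have already been extracted; `serializeItems` is a hypothetical helper name and the sample object is made up for illustration:

```javascript
// Hypothetical helper (not part of any spec or library): wrap
// already-extracted items in the { "items": [...] } envelope and
// serialize the result compactly.
function serializeItems(items) {
  var result = {};
  result.items = items;
  // No spacing argument is passed, so no whitespace is added between tokens
  return JSON.stringify(result);
}

// Illustrative data shaped like the output of the algorithm
var sample = [{
  type: ["http://schema/Offer"],
  properties: { name: ["Blendmagic"] }
}];

console.log(serializeItems(sample));
// {"items":[{"type":["http://schema/Offer"],"properties":{"name":["Blendmagic"]}}]}
```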

When the user agent is to get the object for an item item, optionally with a list of elements memory, it must run the following substeps:

  1. Let result be an empty object.
  2. If no memory was passed to the algorithm, let memory be an empty list.
  3. Add item to memory.
  4. If the item has any item types, add an entry to result called "type" whose value is an array listing the item types of item, in the order they were specified on the itemtype attribute.
  5. If the item has a global identifier, add an entry to result called "id" whose value is the global identifier of item.
  6. Let properties be an empty object.
  7. For each element element that has one or more property names and is one of the properties of the item item, in the order those elements are given by the algorithm that returns the properties of an item, run the following substeps:
     1. Let value be the property value of element.
     2. If value is an item, then: If value is in memory, then let value be the string "ERROR". Otherwise, get the object for value, passing a copy of memory, and then replace value with the object returned from those steps.
     3. For each name name in element's property names, run the following substeps:
        1. If there is no entry named name in properties, then add an entry named name to properties whose value is an empty array.
        2. Append value to the entry named name in properties.
  8. Add an entry to result called "properties" whose value is the object properties.
  9. Return result.
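
The "get the object" steps above can be sketched as a recursive function. This is a simplified illustration, not a full implementation: it assumes elements expose `children`, `hasAttribute`, `getAttribute` and `textContent` (as DOM elements do), ignores `itemref`, uses the raw `itemid` attribute as the global identifier, and picks property values with a reduced rule (content, then src, then href, then text). The names `propertyElements`, `propertyValue` and `getObject` are made up for this sketch.

```javascript
// Collect the property elements of an item without descending into
// nested items (assumption: itemref is not supported in this sketch).
function propertyElements(root) {
  var found = [];
  (function walk(node) {
    Array.prototype.forEach.call(node.children, function (child) {
      if (child.hasAttribute("itemprop")) found.push(child);
      // Properties inside a nested item belong to that nested item
      if (!child.hasAttribute("itemscope")) walk(child);
    });
  })(root);
  return found;
}

// Reduced value rule (assumption): content, src, href, then text.
function propertyValue(el) {
  if (el.hasAttribute("content")) return el.getAttribute("content");
  if (el.hasAttribute("src")) return el.getAttribute("src");
  if (el.hasAttribute("href")) return el.getAttribute("href");
  return el.textContent;
}

function getObject(item, memory) {
  var result = {};
  memory = memory || [];
  memory.push(item);
  var types = (item.getAttribute("itemtype") || "").split(/\s+/).filter(Boolean);
  if (types.length) result.type = types;
  if (item.hasAttribute("itemid")) result.id = item.getAttribute("itemid");
  var properties = {};
  propertyElements(item).forEach(function (element) {
    var value;
    if (element.hasAttribute("itemscope")) {
      // Guard against cycles; pass a copy of memory to the nested call
      value = memory.indexOf(element) !== -1
        ? "ERROR"
        : getObject(element, memory.slice());
    } else {
      value = propertyValue(element);
    }
    // itemprop may hold several space-separated names
    element.getAttribute("itemprop").split(/\s+/).filter(Boolean)
      .forEach(function (name) {
        if (!Object.prototype.hasOwnProperty.call(properties, name)) {
          properties[name] = [];
        }
        properties[name].push(value);
      });
  });
  result.properties = properties;
  return result;
}

// Browser-only usage: top-level items are itemscope elements
// that do not carry an itemprop attribute.
if (typeof document !== "undefined") {
  var topLevel = [];
  document.querySelectorAll("[itemscope]:not([itemprop])")
    .forEach(function (el) { topLevel.push(getObject(el)); });
  console.log({ items: topLevel });
}
```

Without `itemref` a DOM tree cannot contain a cycle, so the "ERROR" branch is only there to mirror the steps; it would matter once `itemref` support is added.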

var result = {};
var items = [];
// Select only top-level items: itemscope elements that are not
// themselves a property of another item
document.querySelectorAll("[itemscope]:not([itemprop])")
  .forEach(function(el) {
    var item = {
      "type": [el.getAttribute("itemtype")],
      "properties": {}
    };
    var props = el.querySelectorAll("[itemprop]");
    props.forEach(function(prop) {
      // Skip properties of a nested item here; they are collected
      // below as part of that nested item
      if (prop.parentElement.closest("[itemscope]") !== el) {
        return;
      }
      item.properties[prop.getAttribute("itemprop")] = [
        prop.content || prop.textContent || prop.src
      ];
      // A property that is itself an item becomes a nested object
      if (prop.matches("[itemscope]")) {
        var _item = {
          "type": [prop.getAttribute("itemtype")],
          "properties": {}
        };
        prop.querySelectorAll("[itemprop]")
          .forEach(function(_prop) {
            _item.properties[_prop.getAttribute("itemprop")] = [
              _prop.content || _prop.textContent || _prop.src
            ];
          });
        item.properties[prop.getAttribute("itemprop")] = [_item];
      }
    });
    items.push(item);
  });

result.items = items;

console.log(result);

// Show the full extraction result on the page
document.body
  .insertAdjacentHTML("beforeend", "<pre>" + JSON.stringify(result, null, 2) + "</pre>");

var props = ["Blendmagic", "ratingValue"];

// get the 'content' corresponding to itemprop 'ratingValue'
// for item prop-name 'Blendmagic'
var data = result.items.filter(function(value) {
  // keep only the item whose name matches, wherever it sits in the list
  return value.properties.name && value.properties.name[0] === props[0];
}).map(function(value) {
  var prop = value.properties.reviews[0].properties;
  var res = {},
    _props = {};
  _props[props[1]] = prop[props[1]];
  res[props[0]] = _props;
  return res;
})[0];

console.log(data);
// Show the selected value above the full result
document.querySelector("pre").insertAdjacentHTML("beforebegin", "<pre>" + JSON.stringify(data, null, 2) + "</pre>");
<!DOCTYPE html>
<html>

<head>
</head>

<body>
  <div itemscope itemtype="http://schema/Offer">
    <span itemprop="name">Blendmagic</span>
    <span itemprop="price">$19.95</span>
    <div itemprop="reviews" itemscope itemtype="http://schema/AggregateRating">
      <img data-src="four-stars.jpg" />
      <meta itemprop="ratingValue" content="4" />
      <meta itemprop="bestRating" content="5" />Based on <span itemprop="ratingCount">25</span> user ratings
    </div>
  </div>
  <div itemscope itemtype="http://schema/Offer">
    <span itemprop="name">testMagic</span>
    <span itemprop="price">$10.95</span>
    <div itemprop="reviews" itemscope itemtype="http://schema/AggregateRating">
      <img data-src="four-stars.jpg" />
      <meta itemprop="ratingValue" content="4" />
      <meta itemprop="bestRating" content="5" />Based on <span itemprop="ratingCount">25</span> user ratings
    </div>
  </div>
</body>

</html>

See also Recursion and loops of Microdata items
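
Once the items are in the JSON shape produced by the script above, reading one nested value is plain object traversal. A small sketch under that assumption; `findItemByName` and `firstValue` are hypothetical helper names, and the sample data is hard-coded so the block runs on its own:

```javascript
// Return the first item whose "name" property contains the given name
function findItemByName(result, name) {
  return result.items.filter(function (item) {
    var names = item.properties.name || [];
    return names.indexOf(name) !== -1;
  })[0];
}

// Follow a path of property names, taking the first value at each step;
// returns undefined as soon as a step is missing
function firstValue(item, path) {
  return path.reduce(function (current, key) {
    if (!current || !current.properties || !current.properties[key]) {
      return undefined;
    }
    return current.properties[key][0];
  }, item);
}

// Illustrative data in the same shape as `result` above
var sampleResult = {
  items: [{
    type: ["http://schema/Offer"],
    properties: {
      name: ["Blendmagic"],
      price: ["$19.95"],
      reviews: [{
        type: ["http://schema/AggregateRating"],
        properties: { ratingValue: ["4"], bestRating: ["5"], ratingCount: ["25"] }
      }]
    }
  }, {
    type: ["http://schema/Offer"],
    properties: { name: ["testMagic"], price: ["$10.95"] }
  }]
};

var blend = findItemByName(sampleResult, "Blendmagic");
console.log(firstValue(blend, ["reviews", "ratingValue"])); // logs 4
```

Returning undefined for a missing step keeps the lookup safe for items that have no reviews.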

Check this Fiddle

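$("span[itemprop='name']").each(function(index, element) {
    if ($(element).text() === 'Blendmagic') {
        // Look up ratingValue inside the same item instead of pairing by position
        alert($(element).closest('[itemscope]').find("meta[itemprop='ratingValue']").attr('content'));
    }
});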