
ajax - Fetch random excerpt from Wikipedia (Javascript, client-only) - Stack Overflow


I have a web page that asks the user for a paragraph of text, then performs some operation on it. To demo it to lazy users, I'd like to add an "I feel lucky" button that will grab some random text from Wikipedia and populate the inputs.

How can I use Javascript to fetch a sequence of text from a random Wikipedia article?

I found some examples of fetching and parsing articles using the Wikipedia API, but they tend to be server side. I'm looking for a solution that runs entirely from the client and doesn't get scuppered by same origin policy.

Note random gibberish is not sufficient; I need human-readable sentences that make sense.


asked Mar 8, 2013 by rkagerer; edited May 23, 2017 by CommunityBot

1 Answer


My answer builds on the technique suggested here.

The tricky part is formulating the correct query string:

http://en.wikipedia.org/w/api.php?action=query&generator=random&prop=extracts&exchars=500&format=json&callback=onWikipedia

  • generator=random selects a random page
  • prop=extracts and exchars=500 retrieve a 500-character extract
  • format=json returns JSON-formatted data
  • callback= causes that data to be wrapped in a function call so it can be treated like any other <script> and injected into your page (see JSONP), thus bypassing cross-domain barriers.
  • requestid can optionally be added, with a new value each time, to avoid stale results from the browser cache (required in IE9)

The page served by the query is something that looks like this (I've added whitespace for readability):

onWikipedia(
  {"query":
    {"pages":
      {"12362520":
        {"pageid":12362520,
         "ns":0,
         "title":"Power Building",
         "extract":"<p>The <b>Power Building<\/b> is a historic mercial building in
                    the downtown of Cincinnati, Ohio, United States. Built in 1903, it
                    was designed by Harry Hake. It was listed on the National Register
                    of Historic Places on March 5, 1999. One week later, a group of
                    buildings in the northeastern section of downtown was named a
                    historic district, the Cincinnati East Manufacturing and Warehouse
                    District; the Power Building is one of the district's contributing
                    properties.<\/p>\n<h2> Notes<\/h2>"
  } } } }
)

Of course you'll get a different article each time.
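Note that the page ID key (`"12362520"` above) changes with every request, so the callback can't index it directly; it has to take the first (and only) property of `query.pages`. A minimal sketch of that unwrapping (the function name `getExtract` and the trimmed sample object are mine):

```javascript
// Pull the HTML extract out of a parsed API response shaped like the one above.
function getExtract(data) {
  var pages = data.query.pages;
  for (var id in pages) {   // exactly one page comes back; its key is unknown
    return pages[id].extract;
  }
  return null;
}

// Trimmed-down stand-in for the response shown above:
var sample = {
  query: { pages: { "12362520": {
    pageid: 12362520,
    title: "Power Building",
    extract: "<p>The <b>Power Building</b> is a historic commercial building...</p>"
  } } }
};
// getExtract(sample) yields the raw HTML extract string
```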

Here's a full, working example which you can try out on JSBin.

<HTML><BODY>

  <p><textarea id="textbox" style="width:350px; height:150px"></textarea></p>
  <p><button type="button" id="button" onclick="startFetch(100, 500)">
    Fetch random Wikipedia extract</button></p>

  <script type="text/javascript">

    var textbox = document.getElementById("textbox");
    var button = document.getElementById("button");
    var tempscript = null, minchars, maxchars, attempts;

    function startFetch(minimumCharacters, maximumCharacters, isRetry) {
      if (tempscript) return; // a fetch is already in progress
      if (!isRetry) {
        attempts = 0;
        minchars = minimumCharacters; // save params in case retry needed
        maxchars = maximumCharacters;
        button.disabled = true;
        button.style.cursor = "wait";
      }
      tempscript = document.createElement("script");
      tempscript.type = "text/javascript";
      tempscript.id = "tempscript";
      tempscript.src = "http://en.wikipedia/w/api.php"
        + "?action=query&generator=random&prop=extracts"
        + "&exchars="+maxchars+"&format=json&callback=onFetchComplete&requestid="
        + Math.floor(Math.random()*999999).toString();
      document.body.appendChild(tempscript);
      // onFetchComplete invoked when finished
    }

    function onFetchComplete(data) {
      document.body.removeChild(tempscript);
      tempscript = null;
      var s = getFirstProp(data.query.pages).extract;
      s = htmlDecode(stripTags(s));
      if (s.length > minchars || attempts++ > 5) {
        textbox.value = s;
        button.disabled = false;
        button.style.cursor = "auto";
      } else {
        startFetch(0, 0, true); // retry
      }
    }

    function getFirstProp(obj) {
      for (var i in obj) return obj[i];
    }

    // This next bit borrowed from Prototype / hacked together
    // You may want to replace with something more robust
    function stripTags(s) {
      return s.replace(/<\w+(\s+("[^"]*"|'[^']*'|[^>])+)?>|<\/\w+>/gi, "");
    }
    function htmlDecode(input){
      var e = document.createElement("div");
      e.innerHTML = input;
      return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
    }

  </script>

</BODY></HTML>

One downside of generator=random is you often get talk pages or generated content that are not actual articles. If anyone can improve the query string to limit it to quality articles, that would be great!
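One avenue worth trying (an assumption on my part, but it matches how MediaWiki generators work): the random module's `rnnamespace` parameter takes the `grn` prefix when used as a generator, so adding `grnnamespace=0` should restrict results to the main (article) namespace and skip talk and project pages:

```javascript
// Same query as before, plus grnnamespace=0 to keep the random generator
// in the main (article) namespace. "grn" is the prefix added to the random
// module's rnnamespace parameter when it runs as a generator.
var url = "http://en.wikipedia.org/w/api.php"
  + "?action=query&generator=random&grnnamespace=0"
  + "&prop=extracts&exchars=500&format=json&callback=onWikipedia";
```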
