
javascript - Chrome extension history API not showing all results? - Stack Overflow


I am trying to use the Chrome extension history API to get the user's history matching a search term they enter, but the search does not work correctly in some cases. For example, when I enter the term "bi", no results are returned, and when I search for "bit", some results are returned but not all. I checked this against Chrome's own history search, which showed more results. Is this how the history API works, or am I doing something wrong? Here is my code:

window.onload = function() {

  function getHistory() {
    var list = document.getElementById('list');
    var box = document.getElementById('box').value;
    if (box === '') {
      list.innerHTML = 'Nothing To Search.';
    } else {
      // Note: this value is in milliseconds (about 45 years), not microseconds,
      // and `start` is never used: the query below passes startTime: 0 instead.
      var microseconds = 1000 * 60 * 60 * 24 * 365 * 45;
      var start = (new Date).getTime() - microseconds;
      chrome.history.search({text: box, startTime: 0, maxResults: 50000}, function(data) {
        // `data` is an array of HistoryItem objects.
        if (data.length === 0) {
          list.innerHTML = 'Nothing Found.';
        } else {
          list.innerHTML = '';
          data.forEach(function(page) {
            // Quote the href so URLs containing special characters don't break the markup.
            list.innerHTML += '<li><p>' + page.title + '</p> <a href="' + page.url +
                '" target="_blank"><p>' + page.url + '</p></a></li> <hr>';
          });
        }
      });
    }
  }

  document.getElementById('search').onclick = getHistory;
};

Thank you.


Asked Feb 1, 2016 at 17:11 by doctorsherlock

2 Answers


I'm seeing the same behaviour with an extension that I am writing. It is really quite annoying, so I went digging through the Chromium source code to find out what it's really doing to match the history results.

Short answer: It seems from the source code that this behaviour is intended, so if we want to retrieve all matches to a text query we are stuck with retrieving all of the history results and searching for matches ourselves in JavaScript. On a side note, don't forget to double-check the start/end times, and make sure your 'maxResults' property is large enough, as mistaken values for any of these properties will likely give you unexpected results.
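A minimal sketch of that workaround, assuming an extension that declares the "history" permission: fetch everything with an empty `text` (the documented way to retrieve all pages), then do a case-insensitive substring match on each title and URL ourselves. The names `filterHistoryItems` and `searchHistoryBySubstring` are hypothetical, and the `maxResults` value is an arbitrary large number, not a documented limit.

```javascript
// Case-insensitive substring match on each page's title and URL.
// Works on objects shaped like chrome.history.HistoryItem ({ url, title, ... }).
function filterHistoryItems(items, term) {
  const needle = term.toLowerCase();
  return items.filter(function (item) {
    const title = (item.title || '').toLowerCase();
    const url = (item.url || '').toLowerCase();
    return title.includes(needle) || url.includes(needle);
  });
}

// Hypothetical wrapper: retrieve all history (empty text) and filter locally.
// startTime: 0 means "since the Unix epoch"; if startTime is omitted, Chrome
// only searches the last 24 hours. maxResults is an assumed large cap.
function searchHistoryBySubstring(term, callback) {
  chrome.history.search({ text: '', startTime: 0, maxResults: 100000 }, function (items) {
    callback(filterHistoryItems(items, term));
  });
}
```

With this, searching for "bi" also returns pages like "bitbucket.org". The trade-off is that every history entry is loaded and scanned in JavaScript, which can be slow for very large histories.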

Long answer

DISCLAIMER: I don't have much C++ experience, so please correct my assessment if it is wrong.

The following function (URLDatabase::GetTextMatchesWithAlgorithm, which history_backend eventually calls) runs when you call chrome.history.search with a non-empty text query.

bool URLDatabase::GetTextMatchesWithAlgorithm(
    const base::string16& query,
    query_parser::MatchingAlgorithm algorithm,
    URLRows* results) {
  query_parser::QueryNodeVector query_nodes;
  query_parser_.ParseQueryNodes(query, algorithm, &query_nodes);

  results->clear();
  sql::Statement statement(GetDB().GetCachedStatement(SQL_FROM_HERE,
      "SELECT" HISTORY_URL_ROW_FIELDS "FROM urls WHERE hidden = 0"));

  while (statement.Step()) {
    query_parser::QueryWordVector query_words;
    base::string16 url = base::i18n::ToLower(statement.ColumnString16(1));
    query_parser_.ExtractQueryWords(url, &query_words);
    GURL gurl(url);
    if (gurl.is_valid()) {
      // Decode punycode to match IDN.
      base::string16 ascii = base::ASCIIToUTF16(gurl.host());
      base::string16 utf = url_formatter::IDNToUnicode(gurl.host());
      if (ascii != utf)
        query_parser_.ExtractQueryWords(utf, &query_words);
    }
    base::string16 title = base::i18n::ToLower(statement.ColumnString16(2));
    query_parser_.ExtractQueryWords(title, &query_words);

    if (query_parser_.DoesQueryMatch(query_words, query_nodes)) {
      URLResult info;
      FillURLRow(statement, &info);
      if (info.url().is_valid())
        results->push_back(info);
    }
  }
  return !results->empty();
}

The algorithm argument passed into this function, of type query_parser::MatchingAlgorithm, is the enum shown below (from query_parser.h). From what I can tell it is never explicitly set, so it takes the DEFAULT value.

enum class MatchingAlgorithm {
  // Only words long enough are considered for prefix search. Shorter words are
  // considered for exact matches.
  DEFAULT,
  // All words are considered for a prefix search.
  ALWAYS_PREFIX_SEARCH,
};

Note the comment above the DEFAULT option:

"Only words long enough are considered for prefix search. Shorter words are considered for exact matches"

The algorithm itself (query_parser) breaks your text query, and each URL and title, into lists of "words" separated by spaces or punctuation, then checks each query word for a prefix match against those words. This explains why, if you have several pages in your history with "chromium" in the URL, you will get no results if you search for "hromium", but you'll get all of them if you search for "chro".
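To make the DEFAULT rule concrete, here is a rough JavaScript approximation. It is not a port of query_parser: splitting on runs of Unicode letters and digits is a simplification of Chromium's word breaking, and the 3-character threshold for "long enough" is my reading of QueryParser::IsWordLongEnoughForPrefixSearch (ignoring a special case for Hangul), which happens to match the "bi"/"bit" behaviour in the question. `extractWords`, `wordMatches` and `doesQueryMatch` are hypothetical names.

```javascript
// Assumed threshold: query words shorter than this must match a whole word.
const MIN_PREFIX_LENGTH = 3;

// Split lowercased text into "words": runs of letters and digits.
// (A simplification of the word breaking Chromium actually uses.)
function extractWords(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

// DEFAULT rule: long query words prefix-match, short ones need an exact match.
function wordMatches(queryWord, word) {
  if (queryWord.length < MIN_PREFIX_LENGTH) {
    return word === queryWord;
  }
  return word.startsWith(queryWord);
}

// Every query word must match at least one word from the URL or title.
function doesQueryMatch(query, url, title) {
  const queryWords = extractWords(query);
  const words = extractWords(url).concat(extractWords(title));
  return queryWords.length > 0 &&
    queryWords.every(function (q) {
      return words.some(function (w) { return wordMatches(q, w); });
    });
}

console.log(doesQueryMatch('chro', 'https://www.chromium.org/', 'The Chromium Projects'));    // true
console.log(doesQueryMatch('hromium', 'https://www.chromium.org/', 'The Chromium Projects')); // false
console.log(doesQueryMatch('bi', 'https://bitbucket.org/', 'Bitbucket'));  // false: short word, exact match needed
console.log(doesQueryMatch('bit', 'https://bitbucket.org/', 'Bitbucket')); // true: prefix of "bitbucket"
```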

In your case, I think the search "bi" returns no results because, for short terms, the algorithm only looks for exact word matches, so "bi" would need to be surrounded by white space or punctuation in the URL or title. You can confirm this by doing a Google search for "bi" and then querying the history for "bi" again. The Google search history item will be matched, since in its URL "bi" stands alone as a word (it follows the "=" and ends the URL):

https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=bi

Sources

  • Chromium source code that is searchable
  • history_types.h - enum for algorithm
  • query_parser - algorithm itself
  • history_service - called from Javascript
  • history_backend - called from history service

A chrome.history.search call doesn't necessarily retrieve every page. The documentation states that it searches the history for the last visit time of each page matching the query. This may be why the results look incomplete.

As to why there are no results with 2 characters but some results with 3, I can't be certain. It might be due to the other parameters, such as startTime. It takes a time in milliseconds since the epoch, so setting it to 0 searches from January 1, 1970 onward (which may be what you intend).
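Because startTime is milliseconds since the epoch, a relative window is just a millisecond offset from Date.now(). A small sketch; `startTimeDaysAgo` and `MS_PER_DAY` are hypothetical names, not part of the chrome.history API. (The question's `microseconds` variable is also a millisecond count, roughly 45 years, so `start` would have been a valid startTime had it been passed instead of 0.)

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Milliseconds-since-epoch timestamp for `days` days before `now`.
function startTimeDaysAgo(days, now = Date.now()) {
  return now - days * MS_PER_DAY;
}

// Inside an extension, e.g. everything visited in the last 30 days:
// chrome.history.search({ text: '', startTime: startTimeDaysAgo(30), maxResults: 1000 }, callback);
```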
