javascript - Detect social bots in Node Express - Stack Overflow

I'm trying to detect either of the following two options:

  • A specific list of bots (FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider)
  • Any bots that don't support the Crawlable Ajax Specification

I've seen similar questions (How to recognize Facebook User-Agent) but nothing that explains how to do this in Node and Express.

I need to do this in a format like this:

app.get("*", function(req, res){ 
  if (is one of the bots) //serve snapshot
  if (is not one of the bots) res.sendFile(__dirname + "/public/index.html");
});

Asked Mar 25, 2015 at 16:51 by CaribouCode; edited May 23, 2017 at 12:25.

3 Answers

You can check the User-Agent header on the request object and test its value against each bot.

At the time of writing, Facebook documents three User-Agent header values (see The Facebook Crawler), and Twitter's User-Agent includes a version number (see Twitter URL Crawling & Caching). The example below should cover both bots.

Node

var http = require('http');
var server = http.createServer(function(req, res){

    // The header may be absent, so fall back to an empty string.
    var userAgent = req.headers['user-agent'] || '';
    if (userAgent.startsWith('facebookexternalhit/1.1') ||
       userAgent === 'Facebot' ||
       userAgent.startsWith('Twitterbot')) {

        /* Do something for the bot */
    }
});

server.listen(8080);

Express

var http = require('http');
var express = require('express');
var app = express();

app.get('/', function(req, res){

    // The header may be absent, so fall back to an empty string.
    var userAgent = req.headers['user-agent'] || '';
    if (userAgent.startsWith('facebookexternalhit/1.1') ||
       userAgent === 'Facebot' ||
       userAgent.startsWith('Twitterbot')) {

        /* Do something for the bot */
    }
});

app.listen(8080);
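
To cover the full list from the question in one place, the prefix checks above could be folded into a single case-insensitive pattern. The following is only a sketch: isSocialBot, snapshotForBots and serveSnapshot are hypothetical names that do not come from the answer, and the pattern assumes each bot's name appears somewhere in its User-Agent string.

```javascript
// Bot names taken from the question, plus Facebot from the answer above.
// Case-insensitive because the real headers differ in casing.
var SOCIAL_BOTS = /facebookexternalhit|facebot|linkedinbot|twitterbot|baiduspider/i;

// Returns true when the User-Agent string names one of the listed bots.
// A missing header (undefined) is treated as "not a bot".
function isSocialBot(userAgent) {
    return typeof userAgent === 'string' && SOCIAL_BOTS.test(userAgent);
}

// Hypothetical middleware factory: serveSnapshot is whatever function
// sends the pre-rendered snapshot; everyone else falls through to next().
function snapshotForBots(serveSnapshot) {
    return function (req, res, next) {
        if (isSocialBot(req.headers['user-agent'])) {
            return serveSnapshot(req, res);
        }
        next();
    };
}
```

Registered with app.use(snapshotForBots(yourSnapshotHandler)) ahead of the catch-all route, this would leave the question's app.get("*", ...) handler free to send index.html to regular visitors.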

You can use the request.headers object to check whether the incoming request contains User-Agent information specific to that bot. A simple example:

Node

var http = require('http');

var server = http.createServer(function(req, res){

    if (req.headers['user-agent'] === 'facebookexternalhit/1.1') {
        /* do something for the Facebook bot */
    }

});

server.listen(8080);

Express

var http = require('http');
var express = require('express');
var app = express();

app.get('/', function(req, res){

    if (req.headers['user-agent'] === 'facebookexternalhit/1.1') {
        /* do something for the Facebook bot */
    }

});

app.listen(8080);
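
The same kind of per-request check can look at the URL as well as the headers. Crawlers that do support the AJAX crawling scheme mentioned in the question request pages with an _escaped_fragment_ query parameter, so they can be recognized without knowing their User-Agent; bots that don't support it still need a User-Agent check like the ones above. A minimal sketch, where hasEscapedFragment is a hypothetical helper name and the base URL is a placeholder used only for parsing:

```javascript
// Assumption: the crawler follows the AJAX crawling scheme and rewrites
// "#!" URLs into a "_escaped_fragment_" query parameter.
function hasEscapedFragment(requestUrl) {
    // req.url holds only the path and query, so a placeholder base is
    // supplied to make it parseable; the host itself is never used.
    var parsed = new URL(requestUrl, 'http://placeholder.invalid');
    return parsed.searchParams.has('_escaped_fragment_');
}
```

Inside a handler this would be called as hasEscapedFragment(req.url) and combined with the User-Agent test to decide whether to serve the snapshot.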

This Node Express middleware analyzes a range of user agent strings and gives you a simple "bot==true" or "desktop==true" check. I haven't used it, and the readme sounds like it was just a trial project, so I don't know how well maintained it will be going forward, but it should detect all sorts of bots.

https://github.com/rguerreiro/express-device
