I have just started using JavaScript and want to fetch metadata from a URL. When any URL is entered into an input field, the page should pull the metadata from it. Below is my basic attempt in HTML and JavaScript; it throws an error when executed.
I have searched for alternatives, but nothing has helped. Please suggest how to achieve this.
<!DOCTYPE html>
<html>
<body>
<head>
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML5,CSS,JavaScript">
<meta name="author" content="John Doe">
<meta content="http://stackoverflow.com/favicon.ico">
</head>
<p>Click the button to return the value of the content attribute of all meta elements.</p>
<button onclick="myFunction()">Try it</button>
<p id="demo"></p>
<script>
function myFunction() {
var x = "https://www.amazon.in/"
// var x = document.getElementsByTagName("META");
var txt = "";
var i;
for (i = 0; i < x.length; i++) {
txt = txt + "Content of "+(i+1)+". meta tag: "+x[i].content+"<br>";
}
document.getElementById("demo").innerHTML = txt;
}
</script>
</body>
</html>
edited Jul 28, 2020 at 14:25 by User 28
asked Feb 28, 2020 at 5:11 by ats demo
- You will need to extract the HTML and then split on the <meta> and </meta> substrings. – Himanshu Commented Feb 28, 2020 at 5:15
- What output do you want? – Krupal Panchal Commented Feb 28, 2020 at 5:39
- @KrupalPanchal Output: title, description, logo or favicon – ats demo Commented Feb 29, 2020 at 4:01
- You can retrieve metadata from a page's HTTP header fields, including the page's size and the date when it was last modified. – Anderson Green Commented Apr 19, 2024 at 20:02
- See stackoverflow.com/a/78907991/9303782 – Hein Soe Commented Aug 28, 2024 at 15:22
3 Answers
I guess you are trying to build a metadata scraper using JavaScript, if I am not mistaken.
Before going further, you need to take the CORS policy into account: a browser only lets a script read a response from another origin if that server explicitly allows it, so a request to an arbitrary URL may be blocked.
Reference URLs:
- https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
- https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors
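To see how a CORS rejection might surface in your own code, here is a minimal sketch in plain JavaScript. It assumes the standard fetch API; the helper name fetchPageSource and its injectable fetchImpl parameter are made up for illustration. When the browser blocks a cross-origin response, fetch rejects with a TypeError, and the script cannot read the detailed reason (that appears only in the browser console).

```javascript
// Hypothetical helper: resolves to the page source or to a readable error.
// fetchImpl defaults to the global fetch and can be swapped out for testing.
async function fetchPageSource(url, fetchImpl = (u) => fetch(u)) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) {
      // The server answered, but with an error status such as 404 or 500.
      return { ok: false, error: "HTTP status " + response.status };
    }
    return { ok: true, html: await response.text() };
  } catch (err) {
    // A blocked cross-origin request and a plain network failure both
    // reject with a TypeError, so they cannot be told apart here.
    return { ok: false, error: "Request failed (network or CORS): " + err.message };
  }
}
```

Calling fetchPageSource with the URL from the question would then resolve to an object whose ok flag says whether the page source could be read at all, which is a useful first check before any parsing.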
JSFiddle: http://jsfiddle.net/pgrmL73h/
The fiddle demonstrates how to fetch the meta tags from a given URL. For demo purposes I used https://jsfiddle.net/ as the URL to fetch the meta tags from; you can change it as needed.
I followed the steps below to retrieve the META tags from a website.
- To retrieve the page source of a website, you first need to request its URL. jQuery's AJAX method can do this. Reference URL: https://api.jquery.com/jquery.ajax/
- I used jQuery's $.parseHTML method, which turns an HTML string into DOM elements. Reference URL: https://api.jquery.com/jquery.parsehtml/
- Once the AJAX request has retrieved the page source successfully, each DOM element from the page source is checked, the META nodes are filtered as needed, and their data is stored in the "txt" variable. E.g., tags like keywords and description will be retrieved.
- Once the AJAX request has completed, the contents of the "txt" variable are displayed inside a paragraph tag.
JS Code:
function myFunction() {
  var txt = "";
  document.getElementById("demo").innerHTML = txt;
  // Sample URL used here; make it dynamic as needed.
  // AJAX is used only to request the URL and get that site's page source, much as cURL or file_get_contents (https://www.php.net/manual/en/function.file-get-contents.php) would in PHP.
  $.ajax({
    url: "https://jsfiddle.net/",
    error: function() {
      txt = "Unable to retrieve webpage source HTML";
    },
    success: function(response) {
      // The response arrives as a string.
      // $.parseHTML turns the HTML string into DOM nodes. Reference: https://api.jquery.com/jquery.parsehtml/
      response = $.parseHTML(response);
      $.each(response, function(i, el) {
        // Keep only <meta> nodes that carry a name attribute.
        if (el.nodeName.toString().toLowerCase() == 'meta' && $(el).attr("name") != null && typeof $(el).attr("name") != "undefined") {
          // Prefer the content attribute, then value, then an empty string.
          var value = $(el).attr("content") ? $(el).attr("content") : ($(el).attr("value") ? $(el).attr("value") : "");
          txt += $(el).attr("name") + "=" + value + "<br>";
          console.log($(el).attr("name"), "=", value, el);
        }
      });
    },
    // Runs after success or error, so txt holds either the tags or the error message.
    complete: function() {
      document.getElementById("demo").innerHTML = txt;
    }
  });
}
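The filtering step in the success callback does not strictly need jQuery. As a rough sketch, the same check can be written against any list of element-like objects that expose nodeName and getAttribute. The name collectNamedMeta is hypothetical and not part of the fiddle; unlike the code above, it also skips meta tags whose name attribute is empty.

```javascript
// Hypothetical helper: keep only <meta> nodes that have a non-empty name
// attribute and return them as { name, content } pairs.
function collectNamedMeta(nodes) {
  const result = [];
  for (const el of nodes) {
    // Text and comment nodes have no getAttribute method, so skip them.
    if (!el || typeof el.getAttribute !== "function") continue;
    if (String(el.nodeName).toLowerCase() !== "meta") continue;
    const name = el.getAttribute("name");
    if (!name) continue;
    // Fall back to "value" and then to an empty string, as the jQuery code does.
    const content = el.getAttribute("content") || el.getAttribute("value") || "";
    result.push({ name: name, content: content });
  }
  return result;
}
```

In the browser, the nodes argument could be the array returned by $.parseHTML(response). Returning plain objects instead of an HTML string leaves it to the caller to decide how the pairs are displayed.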
You can use open-graph-scraper for this; see this answer for details.
After fetching the HTML text with the fetch method, it can be turned into DOM nodes by assigning it to the innerHTML of an element made with the createElement method. The resulting DOM can then be queried with plain JavaScript. For meta or link tags, my approach is as below:
fetch("https://stackoverflow.com/posts/77197873")
  .then(r => r.text())
  .then(html => {
    // Parse inside the callback: fetch is asynchronous, so the text is not available any earlier.
    let tmp = document.createElement("div");
    tmp.innerHTML = html;
    tmp.querySelectorAll('meta').forEach(met => console.log(met.name));
  });
DOM elements can be accessed with document.querySelector or document.querySelectorAll, and an element's attributes can be read with the element.getAttribute method.
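Putting querySelector and getAttribute together, the fields asked for in the question's comments (title, description, and logo or favicon) could be read roughly as follows. This is only a sketch: extractMetadata is a hypothetical name, and the selectors assume the page follows the common meta name="description" and link rel="icon" conventions, which not every site does.

```javascript
// Hypothetical helper: doc is a parsed Document, or any object with querySelector.
function extractMetadata(doc) {
  // Return the given attribute of the first match, or null if nothing matches.
  function attr(selector, attribute) {
    const el = doc.querySelector(selector);
    return el ? el.getAttribute(attribute) : null;
  }
  const titleEl = doc.querySelector("title");
  return {
    title: titleEl ? titleEl.textContent : null,
    description: attr('meta[name="description"]', "content"),
    // rel~="icon" also matches rel="shortcut icon".
    icon: attr('link[rel~="icon"]', "href")
  };
}
```

In a browser, doc could come from new DOMParser().parseFromString(html, "text/html"), where html is the fetched page source. The icon value is returned as written in the page, so a relative href would still need to be resolved against the page URL.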