Windguru offers a service providing accurate wind speed and wind direction at certain sites of interest to windsurfers and kitesurfers, an example being "http://www.windguruspot.cz/254". I would like to extract these two variables automatically, perhaps using "wget" or a similar application, and generate a local file containing their values.
Inspecting the source of the above URL, it appears to me that it in turn loads "http://www.windguru.cz/js/pak/wgs.spot.min.js", which provides the values I am looking for. However, the source of the latter JavaScript file is so terribly, and probably intentionally, complicated that I cannot make anything of it.
Asked Dec 16, 2015 at 20:37 by Ruy.
- They don't want you to: windguru.cz/int/help_index.php?sec=terms – Bitwise Creative Commented Dec 16, 2015 at 20:41
- I am not a specialist in the legal aspects of the web, but if I am proposing something illegal, please ignore my question. On the other hand, I am curious to understand to what extent a website has the right to limit the ways in which people use the information it provides, especially with regard to the sentence "It is forbidden to download website content by automated scripts". – Ruy Commented Dec 16, 2015 at 23:34
- Regardless, they don't want you scraping data from their site. It's not an API service. You could end up doing all the work to get the data, then they change one little thing and your data scrape is broken. It might be much better (and easier) for you to look into weather API services that have the data you need. Example: developer.yahoo./weather – Bitwise Creative Commented Dec 17, 2015 at 4:51
- Windsurfers need better weather reports than those provided by the usual weather services. An anemometer placed on a post on the water as close as possible to the sailing site, reporting wind direction as well as average, min and max wind speed, is what we usually need. But, alas, maybe I am asking for too much. In any case, thanks very much "Bitwise Creative", your comments were very helpful in illustrating the hurdles faced by my project. – Ruy Commented Dec 18, 2015 at 0:33
- After benefiting from @Javier's answer until now, the API site now refuses my connections, so I am back at square one. I would therefore reformulate my question: how could I automatically dump ALL of the information you get when you access a site such as "beta.windguru.cz/station/166" in a regular browser? Mind you that wget does not follow all internal links in that page, some of which lead to the most important information I'm looking for, which is wind speed. – Ruy Commented Sep 9, 2016 at 16:48
4 Answers
API Documentation
Example
Returns:
{"wind_avg":10.87,"wind_max":11.84,"wind_min":9.9,"wind_direction":135,"temperature":26.1,"mslp":1028.6,"rh":65,"datetime":"2015-12-30 15:02:33 ART","unixtime":1451498553,"error_details":""}
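The reply is plain JSON, so once it has been downloaded the two requested values can be extracted with Python's standard `json` module. The sketch below parses the sample reply shown above; the function name `parse_station_json` is hypothetical, and raising an error when `error_details` is non-empty is an assumption about what that field signals.

```python
import json

# Sample reply copied from the example above.
SAMPLE = ('{"wind_avg":10.87,"wind_max":11.84,"wind_min":9.9,'
          '"wind_direction":135,"temperature":26.1,"mslp":1028.6,"rh":65,'
          '"datetime":"2015-12-30 15:02:33 ART","unixtime":1451498553,'
          '"error_details":""}')


def parse_station_json(text):
    """Return (wind_avg, wind_direction) from a station_data_current reply.

    Assumption: a non-empty "error_details" string signals a failed lookup.
    """
    data = json.loads(text)
    if data.get("error_details"):
        raise ValueError("station error: " + data["error_details"])
    return float(data["wind_avg"]), float(data["wind_direction"])


if __name__ == "__main__":
    speed, direction = parse_station_json(SAMPLE)
    print("{:.1f} {:.1f}".format(speed, direction))  # prints "10.9 135.0"
```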
After some more research I finally found a partial solution which may be worth sharing. Using the utility wkhtmltopdf (http://wkhtmltopdf) I can generate a PDF file from the page I am interested in (https://beta.windguru.cz/station/166), and then use pdftotext to extract the information I need. Unfortunately wkhtmltopdf still needs some polishing before I can rely on it: apparently, besides the main version there is also a static version. The main version works well enough to get part of the information (wind speed, but not wind direction), but it only works under X. The static version, on the other hand, runs in a normal terminal, but the generated PDF lacks all the relevant information.
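The wkhtmltopdf/pdftotext route can be scripted roughly as follows. This is a minimal, untested sketch under several assumptions: the names `build_commands` and `page_text` are hypothetical; `--javascript-delay` is used on the guess that the missing values are filled in by the page's JavaScript some time after load; and wrapping the main (X-dependent) build in `xvfb-run -a` is one possible way to give it a virtual X display from a plain terminal, not something verified against this page. Turning the extracted text into numbers still needs page-specific parsing that is not shown.

```python
import subprocess


def build_commands(url, pdf_path, use_xvfb=False, js_delay_ms=5000):
    """Return two argument lists: render page -> PDF, then PDF -> text on stdout."""
    # Give the page's scripts js_delay_ms milliseconds before rendering (assumed sufficient).
    render = ["wkhtmltopdf", "--javascript-delay", str(js_delay_ms), url, pdf_path]
    if use_xvfb:
        # xvfb-run -a starts a throwaway virtual X server on a free display number.
        render = ["xvfb-run", "-a"] + render
    # An output file of "-" makes pdftotext write the text to stdout.
    extract = ["pdftotext", "-layout", pdf_path, "-"]
    return render, extract


def page_text(url, pdf_path="station.pdf", use_xvfb=True):
    """Render `url` to a PDF, then return the PDF's text (raises if either tool fails)."""
    render, extract = build_commands(url, pdf_path, use_xvfb)
    subprocess.run(render, check=True)
    result = subprocess.run(extract, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `print(page_text("https://beta.windguru.cz/station/166"))` would dump the rendered page's text to the terminal, assuming `wkhtmltopdf`, `pdftotext` (from poppler-utils) and `xvfb-run` are installed.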
This runs:
#!/usr/bin/env python3
#
# Usage: ./windguru.py STATION_ID
# Dependency: pip install requests
import sys

import requests

station = sys.argv[1]

# The station page is sent as Referer when querying the JSON endpoint.
url1 = 'https://www.windguru.cz/station/' + station
url2 = 'https://www.windguru.cz/int/iapi.php?q=station_data_current&id_station=' + station

# Send a browser-like User-Agent together with the Referer in a single dict.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
    'Referer': url1,
}
r = requests.get(url2, headers=headers).json()

speed = r['wind_avg']
direction = r['wind_direction']
print("{:.1f} {:.1f}".format(speed, direction))
A simpler version:
#!/usr/bin/env python3
import sys

import requests

station = sys.argv[1]
url1 = 'https://www.windguru.cz/station/' + station
url2 = 'https://www.windguru.cz/int/iapi.php?q=station_data_current&id_station=' + station

# Only the Referer header (the station page) is sent.
headers = {'Referer': url1}
r = requests.get(url2, headers=headers).json()
speed = r['wind_avg']
direction = r['wind_direction']
print("{:.1f} {:.1f}".format(speed, direction))
Idea taken from: "can't scrape a value from Beautifulsoup in python"
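The question asked for a local file rather than terminal output. The sketch below polls the same `station_data_current` endpoint used in the scripts above and appends one timestamped line per reading to a log file. It assumes that endpoint and the Referer trick still work (a comment on the question reports that the API later refused connections); it uses the standard library's `urllib.request` instead of `requests` so it has no dependencies; and the helper names (`build_request`, `format_line`, `fetch`, `poll`), the 10 minute interval and the 10 second timeout are hypothetical choices for illustration.

```python
import datetime
import json
import sys
import time
import urllib.request

BASE = "https://www.windguru.cz"
# Browser-like User-Agent, as in the first script above.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"


def build_request(station):
    """Return the JSON endpoint URL and headers, with the station page as Referer."""
    page = "{}/station/{}".format(BASE, station)
    api = "{}/int/iapi.php?q=station_data_current&id_station={}".format(BASE, station)
    return api, {"Referer": page, "User-Agent": USER_AGENT}


def format_line(data, now):
    """One log line: ISO timestamp, average wind speed, wind direction."""
    return "{} {:.1f} {:.1f}".format(
        now.isoformat(timespec="seconds"), data["wind_avg"], data["wind_direction"]
    )


def fetch(station, timeout_s=10):
    """Download and decode one station_data_current reply."""
    url, headers = build_request(station)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def poll(station, path, interval_s=600):
    """Append one line to `path` every `interval_s` seconds until interrupted."""
    while True:
        try:
            line = format_line(fetch(station), datetime.datetime.now())
            with open(path, "a") as f:
                f.write(line + "\n")
        except (OSError, ValueError, KeyError, TypeError) as exc:
            # Network or file errors (URLError is an OSError), bad JSON, or
            # missing/empty fields: report and retry on the next cycle.
            print("fetch failed:", exc, file=sys.stderr)
        time.sleep(interval_s)
```

Calling `poll("166", "wind.log")` would keep appending lines of the form `2015-12-30T15:02:33 10.9 135.0` to `wind.log` until interrupted.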