javascript - Scraping a webpage that is using a firebase database - Stack Overflow

DISCLAIMER: I'm just learning by doing, I have no bad intentions

So, I would like to fetch the list of the applications listed on this website: http://roaringapps./apps

I've done similar things in the past, but with simpler websites; this time I'm having problems getting my hands on the data behind this webpage.

The scrolling from page to page is blazing fast so, to understand how the webpage works, I've fired up a packet sniffer and analyzed the traffic. I've noticed that, after the initial loading, no traffic is exchanged between the server and my client, even if I scroll over 2500 records in the browser. How is that possible?

Anyhow, my understanding is that the website is loading the data from a stream of some sort and rendering it via JavaScript. Am I correct?

So, I've fired up Chromium DevTools and looked at the "Network" tab, and saw that a WebSocket request is made to the following address: wss://s-usc1c-nss-123.firebaseio.

At this point, after googling a bit, I've tried to query the very same server, using the "v=5&ns=roaringapps" query I saw in the DevTools window:

import json
from websocket import create_connection  # from the websocket-client package

ws = create_connection('wss://s-usc1c-nss-123.firebaseio.')
ws.send('v=5&ns=roaringapps')
print(json.loads(ws.recv()))

And got this reply:

{u't': u'c', u'd': {u't': u'h', u'd': {u'h': u's-usc1c-nss-123.firebaseio.', u's': u'JUL5t1nC2SXfGaIjwecB6G13j1OsmMVv', u'ts': 1476799051047L, u'v': u'5'}}}

I was expecting to see a JSON response with the raw data about applications and so on. What am I doing wrong?

Thanks a lot!

UPDATE

Actually, I just found out that the website is using JSON to load its data. I was not seeing it in repeated requests, probably because of caching, but disabling the cache in Chromium did the trick.

Asked Oct 18, 2016 at 14:11 by Delta, edited Oct 19, 2016 at 6:43

1 Answer

While the Firebase Database allows you to read/write JSON data, its SDKs don't simply transfer the raw JSON data; they do many tricks on top of that to ensure an efficient and smooth experience. What you're getting there is Firebase's wire protocol. The protocol is not publicly documented and (if you're new to it) trying to unravel it is going to give you an unpleasant time.

To retrieve the actual JSON at a location, it's easiest to use Firebase's REST API. You can reach it by simply appending .json to the location's URL and firing an HTTP GET request against the result.

So if the initial data is being loaded from:

https://mynamespace.firebaseio./path/to/data

You'd get the raw JSON by firing an HTTP GET against:

https://mynamespace.firebaseio./path/to/data.json
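
As a rough sketch of that approach in Python (the language used in the question), the snippet below appends .json to a location URL and fetches the result with the standard library. The function names rest_url and fetch_json are illustrative, not part of any Firebase SDK, and the hostname in the comment is a placeholder, since the real host is truncated above. It also assumes the location is publicly readable.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def rest_url(location_url):
    """Build the REST endpoint for a database location by appending .json to its path.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(location_url)
    path = parts.path.rstrip("/")          # drop a trailing slash before appending
    if not path.endswith(".json"):
        path += ".json"
    # Any existing query string or fragment is carried over unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_json(location_url, timeout=10):
    """Fire an HTTP GET against the REST endpoint and decode the JSON body."""
    with urlopen(rest_url(location_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (placeholder host, assumed to be publicly readable):
# data = fetch_json("https://mynamespace.example.test/path/to/data")
```

If the request succeeds, fetch_json returns the decoded JSON as ordinary Python objects, so there is no need to deal with the WebSocket handshake shown in the question.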