web scraping - Connect to socket.io xhr request with python - Stack Overflow

I'm trying to retrieve some data from here, namely games and odds. I know the data is in the response of this GET request as shown in the network tab below:

However, we can see that a WebSocket protocol is involved, and I'm not sure how to handle it.

I should mention that I'm new to Python (I usually code in R) and to WebSockets, but I managed to find the Socket.IO path in the page's code, so here is what I've tried:

import socketio

sio = socketio.Client(logger=True, engineio_logger=True)

@sio.event
def connect():
    print('connected!')
    sio.emit('add user', 'Testing')

@sio.event
def print_message(sid):
    print("Socket ID: ", sid)

@sio.event
def disconnect():
    print('disconnected!')

sio.connect('https://sports-eu-west-3.winamax.fr', transports=['websocket'], socketio_path='/uof-sports-server/socket.io')
sio.wait()

I'm able to connect, but I'm not sure where to go from there to get the actual response from the GET request above.

Any hints are appreciated.

asked Mar 20 at 13:52 by M.O
  • These requests usually come with some form of authentication. Check the headers! – Klaus D. Commented Mar 20 at 13:58
  • I’ve already tried using GET requests with all the required headers, but they return status 400. – M.O Commented Mar 20 at 15:56
  • Well, many sites try to prevent exactly what you are doing and have measures in place against it. – Klaus D. Commented Mar 20 at 17:54

1 Answer

I believe you were quite close; you just need to emit events that the server also recognizes. Most of the data exchange there goes through "m" events.

I didn't test with the current python-socketio release, but according to its version compatibility table, python-socketio v4.x should be used here. The target Socket.IO version is probably v2.5.0, guessed from the header of the bundled uof-sports-server/socket.io/socket.io.js.
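As a quick sanity check before connecting, the compatibility rule can be written down as a small lookup. This is only a sketch: the mapping reflects my reading of the python-socketio compatibility table (JavaScript Socket.IO 1.x/2.x servers pair with python-socketio 4.x, 3.x/4.x servers with 5.x), and the helper names `required_client_major` and `installed_client_major` are made up for illustration.

```python
from importlib import metadata
from typing import Optional

# Assumed mapping, read off the python-socketio version compatibility table:
# JavaScript Socket.IO major version -> python-socketio major version.
REQUIRED_CLIENT_MAJOR = {1: 4, 2: 4, 3: 5, 4: 5}


def required_client_major(server_version: str) -> int:
    """Return the python-socketio major version for a JS Socket.IO version string."""
    major = int(server_version.lstrip("v").split(".")[0])
    return REQUIRED_CLIENT_MAJOR[major]


def installed_client_major() -> Optional[int]:
    """Return the installed python-socketio major version, or None if it is missing."""
    try:
        return int(metadata.version("python-socketio").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    wanted = required_client_major("v2.5.0")  # version guessed from socket.io.js
    print("need python-socketio", wanted, "- installed:", installed_client_major())
```

If the installed major version does not match, the handshake will probably fail before any event handlers run, so this is worth checking first.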

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "python-socketio[client]<5.0",
# ]
# ///
import socketio
import pprint
import uuid

sio = socketio.Client(
    logger=True,
    # engineio_logger=True
)
requestId = str(uuid.uuid4())

# connect & emit "m" event
@sio.event
def connect():
    print("connected!")
    data = dict(route="tournament:4", requestId=requestId)
    print("sending", data)
    sio.emit("m", data)

# wait for "m" event with matching requestId, then disconnect
@sio.on("m")
def m_response(data):
    if data.get("requestId") == requestId:
        pprint.pp(data.keys())
        pprint.pp([match["title"] for match in data["matches"].values()])
        sio.disconnect()

@sio.event
def disconnect():
    print("disconnected!")

sio.connect(
    url="https://sports-eu-west-3.winamax.fr",
    transports=["websocket"],
    socketio_path="/uof-sports-server/socket.io/",
)
sio.wait()

(You can use uv to resolve the dependencies from the script's inline metadata.)

$ uv run winamax_socketio.py
Engine.IO connection established
Namespace / is connected
connected!
sending {'route': 'tournament:4', 'requestId': '6e9ee3d4-0bcb-45f4-ab6c-652379f234cb'}
Emitting event "m" [/]
Received event "m" [/]
dict_keys(['tournaments', 'matches', 'bets', 'outcomes', 'odds', 'requestId'])
['Angers - Rennes',
 'Auxerre - Montpellier',
 'Le Havre - Nantes',
 'Strasbourg - Lyon',
 'Toulouse - Brest',
 'Saint-Étienne - Paris SG',
 'Reims - Marseille',
 'Lille - Lens',
 'Monaco - Nice',
 'Marseille - Toulouse',
 'Nice - Nantes',
 'Brest - Monaco',
 'Montpellier - Le Havre',
 'Lyon - Lille',
 'Lens - Saint-Étienne',
 'Paris SG - Angers',
 'Reims - Strasbourg',
 'Rennes - Auxerre',
 "Ligue 1 McDonald's® 2024/25"]
Engine.IO connection dropped
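The request/response matching used above can also be factored into small helpers that are testable without a network connection. This is a rough sketch, not something I ran against the site: the helper names are invented, the sample reply is a hypothetical, trimmed-down payload that only reuses keys visible in the output above, and whether the server ever pushes "m" events without a requestId is an assumption.

```python
import uuid


def build_request(route):
    """Build an "m" payload carrying a fresh requestId for the given route."""
    return {"route": route, "requestId": str(uuid.uuid4())}


def is_reply_to(request, data):
    """True if an incoming "m" payload echoes the requestId we sent."""
    return isinstance(data, dict) and data.get("requestId") == request["requestId"]


def match_titles(data):
    """Collect match titles; entries without a "title" key are skipped."""
    matches = data.get("matches") or {}
    return [m["title"] for m in matches.values() if "title" in m]


# Hypothetical reply used only to exercise the helpers offline;
# the match ids "1" and "2" are placeholders, not real ids.
request = build_request("tournament:4")
sample_reply = {
    "requestId": request["requestId"],
    "matches": {
        "1": {"title": "Angers - Rennes"},
        "2": {"title": "Lille - Lens"},
    },
}
unrelated_push = {"matches": {}}  # assumed shape of an "m" event we did not request

if __name__ == "__main__":
    print(is_reply_to(request, sample_reply), match_titles(sample_reply))
```

Inside the "m" handler you would call `is_reply_to(request, data)` first and only then read the payload, which keeps the handler short when several routes are requested.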

To help with such tasks and to check communication flows against known working examples, you might want to look into debugging proxies (mitmproxy, Telerik Fiddler, HTTP Toolkit, ...).
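When reading WebSocket frames in a proxy or in the browser's network tab, Socket.IO events show up as text such as `42["m",{...}]`: `4` is the Engine.IO "message" packet type and `2` is the Socket.IO "event" packet type. The sketch below decodes such frames so captured traffic can be compared with what your script sends. It assumes the text framing used by Socket.IO v2 on the default namespace; `parse_event_frame` is a made-up helper name, and namespaced or binary packets are deliberately not handled.

```python
import json


def parse_event_frame(frame):
    """Decode a text frame like '42["m",{...}]' into (event_name, args).

    Returns None for anything that is not a default-namespace event packet.
    """
    if not frame.startswith("42"):
        return None  # ping/pong, handshake, or another packet type
    start = frame.find("[")
    if start == -1:
        return None
    prefix = frame[2:start]  # optional numeric ack id sits between "42" and "["
    if prefix and not prefix.isdigit():
        return None  # e.g. a namespaced packet such as 42/ns,[...]
    event, *args = json.loads(frame[start:])
    return event, args


if __name__ == "__main__":
    captured = '42["m",{"route":"tournament:4","requestId":"abc"}]'
    print(parse_event_frame(captured))
```

Running captured frames through a decoder like this makes it easier to spot which event names and payload keys the site's own client uses.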
