
pandas - I am working with AIS data in Python and I have a problem - Stack Overflow


The code below runs several calculations and works well for filtering the data down to the polygon. However, most of the results are inaccurate because of how AIS works: each ship is assigned a unique track ID and emits tracking data roughly every half second.

I am interested in southward-moving ships. The problem is that some ships pass through the polygon twice, so both their northbound and southbound data end up in the total time and total distance. I cannot simply filter out every northward entry either: even when a ship is moving south, waves can push it backwards, which registers a latitude that makes the entry look like a northward movement. Removing every such entry would alter the data too much. Any suggestions on what to do?

Here is the code:

    from shapely.geometry import Point, Polygon
    import pandas as pd
    from geopy.distance import geodesic

    # Define the polygon (longitude, latitude)
    polygon_coords = [
        (3.816744, 51.343359),  # Top-left near pier
        (3.8215, 51.3434),      # Top-right near pier
        (3.821582, 51.331171),  # Bottom-right (southernmost point)
        (3.819723, 51.331171),  # Bottom-left (southernmost point)
    ]

    # Create a Polygon object
    polygon = Polygon(polygon_coords)

    # Filter dataset to keep only points inside the polygon
    ais_end_south = ais_one_day[
        ais_one_day.apply(
            lambda row: polygon.contains(
                Point(row["longitude.value"], row["latitude.value"])
            ),
            axis=1
        )
    ].copy()

    # Ensure timestamp is in datetime format
    ais_end_south["timestamp"] = pd.to_datetime(
        ais_end_south["timestamp"], errors="coerce"
    )

    # Sort by track_id and timestamp
    ais_end_south = ais_end_south.sort_values(by=["track_id", "timestamp"])

    # Compute next latitude, longitude, and timestamp
    ais_end_south["next_lat"] = ais_end_south.groupby("track_id")["latitude.value"].shift(-1)
    ais_end_south["next_lon"] = ais_end_south.groupby("track_id")["longitude.value"].shift(-1)
    ais_end_south["next_timestamp"] = ais_end_south.groupby("track_id")["timestamp"].shift(-1)

    # Remove all northward movements
    ais_end_south = ais_end_south[
        ais_end_south["latitude.value"] > ais_end_south["next_lat"]
    ]

    # Find the southernmost latitude within the polygon
    southernmost_lat = min(p[1] for p in polygon_coords)

    # Function to compute the distance between two points in km
    # (named haversine_distance, but it uses geopy's geodesic distance)
    def haversine_distance(lat1, lon1, lat2, lon2):
        if pd.notnull(lat1) and pd.notnull(lon1) and pd.notnull(lat2) and pd.notnull(lon2):
            return geodesic((lat1, lon1), (lat2, lon2)).km
        return 0

    # Compute segment distance
    ais_end_south["segment_distance_km"] = ais_end_south.apply(
        lambda row: haversine_distance(
            row["latitude.value"], row["longitude.value"],
            row["next_lat"], row["next_lon"]
        ),
        axis=1
    )

    # Compute time difference but stop tracking after the southernmost point
    ais_end_south["time_diff_sec"] = ais_end_south.apply(
        lambda row: (row["next_timestamp"] - row["timestamp"]).total_seconds()
        if row["latitude.value"] > southernmost_lat else 0,
        axis=1
    )

    # Compute total southward travel time (before reaching the southernmost point)
    ais_end_south["total_time_min"] = (
        ais_end_south.groupby("track_id")["time_diff_sec"].transform("sum") / 60
    )

    # Compute total southward travel distance
    ais_end_south["total_distance_km"] = (
        ais_end_south.groupby("track_id")["segment_distance_km"].transform("sum")
    )

    # Filter out any movements **after** the ship reaches the southernmost latitude
    ais_end_south = ais_end_south[ais_end_south["latitude.value"] > southernmost_lat]

    # Reset index
    ais_end_south = ais_end_south.reset_index(drop=True)
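
One direction I am considering, as a rough sketch and not a tested solution: instead of judging direction entry by entry, split each track into separate passes through the polygon and classify each whole pass by its net latitude change. A single wave-induced backward step would then no longer matter. The helper name `label_passes`, the 10-minute gap threshold, the new columns `pass_id` and `net_dlat`, and the demo values are all assumptions made up for illustration; only `track_id`, `timestamp` and `latitude.value` come from my data.

```python
import pandas as pd


def label_passes(df, gap="10min"):
    """Split each track into passes and tag each pass with its net latitude change.

    Assumes df already holds only in-polygon rows, with columns
    track_id, timestamp (datetime64) and latitude.value.
    The gap threshold is a guess and would need tuning.
    """
    df = df.sort_values(["track_id", "timestamp"]).copy()
    # Time since the previous in-polygon sample of the same ship
    dt = df.groupby("track_id")["timestamp"].diff()
    # A long gap suggests the ship left the polygon and came back: new pass
    new_pass = (dt.isna() | (dt > pd.Timedelta(gap))).astype(int)
    df["pass_id"] = new_pass.groupby(df["track_id"]).cumsum()
    # Net latitude change over the whole pass (last sample minus first sample)
    lat = df.groupby(["track_id", "pass_id"])["latitude.value"]
    df["net_dlat"] = lat.transform("last") - lat.transform("first")
    return df


# Made-up example: one ship heads south (with one small backward step),
# then comes back northbound an hour later.
demo = pd.DataFrame({
    "track_id": [1, 1, 1, 1, 1, 1, 1],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00:00", "2024-01-01 10:00:10",
        "2024-01-01 10:00:20", "2024-01-01 10:00:30",
        "2024-01-01 11:00:00", "2024-01-01 11:00:10",
        "2024-01-01 11:00:20",
    ]),
    "latitude.value": [51.343, 51.340, 51.3401, 51.335,
                       51.332, 51.336, 51.342],
})

labelled = label_passes(demo)
# Keep only passes whose overall movement is southward
southbound = labelled[labelled["net_dlat"] < 0]
```

With this, the totals could be grouped by `["track_id", "pass_id"]` instead of `track_id` alone, so a northbound return trip would not be mixed into the southbound time and distance. Whether a time gap is a reliable way to detect a second pass in real AIS data is something I have not verified.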