
airflow - Why would TaskInstance rendered_fields not get returned from dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}?


It is my understanding (which may be mistaken) that a call to the API endpoint

  • api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}

on Airflow's API returns a response containing a rendered_fields attribute, which provides the same information as visiting

  • rendered-templates?dag_id=DAG_ID&task_id=TASK_ID&execution_date=URL_ENCODED_DAG_RUN_ID

in the UI. This seems to hold true in all but one of the task instances I have called this API endpoint for. For example, from a DAG run yesterday:

I can get the same information via the API:

DAG_RUN_ID=manual__2025-02-05T22:18:50.997520+00:00
TASK_ID=wave-splitting
URL=${BASE_URL}/api/v1/dags/${DAG_ID}/dagRuns/${DAG_RUN_ID}/taskInstances/${TASK_ID}
curl -X GET \
  -u ${AIRFLOW_API_USERNAME}:${AIRFLOW_API_PASSWORD} \
  $URL | jq "{dag_run_id, arguments: .rendered_fields.arguments, env_vars: .rendered_fields.env_vars}"

which returns

{
  "dag_run_id": "manual__2025-02-05T22:18:50.997520+00:00",
  "arguments": [
    "python",
    "markets/REDACTEDwave_splitting.py",
    "--dag_id",
    "REDACTED",
    "--run_id",
    "manual__2025-02-05T22:18:50.997520+00:00",
    "--log_url",
    ".997520%2B00%3A00&task_id=wave-splitting&dag_id=REDACTED&map_index=-1",
    "--excluded_customer_ids",
    "",
    "--excluded_subscription_ids",
    "",
    "--run_at",
    "",
    "--dl_type",
    "pdl",
    "--ordinal",
    "6"
  ],
  "env_vars": "[{'name': 'AWS_AU_ACCESS_KEY_ID',\n 'value': 'REDACTED',\n 'value_from': None}, {'name': 'AWS_AU_SECRET_ACCESS_KEY',\n 'value': 'REDACTED',\n 'value_from': None}, {'name': 'AWS_DEFAULT_REGION', 'value': 'eu-west-1', 'value_from': None}, {'name': 'DB_DSN',\n 'value': 'postgresql://REDFACTED-staging-master.cr0REDACTEDrds.amazonaws:5432/katana',\n 'value_from': None}, {'name': 'ENVIRONMENT_NAME', 'value': 'staging', 'value_from': None}, {'name': 'GCP_MARVIN_USER_ACCOUNT',\n 'value': '{\"refresh_token\"}}]
}

As you can see, the UI and the API return the same thing.
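
For reference, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not Airflow's own client: the helper names `task_instance_url`, `summarize`, and `fetch_task_instance` are hypothetical, and it assumes the deployment accepts basic authentication, as in the curl call above. The DAG run ID is percent-encoded because it contains `:` and `+`.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def task_instance_url(base_url, dag_id, dag_run_id, task_id):
    """Build the task-instance endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (dag_id, dag_run_id, task_id)]
    return "{}/api/v1/dags/{}/dagRuns/{}/taskInstances/{}".format(
        base_url.rstrip("/"), *segments
    )


def summarize(task_instance):
    """Mirror the jq filter: keep dag_run_id plus two rendered fields."""
    rendered = task_instance.get("rendered_fields") or {}
    return {
        "dag_run_id": task_instance.get("dag_run_id"),
        "arguments": rendered.get("arguments"),
        "env_vars": rendered.get("env_vars"),
    }


def fetch_task_instance(url, username, password):
    """GET the URL with basic auth and return the parsed JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `summarize(fetch_task_instance(url, username, password))` should then produce the same three keys as the jq filter, with `None` in place of `null` when a field is missing.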

However, there is a DAG run from 21st January 2025 for which this is not true. Here is what the UI shows for the same task in that DAG run:

Running the same request for that DAG run:

DAG_RUN_ID=manual__2025-01-21T09:56:44.625312+00:00
URL=${BASE_URL}/api/v1/dags/${DAG_ID}/dagRuns/${DAG_RUN_ID}/taskInstances/${TASK_ID}
curl -X GET \
  -u ${AIRFLOW_API_USERNAME}:${AIRFLOW_API_PASSWORD} \
  $URL | jq "{dag_run_id, arguments: .rendered_fields.arguments, env_vars: .rendered_fields.env_vars}"

returns:

{
  "dag_run_id": "manual__2025-01-21T09:56:44.625312+00:00",
  "arguments": null,
  "env_vars": null
}

I find this very odd. There is essentially nothing in rendered_fields, as the entire response shows:

❯ curl -X GET \
  -u ${AIRFLOW_API_USERNAME}:${AIRFLOW_API_PASSWORD} \
  $URL
{
  "dag_id": "REDACTED",
  "dag_run_id": "manual__2025-01-21T09:56:44.625312+00:00",
  "duration": 26.226917,
  "end_date": "2025-01-21T10:02:55.236799+00:00",
  "execution_date": "2025-01-21T09:56:44.625312+00:00",
  "executor_config": "{}",
  "hostname": "REDACTED-wave-splitting-9fu4md94",
  "map_index": -1,
  "max_tries": 3,
  "note": null,
  "operator": "KubernetesPodOperator",
  "pid": 27,
  "pool": "default_pool",
  "pool_slots": 1,
  "priority_weight": 10,
  "queue": "default",
  "queued_when": "2025-01-21T10:02:07.879187+00:00",
  "rendered_fields": {},
  "sla_miss": null,
  "start_date": "2025-01-21T10:02:29.009882+00:00",
  "state": "success",
  "task_id": "wave-splitting",
  "trigger": null,
  "triggerer_job": null,
  "try_number": 1,
  "unixname": "airflow"
}

Notice "rendered_fields": {}
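
To see how many DAG runs are affected, the same check could be applied to several responses at once. A minimal sketch, assuming the task-instance responses have already been fetched and parsed into dicts; `runs_missing_rendered_fields` is a hypothetical helper name, and the sample data is a trimmed copy of the two responses in this post.

```python
def runs_missing_rendered_fields(task_instances):
    """Return the dag_run_id of every task instance whose rendered_fields is empty or absent."""
    return [
        ti.get("dag_run_id")
        for ti in task_instances
        if not ti.get("rendered_fields")
    ]


# Trimmed copies of the two responses shown in this post.
responses = [
    {
        "dag_run_id": "manual__2025-02-05T22:18:50.997520+00:00",
        "rendered_fields": {"arguments": ["python"]},
    },
    {
        "dag_run_id": "manual__2025-01-21T09:56:44.625312+00:00",
        "rendered_fields": {},
    },
]

# Only the 21st January run is reported as missing its rendered fields.
print(runs_missing_rendered_fields(responses))
```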

Why would the API return different information for two instances of the same task in different DAG runs?
