
azure devops - CalledProcessError when running dbt in Databricks - Stack Overflow


I'm trying to build/schedule a dbt project (sourced in Azure DevOps) in Databricks Workflows. However, whenever I run dbt there, I get the following error message:

CalledProcessError: Command 'b'\nmkdir -p "/tmp/tmp-dbt-run-1124228490001263"\nunexpected_errors="$(cp -a -u "/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/." "/tmp/tmp-dbt-run-1124228490001263" 2> >(grep -v \'Operation not supported\'))"\nif [[ -n "$unexpected_errors" ]]; then\n  >&2 echo -e "Unexpected error(s) encountered while copying:\n$unexpected_errors"\n  exit 1\nfi\n        returned non-zero exit status 1.

Unexpected error(s) encountered while copying:
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/3d_drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/algorithms/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/basic/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/graph/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/subclass/__pycache__': No such file or directory

I gather the issue arises when the repo files are copied to a temporary directory, but I don't know how to solve it. Any ideas?

These are the task settings:

resources:
  jobs:
    otd:
      name: otd
      email_notifications:
        on_failure:
          - [email protected]
        no_alert_for_skipped_runs: true
      notification_settings:
        no_alert_for_skipped_runs: true
        no_alert_for_canceled_runs: true
      tasks:
        - task_key: otd_dbt
          dbt_task:
            project_directory: ""
            commands:
              - dbt deps
              - dbt build -s +otd_total
            schema: gold
            warehouse_id: xxxxxxxxxxx
            catalog: logistics_prd
            source: GIT
          job_cluster_key: dbt_CLI
          libraries:
            - pypi:
                package: dbt-databricks>=1.0.0,<2.0.0
      job_clusters:
        - job_cluster_key: dbt_CLI
          new_cluster:
            cluster_name: ""
            spark_version: 15.4.x-scala2.12
            spark_conf:
              spark.master: local[*, 4]
              spark.databricks.cluster.profile: singleNode
            azure_attributes:
              first_on_demand: 1
              availability: ON_DEMAND_AZURE
              spot_bid_max_price: -1
            node_type_id: Standard_D4ds_v5
            custom_tags:
              ResourceClass: SingleNode
            spark_env_vars:
              PYSPARK_PYTHON: /databricks/python3/bin/python3
            enable_elastic_disk: true
            data_security_mode: SINGLE_USER
            runtime_engine: PHOTON
            num_workers: 0
      git_source:
        git_url: https://dev.azure/copa-energia/Logistics/_git/dbt_logistica
        git_provider: azureDevOpsServices
        git_branch: main
      queue:
        enabled: true

Please feel free to ask me for more details.


asked Mar 10 at 15:37 by Maurício Schwartsman, edited Mar 11 at 11:45
  • Please share the detailed steps to reproduce the issue, including how you integrate Databricks with Azure DevOps and how you build your dbt project in the Databricks workflow. – Ziyang Liu-MSFT, Mar 11 at 1:40
  • I've added the job settings. I'll provide more info if necessary. – Maurício Schwartsman, Mar 11 at 11:49

1 Answer


As it turns out, the solution was simpler than I expected.

Since those files are not necessary, I could simply remove them from the repo and add them to .gitignore:

venv/
__pycache__/
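
For reference, a minimal sequence to untrack the already-committed files, run from the repo root (assuming the virtual environment is tracked at venv/; the --cached flag removes the files from Git without deleting them from disk):

# Stop tracking the directories; the local copies stay on disk.
git rm -r --cached venv/
git add .gitignore
git commit -m "Remove venv and __pycache__ from version control"
git push

Once the new commit is on the branch the job tracks, the copy step should no longer reference those paths.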