I need to run scripts on Cloudera CDP. This has to be done via MWAA (Amazon Managed Workflows for Apache Airflow) on AWS. Here are the details of the MWAA environment:
- Airflow version 2.10.1
- Python Version 3.11
- Private Webserver
- Connected to VPC
- Library used: apache-airflow-providers-ssh
Attempt 1:
Created the SSHOperator below:
run_script = SSHOperator(
    task_id="run_test_script",
    ssh_conn_id=None,  # No predefined connection, using retrieved values
    command="sudo -u etl_app bash /path/to/test.sh",
    remote_host=hostname,
    username=username,
    password=password,  # Required for password authentication
    dag=dag,
)
Here the host, username and password are retrieved from AWS Secrets Manager.
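A minimal sketch of such a retrieval, assuming the secret is stored as a JSON string with host, username and password keys. The key names, the function names and the idea of a single JSON secret are all hypothetical placeholders, not the actual layout:

```python
import json


def parse_ssh_secret(secret_string):
    """Turn the JSON secret string into (hostname, username, password)."""
    data = json.loads(secret_string)
    # Key names are assumptions; adjust to however the secret is laid out.
    return data["host"], data["username"], data["password"]


def fetch_ssh_secret(secret_id, region_name):
    """Read the secret from AWS Secrets Manager and parse it."""
    import boto3  # imported lazily so the parser above works without boto3

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_id)
    return parse_ssh_secret(response["SecretString"])
```

Calling fetch_ssh_secret inside the task rather than at module level would keep the Secrets Manager call out of DAG parsing, though that is a design choice rather than a requirement.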
This DAG was not imported in MWAA and failed with a module-not-found error for airflow.providers.ssh.
Fix attempted:
Created a requirements.txt with the following content and placed it in the MWAA bucket under the requirements folder, but got the same error. The CloudWatch logs showed the error below for the requirements install:
The install retries multiple times and fails with NoConnectionError, and the log also reports conflicting library versions.
Attempt 2:
Further changed requirements.txt as below:
-c .10.1/constraints-3.11.txt
apache-airflow-providers-ssh
Still facing the same error.
Attempt 3:
Placed the .whl files in plugin.zip, uploaded it to the S3 bucket for MWAA, and referenced it in the MWAA environment. Still the same error.
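To tell whether the provider was never installed or was installed but is not importable, a small standard-library check like the one below could be run from a throwaway task on the environment. This is only a sketch; check_package is a hypothetical helper name, and the distribution and module names passed to it are the ones from this question:

```python
import importlib.util
from importlib import metadata


def check_package(dist_name, module_name):
    """Report whether a distribution is installed and its module importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    try:
        importable = importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "airflow") is itself missing.
        importable = False
    return {"distribution": dist_name, "version": version, "importable": importable}
```

For example, printing check_package("apache-airflow-providers-ssh", "airflow.providers.ssh") in a PythonOperator task would put the result in that task's log.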
How can I overcome this and fix it?