Efficiently Debug Google Cloud Composer PyPI Package Installation Issues

Mark MacArdle
Published in Analytics Vidhya, Mar 19, 2020

Cloud Composer is Google’s managed service for Apache Airflow. Its environments have a number of packages preinstalled, but the versions can be out of date. It’s possible to update these and install other packages you need.

However, conflicts between package dependencies can block updates to existing packages or installs of new ones. Google also frequently updates the Composer/Airflow images available for new environments, so if you do hit conflicts, you may have to resolve a new set of them each time you upgrade the image version.

What makes this especially painful is that Composer can take 20–40 minutes to apply your updates before throwing an error, which makes trying out possible fixes a very slow process.

The steps below describe how to recreate the packages installed on the Composer environment locally and use the pipenv package manager to resolve conflicting dependencies. This allows quick testing and modification, and the needed changes can be uploaded to Composer at the end.

1. Connect to a cluster worker and do a pip freeze

Steps taken from here. Get the cluster and zone from the Environment Configuration tab in Composer.

$ gcloud container clusters get-credentials projects/bought-by-many/zones/europe-west2-c/clusters/europe-west2-test-data-ware-8052197d-gke --zone europe-west2-c

List the pods in every namespace:

$ kubectl get pods --all-namespaces

From the output, note the NAMESPACE of a composer-* row and the NAME of an airflow-worker-* pod in that namespace.
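If you would rather not scan the table by eye, awk can pick the two values out. This is a minimal sketch, assuming the default kubectl get pods --all-namespaces layout with NAMESPACE in the first column and NAME in the second; find_worker is a hypothetical helper name, not part of kubectl.

```shell
# Hypothetical helper (not part of kubectl): reads the output of
# `kubectl get pods --all-namespaces` on stdin and prints the namespace and
# pod name of the first row whose namespace starts with composer- and whose
# pod name starts with airflow-worker-.
find_worker() {
  awk '$1 ~ /^composer-/ && $2 ~ /^airflow-worker-/ { print $1, $2; exit }'
}
```

Piping kubectl get pods --all-namespaces into find_worker prints the namespace and pod name separated by a space, ready to paste into the kubectl exec command.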

Connect to a worker:

$ kubectl exec -itn composer-1-10-0-airflow-1-10-6-5983e0fe airflow-worker-8d8c49c87-9v7c4 -- /bin/bash

Once connected, print the installed packages:

airflow@airflow-worker-8d8c49c87-9v7c4:~$ pip freeze

Copy/paste the output into a text file called original_req.txt. It’ll be used later to compare changes against.
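Alternatively, the freeze can be written straight to a local file from your own machine, which skips both the interactive shell and the copy/paste. This is a sketch, assuming kubectl is already pointed at the cluster by the get-credentials step above and that the pod's default container is the same one the interactive kubectl exec opens; freeze_worker is a hypothetical helper name.

```shell
# Hypothetical helper: run `pip freeze` inside an Airflow worker pod and save
# the result to a local file.
#   $1: the composer-* namespace
#   $2: the airflow-worker-* pod name
#   $3: local output file, e.g. original_req.txt
freeze_worker() {
  # No -it here: nothing is interactive, and allocating a TTY can add
  # carriage returns to the saved output.
  kubectl exec -n "$1" "$2" -- pip freeze > "$3"
}
```

For example: freeze_worker composer-1-10-0-airflow-1-10-6-5983e0fe airflow-worker-8d8c49c87-9v7c4 original_req.txt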

In the pip freeze output I got the following lines:

# Editable install with no version control (apache-airflow===1.10.6-composer) 
-e /usr/local/lib/airflow

and manually changed them to just:

apache-airflow==1.10.6 
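Because the freeze has to be redone each time the image is upgraded, this hand edit can also be scripted. This is a minimal sketch, assuming the two lines look exactly as shown above and that you supply the Airflow version yourself (1.10.6 here); fix_airflow_line is a hypothetical helper name. sed -i.bak is used because that form edits in place with both GNU and BSD/macOS sed, leaving a .bak backup.

```shell
# Hypothetical helper: replace Composer's editable Airflow install in a
# pip freeze file with a plain pinned requirement.
#   $1: requirements file to edit in place (a .bak copy is kept)
#   $2: Airflow version to pin, e.g. 1.10.6
fix_airflow_line() {
  sed -i.bak \
    -e '/^# Editable install with no version control (apache-airflow/d' \
    -e "s|^-e /usr/local/lib/airflow[[:space:]]*\$|apache-airflow==$2|" \
    "$1"
}
```

For example: fix_airflow_line original_req.txt 1.10.6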

2. Create a new pipenv environment

Put original_req.txt into a new folder and, from inside that folder, create a Pipfile from it with:

$ pipenv install -r original_req.txt

3. Add in your packages and resolve conflicts

In the Pipfile, add the extra packages you need and update the version numbers of any preinstalled packages that need it.

Using my-package = "*" or my-package = ">=1.2.0" here can make resolving the conflicts easier, because it leaves pipenv free to choose any version in the range rather than one exact pin.

Run pipenv lock. If conflicts are flagged, as in the example below, note the package that’s causing the issue: grpc-google-iam-v1 in this case.

...
[pipenv.exceptions.ResolutionFailure]: Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.
First try clearing your dependency cache with $ pipenv lock --clear, then try the original command again.
Alternatively, you can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: ERROR: Could not find a version that matches grpc-google-iam-v1<0.12dev,<0.13dev,>=0.11.4,>=0.12.3
Tried: 0.9.0, 0.10.0, 0.10.1, 0.11.1, 0.11.3, 0.11.4, 0.12.0, 0.12.1, 0.12.2, 0.12.3 ...
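When the lock output is long, the package name can be extracted from a saved log rather than read by eye. This is a sketch, assuming the error uses the same "Could not find a version that matches" wording as above (other pip and pipenv versions may phrase it differently); conflicting_package is a hypothetical helper name, and lock.log is just an example file name.

```shell
# Save the lock output first, for example:
#   pipenv lock 2>&1 | tee lock.log
#
# Hypothetical helper: print each package named in a
# "Could not find a version that matches <package><specifiers>" error.
#   $1: saved log file
conflicting_package() {
  sed -n 's/.*Could not find a version that matches \([A-Za-z0-9._-]*\).*/\1/p' "$1" | sort -u
}
```

On the log above, conflicting_package lock.log would print grpc-google-iam-v1.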

Use pipenv install --skip-lock to let the packages be installed anyway.

Then use pipenv graph > graph.txt to output the dependency list.

In graph.txt, search for the problem package (e.g. grpc-google-iam-v1) and find the packages that depend on it with the version requirements that are causing the conflict.
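This search can be scripted too. This is a sketch, assuming the usual pipenv graph text layout, where each top-level package starts at column 0 and its dependencies follow as indented "- name [required: ..., installed: ...]" lines; who_requires is a hypothetical helper name. It prints every top-level package under which the problem package appears, directly or further down the tree, along with the version range that line requires.

```shell
# Hypothetical helper: list the top-level packages in `pipenv graph` output
# (read on stdin) under which a given package appears, with the requirement line.
#   $1: package name to look for (compared case-insensitively)
who_requires() {
  awk -v pkg="$1" '
    /^[^ ]/ { top = $1; next }            # unindented line: a top-level package
    tolower($2) == tolower(pkg) {         # indented "- name [required: ...]" line
      line = $0
      sub(/^ +/, "", line)                # drop the indentation
      print top ": " line
    }
  '
}
```

For example: who_requires grpc-google-iam-v1 < graph.txt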

Update those packages’ versions in the Pipfile, or loosen them to xyz = "*".

Run pipenv lock again and repeat the process. Eventually pipenv install should complete with no errors.

4. Create a requirements.txt of all changed and additional packages

Output the new requirements with pipenv run pip freeze > new_req.txt.

Compare this with original_req.txt and put all the changed or added lines into a new file, diffed.txt. The bash command below does this automatically. The tr part of the command converts everything to lower case, which is needed when uploading a package list to Composer.

$ diff original_req.txt new_req.txt | grep "^>" | cut -c 3- | tr A-Z a-z > diffed.txt
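diff matches lines by their position in each file, so a package whose place in the list shifts could, in principle, be reported as added even though its version is unchanged. An alternative sketch uses comm, which compares the two files as sorted lists and prints only the lines unique to the new freeze, so no grep or cut step is needed. Both files are lower-cased first and sorted with LC_ALL=C so that comm sees them in the same order; changed_requirements is a hypothetical helper name.

```shell
# Hypothetical helper: print requirement lines that appear only in the new
# freeze, lower-cased for Composer.
#   $1: original freeze (e.g. original_req.txt)
#   $2: new freeze (e.g. new_req.txt)
changed_requirements() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C sort > "$old_sorted"
  tr '[:upper:]' '[:lower:]' < "$2" | LC_ALL=C sort > "$new_sorted"
  # -1 hides lines only in the old file, -3 hides lines common to both
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

For example: changed_requirements original_req.txt new_req.txt > diffed.txt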

These modifications can then be uploaded to Composer:

gcloud composer environments update test-data-warehouse-bought-by-many \
--update-pypi-packages-from-file=diffed.txt \
--location=europe-west2 \
--async
