The Google Artifact Registry documentation on "Manage Python packages" says that when running pip install PACKAGE against a virtual repository:
If you request a version that is available in more than one upstream repository, Artifact Registry chooses an upstream repository to use based on the priority settings configured for the virtual repository.
In the "Virtual repositories overview", the section explaining How virtual repositories select an upstream repository even explicitly discusses the case of pip
:
For example, if you configure the Python pip tool to search PyPI and a virtual repository, your package might be downloaded directly from PyPI because pip will always choose the latest version of a package, regardless of which repository it comes from. If pip is configured to only search the virtual repository, you can then control the priority of all upstream repositories, including an upstream remote repository that acts as a proxy for PyPI.
This works as documented when specifying a package version with ==, but with any other requirement specifier (or no specifier at all), pip installs the highest version that matches the specifier across all upstreams, completely ignoring the priority settings.
How do I configure the virtual repository in order to get packages that exist in an upstream with a higher priority, only from that upstream, regardless of available versions in upstreams with lower priorities?
For instance, I have created a standard repo to store my own packages (python-repo), a remote repo to access PyPI (pypi-proxy), and a virtual repo that aggregates python-repo and pypi-proxy with respective priorities 100 and 10 (virtual-python-repo):
PROJECT_ID=my-project-123456
LOCATION=us-west1
gcloud artifacts repositories create python-repo --repository-format=python \
  --location="$LOCATION" --description="local repo" --project="$PROJECT_ID"
gcloud artifacts repositories create pypi-proxy --repository-format=python \
  --location="$LOCATION" --description="PyPi proxy" \
  --mode=remote-repository --remote-repo-config-desc="PyPi" \
  --remote-python-repo=PYPI --project="$PROJECT_ID"
gcloud artifacts repositories create virtual-python-repo --repository-format=python \
  --location="$LOCATION" --description="Virtual repo" \
  --mode=virtual-repository \
  --upstream-policy-file=policies.json --project="$PROJECT_ID"
With policies.json:
[
  {
    "id": "python-repo",
    "repository": "projects/my-project-123456/locations/us-west1/repositories/python-repo",
    "priority": 100
  },
  {
    "id": "pypi-proxy",
    "repository": "projects/my-project-123456/locations/us-west1/repositories/pypi-proxy",
    "priority": 10
  }
]
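To confirm that the policies were applied as intended, describing the virtual repository should list the configured upstream policies (a sanity check only, not part of the setup):
gcloud artifacts repositories describe virtual-python-repo \
  --location="$LOCATION" --project="$PROJECT_ID"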
I am setting pip.conf and .pypirc to point to virtual-python-repo.
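For reference, a minimal pip.conf pointing pip exclusively at the virtual repo would look roughly like this (the URL is reconstructed from the project and location above; authentication is assumed to be handled separately, e.g. with the keyrings.google-artifactregistry-auth backend):
# pip.conf (sketch) -- pip only searches the virtual repository
[global]
index-url = https://us-west1-python.pkg.dev/my-project-123456/virtual-python-repo/simple/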
With this setup, if I create a new project named "sampleproject" (which already exists on PyPI with versions from 1.2.0 to 4.0.0), this is the behavior I get:
If I set the version to 1.0.0, build, and push sampleproject-1.0.0 to python-repo:
- pip install sampleproject==1.0.0 installs version 1.0.0 from python-repo
- pip install sampleproject installs version 4.0.0 from PyPI
The desired behavior would be to always install sampleproject from python-repo and ignore the versions from PyPI. I know that this isn't possible with pip alone, but I was hoping that the virtual repo would enable enforcing such a policy.
1 Answer
After discussion with Google support, it appears that the behavior described above is expected. The virtual repository still collects the union of all the relevant versions across all upstreams, regardless of their priorities. The priority of the upstream repositories is only used to select a specific upstream when a specific version is available in more than one upstream.
The implication is that with Python virtual repositories, the only way to ensure that a package is installed from the private repo is to pin it with "==" (and to make sure that the pinned version actually exists in the private repo, and that the private repo has the highest priority).
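In practice that means pinning internal packages exactly, for example with the sampleproject from the question (file names here are just illustrative):
# requirements.txt -- the exact pin makes the resolved version unambiguous;
# if that version exists in several upstreams, the priorities decide which one serves it
sampleproject==1.0.0

# then install through the virtual repo configured in pip.conf
pip install -r requirements.txt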
The policies.json that I included above was written according to the section Create a virtual repository using gcloud CLI. – Come Raczy, Nov 21, 2024