A Private PyPI server with AWS CodeArtifact
Martijn Jacobs
21 June 2024
Introduction
When you develop a variety of reusable Python packages or apps, you will soon run into the limits of git dependencies. At the same time, you don't want to release company- or project-specific packages to the public PyPI. Setting up your own private PyPI server can be very time-consuming, and it requires maintenance and backups. Can we use AWS CodeArtifact to host our own package index instead?
What is AWS CodeArtifact?
AWS CodeArtifact is a managed service from AWS that can host several package formats, such as npm, PyPI, Maven, NuGet and generic packages. The PyPI support is the interesting part here, as it lets us package and distribute reusable Python applications properly. These packages (wheels) can be installed with pip or poetry (or any other Python package manager that supports wheels and/or a PyPI server).
When to use a private PyPI server
There are several reasons why distributing packages through a (private) PyPI server can be beneficial:
- You can "compile" front-end artifacts (images, JavaScript, CSS etc.) into (binary and/or compressed) assets and distribute them properly inside a wheel (a built Python package). Otherwise you'll need to rebuild your assets every time you install dependencies from source (via git dependency links).
- The same goes for other artifacts, like translation files (no more binary .mo files in your repository!).
- If you wrote Python extensions in another language like C(++) or Rust, you only need to compile them when you update the extensions.
- It speeds up your CI / CD pipeline: reusable packages need to be built and released only once, instead of every time you build your project.
- It forces you to separate concerns: Reusable apps can also be tested in isolated sandboxes, making the code less dependent on your project requirements. A "package oriented mindset" is a sustainable way of managing software in the long term.
And there are many more advantages, like being forced to implement (proper) versioning and (auto-)update strategies (Dependabot), or even deciding to open source one of your private packages.
Create a new repository
Creating a new AWS CodeArtifact repository from the AWS console is simple: go to CodeArtifact -> Create repository.
Or if you are using Terraform, the most basic configuration would be:
resource "aws_codeartifact_domain" "mydomain" {
  domain = "mydomain"
}

resource "aws_codeartifact_repository" "myrepo" {
  repository = "myrepo"
  domain     = aws_codeartifact_domain.mydomain.domain
}

data "aws_codeartifact_repository_endpoint" "pypi_endpoint" {
  domain     = aws_codeartifact_domain.mydomain.domain
  repository = aws_codeartifact_repository.myrepo.repository
  format     = "pypi"
}
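If you prefer scripting over Terraform, roughly the same setup can be done with boto3. This is a minimal sketch under assumptions, not a drop-in replacement: `create_pypi_repository` is a hypothetical helper, and it expects a boto3 CodeArtifact client (for example `boto3.client("codeartifact", region_name="eu-west-1")`) with AWS credentials already configured.

```python
from typing import Any


def create_pypi_repository(client: Any, domain: str, repository: str) -> str:
    """Create a CodeArtifact domain and repository, return the PyPI endpoint.

    `client` is assumed to be boto3.client("codeartifact", region_name=...).
    """
    # Counterpart of the aws_codeartifact_domain resource.
    client.create_domain(domain=domain)
    # Counterpart of the aws_codeartifact_repository resource.
    client.create_repository(domain=domain, repository=repository)
    # Counterpart of the aws_codeartifact_repository_endpoint data source.
    response = client.get_repository_endpoint(
        domain=domain, repository=repository, format="pypi"
    )
    return response["repositoryEndpoint"]
```

Unlike Terraform, this is not idempotent: create_domain raises an error when the domain already exists, so a real script would have to handle that case.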
Access
When an AWS CodeArtifact repository is created, the URL for accessing it as a PyPI repository has the following format:
https://<domain>-<account>.d.codeartifact.<aws-region>.amazonaws.com/pypi/<repository>
An example using the Terraform configuration above in the eu-west-1 region:
https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo
To query or download packages from the index, you need to append /simple to this URL:
https://mydomain...amazonaws.com/pypi/myrepo/        # <-- for publishing
https://mydomain...amazonaws.com/pypi/myrepo/simple  # <-- for querying / downloading
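As a small illustration of the two URL variants, here is a hypothetical helper that assembles both from the domain, account ID, region and repository name (the values in the test are the placeholders used in this post):

```python
from __future__ import annotations


def codeartifact_pypi_urls(domain: str, account: str, region: str, repository: str) -> dict[str, str]:
    """Build the publish and index URLs of a CodeArtifact PyPI repository."""
    base = (
        f"https://{domain}-{account}.d.codeartifact.{region}.amazonaws.com"
        f"/pypi/{repository}/"
    )
    return {
        "publish": base,  # used for uploading (twine, poetry publish)
        "index": base + "simple/",  # used for querying / downloading (pip, poetry)
    }
```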
Authentication
AWS CodeArtifact uses JWT tokens for authentication. These tokens are valid for at most 12 hours, and you can request a shorter lifetime (down to 15 minutes) with the --duration-seconds option, which suits CI environments. The AWS user / IAM role needs the following permissions to query the endpoint and download packages:
- codeartifact:GetAuthorizationToken
- codeartifact:ReadFromRepository
- sts:GetServiceBearerToken
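Expressed as an IAM policy, these permissions could look roughly like the sketch below. It rests on a few assumptions: the hypothetical `codeartifact_read_policy` helper scopes the token permission to the domain and the read permission to a single repository, and restricts sts:GetServiceBearerToken to CodeArtifact with the sts:AWSServiceName condition key. Adjust the resources to your own setup.

```python
def codeartifact_read_policy(region: str, account: str, domain: str, repository: str) -> dict:
    """Build an IAM policy document that allows reading from one repository."""
    arn_prefix = f"arn:aws:codeartifact:{region}:{account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "codeartifact:GetAuthorizationToken",
                "Resource": f"{arn_prefix}:domain/{domain}",
            },
            {
                "Effect": "Allow",
                "Action": "codeartifact:ReadFromRepository",
                "Resource": f"{arn_prefix}:repository/{domain}/{repository}",
            },
            {
                # Needed to obtain the token; limited to CodeArtifact only.
                "Effect": "Allow",
                "Action": "sts:GetServiceBearerToken",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"sts:AWSServiceName": "codeartifact.amazonaws.com"}
                },
            },
        ],
    }
```

Serialize the result with json.dumps and attach it to the user or role that needs read access.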
When an AWS IAM user / role has these permissions, you can request a token with the AWS CLI. Let's store the whole command (not its output) in an environment variable, so that a fresh token is fetched each time it is evaluated. You could add this to your .bashrc or .zshrc:
$ export AWS_CODEARTIFACT_TOKEN_COMMAND='aws codeartifact get-authorization-token --domain mydomain --domain-owner 111122223333 --query authorizationToken --output text'
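In Python scripts you can fetch the same token with boto3 instead of the CLI. A minimal sketch, assuming a boto3 CodeArtifact client is passed in; `fetch_codeartifact_token` is a hypothetical helper and its one-hour default is an arbitrary choice. The API's durationSeconds parameter (the counterpart of --duration-seconds) accepts 0, meaning the token expires together with the caller's temporary credentials, or a value between 900 and 43200 seconds.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def fetch_codeartifact_token(
    client: Any, domain: str, owner: str, duration_seconds: int = 3600
) -> tuple[str, datetime]:
    """Return (token, expiration) for a CodeArtifact domain.

    `client` is assumed to be boto3.client("codeartifact").
    """
    # Reject lifetimes the API would refuse anyway: 0 or 900..43200 seconds.
    if duration_seconds != 0 and not 900 <= duration_seconds <= 43200:
        raise ValueError("duration_seconds must be 0 or between 900 and 43200")
    response = client.get_authorization_token(
        domain=domain, domainOwner=owner, durationSeconds=duration_seconds
    )
    return response["authorizationToken"], response["expiration"]
```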
You can now use the token as the password, with aws as the username.
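pip and poetry send these credentials as HTTP Basic authentication. A hypothetical helper that builds the header value can be handy when you want to check access with a plain HTTP client (for example urllib.request against the /simple URL):

```python
import base64


def basic_auth_header(token: str, username: str = "aws") -> str:
    """Return an HTTP Basic Authorization header value for a CodeArtifact token."""
    credentials = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```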
With poetry
$ poetry source add --priority=supplemental aws-codeartifact-myrepo https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple
$ poetry config http-basic.aws-codeartifact-myrepo aws $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)
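Instead of storing the token with poetry config, poetry can also pick up credentials from environment variables, which suits CI jobs. A sketch that follows poetry's documented naming convention (POETRY_HTTP_BASIC_<NAME>_USERNAME / _PASSWORD, with the source name upper-cased and dashes replaced by underscores); `poetry_credential_env` is a hypothetical helper:

```python
from __future__ import annotations


def poetry_credential_env(source_name: str, token: str, username: str = "aws") -> dict[str, str]:
    """Map a poetry source name to its POETRY_HTTP_BASIC_* environment variables."""
    # e.g. "aws-codeartifact-myrepo" -> "AWS_CODEARTIFACT_MYREPO"
    key = source_name.upper().replace("-", "_").replace(".", "_")
    return {
        f"POETRY_HTTP_BASIC_{key}_USERNAME": username,
        f"POETRY_HTTP_BASIC_{key}_PASSWORD": token,
    }
```

For example: subprocess.run(["poetry", "install"], env={**os.environ, **poetry_credential_env("aws-codeartifact-myrepo", token)}, check=True). The token then never ends up in poetry's configuration files.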
With pip
$ pip install -i https://aws:$(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)@mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple <my-private-package>
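Alternatively, you can store the index URL, credentials included, in pip's configuration. Keep in mind that the token expires, so this setting has to be refreshed regularly:
$ pip config set global.index-url https://aws:$(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)@mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple/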
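When you build the authenticated index URL programmatically, it's safer to percent-encode the credentials in case the token contains characters with a special meaning in URLs. A sketch using only the standard library; `index_url_with_credentials` is a hypothetical name:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def index_url_with_credentials(index_url: str, token: str, username: str = "aws") -> str:
    """Embed percent-encoded credentials in an index URL."""
    parts = urlsplit(index_url)
    if not parts.hostname:
        raise ValueError(f"not an absolute URL: {index_url!r}")
    # Encode everything, including "/" and "+", so the userinfo part stays valid.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The result can be passed to pip install -i, or exported as PIP_INDEX_URL, which pip reads as an alternative to the --index-url option.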
NetRC
Pip and poetry can also read credentials from a netrc file. I wrote a simple update-netrc CLI to set the credentials for a specific host:
$ update-netrc update https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple --login aws --password $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)
This command is easy to integrate into CI systems like GitHub Actions or GitLab CI, using a token that is only valid for a limited time.
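The update-netrc tool itself isn't shown here. To give a rough idea of what such a tool does, here is a hypothetical `update_netrc` function built on Python's netrc module: it parses an existing file, replaces the entry for the URL's host and writes the file back with owner-only permissions. It assumes the file contains only plain `machine` entries; comments, `default` and `macdef` entries are not preserved.

```python
import netrc
import os
from pathlib import Path
from urllib.parse import urlsplit


def update_netrc(path: Path, url: str, login: str, password: str) -> None:
    """Add or replace the `machine` entry for the URL's host in a netrc file."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    # Existing entries map host -> (login, account, password).
    entries = dict(netrc.netrc(str(path)).hosts) if path.exists() else {}
    entries[host] = (login, None, password)
    lines = [
        f"machine {machine} login {user} password {secret}"
        for machine, (user, _account, secret) in entries.items()
    ]
    path.write_text("\n".join(lines) + "\n")
    # netrc files hold secrets, so restrict them to the owner.
    os.chmod(path, 0o600)
```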
Publishing
Publishing packages is straightforward, because CodeArtifact supports the PyPI upload API used by tools like twine and poetry. Your AWS user / IAM role needs the following permissions to upload packages:
- codeartifact:GetAuthorizationToken
- codeartifact:GetRepositoryEndpoint
- codeartifact:PublishPackageVersion
- codeartifact:PutPackageMetadata
- sts:GetServiceBearerToken
First, export the publishing URL of the repository (without the /simple suffix) as an environment variable:
$ export AWS_CODEARTIFACT_PYPI_REPOSITORY_URL=https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo
With poetry
With poetry, you first configure a publishing repository and its credentials, and then use the poetry CLI to publish the package. Note that poetry publish looks up repositories configured with poetry config repositories.<name>; a source added with poetry source add is only used for installing packages:
# Add the repository and configure the token
$ poetry config repositories.aws-codeartifact-myrepo-publish $AWS_CODEARTIFACT_PYPI_REPOSITORY_URL
$ poetry config http-basic.aws-codeartifact-myrepo-publish aws $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)
Now we can publish the package after building it:
$ poetry build
$ poetry publish --repository aws-codeartifact-myrepo-publish
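To check that the upload arrived, you can list the versions CodeArtifact stores for the package. A minimal sketch with boto3's list_package_versions, following the nextToken pagination; `published_versions` is a hypothetical helper and the client is assumed to be boto3.client("codeartifact"). This call needs the codeartifact:ListPackageVersions permission, which isn't part of the lists above.

```python
from __future__ import annotations

from typing import Any


def published_versions(client: Any, domain: str, repository: str, package: str) -> list[str]:
    """List all versions of a PyPI package in a CodeArtifact repository."""
    request = {"domain": domain, "repository": repository, "format": "pypi", "package": package}
    versions: list[str] = []
    while True:
        page = client.list_package_versions(**request)
        versions.extend(item["version"] for item in page["versions"])
        # Keep following nextToken until the last page has been read.
        next_token = page.get("nextToken")
        if not next_token:
            return versions
        request["nextToken"] = next_token
```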
With Twine
With Twine you can upload your package with the CLI in a single line:
$ twine upload --repository-url $AWS_CODEARTIFACT_PYPI_REPOSITORY_URL --username aws --password $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND) mypackage.whl
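Passing the token with --password makes it visible in the process list and possibly in your shell history. twine also reads TWINE_REPOSITORY_URL, TWINE_USERNAME and TWINE_PASSWORD from the environment; a hypothetical helper that prepares such an environment for a subprocess:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def twine_env(repository_url: str, token: str, base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return an environment for running `twine upload` against CodeArtifact."""
    # Start from the current environment (or a given base) so PATH etc. survive.
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "TWINE_REPOSITORY_URL": repository_url,
            "TWINE_USERNAME": "aws",
            "TWINE_PASSWORD": token,
        }
    )
    return env
```

Usage could look like subprocess.run(["twine", "upload", *glob.glob("dist/*")], env=twine_env(url, token), check=True); the glob is expanded in Python because no shell is involved.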
Useful links
- https://jasonstitt.com/private-packages-codeartifact-poetry-workflow, an excellent tutorial with poetry
- https://docs.aws.amazon.com/codeartifact/latest/ug/python-configure-pip.html, configure pip with AWS CodeArtifact