
A Private PyPI server with AWS CodeArtifact

Martijn Jacobs

21 June 2024

Introduction

When you develop a variety of reusable Python packages or apps, you will soon run into the limits of git dependencies. At the same time, you don't want to publish company- or project-specific packages to the public PyPI. Setting up your own private PyPI server can be time-consuming, and it needs ongoing maintenance and backups. Can we use AWS CodeArtifact as our own package index instead?

What is AWS CodeArtifact?

AWS CodeArtifact is a managed AWS service that hosts packages in several formats, including npm, PyPI, Maven, NuGet and generic packages. The PyPI support is the interesting part here: it lets us package and distribute reusable Python applications properly. These packages (wheels) can be installed with pip, Poetry or any other Python package manager that supports wheels and a PyPI-compatible index.

When to use a private PyPI server

There are several reasons why packages with a (private) PyPI server could be beneficial:

  • You can "compile" front-end artifacts (images, JavaScript, CSS, etc.) into (binary and/or compressed) assets and distribute them properly inside a wheel (a built Python package). Otherwise you need to rebuild your assets every time you install a dependency from source (for example via git dependency links).
  • The same goes for other artifacts, like compiled translation files (no more binary .mo files in your repository!).
  • You wrote Python extensions in another language, such as C/C++ or Rust, and you want to compile them only when the extensions change.
  • It speeds up your CI/CD pipeline: reusable packages need to be built and released only once, instead of every time you build your project.
  • It forces you to separate concerns: reusable apps can also be tested in isolated sandboxes, which makes the code less dependent on your project's requirements. A "package-oriented mindset" is a sustainable way to manage software in the long term.

And there are many more advantages. It pushes you to implement proper versioning and (automated) update strategies (e.g. Dependabot), and you may even decide to open source one of your private packages.

Create a new repository

Creating a new AWS CodeArtifact repository from the AWS console is fairly simple: CodeArtifact -> Create repository.

Or if you are using Terraform, the most basic configuration would be:

resource "aws_codeartifact_domain" "mydomain" {
  domain = "mydomain"
}

resource "aws_codeartifact_repository" "myrepo" {
  repository = "myrepo"
  domain     = aws_codeartifact_domain.mydomain.domain
}

data "aws_codeartifact_repository_endpoint" "pypi_endpoint" {
  domain     = aws_codeartifact_domain.mydomain.domain
  repository = aws_codeartifact_repository.myrepo.repository
  format     = "pypi"
}

Access

When an AWS CodeArtifact repository is created, the URL for accessing its PyPI endpoint has the following format:

https://<domain>-<account>.d.codeartifact.<aws-region>.amazonaws.com/pypi/<repository>

An example using the Terraform configuration above in the eu-west-1 region:

https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo

To query or download packages through the index, append /simple to this URL:

https://mydomain...amazonaws.com/pypi/myrepo/          # <-- for publishing
https://mydomain...amazonaws.com/pypi/myrepo/simple    # <-- for querying / downloading
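Both URLs follow directly from four values: domain, account ID, region and repository name. The following is a minimal Python sketch that assembles them, assuming the URL format shown above. The function name is hypothetical. The authoritative value is what AWS returns, for example through the Terraform data source above or aws codeartifact get-repository-endpoint --format pypi.

```python
def codeartifact_pypi_url(domain: str, account: str, region: str,
                          repository: str, simple: bool = False) -> str:
    """Build a CodeArtifact PyPI endpoint URL (hypothetical helper).

    simple=False returns the publishing URL; simple=True returns the
    index URL that pip and Poetry query when downloading packages.
    """
    base = (f"https://{domain}-{account}.d.codeartifact.{region}"
            f".amazonaws.com/pypi/{repository}/")
    return base + "simple/" if simple else base


print(codeartifact_pypi_url("mydomain", "111122223333", "eu-west-1", "myrepo", simple=True))
```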

Authentication

AWS CodeArtifact uses JWT tokens for authentication. These tokens are valid for at most 12 hours, and you can request a shorter lifetime (useful in CI environments). The AWS user or IAM role needs the following permissions to query the endpoint and download packages:

- codeartifact:GetAuthorizationToken
- codeartifact:ReadFromRepository
- sts:GetServiceBearerToken
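As a sketch, these permissions can be expressed as an IAM policy document. The helper below is hypothetical. For brevity it assumes Resource "*"; in practice you would scope it to your domain and repository ARNs. It limits sts:GetServiceBearerToken to CodeArtifact with a condition on sts:AWSServiceName, as AWS recommends for this action. The same function works for the list of publishing actions.

```python
import json
from typing import List

# The read permissions listed above.
READ_ACTIONS = [
    "codeartifact:GetAuthorizationToken",
    "codeartifact:ReadFromRepository",
    "sts:GetServiceBearerToken",
]


def codeartifact_policy(actions: List[str], resource: str = "*") -> dict:
    """Return an IAM policy document that allows the given actions.

    sts:GetServiceBearerToken gets its own statement, restricted to the
    CodeArtifact service through a condition.
    """
    ca_actions = [a for a in actions if a != "sts:GetServiceBearerToken"]
    statements = [{"Effect": "Allow", "Action": ca_actions, "Resource": resource}]
    if "sts:GetServiceBearerToken" in actions:
        statements.append({
            "Effect": "Allow",
            "Action": "sts:GetServiceBearerToken",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"sts:AWSServiceName": "codeartifact.amazonaws.com"}
            },
        })
    return {"Version": "2012-10-17", "Statement": statements}


print(json.dumps(codeartifact_policy(READ_ACTIONS), indent=2))
```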

When an IAM user or role has these permissions, you can request a token with the AWS CLI. Let's store the whole command in an environment variable, for example in your .bashrc or .zshrc:

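$ export AWS_CODEARTIFACT_TOKEN_COMMAND="aws codeartifact get-authorization-token --domain mydomain --domain-owner 111122223333 --query authorizationToken --output text"

Because the variable stores the command rather than the token itself, each $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND) requests a fresh token.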

Use this token as the password, with aws as the username.
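A script such as a CI job may need to fetch the token itself. The following sketch builds the same AWS CLI call as an argument list and runs it with subprocess. The function names are hypothetical. --duration-seconds is the CLI option for requesting a shorter-lived token. The range check below is this sketch's own restriction to 15 minutes through 12 hours.

```python
import shlex
import subprocess
from typing import List, Optional


def token_command(domain: str, owner: str,
                  duration_seconds: Optional[int] = None) -> List[str]:
    """Build the aws codeartifact get-authorization-token argument list."""
    cmd = [
        "aws", "codeartifact", "get-authorization-token",
        "--domain", domain,
        "--domain-owner", owner,
        "--query", "authorizationToken",
        "--output", "text",
    ]
    if duration_seconds is not None:
        # This sketch only accepts 900 s (15 minutes) up to 43200 s (12 hours).
        if not 900 <= duration_seconds <= 43200:
            raise ValueError("duration_seconds must be between 900 and 43200")
        cmd += ["--duration-seconds", str(duration_seconds)]
    return cmd


def fetch_token(cmd: List[str]) -> str:
    """Run the AWS CLI and return the token (needs AWS credentials)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# A one-hour token for a CI job; only the command is printed here.
print(shlex.join(token_command("mydomain", "111122223333", duration_seconds=3600)))
```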

With Poetry

$ poetry source add --priority=supplemental aws-codeartifact-myrepo https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple
$ poetry config http-basic.aws-codeartifact-myrepo aws $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)
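Poetry can also read source credentials from environment variables, which avoids storing a short-lived token in its configuration. The following sketch derives the variable names. It assumes Poetry's convention of upper-casing the setting name and replacing dots and dashes with underscores, so check the Poetry documentation for your version. The helper name is hypothetical.

```python
def poetry_http_basic_env(source_name: str, username: str, password: str) -> dict:
    """Map credentials for a Poetry source to POETRY_HTTP_BASIC_* variables.

    Assumes Poetry's naming convention: upper case, with dots and
    dashes in the source name replaced by underscores.
    """
    key = source_name.replace(".", "_").replace("-", "_").upper()
    return {
        f"POETRY_HTTP_BASIC_{key}_USERNAME": username,
        f"POETRY_HTTP_BASIC_{key}_PASSWORD": password,
    }


for name, value in poetry_http_basic_env("aws-codeartifact-myrepo", "aws", "<token>").items():
    print(f"{name}={value}")
```

In a CI job you would export these variables with the real token instead of running poetry config.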

With pip

$ pip install -i https://aws:$(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)@mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple <my-private-package>

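Or you can store the index URL, credentials included, in the pip configuration of the current environment. Keep in mind that this writes the token to that file in plain text, and you must set it again once the token expires: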

$ pip config --site set global.index-url https://aws:$(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)@mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple/
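When a script builds the authenticated index URL, the credentials should be percent-encoded so that characters such as / or @ cannot break the URL. The helper below is a minimal, hypothetical sketch using only urllib.parse. Its result can be passed to pip through the PIP_INDEX_URL environment variable, which keeps the token out of configuration files.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url: str, username: str, password: str) -> str:
    """Return url with percent-encoded user:password@ in its netloc."""
    parts = urlsplit(url)
    host = parts.netloc.rpartition("@")[2]  # drop any existing credentials
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host}"))


index = "https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple/"
print(with_basic_auth(index, "aws", "example-token"))
```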

NetRC

Pip and Poetry also work with .netrc files. I wrote a simple update-netrc CLI that sets the credentials for a specific host:

$ update-netrc update https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple --login aws --password $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)

This command is easy to integrate into CI systems such as GitHub Actions or GitLab CI, using a token that is valid for a limited time.
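The core idea, replacing the entry for one host while keeping all others, can be sketched in a few lines of Python. This is not the update-netrc tool itself, whose implementation isn't shown here. The function is hypothetical and assumes every .netrc entry is written on a single line. Write the result back to ~/.netrc and keep that file readable only by your user (mode 600).

```python
from urllib.parse import urlsplit


def upsert_netrc(text: str, url: str, login: str, password: str) -> str:
    """Return .netrc contents with exactly one entry for the URL's host.

    Assumes one entry per line ("machine <host> login <user> password
    <secret>"): the line for this host is replaced, other lines are kept.
    """
    host = urlsplit(url).hostname
    kept = [
        line for line in text.splitlines()
        if line.strip() and line.split()[:2] != ["machine", host]
    ]
    kept.append(f"machine {host} login {login} password {password}")
    return "\n".join(kept) + "\n"


url = "https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo/simple"
print(upsert_netrc("", url, "aws", "example-token"), end="")
```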

Publishing

Publishing packages is fairly easy, because CodeArtifact supports the same upload API as PyPI, so standard tools such as Poetry and Twine work with it. Your AWS user or IAM role needs the following permissions to upload packages:

- codeartifact:GetAuthorizationToken
- codeartifact:GetRepositoryEndpoint
- codeartifact:PublishPackageVersion
- codeartifact:PutPackageMetadata
- sts:GetServiceBearerToken

First, export the publishing URL of the repository (the one without /simple) as an environment variable:

$ export AWS_CODEARTIFACT_PYPI_REPOSITORY_URL=https://mydomain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/myrepo

With Poetry

With Poetry, you first configure a repository and then use the Poetry CLI to publish the package:

# Add the repository and configure the token
$ poetry config repositories.aws-codeartifact-myrepo-publish $AWS_CODEARTIFACT_PYPI_REPOSITORY_URL
$ poetry config http-basic.aws-codeartifact-myrepo-publish aws $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND)

Now we can publish the package after building it:

$ poetry build
$ poetry publish --repository aws-codeartifact-myrepo-publish
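In a CI job these steps are easy to script. The sketch below uses hypothetical helper names. It registers the publishing URL with poetry config repositories.<name>, stores the credentials with poetry config http-basic.<name>, and then runs poetry publish --build, which builds the package before uploading it. The steps are returned as a list so they can be inspected before anything runs.

```python
import subprocess
from typing import List


def poetry_publish_commands(name: str, url: str, token: str) -> List[List[str]]:
    """Return the Poetry commands that configure a publish repository,
    store its credentials, and build and publish the package."""
    return [
        ["poetry", "config", f"repositories.{name}", url],
        ["poetry", "config", f"http-basic.{name}", "aws", token],
        ["poetry", "publish", "--build", "--repository", name],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example (not executed here):
# run_all(poetry_publish_commands("aws-codeartifact-myrepo-publish",
#                                 os.environ["AWS_CODEARTIFACT_PYPI_REPOSITORY_URL"], token))
```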

With Twine

With Twine you can upload your package with the CLI in a single line:

$ twine upload --repository-url $AWS_CODEARTIFACT_PYPI_REPOSITORY_URL --username aws --password $(eval $AWS_CODEARTIFACT_TOKEN_COMMAND) mypackage.whl
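Passing --password on the command line can expose the token in the process list. Twine also reads TWINE_REPOSITORY_URL, TWINE_USERNAME and TWINE_PASSWORD from the environment. The following is a hypothetical sketch that uploads everything in dist/ that way. It expands the glob itself, because subprocess does not go through a shell. It defaults to a dry run that only returns the command.

```python
import glob
import os
import subprocess
from typing import List


def twine_upload(repository_url: str, token: str, pattern: str = "dist/*",
                 dry_run: bool = True) -> List[str]:
    """Upload built distributions with Twine, passing credentials via env vars."""
    files = sorted(glob.glob(pattern))  # subprocess does not expand globs
    if not files:
        raise FileNotFoundError(f"nothing matches {pattern!r}; build the package first")
    cmd = ["twine", "upload", *files]
    if not dry_run:
        env = dict(os.environ,
                   TWINE_REPOSITORY_URL=repository_url,
                   TWINE_USERNAME="aws",
                   TWINE_PASSWORD=token)
        subprocess.run(cmd, check=True, env=env)
    return cmd


# Example (not executed here):
# twine_upload(os.environ["AWS_CODEARTIFACT_PYPI_REPOSITORY_URL"], token, dry_run=False)
```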

