Path to using Nix/Nixpkgs - A Python perspective

I often talk with people about using Nix and occasionally convince them to try it out, usually by first installing Nix on their existing Linux distribution. They start out using nix-shell to create ad-hoc virtual environments and eventually define their environments in a shell.nix file. I take this route because many of the developers I work with are already used to virtual environments in some form, via pip requirements.txt or conda environment.yaml files. To me this is an easier introduction to Nix, and it introduces the Nix programming language to the user gradually. I have to say I'm not the biggest fan of the language itself; it seems it would have been easier to restrict an existing language to do what we need.

Basic environments via nix-shell

Let's use nix-shell interactively to create an ad-hoc environment with several packages.

  nix-shell -p htop vagrant nodejs python3Packages.numpy python3Packages.pandas

The same environment can be written declaratively in a file named shell.nix.

  { pkgs ? import <nixpkgs> { }, pythonPackages ? pkgs.python3Packages }:

  pkgs.mkShell {
    buildInputs = [
      pkgs.htop
      pkgs.vagrant
      pkgs.nodejs
      pythonPackages.numpy
      pythonPackages.pandas
    ];
  }

Installing Nix gives you a command named nix-shell. Run nix-shell in the directory containing shell.nix. This drops you into a shell with a Python environment that has numpy and pandas installed, along with several binaries: htop, nodejs (for programming in JavaScript), and vagrant. If this feels like something conda can do, great! The goal is to show how similar these tools can feel.
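Besides listing Python packages individually, nixpkgs also offers python3.withPackages, which builds a single Python interpreter with the listed modules already importable. Here is a minimal sketch of the same environment in that style; it is an alternative spelling, not what the rest of this post uses, and pythonEnv is just a local name.

  { pkgs ? import <nixpkgs> { } }:

  let
    # One interpreter whose site-packages already contains numpy and pandas
    pythonEnv = pkgs.python3.withPackages (ps: [
      ps.numpy
      ps.pandas
    ]);
  in
  pkgs.mkShell {
    buildInputs = [
      pkgs.htop
      pkgs.vagrant
      pkgs.nodejs
      pythonEnv
    ];
  }

Inside this shell, running python3 -c 'import numpy, pandas' should succeed without any pip install step.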

Reproducible environments (pinning on steroids)

A common way to make this more reproducible is to pin the packages in the environment. In a conda environment.yaml definition we would pin the exact version of each package, e.g. numpy==1.2.3. In Nix we pin to a commit in nixpkgs, which I will explain in more detail below. Modify your shell.nix to pin everything to a specific commit.

  let  pkgs = import (builtins.fetchTarball {
         # find the exact commit you want to pin to
         # by default you should use the latest in nixpkgs-unstable branch
         # https://github.com/NixOS/nixpkgs/commits/nixpkgs-unstable
         url = "https://github.com/nixos/nixpkgs/archive/<commit>.tar.gz";
         sha256 = "0a0zapfflj7747zhp1cmk8j9zfcvbppnkqjicqypc872166k9j00";
       }) {};

      pythonPackages = pkgs.python3Packages;
  in
  pkgs.mkShell {
    buildInputs = [
      pkgs.htop
      pkgs.vagrant
      pkgs.nodejs
      pythonPackages.numpy
      pythonPackages.pandas
    ];
  }

For this post I used commit 84aa23742f6c72501f9cc209f29c438766f5352d, so the pkgs binding becomes:

  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
    sha256 = "0a0zapfflj7747zhp1cmk8j9zfcvbppnkqjicqypc872166k9j00";
  }) {};

Your commit will likely be different. Either way, the sha256 in the file still belongs to some other tarball, so running nix-shell again on this file fails with an error similar to:

  unpacking 'https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz'...
  error: hash mismatch in file downloaded from 'https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz':
    wanted: sha256:0a0zapfflj7747zhp1cmk8j9zfcvbppnkqjicqypc872166k9j00
    got:    sha256:0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp

Remember that in Nix we are trying to pin our build reproducibly, so we need the exact sha256 of the downloaded nixpkgs tarball. Nix helps us by complaining and telling us the hash it actually got. After supplying that hash (the "got" value) in the example above, shell.nix looks like this:

  let  pkgs = import (builtins.fetchTarball {
         # find the exact commit you want to pin to
         # by default you should use the latest in nixpkgs-unstable branch
         # https://github.com/NixOS/nixpkgs/commits/nixpkgs-unstable
         url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
         sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
       }) {};

      pythonPackages = pkgs.python3Packages;
  in
  pkgs.mkShell {
    buildInputs = [
      pkgs.htop
      pkgs.vagrant
      pkgs.nodejs
      pythonPackages.numpy
      pythonPackages.pandas
    ];
  }

This environment is now as reproducible as it gets, and notice that we didn't have to specify an exact version for any package. The pin also covers every library that our packages rely on but that we didn't list in buildInputs.
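If several files need the same pin, one common pattern (a suggestion, not something the rest of this post depends on) is to move the pinned import into its own file. The file name nixpkgs.nix is just a convention; the commit and hash are the ones from above.

  # nixpkgs.nix: evaluates to the pinned nixpkgs function (not yet called)
  import (builtins.fetchTarball {
    url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
    sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
  })

  # shell.nix: call the pinned nixpkgs with an empty configuration
  { pkgs ? import ./nixpkgs.nix { } }:

  pkgs.mkShell {
    buildInputs = [
      pkgs.htop
      pkgs.vagrant
      pkgs.nodejs
      pkgs.python3Packages.numpy
      pkgs.python3Packages.pandas
    ];
  }

Bumping the pin then means editing one file, and every expression that imports ./nixpkgs.nix moves to the new commit together.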

Let's look at this definition a little further. Notice how no package versions are specified? This is because of how Nix works. At the time of writing there are over 55k packages in nixpkgs. Nixpkgs lives at github.com/nixos/nixpkgs and is a large collection of build configurations, one for each package. One may think of a built package as a function of its input source (a.k.a. the package version) plus its build configuration. Each commit to github.com/nixos/nixpkgs fixes both the source (version) and the build configuration of every package, so a commit is a complete, consistent snapshot of all packages. Thus, unlike conda, there is no solve phase in nixpkgs. An added benefit of this approach is that the packages in a commit are built against one another, so they are checked for compatibility (this can also cause problems, since in Python there can only be one version of a given module in an environment).
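To see what a given commit actually pins, you can ask Nix for the version attributes directly. A small sketch, assuming a file named versions.nix (the name is arbitrary) and the same pin as above; evaluate it with nix-instantiate --eval --strict versions.nix.

  # versions.nix: report the versions fixed by this nixpkgs commit
  let
    pkgs = import (builtins.fetchTarball {
      url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
      sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
    }) { };
  in
  {
    python = pkgs.python3.version;
    numpy = pkgs.python3Packages.numpy.version;
    pandas = pkgs.python3Packages.pandas.version;
  }

Nothing gets built here; Nix only evaluates the expression and prints an attribute set of version strings. Anyone evaluating this file gets the same answers, which is what having no solve phase means in practice.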

Managing the Python version

Now suppose we would like to run the same environment with several different versions of Python. Want to see how easy that is? All you have to do is change which python<version>Packages set is used (which versions are available depends on the nixpkgs commit you pinned).

  let
    ...

    pythonPackages = pkgs.python3Packages;
    # pythonPackages = pkgs.python37Packages;
    # pythonPackages = pkgs.python38Packages;
    # pythonPackages = pkgs.python39Packages;
  in
   ...
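Instead of commenting lines in and out, the choice can also be made a function argument. A sketch, assuming a hypothetical argument named python whose value is the prefix of a package set attribute such as python38Packages; which prefixes exist depends on your pinned commit.

  # shell.nix, selectable with: nix-shell --argstr python python38
  { python ? "python3" }:

  let
    pkgs = import (builtins.fetchTarball {
      url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
      sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
    }) { };

    # e.g. "python38" selects pkgs.python38Packages
    pythonPackages = pkgs."${python}Packages";
  in
  pkgs.mkShell {
    buildInputs = [
      pythonPackages.python   # the interpreter that belongs to this package set
      pythonPackages.numpy
      pythonPackages.pandas
    ];
  }

Plain nix-shell uses the default python3, while nix-shell --argstr python python37 switches the interpreter and every listed package to the Python 3.7 set in one go.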

Customizing a package's configuration

Let's say you want to change the version of a given Python package, for example numpy, which pandas depends on.

  let
    pkgs = import (builtins.fetchTarball {
      url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
      sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
    }) { };

    pythonPackages = pkgs.python3Packages.override {
      overrides = self: super: {
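        numpy = super.numpy.overridePythonAttrs (oldAttrs: rec {
          # set version alongside src so the package name and version reflect 1.17.0
          version = "1.17.0";
          src = super.fetchPypi {
            pname = "numpy";
            inherit version;
            extension = "zip";
            sha256 = "0jnk1sz0q7kzxm6j9kjdp2hxl3d8h0g00kpc1di4ry3kzgify7wm";
          };
        });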
      };
    };
  in
  pkgs.mkShell {
    buildInputs = [
      pkgs.htop
      pkgs.vagrant
      pkgs.nodejs
      pythonPackages.numpy
      pythonPackages.pandas
    ];
  }

This example only overrides a package in the Python package set, but a similar approach can be used to override arbitrary packages in nixpkgs. When you run nix-shell you'll notice it starts downloading a bunch of packages and building numpy from source. You'll also notice it builds a bunch of other packages. Why is that? Well, Nix tracks the dependencies of every package, and since numpy changed, every package needed by pandas that depends on numpy (including pandas itself) has to be rebuilt as well. Prebuilt binaries only exist in the binary cache for the unmodified dependency tree.
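The nixpkgs manual also documents a closely related spelling: override the interpreter's package set through packageOverrides, then build one environment with withPackages. A sketch of the same numpy override in that style, reusing the pin from above; whether numpy 1.17.0 still builds against the rest of that commit is not guaranteed.

  let
    pkgs = import (builtins.fetchTarball {
      url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
      sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
    }) { };

    # A Python 3 whose package set uses numpy 1.17.0
    python = pkgs.python3.override {
      packageOverrides = self: super: {
        numpy = super.numpy.overridePythonAttrs (oldAttrs: rec {
          version = "1.17.0";
          src = super.fetchPypi {
            pname = "numpy";
            inherit version;
            extension = "zip";
            sha256 = "0jnk1sz0q7kzxm6j9kjdp2hxl3d8h0g00kpc1di4ry3kzgify7wm";
          };
        });
      };
    };
  in
  pkgs.mkShell {
    buildInputs = [
      # pandas from this set is built against the overridden numpy
      (python.withPackages (ps: [ ps.numpy ps.pandas ]))
    ];
  }

Both spellings trigger the same cascade of rebuilds; this one just keeps the override attached to a single interpreter.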

Another thing to notice is that we have the power to modify the configuration of a package and have every package that depends on it rebuilt accordingly. For example, you could link all numerical libraries against Intel MKL instead of the default OpenBLAS, or compile them with GCC's fast-math flags. In other package managers this simply is NOT possible. If you want a specific package compiled against a specific version of another package with exact compile flags … good luck in other package managers.
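To make the MKL case a bit more concrete: newer nixpkgs commits route BLAS and LAPACK through wrapper packages named blas and lapack, whose provider can be swapped with an overlay. The sketch below assumes your pinned commit already has those wrappers and that numpy uses them; mkl is unfree, so allowUnfree has to be enabled. The name mklOverlay is just a local binding.

  let
    # Overlay: packages that use the blas/lapack wrappers now link against MKL
    mklOverlay = self: super: {
      blas = super.blas.override { blasProvider = self.mkl; };
      lapack = super.lapack.override { lapackProvider = self.mkl; };
    };

    pkgs = import (builtins.fetchTarball {
      url = "https://github.com/nixos/nixpkgs/archive/84aa23742f6c72501f9cc209f29c438766f5352d.tar.gz";
      sha256 = "0h7xl6q0yjrbl9vm3h6lkxw692nm8bg3wy65gm95a2mivhrdjpxp";
    }) {
      config.allowUnfree = true;   # mkl is distributed under an unfree license
      overlays = [ mklOverlay ];
    };
  in
  pkgs.mkShell {
    buildInputs = [
      # numpy, pandas and everything in between are rebuilt because blas/lapack changed
      pkgs.python3Packages.numpy
      pkgs.python3Packages.pandas
    ];
  }

Going back to the default provider is a one-line change: drop mklOverlay from the overlays list.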

Conclusion

I hope this shows how flexible Nix is for customizing and reproducibly defining Python environments.