Trying pnpm on the JustAnswer multi-package repository

About the structure of the JustAnswer multi-package repository

Zoltan Kochan

At JustAnswer, we have lots of components, libraries, apps and servers that are written in JavaScript. We structure our JavaScript code by using npm packages. These packages are then placed into one of our multi-package repositories.

If you have ever worked with multi-package repositories, you probably already know what we learned. At first, a multi-package repo gives you wings:

  • it is easier to implement features that require changes in multiple packages
  • it is easier to do code reviews
  • easier to track changes
  • easier to test

However, when the multi-package repo becomes big (>100 packages) or even huge (>1000 packages), new issues arise:

  • everything slows down
  • hard to manage
  • hard to publish
  • hard to revert

We don't have huge multi-package repositories at JustAnswer yet. Our biggest repository has around 300 packages. However, even with this number of packages, we started to experience lots of issues, the most annoying of which is slowness. Everything slowed down in our repo: installing dependencies, updating dependencies, running tests, bundling resources.

Slow installs

Probably the most annoying part of the slowness is the installation and updating of npm dependencies. In our 300-package repo, installing all the dependencies of every component takes from 20 to 40 minutes... Insane! There are several reasons for this slowness:

  1. We still use npm v3 and npm v4, which are significantly slower than npm v5.
  2. As of now (February 2018), npm has no way of doing concurrent installations in multiple packages. As a result, we have to run npm install in each of the 300 packages in our repository.
  3. The installed dependencies are not reused between packages in the repository. If the same version of lodash is used in all 300 packages, at least 300 copies of lodash will be created, which is bad for both disk usage and speed.

Solutions

Obviously, JustAnswer is not the first company that faces these issues. So what are the existing solutions for the listed issues?

Lerna. As of now, the most popular tool for managing multi-package repositories is Lerna. Lerna has a command for installing dependencies in a multi-package repo. By default, it uses npm to do installation in every package and then it hoists shared dependencies to the root of the repo. So Lerna solves the issue with disk-space usage but it does not solve the issue of slowness.

Yarn. Yarn is a package manager for JavaScript that has a feature called workspaces. Like Lerna, Yarn hoists packages to the root of the repo, but Yarn does it more efficiently because there is no need to run Yarn separately for each package. Yarn installs all the dependencies for all packages at once. Yarn is faster than Lerna.

pnpm. pnpm is also a package manager for JavaScript but it is faster than both npm and Yarn (benchmarks). Speed is not the only advantage of pnpm though.

  • pnpm saves each version of a package only once on a disk.
  • pnpm creates a strict (not flat) node_modules, so code has no access to dependencies that are not declared in package.json.
  • like Yarn, pnpm can install dependencies in many projects at once, using the pnpm recursive commands.
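
To make the difference between a flat and a strict node_modules more concrete, here is a minimal sketch that builds both layouts by hand in a temporary directory. It is only an illustration under assumptions: the .pkgs directory name is made up, express and debug are just example package names, and the real layout that pnpm creates is more involved.

```shell
#!/bin/sh
# Illustration only: hand-built flat vs. strict layouts (no package manager runs).
demo=$(mktemp -d)

# Flat layout: the declared dependency (express) and its transitive
# dependency (debug) both sit at the top level of node_modules.
mkdir -p "$demo/flat/node_modules/express" "$demo/flat/node_modules/debug"

# Strict layout: real directories live in a side directory (".pkgs" is a
# hypothetical name) and only the declared dependency is linked to the top.
pkgs="$demo/strict/node_modules/.pkgs"
mkdir -p "$pkgs/express/node_modules/express" "$pkgs/debug/node_modules/debug"
# express still finds debug, because debug is linked next to it.
ln -s ../../debug/node_modules/debug "$pkgs/express/node_modules/debug"
# The project itself only sees express.
ln -s .pkgs/express/node_modules/express "$demo/strict/node_modules/express"

# visible LAYOUT NAME: is NAME present at the top level of node_modules?
visible() {
  if [ -e "$demo/$1/node_modules/$2" ]; then echo yes; else echo no; fi
}

flat_debug=$(visible flat debug)         # undeclared package is reachable
strict_debug=$(visible strict debug)     # undeclared package is not reachable
strict_express=$(visible strict express) # declared package is reachable

# From express's real location, its own dependency is one directory up.
if [ -e "$demo/strict/node_modules/express/../debug" ]; then
  express_debug=yes
else
  express_debug=no
fi

echo "flat: debug=$flat_debug"
echo "strict: debug=$strict_debug express=$strict_express"
echo "strict: express can reach debug=$express_debug"
rm -rf "$demo"
```

In the flat layout, project code that requires debug without declaring it would still find it at the top level. In the strict layout there is nothing to find, so the missing declaration shows up as an error right away.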

Rush. Rush is an open-source project by Microsoft for managing multi-package repos. It has an unusual approach to installing dependencies. Instead of doing installation in every package, Rush creates a central project that has all packages of the repo as dependencies. Rush installs dependencies only in the central project. When installation in the central project is done, Rush symlinks the installed dependencies from the central project to other projects in the repo. This solution has a huge advantage over Lerna's and Yarn's approaches: projects have access only to those packages that they declare in package.json.

For installation, Rush recommends using pnpm, but npm can be used as well.


I think it is clear that the best options are pnpm and Rush (with pnpm under the hood). Both offer:

  • speed
  • disk space efficiency
  • strictness that helps to avoid bugs

On top of that, Rush is also a general multi-package solution, which supports publishing of packages (like Lerna does). However, at the moment, I am more familiar with pnpm and have no experience with Rush, so I will test the performance of our current stack vs plain pnpm.

How to properly compare performance of different package managers

Although in this article I am going to compare only the performance of npm and pnpm, Yarn will be mentioned as well.

To properly compare the speed of different package managers, you have to be sure that:

  • they all run with either a hot or a cold cache/store. The easiest way to ensure this is to specify a custom cache location. Use the --cache flag of npm, the --store flag of pnpm and the --cache-folder flag of Yarn.
  • they all either use or do not use a lockfile. Just remove any available lockfiles before running install. npm's lockfile is called package-lock.json, pnpm's is shrinkwrap.yaml and Yarn's is yarn.lock. Alternatively, the --no-package-lock, --no-shrinkwrap and --no-lockfile flags can be used.
  • they use the same registry. Both npm and pnpm use the registry.npmjs.org registry by default. However, Yarn uses registry.yarnpkg.com. So, to be fair, --registry https://registry.npmjs.org should be specified for Yarn.
  • when possible, they run with the same configuration.

Hence, for measuring the performance of npm/pnpm/Yarn on a single project, you can use a script like this:

# npm
$ rm -rf cache
$ rm -rf package-lock.json
$ rm -rf node_modules
$ time npm install --cache cache

# pnpm
$ rm -rf store
$ rm -rf shrinkwrap.yaml
$ rm -rf node_modules
$ time pnpm install --store store

# Yarn
$ rm -rf cache
$ rm -rf yarn.lock
$ rm -rf node_modules
$ time yarn --cache-folder cache --registry https://registry.npmjs.org/

Comparing the performance of npm v5.6.0 and pnpm v1.35.1

In JustAnswer's case, the benchmark tests are more complex. As I mentioned, our multi-package repo has 300 packages. We have a simple loop in a gulpfile.js that runs npm install in each of these packages. To measure the performance of this operation, the npm cache should be cleared first via npm cache clear -f. With the cache cleared, the gulp task can be timed via time npx gulp install.
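
Our actual gulpfile is not shown here, but a rough shell equivalent of such a loop could look like the sketch below. The install_each function and the directory names are hypothetical, and the demo at the bottom is a dry run that prints each package directory instead of installing anything.

```shell
#!/bin/sh
# Hypothetical sketch of a sequential per-package install loop.

# install_each ROOT CMD...: run CMD in every directory below ROOT that
# contains a package.json, skipping anything inside node_modules.
install_each() {
  root=$1
  shift
  find "$root" -name node_modules -prune -o -name package.json -print |
    sort |
    while IFS= read -r manifest; do
      dir=$(dirname "$manifest")
      # One package at a time; stdin is detached so CMD cannot eat the list.
      ( cd "$dir" && "$@" ) </dev/null || echo "failed in $dir" >&2
    done
}

# Dry run on a throwaway tree with two packages and one installed dependency.
demo=$(mktemp -d)
mkdir -p "$demo/components/a" "$demo/components/b/node_modules/dep"
touch "$demo/components/a/package.json" \
      "$demo/components/b/package.json" \
      "$demo/components/b/node_modules/dep/package.json"

visited=$(install_each "$demo/components" sh -c 'basename "$(pwd)"')
echo "$visited"
rm -rf "$demo"
```

In a real repo, the call would be something like install_each components npm install. Every iteration starts a fresh npm process that resolves and fetches dependencies on its own, which is why the total time grows with the number of packages.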

With npm v5.6.0, gulp install runs for 27 minutes.

To test the performance of pnpm, we don't need the loop in the gulpfile. pnpm has a set of commands specifically for doing operations in multi-package repos. As of v1.35.1, they include:

pnpm recursive install
pnpm recursive update
pnpm recursive link
pnpm recursive dislink

pnpm recursive link does what our gulpfile currently does:

  1. links all the packages that are available locally
  2. installs dependencies for every package in the repo

However, by default the recursive commands look for packages in all sub-directories. In our repo we want it to look only in the components/ directory. pnpm recursive can be configured to search in specific locations only. A pnpm-workspace.yaml file can be placed in the root of the repo, with this content:

packages:
 - 'components/**'

Also, to make it fair, let's run pnpm with a cold cache. Let's use a new store in the root of the repo (by default, a shared store in the home directory is used). The store location can be changed via the store config. In our case, we change it via: pnpm set store "./store".

After these configurations, running pnpm recursive link finishes in 1m 36.8s! Compared with the 27 minutes measured with npm v5, that is more than 16 times faster!

Why is pnpm so much faster?

There are several reasons why pnpm recursive is so fast:

  1. pnpm does not copy packages like npm does; it creates links. Creating a link is a faster file system operation than copying.
  2. the pnpm CLI is not initialized separately for every project
  3. when installation is done in one context, some operations that happen separately for every project can be done once:
    1. verifying the integrity of the store
    2. resolution of dependencies
    3. reading of configs
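
The first point can be illustrated with a minimal sketch that hard-links one file from a store directory into two projects. This is only an illustration: the directory names are made up, and the way pnpm really organizes its store and node_modules is more involved.

```shell
#!/bin/sh
# Illustration only: one file in a "store", hard-linked into two projects.
# All directories must be on the same file system for hard links to work.
work=$(mktemp -d)
mkdir -p "$work/store/foo" \
         "$work/project-a/node_modules/foo" \
         "$work/project-b/node_modules/foo"
echo 'module.exports = 42' > "$work/store/foo/index.js"

# A hard link is another directory entry for the same data; nothing is copied.
ln "$work/store/foo/index.js" "$work/project-a/node_modules/foo/index.js"
ln "$work/store/foo/index.js" "$work/project-b/node_modules/foo/index.js"

# ls -i prints the inode number in the first column.
inode_of() { ls -i "$1" | awk '{print $1}'; }
store_inode=$(inode_of "$work/store/foo/index.js")
a_inode=$(inode_of "$work/project-a/node_modules/foo/index.js")
b_inode=$(inode_of "$work/project-b/node_modules/foo/index.js")

# ls -l prints the number of hard links in the second column.
link_count=$(ls -l "$work/store/foo/index.js" | awk '{print $2}')

echo "inodes: $store_inode $a_inode $b_inode"
echo "link count: $link_count"
rm -rf "$work"
```

All three paths report the same inode, so the file content exists on disk only once, no matter how many projects use it.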

Summary

For our multi-package repo, pnpm is a better fit than npm v5 and Yarn. Although Yarn can do installation concurrently as well, as of now (v1.3.2) it does not allow several versions of the same package in one repo, which we currently have. Also, Yarn uses a flat node_modules, which allows buggy code to work. pnpm's strict node_modules is a lot more reliable in the long run.