The necessity of “cdk.context.json” in AWS CDK

Context and cdk.context.json

Context

Context in AWS CDK refers to key-value pairs that can be associated with apps, stacks, or constructs.

In simple terms, context is used in cases where you need to provide information to CDK stacks from outside the stack definition.

For example, if you want to pass deployment environment information (dev, stg, prd, etc.) to a CDK stack from outside as a string, you can pass context with a key like ENV, and then receive that information within the CDK definition.

This context information can be described in the context key of the cdk.json file, or passed as --context (-c) options to cdk deploy or cdk synth commands.

npx cdk deploy -c ENV=dev

Within the CDK stack (or app or construct), you can retrieve it as follows:

const env = app.node.tryGetContext('ENV') as string; // dev, stg, prd, etc.
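Because tryGetContext returns undefined when the key is not set, it can help to validate the value before using it. The following is a minimal sketch: the parseDeployEnv helper and the DeployEnv type are hypothetical names introduced for illustration, and the allowed values are assumed from the dev, stg, prd example above.

```typescript
// Allowed deployment environments (assumed from the example above).
type DeployEnv = 'dev' | 'stg' | 'prd';

// Hypothetical helper: validates a raw context value, such as the result of
// app.node.tryGetContext('ENV'), which is undefined when the key is not set.
function parseDeployEnv(raw: unknown): DeployEnv {
  if (raw === 'dev' || raw === 'stg' || raw === 'prd') {
    return raw;
  }
  throw new Error(
    `Context "ENV" must be one of dev, stg, prd (pass it with -c ENV=<value>), got: ${String(raw)}`,
  );
}
```

In a CDK app this could be called as parseDeployEnv(app.node.tryGetContext('ENV')), so that a missing or mistyped value fails early instead of flowing into resource names.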

Additionally, AWS CDK itself uses feature flags (a mechanism for explicitly opting in to breaking changes by setting a flag to true), and these flags are also stored in the cdk.json file.

cdk.context.json

The cdk.context.json file is a storage file for caching values retrieved from AWS accounts during synthesis.

For example, CDK looks up values such as the availability zones of an AWS account or the Amazon Machine Image (AMI) IDs currently available for EC2 instances, and stores them in this file.

Specifically, when you call one of the context methods (also called lookup methods) provided by CDK's L2 constructs or the Stack class, the AWS SDK is used internally to retrieve information from the AWS account, and the results are stored in the cdk.context.json file.

AWS CDK Context Methods Documentation

As shown below, these methods are used quite frequently for VPCs, SSM Parameter Store, and other common use cases. Information retrieved with these methods is automatically written to the cdk.context.json file.

const vpc = Vpc.fromLookup(this, 'Vpc', {
  vpcId,
});

const parameter = StringParameter.valueFromLookup(this, parameterName);

During deployment or synthesis, if the required value already exists in the cache (the cdk.context.json file), CDK uses the cached value and does not query the AWS account via the SDK.
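The cache-first behavior can be modeled in a few lines. This is a simplified sketch, not the CDK implementation: resolveContext and fakeProvider are hypothetical names, and the provider stands in for the SDK call against the AWS account.

```typescript
// In-memory stand-in for the contents of cdk.context.json.
type ContextCache = Record<string, unknown>;

// Hypothetical stand-in for an SDK lookup against the AWS account.
type Provider = (key: string) => unknown;

function resolveContext(cache: ContextCache, key: string, provider: Provider): unknown {
  if (Object.prototype.hasOwnProperty.call(cache, key)) {
    return cache[key]; // cache hit: the provider is never called
  }
  const value = provider(key); // cache miss: look the value up once
  cache[key] = value; // and store it for later runs
  return value;
}

let providerCalls = 0;
const fakeProvider: Provider = () => {
  providerCalls += 1;
  return ['eu-central-1a', 'eu-central-1b', 'eu-central-1c'];
};

const contextCache: ContextCache = {};
const azKey = 'availability-zones:account=123456789012:region=eu-central-1';

const firstResult = resolveContext(contextCache, azKey, fakeProvider); // miss, calls the provider
const secondResult = resolveContext(contextCache, azKey, fakeProvider); // hit, served from the cache
```

After both calls, providerCalls is 1: the second resolution never reaches the provider.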

The Necessity of cdk.context.json

Now, let's discuss the main topic: "the necessity of cdk.context.json".

To be more precise, we're discussing "whether it's necessary to commit the cdk.context.json file to source code repositories like Git (not ignore it)".

The conclusion is that it's "necessary", or more accurately, "it's better to commit it (in most cases)".

The official documentation also states it's "necessary":

Because they're part of your application's state, cdk.json and cdk.context.json must be committed to source control along with the rest of your app's source code. Otherwise, deployments in other environments (for example, a CI pipeline) might produce inconsistent results.

Why cdk.context.json is Necessary

Why is it necessary to commit the cdk.context.json file to source code repositories like Git?

There are two main reasons:

To avoid non-deterministic behavior (deployments)
To improve deployment speed

Avoiding Non-deterministic Behavior (Deployments)

What does avoiding non-deterministic behavior (deployments) mean?

Let's discuss the case without a caching mechanism (when there's no cdk.context.json file).

For example, suppose you're deploying with context methods configured to retrieve the latest EC2 AMI.

If a new AMI version is released after a certain date, and your CDK is implemented to retrieve the latest image, the AMI value retrieved would eventually differ from the currently deployed EC2 instance, causing EC2 replacement (reconstruction).

To avoid such "non-deterministic" behavior where configuration changes based on deployment execution timing, the cdk.context.json file caches the AMI information from when it was deployed. In subsequent deployments, this cached information is referenced to use the same value in every deployment, ensuring "deterministic" behavior.
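The AMI scenario can be sketched as a small simulation. Everything here is illustrative: the AMI IDs are made up, and fetchLatestAmi is a hypothetical stand-in for the lookup that CDK performs against the AWS account.

```typescript
// Hypothetical "latest AMI" source whose answer changes over time.
let latestAmi = 'ami-0001';
const fetchLatestAmi = (): string => latestAmi;

// Resolve the AMI for one deployment. With a cache object, the first resolved
// value is pinned; without one, every run asks for the latest image.
function resolveAmi(cache?: { ami?: string }): string {
  if (cache === undefined) {
    return fetchLatestAmi();
  }
  const pinned = cache.ami ?? fetchLatestAmi();
  cache.ami = pinned;
  return pinned;
}

const committedCache: { ami?: string } = {};

// First deployment.
const uncachedBefore = resolveAmi();
const cachedBefore = resolveAmi(committedCache);

latestAmi = 'ami-0002'; // a new image is published between deployments

// Second deployment.
const uncachedAfter = resolveAmi();
const cachedAfter = resolveAmi(committedCache);

const changedWithoutCache = uncachedBefore !== uncachedAfter; // true: the instance would be replaced
const changedWithCache = cachedBefore !== cachedAfter; // false: the pinned value is reused
```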

The official documentation's best practices page also includes a section on "Commit cdk.context.json to avoid non-deterministic behavior", so please check it out.

AWS CDK Best Practices Documentation

Incidentally, you may want to rule out lookups against the AWS account altogether, so that deploy and synth fail when a value is not cached in cdk.context.json. For this, the cdk deploy and cdk synth commands provide a --lookups option. Setting it to false makes synthesis fail when a required value is missing from the cache. (The default is true, so a missing value is retrieved via the SDK.)

This ensures completely "deterministic" behavior.

--lookups    Perform context lookups (synthesis fails if this is
             disabled and context lookups need to be performed)
                   [boolean] [default: true]

Improving Deployment Speed

The previous point about avoiding non-deterministic behavior comes up in most explanations of cdk.context.json, but the effect on deployment speed is less widely known.

Why does having (committing) the cdk.context.json file improve deployment speed?

While it's true that caching reduces time by eliminating SDK calls and communication processes, there's an even bigger reason.

That is, when cdk.context.json doesn't exist or doesn't contain the relevant information, "synthesis runs twice".

A second synthesis pass is costly not only because synthesis itself is heavy, but also because build processes such as those for Lambda code run again.

Let's look at the actual source code from the CDK repository.

Below is the doSynthesize method of the CloudExecutable class, which is called during synthesis.

CDK Source Code - CloudExecutable

    while (true) {
      const assembly = await this.props.synthesizer(this.props.sdkProvider, this.props.configuration);

      if (assembly.manifest.missing && assembly.manifest.missing.length > 0) {
        const missingKeys = missingContextKeys(assembly.manifest.missing);

        // ...
        // ...

        if (tryLookup) {
          await this.props.ioHelper.defaults.debug('Some context information is missing. Fetching...');

          const updates = await contextproviders.provideContextValues(
            assembly.manifest.missing,
            this.props.sdkProvider,
            GLOBAL_PLUGIN_HOST,
            this.props.ioHelper,
          );

          for (const [key, value] of Object.entries(updates)) {
            this.props.configuration.context.set(key, value);
          }

          // Cache the new context to disk
          await this.props.configuration.saveContext();

          // Execute again
          continue;
        }
      }

The code starts with a while loop, and the first thing the loop body does is run synthesis.

    while (true) {
      const assembly = await this.props.synthesizer(this.props.sdkProvider, this.props.configuration);

Then, if context information is missing, meaning there's no necessary cache in cdk.context.json, it enters the following if statement.

      if (assembly.manifest.missing && assembly.manifest.missing.length > 0) {

Here's the important part: the process to retrieve context information from AWS accounts via SDK runs, saves it as context to the file, and then returns to the beginning of the while loop with continue.

const updates = await contextproviders.provideContextValues(
  assembly.manifest.missing,
  this.props.sdkProvider,
  GLOBAL_PLUGIN_HOST,
  this.props.ioHelper,
);

for (const [key, value] of Object.entries(updates)) {
  this.props.configuration.context.set(key, value);
}

// Cache the new context to disk
await this.props.configuration.saveContext();

// Execute again
continue;

Because synthesis sits at the top of the while loop, the continue statement causes it to run a second time, now with the missing context values filled in.

This way, when cdk.context.json doesn't exist or doesn't contain the relevant information, "synthesis runs twice", which causes deployments to take longer.
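To make the double pass concrete, here is a simplified model of the loop that counts synthesis passes. It is a sketch under stated assumptions, not the CDK implementation: synthesize, provideContextValues, and the lookups parameter are hypothetical stand-ins that only mirror the shape of the excerpt above.

```typescript
type Context = Record<string, string>;

interface Assembly {
  missing: string[]; // context keys the app asked for but did not find
}

let synthRuns = 0;

// Hypothetical app: each pass reports the keys it could not resolve.
function synthesize(context: Context, requiredKeys: string[]): Assembly {
  synthRuns += 1; // in a real app, asset builds would also run on every pass
  return { missing: requiredKeys.filter((k) => context[k] === undefined) };
}

// Hypothetical stand-in for the SDK lookups.
function provideContextValues(missing: string[]): Record<string, string> {
  const updates: Record<string, string> = {};
  for (const key of missing) {
    updates[key] = `looked-up:${key}`;
  }
  return updates;
}

function doSynthesize(context: Context, requiredKeys: string[], lookups = true): Assembly {
  while (true) {
    const assembly = synthesize(context, requiredKeys);
    if (assembly.missing.length === 0) {
      return assembly; // everything was cached: one pass is enough
    }
    if (!lookups) {
      throw new Error(`Context lookups are disabled; missing: ${assembly.missing.join(', ')}`);
    }
    const updates = provideContextValues(assembly.missing);
    for (const key of Object.keys(updates)) {
      context[key] = updates[key]; // the real CLI also saves this to cdk.context.json
    }
    // Loop again: synthesis reruns with the new context values.
  }
}

const requiredKeys = ['availability-zones:account=123456789012:region=eu-central-1'];
const context: Context = {};

doSynthesize(context, requiredKeys);
const runsWithoutCache = synthRuns; // 2: one pass to find the gap, one with the value

synthRuns = 0;
doSynthesize(context, requiredKeys); // the context is now populated
const runsWithCache = synthRuns; // 1
```

Passing false as the third argument models disabled lookups: the function throws after a single pass instead of fetching the missing values.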

Considerations for cdk.context.json

While we've discussed that cdk.context.json is necessary, let's talk about some considerations.

For example, suppose you're using the StringParameter.valueFromLookup method to dynamically reference values from SSM Parameter Store.

At some point, you update the value in Parameter Store, and you want the next CDK deployment to pick up the new value.

However, when there's a cache in cdk.context.json, the process to access Parameter Store doesn't run, so it continues to reference the same old value as before.

In such cases, command options to clear context information (cache) are provided in the cdk context command.

Reset specific context

npx cdk context --reset [KEY_OR_NUMBER]

# e.g. npx cdk context --reset 2

Clear all context

npx cdk context --clear

For the [KEY_OR_NUMBER] part of the --reset option, you specify the key name or number of the context you want to delete. You can check the key name or number with cdk context (without options).

$ npx cdk context

Context found in cdk.json:

┌───┬─────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────┐
│ # │ Key                                                         │ Value                                                   │
├───┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
│ 1 │ availability-zones:account=123456789012:region=eu-central-1 │ [ "eu-central-1a", "eu-central-1b", "eu-central-1c" ]   │
├───┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
│ 2 │ availability-zones:account=123456789012:region=eu-west-1    │ [ "eu-west-1a", "eu-west-1b", "eu-west-1c" ]            │
└───┴─────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────┘
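The effect of resetting a single entry can be modeled as removing one key from the cache, so that only that value is looked up again on the next synthesis. This is a sketch: resetContext is a hypothetical helper, and treating the number as the 1-based position in the listing is an assumption based on the table above.

```typescript
// In-memory stand-in for the contents of cdk.context.json.
type ContextFile = Record<string, unknown>;

// Returns a copy of the cache without the given entry. A number is treated as
// the 1-based position in the listing (an assumption of this sketch).
function resetContext(cache: ContextFile, keyOrNumber: string | number): ContextFile {
  const keys = Object.keys(cache);
  const key = typeof keyOrNumber === 'number' ? keys[keyOrNumber - 1] : keyOrNumber;
  if (key === undefined || !Object.prototype.hasOwnProperty.call(cache, key)) {
    throw new Error(`No context entry matches ${keyOrNumber}`);
  }
  const next: ContextFile = {};
  for (const k of keys) {
    if (k !== key) {
      next[k] = cache[k];
    }
  }
  return next;
}

const beforeReset: ContextFile = {
  'availability-zones:account=123456789012:region=eu-central-1': [
    'eu-central-1a', 'eu-central-1b', 'eu-central-1c',
  ],
  'availability-zones:account=123456789012:region=eu-west-1': [
    'eu-west-1a', 'eu-west-1b', 'eu-west-1c',
  ],
};

// Drops only the eu-west-1 entry; the eu-central-1 entry stays cached.
const afterReset = resetContext(beforeReset, 2);
```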

Therefore, when you want to get the latest values like SSM parameters, it's good to clear the context with reset/clear before deployment.

However, in that case, even if you've committed the cdk.context.json file, the synth process will run twice, so be aware that deployment speed will decrease.

Not committing cdk.context.json is also an option in this situation. However, if the file is committed, any context that was not cleared is still served from the cache, so no SDK calls are made to retrieve those values.

Therefore, I still think it's better to commit cdk.context.json.

Cases Where Committing is Not Necessary

Committing the file is not necessary when you want to completely clear the context (cache) on every deployment.

This applies, for example, when the stack does not look up VPCs or other resources and uses context methods only for SSM Parameter Store values that should be retrieved fresh every time.

However, if you do ignore the cdk.context.json file, remember that you did so. Otherwise, when other context methods become necessary later in development, deployments may become slower without you noticing.

Conclusion

Writing this article made me realize that cdk.context.json plays a more important role in CDK than I had expected.

Please make sure to commit it rather than ignoring it.

Kenta Goto @k_goto