Anti-Pattern 8: Relying on Azure `What-If` as a Testing and Validation Tool

Trent Steenholdt
December 10, 2024

Azure Bicep Anti-Patterns

Welcome to the eighth instalment of the series on Azure Bicep anti-patterns. Today’s post is about the anti-pattern of over-relying on Azure’s What-If feature for testing and validating Infrastructure as Code (IaC). While What-If provides a very quick, high-level overview of potential deployment impacts, it is prone to false positives and false negatives. Additionally, Microsoft has not kept up with the community’s requests to suppress unnecessary noise in the output. As a result, it generates white noise for inconsequential changes while missing critical issues it should catch.

Simply put, relying on What-If alone can lead to unforeseen problems in production, missed dependencies, and misconfigurations.

What Azure “What-If” does well

The What-If feature is a useful tool for gaining a quick, loose understanding of how your IaC changes will affect your Azure environment. Some of its strengths include:

Quick and Dirty Impact Preview: It shows which resources will be created, updated, or deleted, with caveats.
Change Identification: Flags changes in resource properties before actual deployment, offering a chance to review them. Again, with caveats.
Non-Destructive Testing: Operates without applying changes, making it safe to use in production-like environments.

What Azure “What-If” does poorly

Despite its utility, What-If has significant limitations that make it unsuitable as a standalone validation tool:

False Positives:
- Flags changes that won’t actually occur during deployment.
- Example: Marking resources as updated when the actual deployment won’t modify them.
False Negatives:
- Fails to detect critical changes or configuration issues that manifest during runtime.
- Example: Missing updates to dependencies like Key Vault secrets, diagnostics settings, or role assignments.
Excessive Noise:
- The What-If output often includes irrelevant information, obscuring meaningful changes.
- Example: Listing unchanged resource properties as pending updates, while skipping important dependency changes.
Runtime Validation Issues:
- Ignores external dependencies, such as API integrations or external secrets.
- Fails to check compliance with Azure Policies, leading to surprises in restricted environments.

Why does it matter that What-If isn’t perfect?

To the untrained eye, this leads to confusion about what the Bicep is trying to do when you’re idempotently making changes to improve and enhance your code. For example, here is a screenshot of that confusion from a recent customer engagement of mine.

Figure: Example of What-If noise

The noise that concerned the Cloud Engineer in the screenshot came from a small Bicep file change to the rules in an Azure Firewall. What-If suggested that a simple pull request would make a monumental number of changes because the new firewall rule was added mid-array, e.g.:

[
  {
    // Rule 1
  }
  {
    // Rule 2
  }
  {
    // NEW Rule in PR
  }
  {
    // Rule 3
  }
  {
    // Rule 4
  }
]

This meant What-If saw the new rule as replacing Rule 3, Rule 3 as replacing Rule 4, and so on. That cascading effect made a relatively small firewall change look like it was touching over 50% of the firewall rules. This spooked the Cloud Engineer into holding off on running their change, even though the Development team needed it in quickly.

What-If was effectively telling a lie: the end result of the deployment would have been just the one new rule added to the Azure Firewall. While the Cloud Engineer could have added their new rule to the end of the array, that didn’t make sense because they were trying to keep their code human readable, as we covered in part three of the series.
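The cascade described above can be reproduced with a small sketch. It assumes, as the behaviour in the screenshot suggests, that array elements are compared position by position. The rule names and both helper functions are hypothetical and are not part of What-If or any Azure SDK.

```python
def positional_diff(before, after):
    """Compare two lists index by index, mimicking the noisy preview."""
    changes = []
    for i in range(max(len(before), len(after))):
        old = before[i] if i < len(before) else None
        new = after[i] if i < len(after) else None
        if old == new:
            continue
        if old is None:
            changes.append(f"+ [{i}] {new}")
        elif new is None:
            changes.append(f"- [{i}] {old}")
        else:
            changes.append(f"~ [{i}] {old} -> {new}")
    return changes


def keyed_diff(before, after):
    """Compare by rule name, which reflects what the deployment really changes."""
    added = [r for r in after if r not in before]
    removed = [r for r in before if r not in after]
    return [f"+ {r}" for r in added] + [f"- {r}" for r in removed]


# Hypothetical rule names standing in for the firewall rule objects.
before = ["rule-1", "rule-2", "rule-3", "rule-4"]
after = ["rule-1", "rule-2", "rule-new", "rule-3", "rule-4"]

positional = positional_diff(before, after)
keyed = keyed_diff(before, after)

for line in positional:
    print(line)
print("---")
for line in keyed:
    print(line)
```

The positional comparison reports every element from the insertion point onwards as changed, while the name-based comparison reports only the single new rule, which matches what the deployment actually does.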

Why This Is an Anti-Pattern

Relying solely on Azure What-If can create a false sense of security (or even insecurity, as demonstrated above), leading to incomplete or poorly validated deployments. Key risks include:

Skipped Real-World Testing: Developers and engineers may forgo testing in isolated environments, assuming What-If is sufficient.
Missed Configuration Errors: Many issues only manifest during actual deployment or runtime.
Risk of Downtime: Deployments to production based on incomplete validation can lead to outages or misconfigurations.

Best Practices to Avoid Over-Reliance on `What-If`

1. Treat `What-If` as a supplemental tool, not a solution

Use What-If for a high-level understanding of deployment impacts, but never as the sole validation mechanism. Combine it with other testing methods to ensure comprehensive validation.

2. Perform Real-World Testing in Isolated Environments

Deploy your IaC into isolated environments (e.g., a dedicated test subscription) to validate end-to-end functionality and catch issues missed by What-If.
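One way to make that concrete is to compare the state of the test environment before and after the deployment, rather than trusting the preview. The sketch below works on plain lists of rule names. How those lists are collected (for example, by exporting the firewall configuration) is left out, and the function name and sample data are hypothetical.

```python
def verify_only_expected_changes(before, after, expected_added, expected_removed=()):
    """Return a list of problems; an empty list means the deployment did
    exactly what the pull request intended."""
    before_set, after_set = set(before), set(after)
    actually_added = after_set - before_set
    actually_removed = before_set - after_set

    problems = []
    for name in sorted(actually_added - set(expected_added)):
        problems.append(f"unexpected rule added: {name}")
    for name in sorted(set(expected_added) - actually_added):
        problems.append(f"expected rule missing: {name}")
    for name in sorted(actually_removed - set(expected_removed)):
        problems.append(f"unexpected rule removed: {name}")
    for name in sorted(set(expected_removed) - actually_removed):
        problems.append(f"expected removal did not happen: {name}")
    return problems


# Hypothetical snapshots taken before and after deploying to a test subscription.
before = ["rule-1", "rule-2", "rule-3", "rule-4"]
after = ["rule-1", "rule-2", "rule-new", "rule-3", "rule-4"]

problems = verify_only_expected_changes(before, after, expected_added=["rule-new"])
print(problems or "deployment matched the intended change")
```

A check like this fails the pipeline on what actually changed in the environment, so a noisy or incomplete preview no longer decides whether the change is safe.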

3. Automate Validation Pipelines

Incorporate runtime validation into your CI/CD pipelines using tools like:

Terratest: For end-to-end infrastructure testing.
Azure Deployment Scripts: To run post-deployment checks, e.g. with Get-AzResource.
PSDocs: To auto-generate documentation and verify resource compliance.
PSRule: To test code compliance with best practices and governance requirements.

4. Combine with Linting and Unit Tests

Lint your IaC code to catch syntactical and logical errors before deployment. Use tools like:

Bicep Linter: For static code analysis.
Pester: For unit testing in PowerShell scripts and Bicep modules. Check out this blog post from Dan Rios.

5. Monitor the GitHub Issue Tracker

Stay updated on improvements to What-If by following the ARM Template What-If GitHub issue tracker. Be aware of known issues and work around them when necessary.

Conclusion

Azure What-If is a helpful tool for previewing changes to very simple Azure Bicep, but it is far from perfect. Its excessive noise, false positives, and inability to validate runtime behaviours or external dependencies make it unsuitable as a primary testing and validation mechanism. By treating What-If as a supplemental tool and adopting robust real-world testing practices, you can avoid the pitfalls of this anti-pattern and ensure your IaC deployments are reliable, secure, and compliant.

What’s Next?

In the last post of this series, we’ll explore why IaC isn’t an exception to good software practices.