In my previous post, I wrote about Cost Optimization based on the Azure Well-Architected Framework. If you want to take a look at that post, follow this link: Azure Well-Architected Framework – Cost Optimization
Continuing my series of posts about the Azure Well-Architected Framework (I know, it’s been a while since my last post), it is now time for Operational Excellence.
Operational Excellence
The way to achieve excellence in your operations is to be aware of what is happening in your systems. It is not feasible to operate your applications and services in the cloud when you are blind to how your resources are performing and behaving at every moment. In addition, you must embrace Agile practices. The principles for being truly excellent at operations are:
- Design, build and orchestrate your applications with modern practices (through DevOps and CI/CD practices).
- Use monitoring and analytics to gain insights about your operations.
- Use automation to reduce effort and errors.
- Test your application thoroughly.
Let’s dive into each of these principles.
Design, build and orchestrate your applications with modern practices (through DevOps and CI/CD practices)
There are three key areas to pay attention to in this regard: DevOps, Continuous Integration/Continuous Delivery and Environment Consistency. Let’s dive into each of them:
DevOps
DevOps is the union of people, processes and products to deliver value. It aims to break down silos by creating self-contained teams that are responsible for their products end to end, covering both development and operations. It also includes CI/CD practices, which Azure DevOps supports. GitHub Actions also helps by letting you create workflows to build and deploy your solutions.
Continuous Integration and Continuous Delivery
These are practices to integrate and test your code every time a developer commits a change (CI), while CD builds, tests and deploys new versions of products based on the results of the CI process. When you have many environments held together in a chain, you have a pipeline.
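To make the idea of a pipeline more concrete, here is a minimal sketch in plain Python. It is not how Azure DevOps or GitHub Actions work internally; the `run_pipeline` function and the stage names are hypothetical and only illustrate that stages run in order and a failure stops everything after it.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a step that returns True on success (hypothetical shape).
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            break  # a failed stage blocks every later stage
        completed.append(name)
    return completed


# Example: the tests fail, so the deploy stage never runs.
stages = [
    ("build", lambda: True),
    ("test", lambda: False),
    ("deploy", lambda: True),
]
print(run_pipeline(stages))  # ['build']
```

The point of the sketch is the early exit: a change only reaches deployment if every earlier stage passed.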
Environment Consistency
Strive to keep the configuration the same across your work environments, such as integration, test and production. This avoids undesirable situations where code that works perfectly and passes all tests in one environment simply breaks in another.
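One simple way to catch this kind of drift is to compare the setting names each environment defines. The sketch below assumes every environment's configuration can be loaded as a plain dictionary; the `missing_settings` function and the setting names are made up for illustration.

```python
from typing import Dict, Set


def missing_settings(environments: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return, per environment, the setting names defined elsewhere but missing there."""
    all_keys: Set[str] = set()
    for settings in environments.values():
        all_keys.update(settings)

    drift: Dict[str, Set[str]] = {}
    for name, settings in environments.items():
        missing = all_keys - set(settings)
        if missing:
            drift[name] = missing
    return drift


# Hypothetical settings: production lacks a value that test defines.
environments = {
    "test": {"DB_URL": "test-db", "CACHE_TTL": "60"},
    "production": {"DB_URL": "prod-db"},
}
print(missing_settings(environments))  # {'production': {'CACHE_TTL'}}
```

A check like this only compares names, not values, because values such as connection strings are expected to differ between environments.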
Use monitoring and analytics to gain operational insights
Monitoring is highly important for any system, especially those hosted in the cloud. Monitoring means collecting and analyzing data to determine the performance, availability and resource usage of your solutions. This leads to alerts and warnings that let you solve problems even before users notice them.
There are three levels of monitoring in Azure:
Core monitoring
It lets you monitor the Azure platform, as well as the services, incidents, upgrades and issues that may affect your resources. The following monitoring services cover this level:
- Activity Log: Logs all activities and actions that have occurred in your resources. (https://docs.microsoft.com/en-ca/azure/azure-monitor/platform/activity-log)
- Azure Service Health: Monitors the health of Azure services. (https://azure.microsoft.com/en-ca/features/service-health/)
- Azure Monitor: Core monitoring service that lets you aggregate data about different Azure services, infrastructure and apps. It also has an alert system based on custom thresholds. (https://azure.microsoft.com/en-ca/services/monitor/)
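To illustrate what an alert based on a custom threshold means, here is a small sketch in plain Python. It does not use or reproduce the Azure Monitor API; the `should_alert` function, the window size and the sample values are assumptions made only for this example.

```python
from statistics import mean
from typing import Sequence


def should_alert(samples: Sequence[float], threshold: float, window: int = 5) -> bool:
    """Return True when the average of the last `window` samples exceeds the threshold."""
    if not samples:
        return False  # no data yet, nothing to alert on
    recent = samples[-window:]
    return mean(recent) > threshold


# Hypothetical CPU percentages: the last five readings average 97, above the 90 threshold.
cpu_percent = [10, 20, 95, 96, 97, 98, 99]
print(should_alert(cpu_percent, threshold=90))  # True
```

Averaging over a window instead of looking at a single reading avoids firing an alert on a short spike.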
Deep infrastructure monitoring
Log Analytics lets you collect data even from Virtual Machines and services like SQL Server. This data can then be streamed into Azure Monitor.
Deep application monitoring
For this, Azure offers Application Insights. (https://docs.microsoft.com/en-ca/azure/azure-monitor/app/app-insights-overview)
Use automation to reduce manual effort and errors
Here, there are three areas to pay attention to:
Infrastructure as Code
You must implement this approach to reduce the amount of manual work and the errors that come with deploying your services by hand. There are two options when provisioning services in your Azure environment: Imperative, where you say explicitly how the resources must be created, and Declarative, where you say what you need and let Azure take care of how to do it.
For Virtual Machines, as they might need additional configuration after creation, you have two options: Custom Images, where you set up a Virtual Machine with everything you need and create an image from it to build more Virtual Machines, and Post-deployment scripting, where you run scripts after the Virtual Machine is created to install and update software.
Keep in mind that in both cases, you must manage how to apply patches and updates, as well as how to enforce policies across all the Virtual Machines.
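The difference between the two approaches is easier to see with a toy example. In the declarative style you only describe the desired state, and something else works out the steps. The sketch below is plain Python, not an Azure tool; `plan_changes` and the resource names are hypothetical. Note that this toy planner also deletes anything that is not declared, which real tools may or may not do depending on how they are configured.

```python
from typing import Dict, List


def plan_changes(desired: Dict[str, dict], current: Dict[str, dict]) -> List[str]:
    """Compare the desired state with the current one and list the actions needed."""
    actions: List[str] = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(f"create {name}")
        elif current[name] != spec:
            actions.append(f"update {name}")
    for name in sorted(current):
        if name not in desired:
            actions.append(f"delete {name}")  # assumption: undeclared resources are removed
    return actions


# Hypothetical resources: what we want versus what exists today.
desired = {"vm-web": {"size": "small"}, "db": {"tier": "basic"}}
current = {"vm-web": {"size": "large"}, "old-queue": {}}
print(plan_changes(desired, current))
# ['create db', 'update vm-web', 'delete old-queue']
```

In the imperative style, you would write those three actions yourself, in the right order, every time the environment changes.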
Automate Operational Tasks
You can use Azure Automation to delegate some manual work, such as turning a Virtual Machine on or off on a specific schedule.
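The scheduling decision itself is simple logic. Here is a minimal sketch of it in plain Python, assuming the Virtual Machine should only run during working hours on weekdays. The `vm_should_run` function and the 08:00 to 18:00 window are my own assumptions, and the code does not call Azure Automation; it only shows the rule a scheduled job would apply.

```python
from datetime import datetime, time


def vm_should_run(now: datetime, start: time = time(8, 0), stop: time = time(18, 0)) -> bool:
    """Return True on weekdays between start (inclusive) and stop (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and start <= now.time() < stop


# Monday morning: the machine should be on.
print(vm_should_run(datetime(2024, 1, 8, 9, 30)))  # True
# Saturday morning: the machine should be off.
print(vm_should_run(datetime(2024, 1, 6, 9, 30)))  # False
```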
Automate Development Environments
Use Azure DevTest Labs to provision Virtual Machines for development purposes.
Testing strategies for your application
Automated tests
There is a wide-open list of options for automated tests. Nevertheless, the main focus could be on the following:
- Unit tests: These test the core functionality of each class or component. Ideally, they should cover 100% of your code and must run fast.
- Smoke tests: These are a little more extensive than unit tests but not as much as integration tests. They focus on whether the components can be built properly and whether each component meets the expected performance and functionality criteria.
- Integration tests: These ensure that components work properly with each other. They run less frequently than the two above, for example every night.
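As a small illustration of the first kind, here is what a unit test can look like using Python's built-in `unittest` module. The `apply_discount` function is a made-up example of a component, not something from a real application.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)
```

If you save this in a file, you can run it with `python -m unittest <file name>`. Each test checks one behaviour and needs no database or network, which is what keeps unit tests fast.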
Manual testing
Manual tests are more expensive than automated tests but still necessary. This category includes acceptance testing, which confirms that the application does what it should. Here, there are some release options to consider: Blue/Green, Canary and A/B testing.
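For the Canary option, the core idea is to send a small, stable share of users to the new version. The following sketch shows one possible way to do that split in plain Python by hashing the user id. The `release_for` function and the bucket scheme are assumptions for illustration, not the way any particular Azure service implements it.

```python
import hashlib


def release_for(user_id: str, canary_percent: int) -> str:
    """Assign a user to 'canary' or 'stable' based on a hash of the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket from 0 to 99
    return "canary" if bucket < canary_percent else "stable"


# With 10 percent, roughly one in ten users gets the new version,
# and the same user always gets the same answer.
print(release_for("user-42", canary_percent=10))
```

Because the bucket comes from a hash and not from a random number, a user does not jump between versions from one request to the next.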
Stress testing
This tests the resilience of your application when the workload rises to high levels.
Fault injection
This means introducing some fault elements (software or infrastructure) to check how the system behaves under those circumstances.
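At the software level, a very simple form of this is wrapping a call so that it fails on purpose some of the time, and then checking that the calling code copes. The sketch below is plain Python; `with_faults`, `call_with_retry` and `InjectedFault` are hypothetical names, and real fault injection tools work at a much larger scale.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised on purpose to simulate a failing dependency."""


def with_faults(func: Callable[..., T], failure_rate: float, rng: random.Random) -> Callable[..., T]:
    """Wrap func so that it fails with the given probability."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func: Callable[[], T], attempts: int = 3) -> T:
    """Call func up to `attempts` times, re-raising the last injected fault."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[InjectedFault] = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as error:
            last_error = error
    raise last_error


# A dependency that fails about half the time, called through the retry helper.
flaky_lookup = with_faults(lambda: "ok", failure_rate=0.5, rng=random.Random())
try:
    print(call_with_retry(flaky_lookup, attempts=5))
except InjectedFault:
    print("all attempts failed")
```

Running the system with the failure rate turned up tells you whether retries, timeouts and fallbacks behave the way you expect before a real outage does.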
Security testing
You must ensure your application has security in place by protecting it against things like SQL Injection, cross-site scripting and the like. Also, you can run what are known as red team exercises, where the security team tries to compromise your app.
I will keep posting on the topics highlighted in the Azure Well-Architected Framework. Next will be “Performance efficiency”.
Stay tuned for the next post.