Cloud Architecture and Automation
This is the first article in a three-part series on Cloud Architecture and Automation. We will cover topics including automation, infrastructure as code, layered security, and zero trust.
- Automation – Part 1
- Network Design & Security Controls – Part 2
Automation
The automation process converts legacy infrastructure management into infrastructure as code. This improves efficiency while maintaining a high level of security and lowering recovery time objectives. As a system administrator, I used automation stacks for four types of work:
- Deployments
- Updates
- Changes
- Monitoring
The process was simple: we defined the resources in the automation stack, which essentially cloned base images for Linux and Windows and then applied the necessary configuration. This process worked well for VMware and AWS infrastructure and can be replicated on Azure and GCP. Each image was hardened to meet CIS benchmark requirements, and a few baseline items were applied. By default, we allowed RDP and/or SSH only from our management network, and we ensured that EDR and compliance audit tools were included.
- Hardened Image
- Automation Stacks
- Terraform
- SaltStack
- Ansible
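The baseline described above (admin access only from the management network, EDR and compliance tooling on every image) can be written as a check that runs against each server definition before deployment. The following is a minimal sketch, not the tooling we actually used: the `ServerSpec` and `InboundRule` types, the agent names, and the `10.0.100.0/24` management network are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network

# Hypothetical values for illustration only; substitute your own.
MANAGEMENT_NETWORK = ip_network("10.0.100.0/24")
REQUIRED_AGENTS = {"edr", "compliance-audit"}
ADMIN_PORTS = {22, 3389}  # SSH and RDP


@dataclass
class InboundRule:
    port: int
    source_cidr: str


@dataclass
class ServerSpec:
    name: str
    base_image: str
    agents: set = field(default_factory=set)
    inbound_rules: list = field(default_factory=list)


def baseline_violations(spec: ServerSpec) -> list:
    """Return a list of baseline violations for one server definition."""
    problems = []

    # Every server must carry the required security agents.
    for agent in sorted(REQUIRED_AGENTS - spec.agents):
        problems.append(f"{spec.name}: missing agent '{agent}'")

    # SSH/RDP may only be reachable from inside the management network.
    for rule in spec.inbound_rules:
        if rule.port not in ADMIN_PORTS:
            continue
        source = ip_network(rule.source_cidr, strict=False)
        allowed = (
            source.version == MANAGEMENT_NETWORK.version
            and source.subnet_of(MANAGEMENT_NETWORK)
        )
        if not allowed:
            problems.append(
                f"{spec.name}: port {rule.port} open to {rule.source_cidr}"
            )
    return problems


if __name__ == "__main__":
    spec = ServerSpec(
        name="db01",
        base_image="windows-cis",
        agents={"edr"},
        inbound_rules=[InboundRule(3389, "0.0.0.0/0")],
    )
    for problem in baseline_violations(spec):
        print(problem)
```

A check like this could run in the pipeline before Terraform, SaltStack, or Ansible applies a change, so a definition that breaks the baseline is rejected before anything is built.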
Infrastructure as code comes with advantages and disadvantages, which I have outlined below. From a CISO standpoint, the benefits far outweigh the drawbacks, since you can streamline security by ensuring baseline security controls are always applied. Automation can shorten security patching times by removing the human factor, which leaves more time for quality control.
Advantages
- Lower Recovery Times and efficient Disaster Recovery
- Code can easily be Audited
- Change Management
- Lower maintenance costs
Disadvantages
- Initial upfront Effort
- Complexity
I recall a catastrophic SSD failure in which multiple disks in a RAID 10 array died. The server hosted about 8 dev, QA, UAT and training environments for various projects. Thanks to the automation, we restored about 50 virtualized servers in a matter of hours, with all databases restored from backups. Interestingly, we often don't consider the limiting factors when restoring infrastructure after a failure or ransomware breach.
If the infrastructure is automated end to end, disk and network speeds become the limiting factors, and the automation stacks follow a single-threaded sequence. In my experience, the human factor is always the weak link in legacy infrastructure management. In my opinion, automation is the direction every organization should consider next. As a CISO, ask yourself the following questions:
- When was the last (REAL WORLD) Disaster Recovery test performed?
- How long would it take to rebuild your whole environment? Five days, three days, four hours?
- How similar is your Dev to Production code release process?
- Can you audit your infrastructure changes?
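To put a rough number on the rebuild question, the limiting factors mentioned earlier (disk and network speed, with servers restored one after another) lend themselves to a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration rather than a measurement: the function names, the 100 GB per server, and the throughput figures are hypothetical, and it ignores provisioning and configuration time.

```python
def transfer_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move data_gb gigabytes at throughput_mb_s MB/s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (data_gb * 1000) / throughput_mb_s  # 1 GB = 1000 MB
    return seconds / 3600


def sequential_restore_hours(server_sizes_gb, disk_mb_s, network_mb_s):
    """Estimate total restore time when servers are restored one at a time.

    The slower of disk and network throughput is treated as the bottleneck.
    """
    bottleneck = min(disk_mb_s, network_mb_s)
    return sum(transfer_hours(size, bottleneck) for size in server_sizes_gb)


if __name__ == "__main__":
    # Hypothetical environment: 50 servers of 100 GB each,
    # disks sustaining 500 MB/s, network sustaining 200 MB/s.
    hours = sequential_restore_hours([100] * 50, disk_mb_s=500, network_mb_s=200)
    print(f"Estimated restore time: {hours:.1f} hours")
```

With these made-up figures the estimate comes out at roughly seven hours, and the network, not the people, sets the pace. Plugging in your own backup sizes and measured throughput gives a first answer to compare against a real disaster recovery test.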
In the next part, we will focus on architecture and design from a security-first standpoint. I will simplify layered security and cover zone-based security controls that can reduce management overhead.