How the University of Manchester adopted cloud using HashiCorp
Hear how the University of Manchester is already seeing cost and security benefits across the board while getting faster with the introduction of a platform based on HCP Terraform and Vault.
» Transcript
Good afternoon, everyone. How's your day been so far? Good. Brilliant. I've got some insights into something new, which is the Lloyds Banking Group. That was a nice chat to listen to regarding their IDP, so it's been really good for me as well.
I'll tell you how I ended up here. On Thursday afternoon, my manager says, "Oh, I've got a family commitment, and I can't make it for the conference." He says, "Look, I've prepared the slides. Don't worry, you'll be OK."
When your boss says that, you've got no choice but to say, "I'll give it a go." He says, "Here are the slides. Get on with it." But the real reason why I've come here is because you get a free T-shirt, so that's how I ended up here.
I'm going to talk about the University of Manchester's cloud journey. We're at the beginning of that journey, and I'm sure there are lots of organizations that are a lot more mature. But it's been interesting.
I've been at the university for about 15, 16 months. I came from the private sector into the public sector, and it's been a real eye-opener for me. Lots of challenges, but like everyone, we're all up for these challenges.
Have I told you my name? My name is Nisar Khan, Head of Platform at the University of Manchester. I'm going to play you a nice video of the university while I grab a glass of water.
I hope you enjoyed that video. That decreases the time I have to speak now.
» Manchester University's academic history and heritage
As you can see from this timeline, there are a lot of Nobel Prize winners—25. Some other facts as well. We cater for 45,000 students, with nine out of ten going into employment or further studies. It's a great place to work and a great place to study.
There are lots and lots of buildings on the campus. It was quite amazing for me when I first arrived how many buildings there were. And believe it or not, they've got a place in Dubai as well. I think my manager's going to go and visit Dubai, and I'd love the chance.
A bit more history: the invention of virtual memory through a collaboration between Tom Kilburn and Ferranti. I think the first stored-program computer was done by Tom Kilburn in 1948. There's a lot of IT history at the university as well.
I like this slide for two reasons: First of all, we work in the Kilburn building. The IT department is in the Kilburn building. But I also started my career at Ferranti as an apprentice many years ago. They got bought out by Thales. Ferranti being a defense company, I was based at Cheadle in Stockport.
» Back to basics
This is an interesting slide because—I don't know if you're aware or not—but last year the university suffered a cyber attack in July. Anyone aware of that? It was all over the news, papers, etc., and internet. That forced us to shut down our Azure tenant. We shut it down in a controlled manner, took out the workloads, and backed up the data just in case, even though they could contain vulnerabilities. We put them in a segregated area.
Then we had to think about our cloud strategy, where we are now, and where we wanted to get to. All the think tanks got together, and we realized that we wanted to go multi-cloud, not be dependent on one particular vendor. We had an on-prem presence as well.
We rebuilt Azure with the three pillars. In my eyes, these are the three most important pillars: well-architected, secure, and cost-optimized. We built it like a house: solid foundations, brick walls, roof, made it all watertight, then put in the windows and the doors for security. And you get the key to let only authorized people into the building.
Then you've got the rooms ready-made for workloads to just come in and slot in. That was our vision. That's what we wanted, and we wanted it to be easy to use—for development teams to spin up workloads and put them in those empty rooms.
» Business drivers for Hybrid cloud
» Cost management
The first driver for hybrid cloud was cost management. In the first three months after I arrived at the university, with the existing Azure, I found lots of cost savings. They were simple things like de-allocating unused VMs, rightsizing, and reservation plans.
I'm going to give you the percentage that I saved at the end. But cost management is quite key, especially at the University of Manchester, where people are spinning up environments and development teams are doing POCs, etc. There was no afterthought in terms of supporting or maintaining those.
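The "simple things" mentioned above (de-allocating unused VMs and rightsizing) can be sketched roughly in Python. This is an illustration only, not the university's tooling: the `VmUsage` record, the CPU thresholds, and the 50% rightsizing factor are all assumptions made up for the example, and real usage data would come from the cloud provider's metrics.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    """Hypothetical usage record for one VM over a review window."""
    name: str
    avg_cpu_percent: float  # average CPU utilisation over the window
    monthly_cost: float     # current monthly compute cost, any currency


def review_vms(vms, idle_below=2.0, oversized_below=20.0):
    """Sort VMs into deallocate / rightsize / keep buckets.

    The thresholds are illustrative assumptions, not recommendations.
    """
    buckets = {"deallocate": [], "rightsize": [], "keep": []}
    for vm in vms:
        if vm.avg_cpu_percent < idle_below:
            buckets["deallocate"].append(vm.name)
        elif vm.avg_cpu_percent < oversized_below:
            buckets["rightsize"].append(vm.name)
        else:
            buckets["keep"].append(vm.name)
    return buckets


def estimated_saving(vms, buckets, rightsize_factor=0.5):
    """Estimate the monthly compute saving.

    Deallocated VMs are assumed to save their full compute cost
    (disks would still be billed); rightsized VMs are assumed to
    save rightsize_factor of their current cost.
    """
    by_name = {vm.name: vm for vm in vms}
    saving = sum(by_name[n].monthly_cost for n in buckets["deallocate"])
    saving += sum(by_name[n].monthly_cost * rightsize_factor
                  for n in buckets["rightsize"])
    return saving
```

A review like this only produces candidates; someone who knows the workload still has to confirm that a quiet VM is genuinely unused before it is switched off.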
» Scalability
That's another important driver because we have key business events throughout the year, such as student enrollment. That's quite key. If our systems fail or go down or are slow, then that's going to give the wrong impression to the students. But also it's going to have an impact on the ranking of the university compared to its peers. So, scalability of the application was quite important.
» Reliability with built-in redundancy and DR plan
If the application goes down, is there an auto-failover? We need to make sure we have a DR plan in place, whether it's a manual or an automatic DR process. Also, what happens if the whole of Azure goes down? That was another conversation we had. What do we do?
» Security and compliance
Making sure that our workloads and cloud environment are secure and compliant with the security regulations in the industry. There are a few of them there. Also in my mind, I've always got a concept of shift left. I know it got mentioned on our table and some of the conversations, etc. I live and breathe shift left, so that's another culture I wanted to instill into the university, and we're making slow inroads into that.
» Proactive alerting and monitoring
It's like someone mentioned—logs are expensive, but having proactive alerting and monitoring 24/7. Why wait for something to go down when you can get an early indication?
This is a quote from my manager. He put it there. The spin I can put on it is the right workloads for the right cloud, but also understanding how those workloads interact with each other, the integrations between them, and the best use case for each type of workload. I'll give you a moment if you want to finish reading that. Also, UX design and market needs.
» Proposed hybrid cloud infrastructure
» The right workloads for the best use cases
At the moment, within the university, we've got AWS mainly being used by the research IT department. Azure, we've confined it to business apps within the university. Also, we're going to acquire GCP in September. There are conversations going on that we should use it for our AI applications.
» Build automated infrastructure with cloud agnostic tools
This is where Terraform Cloud (now HCP Terraform) has come in, along with that third-party relationship with HashiCorp and having cloud-agnostic tools. GitLab and Terraform were top of our list. We procured them in August-September last year. We're at the beginning of that journey. I'm going to talk about it a bit more later.
» Creation of the platform engineering team
I noticed in the first three months of arriving at the university that there were a lot of siloed teams. To get something done, you'd have to raise a ticket, and then that other team would pick it up when they wanted to. Then they'd say we're looking at it, but we've got other tickets to look at. Things would take ages to get done.
So, I wanted to create that platform engineering team, that center of excellence, which could work closely with the DevOps teams, but also security and IAM. Because cloud is not just one team's responsibility, it's the responsibility of the entire organization.
» Agile and Scrum culture
And then, one step further with creating that platform engineering team—I wanted to instill the culture of Agile and Scrum and have the teams in the same Scrum. You'd have a network engineer, a security architect, a cloud architect, the DevOps engineers, and also a platform engineer all in the same Scrum on the same daily stand-up with the same Scrum ceremonies going on.
They'd be talking to each other. There'd be no raising a ticket and getting it done five days later. They'd be sorting it out there and then, so that's how we increase that pace. That's what drives speed to market, because we need to make sure that we're creating great products, quality products, and remaining competitive with other universities.
» Remove vendor lock-in
The only cloud that we had when I first arrived in January 2023 was Azure. And, guess what happened in April 2023? Microsoft increased their prices by 9%, so we were stuck.
» Our three cloud environments
This is another diagram that my manager's done. This is all about the three different clouds that we're going to be using. There's Azure. That is already being developed and there are workloads going in there. Mainly, there's a student experience application that's being developed to go live in July.
There's the most critical website for the university going into AWS live in August. That's going to be the manchester.ac.uk website, which everyone sees: students, visitors, everyone. When that goes down, we get phone calls from the CTO.
At the moment, it's on-prem, so they're getting all the noise. But when it goes live, we need to make sure that it's robust, got that DR, got the scalability—and we're not going to get a telephone call at 2:00 in the morning.
Also, we want to be in a position where colleagues, students, and visitors can go direct to those cloud applications rather than having to go through the on-prem system, which is what was being done before. They were getting bounced around and that caused latency issues.
» Guiding principles
» Terraform, infrastructure as code and Vault
Terraform, top of the list. Infrastructure as code. I don't need to talk about it too much, but it's cloud-agnostic. You've got Vault in there, which we've also started using for our secrets, credentials, etc. We're at the beginning of that journey. There was a nice Vault presentation this morning by London Exchange, and that was quite interesting because that's where we want to get to.
» Boomerang connections
I just mentioned, where the traffic was being bounced from campus to cloud, back to campus, causing latency issues. So, we want to remove that.
» FinOps
We want to understand what our chargeback is, the invoicing, and be in a position where we can do that forecasting and say, this is what the environment is costing now. What will it cost in six months, in 12 months?
Because the finance director comes knocking on our door and says you've gone over budget, or what's your forecast for next year? If we've got something like FinOps, we can answer that. The way we've approached it is by having Power BI hook into Azure, which has all the metrics. We feed those into Power BI. It gives some lovely reports, and you can do that forecasting as well. The good thing about Power BI is we are also going to connect it to AWS, and then we get that single pane of glass for FinOps.
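To make the "what will it cost in six months, in 12 months?" question concrete, here is a minimal sketch of a straight-line forecast over past monthly spend. It is not how Power BI produces its forecasts; it is a plain least-squares trend, and the function name and the sample figures in the usage note are assumptions for illustration.

```python
def linear_forecast(monthly_costs, months_ahead):
    """Fit a least-squares straight line to past monthly costs and
    extrapolate it months_ahead months past the last observation."""
    n = len(monthly_costs)
    if n < 2:
        raise ValueError("need at least two months of history")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(monthly_costs) / n
    # Slope = covariance(x, y) / variance(x)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_costs))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    # The last observed month has index n - 1.
    return intercept + slope * (n - 1 + months_ahead)
```

For example, `linear_forecast([100, 110, 120, 130], 6)` extrapolates four made-up months of spend six months forward. A straight line ignores seasonality such as enrollment peaks, so a forecast like this is a starting point for the budget conversation rather than an answer.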
» Monitoring
That's very key, whether it's security monitoring or proactive alerting or monitoring within the application.
» Team set up
Like I mentioned, it was all silos when I first arrived at the university and ticket-based to get things done from another team. We're changing that mindset and culture. But it takes time. Tools and technology you can buy off the shelf and go on a course and learn it. But that sort of mindset and culture, especially when people are at the university for 10, 20, 30 years, it’s bringing them along the journey.
» No full-fat VMs
That means we don't want to start spinning up VMs. We want to move towards PaaS and SaaS, mainly because it helps with less support, maintenance, patching, and upgrades. That's the last thing we want our team to spend time doing. We want to be adding value to innovative solutions. I'm going to use this word self-service—that's the direction we all want to go in.
» The three pillars
Well architected, secure, and cost-optimized. Those are the words that we all talk about at the university in our IT teams.
Delivery is key: If you've got all the tools, talent, and technology, but we're not delivering anything, it doesn't mean anything at all. That’s because we want to make sure that we deliver great products for students and visitors, and we remain competitive with all the other universities.
» What have we delivered so far?
We got Azure back on track. Like I said, we rebuilt it from first principles. We had a third party, Microsoft, working with us, and we also had NCC for our assurance piece. We made sure that this time round we did everything by best practices and we met the regulatory requirements.
Also, we had a new CISO arrive at the university, Heather Lowrie. I don't know if you've come across her on LinkedIn. She does a lot of public speaking. But she was instrumental in bringing that security element to the university and also helping with that cyber security incident.
Sound foundations to deliver. Ability to move workloads based on use case, and also depending on cost, availability, and specialization, like I mentioned with research IT and business apps.
» Faster pace
That middle bit resonates with me because you need to create that ecosystem within your team. It's all right having talented people, but they need to be collaboratively working together and also bringing each other along that journey.
We recruited a cloud architect a few months ago. We've had platform engineering, which I'm in charge of, and I've recruited some excellent people who are helping with those reusable components in the DevOps tools, Docker, containerization, etc. We're putting those building blocks in place for other DevOps teams to use. Then, there are pockets of AWS outposts and websites throughout the university.
We've also set up a new ITOC department, which is the IT operational center. That's the 24/7 monitoring for all cloud services and on-prem, which we didn't have before. There was no out-of-hours support.
This is a nice picture of where we go for walks and talks at the university. We'll go for coffee breaks. When I am a little bit stressed with ideas and things, I go for a walk, have a chit-chat, and come back, and that's where some of the best ideas originate from. Lots of plants and trees, etc. That's just a vision of our campus.
» Cloud center of something
We want to be excellent at what we do. Some of the initiatives that we've come up with are setting up cloud groups. We've got a dedicated channel for GitLab discussions, one for Terraform discussions, and also where we bring in third parties.
So, we've got GitLab coming in to do a presentation. HashiCorp has come in and done presentations in the past as well. It's all bringing the teams together. It's alright for the platform engineering team to be that cloud center of excellence. But we don't want to be that single point of contact for everything because there are only six or seven of us at the moment. We want to share that knowledge with all the DevOps teams so then they're self-sufficient.
One thing I like doing is I'm an advocate of training, training plans, and skills matrices and see where those knowledge gaps are and help the team improve—and also the wider teams.
Then infrastructure as code is everything. But occasionally, as a last resort, we might do the odd click if it's approved and audited.
» Internal developer platform—Vision
Bringing good people together. This slide is my favorite slide. Actually, this is the one I did. It's all about creating that internal developer platform, which we've all had good conversations about. We were listening to Lloyds Banking Group before. Again, we're at the beginning of that journey. So, we're creating those reusable components in terms of modules, VNETs, subnets, and databases.
Then a development team can come along from the repo or a library. They can just spin up a VNET and subnets without even having to think about CIDR ranges or public or private. All that hard work's done for them because we—as the platform engineering team—are automating as much as we can to scale it out to all the development teams within the university.
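The idea of a reusable component that hides CIDR planning from the development team can be sketched in a few lines of Python. This is only an illustration of the allocation logic such a module might wrap, not the university's Terraform modules; the function name, the default `/24` subnet size, and the example address space are assumptions.

```python
import ipaddress


def plan_subnets(vnet_cidr, subnet_names, subnet_prefix=24):
    """Carve equally sized subnets out of a VNET address space, in order.

    The caller only supplies names; the CIDR arithmetic is done here.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    # subnets() lazily yields every block of the requested size.
    available = vnet.subnets(new_prefix=subnet_prefix)
    plan = {}
    for name in subnet_names:
        try:
            plan[name] = str(next(available))
        except StopIteration:
            raise ValueError(
                f"{vnet_cidr} has no room for subnet {name!r}"
            ) from None
    return plan
```

A team would ask for something like `plan_subnets("10.20.0.0/16", ["web", "app", "data"])` and get back non-overlapping ranges without choosing them by hand. In practice the platform team would also decide which subnets are public or private and keep a record of what has already been handed out.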
Then, we're also getting a key set of DevOps tools that hook in and communicate with each other. We're investing in ServiceNow. We've already got Terraform, got GitLab. We've got Jira. I want to integrate all those tools so they're all hooking into each other with APIs. So, when the Jira ticket moves from left to right, it also does all the merges, commits, and everything in GitLab.
Once it's done, it goes into ServiceNow and creates that audit trail in terms of CRs. That's where I want to get to. So, the development teams can then use that internal developer platform to provision those cloud services without raising tickets or engaging with the platform engineering team.
Then we've got the ITOC center, the 24/7 monitoring system, which we're also helping. We're creating dashboards for them and doing monitoring as code for them. Again, that's all automated through infrastructure as code.
» Embracing infrastructure as code in other areas
Networking as code, policy as code, security as code everywhere throughout the whole of the university. That's key because that's how we're going to fully automate everything. So, when a security architect or a security engineer comes along, we want to be in a position where they can make that change to the JSON file. They put it into GitLab and off it goes and gets provisioned through an appropriate approval process.
We're already having conversations with the network team. They love it because, at the moment, to do a firewall change takes about two weeks. If we can automate that, that could get done within minutes. There's a Network 2030 transformation project going on at the moment. We want to be able to use infrastructure as code on that.
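As a rough picture of what "policy as code" could look like for a firewall change, here is a sketch of a check that a pipeline might run on the JSON file before the change reaches an approver. The JSON layout (a top-level `rules` list with `name`, `source`, `destination`, `port`, and `action`) is a hypothetical schema invented for this example, and the "no allow rule from 0.0.0.0/0" policy is likewise an assumption, not a stated university rule.

```python
import ipaddress
import json

ALLOWED_ACTIONS = {"allow", "deny"}  # assumed vocabulary for this sketch
REQUIRED_FIELDS = ("name", "source", "destination", "port", "action")


def validate_rule(rule):
    """Return a list of problems with one firewall rule (empty means OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in rule]
    if problems:
        return problems
    for field in ("source", "destination"):
        try:
            ipaddress.ip_network(rule[field])
        except ValueError:
            problems.append(f"{field} is not a valid CIDR: {rule[field]}")
    port = rule["port"]
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"port out of range: {port}")
    if rule["action"] not in ALLOWED_ACTIONS:
        problems.append(f"unknown action: {rule['action']}")
    # Example policy (an assumption): never allow traffic from anywhere.
    if rule["action"] == "allow" and rule["source"] == "0.0.0.0/0":
        problems.append("allow rules must not be open to the whole internet")
    return problems


def validate_document(text):
    """Validate every rule in a JSON document; map rule name -> problems."""
    rules = json.loads(text)["rules"]
    return {r.get("name", "<unnamed>"): validate_rule(r) for r in rules}
```

A pipeline job could fail the merge request whenever any rule comes back with a non-empty problem list, so the human approval step only ever sees changes that already pass the basic checks.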
» FinOps
We want to make sure we don't break the bank because every quarter I get a knock on the door saying can I have the latest figures? When I first turned up at the university, I was doing all that manually through spreadsheets, etc. I said I can't spend three days doing this. That's where the Power BI report came in.
Now, I’ve given the finance director access to that, and they can see the forecast—last year's, next year's and which projects are allocated to which cost codes. Again, it's the beginning of our journey. We're also going to do the same with AWS, make a connection into those metrics, and then we have that single pane of glass.
» Business results and stakeholder benefits
» Improved student experience
How has the university benefited from this? Well, if the applications are stable, and the UX and the workflows are appealing and easy to use, then that improves the student experience. You get more enrollments. And then that impacts the National Student Survey results, which then leads to a higher ranking for Manchester University. Also, most importantly, it's more research grants. Every organization wants more money at the end of the day, so that all helps to bring in new revenue.
» Reduced IT support
Then, we also reduced IT support because we've been adopting PaaS and SaaS solutions. A good example is the data warehouse within the university: it had 80 VMs, and with the cyber incident it got shut down. But they're re-evaluating that now. They're going to be looking at Fabric.
» Single pane of glass view across all the clouds for monitoring
There are conversations going on about Grafana and Splunk, etc. We want to be able to use that cloud-agnostic tool to hook into all our clouds.
» Cost savings
There's that lovely 20% cost saving that I made within the first three months of arriving at the university. It wasn't anything complicated, just using the basics. PJ, the CTO at the university, came over and gave us a hug. He says, "What do you want?" I say, "Well, I could do with another headcount." I ended up convincing him to repurpose that 20% cost saving and give me an additional headcount in the platform engineering team. It was all good, and I've enjoyed my experience so far—and I'm looking forward to the next 18 months.
Thank you for listening, and I hope you have a brilliant rest of the day.