When you hear the call "We're migrating to the Cloud!" does your heart skip a beat? Lifting, shifting, migrating, and building new Apps for the Cloud can introduce changes and challenges to your current development workflow and processes. In this episode, we hear from Drew Horn from Sumo Logic who has experience living in this world and will share best practices to consider and resources that you can use today to accelerate your journey to the Cloud.
Listen to other Navigating the Cloud Journey episodes here.
Jim Mandelbaum: All right, everybody. Welcome to a live broadcast, at Cisco Live, of Navigating the Cloud Journey. I'm Jim Mandelbaum, Field CTO at Gigamon, and Drew, why don't you introduce yourself?
Drew Horn: Yeah. Hi everyone. Drew Horn, Senior Director at Sumo Logic.
Jim Mandelbaum: So, we're gonna kind of get started with a very common problem. We know that as people are making this cloud journey, that's what this is all about, is moving to the cloud. We see a lot of people doing lift and shift. And we know that lift and shift typically does not achieve any of the goals that they're trying to achieve. Maybe you can kind of address why that is and some of the problems.
Drew Horn: Lot of problems, lot of challenges I would say.
My particular area of expertise comes more from DevOps and the CICD pipelines. And one thing that I think a lot of people don't think about, for me as a developer, okay, I gotta take this monolithic app that's running on bare metal, my data center. I've gotta break it up into 10, 20, 50 different microservices. I've gotta go build out message queues in AWS. I've gotta go use Lambda. I gotta use one of their 5,000 or all 5,000 of their services to get this application up and running in the cloud.
But one thing that I think a lot of people don't think about is how do you get that app there? How do you transport that app there and what are the implications of your current CICD pipeline that you're using today and what is that gonna look like?
Jim Mandelbaum: Can I stop you real quick just to make sure, what is CICD? I call it acronym hell. So anytime my guest says an acronym, I'm gonna make 'em explain.
Drew Horn: We're gonna count 'em, how many strikes do I get today?
Jim Mandelbaum: We'll see. That's one. CICD?
Drew Horn: Yeah, continuous integration and continuous delivery. So, this is the process of taking your source code, compiling it down to byte code or a binary, and then packaging that up, and then delivering it continuously in an automated way out to the servers, or out to the virtual instances, or out to the cloud, or out to the Kubernetes platform where it then runs. Okay. So that's your assembly line.
Jim Mandelbaum: There you go. I like that one. So going back to your comment around the lift and shift.
Drew Horn: Yeah. So, again, apps are getting more complex. Well guess what? So are CICD pipelines. The process to build and deliver those is getting more complex. And some of the challenges, maybe I'll just kind of bring up real quick, that I've seen, are number one, around pipeline sprawl. So, what I mean by pipeline sprawl, I'm gonna say it before you cut me off, pipeline sprawl. Yeah, I mean, think about it. You go from one monolithic app to 15, 500, 1,000 microservices; Netflix, 10,000 plus. So instead of one common pipeline to build and deploy an application, you now potentially have hundreds of pipelines. And at Sumo Logic, we do; we have dozens, if not in the low hundreds.
And then you're touching the audience, you're touching your customers in different places now. So, you don't just have, you know, an app that runs on a Windows machine. You have one that's built specifically for macOS, maybe one that's built for Linux. Maybe you have a mobile property, a mobile web property, a web property. Maybe you're running an app on a TV, right? The number of different platforms that you're having to build and deploy applications for, it's like an order of magnitude. It's a multiplier effect on the complexity of building and deploying applications.
And then, another one I would say is also tool sprawl. Lots of sprawl. Yeah. That's the word of the day for me, I think, is tool sprawl. So, for us, like at Sumo Logic, we use probably a dozen different tools to build and deploy our applications. What we call maybe like the DevOps toolchain. So, we're using maybe Jenkins, or CircleCI, or GitLab to build code. We're using GitHub for version control. We're using maybe Artifactory, or one of the three major hyperscalers' container registries, or Docker, make that four, to store our images.
Jim Mandelbaum: And so, you're touching upon something that we hear quite often from C levels: I have too many tools, it's creating too much workload, you actually have to learn all these different tools. But it also goes down to another problem, which is if I've got traditional developers that are used to building monolithic apps, and now I'm saying to them, I need you to start thinking about moving these to the cloud. And when we say cloud, folks, we gotta realize that on-prem can be cloud too. When we talk about VMware, when we talk about Nutanix, that's private cloud. And when you deal with AWS, Azure, Google, that's public cloud. Cloud is cloud, right? If I'm developing a traditional bare metal app and I wanna move it to the cloud, there are implications for these developers in the thought process. And where is it going?
Drew Horn: Yeah, I mean, for me, coming out of the DevOps space as a DevOps manager, I wanna make sure to empower the different developer teams to use the tools that they want to use in order to build and deploy modern applications.
What historically for me doesn't work is coming down from the top saying, hey, these are the three tools that everyone's gonna use, right? A mobile developer, you know, they're gonna go pick up a tool like fastlane maybe to automatically build and deploy their service. But a backend developer may use a completely different set of tools.
So, it's a balancing act. You have to be able to empower your teams to use the tools that they want. But then as they go off and they do that, you need to make sure that everyone's following kind of the same set of best practices. And then you need to be smart about how you collect all of the data about those processes that are going on during that transformation in order to identify, you know, what's working, what's not, how do we prioritize? How do we move faster?
Jim Mandelbaum: Okay. But one of the things that you were talking about is all the different tooling, but is it different when I'm developing for different cloud environments? Or should I keep it fairly consistent for each of those cloud environments. So, in other words, if I, if I'm looking at moving from bare metal, the first step would be maybe to private cloud, right? I just gotta get the microservices and start building it. But eventually, I'm gonna move into the public cloud which means now I'm probably gonna end up with a mix of public and private. Now that lovely hybrid environment we all keep talking about. How does that impact that whole situation?
Drew Horn: Yeah. I mean, your deployment strategy's gonna be different, right? So, maybe you're running on hypervisors using, you know, VMware and vSphere, right? And then you're transitioning to maybe using Kubernetes on Amazon. Maybe you're using, you know, EKS or maybe GKE, or all of the different acronyms I normally would use.
Jim Mandelbaum: I normally would stop you on those, but folks let's just say that those are the Kubernetes environments in the public cloud.
Drew Horn: Yeah. So, I mean, it's really about being smart and understanding that those are different types of platforms, different architectures, and decoupling your CICD pipeline from those environments. So, as you're developing different pipelines, you just need to be smart about, okay, if I'm deploying to this service, I'm gonna check the documentation. What's the best way to deploy to Amazon's Kubernetes service versus deploying to my legacy infrastructure, maybe on VMware?
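One way to picture the decoupling Drew describes is to keep the build and test steps identical and isolate everything platform-specific behind a small interface. The following is a minimal sketch under that assumption; the names DeployTarget, KubernetesTarget, VmTarget, and run_pipeline are hypothetical, and the step strings are illustrative labels rather than real CLI commands.

```python
from dataclasses import dataclass
from typing import Protocol


class DeployTarget(Protocol):
    """Hypothetical interface: anything the pipeline can deploy to."""

    name: str

    def plan(self, image: str) -> list[str]: ...


@dataclass
class KubernetesTarget:
    name: str
    namespace: str

    def plan(self, image: str) -> list[str]:
        # Illustrative step labels only, not real commands.
        return [
            f"render manifests for {self.namespace}",
            f"roll out {image} to cluster {self.name}",
        ]


@dataclass
class VmTarget:
    name: str
    hosts: list[str]

    def plan(self, image: str) -> list[str]:
        copies = [f"copy {image} to {h}" for h in self.hosts]
        restarts = [f"restart service on {h}" for h in self.hosts]
        return copies + restarts


def run_pipeline(commit: str, target: DeployTarget) -> list[str]:
    """Build and test once; only the final deploy steps vary by target."""
    image = f"app:{commit[:7]}"
    steps = [f"build {image}", f"test {image}"]
    return steps + target.plan(image)


k8s_steps = run_pipeline("abc1234def", KubernetesTarget("eks-prod", "web"))
vm_steps = run_pipeline("abc1234def", VmTarget("dc1", ["h1", "h2"]))
```

Both runs share the same first two steps; only the tail of the list differs, which is where the platform documentation would come into play.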
Jim Mandelbaum: But how do I maintain that continuity? All right. How do I say that? Yeah. Oh, go ahead.
Drew Horn: Yeah. So, for me, there's a ton of things that you need to do. One of the things for me is, again, collecting the data, making sure that you can observe this process. Because again, with those challenges I talked about earlier, pipeline sprawl, tool sprawl, when you start having issues, if you're not collecting data about this process, you're gonna have a hard time prioritizing where you wanna spend time and effort to fix problems. And there's some frameworks, there's some best practices in place that help people do that. For example, the DORA metrics are a good place to start.
Jim Mandelbaum: Okay. It's acronym hell. Okay. You brought up DORA, I knew it was gonna come up. So, what is DORA? Let's start there.
Drew Horn: So DORA's DevOps Research and Assessment. So, this is an organization that started several years ago. What they did is they spent about seven years or so conducting surveys of various DevOps engineers and developers globally. And through applied data science they came up with some of the key metrics, the key performance indicators that you should be keeping track of as part of that continuity process, as part of moving to the cloud, or as part of increasing your software delivery performance.
They actually wrote a book on it. Dr. Nicole Forsgren and her team of experts wrote a book called Accelerate. So, I definitely would encourage anyone to read it. This is required reading for tons of different organizations. Some of my friends run quality engineering teams or software development teams, and when you get hired onto the company, you have to read the first half of that book. The back half is all of the math that proves out, you know, how they landed at these conclusions.
Jim Mandelbaum: And the thing I love about this is that if we think about traditional developers and developer process, you've got a lot of folks that have been legacy, you know, old farts like me that have been doing this a long time. And for the transition from that process, you need some help. And so, one of the things I would say for those of you who are in the space, or even those of you who manage developers that are starting to look at moving to the cloud: first, you need to get that book. And look, we've got no stake in this, you know, this is a generic podcast that goes out to everybody, that's by practitioners for practitioners. And the idea here is it'll give you an understanding so that you can now manage your folks. You want to get them going on it because it really does talk about some observability, that's kind of our theme there, but you have to deal with the data, and I don't wanna be the one talking about it. So, I'm gonna let you kind of talk about some of the benefits you get out of it.
Drew Horn: Sure. So maybe just to close out, if you cringed when you were told to just go read a book, the CliffsNotes are available in the form of the State of DevOps Report. So, this is like a consolidated annual report that is put out and funded by a bunch of different organizations including DORA. So, they'll jump into those four key metrics. How do you measure throughput? That's your deployment frequency and your lead time: how long does it take to push a release from commit to production? Then your change failure rate; this is a measure of reliability, like how many of your deployments are resulting in customer-impacting incidents? As well as MTTR: once that impact occurs, how long does it take you to get back into a healthy state?
Jim Mandelbaum: Mean time to resolution, you threw another acronym out there!
Drew Horn: Oh yeah. Mean time to resolution. I would definitely encourage you to check it out. They've also got tiers for where you stand in terms of these metrics. There's like low performers, medium performers, high performers, elite performers. And there's some really astounding data in those reports about the massive advantage that teams that have reached elite status have over other teams. And so, this is actually something we're actively doing, working on achieving elite status for all of our key production workloads at Sumo Logic.
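The four metrics Drew lists can be computed from a simple log of deployments. The following is a minimal sketch, assuming a hypothetical Deployment record with commit, deploy, and restore timestamps; the field names and sample data are invented for illustration, and the performer tiers are left out because their thresholds come from the reports themselves.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""

    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize the four key metrics over a reporting window."""
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.caused_incident]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Throughput
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        # Reliability
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }


t0 = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
sample = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, True, t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 8 * hour),
]
metrics = dora_metrics(sample, window_days=4)
```

With this sample, one of the four deployments caused an incident that took an hour to restore, so the change failure rate is 0.25 and the mean time to restore is one hour.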
Jim Mandelbaum: All right. So now you're talking about the elite. Now let's talk about the other side of it, right? When things go wrong. I mean, because not everything's, you know, gravy, there are things that go wrong in this process. Maybe you could highlight some of the things that you've seen. And, and where do you go with it?
Drew Horn: Yeah, sure. So, these reports, the books, they talk a lot about the best practices and the frameworks, but yeah, what do you do when things go wrong? For me it's all about collecting the data. So, you have to be able to collect the data in order to measure these performance metrics. And there's a couple ways to go about doing that, right? I mean, you could build a full stack application with a database, take a very structured approach. Or, how I try to think about it, I think of the CICD pipeline as its own application. It's not your production application, but think about it: you've got a CICD server, maybe, maybe you have multiple different pipelines. It's an application and you need to be able to observe it. So, you need to be able to monitor, troubleshoot, diagnose it. So, I take the same concepts of observability for modern apps; they apply directly to the CICD pipeline. And so now what you can try to do is you can kind of bridge the gap. And so, you know, when things go wrong in a production workload, what are the things I need to have visibility into in order to solve those problems? And it all comes back to collecting the logs, metrics, and traces from your CICD pipeline.
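Treating the pipeline as an application can start as small as emitting one structured log line per stage. The following is a minimal sketch, assuming a hypothetical stage helper and an in-memory list standing in for whatever log collector a team actually uses; the pipeline and stage names are made up.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def stage(pipeline: str, name: str, sink: list):
    """Time one pipeline stage and append a JSON log line to `sink`."""
    start = time.monotonic()
    status = "success"
    try:
        yield
    except Exception:
        status = "failure"
        raise
    finally:
        # Runs on success and on failure, so every stage leaves a record.
        sink.append(json.dumps({
            "pipeline": pipeline,
            "stage": name,
            "status": status,
            "duration_s": round(time.monotonic() - start, 3),
        }))


logs: list[str] = []

with stage("checkout-service", "build", logs):
    pass  # the real build step would run here

try:
    with stage("checkout-service", "test", logs):
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass  # the failure is still recorded in `logs`
```

Each line in logs carries the pipeline, stage, status, and duration, which is enough to start answering which stages fail most often and which are slowest.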
Jim Mandelbaum: Okay. So, one of the things I know that you like to talk about is OpenTelemetry. Yeah, traces. Maybe you can kind of highlight your thoughts on that and educate what that means.
Drew Horn: Yeah. So OpenTelemetry it's a fantastic concept, right? I think this kind of came out of teams using traditional observability platforms where those vendors were providing their own agents in order to collect logs and metrics and traces. And that's a bit of a problem, at least for me, because what happens is you kind of get locked into that particular vendor and their way of doing things.
Jim Mandelbaum: We often talk about vendor lock-in. That is something that executives, anybody at the C level, always say: vendor lock-in is their biggest fear.
Drew Horn: Yeah. And so OpenTelemetry has kind of come up over the last few years as a way to help give developers, give DevOps engineers, give your organization control of your data back. Where you wanna send your data. So as opposed to taking a vendor's agent to collect data, you now have an open-source community and tooling, and you can decide what vendor or vendors you wanna send that data to.
And what's really interesting to me about OpenTelemetry is kind of coming back to the original conversation. OTEL, OpenTelemetry for short. With OTEL, there's a lot of focus on production workloads and observability in production. Where I wanna see this go next is applying the same concept to, again, treating the CICD pipeline as an application. I wanna see OTEL come into all the big vendors. So, you've got CircleCI, you've got GitLab. I mean, you could go on forever talking about all the different vendors. But right now, you can collect data from all of the vendors, but it's through webhooks, it's through their own kind of proprietary format, which is great, don't get me wrong. But I'm really excited about some of the work going into OpenTelemetry where people are actually instrumenting CICD pipelines and they're emitting OTEL-compliant data about traces and spans that can be collected by observability vendors. And so this now gives teams that are trying to monitor, diagnose, and troubleshoot issues in the CICD pipeline a common framework for understanding the performance, the bottlenecks, and really landing at the root cause of issues in your pipeline.
And doing that all in a data-driven way. So, if you can take those traces, you can put 'em in an analytics platform and run analytics on all of those traces, it makes it a lot easier when something goes wrong. As opposed to pointing fingers and getting frustrated in, you know, reviews of what went wrong or in sprint retrospectives, you have all of the data in place to make an informed decision about what went wrong, how can we improve, and how do we prioritize? No one only has one issue; everyone has 50 issues. I've got 99 problems, right? So, it's like, which one do I need to take off easy? Yeah. And OTEL is not one, but yeah, like, which of these do we wanna work with, right? And having all the data in place just makes that process a lot easier.
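To make the idea of running analytics on pipeline traces concrete, the following is a minimal sketch that ranks stages by average duration. It does not use the OpenTelemetry SDK; the span records are hypothetical dictionaries that only loosely mimic the trace and span model (one trace per pipeline run, one span per stage), and the field names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical span records: one trace per pipeline run, times in seconds.
spans = [
    {"trace_id": "run-1", "name": "build", "start": 0, "end": 120, "ok": True},
    {"trace_id": "run-1", "name": "test", "start": 120, "end": 540, "ok": True},
    {"trace_id": "run-1", "name": "deploy", "start": 540, "end": 600, "ok": True},
    {"trace_id": "run-2", "name": "build", "start": 0, "end": 130, "ok": True},
    {"trace_id": "run-2", "name": "test", "start": 130, "end": 610, "ok": False},
]


def rank_stages(records):
    """Return (stage, mean duration, failure count), slowest stage first."""
    durations = defaultdict(list)
    failures = defaultdict(int)
    for span in records:
        durations[span["name"]].append(span["end"] - span["start"])
        if not span["ok"]:
            failures[span["name"]] += 1
    rows = [(name, mean(d), failures[name]) for name, d in durations.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_stages(spans)
```

In this made-up data the test stage is both the slowest and the only one that failed, which is the kind of evidence that can replace finger-pointing in a retrospective.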
Jim Mandelbaum: Okay. So earlier we talked about DORA and the book. We will make sure, for those of you that are watching, that we put the links there so you know where to find it. But where do folks find out about OpenTelemetry?
Drew Horn: Yeah. google.com, enter, search for OpenTelemetry.
Jim Mandelbaum: Come on, man. We gotta do better than that. There's a standard body.
Drew Horn: Yeah. I don't have the URL memorized. I'm gonna go with OpenTelemetry.org, but don't quote me, don't quote me on it. Yeah, it's live. It's already out there folks.
Jim Mandelbaum: So anyway, we'll make sure, for those of you watching, that we put the link so you can find OpenTelemetry.
Jim Mandelbaum: And then one last thing that I'd like to leave people with, since you've been doing this a while and you've been running teams. For somebody that's just saying, look, we're trying to make that transition to the cloud, we're trying to move our apps: what are some best practices they can follow? They're just getting started.
Drew Horn: Yeah. I mean, the big thing for me is, again, I just keep coming back to it, treating a CICD pipeline as an application. Treat it as immutable infrastructure: in the same way you're treating infrastructure as code, define your CICD pipelines as code. All the major vendors now support the ability to declare via configuration what you want your CICD pipelines to look like. And again, when that sprawl starts to occur, if you're following a best practice like that upfront, it makes that process easier. Everything is defined as code. You can go through the same code review processes with your development or your DevOps team that you normally would for your production workloads. And then you can look at that as an application. You can start identifying, you know, as I add more and more pipelines, what are some of the common things in these config files that teams are doing? And DevOps teams can come together, or a DevOps community, and you can start to abstract chunks of common functionality into shared libraries, right? And it just starts making life easier so that everyone is not going off and creating their own bespoke CICD pipelines.
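The shared-library idea can be sketched without tying it to any vendor's configuration format. The following is a minimal illustration, assuming pipelines are declared as plain data kept in version control; SHARED_STEPS, PIPELINES, and every step and service name are hypothetical.

```python
from collections import Counter

# Hypothetical shared library: reusable step definitions owned by the
# DevOps community rather than copied into each team's config.
SHARED_STEPS = {
    "lint": ["run linter"],
    "unit-test": ["install deps", "run unit tests"],
    "publish-image": ["build image", "push image"],
}

# Each pipeline is declared as data, so changes go through code review.
PIPELINES = {
    "payments-api": ["lint", "unit-test", "publish-image"],
    "mobile-app": ["lint", "unit-test"],
    "web-frontend": ["lint", "unit-test", "publish-image"],
}


def expand(pipeline: str) -> list[str]:
    """Resolve a pipeline's shared step names into concrete commands."""
    return [cmd for step in PIPELINES[pipeline] for cmd in SHARED_STEPS[step]]


def step_usage() -> Counter:
    """Count how many pipelines use each shared step."""
    return Counter(step for steps in PIPELINES.values() for step in steps)


mobile_commands = expand("mobile-app")
usage = step_usage()
```

Counting step usage across declared pipelines is one simple way to spot which chunks of functionality are common enough to be worth maintaining centrally.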
Jim Mandelbaum: Wonderful. All right. We are about out of time. I want to thank Drew for joining us today and thank you all for joining us. If you have any questions, Drew will be here. He'll be glad to answer any questions you have.
Thank you everybody. And please remember to sign up for Navigating the Cloud Journey Podcast so you can see this podcast and all the others up there. Thank you very much, everybody.