Navigating the Cloud Journey

Episode 14: Migrating to the Cloud - Architecture & Best Practices

February 03, 2023 Jim Mandelbaum Episode 14

In the episode, Jim talks to Head of Software Innovation at Veloce Energy, Arila Barnes. Arila has extensive experience in enterprise software and product management. She is an expert in edge computing and IoT solutions. 

Jim and Arila discuss edge/IoT computing, multi-cloud deployments, Machine2Machine Zero Trust, Observability in the cloud and much more.

Listen to other Navigating the Cloud Journey episodes here.

EP 14 – Migrating to the Cloud: Architecture & Best Practices

Jim: Hi everybody. Jim Mandelbaum here again with another Navigating The Cloud Journey podcast. Today I am joined by Arila Barnes, and I'm probably messing up your first name. She is Head of Software Innovation, love the title, with Veloce Energy. She's had a lot of great titles: Director of Cloud Architecture and Software Development, and she's co-founded some software companies. So she's very knowledgeable on the subject of cloud and multi-cloud. Welcome! 

Arila: Jim, thank you for having me on this podcast. And I like to think of it not as titles, but more like a career path in my cloud journey. No pun intended. 

Jim: All right. We're gonna talk about cloud obviously. We're gonna talk about multi-cloud. But I think one of the things that's really interesting is if we can level set and have you give us some education. 

[00:01:08] Edge Computing in the Cloud

Jim: So, one of the things that comes up quite often that people don't understand is the concept of edge computing and where does that fit into things like Infrastructure as a Service? Maybe you could level set that to start with. 

Arila: Sure. Actually, cloud computing is a concept that has usually been challenging for me to explain. It started with the computer in the office, then computers in the back office and the data centers, and now those data centers and those computers are made available virtually to all of us through the concept of cloud computing. 

Now the new challenge is that we're bringing in devices, in factories for instance, that have never been connected before: cars, EV chargers, storage cabinets, solar panels, and so forth. It becomes interesting how you control those assets and how you collect information from them closer to the edge. That's where edge computing comes into play: how close can we move the compute to the assets that we want to manage, while at the same time keeping that connection to the cloud, to the bigger compute power it provides by leveraging a multitude of data centers behind the scenes? 

Jim: So, when we talk about moving to the edge, we're not really talking about changing what we're doing in the cloud; we're trying to logically position the data closer to the device. 

Arila: No, the data comes from the device. It's about positioning the compute and the management of the data closer to the device, and figuring out, and that's the art part, the best way of orchestrating and balancing what compute you need to do at the edge and what compute makes more sense to do in the cloud.

And it's also about leveraging what we have learned, what companies like Amazon and Google have learned in building cloud services, especially with the advance of Kubernetes. How can you have that kind of management service for compute resources closer to the assets at the edge? How do you manage those services? I can give you one example from AWS: they now have a service called EKS Anywhere. So what you can do in the cloud, you can also do for your compute at the edge. 
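The Kubernetes-style management Arila describes rests on a reconciliation loop: a controller compares the desired state with what is actually running and emits the actions needed to converge, whether the workloads live in a cloud region or on edge hardware. A toy Python sketch of that idea (the function and service names here are invented for illustration, not any real API):

```python
# Toy reconciliation loop in the spirit of Kubernetes controllers:
# compare desired replica counts with observed ones and emit the
# corrective actions needed to converge. All names are hypothetical.

def reconcile(desired, actual):
    actions = []
    # Scale services up or down toward the desired replica count.
    for name, replicas in desired.items():
        have = actual.get(name, 0)
        if have < replicas:
            actions.append(("start", name, replicas - have))
        elif have > replicas:
            actions.append(("stop", name, have - replicas))
    # Anything running that is no longer desired gets stopped.
    for name, have in actual.items():
        if name not in desired:
            actions.append(("stop", name, have))
    return actions

desired = {"telemetry-agent": 3, "charge-controller": 1}
actual = {"telemetry-agent": 1, "old-service": 2}
print(reconcile(desired, actual))
# [('start', 'telemetry-agent', 2), ('start', 'charge-controller', 1), ('stop', 'old-service', 2)]
```

A real controller would run this loop continuously against live cluster state; the point is only that the same declarative compare-and-converge pattern works identically in the cloud and at the edge.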

Jim: Okay. So, you've brought up a lot of interesting things. You've brought up the concept of IoT devices. And I love the fact that you're talking about cars because I don't think people understand how many data points and how much data is actually being generated in modern vehicles today, especially electric vehicles. You've brought up the concept of solar and solar energy and tracking all of that data and telemetry and moving that up to the cloud.

[00:04:08] Microservices and Infrastructure as a Service

Jim: But you also brought up microservices, and you're talking about moving microservices closer to the edge. How does that fit in? We're hearing a lot of people talking about Infrastructure as Code and Infrastructure as a Service. How does that all tie in?

Arila: First, let's explain what Infrastructure as a Service is again. In the case of cloud computing, you can just request compute on demand: EC2 instances in AWS are one example, managed Kubernetes on Azure is another, all the way to just having a direct interface to a managed database that's also running in the cloud.

 Infrastructure as a Service means that we don't think about where to purchase our networking, our memory cards, our CPUs, and all of that. We just pick whatever is suitable for our problem and, voila, within a few clicks, within minutes, that compute is available to us.

So that's Infrastructure as a Service, and unfortunately on the edge we are not there yet. And that's by design and by the nature of the edge. Like I mentioned, at the edge we have high variability in what a device is and how capable or incapable it is of computing or interacting with other devices on the network.

And some of them can be super sophisticated: Teslas have a supercomputer on board that probably has the power of a small cloud. And then you have much smaller devices that can only do a couple of things and can only work at the SDK level with very limited memory and compute.

But we're still interested in how we can normalize, and how we can have that template-based instantiation of those devices. So that's an open field, and especially in the energy space it's very interesting to us at Veloce: how can we look at those devices at the grid edge and put the puzzle together of being able to do Infrastructure as Code, and in our case power systems as code as well, leverage artificial intelligence and machine learning to help us with that, and see what we can borrow from the cloud on that journey? 

[00:06:51] Considering Multi-Cloud Environments

Jim: All right. You've brought up an entire spectrum of things we could talk about, but one thing I wanna highlight is that you brought up the concept of using multiple cloud providers. You said, I want to spin up Kubernetes in Azure, I wanna spin up EC2s in AWS. It sounds like you're saying multi-cloud is the norm today, and my experience is that really is the case, because we're looking at cloud vendors and asking: what gets me the most value for my spend and for my data services from each provider? But as soon as I adopt that multi-cloud or hybrid cloud architecture, doesn't that start creating multiple blind spots for me? 

Arila: Not necessarily. Another thing that helps with planning for a multi-cloud, or two-plus-clouds, infrastructure is Infrastructure as Code. What that means is basically codifying your architecture in a way that is more portable from one cloud to another. A frontrunner in that space, as an open-source tool, is Terraform, and each cloud provider has also provided their own tools to help codify the infrastructure so that it's repeatable, whether from one team to another or from one cloud to another. It gives you that repeatability, that source of truth, which is more manageable as people look to negotiate their long-term contracts with cloud vendors, and it also gives them options, whether it's to improve their resiliency scenarios.
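The portability idea can be illustrated with a toy sketch. This is not Terraform; it is a minimal Python illustration, under invented names (`STACK`, `render_aws`, `render_azure`), of keeping one declarative spec as the source of truth and rendering it into provider-specific resources:

```python
# Toy Infrastructure-as-Code illustration: one declarative spec,
# translated by small per-cloud "renderers" into provider-flavored
# resource descriptions. All names and mappings are hypothetical.

STACK = {
    "compute": {"count": 2, "size": "small"},
    "database": {"engine": "postgres", "version": "14"},
}

def render_aws(stack):
    """Translate the generic spec into AWS-flavored resources."""
    sizes = {"small": "t3.micro", "large": "m5.large"}
    return {
        "aws_instance": {"count": stack["compute"]["count"],
                         "instance_type": sizes[stack["compute"]["size"]]},
        "aws_rds": {"engine": stack["database"]["engine"],
                    "engine_version": stack["database"]["version"]},
    }

def render_azure(stack):
    """Same spec, Azure-flavored resources."""
    sizes = {"small": "Standard_B1s", "large": "Standard_D4s_v3"}
    return {
        "azurerm_vm": {"count": stack["compute"]["count"],
                       "vm_size": sizes[stack["compute"]["size"]]},
        "azurerm_postgresql": {"version": stack["database"]["version"]},
    }

print(render_aws(STACK)["aws_instance"]["instance_type"])   # t3.micro
print(render_azure(STACK)["azurerm_vm"]["vm_size"])         # Standard_B1s
```

Changing the spec changes both renderings, which is the repeatability and single-source-of-truth property Arila describes; real tools like Terraform do this with providers rather than hand-written functions.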

Say one cloud provider is not yet present in a geography of interest, or you just need to be able to recover in case one cloud provider is hit with a cybersecurity event. I'll give you an example: if you have your Infrastructure as Code, you can recreate your entire infrastructure and deploy your applications anywhere from minutes to half an hour to an hour, depending how long everything takes to instantiate. If you take the time to click through all of that from a UI, it might take you days or weeks, and there's no guarantee it will be the same as what you have captured in Infrastructure as Code. 

[00:09:27] Observability in the Cloud

Jim: Okay. So, one of the things I hear quite often, and remember, a lot of the people watching or listening to this are somewhere along this journey, when we talk about the cloud, and you're talking about Infrastructure as Code, is observability.

One of the things that's important, you mentioned, is that you might have a cybersecurity event. That's a fancy way of saying you had a breach. But if you have this event, observability becomes really critical, and I love how you touched on it, because that is something we all need to plan for. So, talk about your thoughts on observability in the cloud. 

Arila: I think it has become very important to start there as soon as possible, versus as an afterthought. And I have some favorite tools that have helped me along in those kinds of scenarios. I can list quite a few, like New Relic, Datadog, Lacework, and AutoCloud. They're pretty easy to set up because they're focused on exactly that: observing what you're doing and collecting information that can give you good visibility of what's going on as you're building your cloud. Second, they can alert you to issues that happen behind the scenes, and in some cases even prevent issues from happening in the future, as they leverage machine learning and AI behind the scenes as you interact with those tools. 

And like I said, one thing is to create the requirements for a cloud system, another is to codify it as Infrastructure as Code, but you also need to be able to visualize it and see it. One tool that has found a permanent spot in my toolbox is AutoCloud, because it also gives you time snapshots as you make your journey to the cloud, as things are happening and as your teams are reacting in that space.

So that's from a startup perspective. But I think at some point you'll also need better tools, from Gigamon and others. 

[00:11:39] Building out your cloud observability architecture. Do it sooner rather than later

Jim: Okay. So let me back up a second. You talked about these observability tools, and you gave a laundry list of the different vendors that you recommend or have worked with in the cloud.

I think what's really interesting is that we see a lot of folks that do this as an afterthought. So, let's say I'm already in the cloud, I'm already moved there, but I haven't started looking at observability yet. How do I even begin? 

Arila: I'll tell you what I learned from AWS's startup accelerator.

So, they partnered with us and recommended we try a tool called nOps. nOps is a free service that does a Well-Architected Framework analysis of what you have. That's step one. Like I mentioned, any tool, whether it's nOps or the others I mentioned, that helps you see what you have as you're building it is critical. Then, depending on your resources, you can leverage your internal resources, your architects and engineering teams, to reflect on that information and come up with a plan for what's next. Or you can engage cloud partners, whether from AWS, Google, or Azure.

So, I think that's very important. Like I said, what I have learned is to start there as soon as possible, as soon as you are prototyping, as soon as you are trying out services, and to have that kind of independent audit of observability in place. It's an "aha" moment and really drives the message.

Jim: Observability clearly is the takeaway right now. We can't protect what we can't see, right? That's the reality. When we look at it from a security perspective, if you can't see it, you don't know what's happening. But also think about all these people that are looking at Zero Trust initiatives. There are so many that are focused on Zero Trust, and I know you have a background in that as well. One of the interesting things with observability is: how do you validate your Zero Trust project without having observability to verify that what you've done is actually working? I think that's an important takeaway for a lot of us. 

[00:13:57] Best Practices for moving to the Cloud

Jim: So, I wanna pivot a little bit. You've done a lot of development; that is your core background, and a lot of it for the cloud. So, one of the things I find interesting is this risk-based, or risk-averse, approach to moving to the cloud.

So maybe you can talk a little bit about how somebody looks at this and says, I'm gonna do lift and shift, or I'm going to modernize. And there are these terms of rewriting or refactoring my code. Maybe you can address that a little bit. 

Arila: Sure. So, the cloud can be a very scary place because you're no longer in control; you're delegating the control to the cloud providers. Basically, you're trusting the clouds' data center and IT departments versus your own. But it also gives you an opportunity to take advantage of highly skilled, highly motivated engineers who are always on the cutting edge and learning not just from your business, but from other businesses as well. And it also depends on the business objectives and what they're trying to do by going to the cloud. Is it to reduce operations cost? Is it to reduce attrition, or the challenge of hiring talent? Is it to enter new markets? Is it to accelerate innovation? There are different strategies in each of those scenarios. 

So, lift and shift is what people will try when they want to minimize risk. That's the baby steps: okay, I'll identify an application or workload, see what it will take to move it from my data center to a cloud, iterate on that, learn from it, and apply it to other applications. Usually that's just reducing cost and preparing for the next type of strategy: now that I have saved money on compute and on IT staff, how can I modernize or transform my digital assets? 

[00:16:13] Re-skilling your staff for the Cloud 

Arila: And that's a starting point that also gives you the opportunity to re-skill your staff. You may no longer need database administrators on your team, because you're not managing databases like you did before when you owned the data center. However, those are highly skilled people who, with re-skilling, for example in technologies around data engineering, can bring new value, unlocking data from various assets and providing new insights even faster than was previously possible.

Jim: So, let me interrupt you for a second here. You've brought up a really important topic I wanna address and that is that we have these talented folks that have been managing databases and managing services that we're now putting up in the cloud, and we don't want those folks to go away.

So training is something that we really wanna leverage, because, as you put it, re-skilling them, giving them additional training, helps a lot of people feel valued: you appreciate me enough to invest in my training. How do they learn the cloud? How do they get going? Where do these people start? 

Arila: Another thing the cloud has done for all of us is make available a lot of online courses, most of them free, and many of them for certification: whether you want to be a network engineer in the cloud, a DevOps engineer in the cloud, or a cloud architect, all of those tracks are available. So I would start there: look at my targeted cloud, see what is offered, and try things out. 

Another great way to get introduced, to get your feet wet with any cloud, is attending their events, like AWS re:Invent and Google I/O, where you can try the technology in the sandboxes and workshops they provide, anywhere from a two-hour to a four-hour to a whole-day investment. That can give you an idea of what it'll take as next steps. 

[00:18:25] Observability using log data from IoT devices

Arila: So, Jim, we started with the IoT devices, and I mentioned there's high variability in what they're capable or not capable of doing. One area is the capability of storing and making available log data. Log data is the same anywhere; however, in the IoT space there's the challenge of not having log data at certain times, for example if the device has run out of its local storage buffer for whatever reason. But we still want to have as much observability as possible, and that's actually one of the areas where I would like to learn if you have any ideas. 

Jim: Yeah, I think that's a really good point. What we find is that when we look at standard devices, we can put agents on them and get great logs out of them, but we know two things. Number one, we know the first thing bad guys do is turn down logging or modify the logs.

The other side of this, as you said, is those IoT devices. Now that we're in the cloud, it's a little different, right? We don't have access to the infrastructure, but we still need to get at those packets. So when we're talking observability tools, and you brought up a great list of them, one of the things that's important is that they need to be able to also see all those IoT devices.

So, if we can take those packets in, generate metadata off those packets, and then provide that data to the observability tools, to your point, for those devices that don't have good logging, don't have reliable logging, or don't have space for much logging, among many other reasons, I think you're right.

I think that getting beyond that and getting to the packets gives us a better chance of leveraging those observability tools. So, taking those packets, generating the metadata, sending that in addition to the logs, right? We don't want to get rid of the logs; we want to supplement them, so that when I'm looking at these observability tools, I get the whole picture.

I can look at things like: what TLS versions are being used? Am I enforcing TLS 1.2 or 1.3, right? Do I have self-signed certs in play? Do I have expired certs? All of these things are critical in that observability space, especially when we start introducing IoT and unmanaged devices. And as we start moving to the edge, it becomes even more critical, because we're pushing data into different regions.
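The checks Jim lists can be sketched as a small audit over packet-derived metadata. This is a hypothetical illustration: the record fields (`tls_version`, `issuer`, `subject`, `not_after`) are invented names for the kind of attributes a network-metadata pipeline might extract from observed TLS handshakes, not any specific product's schema.

```python
from datetime import datetime, timezone

# Hypothetical audit over TLS handshake metadata extracted from
# network traffic: flag weak protocol versions, self-signed certs,
# and expired certs. Field names are invented for illustration.

ALLOWED_VERSIONS = {"TLSv1.2", "TLSv1.3"}

def audit_tls(record, now=None):
    now = now or datetime.now(timezone.utc)
    findings = []
    if record["tls_version"] not in ALLOWED_VERSIONS:
        findings.append(f"weak protocol: {record['tls_version']}")
    if record["issuer"] == record["subject"]:
        findings.append("self-signed certificate")  # issuer == subject
    if record["not_after"] < now:
        findings.append("expired certificate")
    return findings

record = {
    "tls_version": "TLSv1.0",
    "issuer": "CN=device-42",
    "subject": "CN=device-42",
    "not_after": datetime(2022, 1, 1, tzinfo=timezone.utc),
}
print(audit_tls(record, now=datetime(2023, 2, 3, tzinfo=timezone.utc)))
# ['weak protocol: TLSv1.0', 'self-signed certificate', 'expired certificate']
```

Findings like these can be forwarded alongside device logs to the observability tools discussed earlier, covering exactly the devices whose own logging is unreliable.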

So, I'm not here to preach about it. It's just something that I'm passionate about that I think that we really should be enriching those observability tools. 

[00:20:53] Machine2Machine Zero Trust

Jim: All right. I wanna ask you one final question, and this is something that's really interesting to me when we talk about Zero Trust, and you were talking about devices. This is really something that I don't know if you can answer simply. But as we start looking at these machine-to-machine communications that are happening, what are your thoughts on monitoring machine-to-machine, and how do you secure it?

Arila: There are actually companies already thinking in that space, and I was fortunate enough to meet one in the open-source space, Teleport. They're applying the same concepts around Zero Trust by providing identity for the machine. You always want to make sure you have identified the person, the human, interacting with the machines of the system. How do you bring those same concepts down to identifying a machine and making sure it's authorized to interact with another machine in the system? That is very interesting to me in the energy space because, as I mentioned, it's a heterogeneous system. Veloce provides, for example, storage and infrastructure; however, we want to work with many different manufacturers of EV chargers, and we need to make sure we can trust that, at the grid edge, none of our partners creates a vulnerability that can be exploited.
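The machine-identity idea can be sketched in a few lines. To be clear, this is not Teleport's actual protocol (real systems typically use certificates issued by a trusted authority); it is a minimal HMAC-based illustration of a device signing its identity claim and a peer verifying it before trusting the connection:

```python
import hmac
import hashlib

# Minimal machine-identity sketch (NOT Teleport's real protocol):
# a device signs its identity claim with a provisioning secret, and
# a peer verifies the signature before trusting the connection.

def issue_token(device_id: str, secret: bytes) -> str:
    """Sign the device identity; returns '<id>.<hex signature>'."""
    sig = hmac.new(secret, device_id.encode(), hashlib.sha256).hexdigest()
    return f"{device_id}.{sig}"

def verify_token(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    device_id, _, sig = token.partition(".")
    expected = hmac.new(secret, device_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

secret = b"shared-provisioning-secret"
token = issue_token("ev-charger-007", secret)
print(verify_token(token, secret))            # True
print(verify_token(token, b"wrong-secret"))   # False
```

A shared secret like this is the simplest possible scheme; certificate-based identity, as used by production Zero Trust systems, additionally lets devices verify each other without ever sharing a secret, which matters when many EV charger manufacturers must interoperate at the grid edge.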

Jim: Yeah. And again, it's, it all goes down to what you said. It's the Zero Trust. I wanna say the trust but verify statement, right? It's get observability...

Arila: I like that better. I actually don't like the term Zero Trust, because it goes against encouraging building relationships. As we build our trusted relationships, what's the best way to validate them at the speed of light? That's where technology helps us, under the Zero Trust label. 

Jim: All right, I wanna thank my guest, Arila Barnes. Thank you very much; this was a lot of fun. Everybody, if you have any questions or you're looking to reach out, I encourage you to go to the Gigamon community. There is a podcast section where you can go and ask questions of myself or my guests. Arila, as a parting thought, are there any recommendations for our listeners if they wanna learn more about what we were talking about? 

Arila: I recently came across a really cool book put together by Tracy Bannon. She's a cloud architect focused on cybersecurity, and it's called Reinventing Cybersecurity. It has the collective wisdom of a lot of other cybersecurity experts who look at that space all the way from cloud to edge and in between. 

Jim: Wonderful. I guess we'll add that to the reading list. Thank you very much and I appreciate you joining me. 
