Episode 2: Lessons Learned from Cloud Native

In our second Episode, Mike is joined by Ben Stineman, Vice President of Infrastructure and Security at Vinli Incorporated.

Ben will explain what it takes to start from scratch or migrate to Cloud Native. As a networking expert, he shares his personal experience at Vinli along with helpful tips you need to understand the ins-and-outs of Cloud Native.


Introduction

[00:00:00] Mike: Welcome again to our Navigating the Cloud Journey podcast series. I'm Mike Valladao, your host. Today we present episode two of the series: Lessons Learned from Cloud Native.

[00:00:12] Right up front, I want to emphasize that Cloud Native defines companies and applications that are born in the cloud. While there are very few of these today, it's definitely a growing trend, and that's why we're talking about it.

[00:00:26] Prior to the crazy expansion of public cloud, most companies would build their own data centers and co-lo facilities. When we needed more room or perhaps remote locations, we would often rent cages from providers like Equinix and others. But what if you could somehow start over without the burden of running physical data centers? How might you do things differently?

[00:00:49] Today, I welcome a guest that has done exactly that. Ben Stineman is Vice President of Infrastructure and Security at Vinli Incorporated. Prior to that, Ben was a [00:01:00] Senior Cloud Infrastructure Engineer at Nutanix.

[00:01:03] So Ben, just to make things explicitly clear to our listeners. If you're head of infrastructure at your company, exactly how many physical servers do you own?

[00:01:14] Ben: Thanks for having me, Mike. I have literally zero servers, physical servers.

[00:01:19] Mike: How can you do this? This is totally outside the paradigm in which most of us live!

How to become Cloud Native and the Twelve-Factor App

[00:01:25] Ben: Well, it helps when the company is new enough to be able to start their data [00:01:30] platform in the cloud, when a cloud like Amazon or Google Cloud or whatever exists in the first place, right?

[00:01:37] So you have to be, what, post-2013 to have something viable to do that? Vinli was started in late 2014. I came on board around 2015. And primarily the focus is vehicle telemetry and the data science behind transportation telematics. So, processing loads of data from moving vehicles. Telemetry meaning GPS coordinates, anything you can read off of a CAN bus, a Controller Area Network bus, of a car or a truck. We can collect it and you can make meaningful reports to drive business decisions, or fleet management, or anything like that.

[00:02:12] Mike: So how do you think differently? Tell us a little bit about what it's really like to build a cloud native company.

[00:02:19] Ben: Yeah. So, this is all timing as well. In 2014, 2015, the technical founders, who predate me, were able to think about this whole microservices architecture standpoint and have that as the center of their ideology from the beginning. So, the whole 12-factor app idea, 12factor.net.

[00:02:40] Mike: You just lost me. What is 12factor.net?

[00:02:44] Ben: Well, that's the site that explains the 12 commandments of the 12-factor app, which is kind of a paradigm that we really tried to adhere to at Vinli, running a microservices cloud native platform to do this data processing and data science.

[00:03:02] Mike: Okay. Stop again. You're talking about terms here, microservices. Isn't that like just a small app that is run through APIs? Explain.

[00:03:12] Ben: Yeah. So, tying in with that 12-factor idea, a microservice should be a module. The agility that comes from that, the ability to push out small updates to parts of the platform and not have to push out the entire monolith at once, is a huge deal. And microservices give you that ability. Each little microservice, it's a micro service, right? A service does something. Say it collects the data points from a car. The only thing it does is receive the data points, the telemetry, and it might then store it somewhere.

[00:03:47] Every microservice has an API. And that API is how you talk to other microservices, how they talk to each other. One microservice is not talking to another microservice's database directly. So, you've got this abstraction layer of the API in between them all. And this is part of the whole 12-factor app paradigm.
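Editor's sketch: the API boundary Ben describes can be illustrated in a few lines of Python. This is a minimal illustration, not Vinli's code; the class names `TelemetryService` and `ReportService` and their methods are hypothetical, and an in-process method call stands in for what would be an HTTP API between real services.

```python
from dataclasses import dataclass, field


@dataclass
class TelemetryService:
    """Hypothetical ingest service: it alone owns its data store."""
    _points: dict[str, list[dict]] = field(default_factory=dict)

    # Public API: the only way other services read or write telemetry.
    def post_point(self, device_id: str, point: dict) -> None:
        self._points.setdefault(device_id, []).append(point)

    def get_points(self, device_id: str) -> list[dict]:
        # Return a copy so callers cannot mutate the owned store.
        return list(self._points.get(device_id, []))


@dataclass
class ReportService:
    """Hypothetical reporting service: reaches telemetry via its API only."""
    telemetry: TelemetryService

    def point_count(self, device_id: str) -> int:
        return len(self.telemetry.get_points(device_id))


telemetry = TelemetryService()
telemetry.post_point("device-1", {"lat": 32.78, "lon": -96.80})
reports = ReportService(telemetry)
```

Because `ReportService` never touches `_points` directly, the storage behind `TelemetryService` could change without the reporting code noticing, which is the independence Mike asks about next.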

[00:04:04] Mike: And I believe that all these microservices of course are independent and they have to be more or less standalone so you can change one of those without having to change the whole system.

[00:04:13] Ben: Correct.

[00:04:13] Mike: What you're talking about, the way you bring all these things together, is kind of like the SDLC, the development life cycle. Only in your case it is accentuated, or accelerated I guess, to be able to get things to happen quicker. Is that a true statement? False statement? [00:04:30]

[00:04:30] Ben: Yeah. So, in the case of companies like Vinli, we often deploy something to production every day. And it's hard to get to this point, unless you're starting from a clean sheet of paper.

[00:04:39] Mike: And are you rolling these out to your customers on a daily basis? How does that work?

[00:04:43] Ben: It's a rolling update all the way out to customer production.

[00:04:47] Mike: And because of that, you also have to have a scalability that cloud offers, I presume.

[00:04:53] Ben: Yeah. We did a study internally at Vinli to see what it would take to run it on metal. To run our cloud native app and platform, which is a cluster of apps, on metal. Like, what would it take to buy a bunch of blade servers, right? And run it yourself. And run VMware or OpenStack to run VMs, to then run Kubernetes, which is the first time I've said this word, I think, in this recording. That's at the heart of what we're doing too. Having containerized microservices code is great, but you have to orchestrate it and manage it somehow.

Cloud vs Bare Metal Study

[00:05:27] Mike: Ahh... Automation. Automation is key.

[00:05:30] Ben: Yeah. This automation thing is crazy. It's so wide, there's so many ways you can do this. But automating the management of that containerized microservices code is the only way you can actually do it at scale. If you had to do this stuff manually, it wouldn't scale. It's untenable for humans. You have to get some level of scripting and automation involved to assist our feeble human minds.

[00:05:56] Mike: So, let me get back to that survey that you put out as to how you might have run this thing differently, put it onto metal, and get a bunch of blades and servers and stuff. What did you find out? Obviously it could be done, but at what cost? What did you find out in that undertaking?
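Editor's sketch: orchestrators generally automate this by comparing a declared desired state with what is actually running and acting on the difference. The function below is a simplified illustration of that idea in plain Python; it is not the Kubernetes API, and the service names in the usage note are made up.

```python
def reconcile(desired: dict[str, int], running: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (action, service, count) steps that move `running` toward `desired`."""
    actions = []
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)
        have = running.get(service, 0)
        if want > have:
            actions.append(("start", service, want - have))
        elif want < have:
            actions.append(("stop", service, have - want))
    return actions
```

For example, `reconcile({"ingest": 3}, {"ingest": 1, "legacy": 1})` yields a step to start two more `ingest` containers and a step to stop the one `legacy` container. A human never has to work out that difference by hand.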

[00:06:13] Ben: So, trying to duplicate what we are already using in AWS, like we're primarily AWS and a little bit of Google Cloud. Duplicating what we get from AWS in terms of their built-in services like database backends. We don't have to actually run a PostgreSQL server ourselves. We can say, "Hey, AWS, give us an HA Postgres 10 cluster in multiple availability zones." What would it take? We'd need a full-time database admin that had HA Postgres experience, running it in high availability, to run it on that metal, right?

[00:06:54] Mike: And to run it in different locations.

[00:06:55] Ben: Run it in different locations with replication going on. What are we talking? Two different data centers, or at least two different racks at minimum in a data center with different power feeds, to replicate what we had as multi-AZ in AWS in our region? Yeah, the money adds up quickly. The capital cost is huge.

[00:07:15] Mike: Sure. But what if it scales up? Then what do you do? Is that part of your expansion plan?

[00:07:21] Ben: Yeah, you have to account for having to add more racks, add more blades. And then the problem becomes, oh, now we have a customer in Europe. Now we have to go do this in [00:07:30] Europe because of GDPR.

[00:07:31] Mike: Before we move to GDPR, I've got a question. You said something about the different regions? I know from just being on AWS this morning that up in the top right-hand corner is where I say if I'm going to be in US West or whether I'm going to be in Singapore. What does all that do to you as far as latency? Or does it?

[00:07:49] Ben: Right, so typically for a customer, we run it in the region that makes most sense. So if it's a European customer, we'll pick a European region because most of the fleet that they're managing resides in Europe.

[00:08:02] Mike: Okay.

[00:08:03] Ben: If they're multi-region or global, we make some decisions. The latency really hasn't been that big of a deal because we're not delivering video or audio or anything like that. We're collecting data points and that doesn't have to be fully synchronous. The fact that it might come from South America to Europe or something like that, 200 milliseconds away, is not a big deal.

[00:08:25] Mike: So your application, it doesn't matter. It doesn't matter, but you still do have to consider where you're putting each of your customers.

[00:08:33] Ben: What's more important, actually, is the customer, like the fleet manager, who's using a UI. That's where the latency matters, because you don't want them to have a bad experience getting reports out of things.

[00:08:44] Mike: Okay. So, it's basically, you want them to be able to get to the database very quickly of where the reporting is, but how the data populates. You're okay with the latency.
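Editor's sketch: one way to express this trade-off is to choose the region that minimizes latency for the people using the reporting UI, weighted by how many of them sit in each location, and to ignore ingest latency entirely. The function, office names, and latency figures below are illustrative assumptions, not measurements from Vinli; the region identifiers are ordinary AWS region names.

```python
def pick_region(regions: list[str], users_by_office: dict[str, int],
                latency_ms: dict[str, dict[str, float]]) -> str:
    """Pick the region with the lowest user-weighted UI latency."""
    def total_wait(region: str) -> float:
        return sum(count * latency_ms[office][region]
                   for office, count in users_by_office.items())
    return min(regions, key=total_wait)


# Assumed example: most report users are in Frankfurt, a few in Sao Paulo.
users = {"frankfurt": 8, "sao-paulo": 2}
latency = {
    "frankfurt": {"eu-central-1": 10.0, "sa-east-1": 200.0},
    "sao-paulo": {"eu-central-1": 200.0, "sa-east-1": 15.0},
}
best = pick_region(["eu-central-1", "sa-east-1"], users, latency)
```

With these assumed numbers the European region wins, even though the South American vehicles would then report from roughly 200 milliseconds away.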

[00:08:54] Ben: Typically we'll run a platform in the region that's closest to the managers, right? The [00:09:00] customer will often have Fleet Managers or something like that. People in those kinds of roles, or Data Scientists, need to run reports on these fleets or do data science, right? So, we want that data to be closer to them just so they don't have a high-latency experience. So that's part of the decision on choosing a cloud region. Back to this whole bare metal thing, right? Like, that flexibility is gone. You can obviously say, hey, Equinix, do you have a data center in Frankfurt? Or do you have a data center in Los Angeles? Yes, they do.

[00:09:30] But like trying to get a cage or just a couple of racks in a shared row. That takes so much time. By the time we've got an email sent off to Equinix asking about it, we've already set up a whole new cloud native platform in another region, before we even get a response back from Equinix sales.

[00:09:53] Mike: And because everything's containerized, it's easy to move. And because it's automated, it allows you to do these things. How quickly does it really happen? I can tell you from personal experience that sometimes a customer will deploy something. First of all, they have to get hardware, so that takes time, and even within their own data centers, you still have to fill out maintenance windows, you have to be able to get a tech in there. It takes time. So, in your environment, from soup to nuts, how long would it take to do that?

[00:10:23] Ben: On an existing platform? Minutes.

[00:10:26] Mike: On a new platform, let's say you do need to spin up something in [00:10:30] Frankfurt.

[00:10:30] Ben: So if we bring up a new AWS org account, you bring up a new account number in AWS. And then, we run the script. We fill out some variables as inputs to the scripts. And probably within about three hours, the platform is up from an infrastructure standpoint and then we're onto deploying the microservices fleet. So, we could probably do it under six hours.

[00:10:56] Mike: Wow, so in under a day you've got it up and running? Yeah. Okay. How much do you have to debug to make that thing work? Because there are so many different fields out there. There are so many different changes that can be made in a cloud environment. Sizing of the servers, for example, or I could go through a whole litany of questions. What's real?

[00:11:16] Ben: Yeah, we have a base configuration that we plug in, in terms of instance size, like how big are these VMs, these EC2 instances or whatever. We know that because we've done testing and scale testing, and we know where to start them. So that's not really a question, but we can make educated guesses based on how big this fleet is going to be and how much reporting they're going to need.

[00:11:36] We can scale all of that stuff independently, whether it's this database or that database or this message bus system, et cetera. So, those kinds of tweaks are very easy and quick to make. Actually, what ends up being the slowest thing is getting the initial configuration back from the customer, like what devices need to start pumping data in.

[00:11:58] So it's dealing with the outside. The actual customer integration part is the slower thing. We might have the whole thing up and going in a day and then we might spend the next two or three days making sure it's all optimized. But it's not full-time work. Right? It's like, "oh, let's check on this now. How's it looking? Okay, good". Have they gotten all of the end point data that we need to start seeding these devices? So, then you're waiting on other humans.

[00:12:25] Mike: That makes perfect sense. And it's really cool how quickly it can be spun up or spun down. And of course, if something doesn't work, you just shut it down and start over, right?

[00:12:34] Ben: Yeah, we can totally scrap it. And again, like I mentioned before, we're using Terraform, which is an infrastructure-as-code kind of concept. And they've got modules for AWS, GCP, Azure. They've got it for VMware. They keep adding modules. So, you can write this, effectively, config-file-looking script language to define cloud resources. And that's been one of our huge advantages from the start. Like, we've been using it since 2015. Terraform can actually, you know, say we need a VPC, this is a subnet, this is how we're doing routing. All the way up to, this is how many VMs, of this type, they're going to do this, et cetera, into configuring the actual VMs by running some sort of startup script that pulls a systems management scripting tool, like SaltStack or Chef or Puppet or whatever you want to use.
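Editor's sketch: the "fill out some variables, run the script" workflow might look roughly like the Python below, which renders per-deployment inputs as a Terraform JSON variables file and builds the matching command line. The variable names and values (`region`, `vpc_cidr`, `instance_type`, `instance_count`) are hypothetical and would have to match variables declared in the Terraform configuration; `-var-file` and `-auto-approve` are standard `terraform apply` options. Nothing here is taken from Vinli's actual scripts.

```python
import json


def render_tfvars(variables: dict) -> str:
    """Render inputs as the body of a *.tfvars.json variables file."""
    return json.dumps(variables, indent=2, sort_keys=True)


def terraform_apply_command(var_file: str) -> list[str]:
    """Build the argument list for a non-interactive apply."""
    return ["terraform", "apply", "-auto-approve", f"-var-file={var_file}"]


# Hypothetical inputs for a new platform in Frankfurt.
frankfurt = {
    "region": "eu-central-1",
    "vpc_cidr": "10.20.0.0/16",
    "instance_type": "m5.large",
    "instance_count": 3,
}
tfvars_body = render_tfvars(frankfurt)
command = terraform_apply_command("frankfurt.tfvars.json")
```

In practice the rendered text would be written to `frankfurt.tfvars.json` and the command passed to something like `subprocess.run(command, check=True)` from the directory holding the Terraform configuration. Standing up another region then means changing the inputs, not the code.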

[00:13:27] Mike: We've gone down this path and this is interesting stuff. You mentioned a little four-letter term a little while ago, GDPR. What's that all about? And tell me how that affects you.

Managing GDPR in the Cloud

[00:13:40] Ben: Yeah, that's a can of worms, obviously. We could talk about that for eons. But it's the General Data Protection Regulation, which is the European standard for data privacy and the regulation of that. Several of our customers are European. So that's been something we've had to focus on to begin with. And having a Data Protection Officer and things like that. Using the cloud lets us get around a lot of headaches by just running that platform in Europe. The data is on European soil. It's in an AWS region there; that's the flexibility of being able to run a platform for a customer in their region.

[00:14:22] Mike: How do you make sure that the traffic doesn't go somewhere else? How do you know?

[00:14:29] Ben: How do we know? Right. Well, the traffic coming in from the vehicle fleets is coming from vehicles in Europe. And we know that because of the telemetry, we have the GPS coordinates.

[00:14:41] Mike: Is that personal data or not? Is that PII? Telemetry? I don't know.

[00:14:47] Ben: It can be considered PII. But we do quite a bit of separation of human names and phone numbers, et cetera. Like, we have PII, but I wouldn't consider it critical PII. It's not medical records or financial records. It's names and emails, because they log in to see their driving history. And again, Vinli is often working with fleet management companies. So the car is owned by the leasing company, not by the human using it. But nonetheless, the human can use that to see what their driving habits are or how many miles or kilometers they drove for work versus personal. So, in terms of PII, we obviously have to take it very seriously with GDPR regulations, but we keep that stuff separated in the back end, where telemetry is pseudo-anonymized. Like, we don't even know who it belongs to.

[00:15:41] We know roughly the vehicle it's in, but you keep all of the actual human information separate and you have to do a lookup to see. If someone was going in to request their vehicle's trip history, they're doing that by logging into their interface, but then since they are authenticated, that ID is then used to do a lookup against a couple other data storage backends that have that actual telemetry data.

[00:16:07] But, keeping it anonymized as much as possible causes potential performance issues, but that's a price worth paying.
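Editor's sketch: the separation Ben describes, with personal details in one store, telemetry in another, and an opaque key joining them only after login, could be modeled as below. All class, function, and field names are hypothetical, and in-memory dictionaries stand in for the separate storage backends.

```python
import uuid


class IdentityStore:
    """Hypothetical store for PII plus the link to an opaque vehicle key."""

    def __init__(self):
        self._accounts = {}

    def register(self, account_id: str, name: str, email: str) -> str:
        vehicle_key = uuid.uuid4().hex  # random, carries no personal data
        self._accounts[account_id] = {
            "name": name, "email": email, "vehicle_key": vehicle_key,
        }
        return vehicle_key

    def vehicle_key_for(self, account_id: str) -> str:
        return self._accounts[account_id]["vehicle_key"]


class TelemetryStore:
    """Hypothetical store for trips, keyed only by the opaque vehicle key."""

    def __init__(self):
        self._trips = {}

    def record_trip(self, vehicle_key: str, trip: dict) -> None:
        self._trips.setdefault(vehicle_key, []).append(trip)

    def trips(self, vehicle_key: str) -> list:
        return list(self._trips.get(vehicle_key, []))


def trip_history(account_id, authenticated, identities, telemetry):
    """Resolve an authenticated account to its trips via the extra lookup."""
    if not authenticated:
        raise PermissionError("login required")
    return telemetry.trips(identities.vehicle_key_for(account_id))


identities, telemetry = IdentityStore(), TelemetryStore()
key = identities.register("acct-1", "Example Driver", "driver@example.com")
telemetry.record_trip(key, {"km": 12.4, "purpose": "work"})
```

The second lookup inside `trip_history` is the performance cost Ben mentions: every request pays for one more hop so that the telemetry store never holds a name or an email address.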

[00:16:13] Mike: And somewhere there is also a clause about invisibility, or something along that line, that basically says that if somebody wants to opt out at any point in time, you have to erase their data from the database.

[00:16:26] Ben: Yeah. Well, what we do is we disassociate, typically. Since it's a fleet-managed vehicle, that doesn't necessarily mean that the actual telematics goes away, but the record that associates them with it goes away. So that anonymization, or that, you know, "I didn't exist", is applicable at that point. And some customers may also decide that, yeah, okay, delete everything. So, we can do that as well. It's flexible in that regard.
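Editor's sketch: the two erasure modes Ben mentions, dropping only the association or deleting the telemetry as well, can be expressed as a single function with a flag. The function name, the flag, and the dictionary layout are assumptions made for illustration.

```python
def erase_driver(account_id: str, accounts: dict, telemetry: dict,
                 delete_telemetry: bool = False) -> bool:
    """Remove a driver's identity record; optionally remove their telemetry too."""
    record = accounts.pop(account_id, None)  # the PII and the link go away
    if record is None:
        return False
    if delete_telemetry:
        # Stricter mode for customers who want everything deleted.
        telemetry.pop(record["vehicle_key"], None)
    return True


# Hypothetical data: two drivers, telemetry keyed by opaque vehicle keys.
accounts = {
    "acct-1": {"email": "a@example.com", "vehicle_key": "k1"},
    "acct-2": {"email": "b@example.com", "vehicle_key": "k2"},
}
telemetry = {"k1": [{"km": 12.4}], "k2": [{"km": 3.1}]}
```

Calling `erase_driver("acct-1", accounts, telemetry)` leaves the `k1` trips in place for the fleet owner, but nothing links them to a person anymore. Passing `delete_telemetry=True` removes the trips as well.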

[00:16:53] Mike: You've convinced me now that it made sense for your organization to do this, and it can be done.

[00:16:58] Now tell me a little bit about your path, your personal journey into the cloud. How did you get into cloud like this? You didn't just one day pop up and say, "Hey, I'm going to do this thing with telemetry". Tell us how you have gone from metal, quote unquote, into this field.

Ben's journey to Cloud Native

[00:17:15] Ben: At the turn of the century, when the internet became a big thing, I started a web hosting company with my friends. So we were building servers and shipping them to a data center in Fremont and then hosting websites on Linux from our dorm rooms. And that right there got me really interested in networking.

[00:17:32] So, people are paying for racks or cages. So that's obviously metal, brick, mortar, and metal. Starting networks became something that I was really passionate about. And I quickly dove into the network engineering side of things. Whether it was peering, or data center core, or even internet core.

[00:17:51] That led me to having contacts in the software business, working for a network modeling company, a software company. So that got me into release engineering, which became what we're now calling DevOps.

[00:18:06] So the true jump to cloud is when I came to Vinli in 2015. I had some experience with AWS at that point, very little, just kind of toying with it. But I had already done giant VMware farms to build software. So we're talking 15 or so blade chassis with 900 VMs.

[00:18:30] Mike: So this has basically been an evolution. And networking is still networking, even if you don't own the server or the router that's there. By the same token, things are still running through firewalls, even if you don't own the firewall. Somewhere there, there's an ACL, right?

Networking is still Networking

[00:18:49] Ben: Right. Even if it's not a firewall from a hardware standpoint, there's some sort of, like you said, an ACL or a security group.

[00:18:59] Mike: What is a security group? Define that for me, please do.

[00:19:03] Ben: It's a kind of ACL, if you're a network person, that typically gets applied or attached to some sort of cloud resource, whether it's a VM or EC2 instance, or an instance on Google Cloud or Azure or whatever. And it's essentially outside of the OS's firewall, 'cause you can certainly have a firewall in the operating system itself. Think of a security group as the access list that applies to the network interface, at the cloud infrastructure level, that the VM is using. Or you can apply those to load balancers that are cloud-based.
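Editor's sketch: for readers coming from network ACLs, a security group can be thought of as a list of allow rules with an implicit deny for anything that matches none of them. The model below is a simplification for illustration; the `Rule` type and `allows` function are hypothetical and not a cloud provider's API, and the addresses are documentation ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    protocol: str
    port_from: int
    port_to: int
    cidr: str


def allows(rules: list[Rule], protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: inbound traffic passes only if some rule matches it."""
    src = ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port_from <= port <= rule.port_to
        and src in ip_network(rule.cidr)
        for rule in rules
    )


# Hypothetical group: HTTPS from anywhere, SSH only from an internal range.
web_sg = [
    Rule("tcp", 443, 443, "0.0.0.0/0"),
    Rule("tcp", 22, 22, "10.20.0.0/16"),
]
```

With this group, `allows(web_sg, "tcp", 22, "203.0.113.7")` is false while the same request from `10.20.3.4` is allowed, regardless of what the firewall inside the operating system says.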

[00:19:37] Mike: Can you have more than one?

[00:19:38] Ben: Oh, absolutely.

Mike: And where do you put them?

Ben: Wherever you want, into a giant pile, and it becomes a mess in a hurry!

[00:19:49] Mike: I don't know if they'll quote you on that one.

Ben: Even worse, they could become self-referential.

Mike: Hmm.

Ben: Sometimes, if you know what you're doing.

Mike: Okay. Yeah. And that's also key to the way things flow. We have to learn as we go through these new infrastructures, because the concepts may be the same, but it's how you apply it, how you make it work.

[00:20:12] Ben: Yeah. So, the whole concept of networking. It became very clear when I started working more on the cloud that despite there being no real physical network equipment to manage, it's still relevant to your operation.

[00:20:26] Mike: How is it relevant?

[00:20:27] Ben: In all manners. In the same ways that it is on an on-prem network. You need to have some level of segmentation: number one, to maintain sanity; number two, to have some level of security boundary in between subnets that run different types of applications or services.

[00:20:46] Mike: And you need the right size for your configurations as well. You know, is it class A, class B, class C? What are you running where? How big is this going to expand?

[00:20:57] Ben: All the same considerations you'd have on an on-prem network apply to a cloud network. And especially if you're going to hybrid them together. Maintaining some sort of sane IP schema.

[00:21:10] I've had consulting clients where we decided to start over. Because their internal networking scheme had used all of the RFC 1918 private space on prem. Because whoever set up the network just used everything, you know, the entire 10 private network was in use because they had set the masks as one giant flat network.

[00:21:34] It was madness. So, I was like, guys, there's no option here other than to forklift this. We have to start over. We have to come up with a new network scheme. And they had plans to put some things in the cloud at that time. So, part of that scheme was figuring out, if we pare this down, can we operate with a /16 on-prem? [00:22:00] And that could be subnetted out. And they had a campus, so like multiple buildings across two cities.

[00:22:07] Coming up with some sort of sane networking schema, documenting it, and then adhering to it will make your life much easier.
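Editor's sketch: a plan like this can be written down and checked with Python's standard `ipaddress` module. The specific blocks below (`10.10.0.0/16` on-prem, `10.20.0.0/16` for a cloud VPC) and the building names are invented for illustration; only the technique of carving a /16 into smaller subnets and checking for overlap comes from the discussion.

```python
from ipaddress import ip_network

on_prem = ip_network("10.10.0.0/16")    # assumed on-prem allocation
cloud_vpc = ip_network("10.20.0.0/16")  # assumed cloud VPC allocation


def plan_subnets(block, new_prefix: int, names: list[str]) -> dict:
    """Assign consecutive subnets of the given prefix length to named sites."""
    return dict(zip(names, block.subnets(new_prefix=new_prefix)))


building_plan = plan_subnets(on_prem, 24, ["building-a", "building-b", "lab"])

# A hybrid design only routes cleanly if the two sides never overlap.
hybrid_is_sane = not on_prem.overlaps(cloud_vpc)

# The flat network from the story: all of 10.0.0.0/8 in use on-prem.
flat_network_conflicts = ip_network("10.0.0.0/8").overlaps(cloud_vpc)
```

The last line shows why the client in the story had to start over: once the whole 10.0.0.0/8 range is in use on-prem, any private 10.x block chosen for the cloud collides with it.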

[00:22:15] It makes sense to have on-prem networks. And it makes sense to put some things in the cloud, right?

[00:22:22] Mike: Since you've worked in both; by having Native Cloud, what lessons have you learned that would help [00:22:30] people that are transitioning?

Lessons Learned from Cloud Native

[00:22:31] Ben: Starting with that low hanging fruit. If you're trying to get some of that on-prem stuff off your plate so you can be more effective with your existing staff, or if you've run the numbers, money-wise, as to what it costs to have one of your people focusing on a mail server, their skills are probably better placed somewhere else. If you're totally on-prem, get the low hanging fruit off your plate, like get the mail transitioned. It's painful, but mail and calendaring, I'm sorry, but no one should be running that on-prem anymore.

[00:23:04] Mike: Sure. And there's lots of SaaS applications as well.

[00:23:07] Ben: There's so many options, whether it's Google Workspace or G Suite or Office 365. Pick your favorite. There’re so many options. That should be one of the first things that goes.

[00:23:17] Mike: Most companies have done pieces of the low hanging fruit.

Ben: Yeah. Most have done it. And I think it's a massive relief to those companies.

Mike: Salesforce, for example, CRM across the board. There are a lot of good places where people have developed applications that work very well. And I don't care if it's HR applications or whatever, those make sense. But again, within your core business, what kind of applications are easy to move to the cloud, versus what kind would you say really might demand some on-prem hardware?

[00:23:53] Ben: It would depend on whether or not the organization is a pure software company versus a company that does hardware.

[00:23:59] Mike: Ben, [00:24:00] why does that make a difference?

[00:24:01] Ben: Well, if you're a pure software company, you don't have the need to have a physical lab anywhere. If you're a pure software company, you can rely on these SaaS platforms to do every aspect from business to engineering. So, whether you're using GitHub or a hosted GitLab for your source code repository, your source code management, or you're using a cloud build system to compile that code, that can all be cloud hosted as a SaaS service.

[00:24:30] If you're doing hardware, you're going to need some sort of hardware lab for people to work in. To integrate you're going to need some sort of network on-prem, which is probably a switch and a firewall at minimum.

[00:24:42] If you're running your software builds, you can get images to put on that hardware, assuming you're writing firmware to put on the hardware, right? To keep that private, you might need a VPN. Or you might be able to just do it over the internet, over a TLS connection, HTTPS.

[00:24:57] Mike: What you've said is something I'm seeing in the field, where people are taking their software development and they're putting that into the cloud. They have to make sure it's tightly secured, but they're able to do that a lot easier than maybe if they're running financial applications and have to worry about end users logging in. In many cases they may still be running that in an on-prem environment.

[00:25:19] But again, that can be migrated. But even in your case, you've got lots of data that's very important to you and important to your company. How do you secure it? What tools do you use today, or have you looked at?

Security for Cloud Native

[00:25:34] Ben: Yeah, there's a lot of new services out that are focused on evaluating the risk profile and attack surfaces of your SaaS products. You can point it at an Amazon account or a Google Cloud account and it'll walk through the thing. You can give it read-only access to specific areas and it'll go evaluate all those ACLs. It'll go evaluate the age of passwords or tokens on accounts and give you a report.
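Editor's sketch: the kind of checks such an audit tool runs can be approximated with a small function over read-only account data. The 90-day age threshold, the rule that only port 443 may be open to the whole internet, and the input field names are all assumptions chosen for the example. Real tools apply far larger rule sets.

```python
from datetime import datetime, timedelta, timezone


def audit(credentials: list[dict], acl_rules: list[dict],
          now: datetime, max_age_days: int = 90) -> list[str]:
    """Return human-readable findings for stale credentials and open ACLs."""
    findings = []
    for cred in credentials:
        age = now - cred["created"]
        if age > timedelta(days=max_age_days):
            findings.append(f"stale credential: {cred['id']} ({age.days} days old)")
    for rule in acl_rules:
        # Assumed policy: only HTTPS may be reachable from anywhere.
        if rule["cidr"] == "0.0.0.0/0" and rule["port"] != 443:
            findings.append(f"open to the internet: port {rule['port']}")
    return findings


# Hypothetical snapshot of an account, evaluated at a fixed point in time.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
credentials = [
    {"id": "deploy-token", "created": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": "ci-token", "created": datetime(2023, 12, 15, tzinfo=timezone.utc)},
]
acl_rules = [{"cidr": "0.0.0.0/0", "port": 443}, {"cidr": "0.0.0.0/0", "port": 22}]
report = audit(credentials, acl_rules, now)
```

The resulting `report` list is the sort of artifact that could be attached as evidence for an auditor.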

[00:26:01] Mike: Here's a question, are they real? Or are they just emerging applications? How good of a job are they doing?

[00:26:07] Ben: Good enough to hinge compliance on. At several companies, I've used reports from these cloud account audit tools as evidence to pass an audit or get a compliance certificate. They're effective enough to do that. It is sufficient evidence to garner the sign-off from a Compliance Officer to pass an audit.

[00:26:27] Mike: All right. For example, might you use SIEMs?

[00:26:30] Ben: Yeah. That's a huge part of that concept, you know, the security information and event management idea, where you're collecting any sort of event data from your infrastructure, and even applications or services, into one place so you can do security event correlation. It is almost a must. If you're going to be proactive in the security space and not be caught off guard, you have to have visibility, observability, right?
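Editor's sketch: "security event correlation" means looking across events from different sources for a pattern that no single log would reveal. The example below flags any source address with several failed logins inside a short window, no matter which system recorded them. The event fields, threshold, window, and addresses are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_failed_logins(events: list[dict], threshold: int = 3,
                            window: timedelta = timedelta(minutes=5)) -> list[str]:
    """Return source IPs with at least `threshold` failed logins within `window`."""
    by_source = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        if event["kind"] == "login_failed":
            by_source[event["source_ip"]].append(event["time"])
    flagged = []
    for source, times in by_source.items():
        # Slide over consecutive groups of `threshold` failures.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                flagged.append(source)
                break
    return sorted(flagged)


# Hypothetical events collected from three different systems.
events = [
    {"time": datetime(2024, 1, 1, 12, 0), "source_ip": "198.51.100.9", "kind": "login_failed", "origin": "vpn"},
    {"time": datetime(2024, 1, 1, 12, 1), "source_ip": "198.51.100.9", "kind": "login_failed", "origin": "web-ui"},
    {"time": datetime(2024, 1, 1, 12, 2), "source_ip": "198.51.100.9", "kind": "login_failed", "origin": "api"},
    {"time": datetime(2024, 1, 1, 12, 0), "source_ip": "203.0.113.5", "kind": "login_failed", "origin": "web-ui"},
    {"time": datetime(2024, 1, 1, 12, 30), "source_ip": "203.0.113.5", "kind": "login_failed", "origin": "web-ui"},
    {"time": datetime(2024, 1, 1, 12, 3), "source_ip": "203.0.113.5", "kind": "login_ok", "origin": "web-ui"},
]
suspicious = correlate_failed_logins(events)
```

Each of the three systems saw only one failure from the first address, so none would raise an alarm alone. The pattern only appears once the events sit in one place.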

[00:26:56] Mike: Yes, that's still critical because in a shared service model we still have to take care of the security. And I don't care if it's because of GDPR or PCI. You still have a certain level that you have to take care of.

[00:27:12] Ben: You need an audit log to tie it together, so you can do forensics to determine what happened.

[00:27:18] Mike: And logs are only as good as what you have them for. Because some of the logs you're generating; some of the logs, other things are generating. On top of that, somebody could use endpoint detection. There's also NDR, for network detection. There's a lot of different options. The key is people have to secure what they want to protect.

[00:27:39] Let's talk about something else a little bit different, but still technology related. I understand that you also are in the cloud with regard to drones. What is that about?

Ben flies drones in the Cloud(s)

[00:27:50] Ben: So, I'm a longtime RC hobbyist, building and flying airplanes, and I had cars and aircraft, [00:28:00] fixed wing and rotary wing, standard helicopters, you know, main and tail rotor. And then in the last 10, 15 years, we've seen the emergence of multirotor aircraft. I've been a drone pilot, FAA 107 certified, et cetera, for as long as they've had that certificate available. And mainly flying cameras for video production, film production, and things like that.

[00:28:23] Mike: Amazing how things have changed in the last 15 years. What used to be RC was more of a hobby.

[00:28:29] Ben: Yeah. RC is definitely still a hobby. I'm a member of the AMA, the Academy of Model Aeronautics. We go out and fly purely for the enjoyment of flying model aircraft. What's able to happen now with what we call drones is essentially a fly-by-wire aircraft. You're telling a microcomputer what to do and it executes it on the aircraft for you. Like, you want it to go forward. You're not directly controlling the motors. A microcontroller is controlling the motors based on your input commands. And that technology is only possible because of the R&D that's gone into mobile devices and tablets. Drones as we know them would not be possible without all of the R&D that has gone into bringing an IMU, an inertial measurement unit, which is an accelerometer and gyroscope, down into the size of a silicon chip.
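Editor's sketch: the step where "a microcontroller is controlling the motors based on your input commands" is often called motor mixing. Below is a simplified mixer for a four-motor X layout. The sign conventions (positive roll tilts right, positive pitch raises the nose, diagonal motors spin the same direction) are assumptions that vary between flight controllers, and a real controller would also feed IMU readings through a stabilization loop before mixing.

```python
def mix_quad_x(throttle: float, roll: float, pitch: float, yaw: float) -> dict[str, float]:
    """Map pilot commands to four motor outputs, each clamped to [0, 1]."""
    raw = {
        "front_left": throttle + roll + pitch - yaw,
        "front_right": throttle - roll + pitch + yaw,
        "rear_left": throttle + roll - pitch + yaw,
        "rear_right": throttle - roll - pitch - yaw,
    }
    # Motors cannot spin below zero or above full power.
    return {motor: min(1.0, max(0.0, value)) for motor, value in raw.items()}


hover = mix_quad_x(throttle=0.5, roll=0.0, pitch=0.0, yaw=0.0)
roll_right = mix_quad_x(throttle=0.5, roll=0.1, pitch=0.0, yaw=0.0)
```

The pilot only ever supplies the four command values. In `roll_right`, the mixer speeds up the two left motors and slows the two right ones, which is the indirection Ben is describing.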

Conclusion

[00:29:16] Mike: Ben, I want to thank you so much for sharing with our listeners some of the lessons that you've personally learned from running cloud-based environments. And also, for sharing a little bit about your drone information. If anyone wants to reach out to you, what's the best way?

[00:29:30] Ben: Ben@vin.li.com is my Vinli email address. You can find me for drone stuff at Rotor Visual on the socials.

[00:29:44] Mike: Thanks again Ben! This concludes episode two: Lessons Learned from Cloud Native. Now I have a better grasp personally of Cloud Native, and I hope our listeners do as well.

[00:29:55] Ben: Thanks Mike.
