Navigating the Cloud Journey

Episode 5: Leveraging Cloud for Data Analytics Applications

December 02, 2021 Michael Valladao Episode 5

In this episode, Mike is joined by Anita Jindal, Sr. Director of Engineering at VMware. Anita shares her experiences and advice around building, operating, and securing business applications that leverage Big Data and emerging cloud technologies.

Mike and Anita will explore the evolution of Big Data, how the Cloud offers new opportunities for data analytics, what it takes to build and nurture a team of application and system reliability engineers, and Anita's passion for mentoring women and encouraging careers in STEM. 


Episode 5: Leveraging the Cloud for data analytics applications 

Mike: Hello, this is Mike Valladao again with your Navigating the Cloud Journey podcast series. Today, our journey takes us into the realm of big data, machine learning, and data analytics. 

My guest today is Dr. Anita Jindal, a cloud and technology advocate. Anita began her career with a PhD dissertation in parallel processing and worked with major corporations we all know and love such as IBM, Sun Microsystems, and Oracle. More recently, she focused on data sciences for NetApp, and while there she received a 2019 Timmy Award for best technical manager in the Bay Area. Anita is currently a Senior Director at VMware leading an engineering team for their Skyline product. In addition to her day job, she's also an advocate for STEM and women in technology, but more on that later. Anita, welcome to our program.

Anita: Thank you, Mike. Thank you for having me here.

Mike: Happy to have you. Now, I don't normally beat up my guests right up front, but I know you won a Timmy Award. Most engineering managers aren't really known for their people skills. So, when you won the best technical manager award, did you have much competition? 

Anita: Yes, I did. I was one of the five finalists and the finalists that I was competing against were either senior directors, the founders, the CEOs. And when I went to the award ceremony and my name was called out, I was very delighted. 

Mike: Excellent, that's great to hear. I'm serious when I say a lot of our industry is not known for people skills. It's great to have somebody able to excel and achieve and have people that enjoy the work that they're doing. 

[00:01:47] What makes a good technical manager?

Mike: So, I'll ask you another question about that; what makes a good technical manager?

Anita: It's about working and knowing about your technical talent. Giving them autonomy for innovation. Enabling their development in terms of learning new skills. At the same time, enabling them to execute without interruption. 

[00:02:14] What is Big Data and why is it important?

Mike: Excellent, I love that. I hope that we can put some pieces of this throughout our conversation today because today we want to first start talking about big data. What is big data from your perspective and why is it important? 

Anita: Big data is about the large volume and variety of data, which comes at a very high speed and cannot be processed with traditional systems. When we are talking about volume of data, it's in petabytes and growing. Variety of data is the content which flows in from different data sources. For example, it could be sensors, it could be IoT devices, it could be social media content. And when we talk about velocity, it's about the speed at which this particular data comes in from sensors. For example, autonomous cars are constantly getting data from the sensors on the road. Another example I would like to give here is the ride sharing services. In this particular case, when we see there is a surge of prices, or there is a different route being taken by a car, there is constant data being fed through the applications, the traffic conditions, as well as the other sensors on the road. So, this is where the speed of the data becomes very important. It's from the applications, it's from the sensors, it's from the road conditions, it's from a variety of conditions.

[00:03:47] The Cloud enables new use cases for Big Data

Mike: And this makes sense because we've always had a lot of data, but now we're doing so much more with it. Once you collect all that data with the sensors, you then have to decide whether you stop, whether you go, what do you do. So tell us a little bit more about big data. Where's it going? 

Anita: In the last five years, we are seeing a lot of technological innovation. There are a lot of applications which are being applied in the space of big data. I will take an example of our own product Skyline that we are building. It's based on big data. It's about providing proactive intelligence to VMware customers to keep them out of harm’s way which means that as they install and start transmitting the data to Skyline, we are able to process that data and derive the insights from this data and provide recommendations for our customers on how they can keep their environments healthy. 

Mike: Okay. Let me stop you there. You're talking about taking actions, you're doing different things with it. So, you are analyzing the data and making certain decisions to take things to the next level. And that's how you're giving your customers more information. Is that correct? Or is it something that's all been done under the covers?

Anita: Actually, when the data comes in, it's being processed, it's all under the covers, it's not visible to the customers. In order to gain these insights, the customers come to our application, Skyline Advisor, which is powered by these insights, and they're able to consume these insights through our application.

[00:05:28] Preserving your investments when moving to the Cloud

Mike: Anita, you've done big data and data science in a variety of forms. Some have been in the cloud, some have not. Explain what makes sense and where you should be doing things. 

Anita: There are different choices that one has to make. I'll take two examples here.

The first is when we have to preserve our investments in our technology. In the case of Skyline, we built a big data pipeline in the cloud. We used the cloud innovation. We used the capabilities of the cloud for building the big data platform. We preserved our investment in the Skyline Advisor, which is present in our own VMware data centers. So, in that case, we are able to have our Skyline Advisor application collect these insights from the big data platform and surface them for our customers. So, in this case, we are able to preserve some of the investments that we already had and are able to bring in the new innovations and the new insights for our customers.

And in another case, in one of the other areas that I worked in, we started a big data journey early on. At that time, the cloud wasn't that prevalent. When we started that journey, we had a lot of investment in our big data platform and the data science platform. So, in that case, we preserved our investment in that platform on prem. But as the cloud innovation started coming in, we started leveraging the machine learning capabilities, the analytical capabilities, and the API capabilities from the cloud. And we started building our analytical services in the cloud, which would then talk to this big data platform on prem.

[00:07:17] What advantages does the cloud provide? 

Mike: So why does the cloud have certain advantages? What types of things do you derive out of it that you couldn't get from just doing it on prem? 

Anita: Cloud is becoming more popular because of its abundance of compute, its pay-as-you-go model, and the elasticity of scale. So, think about it. If you look at major retailers, the holiday season is coming up and they will have a lot of sales. They will have certain bursts of traffic in order to capture these holiday sales. So, the cloud has become very common for the retailers because during holiday season or during peak seasons, they can go and spin up the resources on demand without any intervention to handle the growth and influx of incoming traffic. And once that particular peak season is over, they can release those resources. In that case, they are not buying new infrastructure or standing up new infrastructure, but they are getting the required resources from the cloud and then releasing them once they don't need them. So that's one of the reasons cloud is becoming very important.
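
As a rough illustration of the elasticity described above, the following Python sketch computes how many instances a service would need as traffic rises and falls over a peak season. It is not tied to any cloud provider: the function name, the per-instance capacity, the minimum instance count, and the traffic figures are all hypothetical values chosen for the example.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float = 100.0,
                     min_instances: int = 2) -> int:
    """Return how many instances cover the load, never dropping below a floor.

    capacity_per_instance and min_instances are assumed values for this sketch.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    required = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, required)


# Hypothetical traffic before, during, and after a holiday peak (requests/second).
traffic = [150, 900, 4200, 800, 120]
for rps in traffic:
    # Capacity is acquired as load grows and released as it falls.
    print(rps, "req/s ->", instances_needed(rps), "instances")
```

The point of the sketch is that capacity follows demand in both directions, so nothing has to be purchased up front for the peak.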

[00:08:31] Strategies for "lifting and shifting" to the Cloud

Mike: So, if you had an application that you were shifting from maybe on prem to cloud, what sides would you be doing first? Is big data one of the things that you would be moving and shifting, or is that something you would do secondarily or maybe tertiary? 

Anita: If you are shifting an application, it depends on how quickly we want to build the application. If you have a lot of investment in a particular section of an application, we will have to take into account the different considerations with respect to the cost, with respect to the time it would take to build the application, and what the ongoing cost would be to maintain and operate this application in the cloud. So, in order for us to shift the existing application into the cloud, there are certain aspects that we have to keep in mind. If I have a lot of investment on prem that I don't want to give up, then there are other innovations in the cloud that I can use. For example, there are machine learning technologies within the cloud which are available today. Amazon is continuously adding algorithms for machine learning, which enables the engineers to do machine learning very easily. So, it's democratizing machine learning.

[00:09:49] Democratizing Machine Learning in the Cloud

Mike: Give me some examples of that. What does that really mean? Give me some solid examples. 

Anita: So, about democratizing machine learning: traditionally, when machine learning started, we would hire a machine learning engineer. They would start from the dataset and then they would go through and experiment with several algorithms within the machine learning library.

Mike: I'm laughing because it used to be an experiment every time. You're absolutely right. So, where's it going now though? 

Anita: So, Mike, that's where these hyperscalers are trying to make it very easy for the technical folks, so that if you want to do machine learning and you have a collection of data and a problem that you want to solve, they now provide technologies for it. For example, there is SageMaker in Amazon and AutoML by Google, where you can train on your datasets with these particular technologies, and they will build a model for you and provide a model with a probability that you can use. So when I talk about democratizing machine learning, it's about training these models. There are technologies already in hyperscalers today which will work on your data and tell you what the best model is for solving your problem.

Mike: It's simply a checkbox that you can say I need X, Y, or Z today, and you're able to use it. Is it as simple as that?

Anita: It is as simple as that. You can talk about the different algorithms that you want to use for training on your data. All you have to do is curate your data, define the features in the data set, and these particular models will train on that data.

Mike: How do you interact with those? Do you have to know certain types of programming languages? How does that work? 

Anita: So, the way it works is that these particular algorithms are provided as services by these hyperscalers, and they have a nice user interface that you can use for interacting with the services, for uploading your data set, and for selecting which data set you want to use as the training data set for your algorithm. So, it's very interactive for the engineers to use this.
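
To make the idea of a service that tries several algorithms and reports the best one more concrete, here is a deliberately tiny Python sketch. It does not use SageMaker, AutoML, or any hyperscaler API; it only mimics the concept with two toy candidate models and a validation score. All function names and the data points are hypothetical.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def pick_best_model(train, validation, candidates):
    """Train every candidate, score it on held-out data, return the best name."""
    train_x, train_y = zip(*train)
    val_x, val_y = zip(*validation)
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        scores[name] = mean_squared_error(model, val_x, val_y)
    best = min(scores, key=scores.get)  # lowest validation error wins
    return best, scores


# Hypothetical curated data set: (feature, target) pairs.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1)]
validation = [(5, 9.8), (6, 12.1)]
best, scores = pick_best_model(
    train, validation, {"constant": fit_constant, "line": fit_line}
)
print("best model:", best)
```

A managed service does far more than this (feature handling, tuning, deployment), but the core loop of training candidates and comparing them on held-out data is the part being automated.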

[00:12:07] AI in the Cloud

Mike: So, we've talked a little bit about machine learning. What about AI? What's the demarcation between the two or is there a demarcation? Are they the same thing? 

Anita: You might have seen the bots. For example, with natural language processing, when certain data is coming in or when people are asking questions, the bots are getting more intelligent about understanding your needs, and they may go further and schedule your appointment with doctors. So, when we talk about AI, it's about going a step further. It's not just about the data flowing in and then giving the predictions or the forecasting. When we talk about conversational AI or the workflows, that is where some of these models come into play.

[00:12:57] Choosing the right cloud hyperscaler

Mike: Great. And you've talked a little bit about the hyperscalers. Those of course are Google Cloud, we're talking about AWS, we're talking about Azure. How do you know what is best to be using? What kind of decisions do you have to make when you build out an application like this?

Anita: When we look at the hyperscalers, the technologies which are available in the hyperscalers are very comparable. So, each of the hyperscalers provides the compute, the storage, as well as the technology. The first factor that comes into play when selecting a hyperscaler is what the cost will be for me to build an application, operate an application, and grow this application within the cloud. So cost is one aspect. The second aspect is about the relationship between the vendors and the enterprise that wants to use them.

Some of the IT organizations might have good software and good connections with certain hyperscalers as compared to others. So that plays a role as well: which particular IT systems and which particular hyperscalers are supported by your IT, and how you can leverage them through the virtual private connections from your company enterprise to the hyperscalers. And the third thing, which is very important as a choice for the hyperscaler, is what kind of Service Level Agreements are provided by these hyperscalers, whether it is four nines or five nines, and what kind of support is provided by these hyperscalers.

So, think about it: when we are building enterprise level applications and deploying them into the cloud, we want to provide a certain level of availability for our customers and avoid disruption for our customers. So how stable and how available are the resources in that hyperscaler? That plays a very important role. So, these three, the costs, the SLAs, and support, play a very important role in deciding which particular hyperscaler you would use.
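
For a sense of what "four nines" versus "five nines" means in practice, the short Python sketch below converts an availability level into the downtime it permits per year. It assumes a 365-day year and treats "N nines" as an availability of 1 minus 10 to the power of minus N; the function name is made up for this example, and actual SLA terms vary by provider and service.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime permitted by an availability of N nines (4 -> 99.99%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in (4, 5):
    print(f"{n} nines: about {allowed_downtime_minutes(n):.2f} minutes of downtime per year")
```

Under these assumptions, four nines allows roughly 52.56 minutes of downtime per year and five nines roughly 5.26 minutes.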

Mike: I think those three items are all perfect examples of things people need to figure out to help determine the directions they're going. 

[00:15:05] Technical skills needed for building apps in the Cloud

Mike: Along that same line, do you ever tend to go with the skillset of your own people? If you've got people that are more familiar with Go, for example, you might want to do something more in Google Cloud. Or if you have people that are used to doing VPCs, what are your thoughts on that?

Anita: So actually Mike, one of the things that I want to bring up here is when we started our cloud journey in all the projects that we have delivered, we started with a team of Java engineers. 

Mike: Your background of course is from Sun Microsystems. So, Java fits in perfectly!

Anita: Java fits in perfectly, but I just want to let you know, I don't code, I'm talking about the skill set of my engineers! Most of these engineers had Java skills as we were building the big data systems. So, we definitely had to build new skills. My team was excited, and they actually wanted to learn new cloud skills.

Mike: Give me some examples of the new skills that some of your folks had to learn.

Anita: Sure. So, when we want to do cloud deployments, we want to make sure we can containerize our code. So, they had to learn skills around Docker containers. And then the second thing that they had to learn, for example, was: okay, when we have this containerized piece of code, how do we deploy it into the cloud? So, this is where they had to learn about Kubernetes technologies for deploying the code and then scaling these clusters.

Mike: Was that difficult for your people to learn? Is it something that they just gravitated towards? How quickly did you see the migration take place?

Anita: Actually, my team was very motivated. They learned it pretty quickly. When we started on the project, they jumped on learning new technologies, and they were building on the go. 

Mike: So, you didn't have to rebuild a team to say, hey, we need people, we need five more people that all have this or that experience. Or was it a combination of hiring a couple and then having people grow into it? How did you bring it all together? 

Anita: If I'm looking for a perfect individual with a particular skillset, the hiring is not easy. We will never find somebody with all the skillsets. So, I always believe in building your own team. This is also an opportunity for them to grow and learn new skillsets. And my teams are very motivated, and they jump on these new technologies. 

Mike: That's awesome. I'm happy to hear that people are jumping on board because the cloud has so much to offer. And all that we can do, you in particular, is to try to bring the teams together and to point them in a particular direction. So, what other skill sets do people really need to embrace the cloud from your perspective?

[00:17:52] PaaS services you can use in the Cloud

Anita: There are a lot of PaaS services which are already available in cloud. 

Mike: A PaaS service, what do you mean by that?

Anita: That is Platform as a Service. For example, when we want to build out a big data pipeline, there might be a database which is available as a service for us in the cloud. In that case, we don't have to go and install a version of the database; it is available from the cloud itself, and we can use it as a technology.

Mike: Are you talking about something like Hadoop or what?

Anita: So, when we use our big data for building our pipelines, in some cases there are technologies which are available in AWS, for example, for stream processing. And then initially when we started our Hadoop journey, they were available as Platform as a Service in a different cloud. As for some of the other analytical services, in Google Cloud, for example, there are Bigtable and BigQuery, which are available for big data analytics. In Amazon, there is a comparable technology, which is Athena for analytics. So, there are all these services which are already managed by cloud vendors. And as we are building our technologies and using these, we pay based on the data storage, how much data we are storing, and how much we are accessing these services.

Mike: I really appreciate you going into some of the details here, because this gives our audience an idea of directions that they might want to expand into because there's a lot of opportunities here. 

[00:19:25] Strategies for managing security in the Cloud

Mike: Let's talk about the security of all this. How do you keep it secure? Because now you've got all this in somebody else's environment. Or in your case, I guess with VMware, you may actually have some of the things within your own environment, but still, how do you construct the security to make sure it's baked in? 

Anita: We bake the security from the get-go, as you are talking about. There are enterprise level security architects. We engage with our security teams on reviewing our architecture and ensuring that the network firewall settings can handle the capacity to ensure that the data that we are storing can be secure. Even before we embark on building an application out, we always engage with the security architects to review the architecture. 

Mike: Let me ask you a question. Are you saying that it's somebody else's job? Or is it still partially the responsibility of the team itself?

Anita: So, Mike, it is the responsibility of the team, but security expertise is a niche area. So, in this case, the experts will know what we need to be doing from the penetration point of view and which criteria our applications need to meet. We want to ensure the security, but we will not have all the details and all the expertise that these particular security experts within the enterprise teams have. So that's the reason it's important to engage with these security architects to review your architecture, so that we are building a secure and robust product.

Mike: That's excellent advice because we do have people, in each environment that know security. Let them take a look, let them review it. Don't block them out. Make it all work together. So, I think that's a great design for success. 

[00:21:24] Application health monitoring in the Cloud 

Mike: What about monitoring? How do you make sure the data is really doing what it's supposed to do? Where do you stand on that, Anita?

Anita: One of the big pieces of building big data platforms which are serving our customers is that we should know that our system is healthy, that there are no failures in the pipelines. So, the way that we handle it is we build monitoring hooks within the code that our site reliability engineering teams can use. And we are using a VMware observability product for building observability dashboards so that we can monitor the health of the system in near real time: how the system is behaving, how much data is coming in, how much load is on the system, how data is flowing through it, and how the different components of the system are working. And in addition to these real-time dashboards, we have also built in additional automation so that in case there is a breach of a threshold, or if there's an outage, there are automatic hooks or integrations with tools like Pager and Slack so that our SRE team can be paged, and they can engage and proactively look at the threshold breaches or issues.

Mike: Let's go to the details of this. When that happens, how do you make the interconnection to the other tools? How do you get there? Are we talking about APIs or what are you doing to make all this work together? And is it someone from your team or someone from another team building and deploying the dashboards themselves?

Anita: No, actually it's my team who's building and deploying the dashboards. So, we have a Site Reliability Engineering Team who is responsible for the complete performance and operations of the system. They are the ones who build the dashboards, and they are the ones who are knowledgeable on what kind of thresholds we need to take a look at. And if those thresholds are breached, that's when we have integrations with other tools like I talked about, Pager and Slack, and that's through APIs. So, these tools have APIs for integrations. And when that happens, that's when our team gets notified proactively rather than sitting in front of their dashboard.
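
The threshold-and-notify pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric names, threshold values, and the check_thresholds function are hypothetical, and the notifier callables stand in for real paging or chat integrations, whose actual APIs are not shown here.

```python
from typing import Callable, Dict, List


def check_thresholds(metrics: Dict[str, float],
                     thresholds: Dict[str, float],
                     notifiers: List[Callable[[str], None]]) -> List[str]:
    """Compare each metric to its limit and push an alert to every notifier."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            message = f"{name}={value} exceeded threshold {limit}"
            alerts.append(message)
            for notify in notifiers:
                # In a real system this would call a paging or chat API.
                notify(message)
    return alerts


# Hypothetical pipeline metrics and limits.
sent: List[str] = []
alerts = check_thresholds(
    metrics={"pipeline_lag_seconds": 420.0, "error_rate": 0.01},
    thresholds={"pipeline_lag_seconds": 300.0, "error_rate": 0.05},
    notifiers=[sent.append, print],
)
```

Passing the notifiers in as plain callables keeps the threshold logic independent of whichever tools the team integrates with.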

Mike: I like it. And just to summarize. You see advantages to using the cloud to make big data work. And so, all that has come together by bringing your team in and expanding their skill sets, making things work, having them learn the new applications that are more applicable to the cloud. You're also saying you have done an awful lot with hybrid cloud in order to, again, make the best use of what you've got here. And when it comes down to the actual needs, you always want to bring in the security experts as well.

Okay perfect. I just want to make sure that we're hitting the right takeaways for our listeners.

[00:24:28] Paying it forward. Women in Engineering

Mike: So, with that, I would like to change directions here. Tell me about your work in bringing other people up. We've all had different mentors to help assist our careers. What are you doing and what's your viewpoint on how to help different people? STEM for example. 

Anita: Sure, Mike. I have a passion for Women in Technology. When I grew up in my career, there were not too many women in engineering, but I have had a passion for technology. The way I give back is through my involvement in the different women in technology forums, whether it's through presentations or mentoring for women. I'm on the advisory board for a nonprofit called Thrive-WiSE, which is for women in science and engineering, and it offers development resources and support for women in technology. And I also like to give back to the community because I think, as women, we need to help other women and let them know that the technology area is great, and they should get more involved with technology and stay within STEM and engineering. It's a great area to be in.

Mike: I think that's great. Just from a personal perspective, prior to this meeting I went through some of my mentors in the past. And it's interesting because when I think of people like Neerja Sharma, Rieko Sato, and Kara Wilson, some of the people that helped foster me, the majority of the people that really helped my career were women. I don't know what that says, actually, but I personally have been fostered.

I have three children, one is male, two are female, all three ended up getting technology degrees. We have so many opportunities. It's good to foster. And I do appreciate the efforts that you are taking to help people because we need so many more people in our industry.

Anita: That's correct Mike. And I always tell people that we have to experiment, and we have to forge ahead, and we have to pave our own path.

Mike: Nice words to live by. Wonderful. Anita, if you would, please give us a little bit of information on how people could contact you if they needed to going forward. 

Anita: Sure, Mike. People can contact me on LinkedIn. My LinkedIn handle is Anita Jindal and my Twitter handle is @AnitaJindal12.

Mike: Very good. Thank you very much for your time today. Thank you for all you're doing and good luck in your cloud journey as well. 

Anita: Thank you, Mike. 

Additional Resources:

Learn more about Thrive-WiSE @ http://thrive-wise.org/

More questions about Cloud? Join the Gigamon Community Hybrid/Public Cloud Group @ https://community.gigamon.com/gigamoncp/s/group/0F91O0000009cN2SAI/hybridpublic-cloud 
