Health Code is the first specialist conference on agile software development, Continuous Delivery and DevOps for the pharmaceutical, medical device, health and hospital sectors.
My name’s Jess Breslaw. Delphix is a hundred-million-dollar Silicon Valley company, about nine years old. We have offices around Europe. I’m going to talk to you today about data.
I’m not an expert in health. I’m going to talk a bit about how Delphix works across industries, with some of the biggest companies in the world, in retail and finance as well as healthcare and manufacturing.
It’s an interesting story. If I ask for one thing at the end of this, it’s that you go away thinking about what the potential is when you look at data in a very different way than you’ve looked at it before.
There’s a lot of data around. We hear it all the time, but how much data are we talking about? There are some amazing numbers. A zettabyte is a billion terabytes. Last year, most analysts agree, we broke the zettabyte stage in global data. Within the healthcare sector, IDC forecasts we’re going to have 2.3 zettabytes of data, 2.3 billion terabytes, by 2020. That’s an astonishing number.
It’s even leading analysts and futurists to predict a word I’d never even heard of: a yottabyte, a trillion terabytes. If you only remember one thing today, you’ve learned a new number, a yottabyte. That’s an incredible number. But this data is causing businesses huge challenges in how to handle it and what to do with it.
I pulled this from a Deloitte white paper. It brings to the forefront the issues within the healthcare industry. Data is coming from a lot of new sources we’ve heard about: digital pills, and new initiatives like electronic health records.
All of these new technologies, IoT and everything else generating more and more data, are going to spread pervasively across our businesses and make us think about how we’re going to handle that data.
At the same time there’s a pull of data. There are new expectations from consumers about how data is used to give them a better service: predicting better outcomes, what can be done post-operation. Earlier we talked about some of the virtual nurses that could be built.
Data is coming from all of these sources and generating a huge pool of opportunity. But the question is how you handle that opportunity. There are a bunch of challenges. Many of them will be discussed today.
This digital healthcare that’s generating all this data, and the disruption going on within the industry, is one huge challenge for IT people. But at the same time, regulatory requirements push directly against that. As fast as you want to develop new things, regulations are always there to control them and make sure things are done in a correct and proper way.
That often has an opposite effect. We’ve got data breaches happening in hospitals and healthcare and in every industry.
Sensitive data is a huge concern for everyone. Even before I came on this morning, at breakfast, I had a couple of conversations around GDPR, which comes in on May the 25th and is driving people to think about how they can protect their data.
We’re looking for ways to operate more efficiently, to make things more streamlined, quicker and automated. There are a huge number of people within healthcare. How do we manage those people? How do we get the most out of them? And at the same time, how do we do all of these things?
We’re always under the same pressure to take cost out of our business. So there are many themes that pull against each other in what we do. How do we balance the need to go faster and be more agile with controlling risk and cost, and managing the people and processes in our businesses?
At Delphix we see things from two points of view. The first is the data consumer: the developers, the testers, the data scientists, and the machine learning and AI algorithms and applications that use data.
All of these data consumers have a real set of demands. They want to make data-driven decisions, much better decisions based on the data available to them, and they want to do it faster.
DevOps initiatives and agile methodologies are all about going faster and enabling these people to make decisions faster. They want data much faster than they had it before. New architectures like cloud mean that data is spreading into more places.
We have to manage data in more places, but the consumers want that data to be in the cloud. They want data in more places, geographically and across technologies. On the other side of the coin we have the data operators: the DBAs and the storage, infrastructure and security teams, who have a completely opposite set of challenges. They have to make sure the data is safe and secure. They’ve got to make sure that processes are in place and followed for delivering and supplying data. They have governance and risk obligations to make sure that data is protected and secured, and that the right people have the right access to the right data.
They’ve also got sprawling data sets spread all over their organizations. How do they manage and control that? When you’ve got these two opposing teams of people when it comes to data, it causes something we call data friction. It sounds like DevOps: just as there’s friction between the developers and the operational side of the business, in this context we have friction between the consumers and the operators of data. This inhibits our agility to get things done in this world of incredible data that’s coming into the industry, and inhibits the innovation that we want to deliver.
If we accept the data as a key asset within healthcare, we have to find a way of handling data better. We have to unlock data to enable us to do innovation faster, to do it in a secure way and to be able to do it everywhere.
We have a solution for that and I’m going to tell you a bit about that. But before that I want to talk about DataOps. You can google DataOps when you go home or over the next couple of weeks. DataOps is a methodology that lots of companies are starting to talk about.
It doesn’t fit very neatly into things like DevOps and other methodologies. This is Gartner’s definition; everyone has a different definition of what it is. My personal opinion is that it is the people, process and tools that enable you to remove that friction between the data operators and the data consumers.
If we can unlock that, we can do a lot more with our businesses and our applications. So what if there was a better way? What if developers, testers and data scientists could have data in minutes, whenever they want it, and what if they weren’t slowed down by manual processes? They could have it whenever they want, and they could do it themselves.
What if every developer and tester could have their own environment, because they’re no longer sharing a single set of data in a shared environment? Those restrictions and constraints would be gone.
What if you could have production-like test data when you do your development and testing, but without the regulatory risks that come with having sensitive data in those environments?
On the operator side, what if the DBAs and the infrastructure side could deliver that data in minutes? Instead of inhibiting agility and innovation, they’d be enabling it. Suddenly they’re on the same team. They’re making things happen at the speed you want them to happen.
What if they could manage hundreds of copies of that data with full transparency and visibility? What if you could completely remove the cost associated with supplying data, and what if you could protect that data at scale and make sure that compliance and regulation fit in nicely with that?
What if you could do that across all environments: on premise, cloud or hybrid? This is the utopia of data delivery, and it’s what we talk about. This is a new approach to data. DataOps is looking at things differently. We’re not going to do things in the same old way.
Data delivery is one of the last parts of the DevOps tool chain that hasn’t been solved, until now. I’m going to explain how we do it. If you can make data portable, accessible and secure, then suddenly it fits into that automation piece, it fits into your workflows, and it can fuel everything that you do.
I am going to tell you how it works. With Delphix, what we do is deliver data to the people that want it, when they want it, in a secure fashion. We do it fast, via self-service, and we do it across infrastructures.
How it works: it’s a piece of software. You install it and it connects to your data sources: flat files, relational databases, unstructured databases that we’re building support for, and configuration files. We connect to those and take a single one-time copy of that data.
From then on we take incremental changes. Every change that happens in that data source gets replicated into our engine, so we’ve now got a near-live version of your data. We then compress that data to about a third and virtualize it at the data block level.
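To make the idea of block-level virtualization concrete, here is a toy sketch (not Delphix’s actual implementation) of a content-addressed block store. A "virtual copy" is just a list of block references, so identical blocks are stored once, and an incremental change only adds the blocks that differ:

```python
import hashlib

class BlockStore:
    """Toy content-addressed block store: identical blocks are shared."""
    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks[digest] = data  # dedup: the same block is stored once
        return digest

def snapshot(store, blocks):
    """A 'virtual copy' is just an ordered list of block digests."""
    return [store.put(b) for b in blocks]

store = BlockStore()
v1 = snapshot(store, [b"block-a", b"block-b", b"block-c"])
# Incremental change: only the middle block differs in the next snapshot.
v2 = snapshot(store, [b"block-a", b"block-B2", b"block-c"])

# Two full "copies" exist logically, but only 4 unique blocks physically.
assert len(store.blocks) == 4
```

This is why downstream copies can have a near-zero storage footprint: each new copy only pays for the blocks that changed.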
Now we can deliver virtualized copies, with no storage footprint, to the people that want them downstream. But before we do that, bear in mind we’re still within your data center. The data has not left the secure boundaries of your data center.
What we can do now is secure the data. We have a data masking tool as part of what we do. A profiler will go out and analyze the data fields, and the actual data inside them, for anything that looks like a credit card number, a birthdate or a national insurance or health number. There are out-of-the-box transforms you can use to convert those to a different format, or you can customize them or build your own.
The point is that you build your data masking policy once and it gets applied consistently across all of your data sources. If I’ve got an Oracle database and a Sybase database, and a field that exists in both of those databases, it’s intelligent enough to convert both fields with that common name to the same value.
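One common way to achieve that consistency (a sketch of the technique, not necessarily how Delphix implements it) is deterministic masking: derive the masked value from a keyed hash of the original, so the same input always produces the same pseudonym no matter which database it appears in:

```python
import hashlib
import hmac

SECRET = b"masking-policy-key"  # held by the data operators, hypothetical name

def mask_name(value: str) -> str:
    """Deterministically map a sensitive value to a repeatable pseudonym.
    The same input always yields the same output, so joins across
    databases still line up after masking."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return "person_" + digest[:8]

# The same customer appearing in two different databases masks identically,
# so integration tests across those systems still work.
oracle_row = {"name": "Jane Smith", "balance": 100}
sybase_row = {"name": "Jane Smith", "region": "EU"}
assert mask_name(oracle_row["name"]) == mask_name(sybase_row["name"])
```

Because the transform is one-way, the masked data cannot be traced back to the original without the key.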
When you’re doing integration testing it’s going to be good-quality data. We manage that data, and that data is secure. We can deliver it downstream as virtual copies, releasing many copies to many people with full control and transparency over how we do that.
The way Delphix has been coded, anything that’s available through self-service and through our command-line interface is also available through the API. That opens it up to all of the DevOps tools that you use today.
You can code against Delphix, and there’s an absolutely fab demo on our website using a full DevOps tool chain: Datical, Jenkins, GitHub, Selenium and Maven. I recommend you have a look at it. It takes you through a day in the life of a tester using all of these tools together. It’s a great video.
That’s how it fits into the DevOps tool chain: we can use the APIs to do it. We can now create something called a data pod. A data pod is a collection of virtualized databases delivered to someone downstream: a tester, a developer, an analyst, whoever. They now have control of those copies of data and features to use on their own.
I’ll explain what some of those are in a second. The data operators now have a whole great set of functionality. They can secure data. They can virtualize and compress storage. They can create automation: for example, every day I want all of my testers to have a fresh copy of the data when they come into the office.
You can replicate it between sites and you can view and manage those environments wherever they happen to be. The consumers have a bunch of funky tools which I’m going to talk about in a sec.
They can bookmark data. They control data like you version code. You can refresh, rewind and reset data at will. You can branch data in the same way that you branch code. And you can share copies of data between teams like you’d share a news story on Facebook.
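Those version-control-style operations can be pictured with a toy model (all names here are illustrative, not a real Delphix API): a data pod keeps a timeline of snapshots, and bookmark, rewind and branch are just operations on that timeline:

```python
class DataPod:
    """Toy model of version-control-style operations on a data set."""
    def __init__(self, state):
        self.history = [state]   # snapshot timeline
        self.bookmarks = {}      # name -> index into the timeline

    @property
    def state(self):
        return self.history[-1]

    def commit(self, state):
        self.history.append(state)

    def bookmark(self, name):
        self.bookmarks[name] = len(self.history) - 1

    def rewind(self, name):
        """Reset the pod back to a bookmarked point, discarding later state."""
        self.history = self.history[: self.bookmarks[name] + 1]

    def branch(self, name):
        """Independent copy from a bookmark that another tester can work on."""
        return DataPod(self.history[self.bookmarks[name]])

pod = DataPod({"rows": 10})
pod.bookmark("before-load")
pod.commit({"rows": 500})   # a bulk load during a test run
pod.rewind("before-load")   # reset the data at will
assert pod.state == {"rows": 10}
```

Branching from a bookmark gives a second pod whose commits don’t affect the first, which is what lets developers and testers work in parallel.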
What’s the benefit of this within our context of healthcare? Suddenly I can accelerate any application project I do. On average, application projects that use dynamic data platforms are done in half the time they were before.
It is absolutely astonishing. Think about the days, weeks or months it takes to deliver refreshed data and secure that data. If that’s done in minutes, it’s easy to see how those kinds of time savings happen.
Imagine that a team is no longer waiting for the environment to be freed up to do an application project because you’ve only got one shared UAT environment. You can see how beneficial this could be, because teams can now work in parallel. It speeds everything up.
I can create my policy once and deliver secure data across my organization whenever I want. When it comes to GDPR on May the 25th, you’re doing what GDPR mandates: you’re building security into your business processes.
When I deliver data, it gets secured and protected by default before it gets there. It also enables the cloud, and I’ve got a cool use case that explains how this works. Here’s my production database in my data center, on premise. I’ve got my Delphix engine. I’ve connected to the data, collected it together and secured it, still within my data center.
I have another piece of Delphix software in the cloud. I replicate the masked data into the cloud. My cloud copy never touches the on-premise data. I only see the anonymized data in the cloud. It never accesses that production data.
Once I’m in the cloud, I can deliver those data pods to all the developers and testers who need access to that data, without any additional storage footprint. When you think of the amount of money data costs in the cloud, you can start to see how much this benefits you. And it gets much better than that.
Without a platform like this, when I need to refresh that data in the cloud I have to go back on premise and get an entire new copy of production data. I have to mask that entire copy again and deliver it into the cloud; in big companies that can literally mean a hard disk in a lorry.
Because we’re tracking incremental changes, we only have to refresh the changes into the cloud. No longer do we have to do full refreshes from production on premise into the cloud. We can refresh a cloud instance of data in six or seven minutes.
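The reason an incremental refresh is so much cheaper is simple to sketch (a minimal illustration, not the actual sync protocol): compare the last-synced blocks with production and send only the positions that differ:

```python
def delta(old_blocks, new_blocks):
    """Return only the blocks that changed, keyed by position."""
    return {i: b for i, (a, b) in enumerate(zip(old_blocks, new_blocks)) if a != b}

def apply_delta(blocks, changes):
    """Rebuild the refreshed copy from the old blocks plus the delta."""
    refreshed = list(blocks)
    for i, block in changes.items():
        refreshed[i] = block
    return refreshed

last_sync  = [b"a", b"b", b"c", b"d"]   # what the cloud copy looks like now
production = [b"a", b"B2", b"c", b"d"]  # production has since changed one block

changes = delta(last_sync, production)
assert len(changes) == 1                # one block crosses the wire, not four
assert apply_delta(last_sync, changes) == production
```

With most blocks unchanged between refreshes, the transfer is a small fraction of a full copy, which is why a cloud refresh can take minutes instead of days.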
You can version-control data. You have the ability to bookmark that data at a point in time and then work on a different bug or a different feature on that same application, even within a single environment.
A developer or tester could have several copies of data from several different points in time that they want to work with. Finally, the icing on the cake is that because we track incremental changes, we’re a time machine as well.
Not only can you have that data instantly and fully secured, you can have it from any point in time. If you get a bug from Wednesday last week at, say, 3:30 p.m., you can rewind the data to the second before that bug happened, spin up the data from that moment in time, diagnose the issue and fix the bug. You can even copy the virtual copy of the database back to production, a virtual-to-physical copy.
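Finding "the data just before the bug" reduces to a lookup over the recorded timeline. A small sketch, with invented snapshot names and timestamps, of returning the latest snapshot at or before a requested moment:

```python
import bisect
from datetime import datetime

# (timestamp, snapshot id) pairs recorded as incremental changes arrive.
# Names and times here are purely illustrative.
timeline = [
    (datetime(2018, 3, 7, 15, 0),  "snap-1500"),
    (datetime(2018, 3, 7, 15, 15), "snap-1515"),
    (datetime(2018, 3, 7, 15, 30), "snap-1530"),
    (datetime(2018, 3, 7, 15, 45), "snap-1545"),
]

def snapshot_at(when):
    """Return the latest snapshot at or before `when`."""
    times = [t for t, _ in timeline]
    i = bisect.bisect_right(times, when) - 1
    if i < 0:
        raise ValueError("no snapshot that early")
    return timeline[i][1]

# Bug reported at 15:31 last Wednesday: spin up the data from just before it.
assert snapshot_at(datetime(2018, 3, 7, 15, 31)) == "snap-1530"
```

Because the engine only stores incremental changes between snapshots, keeping such a dense timeline is cheap, which is what makes the "time machine" practical.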
We’re often used in production support environments as well. It’s a lot to take in. It’s one of those things where you walk away and start to think of different applications and different requirements within your business where this could make a transformative difference to what you do.
I’ve mentioned development and testing, which is clearly an obvious use case for us. We have a couple of other use cases: data center and cloud migrations, security and compliance (which has weirdly lost its logo on the slide, which is interesting), and data recovery.
Then there’s analytics and reporting. You have lots of applications doing reporting, financial reporting, compliance reporting, and at the moment they’re running off either production or old versions of data.
Imagine if you could run those financial reports off data that’s only a few minutes old, whenever you need it. You’ve got a completely standalone copy of that data. You can roll back instantly and get access to that data quickly.
We have some healthcare examples. We’re particularly big in healthcare in the States; we have most of the large hospitals and most of the large health insurers. We’re starting to have opportunities in Europe as we grow and expand.
There are some interesting use cases. One customer has a patient care application: they used Delphix to develop the mobile app and to accelerate its time to market. That was an interesting use case.
Another uses it for their ERP system, an Oracle EBS application, to manage all their employees. They use it to mask that data because of the sensitive HR information. That’s a security and regulatory reporting use case.
Another customer uses us for DevOps and is currently using us to do an AWS migration into the cloud; that’s being installed at the moment. EBI is an interesting company. They’re the European Bioinformatics Institute. They collect all the genome data from around the world, petabytes of data.
It’s unbelievable how much data they have. They deliver it to research institutes, universities and data scientists. Their data is too big for Delphix to handle; we don’t handle petabytes of data in the way they’d want to use us. But they have databases of metadata.
They use Delphix on that. They have product managers for data sets, and they use Delphix to crunch and analyze that metadata. It’s an interesting range of use cases, and one of our biggest challenges is that no two customers do the same thing with Delphix.
It’d be handy if they did, but it shows you the flexibility that we have. I talked about the explosion of data, particularly in healthcare, and what’s driving that explosion: all the digital health applications and devices that are coming along, and IoT. On the other side, we talked about the expectations of the consumer. Data is feeding that end-to-end service.
That has a number of impacts, and Delphix helps with that. We increase the speed at which you can deliver these things to market by up to 50%. You’re much more flexible and agile. You can fail fast. You can play and innovate in a more flexible way than you could before.
The data operators don’t have to worry about infrastructure constraints anymore. You don’t have to worry about the people and process constraints anymore. You can work without that constraint and concern. Security compliance is no longer an issue if all of your test data is anonymized and masked in an intelligent, efficient way.
It’s not even relevant to GDPR anymore, because there is no sensitive data there and the data can’t be traced backwards; it’s a one-way anonymization. You can enable the cloud far more quickly, and with more agility, than you could before.
You can get there and leverage those techniques. It’s far more collaborative. I can move and share data between teams: between infrastructure operations and the consumers, but also between test, QA and dev.
It’s a far more productive and collaborative way to use data. It opens up the way that you work. It fundamentally changes some of your business processes within your DevOps methodologies. You can do things that you couldn’t do before without this stuff.
We never lead with storage, but we remove about 90% of your storage costs in non-production. Typically that alone pays for the product, but the real benefit isn’t the storage savings. The real benefit is the agility and the change management that you can do so much faster.
This is something where you get your architects, your development, test and QA people, and your compliance team in a room. They have a good old fight over how data is managed until you all come to a conclusion about what’s going to be best for your business. Even just as an exercise, it’s an interesting thing to do.
I’d recommend anyone to do that. I hope that was useful. Thank you very much for reading.