Hello and welcome to MedCity's Pivot Podcast. This is Arundhati Parmar, editor-in-chief of MedCity News, and your host. As we get ready to say farewell to 2021 and usher in a new year with all its Omicron uncertainty, I thought it would be great to talk to someone who has experience in practicing medicine, public health, health informatics, and AI. My wishes were answered when Taha Kass-Hout, chief medical officer and director of machine learning at AWS, agreed to chat.
What follows is his vision of how AI can democratize healthcare data and put us all on a path to precision health. Welcome, Taha, to MedCity's Pivot Podcast.
Thank you for having me.
So I wanted to start with what attracted you to Amazon.
I mean, you have had such a distinguished career: you've been a public health official, you understand the technology side of healthcare. What has brought you to AWS?
I was a customer of AWS before I joined AWS, so I had experienced it firsthand. Just something about me, a little bit: I'm a cardiologist by training and a statistician, and I relied a lot on health data to monitor diseases as they propagate in a population throughout my time as a public servant in the Obama administration.
In the first four years, I worked on many challenges around events such as H1N1, which the world experienced back then, and on how we could stand up and modernize the entire infrastructure to be cloud based; it was one of the very first programs to join the cloud, back in 2009. But more importantly, how can you enable state and local health departments to overcome the heavy lifting and the challenges of modernizing their infrastructure to receive large volumes of quality information from their providers, while providing at every layer the right set of controls, so that state and local departments really have control over their own data and the ways they can share it? My journey through the years as a public servant was one where I introduced a lot of modernization and innovation around data sharing, security, and analytics, from public health surveillance to outbreaks to getting ahead of innovation and building a flywheel around it. For example, there was the data the FDA had access to and made public, and the question of how you deal with the next generation of machine learning, software as a medical device, next-generation sequencing, genomics and whatnot, and how you move toward the FDA as an enabler, working very closely with industry to enable those innovations in diagnostics and new ways of practicing medicine. The cloud was always at the heart of that: enabling infrastructure modernization, providing microservices to unify the data strategy end to end around these initiatives, and creating a large flywheel. To this day, in the case of CDC almost 10 years later, or the FDA almost six years later, that flywheel is alive and well and continues to provide safety for large populations around the world.
But why Amazon? There are other companies that provide health cloud services, right? You have Microsoft. You certainly have Google.
Although they're not doing as well as you guys, obviously. So why Amazon? What do you think they have that maybe other companies do not?
Well, I mean, there are many strategies to be successful as a business. Amazon in particular is customer obsessed. That's a strategy, and it's a long-term strategy. You can be product obsessed.
You can be competitive, or competitor obsessed. Those strategies have worked as well; we've seen them work. In the case of Amazon, it's sticking with the long-term approach: not only do you listen to customers' needs, we always start with the problem and also ask customers the right questions, so that we don't repeat the same mistakes that we had to learn from the long way.
I mean, the great thing about Amazon is learning by doing, and that's also something we lean on very heavily to understand customers and to differentiate our solutions in ways that can help them solve those problems. So customer obsession is front and center to the culture. And if you look at my entire career, whether I was attending to patients or helping populations, I always started with those in the field, those very close to the problem. For example, at CDC, with the state and local health departments trying to deal with pandemics. At FDA, it was how FDA became a clear partner to innovators in Silicon Valley and data scientists around the world trying to get ahead of adverse drug events and medical device issues, or to bring a new wave of innovation around, for example, next-generation sequencing in light of the Precision Medicine Initiative, or how the new wave of machine learning is driving most innovations and how you regulate that space.
So for FDA to work very closely and partner with industry on this was really the kind of approach that I was leading. All these things put together really come down to a company that is customer obsessed in every meaning of the word, in every line of business that we do. Amazon is not just one business; we're a constellation of many businesses. And Amazon invented the cloud, over a decade ago. Look where the world has gone in the last decade, from digitization to making sense of data, and at the large investment we're making in machine learning, democratizing that for the masses. Those are things where you democratize infrastructure to help customers modernize, or democratize access to highly accurate tools and science at scale, like deep learning, which was accessible only to a few. We've made it our mission over the last five years to make machine learning as boring as possible.
Anyone should be able to use it and be able to disrupt pretty much every industry you can look across, especially health.
Give me a sense of what your vision for precision health is, as powered by artificial intelligence.
Well, absolutely. You know, a good doctor treats the disease the patient has; a great doctor treats the patient who has the disease, right? So ultimately it's going to come down to understanding more and more about the individual in order for us to take the most tailored approach, from prevention all the way to treatment and discoveries.
If you look at it from that angle, today it's fairly common that patients walk around with 5,000 to 10,000 data points on them, right? We've gone from medical records that were maybe 15% digitized to almost 98% digitized; almost all your medical records today are digitized. There are new waves of data coming into the fold from smartphones, such as mental health and behavioral data. If you go and get your flu shot at a pharmacy, all that is digitized today.
The genome is the biggest disruptor. We've seen a 100,000-fold reduction in the cost of sequencing a genome, and a lot more accuracy in understanding the variants that personalize each one of us; in those variants are a lot of answers for diagnostics and perhaps even for future treatments down the road. So if you look at it as a whole, the revolution that happened over the last decade of digitizing data has enabled many of these applications to start personalizing care and prevention for each individual or subset of the population. I believe the next decade is going to be a lot more about how you can make sense of all this data in totality to truly start personalizing options for individuals.
The ultimate definition of precision medicine is really: how can you prevent disease before it happens, and how can you tailor an individual journey for each person? Your journey might be different from mine even if we are twins, because there are a lot of other factors extraneous to our health, like social determinants of health, the zip code you live in, socioeconomic status and other things. Then you start to really close the gaps in care, improve equity of care and improve public health. If there is any lesson we should take from the COVID pandemic as a crisis, it is that more investment should go to the forefront of public health, and the crux of that is the individual and the environment they live in, and how we can start removing more of those gaps in care. That's ultimately how we can live healthy lives, and also contribute in ways that enable data-driven decisions all the way from an individual to large populations.
So let's talk about the patients and let's talk about data.
Some, maybe all, of the innovations that you're talking about are possible because you can get de-identified data from institutions. But what we are noticing a lot is that sometimes patients are not even aware that their data is being used in this way. What does ethical data sharing look like? Do you have any comment on that?
Well, I mean, if you look at progressive regulatory privacy frameworks, I believe HIPAA is an awesome thing. HIPAA was passed in 1996, but it was the first time a framework really flipped the ownership model with patients' consent. We're seeing the same thing with GDPR, by the way. That model is going to propagate more and more, where access to patients' information will increasingly be authorized by the individuals themselves.
So consent will be key to the future of interoperability and how data flows. I don't want to say "patient" but "individual", because we can be patients at some points in our lives and healthy at others; the journey is with the individual. We've seen more and more models centered around this, including new business models that didn't exist before.
One perhaps under-appreciated thing is that over the past 10, 20 years we've seen quite a massive digitization of healthcare, following the 2008 economic collapse, when the HITECH Act was passed in 2009 to really start pushing digital health records, right? So now, if you look in the rear-view mirror, for probably the last three or four years almost all your data has been digitized, including social information. We've also seen quite a bit of innovation in new business models: value-based models for improving patient outcomes. That shift is not just here in the US; it's a global shift, because you really need to focus heavily on the quality of care that you provide.
And with that you need to look more broadly, not just at health record data but much, much wider. Patients today walk around with a lot of social information around them; you see people talking to computers at home, you see people carrying their mobile devices and whatnot. There's a lot of information around individuals that's even more important in determining someone's outcome. So in that constellation, the right business model is really for the data to start flowing with the patient, or the individual, at the center.
But I also don't want to overwhelm individuals with that large volume of data. Even health organizations realize that the vast majority of the data they hold on a patient, even from the narrowest view of a medical record, is unstructured in nature: doctors' notes over time, their decisions, and how their knowledge and years of experience are baked into those, such as why they chose this test over that test and, based on the outcome of that test, what kind of action they took. I mentioned the genome; there are also EKGs, EEGs and studies, and all of those are fairly complex to put together in a way that starts making sense of this information in the first place.
Then you try to distill it down to what that means to me at this point and what the right decision is for me at this point. This is a massive big data problem, where you have billions of data points to bring together in ways that you can use most efficiently. So even large health organizations, whether hospitals or payers or innovators, struggle with the amount of information, but also with the complexity of the information and how you bring it together, which brings us to the next topic, something we've been making a material investment in, with respect to machine learning. Yeah, exactly.
Amazon HealthLake is one of those purpose-built services, specific to the health industry, that store, index, structure and format information, and organize it in a way that gives you a longitudinal record of each individual, so you can start to interrogate that data and build advanced models, and bring in the power of the cloud for scalability and security. In addition, there are the advances in science that we're democratizing. We believe we are at an inflection point when it comes to machine learning, and deep learning is one particular discipline with much more accurate algorithms. For example, it has excelled at pattern recognition when it comes to pathology images, X-rays and that sort of thing; at better prediction, because now we have more information indexed by individual, so you can make far more refined predictions; and at natural language understanding, which matters not only for enabling chatbots and intelligent conversational bots, but also for the massive amount of this information that is text. Health data in general is both sequential and unstructured, and that's perfectly what deep learning is suited to start tackling.
So the combination of pattern recognition, prediction and natural language understanding takes that complexity and simplifies it for users, and democratizes access to these tools, so that software developers, data engineers, data scientists, business analysts, even clinicians will be able to start using them for their own decision support, and patients can have unfettered access to that information in ways that make sense to them. Amazon HealthLake is one of those constellations: it's a purpose-built, HIPAA-eligible machine learning and analytics service, enabled by all these other components, that helps you index, store and transform the information, present it on a timeline, and do that at petabyte scale. It also supports data interoperability: how the exchange of this data can happen in the modern way of the web, with mobile applications and systems talking to each other through application programming interfaces, surfacing that information while supporting emerging healthcare interoperability standards such as Fast Healthcare Interoperability Resources, or FHIR, which we believe to be the future of how most data exchanges will converge when it comes to health.
So I'm going to stop you for a minute. You've talked in broad generalities.
I want to bring you down a little bit to specifics. Let's say I'm Mayo Clinic and I want to understand the patients in my population who are pre-diabetic, so that I can intervene early and make sure they don't fall further into that diabetes spectrum. How can a service like HealthLake help me?
So, a couple of ways.
I mean, first of all, as I mentioned, the information comes in many parts. Even if you just look at a healthcare institution and all the medical record data they have about an individual, the majority of that data is going to be unstructured. And if you focus on one modality, which is just text (I'm not talking about imaging, or PDFs and lab reports, just text like doctors' notes, triage reports or imaging reports), there's a lot of clinical knowledge baked into it. That's the reasoning that happens around the care of the individual.
You can look at one of those at a time, or you can look at the entirety of a patient journey. And when you look at that, it's a massive space of data points about an individual that you want to bring together in a way that starts making sense of the information. So it starts with very basic data management, and being able to do that with privacy and security controls baked in at every step of the way: being able to index every encounter, every medication, every test result and the next action taken, and to put all that information on a timeline that's personalized to the individual.
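As a rough illustration of what indexing every encounter, medication and test result on a per-patient timeline could look like, here is a minimal Python sketch. It is not how Amazon HealthLake is implemented; the `ClinicalEvent` record, the `build_timelines` function and the sample data are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClinicalEvent:
    """Hypothetical record for one item in a patient's history."""
    patient_id: str
    when: date
    kind: str          # e.g. "encounter", "medication", "test_result"
    description: str


def build_timelines(events):
    """Group events by patient and sort each group chronologically."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event.patient_id].append(event)
    for patient_events in timelines.values():
        patient_events.sort(key=lambda e: e.when)
    return dict(timelines)


# Made-up sample data, deliberately out of order.
events = [
    ClinicalEvent("p1", date(2021, 6, 1), "medication", "oral medication started"),
    ClinicalEvent("p2", date(2021, 3, 9), "encounter", "annual checkup"),
    ClinicalEvent("p1", date(2021, 5, 20), "test_result", "HbA1c measured"),
    ClinicalEvent("p1", date(2021, 5, 12), "encounter", "primary care visit"),
]

timelines = build_timelines(events)
for event in timelines["p1"]:
    print(event.when.isoformat(), event.kind, event.description)
```

A real service would add access controls, standard resource formats such as FHIR, and far richer event types; the sketch only shows the grouping and ordering step.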
That's the first heavy lift, and HealthLake can remove that heavy lift for a particular customer looking at that problem. So that's the first step. Then it provides the query ability to look across the board. For example, say I have patients who were newly diagnosed with diabetes, and six months later they're not responding to the first-line oral medication, say metformin.
What is the next best action I can take for them? That's what you can now look at. I'm going to segment my population and find an almost identical population with similar characteristics, by creating this cohort and matching it to another, similar cohort, and see what the next option or set of options was that might be available to me. That's a way you can start learning from the data, where in the past, I mean, as a cardiologist myself, it could take us two to three years to get someone on the right cocktail just to get their blood pressure under control.
So why does it have to be that way? This is by no means to take that knowledge and experience out of the equation, but rather to supplement it with more data that's tailored to that specific request. And the next level is that now we have more data indexed about the individual. We believe that more information indexed about the individual will really help make better decisions; when you have a complete view that includes the newly structured data, it will lead to better clinical decisions.
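The cohort comparison described above (segment the population, find patients with similar characteristics, and see what was tried next for them) can be sketched in a few lines of Python. This is only a toy nearest-neighbour illustration under stated assumptions: the features, the scaling factors, the `next_actions_in_similar_cohort` function and every data value are made up, and the treatment labels are deliberately abstract placeholders rather than clinical recommendations.

```python
from collections import Counter
from math import sqrt


def distance(a, b, scales):
    """Scaled Euclidean distance over the features named in `scales`."""
    return sqrt(sum(((a[k] - b[k]) / scales[k]) ** 2 for k in scales))


def next_actions_in_similar_cohort(target, history, scales, k=3):
    """Tally the next actions recorded for the k most similar past patients."""
    nearest = sorted(history, key=lambda p: distance(target, p, scales))[:k]
    return Counter(p["next_action"] for p in nearest)


# Hypothetical scaling so that no single feature dominates the distance.
scales = {"age": 10.0, "hba1c": 1.0, "bmi": 5.0}

# Made-up historical patients with the action that followed first-line therapy.
history = [
    {"age": 52, "hba1c": 8.0, "bmi": 30, "next_action": "option_A"},
    {"age": 57, "hba1c": 8.3, "bmi": 33, "next_action": "option_A"},
    {"age": 55, "hba1c": 7.9, "bmi": 29, "next_action": "option_B"},
    {"age": 30, "hba1c": 6.5, "bmi": 22, "next_action": "option_C"},
    {"age": 71, "hba1c": 9.5, "bmi": 38, "next_action": "option_B"},
]

target = {"age": 54, "hba1c": 8.1, "bmi": 31}
tally = next_actions_in_similar_cohort(target, history, scales, k=3)
print(tally.most_common())
```

In practice the matching would use many more variables and proper statistical methods, and the output would supplement, not replace, the clinician's judgment, which is the point made in the conversation.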
With HealthLake we also have analytics capabilities integrated, for example Amazon QuickSight, so developers can create dashboards on normalized data to quickly explore trends across patient populations and personalize down to the individual level. Developers can also build, train and deploy their own machine learning models on their data with Amazon SageMaker on top of Amazon HealthLake, in order to make better predictions than they could previously. That way you can intervene more quickly, improve care and reduce costs. For example, if you want to predict a patient's heart risk, or optimize and improve patient flow, we've shown, working with customers, anywhere from 5 to 20% improvement in prediction and timeliness when the data indexed from the medical record is included in the models.
For example, in the work that we've done with Cerner on heart failure, there was a 15-month warning window on heart failure patients. That's a good enough window to give care teams ways to interrogate that journey. It also takes the black box out of the equation. With features in SageMaker like Clarify and Feature Store, you can understand what went into the model and understand biases early in the data journey, not just after building the models: understanding the data biases upfront, what influences the outcome of the model, and, when you put the model out in the wild, how it is performing, how it is being biased and what other factors are getting in there.
That way, you can have full transparency into what goes into the model. Similarly, when it comes to unstructured data, with integrated natural language processing, Amazon HealthLake automatically indexes and structures the textual data on ingestion. When it does that, it gives you a score: for example, this is a medication, this is a medical event, this is related to this medication, and so on. It gives you a lot of context, for example that someone has a family history of a condition, which doesn't mean they have it, and it gives you a confidence score on all of these.
And then there's mapping into ontologies, which is another heavy lift. Medicine is highly contextual. For example, there are over 14,000 disease codes in the ICD terminology alone, so coding becomes a fairly manual process where customers spend hours, days and weeks sifting through and codifying. So we're also able to remove that heavy lift by mapping into ICD or RxNorm, and yesterday we also introduced a new ontology mapping with SNOMED CT that's now also available to customers.
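To make the idea of confidence scores and context concrete, here is a small Python sketch that keeps only condition mentions that are scored highly and are not marked as family history or negated. The record shape, the trait names and the `confirmed_conditions` function are assumptions invented for this example; they are not the actual output format of Amazon HealthLake or Amazon Comprehend Medical.

```python
def confirmed_conditions(entities, min_score=0.8):
    """Return condition mentions that are confident and apply to the patient.

    Mentions flagged as family history or negated are skipped, because a
    family history of a condition does not mean the patient has it.
    """
    excluded = {"FAMILY_HISTORY", "NEGATION"}
    return [
        e["text"]
        for e in entities
        if e["category"] == "CONDITION"
        and e["score"] >= min_score
        and not excluded & set(e["traits"])
    ]


# Made-up entities, as a text-analysis step might emit them.
entities = [
    {"text": "type 2 diabetes", "category": "CONDITION", "score": 0.97, "traits": []},
    {"text": "heart disease", "category": "CONDITION", "score": 0.93, "traits": ["FAMILY_HISTORY"]},
    {"text": "chest pain", "category": "CONDITION", "score": 0.91, "traits": ["NEGATION"]},
    {"text": "anemia", "category": "CONDITION", "score": 0.42, "traits": []},
    {"text": "metformin", "category": "MEDICATION", "score": 0.99, "traits": []},
]

print(confirmed_conditions(entities))
```

The threshold of 0.8 is arbitrary; where to set it, and whether low-confidence mentions go to a human reviewer, is a design choice for each organization.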
Okay. So all the examples that you've been giving, and certainly what we're using in healthcare today, are narrow AI. And we are reluctant to talk about a time when AI moves from enhancing human capability to actually, I don't want to say taking over, but being able to think for itself. Maybe in healthcare, or maybe broadly, how far away are we from that general artificial intelligence, when the software is able to think for itself and perform cognitive functions?
I'm from the school where, you know, if you look at a Formula One car with millions of sensors collecting hundreds of millions of data points every second, you still need a human driver in the seat. So I believe where AI has excelled today is pattern recognition, language understanding and the ability to make more accurate predictions. Those are concrete examples; with those tools at hand, you'll be able to solve a lot of the complexity we deal with today, which is how to have a complete end-to-end data strategy while modernizing infrastructure in order to really innovate on this data. Now, most of the applications we've seen across the board are about operational efficiencies.
I mean, we see that also if you look at fintech, where documents continue to be the major heavy lift. So this sort of intelligent document processing is a big deal today. If you want to be practical, before we get to the point where AI can think for itself, AI can do a lot of these things, and eventually we're going to get to a level where you feel confident enough in understanding what went into the model in the first place to rely on it for certain specific tasks that you then don't need to worry about, right? For example, as a cardiologist, you use a stethoscope; has a stethoscope ever replaced a cardiologist?
So you lean heavily on using the right tools for the right job to help you accomplish something much, much wider. Oftentimes, a lot of the applications we're seeing right now are about operational efficiency: How can I remove redundancy from the system? How can I look for things I'm not looking at? For example, you do a CAT scan on a patient and guess what?
You find a nodule in their thyroid and maybe another nodule in their adrenal glands, even though you were looking at the heart and lungs. Those are things that are really great for AI, because AI can be unbiased about segmenting what it's looking at, more broadly than what we're looking for. And if you ask any radiologist, they'll tell you their biggest headache is just finding things, like looking for the L4-L5 region: can you zoom in on that?
Oh, that's like a 10- or 15-minute journey you have to go through, when it's no longer 30 cuts on a CT scan but 300 of those. So it becomes a heavy task. It's about applying those tools to remove a lot of those complexities, making the job easier, and indexing all that information so you can make better predictions, and so on down the line. In the fullness of time, we're going to see more and more data indexed on a patient with high accuracy, and an understanding of the full scope of how these models are operating.
You're going to start seeing a lot more confidence in scenarios that do matter. But let's not forget what the patient's preference is. Let's say you have a patient with end-stage cancer, where you try to make a six-month prediction, and all they really care about is whether they make it to their daughter's wedding next week. Those things really need to be baked into a lot of these models, and when it comes to empathy,
AI is still in its early infancy. Look at AI where it is today as someone in the fourth or fifth grade: they're like a sponge, able to take in as much knowledge and information as possible. But you still need hand-holding on specific tasks, perhaps not how they walk or run, but how they can make more cognizant decisions and what the right decision is based on what. A lot of that you can find with iterative analysis and fine-tuning over time.
So that's where I believe we are. And from our standpoint, it's also about democratizing this, so we can have more adoption of AI across every industry possible, by every type of user: not just advanced machine learning practitioners, for whom we do have frameworks and layers of our stack available, but also the rest, the data engineers and software developers and analysts and clinicians, why not, so that those tools can be tailored to them as well.
So, final question here. I looked at your LinkedIn profile before this conversation and saw that you joined a group called Ending Pandemics in January 2019, quite prescient of you, one year ahead. But we're nowhere close to ending pandemics. As a human being, as a health informatics person, as a cardiologist, what has been your main takeaway in these past 18 to 24 months?
It's how little we really know about future pathogens. And also, if you don't go, you don't know, right? To combat diseases, you've got to be in the field; you've got to be as close to the problem as possible. We will all learn down the road whether there were any early signals that could have been caught. And we've known all along that these zoonotic diseases will eventually make it to human beings. Look at the last two decades: we've seen a lot of that, from SARS-1, now SARS-2, to Ebola, Zika and H1N1.
With all of these, you can predict that we're going to see another one. So how you bring early disease detection and response to the field, diagnostics, and investment in public health is key. And investment in public health doesn't happen just at a central or national level. It really has to be as local as possible: for example, in the US, enabling local and state departments to modernize infrastructure, how they can work with healthcare providers, diagnostic initiatives, everything happening on the ground, going all the way to sequencing bats and other animals around the world and making an accurate map of that, which you can share to make better predictions and that sort of thing.
Ending Pandemics is a nonprofit that I participate in as an advisor, along with other colleagues like Larry Brilliant and Peggy Hamburg, the ex-commissioner of the FDA. Larry Brilliant is the guy who really eradicated polio, and he worked at the World Health Organization back in the 60s and 70s. So you can imagine these are very passionate folks, but also people who, like myself, have spent a lot of time in the field combating diseases, and at other times working to enable modernized infrastructure, because right now, if you look at public health infrastructure across the world, it's really lagging behind. And I do believe in our mission at Amazon: democratizing cloud computing, democratizing access to the most advanced data science through machine learning and deep learning, bringing these capabilities in a way that respects privacy and security, enabling you to share that information and to scale, and bringing the right tool to the toolbox by industry, with purpose-built services like Amazon Comprehend Medical or Amazon Transcribe Medical or Amazon HealthLake and many others, even in genomics and other areas. While these seem like microservices, they are composed of other services, almost all enabled by machine learning, to take out the manual processing that makes indexing information so time-consuming.
Because oftentimes you'll find that the answer is really deep in that large haystack. One thing I'll mention from last year: here at Amazon I led a team working with the Allen Institute for Artificial Intelligence. The Allen Institute took it upon themselves to deal with the massive amount of information we were seeing, which was unprecedented, by the way. Scientists around the world were publishing on preprint servers like medRxiv and even on PubMed, and every single day we were finding two hundred, three hundred new scientific findings. How do you make sense of that information? So we put together a website called cord19.aws. CORD-19 stands for the COVID-19 Open Research Dataset, an open dataset on COVID-19, for which the Allen Institute took all those articles and PDFs and converted them to machine-readable form. We then put together a website, all enabled by machine learning, that provides search powered by neural search, basically, where you can ask natural language queries about, say, severity. It understands the context but also finds you additional information that links in: for the severity of COVID, for example, it doesn't just show you the signs and symptoms, but goes down to the mechanism of action, for example that the IL-6 protein might be linked to it.
And imagine finding an answer like that early on, in March or April of last year. That's like finding a needle at the bottom of the Atlantic Ocean, so to speak. Only a few articles referenced that, but the search could also link back to prior Ebola or SARS work, or to clinical trials done in the past decade or two. It's like, you know, you might find this to be relevant.
The constellation was Amazon Kendra, which is the intelligent search that you can enable on top of your own unstructured data; Amazon Comprehend Medical, to index the medical textual content; and Neptune, which is a machine learning-enabled knowledge graph. The constellation of these three provided a natural language query search engine, a neural search engine, and there were several independent reviews that spoke to the accuracy and security of the results as we indexed more and more information. I think today we're close to 130,000 articles and artifacts around those articles, spanning the last 20 years, all of which you're able to use just by asking a question.
Great. Thank you so much for providing such deep insight into your capabilities at AWS and your vision for where artificial intelligence and healthcare can go. Taha, thank you so much for taking the time and speaking to us today.
Of course, no, thank you so much.
Really enjoyed the discussion.
This wraps up a year's worth of podcasts at MedCity News. I don't know about you, but I feel less hopeful today than I did at the end of 2020. Back then, we knew that FDA had approved Pfizer-BioNTech's COVID-19 vaccine on an emergency basis, and there was light at the end of a long, dark tunnel.
To me today, the tunnel has grown only longer and darker. 40% of the country refuses to vaccinate or is hesitant about doing so, and pressure on health care workers is building again. 800,000 Americans are gone. When does individual freedom translate into rank selfishness?
Still, it's the time of year when you count your blessings. So I'm thankful for a loving family, a job that I truly love and colleagues I respect. And I also know that this morass that we are in today, this too shall pass. With that, I wish you merry Christmas, happy holidays, and take care until we meet again in 2022.