What is Needed to Make GenAI Work for Users in Healthcare?

Application-centric AI and Auto-formalization

 


As discussed in the previous blog, administrative processes in healthcare may be a high-value target for artificial intelligence (AI). However, many technical challenges remain in translating what is theoretically possible with AI into practice, particularly from a user-experience perspective.

Clearly, AI has enormous potential. For example, what if all the contextual data needed by any physician could be retrieved with a single request, including information exchanged between the patient and the physician as well as data pertaining to external medical events (e.g., laboratory test results, hospital stays)? What if the physician did not need to navigate the electronic health record (EHR) during the patient visit? What if the information were delivered in the appropriate format, such as text, graphs, or images? All of this is achievable with AI. However, much remains to be done to turn this possibility into reality, beginning with the challenge of taming AI and managing the consequent change in user experience. 

Application-centric AI 

One way to minimize the change experienced by users is to seamlessly integrate AI capabilities into the workflows – process, data, and user experience – that clinicians prefer. For example, the default set of information displayed at the beginning of a patient visit should be accessible from the places in the EHR where physicians already navigate. Furthermore, this information should be relevant and customized to the patient's needs, such that a 12-year-old child's visit for an asthma check-up elicits a different set of information than a 70-year-old woman presenting for a Medicare physical. Similarly, AI should deliver responses in the format best suited to the context. Whether the visit is for a routine exam, a follow-up after a procedure, or a new medical problem, the AI should determine, based on previous learning, what contextual information the physician needs in each situation and deliver the entire picture with one request. Overall, compelling user experiences are not delivered in isolation. They must be delivered in the context of the clinical situation and the process and data relevant to that situation. 
 

In traditional application software, the User Interface, the Application Logic, and the underlying Data Model address the trifecta of concerns: experience, process, and data. This framework is so common that it has a name: model-view-controller. Non-AI applications have well-defined, predictable expected behavior, which serves as the foundation for producing high-quality software. In theory, incorporating AI into this design pattern is conceptually straightforward: at inference time, the Application Logic invokes the AI, determines whether to accept the AI's output, and generates content for the next interaction. In practice, it is not that simple. For example, there is the challenge of deciding how much of the AI's uncertainty to expose to the user, and how to solicit user feedback to improve the AI without disrupting the normal application flow. The fact is that at the point of inference, there is no way of knowing whether the AI's prediction is "correct." Most applications simply accept the prediction if the "confidence" reported by the AI is "good enough." When the cost of errors is high, as in the healthcare setting, the application's process, data, and experience must account for this uncertainty, because the application, not the AI, has the complete context needed to recover from errors. Therefore, we believe that the Application Logic utilizing the AI, and the eventual user experience, matter more to the success of AI than the model itself. We call this Application-centric AI. 
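The shape of this pattern can be sketched in a few lines. This is a minimal illustration, not any actual EHR integration: the `Prediction` type, the `handle_query` function, and the 0.9 threshold are all invented for the example. The point is that the application logic, not the model, owns the accept-or-review decision and retains the context needed to recover from errors.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    text: str
    confidence: float  # model-reported score in [0, 1]


def handle_query(query: str,
                 model: Callable[[str], Prediction],
                 threshold: float = 0.9) -> dict:
    """Application logic wrapping an AI call.

    The application decides whether to display the prediction or route
    it to a review step; it never silently trusts the model.
    """
    pred = model(query)
    if pred.confidence >= threshold:
        # "Good enough" confidence: display the answer, but keep it
        # marked as AI-generated so user feedback can still be collected.
        return {"action": "display", "text": pred.text, "source": "ai"}
    # Below threshold: fall back to a flow where the user verifies or
    # corrects the output before it enters the record.
    return {"action": "review", "text": pred.text, "source": "ai"}
```

In a real deployment the threshold, the fallback flow, and the feedback capture would all be tuned to the clinical context; the sketch only shows where those decisions live.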

Auto-formalization 

The inherent uncertainty in using predictions from AI is frequently misunderstood. It is tempting to believe that AI is infallible because it has reportedly outperformed physicians on standardized tests like the USMLE, or has diagnosed conditions that dozens of clinicians could not. Rest assured, it is not. We will explore the technical reasons for this in a future blog. In fact, the simplistic approach described in the previous section might have been effective if GenAI were infallible. However, GenAI is inherently probabilistic and has unusual, non-intuitive failure modes. For example, adding extra spaces or punctuation to the input can dramatically change GenAI's output. Further, the same input is not guaranteed to produce the same output every time. Constraining GenAI to always produce the same output for a given input is neither necessary nor desirable, however, because requiring determinism would severely curtail GenAI's usability: its probabilistic nature is what enables it to execute such a wide range of potentially valuable tasks. In our experience with early customers, rather than imposing determinism on GenAI as a requirement for robustness, it is sufficient to impose the following requirements in retrieval use cases: 
 
  1. Soundness: The GenAI's output should be verifiable for truthfulness. For example, suppose the output claims that a particular observation, such as a blood pressure measurement, was recorded on a specific day and had a particular value. The output should be correct regardless of how it is expressed in natural language. 

  2. Completeness: The GenAI's output should contain all relevant information. For example, if the query concerns the patient's allergies, then no allergy should be left out. 

  3. Relevance: The GenAI's output should contain only relevant information. For example, if the query concerns colonoscopy, the GenAI should not output blood pressure measurements, even if those values are correct. 
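A useful property of these three requirements is that, once the output is parsed into structured claims, each one reduces to a set comparison against the source records. The sketch below assumes a hypothetical `Claim` type and a flat record set; real EHR data would of course be richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    kind: str    # e.g., "allergy", "blood_pressure"
    value: str


def check_output(output_claims: set, source: set, query_kind: str) -> dict:
    """Check soundness, completeness, and relevance as set relations."""
    relevant_source = {c for c in source if c.kind == query_kind}
    return {
        # Soundness: every claim in the output appears in the record.
        "sound": output_claims <= source,
        # Completeness: no relevant record is missing from the output.
        "complete": relevant_source <= output_claims,
        # Relevance: the output contains only claims of the queried kind.
        "relevant": all(c.kind == query_kind for c in output_claims),
    }
```

The hard part in practice is the parsing step that turns free-text GenAI output into such claims reliably; the checks themselves are cheap once that representation exists.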

Overall, an AI solution in healthcare for information retrieval is more than simply invoking the AI; the output needs to be grounded, and grounding is more than merely referencing the source. That has been attempted, and it does not solve the problem in the healthcare setting. We must transition from probabilistic regimes to formal regimes that satisfy soundness, completeness, and relevance, and we must create compensation mechanisms to identify and recover from GenAI errors. We need Auto-formalization for the healthcare setting, with an appropriate formal representation, to guarantee that GenAI's output is reliable. In future blogs, we will explore the specifics of the application-centric AI and auto-formalization approaches we are developing at ThetaRho. 
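The distinction between citing a source and being grounded in it can be made concrete. In this illustrative sketch (the record IDs and field names are invented), a citation is accepted only if the cited record actually supports the stated date and value, rather than merely existing:

```python
# Hypothetical record store keyed by an invented record ID.
records = {
    "obs-17": {"type": "blood_pressure", "date": "2024-01-05",
               "value": "120/80"},
}


def verify_citation(statement: dict) -> bool:
    """Grounding as verification: a reference counts only if the cited
    record actually entails the claimed date and value."""
    rec = records.get(statement["cites"])
    if rec is None:
        return False  # dangling citation: cited record does not exist
    return (rec["date"] == statement["date"]
            and rec["value"] == statement["value"])
```

A statement that cites a real record but misstates its value fails this check, which a citation-only scheme would happily accept.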

Call to Action 

At ThetaRho, our goal is to provide physicians with the patient information they need to practice medicine with fewer clicks, allowing them to focus on their patients' needs. 

Our journey has just begun. There is a fair amount of AI design, application-logic development, operations hardening, and persistent testing and validation still to be done to make the output of GenAI usable. But we are well on our way, with established beta deployments, and we are eager for new partners to join us. 

We're seeking a few physician groups that use Athenahealth to help finalize the product and provide feedback on the next set of features. To learn more, please visit ThetaRho.ai and sign up. 

Spend less time in the EHR - so you can spend more time taking care of your patients, your family, and yourself. 


