In the last blog, we focused on why it is easy to overestimate the benefit of GenAI solutions. In this blog, we focus on why it is easy to underestimate the effort and cost of AI solutions. To make the argument, we will focus on AI solutions delivered as a SaaS application to end users, not bespoke consulting projects. Very briefly, by SaaS we mean that the provider delivers an integrated solution and is responsible for operating the entire application. The SaaS revolution significantly reduced the need for IT services to implement and operate IT solutions, lowering total cost of ownership and improving ROI. Therefore, any argument we make about the difficulty of achieving ROI in a SaaS model will likely carry over to non-SaaS models.
So, the first question is: what do organizations delivering SaaS applications do to deliver them economically? There is some consensus in the industry that The Twelve-Factor App captures the best practices underlying most SaaS solutions delivered over the web. Each SaaS provider may have its own adaptation or variation, but chances are these twelve factors are a good foundation. The next logical question is: is this approach enough for AI solutions delivered as SaaS?
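To make the twelve factors concrete, here is a minimal sketch of factor III ("Config"): deployment settings live in the environment rather than in code, so the same build runs unchanged across customers and stages. The variable names and defaults below are illustrative, not taken from any particular product.

```python
import os

def load_config():
    """Read deployment settings from the environment, per factor III
    ("Config") of the Twelve-Factor App.

    The variable names and fallback defaults are hypothetical examples.
    """
    return {
        "database_url": os.environ.get("DATABASE_URL", "sqlite:///local.db"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
        "workers": int(os.environ.get("WEB_CONCURRENCY", "2")),
    }
```

Because configuration is externalized, operators change behavior per deployment by setting environment variables, with no rebuild or code change.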
AI solutions, however, differ from standard software because they are so dependent on data. The table below summarizes some of the key differences between AI-SaaS applications and vanilla SaaS applications to illustrate why twelve-factor by itself is not enough for AI-SaaS applications.
With GenAI, foundation models are trained on essentially all the data available on the web. However, once we need to fine-tune or prompt/instruct these models with “local/private” data, all of these issues become more germane.
Another layer of complexity arises because AI-SaaS applications have an additional stage of development compared to traditional SaaS: an “experimentation” stage that focuses on identifying which “models” to embed within the application, as summarized in the figure below.
One of the key questions the experimentation stage answers is whether the customer has enough quality data for the business problem at hand, and what sort of effort is needed to create that data. This includes cleaning and labeling data, and using publicly available or generated data for the relevant problem. Ensuring that one has quality, representative data with provenance is a critical starting point for AI. Ensuring data quality is an attempt to determine whether the data in the systems of record reflect “reality”. It turns out these problems do not go away with GenAI, and most organizations do not know how much it will cost them to reach a level of quality at which AI solutions create value.
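The kind of assessment the experimentation stage performs can be sketched in a few lines: profile the records in a system of record for missing required fields and duplicates before committing to model work. This is a toy sketch; the record shape and field names are hypothetical, and real data-quality audits go much further (provenance, representativeness, label accuracy).

```python
def profile_records(records, required_fields):
    """Toy data-quality profile over a list of dict records.

    Counts rows missing any required field and exact-duplicate rows.
    Field names used by callers are illustrative examples only.
    """
    missing = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the row
        duplicates += key in seen
        seen.add(key)
    return {"rows": len(records), "missing_required": missing, "duplicates": duplicates}

# Hypothetical sample: one row missing a diagnosis code, one exact duplicate.
sample = [{"id": 1, "dx": "I10"}, {"id": 2, "dx": ""}, {"id": 1, "dx": "I10"}]
report = profile_records(sample, ["id", "dx"])
```

Even a crude profile like this makes the cost conversation concrete: the gap between the report and “good enough for the model” is the data-preparation effort the customer must fund.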
Even if one has a good enough handle on data, there are many complexities in standing up the end-to-end solution. Any significant AI application has many models working in concert toward business outcomes, and all of them must be managed as part of the application. One consequence is that AI solutions are “brittle”: changing anything changes everything. For example, in GenAI solutions, small changes in prompts can have a material impact on the output, let alone changing foundation models, even if, in aggregate, foundation models are converging on the same representation of the world. Decisions across the SDLC can therefore unknowingly introduce jitter, resulting in material differences between experienced quality and predicted quality. In addition, it is very easy to introduce significant technical debt, which is much harder to pay back in AI solutions.

Therefore, it is important to stand up the eventual end-to-end solution quickly and iterate; quality only improves with fast iterations. But organizations may not have the patience or all the relevant expertise, because the improvements are not likely to be predictable. This means the entire SDLC for AI must be repeated often for each application, potentially across experimentation, development, and operations. This is significantly different from the typical SDLC and requires a dramatically different mindset. GenAI software methods have not matured to the point where we can reliably predict, in advance and without end-to-end iteration, how long it will take for an application to reach a level of quality acceptable to end users. So a SaaS model is necessary, but not yet sufficient, for AI-enabled solutions to have a chance of delivering ROI.
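The brittleness described above is why iterating teams typically keep a regression harness: a fixed set of prompt variants with known-good answers, re-run after every prompt, model, or pipeline change. The sketch below uses a deterministic stub in place of a real model call (everything here is hypothetical); even so, it shows how a harmless-looking rewording can flip an answer.

```python
def stub_model(prompt: str) -> str:
    """Stand-in for a foundation-model call; deterministic for illustration.

    A real harness would call the deployed model/pipeline instead.
    """
    return "paris" if "capital of france" in prompt.lower() else "unknown"

def regression_pass_rate(prompt_variants, expected):
    """Fraction of prompt variants that still produce the expected answer.

    Re-run after any prompt, model, or pipeline change to detect jitter.
    """
    hits = sum(1 for p in prompt_variants if stub_model(p) == expected)
    return hits / len(prompt_variants)

variants = [
    "What is the capital of France?",
    "Capital of France, please.",
    "Name France's capital city.",   # reworded: this one slips past the stub
]
rate = regression_pass_rate(variants, "paris")
```

Here the third phrasing fails even though a human reads all three as the same question, which is exactly the kind of jitter that makes experienced quality diverge from predicted quality.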
Finally, compute workloads for AI are much more diverse and difficult to characterize than traditional workloads, and they scale with data and other factors, so getting predictable runtimes is harder. This unpredictability makes it difficult to know how much infrastructure to provision, and simplistic approaches such as provisioning for peaks can be expensive. This is especially true when GenAI models sit behind what appears to be a simple chat application. Each call into the AI sub-system may involve calling multiple models in compound or agentic systems, and both the execution times and the correctness of answers can be difficult to evaluate. If one must add a call to check the validity of every call to an agentic sub-system, the cost and user-experience impacts are not easy to predict.
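The cost impact of pairing every agentic call with a validity check can be sketched with simple arithmetic. The per-call price and call count below are made-up placeholders, not measurements; the point is the multiplier, not the numbers.

```python
def per_request_cost(model_calls: int, cost_per_call: float, validate: bool) -> float:
    """Rough cost of one user request through a compound/agentic pipeline.

    If every model call is paired with a validation call, the call count
    (and hence cost and latency) roughly doubles. Inputs are hypothetical.
    """
    calls = model_calls * (2 if validate else 1)
    return calls * cost_per_call

# A "simple chat" request that fans out to four model calls behind the scenes.
baseline = per_request_cost(model_calls=4, cost_per_call=0.002, validate=False)
checked = per_request_cost(model_calls=4, cost_per_call=0.002, validate=True)
```

And this doubling is the optimistic case: if a failed check triggers a retry or an escalation to a larger model, cost and latency grow further and become input-dependent, which is precisely why they are hard to predict.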
Net-net, getting to a minimum viable AI solution, where the value, feasibility, and effort for a solution can be understood and justified, is difficult because it is unpredictable. As we have seen in this blog, the data, application, and infrastructure elements all make it harder to estimate the cost and the time to benefit. From a SaaS company’s perspective, one needs access to customers and their data even before a candidate product can be built. Once a product is built, getting it to a level of quality where the mistakes made by the AI system are acceptable while still creating value is the key to reaching MVP. In the next blog, we will outline what we have done at ThetaRho to begin to address some of these issues.
At ThetaRho, our goal is to provide physicians with the patient information they need to practice medicine with fewer clicks, allowing them to focus on their patients' needs.
Our journey has just begun. There is a fair amount of AI design, application logic development, operations hardening, and persistent testing and validation to be done to make the output of GenAI usable. But we are well on our way with established beta deployments.
We are seeking a few physician groups that use Athenahealth to help finalize the product. To learn more, please visit ThetaRho.ai and sign up.
Spend less time in the EHR so you can spend more time taking care of your patients, your family, and yourself.