⚡ Updated 12.01.23
While LLMs are easy to use to great effect, they’re harder to use for great outcomes.
Rather than a panacea, they opened up a class of data problems that previously seemed virtually intractable. Rather than thinking of them as language-understanding and language-generating models, think of them as tools that are uniquely capable of turning structured data into unstructured data, and vice versa. That makes them exceptionally powerful in healthcare, because healthcare may be the industry vertical with the highest intersection of natural language and rules-based systems. Extracting contract configurations for our claims system (← structured) from contract language (← unstructured), or explaining (← unstructured language) claims payment decisions (← structured), or translating a utilization management decision (← structured) into a patient-friendly letter (← unstructured) - those are all healthcare use cases that sit right at those intersections. Seek out those intersections, and you’ll have found a language model application in healthcare.
Learn more here: Enforced planning and reasoning within our LLM Claim Assistant
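To make the structured ↔ unstructured pattern concrete, here is a minimal sketch of the extraction direction: pulling a benefit configuration out of contract language. It assumes the OpenAI Python client; the model name, schema fields, sample text, and prompt are purely illustrative, not our actual claims configuration.

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative target schema -- not an actual claims-system configuration.
SCHEMA_HINT = """
Return JSON with exactly these keys:
  "service_category": string, e.g. "outpatient imaging"
  "member_cost_share": string, e.g. "20% coinsurance after deductible"
  "prior_auth_required": boolean
"""

def extract_contract_config(contract_text: str) -> dict:
    """Turn a snippet of unstructured contract language into a structured record."""
    response = client.chat.completions.create(
        model="gpt-4",   # placeholder model name
        temperature=0,   # keep extraction deterministic
        messages=[
            {"role": "system",
             "content": "You extract benefit configuration from contract language. "
                        + SCHEMA_HINT},
            {"role": "user", "content": contract_text},
        ],
    )
    # A production version would validate and retry; here we assume clean JSON output.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    sample = ("Outpatient advanced imaging (MRI, CT, PET) is covered at 20% "
              "coinsurance after the deductible and requires prior authorization.")
    print(extract_contract_config(sample))
```

The same pattern runs in reverse for the explanation and letter-writing use cases: feed the structured record in and ask for prose out.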
Hallucinations are real, and a sensitive issue in an industry like healthcare.
LLMs are capable of spectacular feats, and they are also capable of spectacularly random flame-outs. We deal with hallucinations as follows. First, we focus on use cases with a human in the loop (like our **test result documentation automation** for Oscar Medical Group). Second, hallucinations become more controllable and identifiable in use cases that go from unstructured to structured data, such as our provider data fixer. Third, knowledge injected into the prompt is retrieved much more reliably than knowledge encoded in the model’s weights, especially when the knowledge is esoteric or sparse within the corpus. As context windows expand, and with the help of embeddings and vector databases, it becomes easier to implement LLM use cases as prompt-parsing problems: put the relevant knowledge into the prompt and let the model read it there, rather than relying on recall from its weights.
Learn more here: A Simple Example for Limits on LLM Prompting Complexity
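As a minimal sketch of that third point, the snippet below embeds a handful of policy snippets, retrieves the one most similar to a question, and injects it into the prompt. It assumes the OpenAI Python client and NumPy; the documents, model names, and prompt wording are illustrative stand-ins for a real embeddings-plus-vector-database setup.

```python
import numpy as np
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()

# Illustrative knowledge snippets -- in practice these would live in a vector database.
DOCUMENTS = [
    "Prior authorization for outpatient MRI is waived for members in the imaging pilot.",
    "Claims for emergency services are paid at the in-network benefit level.",
    "Telehealth visits with in-network providers have a $0 copay.",
]

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def answer_with_context(question: str) -> str:
    """Retrieve the most relevant snippet, then inject it into the prompt."""
    doc_vectors = [embed(d) for d in DOCUMENTS]
    q = embed(question)
    # Cosine similarity between the question and each document.
    sims = [float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d))) for d in doc_vectors]
    best = DOCUMENTS[int(np.argmax(sims))]

    resp = client.chat.completions.create(
        model="gpt-4",   # placeholder model name
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the provided policy excerpt. "
                        "If the excerpt does not answer the question, say so."},
            {"role": "user",
             "content": f"Policy excerpt:\n{best}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(answer_with_context("Does an outpatient MRI need prior authorization?"))
```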
Healthcare regulatory rules are useful language model benchmarks.
Language models need benchmarks. People like to use benchmarks such as MMLU (a collection of language understanding tasks). However, healthcare is highly regulated, so many decisions are already made using very quantifiable and auditable rubrics. Use those rubrics to create your own language model benchmarks. One example is the NCQA guidelines for care management programs: they are already a formalized rule set that many organizations benchmark themselves against, so you might as well turn them into a language model benchmark.
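A minimal sketch of what such a rubric-derived benchmark could look like is below. The criteria and cases are made up for illustration and are not actual NCQA guideline text; it assumes the OpenAI Python client, and a real benchmark would cover far more criteria and documented cases.

```python
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()

# Made-up rubric items in the spirit of care-management audit criteria --
# NOT actual NCQA guideline text.
BENCHMARK = [
    {
        "criterion": "The care plan documents at least one member-stated goal.",
        "case": "Care plan: member wants to walk 20 minutes daily; nurse scheduled follow-up.",
        "expected": "yes",
    },
    {
        "criterion": "The care plan documents at least one member-stated goal.",
        "case": "Care plan: medication list reviewed; no goals discussed.",
        "expected": "no",
    },
]

def judge(criterion: str, case: str) -> str:
    """Ask the model whether the documentation satisfies the rubric criterion."""
    resp = client.chat.completions.create(
        model="gpt-4",   # placeholder model name
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You are auditing care management documentation. "
                        "Answer strictly 'yes' or 'no'."},
            {"role": "user",
             "content": f"Criterion: {criterion}\n\nDocumentation: {case}\n\nMet?"},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

if __name__ == "__main__":
    correct = sum(judge(item["criterion"], item["case"]).startswith(item["expected"])
                  for item in BENCHMARK)
    print(f"Rubric benchmark accuracy: {correct}/{len(BENCHMARK)}")
```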
Healthcare data regulation creates some unique issues.
Because this is such a fast-moving paradigm shift, many model providers are not yet prepared to accept the contractual and legal obligations of a HIPAA business associate, and the availability of HIPAA-compliant models is inconsistent. Google has responded most quickly to meet rising demand for LLMs-as-a-service. But be aware that “alpha,” “beta,” or so-called “preview” features released by any cloud provider are not automatically included under the existing HIPAA compliance provisions you’ve negotiated with them. And of course, even if you have a signed BAA, you must still comply with HIPAA’s minimum necessary rule and obtain appropriate member/patient consents, among other privacy-related obligations.
On the question of open-source models (like Meta’s LLaMA) vs. proprietary models (like OpenAI’s GPT-4):