AI medical coding, including systems powered by Large Language Models (LLMs) such as GPT, can greatly improve the efficiency and accuracy of coding workflows in healthcare, dermatology CPT coding among them. These systems are not infallible, however: they make mistakes driven by factors like data input quality, interpretation errors, and LLM hallucinations. Let’s explore some potential mistakes, each with an example and a brief illustrative sketch:

1. Data Input Quality and Contextual Misinterpretation

Example Scenario: Suppose a dermatologist documented, “Reviewed the patient’s history of a 2 cm benign lesion excised last year. Today, a 0.5 cm benign lesion was excised.”

Potential AI Mistake: An AI system might mistakenly code for a 2 cm lesion excision in the current session if it misinterprets the historical note as part of the current procedure.

Rationale: AI depends heavily on the quality and clarity of the input data. If historical data are not clearly distinguished from current data, the AI might include them in its coding decision.
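
Illustrative Sketch: To make this concrete, here is a minimal, hypothetical sketch of the failure; the note text, the regex, and the history-keyword heuristic are purely illustrative and not drawn from any real coding product.

```python
import re

NOTE = ("Reviewed the patient's history of a 2 cm benign lesion excised "
        "last year. Today, a 0.5 cm benign lesion was excised.")

SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*cm")

# Naive approach: extract every size mentioned anywhere in the note.
naive_sizes = [float(m.group(1)) for m in SIZE_PATTERN.finditer(NOTE)]
print(naive_sizes)  # [2.0, 0.5] -- the historical 2 cm lesion leaks in

# Slightly safer: drop sentences that read as history before extracting.
HISTORY_MARKERS = ("history", "last year", "previously", "prior")

def current_sizes(note: str) -> list[float]:
    sizes = []
    for sentence in re.split(r"\.\s+", note):  # rough sentence split
        if any(marker in sentence.lower() for marker in HISTORY_MARKERS):
            continue  # skip sentences describing past procedures
        sizes.extend(float(m.group(1)) for m in SIZE_PATTERN.finditer(sentence))
    return sizes

print(current_sizes(NOTE))  # [0.5] -- only today's excision remains
```

Keyword heuristics like this are brittle; production systems generally need trained section and temporality classifiers, but the requirement is the same: separate past events from today’s procedure before any code is assigned.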

2. Misinterpretation of Complex Medical Terminology or Abbreviations

Example Scenario: A note reads, “Excision of BCC on the nose, size 1.5 cm.”

Potential AI Mistake: If the AI system is not well-trained to recognize “BCC” as basal cell carcinoma (a type of skin cancer), it might code this incorrectly as a benign lesion removal.

Rationale: An AI system’s grasp of medical abbreviations and terminology depends on its training data; incomplete or inadequate coverage can lead to incorrect code suggestions.
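
Illustrative Sketch: A minimal sketch of the underlying lookup problem, assuming a simple abbreviation-expansion step; the table and its fallback behavior are hypothetical, while CPT 11442 and 11642 (benign versus malignant excision, nose, 1.1–2.0 cm) are included to show what the misclassification costs.

```python
# Hypothetical abbreviation table. A real system needs far broader coverage
# plus surrounding context, since abbreviations collide across specialties.
ABBREVIATIONS = {
    "bcc": "basal cell carcinoma",
    "scc": "squamous cell carcinoma",
    "ak": "actinic keratosis",
}

MALIGNANT_TERMS = {"basal cell carcinoma", "squamous cell carcinoma", "melanoma"}

# Excised diameter 1.1-2.0 cm on the nose: malignant vs benign excision codes.
CODE_BY_CLASS = {"malignant": "11642", "benign": "11442"}

def classify_lesion(term: str) -> str:
    expanded = ABBREVIATIONS.get(term.lower(), term.lower())
    return "malignant" if expanded in MALIGNANT_TERMS else "benign"

print(CODE_BY_CLASS[classify_lesion("BCC")])  # 11642 -- correctly malignant

# Simulate an under-trained system whose table lacks the abbreviation:
del ABBREVIATIONS["bcc"]
print(CODE_BY_CLASS[classify_lesion("BCC")])  # 11442 -- silently downcoded to benign
```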

3. LLM Hallucinations in Complex Cases

Example Scenario: A report details, “Complex repair of multiple lacerations on different body parts with varying lengths.”

Potential AI Mistake: The AI might “hallucinate” details not present in the report, such as specific lengths or locations of lacerations, and suggest incorrect codes based on these inaccuracies.

Rationale: LLMs, when faced with incomplete information, can sometimes generate details to fill gaps, leading to inaccuracies. This is particularly problematic in complex cases where specific details are crucial for accurate coding.
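
Illustrative Sketch: One common safeguard is a grounding check: every specific value the model emits must be traceable to the source note. Below is a minimal sketch assuming the model returns structured repair records; the payload is a stand-in for actual LLM output.

```python
# The note never states lengths or sites, so every value the "model"
# emitted below is hallucinated.
note = ("Complex repair of multiple lacerations on different body parts "
        "with varying lengths.")

llm_output = [
    {"site": "forearm", "length_cm": 3.5},
    {"site": "scalp", "length_cm": 2.0},
]

def ungrounded_values(note_text: str, repairs: list[dict]) -> list[str]:
    """Flag any emitted value that does not literally appear in the note."""
    lowered = note_text.lower()
    flags = []
    for repair in repairs:
        for field, value in repair.items():
            if str(value).lower() not in lowered:
                flags.append(f"{field}={value!r} not found in source note")
    return flags

for flag in ungrounded_values(note, llm_output):
    print("REVIEW:", flag)
# Every field fails the check, so the suggestion is routed to a human coder
# instead of driving repair-code selection (which depends on exact lengths).
```

Literal substring matching is deliberately strict and will false-positive on paraphrases, but for billing-critical fields such as lengths and anatomic sites, strictness that forces human review is usually the safer trade-off.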

4. Overreliance on Common Patterns

Example Scenario: A dermatologist documents, “Dermabrasion for rhinophyma.”

Potential AI Mistake: The AI might suggest a general facial dermabrasion code such as 15780 (total face, most often performed for acne scarring) instead of the rhinophyma-specific code 30120 (excision or surgical planing of the skin of the nose for rhinophyma), especially if acne scarring dominates the dermabrasion examples in its training data.

Rationale: AI systems can sometimes default to more commonly seen patterns in their training data, leading to errors in less common scenarios.
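
Illustrative Sketch: A toy demonstration of how a frequency prior can flip a ranking. The scores, the weighting, and the premise that the system blends a text-match signal with training frequency are all assumptions made for the sake of the sketch.

```python
# Illustrative numbers only: text_match is how well each code fits the note,
# train_freq is how often the code appeared in (hypothetical) training data.
candidates = {
    "30120": {"text_match": 0.9, "train_freq": 0.05},  # rhinophyma planing (rare)
    "15780": {"text_match": 0.6, "train_freq": 0.95},  # total-face dermabrasion (common)
}

PRIOR_WEIGHT = 0.5  # how strongly the system leans on training frequency

def score(stats: dict) -> float:
    return (1 - PRIOR_WEIGHT) * stats["text_match"] + PRIOR_WEIGHT * stats["train_freq"]

best = max(candidates, key=lambda code: score(candidates[code]))
print(best)  # 15780 -- the common-but-wrong code outranks the better textual fit
```

Typical mitigations down-weight the prior or flag rare diagnoses for human review whenever the text-match and frequency signals disagree sharply.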

5. Misunderstanding Modifiers

Example Scenario: The note states, “Excision of a malignant lesion on the arm followed by a flap reconstruction at the same site.”

Potential AI Mistake: The AI might fail to apply the appropriate modifier, for example modifier 51 (multiple procedures) or modifier 59 (distinct procedural service), to separate the excision from the reconstruction, producing an incorrect claim.

Rationale: Understanding when and how to apply modifiers can be challenging for AI, especially in complex procedures involving multiple steps or stages.
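
Illustrative Sketch: Below is a simplified, hypothetical modifier check. The claim structure and the rule “secondary lines need modifier 51 or 59” are stand-ins for real NCCI and payer edits, which are considerably more involved (an adjacent tissue transfer, for instance, bundles the excision entirely and should not be billed separately at all).

```python
# Illustrative same-session claim: malignant excision on the arm (11606)
# followed by an upper-extremity flap (15736), excision billed second.
claim = [
    {"cpt": "15736", "modifiers": []},  # primary (highest-valued) procedure
    {"cpt": "11606", "modifiers": []},  # secondary: needs modifier 51 or 59 here
]

def missing_modifiers(lines: list[dict]) -> list[str]:
    """Flag secondary lines billed without a multiple-procedure modifier."""
    issues = []
    for line in lines[1:]:  # every line after the primary procedure
        if not {"51", "59"} & set(line["modifiers"]):
            issues.append(f"CPT {line['cpt']}: missing modifier 51 or 59")
    return issues

print(missing_modifiers(claim))
# ['CPT 11606: missing modifier 51 or 59'] -- exactly the omission an AI coder
# makes when it treats the excision and the flap as a single bundled service
```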

In conclusion, while AI in medical coding offers significant advantages, its reliance on data quality and training coverage, together with inherent limitations like LLM hallucinations, can lead to errors. These challenges underscore the importance of human oversight in reviewing and verifying AI-generated codes, especially in complex or ambiguous cases.
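
In practice, that oversight is often operationalized as a confidence gate. The sketch below is hypothetical; the threshold, field names, and routing labels are illustrative, not taken from any particular product.

```python
REVIEW_THRESHOLD = 0.90  # illustrative; real systems tune this per code family

def route(suggestion: dict) -> str:
    """Auto-submit only suggestions that are confident and carry no flags."""
    if suggestion["confidence"] < REVIEW_THRESHOLD or suggestion["flags"]:
        return "human_review"
    return "auto_submit"

print(route({"cpt": "11442", "confidence": 0.97, "flags": []}))  # auto_submit
print(route({"cpt": "30120", "confidence": 0.71, "flags": []}))  # human_review
print(route({"cpt": "12034", "confidence": 0.95,
             "flags": ["hallucinated length"]}))                 # human_review
```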