MedyCode Assistant


Revolutionize Your Medical Coding with AI-Driven Precision.


Our cutting-edge application transforms the traditional medical billing landscape by automatically determining ICD codes with unparalleled efficiency.


Powered by advanced Large Language Models and Retrieval-Augmented Generation (RAG), our solution processes clinical notes and assigns ICD-10 codes in real time, reducing manual errors and enhancing compliance.


Ideal for hospitals and clinics, this tool is designed to streamline your operations, cut down on administrative burdens, and ensure every coding decision supports optimal patient care while maximizing reimbursements.


Join the forefront of healthcare innovation and take your medical coding process to the next level with our reliable, AI-enhanced solution.

Features

Our application leverages advanced machine learning systems to automatically assign accurate ICD codes to clinical notes, revolutionizing the medical billing process. By utilizing the comprehensive MIMIC-IV dataset from PhysioNet, the application ensures high accuracy and reliability, significantly reducing manual coding errors and administrative workload. This innovative feature not only streamlines the billing process but also ensures compliance with healthcare regulations, leading to improved reimbursement rates and enhanced operational efficiency for healthcare providers.

We employ state-of-the-art and open-source Large Language Models (LLMs) to analyze clinical notes. We use Retrieval-Augmented Generation (RAG) to enhance the performance of our LLMs with a vectorized medical knowledge graph, ensuring precise and contextually accurate ICD-10 code assignments.

Our RAG architecture achieves up to 4 times the accuracy in ICD-10 code assignments compared to non-RAG methods, illustrating the power of our architecture over a standard LLM-based approach.

The MIMIC-IV dataset is integral to our model's training process. We preprocess the data through cleaning, normalization, and feature extraction to ensure high-quality input for our algorithms. This comprehensive dataset provides a robust foundation for accurate model predictions.
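To make the cleaning and normalization step more concrete, here is a minimal sketch of how a single note might be normalized. It assumes that de-identified fields in MIMIC-IV notes appear as runs of underscores (such as "___") and uses a tiny hypothetical abbreviation table. The names normalize_note and ABBREVIATIONS are illustrative and are not part of our codebase.

```python
import re

# Hypothetical abbreviation table; a real system would use a curated clinical lexicon.
ABBREVIATIONS = {
    "htn": "hypertension",
    "sob": "shortness of breath",
    "cp": "chest pain",
}

# Assumption: de-identified fields appear as runs of three or more underscores.
DEID_PATTERN = re.compile(r"_{3,}")
# Words and numbers, keeping internal periods/hyphens (e.g. "38.5", "x-ray").
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[.\-][a-z0-9]+)*")


def normalize_note(text: str) -> str:
    """Lowercase, drop de-identification placeholders, expand abbreviations,
    and collapse whitespace and stray punctuation into single spaces."""
    text = DEID_PATTERN.sub(" ", text.lower())
    tokens = TOKEN_PATTERN.findall(text)
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(expanded)
```

For instance, "Pt ___ presents with SOB and CP." becomes "pt presents with shortness of breath and chest pain". Dropping punctuation keeps the sketch short; a production cleaner would likely preserve sentence boundaries for downstream NER.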

Manual coding errors are common and costly. Our application addresses these issues by standardizing the coding process and reducing human error through the introduction of LLMs enhanced with contextually relevant medical information.

We ensure compliance with healthcare regulations, including HIPAA, through rigorous auditing and validation processes. Our application maintains high standards for data security and patient confidentiality, meeting all regulatory requirements.

Our user-friendly interface allows healthcare providers to input clinical notes and receive ICD code determinations in real time. The application integrates seamlessly with existing electronic health record (EHR) systems, providing a smooth and efficient workflow.

By automating the coding process, our application frees up valuable time for healthcare providers, allowing them to focus on patient care. Testimonials from users highlight significant improvements in operational efficiency and illustrate overall satisfaction with the application.

Data and Infrastructure

Our application is built on a robust data and infrastructure foundation, utilizing the MIMIC-IV dataset from PhysioNet, a leading resource for critical care research. The dataset includes detailed patient information, clinical notes, diagnoses, and procedures from ICU admissions, providing a rich source of data for training and evaluation. Our team has secured the necessary credentials to access this comprehensive dataset, ensuring compliance with all data usage policies. The infrastructure supporting our application leverages scalable cloud services and advanced machine learning frameworks, ensuring high performance, reliability, and security. This robust setup enables efficient processing of large volumes of data, facilitating accurate and real-time ICD code assignment.

Technical Architecture

Our application employs a sophisticated technical architecture to process clinical notes and generate accurate ICD-10 codes. The process begins by applying medical Named Entity Recognition (NER) to the MIMIC-IV data using tools provided by John Snow Labs. The NER pipeline extracts key medical entities such as symptoms, diagnoses, medications, and procedures from the data.
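The NER step itself relies on John Snow Labs models, which are not reproduced here. To show the shape of output that the rest of the pipeline consumes (entity text, label, and character offsets), the sketch below uses a toy dictionary-based (gazetteer) matcher as a stand-in for a trained NER model. The GAZETTEER entries, the labels, and the Entity type are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the note
    label: str  # entity type, e.g. SYMPTOM or DIAGNOSIS
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical term list standing in for a trained clinical NER model.
GAZETTEER = {
    "chest pain": "SYMPTOM",
    "shortness of breath": "SYMPTOM",
    "hypertension": "DIAGNOSIS",
    "metoprolol": "MEDICATION",
    "echocardiogram": "PROCEDURE",
}


def extract_entities(text: str) -> list[Entity]:
    """Return non-overlapping gazetteer matches in document order.

    Assumes text.lower() preserves length, which holds for ASCII input.
    """
    lowered = text.lower()
    taken = [False] * len(text)
    found = []
    # Try longer terms first so a long phrase wins over any shorter overlap.
    for term in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            if any(taken[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                taken[i] = True
            found.append(Entity(text[m.start():m.end()], GAZETTEER[term], m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)
```

A real clinical NER model generalizes to unseen phrasings and handles negation and context, which a fixed term list cannot; the gazetteer only illustrates the interface.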

Next, the structured data from the medical NER is integrated into a comprehensive knowledge graph. This graph organizes and represents the relationships and connections between different medical entities, thereby enhancing and interconnecting the medical knowledge available to the system. The knowledge graph is then vectorized for efficient querying.
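A minimal sketch of graph construction and vectorization follows, assuming the NER output has already been turned into (subject, relation, object) triples and that diagnosis nodes carry an ICD-10 code. The triples, relation names, and the KnowledgeGraph class are illustrative. Hashed character-trigram vectors stand in for whatever embedding model a real deployment would use.

```python
import math
import zlib
from collections import defaultdict

DIM = 256  # size of the hashed vector space (an arbitrary choice for this sketch)


def embed(text: str) -> list[float]:
    """Hashed character-trigram vector, L2-normalized (stand-in for a real embedding model)."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class KnowledgeGraph:
    """Adjacency-set graph of medical entities with optional ICD-10 codes."""

    def __init__(self) -> None:
        self.edges = defaultdict(set)  # node -> {(relation, neighbor), ...}
        self.codes = {}                # node -> ICD-10 code, for diagnosis nodes
        self.vectors = {}              # node -> embedding used for similarity search

    def _ensure_node(self, node: str) -> None:
        if node not in self.vectors:
            self.vectors[node] = embed(node)

    def add_triple(self, subj: str, rel: str, obj: str) -> None:
        # Store both directions so queries can walk symptom -> diagnosis and back.
        self.edges[subj].add((rel, obj))
        self.edges[obj].add((f"inverse_{rel}", subj))
        self._ensure_node(subj)
        self._ensure_node(obj)

    def set_code(self, node: str, code: str) -> None:
        self.codes[node] = code
        self._ensure_node(node)


kg = KnowledgeGraph()
kg.add_triple("chest pain", "symptom_of", "angina pectoris")
kg.add_triple("hypertension", "risk_factor_for", "angina pectoris")
kg.set_code("angina pectoris", "I20.9")
kg.set_code("hypertension", "I10")
```

Storing each node's vector alongside its edges means a free-text entity can first be matched to its nearest node by vector similarity and then expanded through the graph's relationships.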

A local Large Language Model (LLM) then processes the original clinical notes or text data to extract medical entities. These entities are used to query the knowledge graph for contextually relevant medical information, including the associated medical codes.
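The query step can be sketched as nearest-node lookup followed by a one-hop expansion. This sketch assumes a tiny hand-written graph and uses cosine similarity over bag-of-words vectors; the node names, the min_score threshold, and the retrieve function are hypothetical, and a real system would query its vector index instead.

```python
import math
from collections import Counter

# Toy graph (illustrative data): node -> one-hop neighbors, plus codes on diagnosis nodes.
NEIGHBORS = {
    "chest pain": ["angina pectoris", "myocardial infarction"],
    "hypertension": ["angina pectoris"],
    "angina pectoris": ["chest pain", "hypertension"],
    "myocardial infarction": ["chest pain"],
}
CODES = {"hypertension": "I10", "angina pectoris": "I20.9", "myocardial infarction": "I21.9"}


def bow(text: str) -> Counter:
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


NODE_VECTORS = {node: bow(node) for node in NEIGHBORS}


def retrieve(entities: list[str], min_score: float = 0.5) -> dict[str, str]:
    """Match each entity to its most similar node, then collect the codes
    attached to that node and its one-hop neighbors."""
    hits = {}
    for ent in entities:
        q = bow(ent)
        node, score = max(((n, cosine(q, v)) for n, v in NODE_VECTORS.items()),
                          key=lambda pair: pair[1])
        if score < min_score:
            continue  # nothing in the graph is close enough to this entity
        for candidate in [node, *NEIGHBORS[node]]:
            if candidate in CODES:
                hits[candidate] = CODES[candidate]
    return hits
```

Here retrieve(["severe chest pain"]) still lands on the "chest pain" node (cosine about 0.82) and returns the codes for angina pectoris and myocardial infarction as candidate context.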

The clinical entities from the original medical note, along with additional context derived from the knowledge graph, are then used to generate a robust and contextually relevant prompt for our second local LLM. The prompt includes additional constraints and directions on which information should be used in making a prediction. The LLM, augmented with the enriched medical knowledge from the previous steps, then predicts the best possible ICD-10 code based on the provided information. These codes represent standardized medical diagnoses, which are crucial for billing, reporting, and maintaining medical records.
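A sketch of how such a constrained prompt might be assembled from the extracted entities and the retrieved context. The wording, section labels, and the build_prompt name are hypothetical and are not our production prompt.

```python
def build_prompt(entities: list[str], context: dict[str, str]) -> str:
    """Assemble a constrained prompt for the code-prediction LLM.

    entities: entity strings produced by the first LLM.
    context: knowledge-graph term -> candidate ICD-10 code.
    """
    entity_lines = "\n".join(f"- {e}" for e in entities) or "- (none extracted)"
    context_lines = "\n".join(
        f"- {term}: {code}" for term, code in sorted(context.items())
    ) or "- (none retrieved)"
    return (
        "You are a medical coding assistant.\n"
        "Clinical entities from the note:\n"
        f"{entity_lines}\n"
        "Related terms and codes from the knowledge graph:\n"
        f"{context_lines}\n"
        "Instructions: base your answer only on the information above. "
        "Reply with a single ICD-10 code and nothing else."
    )
```

Sorting the context keeps prompts deterministic for the same inputs, which makes predictions easier to reproduce and audit. The final instruction constrains the reply to a single code so it can be parsed mechanically.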

Modeling

Our application employs cutting-edge LLMs with Retrieval-Augmented Generation to achieve high accuracy in ICD code assignment from clinical notes. The modeling process begins with extensive data preprocessing, including text cleaning, tokenization, and vectorization, to transform raw clinical notes into structured data suitable for analysis. Using LLMs, we capture the nuanced language and complex relationships within clinical notes. The RAG approach enhances this capability by incorporating relevant external knowledge during the prediction process, further improving the accuracy of ICD code assignment. These models are trained and validated using the comprehensive MIMIC-IV dataset, ensuring they generalize well to real-world scenarios. Continuous evaluation and fine-tuning are performed to enhance model performance, leveraging metrics like accuracy, precision, recall, and F1-score to measure success. This advanced modeling approach ensures our application delivers reliable and accurate ICD code predictions, streamlining the medical billing process.
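Because a note can carry several ICD-10 codes, precision, recall, and F1 are naturally computed over sets of codes. The sketch below assumes micro-averaging (pooling counts across all notes) and an exact-set-match definition of accuracy; the averaging scheme actually used in our evaluation is not specified above, so treat these definitions as assumptions.

```python
def micro_prf(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-note sets of ICD-10 codes."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one entry per note")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes predicted and correct
        fp += len(p - g)  # codes predicted but not in the gold set
        fn += len(g - p)  # gold codes the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def exact_match_accuracy(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Fraction of notes whose predicted code set equals the gold set exactly."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need equal-length, non-empty inputs")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For example, with gold sets [{"I10", "I20.9"}, {"E11.9"}] and predictions [{"I10"}, {"E11.9", "I10"}], there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3 while exact-match accuracy is 0. This gap is why set-based metrics are reported alongside accuracy.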

Baseline Model

Data Input: The process begins with a medical professional or healthcare provider who inputs clinical notes into the system. The notes contain detailed information about a patient's symptoms, medical history, and test results.

First Local LLM: The clinical notes are first processed by a local LLM. This model is instructed and prompt-engineered to extract the medical terminology and context within the notes, producing structured medical entities in a standardized output format.

Second Local LLM Processing: The structured medical entities generated by the first LLM are inserted into the prompt for the second local LLM, which predicts ICD-10 codes. The prompt includes additional context and instructions that constrain the LLM's response so that the output is relevant and in a standardized format.

ICD-10 Code Prediction: Finally, the second LLM predicts the most likely ICD-10 code based on the included medical entities and the instructions provided via the prompt.
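Since both stages are constrained to a standardized output, their replies can be checked before use. This sketch assumes the first LLM returns a JSON array of entity strings and the second returns a single code; the JSON shape and the function names are hypothetical. The regular expression follows the ICD-10-CM code structure (a letter, a digit, an alphanumeric character, then optionally a period and one to four more alphanumerics). It checks shape only, not whether the code exists in the code set.

```python
import json
import re
from typing import Optional

# ICD-10-CM shape: letter, digit, alphanumeric, then optionally "." and 1-4 alphanumerics.
ICD10_CM = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?")


def parse_entities(raw: str) -> list[str]:
    """Parse the first LLM's output, assumed to be a JSON array of strings."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
        raise ValueError("expected a JSON array of strings")
    seen, entities = set(), []
    for item in data:
        key = item.strip().lower()   # normalize case and surrounding whitespace
        if key and key not in seen:  # drop empties and duplicates, keep first-seen order
            seen.add(key)
            entities.append(key)
    return entities


def parse_code(raw: str) -> Optional[str]:
    """Return the second LLM's reply as a well-formed ICD-10-CM code, or None."""
    candidate = raw.strip().upper().rstrip(".")
    return candidate if ICD10_CM.fullmatch(candidate) else None
```

Returning None rather than guessing lets the caller retry the prompt or flag the note for manual review when the model strays from the requested format.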


RAG Model

Retrieval: To augment the second LLM that performs ICD-10 code prediction, the system performs a retrieval step, querying our previously constructed knowledge graph to find relevant information that can provide additional context and support for assigning the correct ICD-10 code to the clinical notes.

Query Augmentation: The information retrieved in the previous step is used to augment the final prompt. This augmented prompt now includes not only the instructions and the original details from the clinical notes, but also supplementary information from the knowledge graph. This step ensures that the model has access to a broader knowledge base and can consider additional, contextually-relevant information when making predictions.
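The difference between the baseline and RAG models comes down to whether a retrieval step augments the final prompt. The sketch below wires the stages together with the LLMs and the retriever passed in as plain callables; the prompt wording, the comma-separated extractor reply, and the predict_code name are assumptions made for illustration.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]                        # hypothetical: prompt in, completion out
Retriever = Callable[[list[str]], dict[str, str]]  # entities -> {term: ICD-10 code}


def predict_code(note: str, extractor: LLM, coder: LLM,
                 retriever: Optional[Retriever] = None) -> str:
    """Run the two-LLM pipeline: baseline if retriever is None, RAG otherwise."""
    # Step 1: the first LLM extracts entities (assumed to reply comma-separated).
    raw = extractor("List the medical entities in this note, comma-separated:\n" + note)
    entities = [e.strip().lower() for e in raw.split(",") if e.strip()]

    # Step 2 (RAG only): retrieve related terms and codes, then augment the prompt.
    context = ""
    if retriever is not None:
        hits = retriever(entities)
        lines = [f"- {term}: {code}" for term, code in sorted(hits.items())]
        context = "Knowledge graph context:\n" + "\n".join(lines) + "\n"

    # Step 3: the second LLM predicts one code from the (possibly augmented) prompt.
    prompt = ("Entities: " + "; ".join(entities) + "\n" + context
              + "Reply with a single ICD-10 code and nothing else.")
    return coder(prompt).strip()
```

With stub callables (for example, an extractor that always returns "chest pain, hypertension" and a coder that echoes a candidate code only if one appears in its prompt), the same function exercises both paths. Such stubs illustrate the data flow, not model quality. Injecting the models as callables also makes it easy to swap local LLMs without touching the pipeline logic.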

About Us

Alex Roy

Darya Likhareva

Dylan Daniels

Tanmay Mahapatra

Quyen Ha