Introduction
Experimental
This is experimental software, and a stable API is not expected until version 1.0.
What is it?
A Python library for building vector representations of ICD-9 and ICD-10 codes. Because it takes advantage of the hierarchical nature of ICD codes, it also provides these hierarchies in a networkx format.
Motivation
icdcodex was the first prize winner in the Data Driven Healthcare Track of Johns Hopkins’ MedHacks 2020. It was hacked together to address the problem of ICD miscodes, which are a major issue for health insurance in the United States. Indeed, while ICD coding is tedious and labour-intensive, it is not obvious how to automate because the output space is enormous. For example, ICD-10-CM (Clinical Modification) has over 70,000 codes, and the number is growing.
There are many strategies for target encoding that address these issues. icdcodex has two features that make ICD classification more amenable to modeling:
Access to a networkx tree representation of the ICD-9 and ICD-10 hierarchies
Vector embeddings of ICD codes using the node2vec algorithm (including pre-computed embeddings and an interface to create new embeddings)
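To make the idea of a code hierarchy concrete, here is a minimal sketch of a tree over a handful of ICD-9 codes. It is an illustration only: a plain dictionary stands in for the networkx graph so the snippet has no dependencies, and the node labels, `PARENT`, `path_to_root` and `tree_distance` are hypothetical names, not part of icdcodex. The point is that codes in the same branch sit a few edges apart, which is the structure a graph embedding such as node2vec can exploit.

```python
# Toy fragment of the ICD-9 hierarchy as a child -> parent mapping.
# Assumption: node labels are chosen for illustration and need not match
# the labels used in the graph that icdcodex provides.
PARENT = {
    "001-139": "root",     # Infectious and parasitic diseases
    "390-459": "root",     # Diseases of the circulatory system
    "001-009": "001-139",  # Intestinal infectious diseases
    "420-429": "390-459",  # Other forms of heart disease
    "001": "001-009",      # Cholera
    "002": "001-009",      # Typhoid and paratyphoid fevers
    "428": "420-429",      # Heart failure
    "0010": "001",         # Cholera due to vibrio cholerae
    "0011": "001",         # Cholera due to vibrio cholerae el tor
    "0020": "002",         # Typhoid fever
    "4280": "428",         # Congestive heart failure, unspecified
}


def path_to_root(code):
    """Return the nodes from `code` up to the root, inclusive."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Count the edges between two nodes via their lowest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: depth for depth, node in enumerate(path_b)}
    for depth_a, node in enumerate(path_a):
        if node in depth_in_b:
            return depth_a + depth_in_b[node]
    raise ValueError("nodes do not share a root")


if __name__ == "__main__":
    for other in ("0011", "0020", "4280"):
        print("0010", other, tree_distance("0010", other))
```

In this toy tree the two cholera codes are closer to each other than either is to the heart failure code, so a model that confuses them has made a smaller error than one that confuses cholera with heart failure.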
Example Code
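from icdcodex import icd2vec, hierarchy

# Fit 64-dimensional node2vec embeddings on the ICD-9 hierarchy
embedder = icd2vec.Icd2Vec(num_embedding_dimensions=64)
embedder.fit(*hierarchy.icd9())

# get_patient_covariates is a placeholder for your own feature-loading code
X = get_patient_covariates()
y = embedder.to_vec(["0010"])  # Cholera due to vibrio cholerae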
In this case, y is a 64-dimensional vector close to other Infectious And Parasitic Diseases codes.
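Once a model predicts a vector rather than a class label, the prediction has to be mapped back to a code. One common approach is a nearest-neighbour lookup over the embedding table, sketched below. This is an assumption-laden illustration, not necessarily how icdcodex does it: the 3-dimensional vectors are made up for the example, and `TOY_EMBEDDINGS`, `cosine_similarity` and `nearest_code` are hypothetical names.

```python
import math

# Made-up 3-dimensional vectors, for illustration only. Real embeddings
# would come from a fitted embedder and have, for example, 64 dimensions.
TOY_EMBEDDINGS = {
    "0010": [0.9, 0.1, 0.0],  # Cholera due to vibrio cholerae
    "0011": [0.8, 0.2, 0.0],  # Cholera due to vibrio cholerae el tor
    "0020": [0.6, 0.4, 0.1],  # Typhoid fever
    "4280": [0.0, 0.1, 0.9],  # Congestive heart failure, unspecified
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_code(vector, embeddings=TOY_EMBEDDINGS):
    """Return the code whose embedding is most similar to `vector`."""
    return max(
        embeddings,
        key=lambda code: cosine_similarity(vector, embeddings[code]),
    )


if __name__ == "__main__":
    predicted = [0.85, 0.12, 0.02]  # stand-in for a model's output
    print(nearest_code(predicted))
```

Because related codes have similar vectors, a slightly wrong prediction tends to land on a clinically neighbouring code instead of an unrelated one.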
The Hackathon Team
Jeremy Fisher (Maintainer)
Alhusain Abdalla
Natasha Nehra
Tejas Patel
Hamrish Saravanakumar
Documentation
See the full documentation: https://icd-codex.readthedocs.io/en/latest/