icdcodex package

Submodules

icdcodex.datacleaning module

preprocess the ICD-10 hierarchy into a graph structure that node2vec can use

icdcodex.datacleaning.build_icd10_hierarchy(xml_root: untangle.Element, codes: List[str], root_name: Optional[str] = None, prune_extra_codes: bool = True)[source]

build the icd10 hierarchy

Some codes are marked as invalid only in plain text, so they are pruned by comparing the hierarchy against the specified list of codes.

Parameters
  • xml_root (untangle.Element) – root element of the code table XML

  • codes (List[str]) – list of ICD codes

  • root_name (str, optional) – arbitrary name for the root of the hierarchy. Defaults to “root”.

  • prune_extra_codes (bool) – If True, remove any leaf node not specified in codes

Returns

icd10 hierarchy and ICD-10-CM codes

Return type

Tuple[nx.Graph, List[str]]
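The pruning described above can be illustrated with a minimal sketch. It is not the library's implementation: it uses a plain dictionary of child lists instead of an nx.Graph, makes a single pass, and the helper name prune_extra_leaves and the toy entries are hypothetical.

```python
from typing import Dict, List, Set


def prune_extra_leaves(children: Dict[str, List[str]], codes: Set[str]) -> Dict[str, List[str]]:
    """Return a copy of the tree without leaf nodes that are missing from `codes`."""
    # A leaf is a node with no children; only leaves are candidates for pruning.
    extra = {node for node, kids in children.items() if not kids and node not in codes}
    return {
        node: [kid for kid in kids if kid not in extra]
        for node, kids in children.items()
        if node not in extra
    }


# Toy hierarchy: "A00.x" stands in for an entry that is not a valid code.
toy_tree = {
    "root": ["A00"],
    "A00": ["A00.0", "A00.1", "A00.x"],
    "A00.0": [],
    "A00.1": [],
    "A00.x": [],
}
valid_codes = {"A00.0", "A00.1"}
pruned = prune_extra_leaves(toy_tree, valid_codes)
```

Interior nodes such as "A00" are kept even though they are absent from the code list, because only leaves are compared against it.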

icdcodex.datacleaning.build_icd10_hierarchy_from_url(code_desc_url, code_table_url, root_name: Optional[str] = None, return_intermediates=False)[source]

build the icd10 hierarchy by downloading from cms.gov

Parameters
  • code_desc_url (str) – url to the “Code Descriptions in Tabular Order (ZIP)” file

  • code_table_url (str) – url to the “Code Tables and Index (ZIP)” file

  • root_name (str, optional) – arbitrary name for the root of the hierarchy. Defaults to “root”.

  • return_intermediates (bool) – If True, return the untangle element and codes. Defaults to False.

Returns

icd10 hierarchy and ICD-10-CM codes

Return type

Tuple[nx.Graph, List[str]]

icdcodex.datacleaning.build_icd10cm_hierarchy_from_zip(code_desc_zip_fp, code_table_zip_fp, root_name: Optional[str] = None, return_intermediates=False)[source]

build the icd10 hierarchy from zip files downloaded from cms.gov

Parameters
  • code_desc_zip_fp (Pathlike) – file path to the “Code Descriptions in Tabular Order (ZIP)” file

  • code_table_zip_fp (Pathlike) – file path to the “Code Tables and Index (ZIP)” file

  • root_name (str, optional) – arbitrary name for the root of the hierarchy. Defaults to “root”.

  • return_intermediates (bool) – If True, return the untangle element and codes. Defaults to False.

Returns

icd10 hierarchy and ICD-10-CM codes

Return type

Tuple[nx.Graph, List[str]]

icdcodex.datacleaning.build_icd9_hierarchy(fp, root_name=None)[source]

build the icd9 hierarchy

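Parameters
  • fp – file path to the ICD-9 hierarchy file

  • root_name (optional) – arbitrary name for the root of the hierarchy

Returns

ICD-9 hierarchy (nx.Graph) and ICD-9 codes (List[str])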

icdcodex.datacleaning.build_icd9_hierarchy_from_url(url='https://github.com/kshedden/icd9/blob/master/icd9/resources/icd9Hierarchy.json', root_name=None)[source]

build the icd9 hierarchy by downloading the hierarchy files

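Parameters
  • url (optional) – URL of the ICD-9 hierarchy file

  • root_name (optional) – arbitrary name for the root of the hierarchy

Returns

ICD-9 hierarchy (nx.Graph) and ICD-9 codes (List[str])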

icdcodex.datacleaning.main()[source]

icdcodex.datacleaning.traverse_diag(G, parent, untangle_elem, extensions=None)[source]

traverse the diagnosis subtrees, adding extensions as appropriate

Seventh-character extensions may be specified as a child, sibling or uncle/aunt. Also, some diagnoses are non-billable because they are parents to more specific sub-diagnoses.

Parameters
  • G (nx.Graph) – ICD hierarchy to mutate

  • parent (str) – parent node

  • untangle_elem (untangle.Element) – XML element, from untangle API

  • extensions (List[Tuple[str,str]], optional) – Seventh character extensions and related descriptions. Defaults to None.
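To make the traversal concrete, here is a minimal sketch, not the library's code. It assumes a toy nested-dictionary tree rather than untangle elements, covers only extensions declared on an ancestor, and pads short codes with the ICD-10-CM placeholder character "X" so that the extension lands in the seventh position. The names pad_and_extend and collect_codes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Extension = Tuple[str, str]  # (seventh character, description)


def pad_and_extend(code: str, char: str) -> str:
    """Append a seventh character, padding the part after the dot with "X"."""
    category, _, rest = code.partition(".")
    return f"{category}.{rest.ljust(3, 'X')}{char}"


def collect_codes(node: Dict, inherited: Optional[List[Extension]] = None) -> List[str]:
    """Walk a diagnosis subtree and return its most specific codes."""
    # Extensions declared on this node take precedence over inherited ones.
    extensions = node.get("extensions") or inherited
    children = node.get("children", [])
    if children:
        # A parent of more specific diagnoses is not emitted itself.
        codes: List[str] = []
        for child in children:
            codes.extend(collect_codes(child, extensions))
        return codes
    if extensions:
        return [pad_and_extend(node["code"], char) for char, _ in extensions]
    return [node["code"]]


# Toy subtree; the structure is illustrative, not the CMS XML layout.
toy = {
    "code": "T88",
    "extensions": [("A", "initial encounter"), ("D", "subsequent encounter"), ("S", "sequela")],
    "children": [{"code": "T88.7"}, {"code": "T88.8"}],
}
billable = collect_codes(toy)
```

In this sketch the parent "T88" never appears in the output, which mirrors the note above that parents of more specific sub-diagnoses are non-billable.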

icdcodex.hierarchy module

deserialize icd hierarchies computed in datacleaning.py

icdcodex.hierarchy.icd10cm(version: Optional[str] = None) → Tuple[networkx.classes.graph.Graph, Sequence[str]][source]

deserialize icd-10-cm hierarchy

Parameters

version (str, optional) – ICD-10-CM version (release year), from 2019 to 2020. If None, use the current system year. Defaults to None.

Returns

ICD-10-CM hierarchy and codes

Return type

Tuple[nx.Graph, Sequence[str]]

icdcodex.hierarchy.icd9() → Tuple[networkx.classes.graph.Graph, Sequence[str]][source]

deserialize icd9 hierarchy

Returns

ICD9 hierarchy and codes

Return type

Tuple[nx.Graph, Sequence[str]]

icdcodex.icd2vec module

Build a vector embedding from a NetworkX representation of the ICD hierarchy

class icdcodex.icd2vec.Icd2Vec(num_embedding_dimensions: int = 128, num_walks: int = 10, walk_length: int = 10, window: int = 4, workers=1, **kwargs)[source]

Bases: object

fit(icd_hierarchy: networkx.classes.graph.Graph, icd_codes: Sequence[str], **kwargs)[source]

construct vector embedding of all ICD codes

Parameters
  • icd_hierarchy (nx.Graph) – Graph of ICD hierarchy

  • icd_codes (Sequence[str]) – list of ICD codes

  • kwargs – keyword arguments passed to Node2Vec.fit
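As background for num_walks and walk_length, the sketch below shows the walk-generation idea behind node2vec in its simplest form: uniform random walks, which is what node2vec reduces to when its return and in-out parameters are both 1. It is an illustration on a plain adjacency dictionary, not what Icd2Vec runs internally; random_walks and the toy graph are hypothetical.

```python
import random
from typing import Dict, List


def random_walks(
    adj: Dict[str, List[str]], num_walks: int = 10, walk_length: int = 10, seed: int = 0
) -> List[List[str]]:
    """Generate `num_walks` uniform random walks of `walk_length` nodes from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            # Stop early only if the current node has no neighbours.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


# Toy undirected hierarchy stored as symmetric adjacency lists.
toy_adj = {
    "root": ["A00", "B00"],
    "A00": ["root", "A00.0"],
    "A00.0": ["A00"],
    "B00": ["root"],
}
walks = random_walks(toy_adj, num_walks=2, walk_length=5)
```

The walks play the role of sentences: a word-embedding model trained on them places codes that co-occur within the window close together.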

to_code(vecs: Union[Sequence[Sequence], numpy.ndarray]) → Sequence[str][source]

decode continuous representation of ICD code(s) into the code itself

Parameters

vecs (Union[Sequence[Sequence], np.ndarray]) – continuous representation of ICD code(s)

Returns

ICD code(s)

Return type

Sequence[str]
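One plausible way to decode a vector is a nearest-neighbour lookup over the learned embeddings. The sketch below assumes cosine similarity and a small in-memory table; the reference above does not state how to_code is implemented, so treat nearest_codes and the toy table as hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_codes(vecs: Sequence[Sequence[float]], table: Dict[str, Sequence[float]]) -> List[str]:
    """Map each vector to the code whose embedding is most similar to it."""
    return [max(table, key=lambda code: cosine(vec, table[code])) for vec in vecs]


# Toy two-dimensional embeddings keyed by placeholder code names.
toy_table = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, 0.0]}
decoded = nearest_codes([[0.9, 0.1], [0.2, 0.8]], toy_table)
```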

to_vec(icd_codes: Sequence[str]) → numpy.ndarray[source]

encode ICD code(s) into a matrix of continuously-valued representations of shape m x n where m = self.num_embedding_dimensions and n = len(icd_codes)

Parameters

icd_codes (Sequence[str]) – list of icd code(s)

Raises

ValueError – If model is not fit beforehand

Returns

continuously-valued representations of ICD codes

Return type

np.ndarray
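Putting the pieces together, a round trip through the API documented above might look like the sketch below. It relies only on the signatures listed in this reference, requires icdcodex to be installed when the function is called, and the function name roundtrip_icd9 and the choice of the first three codes are illustrative assumptions.

```python
from typing import Sequence


def roundtrip_icd9(dims: int = 64) -> Sequence[str]:
    """Embed a few ICD-9 codes and decode them again."""
    # Imported lazily so that defining this function does not require the package.
    from icdcodex import hierarchy, icd2vec

    graph, codes = hierarchy.icd9()
    embedder = icd2vec.Icd2Vec(num_embedding_dimensions=dims, workers=1)
    # fit must come first: to_vec raises ValueError on an unfitted model.
    embedder.fit(graph, codes)

    sample = list(codes[:3])
    vectors = embedder.to_vec(sample)
    return embedder.to_code(vectors)
```

Fitting on the full hierarchy can take some time, so in practice the fitted embedder would be created once and reused.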

Module contents