Welcome to precision-medicine-toolbox documentation!
precision-medicine-toolbox is an open-source python package for medical imaging data preparation for data science tasks. This package is aimed to provide a tool to curate the imaging data and to perform exploratory feature analysis.
If you are using this toolbox, please, cite the original paper:
Primakov, Sergey, Elizaveta Lavrova, Zohaib Salahuddin, Henry C. Woodruff, and Philippe Lambin. "Precision-medicine-toolbox: An open-source python package for facilitation of quantitative medical imaging and radiomics analysis." arXiv preprint arXiv:2202.13965 (2022).

Currently, the toolbox has the following functionality:
- Dataset exploration. This function gets the specified metadata from the DICOM files of the dataset and allows for exploration of the diversity degree of the imaging parameters..
- Dataset quality check. This function checks every scan in the dataset to be in line with the pre-defined requirements:
- imaging modality is correct,
- slice thickness is in the acceptable range,
- number of slices is in the acceptable range,
- all the slices have a target plane resolution,
- in-plane pixel spacing is in the acceptable range,
- reconstruction kernel for CT data is presented and is acceptable.
- Conversion of DICOM to NRRD. This function allows for the conversion of DICOM (CT or MR) dataset into volume (NRRD format) dataset. 2D data is temporarily not supported.
- Basic image pre-processing. This function performs basic image pre-processing steps, selected by the user; the following methods are available:
- N4 bias field correction,
- intensity rescaling, based on fat values or percentile values,
- histogram matching,
- intensities resampling,
- histogram equalization,
- Z-scoring, based on defined normalization coefficients or image-based values,
- image reshaping.
- Unrolling NRRD images & ROI masks into jpeg slices. This function could be used for a quick check of the converted images or any existing NRRD/MHA dataset. It will generate the JPEG images for each ROI slice.
- Extracting of radiomics features. Feature extraction procedure using pyradiomics to obtain the radiomics features for NRRD/MHA dataset.
- Basic analysis of radiomics features. Export to Excel file of features basic statistics and statistical tests values and visualization (in .html report) of:
- features values distributions in binary classes,
- Shapiro-Wilk test for normality check,
- features mutual correlation (Spearman) matrix,
- p-values (corrected) for Mann-Whitney test for features mean values in groups,
- univariate ROC-curves for each feature,
- volumetric analysis: volume-based precision-recall curve + features correlation with volume.
- Binary classification metrics reporting. Given true labels and predicted probabilities, this function performs:
- classification metrics reporting,
- confusion matrices and ROC curves plotting.
Warning! Not intended for clinical use!
Code and documentation
precision-medicine-toolbox is an open-source package, the source code is available online. The online documentation is available here. The functionality of the toolbox is illustrated in the tutorial notebooks.
3rd-party packages used in precision-medicine-toolbox
Our package is using the existing quality tools for the key steps:
- pydicom (DICOM I/O),
- SimpleITK (image I/O and pre-processing),
- pyradiomics (features extraction).
See requirements.txt for more.
Installation
Before use, install the dependencies from the requirements file:
pip install -r requirements.txt
Then clone repository with the git client of your preference.
The latest version is also available at PyPi:
pip install precision-medicine-toolbox
Quick start
The following example illustrates how to initialize an object of a dataset class:
import os, sys
from pmtool.ToolBox import ToolBox
# set up parameters for your imaging dataset
parameters = {'data_path': 'root directory of the imaging data',
'data_type': 'dcm', # DICOM data
'multi_rts_per_pat': False # looks at 1 RTStruct/patient only
}
my_dataset = ToolBox(**parameters)
Contributing
You can contribute to this package at our GitHub by:
- reporting the issues,
- giving us feedback for the code and the documentation via suggestions/comments:
- directly in the Pull request,
- writing and leaving a comment in the Conversation tab,
- sending an e-mail to authors.
Authors and citation
Initial and main developers:
- Sergey Primakov @primakov
- Lisa Lavrova @lavrovaliz
Also you can see the list of the contributors.
License
This project is licensed under the BSD-3-Clause License (see the LICENSE for the details).
Acknowledgements
The authors would like to thank:
- the Precision Medicine department colleagues for their support and feedback,
- Mart Smidt for testing the tool on the different data,
- external users for the feedback,
- PyRadiomics for a reliable open-source tool for features extraction,
- Hugo Aerts et al. for the Lung1 dataset we used to demonstrate our functionality and The Cancer Imaging Archive for the publically available data.