# MiXCR Shotgun Analysis

Collection of tools to run MiXCR on shotgun (RNA-Seq and WES) samples and analyze the results. Includes analysis tools for clonality, heavy/light chain ratio (pro-B signature) and VDJ gene usage heatmaps.

Git repository: https://gitlab.com/guilhermenngiusti/mixcr_shotgun

## How to use

### Running MiXCR

The MiXCR Docker container can be obtained using the following command:

`docker pull milaboratory/mixcr`

MiXCR analysis of RNA-Seq or WES samples can be run using the following command:

`docker run -it --rm -v $(pwd)/VOLUME_DIR:/work milaboratory/mixcr:latest mixcr analyze shotgun -s mmu --starting-material MATERIAL R1_FASTQ R2_FASTQ WORK_DIR`

Here `-s mmu` sets the species to mouse. Replace the uppercase placeholders with your own values.

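
When several samples need to be processed, the Docker command from the "Running MiXCR" section can be assembled programmatically. The following is a minimal sketch and not part of the repository: `build_mixcr_command` and every example path are hypothetical, the flags are exactly those shown in the README, and `-it` is left out on the assumption that the container is launched non-interactively from a script. The value `rna` for `MATERIAL` is also an assumption; check the MiXCR documentation for the values your version accepts.

```python
import os
import shlex


def build_mixcr_command(volume_dir, material, r1_fastq, r2_fastq, work_dir):
    """Return the docker argument list for one paired-end sample.

    Hypothetical helper: it only mirrors the command shown in the README.
    """
    return [
        "docker", "run", "--rm",
        # Mount the local data directory at /work inside the container.
        "-v", f"{os.path.abspath(volume_dir)}:/work",
        "milaboratory/mixcr:latest",
        "mixcr", "analyze", "shotgun",
        "-s", "mmu",
        "--starting-material", material,
        r1_fastq, r2_fastq, work_dir,
    ]


if __name__ == "__main__":
    # Example file names are placeholders, not files shipped with the repository.
    cmd = build_mixcr_command("data", "rna", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list instead of a single string avoids shell quoting problems when sample names contain spaces.
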
### Clonality analysis

The Python script to perform clonality analysis can be run from the top-level directory using the following command:

`python3 tools/clonality.py -i SAMPLE_DIR -o OUTPUT_DIR -m TARGET_MARKER`

Argument description:

- `-i` Directory where MiXCR clonotype files are stored;
- `-o` Directory for output files;
- `-m` Target marker. Options: `IGH`, `IGK`, `IGL`, `TRA`, `TRB`, `TRD` and `TRG`;
- `-t` (Optional) Translation table used to rename samples. Table must be a `tsv` file with the columns `original_name` and `translated_name`.

Each wedge of the resulting plot is labeled only if its clonotype has a frequency ≥ 5%.

The Gini-Simpson diversity index is shown at the center of the plot.

This script also produces a `count.txt` file containing the total number of reads for the analyzed marker.

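
For reference, the Gini-Simpson index is commonly defined as one minus the sum of squared clonotype frequencies. The sketch below illustrates that definition together with the 5% labeling rule described in the clonality section. It is an assumption about the general approach, not the code of `tools/clonality.py`; the function names and the example counts are hypothetical.

```python
def gini_simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i ** 2), where p_i is a clonotype frequency."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads to compute diversity from")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def wedge_labels(counts, names, threshold=0.05):
    """Keep a label only for clonotypes with frequency >= threshold."""
    total = sum(counts)
    return [name if c / total >= threshold else ""
            for c, name in zip(counts, names)]


if __name__ == "__main__":
    # Made-up read counts for four clonotypes of one marker.
    counts = [60, 30, 6, 4]
    names = ["c1", "c2", "c3", "c4"]
    print(round(gini_simpson(counts), 4))
    print(wedge_labels(counts, names))
```

An index close to 0 indicates a sample dominated by a single clonotype, while values approaching 1 indicate many clonotypes with similar frequencies.
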
### Pro-B cell signature analysis

The Python script to perform pro-B cell signature analysis can be run from the top-level directory using the following command:

`python3 tools/pro-B_signature.py -i SAMPLE_DIR -o OUTPUT_DIR -n ANALYSIS_TITLE`

Argument description:

- `-i` Directory where MiXCR clonotype files are stored. Each sample in this directory must be stored in its own subdirectory;
- `-o` Directory for output files;
- `-n` Title for the analysis. Used as the name of the generated `jpg` file;
- `-t` (Optional) Translation table used to rename samples. Table must be a `tsv` file with the columns `original_name` and `translated_name`;
- `--yscale` (Optional) Maximum value of the y-axis. Default `10`.

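
The translation table accepted by `-t` is a tab-separated file with the columns `original_name` and `translated_name`. The sketch below shows one way such a table could be read and applied. It is not taken from the repository: the function names and sample names are hypothetical, and keeping samples that are missing from the table under their original name is an assumption.

```python
import csv
import io


def load_translation_table(handle):
    """Read a tsv with columns original_name / translated_name into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    required = {"original_name", "translated_name"}
    if not required.issubset(reader.fieldnames or []):
        raise ValueError("table must have columns original_name and translated_name")
    return {row["original_name"]: row["translated_name"] for row in reader}


def translate(sample, table):
    """Return the translated name, or the original one if it is not listed."""
    return table.get(sample, sample)


if __name__ == "__main__":
    # In-memory stand-in for a real tsv file; the names are made up.
    tsv = "original_name\ttranslated_name\nS001\tcontrol_1\nS002\ttumor_1\n"
    table = load_translation_table(io.StringIO(tsv))
    print(translate("S001", table))
    print(translate("S999", table))
```

With a real file, open it with `open(path, newline="")` and pass the handle to `load_translation_table`.
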
### VDJ gene usage analysis

The Python script to perform VDJ gene usage analysis can be run from the top-level directory using the following command:

`python3 tools/vdj_heatmap.py -i SAMPLE_DIRS -o OUTPUT_DIR -m MARKER -g GENE_FAMILY`

Argument description:

- `-i` Directory where MiXCR clonotype files are stored. Can receive multiple directories;
- `-o` Directory for output files;
- `-m` Target marker. Options: `IGH` or `IGK`;
- `-g` VDJ gene family. Options: `V`, `D` and `J`;
- `--xsize` (Optional) Heatmap width;
- `--ysize` (Optional) Heatmap height;
- `-t` (Optional) Translation table used to rename samples. Table must be a `tsv` file with the columns `original_name` and `translated_name`;
- `-l` (Optional) Draws horizontal lines separating sample groups in the heatmap. Each value is an integer giving the number of samples after which a line is drawn. Can receive multiple values;
- `--linewidth` (Optional) Sets the width of the horizontal lines separating sample groups in the heatmap;
- `-s` (Optional) Sample to be represented on a secondary heatmap. Can receive multiple values. If a translation table is used, the provided sample name should be the translated one;
- `-S` (Optional) Group name for the separate samples, used for the secondary heatmap output file;
- `--secxsize` (Optional) Secondary heatmap width;
- `--secysize` (Optional) Secondary heatmap height.

Each heatmap generated by this script represents a single marker (e.g. IGH) and a single VDJ gene family (e.g. V).

**This script currently only supports the IGH and IGK markers from mice.** To analyze other markers, add complete genome coordinate files to `tools/coordinates`, using the same format as the `tsv` files already available. To analyze markers from human (or another species), replace the mouse genome coordinate files with the ones for your species.

Genomic coordinate files can be generated at https://www.ncbi.nlm.nih.gov/datasets/gene/.

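
Conceptually, a gene usage heatmap is a samples-by-genes matrix of usage fractions. The sketch below builds such a matrix from records that have already been reduced to `(sample, gene, read_count)` tuples. This input shape, the function name, the alphabetical gene ordering and the per-sample normalization are all assumptions made for illustration; `tools/vdj_heatmap.py` may parse, order and normalize the data differently.

```python
from collections import defaultdict


def usage_matrix(records):
    """Build a samples x genes matrix of per-sample usage fractions.

    records: iterable of (sample, gene, read_count) tuples (hypothetical input).
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for sample, gene, n in records:
        counts[(sample, gene)] += n
        totals[sample] += n
    samples = sorted(totals)
    genes = sorted({gene for _, gene in counts})
    matrix = [
        [counts[(s, g)] / totals[s] if totals[s] else 0.0 for g in genes]
        for s in samples
    ]
    return samples, genes, matrix


if __name__ == "__main__":
    # Placeholder sample and gene names, not real annotations.
    records = [
        ("s1", "gene_a", 30),
        ("s1", "gene_b", 10),
        ("s2", "gene_a", 5),
    ]
    samples, genes, matrix = usage_matrix(records)
    for sample, row in zip(samples, matrix):
        print(sample, dict(zip(genes, row)))
```

Each row sums to 1 for samples with at least one read, which keeps samples with different sequencing depths comparable on the same color scale.
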
## Credits

Created by Guilherme Giusti at Boldrini Research Center, 2023.