MiXCR for RNA-Seq and WES

From CPB Wiki
Revision as of 15:33, 21 August 2023 by Giusti (talk | contribs) (MiXCR for RNA-Seq and WES)
  1. MiXCR Shotgun Analysis

A collection of tools to run MiXCR on shotgun (RNA-Seq and WES) samples and analyze the results. It includes analysis tools for clonality, heavy/light chain ratio (pro-B signature) and VDJ gene usage heatmaps.

Git repository: https://gitlab.com/guilhermenngiusti/mixcr_shotgun

    1. How to use
      1. Running MiXCR

The MiXCR Docker image can be obtained using the following command:

`docker pull milaboratory/mixcr`

MiXCR analysis of RNA-Seq or WES samples can be run using the following command:

`docker run -it --rm -v $(pwd)/VOLUME_DIR:/work milaboratory/mixcr:latest mixcr analyze shotgun -s mmu --starting-material MATERIAL R1_FASTQ R2_FASTQ WORK_DIR`
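When several samples have to be processed, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the function name `mixcr_shotgun_command` and the sample file names are hypothetical, and `rna` is only an assumed value for `MATERIAL` (check the MiXCR documentation for the accepted values). The `-it` flags are left out because Docker refuses to allocate a TTY when it is not started from an interactive terminal.

```python
import shlex
from pathlib import Path


def mixcr_shotgun_command(volume_dir, material, r1_fastq, r2_fastq, work_dir,
                          species="mmu"):
    """Build the docker argument list for one paired-end sample.

    Mirrors the command shown above, minus `-it` (see the note above).
    """
    # Docker needs an absolute host path for the volume mount,
    # which is what `$(pwd)/VOLUME_DIR` provides in the shell version.
    volume = Path(volume_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/work",
        "milaboratory/mixcr:latest",
        "mixcr", "analyze", "shotgun",
        "-s", species,
        "--starting-material", material,
        r1_fastq, r2_fastq, work_dir,
    ]


# Hypothetical sample; "rna" is an assumed value for MATERIAL.
example = mixcr_shotgun_command(
    "data", "rna", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "sample1"
)
print(shlex.join(example))
```

The returned list can be passed to `subprocess.run(example, check=True)` to execute the analysis. Building a list rather than a single string avoids quoting problems with unusual file names.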

      1. Clonality analysis

The Python script to perform clonality analysis can be run from the top level directory using the following command:

`python3 tools/clonality.py -i SAMPLE_DIR -o OUTPUT_DIR -m TARGET_MARKER`

Argument description:

- `-i` Directory where MiXCR clonotype files are stored;
- `-o` Directory for output files;
- `-m` Target marker. Options: `IGH`, `IGK`, `IGL`, `TRA`, `TRB`, `TRD` and `TRG`;
- `-t` (Optional) Translation table used to rename samples. Table must be a `tsv` file with the columns `original_name` and `translated_name`.

A wedge of the resulting plot is labeled only if its clonotype has a frequency ≥ 5%.

The Gini-Simpson diversity index is shown at the center of the plot.

This script also produces a `count.txt` file containing the total number of reads for the analyzed marker.
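For reference, the Gini-Simpson index is commonly defined as 1 − Σ pᵢ², where pᵢ is the frequency of clonotype i. The sketch below illustrates that definition together with the 5% labeling rule. It is not the code of `clonality.py`: the function names and clonotype names are hypothetical, and the script may use a different variant of the index.

```python
def gini_simpson(counts):
    """Gini-Simpson diversity index, 1 - sum(p_i ** 2), from clonotype counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute diversity without any reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def wedge_labels(clone_counts, threshold=0.05):
    """Map each clonotype to its label, or to '' when its frequency is below the threshold."""
    total = sum(clone_counts.values())
    return {
        name: name if count / total >= threshold else ""
        for name, count in clone_counts.items()
    }


# Hypothetical clonotype counts for one marker.
counts = {"clone_A": 60, "clone_B": 36, "clone_C": 4}
print(gini_simpson(counts.values()))
print(wedge_labels(counts))
```

An index close to 0 indicates a sample dominated by a single clonotype, while values approaching 1 indicate many clonotypes with similar frequencies.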

      1. Pro-B cell signature analysis

The Python script to perform pro-B cell signature analysis can be run from the top level directory using the following command:

`python3 tools/pro-B_signature.py -i SAMPLE_DIR -o OUTPUT_DIR -n ANALYSIS_TITLE`

Argument description:

- `-i` Directory where MiXCR clonotype files are stored. Each sample contained in this directory must be stored in its own subdirectory;
- `-o` Directory for output files;
- `-n` Title for the analysis. Used as the name of the generated `jpg` file;
- `-t` (Optional) Translation table used to rename samples. Table must be a `tsv` file with the columns `original_name` and `translated_name`;
- `--yscale` (Optional) Maximum value of the y-axis. Default `10`.
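The translation table accepted by the `-t` option is a tab-separated file with the columns `original_name` and `translated_name`. The following sketch shows one way such a table could be read and applied. It is an illustration only: the function names and sample names are hypothetical, and falling back to the original name for samples missing from the table is an assumption, not documented behavior of the scripts.

```python
import csv
import io

REQUIRED_COLUMNS = {"original_name", "translated_name"}


def load_translation_table(handle):
    """Read a tab-separated translation table into an {original: translated} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"translation table lacks columns: {sorted(missing)}")
    return {row["original_name"]: row["translated_name"] for row in reader}


def translate(sample, table):
    """Return the translated name, or the original name if none is listed (assumed)."""
    return table.get(sample, sample)


# Hypothetical table content; a real table would be opened from a .tsv file.
tsv_text = "original_name\ttranslated_name\nS01_L001\tControl 1\nS02_L001\tTreated 1\n"
table = load_translation_table(io.StringIO(tsv_text))
print(translate("S01_L001", table))
print(translate("S99_L001", table))
```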

      1. VDJ gene usage analysis

The Python script to perform VDJ gene usage analysis can be run from the top level directory using the following command:

`python3 tools/vdj_heatmap.py -i SAMPLE_DIRS -o OUTPUT_DIR -m MARKER -g GENE_FAMILY`

Argument description:

- `-i` Directory where MiXCR clonotype files are stored. Can receive multiple directories;
- `-o` Directory for output files;
- `-m` Target marker. Options: `IGH` or `IGK`;
- `-g` VDJ gene family. Options: `V`, `D` and `J`;
- `--xsize` (Optional) Heatmap width;
- `--ysize` (Optional) Heatmap length;
- `-t` (Optional) Translation table used to rename samples. Table must be a `tsv` file with the columns `original_name` and `translated_name`;
- `-l` (Optional) Draws horizontal lines separating sample groups in the heatmap. Takes an integer giving the number of samples after which the line is drawn. Can receive multiple values;
- `--linewidth` (Optional) Sets the width of the horizontal lines separating sample groups in the heatmap;
- `-s` (Optional) Sample to be represented on a secondary heatmap. Can receive multiple values. If a translation table is used, the provided sample name should be the translated one;
- `-S` (Optional) Group name for the separate samples, used for the secondary heatmap output file;
- `--secxsize` (Optional) Secondary heatmap width;
- `--secysize` (Optional) Secondary heatmap length.

The heatmaps generated by this script always represent only one marker (e.g. IGH) and VDJ gene family (e.g. V) at a time.
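To make the shape of the data behind such a heatmap concrete, the sketch below builds a samples × genes matrix of usage fractions for a single marker and gene family. It is an assumption-laden illustration, not the implementation of `vdj_heatmap.py`: the function name, sample names and gene counts are hypothetical, per-sample normalization is assumed, and genes are sorted alphabetically here, whereas the script relies on the coordinate files in `tools/coordinates`.

```python
def usage_matrix(sample_counts):
    """Convert {sample: {gene: count}} into (samples, genes, fraction matrix).

    Each row holds one sample; each value is the fraction of that
    sample's counts assigned to the gene in that column.
    """
    genes = sorted({gene for counts in sample_counts.values() for gene in counts})
    samples = list(sample_counts)
    matrix = []
    for sample in samples:
        counts = sample_counts[sample]
        total = sum(counts.values())
        # Genes absent from a sample get a fraction of 0.
        row = [counts.get(gene, 0) / total if total else 0.0 for gene in genes]
        matrix.append(row)
    return samples, genes, matrix


# Hypothetical IGH V gene counts for two samples.
samples, genes, matrix = usage_matrix({
    "sample_A": {"IGHV1-5": 3, "IGHV5-4": 1},
    "sample_B": {"IGHV5-4": 2},
})
print(genes)
for sample, row in zip(samples, matrix):
    print(sample, row)
```

A matrix of this form can be handed directly to a plotting function such as matplotlib's `imshow` to draw the heatmap.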

**This script currently only supports the IGH and IGK markers from mice.** To analyze other markers, add complete genome coordinate files to `tools/coordinates`, using the same format as the `tsv` files already available. To analyze human (or other species) markers, replace the mouse genome coordinate files with ones for your species.

Genomic coordinate files can be generated at https://www.ncbi.nlm.nih.gov/datasets/gene/.

    1. Credits

Created by Guilherme Giusti at Boldrini Research Center, 2023.