Mutation Analysis

You can use mutation_pred.py to predict the likelihood of break occurrences in the vicinity of a specified mutation site.

Downloading Genome Chromosome Files

Before running mutation_pred.py, make sure the relevant genome chromosome files (.fa) have been downloaded. For example, to analyze a mutation on chromosome 14 of the hg19 genome version, you need chr14.fa.

Here's a step-by-step guide on how you might download these files from a public database:

  1. Visit a public database that provides genome sequences, such as the UCSC Genome Browser.

  2. Locate and select the genome version you need, for example, "hg19".

  3. Find and click the chromosome file you need to download, such as "chr14".

  4. The file should download automatically.

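Place the downloaded file in a subdirectory of ./genome named after the genome version. For instance, the file path should look like ./genome/hg19/chr14.fa. If the download arrives compressed (for example, as chr14.fa.gz), decompress it first so that the .fa file sits at that path.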
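
The manual steps above can also be scripted. The following is a minimal sketch, not part of mutation_pred.py: the helper names (chromosome_url, target_path, fetch_chromosome) are hypothetical, and the URL layout is an assumption based on how the UCSC download server currently organizes per-chromosome files, so verify it before relying on it.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Assumption: UCSC serves per-chromosome FASTA files as gzip archives under
# this base URL. Check the download site if the layout has changed.
UCSC_BASE = "https://hgdownload.soe.ucsc.edu/goldenPath"


def chromosome_url(genome, chrom):
    """Build the assumed UCSC URL for one compressed chromosome file."""
    return f"{UCSC_BASE}/{genome}/chromosomes/{chrom}.fa.gz"


def target_path(genome, chrom, root="./genome"):
    """Return the expected location, e.g. ./genome/hg19/chr14.fa."""
    return Path(root) / genome / f"{chrom}.fa"


def fetch_chromosome(genome, chrom, root="./genome"):
    """Download and decompress one chromosome unless it is already present."""
    dest = target_path(genome, chrom, root)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download is never
    # mistaken for a complete file on the next run.
    partial = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(chromosome_url(genome, chrom)) as response:
        with gzip.GzipFile(fileobj=response) as unzipped:
            with open(partial, "wb") as out:
                shutil.copyfileobj(unzipped, out)
    partial.replace(dest)
    return dest
```

Calling fetch_chromosome("hg19", "chr14") would then leave the file at ./genome/hg19/chr14.fa. Nothing is downloaded until that function is called.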

Running Prediction

After downloading the genome chromosome file, you can run the mutation prediction using the following command:

python mutation_pred.py --genome hg19 --chr chr14 --pos 73659501 --ref T --alt C

In this command:

  • --genome hg19 specifies the genome version.

  • --chr chr14 selects the chromosome of interest.

  • --pos 73659501 identifies the position of the mutation on the chosen chromosome.

  • --ref T indicates the reference allele, i.e., the original nucleotide at the mutation site.

  • --alt C designates the alternate allele, i.e., the mutated nucleotide.
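
Because --ref must name the nucleotide actually present at --pos, it can help to check the chromosome file before running the prediction. The sketch below is an illustration only: base_at and ref_matches are hypothetical helpers, and it assumes that positions are 1-based and that the .fa file holds a single sequence. Whether mutation_pred.py uses the same coordinate convention is not stated here, so treat a mismatch as a prompt to check rather than proof of an error.

```python
def base_at(fasta_lines, pos):
    """Return the uppercase base at a 1-based position.

    `fasta_lines` is any iterable of text lines, such as an open file handle.
    Header lines starting with '>' are skipped (single-sequence assumption).
    """
    if pos < 1:
        raise ValueError("pos must be 1 or greater")
    remaining = pos
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip()
        if remaining <= len(seq):
            return seq[remaining - 1].upper()
        remaining -= len(seq)
    raise ValueError(f"position {pos} is beyond the end of the sequence")


def ref_matches(fasta_lines, pos, ref):
    """True if the base at `pos` equals the given reference allele."""
    return base_at(fasta_lines, pos) == ref.upper()


# Example usage (path taken from the download section above):
# with open("./genome/hg19/chr14.fa") as handle:
#     print(ref_matches(handle, 73659501, "T"))
```

Reading line by line keeps memory use low, at the cost of scanning the file from the start on every lookup.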

Output and Visualization

The script outputs a CSV file containing predicted break probabilities for the 75 nucleotides upstream and the 75 nucleotides downstream of the single nucleotide polymorphism (SNP) site. This range allows a detailed analysis of the mutation's local effects on genome stability.
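
To make the reported range concrete, the sketch below computes the coordinates it covers and reads the first rows of a CSV without assuming any column names. Both helpers (window_bounds, read_rows) are hypothetical, and the sketch assumes 1-based, inclusive coordinates with the SNP site counted inside the window. The actual output file name and layout are not described here, so inspect the file the script writes.

```python
import csv
from itertools import islice


def window_bounds(pos, flank=75):
    """Return the inclusive (start, end) covering `flank` bases on each side.

    Assumes 1-based coordinates; the start is clamped so it never drops
    below position 1.
    """
    return max(1, pos - flank), pos + flank


def read_rows(lines, limit=5):
    """Parse up to `limit` CSV rows from an iterable of text lines.

    An open file handle works as `lines`; the header row, if any, is simply
    returned as the first row.
    """
    return list(islice(csv.reader(lines), limit))


start, end = window_bounds(73659501)
print(start, end, end - start + 1)  # prints: 73659426 73659576 151
```

With the example command above, the window would run from 73659426 to 73659576, or 151 positions if the SNP site itself is included.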

Additionally, the script generates a figure comparing the break probabilities before and after the mutation and saves it in ./snp. This figure gives a quick, intuitive view of the mutation's potential impact.
