Computational-Biology Lab 5 Solution


Task 1

=> Download the CDS sequences of Escherichia coli from the following link.

`ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz`

=> Unzip the file.

Task 1A (`1 pt`)

=> Write a Python script (`answers.py`) that retrieves the start codon, stop codon, and gene length for each protein_coding gene (excluding pseudogenes). The key from the previous lab may help (e.g. the `fastareader` function). Save the output in a tab-separated text file (`ecoli_startStop.tsv`).

```
start	stop	length
ATG	TAG	939
ATG	TAA	414
ATG	TGA	414
ATG	TAG	600
ATG	TAA	693
ATG	TAA	1320
ATG	TAA	1536
ATG	TAA	966
ATG	TGA	1110
```

=> Save your work in `answers.py`.

=> Push the python script and tsv file to your GitHub repo.
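
A minimal sketch of how Task 1A could be approached, under stated assumptions. `read_fasta` is a hypothetical stand-in for the previous lab's `fastareader` (whose actual signature is not shown here), and pseudogenes are assumed to carry a `[pseudo=true]` tag in the FASTA header. Check the headers in the downloaded file before relying on that filter. The toy records are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def start_stop_length(records):
    """Yield (start codon, stop codon, length) for each non-pseudogene record."""
    for header, seq in records:
        # Assumption: pseudogenes are flagged with this tag in the header.
        if "[pseudo=true]" in header:
            continue
        seq = seq.upper()
        yield seq[:3], seq[-3:], len(seq)


def to_tsv(rows):
    """Format (start, stop, length) rows as tab-separated text with a header."""
    lines = ["start\tstop\tlength"]
    lines += [f"{start}\t{stop}\t{length}" for start, stop, length in rows]
    return "\n".join(lines) + "\n"


# Toy input (made up): one ordinary CDS split over two lines, one pseudogene.
toy = [
    ">gene1 [gene=a] [protein_id=X1]",
    "ATGAAA",
    "TAA",
    ">gene2 [gene=b] [pseudo=true]",
    "ATGTGA",
]
rows = list(start_stop_length(read_fasta(toy)))
print(to_tsv(rows), end="")
```

For the real data, the same functions could be fed an open handle to the unzipped `.fna` file, with the returned text written to `ecoli_startStop.tsv`.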

Task 1B (`2 pts`)

=> Write Python code (in the same file, `answers.py`) to generate the codon usage profile and save it in a file named `ecoli_codons.tsv`. The `genetic_code` module may help with retrieving amino acids. The file will look like:

```
codon	aminoacid	count
AAA	K	44704
AAC	N	28589
AAG	K	13563
AAT	N	23075
ACA	T	9128
ACC	T	31223
ACG	T	19133
```
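
A rough sketch of the codon counting in Task 1B. The course's `genetic_code` module is not shown in this document, so the sketch builds its own standard codon table (`CODON_TABLE`, a hypothetical stand-in) instead of guessing that module's interface. It assumes each sequence is read in frame from its first base and skips codons containing non-ACGT characters. The toy sequence is made up.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(sequences):
    """Count in-frame codons over all sequences, ignoring ambiguous codons."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Step through complete triplets only; a trailing partial codon is dropped.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def codon_rows(counts):
    """Return (codon, aminoacid, count) rows sorted by codon, zeros included."""
    return [(codon, CODON_TABLE[codon], counts[codon]) for codon in sorted(CODON_TABLE)]


# Toy sequence (made up): ATG AAA AAA TAA
toy_counts = count_codons(["ATGAAAAAATAA"])
for codon, aa, count in codon_rows(toy_counts)[:3]:
    print(f"{codon}\t{aa}\t{count}")
```

Sorting by codon reproduces the row order of the example file (`AAA`, `AAC`, `AAG`, ...). Whether stop codons belong in the counts is a choice the assignment text leaves open; here they are counted and labeled `*`.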

Task 2

=> Use `R` and the `ggplot2` package to generate the following plots. You may either download and use RStudio locally or use [RStudio Cloud](https://rstudio.cloud). Remember to install the `ggplot2` and `dplyr` packages.

=> Save your script as `plots.R`, save your plots, and push them all to GitHub.

Task 2A (`2 pts`)

Generate the plots given in `task2A.pdf` and save them in a single pdf file named `ecoli_startStop.pdf`

The plots basically give:

1) Gene length distribution

2) Gene length distribution filled by start codons

3) Gene length distribution filled by start codons, with both x and y axes in log scale

4) Gene length distribution filled by stop codons

5) Gene length distribution filled by stop codons, with both x and y axes in log scale

6) Gene length distribution for each start (x) and stop (y) codon. Use `facet_grid`

7) Gene length distribution for each start (x) and stop (y) codon log-scaled. Use `facet_grid`

Hints:

* `ggplot(genes) + geom_histogram(aes(…))` is going to be used for every plot in this task.

* Use the following code block to save multiple plots in a single pdf

```
library(ggplot2)

genes <- read.table('ecoli_startStop.tsv', sep='\t', header=TRUE)

# Every plot drawn between pdf() and dev.off() becomes a page of the same file
pdf('ecoli_startStop.pdf')

# 1) Gene length distribution
ggplot(genes) + geom_histogram(aes(…))
# 2) Gene length distribution filled by start codons
ggplot(genes) + geom_histogram(aes(…))
# 3) Filled by start codons, both x and y axes in log scale
ggplot(genes) + geom_histogram(aes(…))
# 4) Gene length distribution filled by stop codons
ggplot(genes) + geom_histogram(aes(…))
# 5) Filled by stop codons, both x and y axes in log scale
ggplot(genes) + geom_histogram(aes(…))
# 6) One panel per start (x) and stop (y) codon. Use facet_grid
ggplot(genes) + geom_histogram(aes(…))
# 7) One panel per start (x) and stop (y) codon, log-scaled. Use facet_grid
ggplot(genes) + geom_histogram(aes(…))

dev.off()
```

Task 2B (`2 pts`)

Generate the codon usage profile plot as given in `task2B.pdf` and save it in a single pdf file named `ecoli_codon.pdf`

Hints:

* Use the `filter` function to remove stop codons by requiring that `aminoacid` is not equal to `*`.

* Use `geom_bar(stat="identity", aes(…))`. The `…` will be filled in with the correct parameters, such as `x`, `y`, and `fill`.

* Use `facet_wrap` to generate a separate bar plot for each amino acid.

Task 3 (`3 pts`)

=> Repeat the analyses from Task 1 and Task 2 for _Thermotoga maritima_. Append the new code for _T. maritima_ to `answers.py` and `plots.R`. Your output pdf files should be named `tmaritima_startStop.pdf` and `tmaritima_codons.pdf`.

The download link is the following:

`ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/545/GCF_000008545.1_ASM854v1/GCF_000008545.1_ASM854v1_cds_from_genomic.fna.gz`

=> Push all the output files to your repo.

=> Compare and contrast the start and stop codon usage, as well as the codon usage profiles, of _E. coli_ and _T. maritima_. Take a close look at the `Arginine (R)` codons. Write one paragraph on what you observe in this file, below.

Your paragraph on codon usage profiles of two organisms:

```

```
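
To put numbers behind the comparison, one option is to convert raw counts into the fraction each codon contributes among the codons for the same amino acid. The sketch below assumes the three-column layout of `ecoli_codons.tsv` described above. The helper names `read_codon_rows` and `relative_usage` are hypothetical, and the counts in the demo are made-up toy values, not real genome data.

```python
import csv
import io


def read_codon_rows(handle):
    """Read (codon, aminoacid, count) rows from a tab-separated file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["codon"], row["aminoacid"], int(row["count"])) for row in reader]


def relative_usage(rows, amino_acid):
    """Return {codon: fraction} among the codons encoding one amino acid."""
    selected = {codon: count for codon, aa, count in rows if aa == amino_acid}
    total = sum(selected.values())
    if total == 0:
        return {codon: 0.0 for codon in selected}
    return {codon: count / total for codon, count in selected.items()}


# Toy table (made-up counts) standing in for a real codon usage file.
toy_tsv = io.StringIO(
    "codon\taminoacid\tcount\n"
    "CGT\tR\t3\n"
    "AGA\tR\t1\n"
    "AAA\tK\t5\n"
)
toy_rows = read_codon_rows(toy_tsv)
arginine = relative_usage(toy_rows, "R")
for codon, fraction in sorted(arginine.items()):
    print(f"{codon}\t{fraction:.2f}")
```

Running the same function on the codon tables of both organisms gives per-codon fractions that can be compared directly, independent of genome size.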
