Assignment 4 — Dynamic Programming Solved

$24.99 $18.99

Many problems in bioinformatics amount to approximate string matching. For example, whenever a new gene is sequenced, searching known genes for close matches allows us to attempt to infer the new gene’s functions. Sequence Alignment Specifically, a gene is a string over the alphabet Σ = {A, C, G, T}, and the closeness of two…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Description

5/5 – (2 votes)

Many problems in bioinformatics amount to approximate string matching. For example, whenever a new gene is sequenced, searching known genes for close matches allows us to attempt to infer the new gene’s functions.

  • Sequence Alignment

Specifically, a gene is a string over the alphabet Σ = {A, C, G, T}, and the closeness of two genes is measured by the extent to which they are aligned. An alignment of two strings, x and y, is an arrangement of their characters in columns, in the same order as they originally appeared, potentially interspersed by spaces.

For example, given x = ACGAT and y = TACGCA, one possible alignment is:

A – – – C G A T

T A C G C – – A

Where the ‘’ character is used to indicate a space. Note that a typical further requirement (and will be a requirement for this assignment) is that no column may contain two spaces.

Another possible alignment is:

– A C G – A T

T A C G C A –

Of these two alignments, the second appears to be “better”, but how can we quantify “better”? The score of an alignment is specified by a (|Σ| + 1) × (|Σ| + 1) scoring matrix, δ. For example, the latter of the above alignments has a score of δ(, T) + δ(A, A) + δ(C, C) + δ(G, G) + δ(, C) + δ(A, A) + δ(T, ).

In your programming language of choice, implement a dynamic programming algorithm to find the highest scoring alignment of two strings.

You may assume that both of the given strings will be non-empty and will contain only the characters ‘A’, ‘C’, ‘G’, or ‘T’. You may also assume that the scoring matrix will always contain exactly 5 rows and 5 columns, each in the order ‘A’, ‘C’, ‘G’, ‘T’, ‘’, will always be symmetric about the major diagonal, and will always provide integer scores.

For example, the above strings, along with a simple scoring matrix, would be represented as:

ACGAT

TACGCA

x A C G T –

A10000

C01000

G00100

T00010

-00000

Since this scoring matrix only rewards exact matches, in this situation, the second of the above alignments, with a score of 4, would indeed be better than the first, with a score of 1 (and, in fact, is the highest scoring alignment overall).

Your program must accept as a command line argument the name of a file containing two strings and a scoring matrix as described above, then print to stdout the highest scoring alignment as follows.

  • The first given string, named x, must be printed above the second given string, named y

  • The columns of the alignment must be space-separated, and ‘’ characters must be used to represent spaces within the aligned strings.

  • After printed the alignment, print the score.

For example:

CSC 349 Assignment 4 — Dynamic Programming Page 2 of 2

$ ./compile.sh

  • ./run.sh test_files/in1.txt x: – A C G – A T

y: T A C G C A –

Score: 4

You may first assume that the highest scoring alignment will be unique. Your program will be tested using diff, as its printed output must match exactly.

  • Submission

The following items must be demonstrated/presented to me in lab on the day of the deadline.

  • Definition, base cases, formula, and solution for a dynamic programming algorithm to find the highest scoring alignment of two strings.

The following files are required and must be pushed to your GitHub Classroom repository by the deadline:

  • compile.sh — A bash script to compile your submission (even if it does nothing).

  • run.sh — A bash script to run your submission.

  • *.py or *.java — Your source code in Python or Java of a working program.

Assignment 4 — Dynamic Programming Solved
$24.99 $18.99