Strings
Excercise 1
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
print "It is", len(seq), "bases long."
bases = ["adenine", "thymine", "cytosine", "guanine"]
print bases[0], seq.count('A')
print bases[1], seq.count('T')
print bases[2], seq.count('C')
print bases[3], seq.count('G')
Result:
Enter a DNA sequence: ATTAC It is 5 bases long. adenine 2 thymine 2 cytosine 1 guanine 0
Exercise 2
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
seq = seq.upper()
print "It is", len(seq), "bases long."
bases = ["adenine", "thymine", "cytosine", "guanine"]
print bases[0], seq.count('A')
print bases[1], seq.count('T')
print bases[2], seq.count('C')
print bases[3], seq.count('G')
Result:
Enter a DNA sequence: ATTgtc It is 6 bases long. adenine 1 thymine 3 cytosine 1 guanine 1
Exercise 3
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
seq = seq.upper()
print "It is", len(seq), "bases long."
bases = ["adenine", "thymine", "cytosine", "guanine"]
print bases[0], seq.count('A')
print bases[1], seq.count('T')
print bases[2], seq.count('C')
print bases[3], seq.count('G')
unknown = len(seq) - seq.count('A') - seq.count('T') - seq.count('C') - seq.count('G')
print "unknown", str(unknown)
Result:
Enter a DNA sequence: ATTU*gtc It is 8 bases long. adenine 1 thymine 3 cytosine 1 guanine 1 unknown 2
'for' loop
Exercise 1
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
for i in range(10):
print i, seq
Result:
Enter a DNA sequence: TACG 0 TACG 1 TACG 2 TACG 3 TACG 4 TACG 5 TACG 6 TACG 7 TACG 8 TACG 9 TACG
Exercise 2
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
n = len(seq)
for i in range(n):
print 'base', i, 'is', seq[i]
Result:
Enter a DNA sequence: GTTCAG base 0 is G base 1 is T base 2 is T base 3 is C base 4 is A base 5 is G
Exercise 3
#!/usr/bin/env python
restriction_sites = [
"GAATTC", # EcoRI
"GGATCC", # BamHI
"AAGCTT", # HindIII
]
for site in restriction_sites:
print site
Result:
GAATTC GGATCC AAGCTT
Exercise 4
#!/usr/bin/env python
site2name = {
"GAATTC": "EcoRI",
"GGATCC": "BamHI",
"AAGCTT": "HindIII"
}
seq = raw_input("Enter your sequence: ")
for site in site2name.keys():
if site in seq:
print site2name[site], "(", site, ")", "is present"
else:
print site2name[site], "(", site, ")", "is not present"
Result:
Enter your sequence: GAATTCTTTTT EcoRI ( GAATTC ) is present BamHI ( GGATCC ) is not present HindIII ( AAGCTT ) is not present
'if' loop
Exercise 1
#!/usr/bin/env python
seq = raw_input("Enter your sequence: ")
bases = "ACGT"
for b in bases:
if b in seq:
print b, 'count:', seq.count(b)
Result:
Enter your sequence: ACCAGGCA A count: 3 C count: 3 G count: 2
Reading files
Exercise 1
#!/usr/bin/env python
fn = './10_sequences.seq'
f = open(fn,'r')
lines = f.readlines()
for i,l in enumerate(lines):
print i+1, l.rstrip()
Result:
1 CCTGTATTAGCAGCAGATTCGATTAGCTTTACAACAATTCAATAAAATAGCTTCGCGCTAA 2 ATTTTTAACTTTTCTCTGTCGTCGCACAATCGACTTTCTCTGTTTTCTTGGGTTTACCGGAA 3 TTGTTTCTGCTGCGATGAGGTATTGCTCGTCAGCCTGAGGCTGAAAATAAAATCCGTGGT 4 CACACCCAATAAGTTAGAGAGAGTACTTTGACTTGGAGCTGGAGGAATTTGACATAGTCGAT 5 TCTTCTCCAAGACGCATCCACGTGAACCGTTGTAACTATGTTCTGTGC 6 CCACACCAAAAAAACTTTCCACGTGAACCGAAAACGAAAGTCTTTGGTTTTAATCAATAA 7 GTGCTCTCTTCTCGGAGAGAGAAGGTGGGCTGCTTGTCTGCCGATGTACTTTATTAAATCCAATAA 8 CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA 9 TCTGAAAAGTGCAAAGAACGATGATGATGATGATAGAGGAACCTGAGCAGCCATGTCTGAACCTATAGC 10 GTATTGGTCGTCGTGCGACTAAATTAGGTAAAAAAGTAGTTCTAAGAGATTTTGATGATTCAATGCAAAGTTCTATTAATCGTTCAATTG
Exercise 2
#!/usr/bin/env python
f = open('./10_sequences.seq','r')
lines = f.readlines()
motif = 'CTATA'
for i,l in enumerate(lines):
pos = l.find(motif)
if pos != -1:
print motif, 'is found in sequence', i+1, 'Position', pos, '(0 based)'
print i+1, l.rstrip()
Result:
CTATA is found in sequence 8 Position 29 (0 based) 8 CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA CTATA is found in sequence 9 Position 62 (0 based) 9 TCTGAAAAGTGCAAAGAACGATGATGATGATGATAGAGGAACCTGAGCAGCCATGTCTGAACCTATAGC
Advanced exercises
Exercise 1: sum up, csv file
#!/usr/bin/env python
f = open('./numbers1.csv','r')
lines = f.readlines()
for l in lines:
numbers = l.rstrip().split(',')
numbers = [ int(i) for i in numbers ]
row_sum = sum(numbers)
print row_sum
Result:
716 1139 11 1707 1516
Exercise 2: sum up
#!/usr/bin/env python
f = open('./numbers1.csv','r')
lines = f.readlines()
total = 0
for l in lines:
numbers = l.rstrip().split(',')
numbers = [ int(i) for i in numbers ]
row_sum = sum(numbers)
print row_sum
total += row_sum
print 'Total:', total
Result:
716 1139 11 1707 1516 Total: 5089
Exercise 3: sum up, csv file with quotes
#!/usr/bin/env python
f = open('./numbers2.csv','r')
lines = f.readlines()
total = 0
for l in lines:
numbers = l.rstrip().split(',')
numbers = [ i.strip() for i in numbers ] # remove space
numbers = [ i.strip('"') for i in numbers ] # remove quotes
numbers = [ int(i) for i in numbers ]
row_sum = sum(numbers)
print row_sum
total += row_sum
print 'Total:', total
Result:
716 1139 11 1707 1516 Total: 5089
Exercise 4: parse FASTA
#!/usr/bin/env python
f = open('./sample1.fa','r')
lines = f.readlines()
for l in lines:
l = l.rstrip()
if not l:
print 'Header', h, 'refer to sequence', s
elif l[0] == '>':
h = l[1:]
s = ''
else:
s += l
Result:
Header YL069W-1.334 refer to sequence CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATACTGTTCTTCTACCCACCATATTGAAACGCTAACAA Header YAL068C-7235.2170 refer to sequence TACGAGAATAATTTCTCATCATCCAGCTTTAACACAAAATTCGCACAGTTTTCGTTAAGAGAACTTAACATTTTCTTATGACGTAAATGAAGTTTATATATAAATTTCCTTTTTATTGGATACATTACGTGCAACCAAAAGTGTAAAATGATTGGTTGCAATGTTTCACCTAAATTACTT Header YAL070W-223.3355 refer to sequence CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTTACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAACCACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATCTAATATGCCT
Exercise 5: FASTA file with blank lines
#!/usr/bin/env python
f = open('./sample2.fa','r')
lines = f.readlines()
fasta = dict()
for l in lines:
l = l.rstrip()
if not l:
next
elif l[0] == '>':
header = l[1:]
fasta[header] = ''
else:
fasta[header] += l
for h,s in fasta.items():
print 'Header', h, 'refer to sequence', s
Result:
Header YAL070W-223.3355 refer to sequence CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTTACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAACCACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATCTAATATGCCT Header YL069W-1.334 refer to sequence CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATACTGTTCTTCTACCCACCATATTGAAACGCTAACAA Header YAL068C-7235.2170 refer to sequence TACGAGAATAATTTCTCATCATCCAGCTTTAACACAAAATTCGCACAGTTTTCGTTAAGAGAACTTAACATTTTCTTATGACGTAAATGAAGTTTATATATAAATTTCCTTTTTATTGGATACATTACGTGCAACCAAAAGTGTAAAATGATTGGTTGCAATGTTTCACCTAAATTACTT
Hide Comments