Strings
Exercise 1
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
print "It is", len(seq), "bases long."
bases = ["adenine", "thymine", "cytosine", "guanine"]
print bases[0], seq.count('A')
print bases[1], seq.count('T')
print bases[2], seq.count('C')
print bases[3], seq.count('G')
Result:
Enter a DNA sequence: ATTAC
It is 5 bases long.
adenine 2
thymine 2
cytosine 1
guanine 0
Exercise 2
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
seq = seq.upper()
print "It is", len(seq), "bases long."
bases = ["adenine", "thymine", "cytosine", "guanine"]
print bases[0], seq.count('A')
print bases[1], seq.count('T')
print bases[2], seq.count('C')
print bases[3], seq.count('G')
Result:
Enter a DNA sequence: ATTgtc
It is 6 bases long.
adenine 1
thymine 3
cytosine 1
guanine 1
Exercise 3
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
seq = seq.upper()
print "It is", len(seq), "bases long."
bases = ["adenine", "thymine", "cytosine", "guanine"]
print bases[0], seq.count('A')
print bases[1], seq.count('T')
print bases[2], seq.count('C')
print bases[3], seq.count('G')
unknown = len(seq) - seq.count('A') - seq.count('T') - seq.count('C') - seq.count('G')
print "unknown", str(unknown)
Result:
Enter a DNA sequence: ATTU*gtc
It is 8 bases long.
adenine 1
thymine 3
cytosine 1
guanine 1
unknown 2
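The solution above scans the sequence once per base. As a possible alternative, the sketch below tallies every character in a single pass with collections.Counter. It is written in Python 3 syntax (an assumption; the solutions on this page use Python 2), and the function name count_bases is hypothetical.

```python
from collections import Counter

def count_bases(seq):
    """Return counts for A, T, C, G plus an 'unknown' tally for anything else."""
    counts = Counter(seq.upper())            # one pass over the sequence
    known = {base: counts.get(base, 0) for base in "ATCG"}
    # Everything that is not A, T, C or G is counted as unknown.
    known["unknown"] = sum(counts.values()) - sum(known.values())
    return known

print(count_bases("ATTU*gtc"))
# {'A': 1, 'T': 3, 'C': 1, 'G': 1, 'unknown': 2}
```

For the input ATTU*gtc this gives the same numbers as the result listed for Exercise 3.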
'for' loop
Exercise 1
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
for i in range(10):
    print i, seq
Result:
Enter a DNA sequence: TACG
0 TACG
1 TACG
2 TACG
3 TACG
4 TACG
5 TACG
6 TACG
7 TACG
8 TACG
9 TACG
Exercise 2
#!/usr/bin/env python
seq = raw_input("Enter a DNA sequence: ")
n = len(seq)
for i in range(n):
    print 'base', i, 'is', seq[i]
Result:
Enter a DNA sequence: GTTCAG
base 0 is G
base 1 is T
base 2 is T
base 3 is C
base 4 is A
base 5 is G
Exercise 3
#!/usr/bin/env python
restriction_sites = [
    "GAATTC", # EcoRI
    "GGATCC", # BamHI
    "AAGCTT", # HindIII
]
for site in restriction_sites:
    print site
Result:
GAATTC
GGATCC
AAGCTT
Exercise 4
#!/usr/bin/env python
site2name = {
    "GAATTC": "EcoRI",
    "GGATCC": "BamHI",
    "AAGCTT": "HindIII"
}
seq = raw_input("Enter your sequence: ")
for site in site2name.keys():
    if site in seq:
        print site2name[site], "(", site, ")", "is present"
    else:
        print site2name[site], "(", site, ")", "is not present"
Result:
Enter your sequence: GAATTCTTTTT
EcoRI ( GAATTC ) is present
BamHI ( GGATCC ) is not present
HindIII ( AAGCTT ) is not present
'if' statement
Exercise 1
#!/usr/bin/env python
seq = raw_input("Enter your sequence: ")
bases = "ACGT"
for b in bases:
    if b in seq:
        print b, 'count:', seq.count(b)
Result:
Enter your sequence: ACCAGGCA
A count: 3
C count: 3
G count: 2
Reading files
Exercise 1
#!/usr/bin/env python
fn = './10_sequences.seq'
f = open(fn,'r')
lines = f.readlines()
for i,l in enumerate(lines):
    print i+1, l.rstrip()
Result:
1 CCTGTATTAGCAGCAGATTCGATTAGCTTTACAACAATTCAATAAAATAGCTTCGCGCTAA
2 ATTTTTAACTTTTCTCTGTCGTCGCACAATCGACTTTCTCTGTTTTCTTGGGTTTACCGGAA
3 TTGTTTCTGCTGCGATGAGGTATTGCTCGTCAGCCTGAGGCTGAAAATAAAATCCGTGGT
4 CACACCCAATAAGTTAGAGAGAGTACTTTGACTTGGAGCTGGAGGAATTTGACATAGTCGAT
5 TCTTCTCCAAGACGCATCCACGTGAACCGTTGTAACTATGTTCTGTGC
6 CCACACCAAAAAAACTTTCCACGTGAACCGAAAACGAAAGTCTTTGGTTTTAATCAATAA
7 GTGCTCTCTTCTCGGAGAGAGAAGGTGGGCTGCTTGTCTGCCGATGTACTTTATTAAATCCAATAA
8 CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA
9 TCTGAAAAGTGCAAAGAACGATGATGATGATGATAGAGGAACCTGAGCAGCCATGTCTGAACCTATAGC
10 GTATTGGTCGTCGTGCGACTAAATTAGGTAAAAAAGTAGTTCTAAGAGATTTTGATGATTCAATGCAAAGTTCTATTAATCGTTCAATTG
Exercise 2
#!/usr/bin/env python
f = open('./10_sequences.seq','r')
lines = f.readlines()
motif = 'CTATA'
for i,l in enumerate(lines):
    pos = l.find(motif)
    if pos != -1:
        print motif, 'is found in sequence', i+1, 'Position', pos, '(0 based)'
        print i+1, l.rstrip()
Result:
CTATA is found in sequence 8 Position 29 (0 based)
8 CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA
CTATA is found in sequence 9 Position 62 (0 based)
9 TCTGAAAAGTGCAAAGAACGATGATGATGATGATAGAGGAACCTGAGCAGCCATGTCTGAACCTATAGC
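The solution above reports only the first position of the motif in each sequence, because str.find stops at the first match. A minimal sketch of one way to collect every position, including overlapping ones, is to call find repeatedly and restart the search one character after each hit. It uses Python 3 syntax (an assumption; the solutions on this page use Python 2), and the helper name find_all is hypothetical.

```python
def find_all(seq, motif):
    """Return the 0-based start positions of every occurrence of motif in seq."""
    positions = []
    pos = seq.find(motif)
    while pos != -1:
        positions.append(pos)
        # Restart one character later so overlapping matches are also found.
        pos = seq.find(motif, pos + 1)
    return positions

print(find_all("CTATATA", "TATA"))  # [1, 3]
```

An empty list plays the role of the -1 returned by find when the motif is absent.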
Advanced exercises
Exercise 1: sum up, csv file
#!/usr/bin/env python
f = open('./numbers1.csv','r')
lines = f.readlines()
for l in lines:
    numbers = l.rstrip().split(',')
    numbers = [ int(i) for i in numbers ]
    row_sum = sum(numbers)
    print row_sum
Result:
716
1139
11
1707
1516
Exercise 2: sum up
#!/usr/bin/env python
f = open('./numbers1.csv','r')
lines = f.readlines()
total = 0
for l in lines:
    numbers = l.rstrip().split(',')
    numbers = [ int(i) for i in numbers ]
    row_sum = sum(numbers)
    print row_sum
    total += row_sum
print 'Total:', total
Result:
716
1139
11
1707
1516
Total: 5089
Exercise 3: sum up, csv file with quotes
#!/usr/bin/env python
f = open('./numbers2.csv','r')
lines = f.readlines()
total = 0
for l in lines:
    numbers = l.rstrip().split(',')
    numbers = [ i.strip() for i in numbers ] # remove space
    numbers = [ i.strip('"') for i in numbers ] # remove quotes
    numbers = [ int(i) for i in numbers ]
    row_sum = sum(numbers)
    print row_sum
    total += row_sum
print 'Total:', total
Result:
716
1139
11
1707
1516
Total: 5089
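Stripping spaces and quotes by hand works for this file. As an alternative, the standard csv module can handle quoted fields itself. The sketch below uses Python 3 syntax (an assumption; the solutions on this page use Python 2) and reads from an in-memory string whose contents are invented for illustration, since the real contents of numbers2.csv are not shown here. The function name row_sums is hypothetical.

```python
import csv
import io

def row_sums(handle):
    """Return the sum of each non-empty row of a CSV file-like object."""
    # skipinitialspace=True ignores the space that follows a comma,
    # and the reader removes the surrounding double quotes itself.
    reader = csv.reader(handle, skipinitialspace=True)
    return [sum(int(field) for field in row) for row in reader if row]

# Hypothetical sample data, not the contents of numbers2.csv.
sample = io.StringIO('"1", "2", 3\n"10","20","30"\n')
sums = row_sums(sample)
print(sums)                 # [6, 60]
print('Total:', sum(sums))  # Total: 66
```

With a real file, the handle would come from open('./numbers2.csv', newline='') instead of io.StringIO.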
Exercise 4: parse FASTA
#!/usr/bin/env python
# Note: a record is printed only when a blank line is reached, so this
# relies on every record in the file being followed by a blank line.
f = open('./sample1.fa','r')
lines = f.readlines()
for l in lines:
    l = l.rstrip()
    if not l:
        print 'Header', h, 'refer to sequence', s
    elif l[0] == '>':
        h = l[1:]
        s = ''
    else:
        s += l
Result:
Header YL069W-1.334 refer to sequence CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATACTGTTCTTCTACCCACCATATTGAAACGCTAACAA
Header YAL068C-7235.2170 refer to sequence TACGAGAATAATTTCTCATCATCCAGCTTTAACACAAAATTCGCACAGTTTTCGTTAAGAGAACTTAACATTTTCTTATGACGTAAATGAAGTTTATATATAAATTTCCTTTTTATTGGATACATTACGTGCAACCAAAAGTGTAAAATGATTGGTTGCAATGTTTCACCTAAATTACTT
Header YAL070W-223.3355 refer to sequence CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTTACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAACCACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATCTAATATGCCT
Exercise 5: FASTA file with blank lines
#!/usr/bin/env python
f = open('./sample2.fa','r')
lines = f.readlines()
fasta = dict()
for l in lines:
    l = l.rstrip()
    if not l:
        continue # skip blank lines
    elif l[0] == '>':
        header = l[1:]
        fasta[header] = ''
    else:
        fasta[header] += l
for h,s in fasta.items():
    print 'Header', h, 'refer to sequence', s
Result:
Header YAL070W-223.3355 refer to sequence CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTTACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAACCACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATCTAATATGCCT
Header YL069W-1.334 refer to sequence CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATACTGTTCTTCTACCCACCATATTGAAACGCTAACAA
Header YAL068C-7235.2170 refer to sequence TACGAGAATAATTTCTCATCATCCAGCTTTAACACAAAATTCGCACAGTTTTCGTTAAGAGAACTTAACATTTTCTTATGACGTAAATGAAGTTTATATATAAATTTCCTTTTTATTGGATACATTACGTGCAACCAAAAGTGTAAAATGATTGGTTGCAATGTTTCACCTAAATTACTT
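The records above are not printed in file order because a Python 2 dict does not keep insertion order. The sketch below wraps the same dictionary approach in a function, using Python 3 syntax (an assumption; the solutions on this page use Python 2), where dicts keep insertion order from version 3.7 on. The function name parse_fasta and the sample records are hypothetical.

```python
def parse_fasta(lines):
    """Map each FASTA header to its sequence, ignoring blank lines."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # blank lines carry no data
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:          # skip sequence text before any header
            records[header] += line
    return records

# Hypothetical sample input, not the contents of sample2.fa.
sample = [">seq1", "ACGT", "", "TTAA", ">seq2", "GG"]
for h, s in parse_fasta(sample).items():
    print('Header', h, 'refer to sequence', s)
```

A file handle can be passed directly as lines, since iterating over an open file yields one line at a time.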