Bioinformatics Tools from Bioinformatics Analysis of Macromolecules class


  • Make sure you are not putting a nucleotide sequence into a protein query, and vice-versa.
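
A quick way to catch this mistake before submitting a query is to look at the letter composition of the sequence. The sketch below is only a rough heuristic: the function name `guess_sequence_type` and the 0.9 cutoff are assumptions made for illustration, not part of BLAST or any other tool in these notes.

```python
# Hypothetical helper: guesses whether a pasted sequence is nucleotide or protein.
# The letter set and the 0.9 threshold are illustrative assumptions.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Return 'nucleotide' or 'protein' based on letter composition."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if nucleotide_fraction >= threshold else "protein"


print(guess_sequence_type("ATGGCCATTGTAATGGGCCGC"))
print(guess_sequence_type("MNRGVPFRHLLLVLQLALLPAATQG"))
```

A very short peptide made up only of A, C, G, T, and N would be misclassified, so treat the answer as a sanity check rather than a guarantee.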


Intr - internal

Term - terminal

Prom - promoter

Sngl - single

Fgenesh, Animal Version - Ab initio gene prediction (Softberry)




Chimeric Sequences

t(9;22)(q34;q11): chromosomes 9 and 22 are affected, at bands q34 and q11 respectively.

Comparison of Fgenesh (softberry) and Genscan.


NG_011759 Both returned 15 exons total, with 11 on one strand and 4 on the other.

EU445484 Genscan - 7 exons. Softberry - 6 exons.

AF083883 Both returned a result of 3 total exons.

Y16787 GenScan returned a result of 6 exons, while Fgenesh returned a result of 8 exons. In reality there are 7 exons total. Fgenesh (Softberry) had results that were superior to GenScan.

Notes from prof and students

Use both in the lab practical. BLAST putative exons for quality control. GenScan usually had the exons in the right location. Fgenesh was generally wrong in this regard. Fgenesh was more sensitive in one out of the 4 cases.

Bacterial Genomes

When accessing a particular region at NCBI search for the ID number of the gene, then click on “change region shown”.


  • don’t forget to select the given organism

Notes: Tu/Op = transcription units/operons


GeneMark (preferred method)

  • Note, GeneMark not only creates a prediction, but gives alternative predictions and their likelihood.

MRSA sequence NC_002952: GeneMark accurately predicted two of the three genes, which was closer than either of the other two programs. It was agreed in class that GeneMark is the superior overall program.

NC_012578: GeneMark and Softberry both analyzed the sequence and predicted accurately; however, Softberry requires the selection of a particular organism -_-

NC_002505 - GeneMark lined up the numbers perfectly! Softberry was close.

Secondary Structure Prediction (for proteins)

alpha helix

beta strand

coils, turns, disorganized regions

With regard to CD4, all three structure prediction programs on the Japanese website turned out to be utterly useless. All three predicted an abundance of alpha helices, yet the crystallized structure contained none. I feel like I am spinning chicken bones to try and predict the harvest with these useless programs. (Note: when I used these programs during my practice practical for a general bioinformatics class, they turned out to be more useful.) is far superior to the three protein prediction methods on the Japanese website.

Use Cn3D to view the 3D protein structure. Don’t forget Style - Coloring Shortcuts - Secondary Structure

Protein Location Prediction Tools

PSORT II - Protein Sorting


Protcomp - Protein Compartment

Between Protcomp and PSORT II, Protcomp accurately predicted the location of CD4 (plasma membrane); PSORT II did not.

Protcomp accurately predicted the location of all 3 isoforms of CD4. It said that all three were located in the plasma membrane.

Accession Number / Protein Location

  • mitochondrial protein
  • cytoplasmic protein
  • plasma membrane - Not Accurate
  • nuclear protein - Not Accurate
  • cytoplasmic protein
  • extracellular protein - Not Accurate
  • endoplasmic reticulum
  • extracellular protein

Note: Protcomp was inaccurate for NP_000509 at first. However, when the sequence was copy/pasted into the query in FASTA format, instead of the version at the bottom of the protein database record that includes numbered lines etc., the prediction came back accurate.
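
If only the numbered version of a sequence is at hand, it can be cleaned up before pasting. The following is a minimal sketch, assuming the numbered block contains nothing but position numbers, spaces, and residue letters; the function name `to_fasta` and the header text are made up for illustration, and the sequence shown is just an illustrative fragment.

```python
import textwrap


def to_fasta(raw_sequence, header, width=60):
    """Strip line numbers and spacing from a numbered sequence block and return FASTA text."""
    # Keep letters only: this drops position numbers, spaces, and line breaks.
    residues = "".join(c for c in raw_sequence if c.isalpha()).upper()
    body = "\n".join(textwrap.wrap(residues, width))
    return f">{header}\n{body}"


# Illustrative fragment in the numbered layout shown at the bottom of a protein record.
raw = """
        1 mvhltpeeks avtalwgkvn vdevggealg rllvvypwtq rffesfgdls tpdavmgnpk
       61 vkahgkkvlg afsdglahld
"""
print(to_fasta(raw, "example_protein"))
```

The output is a header line followed by the residues in upper case, wrapped at 60 characters per line.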

Conclusion for Protein Prediction Location

Both programs have similar degrees of accuracy. Use both programs to predict protein locations and compare the predictions.

Discrepancy During Lab Practical

Make note of each program's prediction and interpret the results rationally. One prediction may need to be discarded, or there may be a close tie between two predictions. Also remember that protein locations are not static within the cell: proteins move from one location to another and can often be found in multiple locations.
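
Keeping the two sets of predictions side by side makes the discrepancies easy to spot. Here is a small sketch of that bookkeeping; the function name `compare_predictions`, the protein labels, and the example predictions are all hypothetical and are not results from these notes.

```python
def compare_predictions(first, second):
    """Pair up two programs' location predictions and label each pair."""
    report = {}
    for protein in sorted(set(first) | set(second)):
        a = first.get(protein)
        b = second.get(protein)
        if a is None or b is None:
            status = "missing"      # only one program returned a prediction
        elif a.strip().lower() == b.strip().lower():
            status = "agree"
        else:
            status = "discrepancy"  # needs manual interpretation
        report[protein] = (a, b, status)
    return report


# Made-up example data for two programs.
program_one = {"protein_1": "plasma membrane", "protein_2": "nuclear"}
program_two = {"protein_1": "Plasma membrane", "protein_2": "cytoplasmic"}

for protein, (a, b, status) in compare_predictions(program_one, program_two).items():
    print(protein, a, b, status)
```

The code only flags where the programs disagree; deciding which prediction to trust still takes the judgment described above.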

In Professional Setting

Follow the steps during the Lab Practical discrepancy, as described above. Additionally, it may be a good idea to search the professional databases (such as NCBI) and infer the location of this protein based on similar proteins.

Transmembrane Proteins Prediction

(When comparing to NCBI ctrl F for “transmembrane region”)

Sosui - (may require Firefox.)

On Windows, go to the Control Panel. You may need to allow the Sosui website and update Java.

On a Mac, click the Security tab in Firefox, edit the site list, and add Sosui.

TM Predict


Helical Wheel program - A great complement to the other three programs: a section of a protein sequence thought to be a helical structure can be pasted into this program for analysis. (NOTE: This program assumes a priori that any sequence copy/pasted into the query box is a helical region.)
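
The idea behind a helical wheel can be sketched in a few lines. An ideal alpha helix has 3.6 residues per turn, so each residue sits 100 degrees further around the helix axis than the one before it. The code below is not the Helical Wheel program itself: the function name, the hydrophobic residue grouping, and the example segment are assumptions for illustration. Like the program, it simply assumes the segment is helical.

```python
# Assumption: one simple grouping of hydrophobic residues; other scales exist.
HYDROPHOBIC = set("AILMFVWC")


def helical_wheel(segment, degrees_per_residue=100.0):
    """Return (position, residue, angle in degrees, is_hydrophobic) for each residue."""
    wheel = []
    for index, residue in enumerate(segment.upper()):
        angle = (index * degrees_per_residue) % 360
        wheel.append((index + 1, residue, angle, residue in HYDROPHOBIC))
    return wheel


# Made-up segment, not taken from any protein in these notes.
for position, residue, angle, hydrophobic in helical_wheel("LKELLKKLLEELLKK"):
    marker = "hydrophobic" if hydrophobic else ""
    print(f"{position:2d} {residue} {angle:5.0f} {marker}")
```

If the hydrophobic residues cluster within one range of angles, the helix has a hydrophobic face, which is the pattern a helical wheel plot is meant to reveal.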

Main site for the Helical Wheel lab






Human glycophorin A


Vitamin K epoxide reductase


Bovine rhodopsin


Prediction somewhat off

Prediction was very accurate

Nearly identical to NCBI flat file

TMPredict seemed to be the clearest and most accurate tool, with TMHMM2 a close second.

Gorilla ABC-transporter


CFTR protein


Class verdict - all three programs were pretty accurate and performed with minimal error.

Protein Structural Prediction

Predicts signal to go to the ER and associated cleavage site:

SignalP

SignalP explanation

Disulfide Bonds (Cysteine - Cysteine bonds) (See also the Prosite Tool under Protein Signatures)


  • Used to predict the presence of disulfide bonds in a protein.

DiANNA (not used in BAM class) - A tool for predicting disulfide bonds.

Predict coiled-coils



COILS/ PCOILS (not used in BAM class)

  • Predicts coiled coils and compares the prediction to a database of known sequences.

Repetitive Elements Search

Nucleotide BLAST against the dbALU database. Finds Alu repetitive elements in your nucleotide sequence query.

Leucine Zipper

2zip Server

Protein Signatures

Scan Prosite Tool - By default the tool excludes motifs with a high probability of occurrence (this is a good thing).
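
To see why some motifs occur so often, it helps to scan for one by hand. The sketch below converts a PROSITE-style pattern to a regular expression and searches a sequence with it. It is not how the Scan Prosite tool is implemented; the function names and the test sequence are made up, and the converter covers only the basic syntax (`x`, `[...]`, `{...}`, repeat counts, and `<` or `>` anchors at the ends of a pattern). The pattern used is the N-glycosylation site pattern, N-{P}-[ST]-{P}, which is short enough to match by chance in many proteins.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern to Python regular expression syntax."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} means "anything but P"
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) means 2 to 4 repeats
    regex = regex.replace("x", ".")                     # x means any residue
    regex = regex.replace("<", "^").replace(">", "$")   # sequence start and end anchors
    return regex


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# Made-up test sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("MKNGSAANPTQNVTL", "N-{P}-[ST]-{P}"))
```

The lookahead in `find_motif` is there so that overlapping matches are all reported, which a plain `finditer` would skip.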

Fingerprint Scan - A tool that scans for motif fingerprints.

Nucleotide Signatures

Search for Human Promoters (nucleic acid analysis) - A threshold of 0.8 is used by default to avoid false positives (this is a good thing).

CpG - Searches for CpG islands.
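
For a sense of what such a search measures, here is a minimal sketch of the two statistics usually computed for a candidate region: GC content and the observed/expected CpG ratio. The cutoffs used (at least 200 bp, GC content above 50%, ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria. Whether the CpG tool in these notes uses the same cutoffs is an assumption, and the function names are made up.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    window = window.upper()
    length = len(window)
    c_count = window.count("C")
    g_count = window.count("G")
    cpg_count = window.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpG count if C and G were placed independently along the window.
    expected = c_count * g_count / length if length else 0.0
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(window, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the length, GC content, and observed/expected cutoffs to one window."""
    gc_fraction, ratio = cpg_stats(window)
    return len(window) >= min_length and gc_fraction > min_gc and ratio > min_ratio


# Artificial windows, chosen only to show the two extremes.
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

A real search would slide a window along the sequence and merge neighboring windows that pass; this sketch only scores a single window.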

Searching with Putative Chimeric Proteins

BLASTP against the RefSeq database

Miscellaneous Bioinformatics Links

3D Visualization and Prediction



CN3D Exercise

α, β, α/β, α+β, and a transmembrane protein


Inferred Class

Actual Class

1rnb, G-specific endonuclease

1cd8, CD8 molecule

3hhb, hemoglobin

1kzu, light-harvesting complex

1a50, tryptophan synthase β subunit

Phylogenetics Analysis

MEGA - MEGA is a program that I have found to be exceptionally useful for phylogenetics analysis. When I took Evolutionary Biology in 2013, MEGA was the main tool that I used for my phylogenetics project. My project involved comparing 5 different genes between marsupials and placentals, and paying special attention to counterparts in each group.

The program included multiple different algorithms and made the analysis very easy. For that reason, I am very proud to recommend this program to others.

NOTE!!! Be sure to be very careful that everything is aligned properly. When I first made a phylogenetic tree, everything was aligned improperly, and I got bizarre results. My professor, Dr. B, was a very good professor and took the time to show me where I went wrong, and helped me to fix it. If you get bizarre results, chances are, the alignment is off.
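
A couple of simple checks can catch an obviously broken alignment before building a tree. This sketch is not part of MEGA: the function name, the 50% gap cutoff, and the example sequences are assumptions for illustration. It flags sequences whose aligned length differs from the rest and sequences that are mostly gaps.

```python
from collections import Counter


def alignment_problems(aligned, max_gap_fraction=0.5):
    """Return a list of warnings for a dict of name -> aligned sequence."""
    problems = []
    if not aligned:
        return problems
    # In a proper alignment every row has the same length.
    common_length, _ = Counter(len(seq) for seq in aligned.values()).most_common(1)[0]
    for name, seq in aligned.items():
        if len(seq) != common_length:
            problems.append(f"{name}: length {len(seq)}, expected {common_length}")
        elif seq and seq.count("-") / len(seq) > max_gap_fraction:
            problems.append(f"{name}: more than {max_gap_fraction:.0%} gaps")
    return problems


# Made-up aligned sequences.
example = {"species_a": "ATG-CA", "species_b": "ATGGCA", "species_c": "ATGGC"}
for warning in alignment_problems(example):
    print(warning)
```

Passing these checks does not mean the alignment is biologically sensible; it still needs to be inspected by eye, as described above.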