• Bioinformatics is the application of
information technology to the field of molecular biology.
• Bioinformatics involves the collection,
organization and analysis of large amounts of biological data using networks of
computers and databases.
• The primary goal of bioinformatics is to
increase our understanding of biological processes.
• It is a science of managing and analyzing
biological data using advanced computing techniques.
• Common activities in bioinformatics
include mapping and analyzing DNA and protein sequences, and aligning different DNA
and protein sequences to compare them.
• Applications of bioinformatics include
gene mapping, sequence analysis and measuring biodiversity.
Summary of Bioinformatics
• PIR (Protein Information Resource) maintains a protein sequence database
that contains almost three lakh (about 300,000) sequences covering the entire taxonomic range.
• A biological database stores
various kinds of biological data.
• SWISS-PROT is one of the most popular
protein sequence resources because of the high quality of its annotated entries.
SWISS-PROT contains 70,000 entries from more than 5,000 different species.
• TrEMBL (Translated European Molecular
Biology Laboratory) is a protein sequence database. A special feature of TrEMBL is that it contains
translations of all coding sequences (CDS) from the EMBL Nucleotide Sequence Database.
• Composite databases combine a variety of
different primary sources and are hence efficient to search, because one query covers several sources at once.
• In secondary databases homologous
sequences may be gathered together in multiple alignments.
• The EMBL Nucleotide Sequence Database is
a comprehensive database of DNA and RNA sequences collected from the scientific
literature and patent applications.
• DDBJ (DNA Data Bank of Japan) data are produced, maintained and
distributed at the National Institute of Genetics in Japan.
• GenBank is another DNA sequence database; it
incorporates sequences from publicly available sources.
• Specialized resources focus on
species-specific genomics or on particular sequencing techniques.
• UniGene represents genes from many
organisms; each cluster relates to a unique gene and includes related
information about that gene.
• The term DNA sequencing refers to methods
for determining the order of the nucleotide bases adenine (A), thymine (T), guanine
(G) and cytosine (C) in a DNA molecule.
• Pairwise sequence alignment compares only
two sequences at a time, but it is very efficient at revealing
the similarities between them.
• A large part of currently available DNA
sequence data is made up of partial sequences called expressed sequence tags
(ESTs), which are short sequences read from cDNA clones.
• The most basic method of comparing two
sequences is a visual approach known as a dot plot. A dot plot is a two-dimensional
matrix with one sequence along each axis and a dot wherever the residues match.
• Within a dot plot, two identical sequences
are characterized by a single unbroken diagonal line across the plot.
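The dot plot idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation. The function names `dot_plot` and `render` are hypothetical, and the optional `window`/`threshold` filter (a dot is drawn only when at least `threshold` of the `window` residues starting at that cell match) is an added assumption showing how the noise from single-letter matches is commonly suppressed.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Cell [i][j] is True when at least `threshold` of the `window` residues
    starting at seq1[i] and seq2[j] are identical (window=1 is a plain identity plot)."""
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    return [
        [sum(seq1[i + k] == seq2[j + k] for k in range(window)) >= threshold
         for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix: list[list[bool]]) -> str:
    """Draw the plot as text: '*' marks a dot, '.' an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # Comparing a sequence with itself, using a 3-residue window that must match exactly
    print(render(dot_plot("GATTACA", "GATTACA", window=3, threshold=3)))
```

Run as shown, it prints a 5 × 5 grid with stars only on the main diagonal, because every 3-letter word in GATTACA is distinct. With the default `window=1`, the repeated A and T residues add scattered off-diagonal dots around that diagonal.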
• Global alignments attempt to align every
residue in every sequence and they are most useful when the sequences in the
query set are similar and of roughly equal size.
• Local alignment searches for regions of
local similarity and need not include the entire length of the sequences.
• The Needleman-Wunsch algorithm is used
for computing a global alignment between two sequences, and it is based on
dynamic programming.
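A minimal sketch of the Needleman-Wunsch recurrence follows. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) and the function name are illustrative assumptions; real protein alignments normally use a substitution matrix such as BLOSUM and affine gap penalties. `F[i][j]` holds the best score for aligning the first `i` letters of one sequence with the first `j` letters of the other, and the traceback starts from the bottom-right cell because a global alignment must cover both sequences completely.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap            # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair the two residues
                          F[i - 1][j] + gap,                          # gap in b
                          F[i][j - 1] + gap)                          # gap in a

    # Traceback from the bottom-right corner to rebuild one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("GATTACA", "GATACA")` returns a score of 4 (six matches and one gap); which of the two Ts ends up opposite the gap depends on how ties are broken during the traceback.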
• The Smith-Waterman algorithm, also based on dynamic programming, is used to find
these regions of local similarity.
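Smith-Waterman differs from a global alignment in two places: every cell is floored at zero, so a poorly matching prefix never drags a local alignment down, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below assumes illustrative scores (match +2, mismatch -1, linear gap -2); the function name is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                                   # start a new local alignment
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell ends the local region
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTTTGATTACATTTT", "CCGATTACACC")` returns `(14, "GATTACA", "GATTACA")`: only the shared core is reported and the unrelated flanks are ignored.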
• One of the most important tasks of sequence analysis
is the detection of distant relationships. BLOSUM matrices are derived
to represent such distant relationships more clearly.
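BLOSUM entries are log-odds scores. In the usual construction, $q_{ab}$ is the observed frequency of the residue pair $a,b$ in conserved blocks of aligned protein sequences, $p_a$ is the background frequency of residue $a$ derived from those pairs, and $e_{ab}$ is the pair frequency expected by chance. The factor 2 expresses scores in half-bit units, as in BLOSUM62; this summary is a sketch of the standard formulas rather than a full derivation.

```latex
p_a = q_{aa} + \sum_{b \neq a} \frac{q_{ab}}{2},
\qquad
e_{ab} =
\begin{cases}
p_a^2 & a = b \\
2\,p_a p_b & a \neq b
\end{cases},
\qquad
s_{ab} = \operatorname{round}\!\left( 2 \log_2 \frac{q_{ab}}{e_{ab}} \right)
```

For instance, a pair observed four times as often as chance predicts scores $2\log_2 4 = 4$, while a pair observed half as often scores $2\log_2 \tfrac{1}{2} = -2$. Substitutions that are common between related proteins therefore score positively even though the residues differ.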
• Multiple sequence alignment is an
extension of pairwise sequence alignment to more
than two sequences at a time.
• There are different methods of producing
an MSA. The most direct method uses dynamic programming to identify
the globally optimal alignment.
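A rough cost estimate shows why the direct dynamic programming method is rarely used for more than a few sequences. Assuming $N$ sequences of length about $L$, the table has one dimension per sequence, so it has roughly $L^N$ cells, and each cell must consider $2^N - 1$ possible predecessor moves:

```latex
\text{cells} \approx L^{N}, \qquad \text{time} = O\!\left(2^{N} L^{N}\right)

L = 300:\quad N = 2 \;\Rightarrow\; 300^{2} = 9 \times 10^{4}\ \text{cells},
\qquad N = 5 \;\Rightarrow\; 300^{5} \approx 2.4 \times 10^{12}\ \text{cells}
```

The exponential growth in $N$ is the reason heuristic approaches such as progressive alignment are preferred in practice.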
• Progressive alignment is the most widely
used approach to multiple sequence alignment. It is also called the hierarchical
or tree method.
• A major problem in the progressive
alignment method is that the accuracy of the final alignment depends heavily on the
accuracy of the initial pairwise alignments.
• A hidden Markov model (HMM) is a probabilistic model consisting
of a number of interconnected states.
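One common use of an HMM is to compute how probable a sequence is under the model. The forward algorithm does this by summing over all hidden state paths with dynamic programming instead of enumerating them one by one. The two-state model below (an "AT-rich" and a "GC-rich" state) and all of its probabilities are made-up values for illustration only; the `EXAMPLE_*` names are hypothetical.

```python
# Hypothetical two-state DNA model: start, transition and emission probabilities
EXAMPLE_STATES = ("AT-rich", "GC-rich")
EXAMPLE_START = {"AT-rich": 0.5, "GC-rich": 0.5}
EXAMPLE_TRANS = {
    "AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
    "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9},
}
EXAMPLE_EMIT = {
    "AT-rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC-rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def forward(obs: str, states: tuple[str, ...], start_p: dict[str, float],
            trans_p: dict[str, dict[str, float]],
            emit_p: dict[str, dict[str, float]]) -> float:
    """Return P(obs | model), summed over every possible state path."""
    # alpha[s] = probability of emitting obs[:t+1] and being in state s at position t
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # The comprehension reads the previous alpha before it is replaced
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    print(forward("AG", EXAMPLE_STATES, EXAMPLE_START, EXAMPLE_TRANS, EXAMPLE_EMIT))
```

For the two-letter sequence "AG" the result is 0.0445 (up to floating-point rounding). The work grows linearly with sequence length, whereas enumerating paths directly grows exponentially.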
• The FASTA and BLAST programs are local
similarity search methods that concentrate on finding short identical matches
between sequences.
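Both programs begin by looking for short identical words shared by the query and a database sequence. The sketch below shows only that seeding step under simplifying assumptions: exact words of length `k`, a single subject sequence, and FASTA-style grouping of hits by diagonal (`subject_pos - query_pos`). The extension and scoring stages of the real programs are omitted, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def find_word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every k-letter word the sequences share."""
    index = defaultdict(list)                  # word -> start positions in subject
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Return (diagonal, hit_count) for the diagonal holding the most word hits."""
    counts = Counter(j - i for i, j in hits)
    return counts.most_common(1)[0] if counts else None
```

For example, `find_word_hits("GATTACA", "CCGATTACACC")` finds five 3-letter word hits, all on diagonal 2, so `best_diagonal` reports `(2, 5)`. A dense diagonal like this is where a search tool would then try to extend an alignment.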
• The advantage of using multiple sequence
alignments in database searches is that more information is used, which results
in higher sensitivity compared with pairwise searches.
• The main principle behind the
development of secondary databases is that, by using them, we can capture the
structural and functional characteristics shared by the constituent sequences.
• Within a sequence alignment, we can find
several motifs (a motif is a short, consecutive string of amino acids in a protein
sequence whose general character is repeated).
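Motifs are often written as patterns. The sketch below assumes the PROSITE pattern convention, which is not described above: elements are separated by `-`, `x` means any residue, `[..]` lists allowed residues, `{..}` lists forbidden residues, and `(n)` or `(n,m)` gives a repeat count. It translates such a pattern into a Python regular expression; anchors (`<`, `>`) and other less common features are deliberately not handled, and the function names are hypothetical. The example pattern `N-{P}-[ST]-{P}` is PROSITE's N-glycosylation site motif.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "x(2,4)" into its residue part and repeat count
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, repeat = m.group(1), m.group(2)
        if residue == "x":
            token = "."                              # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"       # any residue except those listed
        else:
            token = residue                          # single letter or [..] set
        parts.append(token + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_motif(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (start, matched text) for each non-overlapping motif match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

For example, `prosite_to_regex("N-{P}-[ST]-{P}.")` gives `N[^P][ST][^P]`, and `find_motif("N-{P}-[ST]-{P}.", "MNKSATNPSA")` returns `[(1, "NKSA")]`; the stretch "NPSA" is rejected because a P follows the N.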
• A profile is a pattern recognition method
used in secondary databases. Profiles define which residues are allowed at given positions
and which positions are highly conserved, so profiles help in defining full
domain alignments.
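A profile can be approximated by a position-specific scoring matrix (PSSM). The sketch below assumes an ungapped block of aligned sequences, a pseudocount of 1 per residue type and uniform background frequencies (1/20 per amino acid); established profile methods also handle gaps, weight sequences and use real background frequencies. Each column gets a log-odds score for every residue, so a conserved position scores strongly for its dominant residue and penalizes the others. The example block and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0,
               alphabet: str = AMINO_ACIDS) -> list[dict[str, float]]:
    """Build a log2-odds profile from equal-length, ungapped aligned sequences."""
    background = 1.0 / len(alphabet)           # assumed uniform background frequency
    pssm = []
    for col in range(len(block[0])):
        column = [seq[col] for seq in block]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum the column scores for a window exactly as long as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pssm)
    return max((score_window(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))
```

For example, with `pssm = build_pssm(["HCKW", "HCRW", "HCKW", "YCKW"])`, `best_hit(pssm, "AAAHCKWAAA")` returns a tuple whose second element is 3, the start of the window that matches the profile consensus. The fully conserved C column scores its residue higher than the partly conserved H column does.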
Author Bio: The author of this article, Sreejith, writes articles on bioinformatics and its applications, and on electronics and communications.