Monday, September 24, 2018

Summary of Bioinformatics


• Bioinformatics is the application of information technology to the field of molecular biology.
• Bioinformatics involves the collection, organization and analysis of large amounts of biological data using networks of computers and databases.
• The primary goal of bioinformatics is to increase our understanding of biological processes.
• It is a science of managing and analyzing biological data using advanced computing techniques.
• Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, and aligning different DNA and protein sequences to compare them.
• Applications of bioinformatics include gene mapping, sequence analysis and measuring biodiversity.

• PIR (the Protein Information Resource) maintains a protein sequence database that contains almost three lakh (about 300,000) sequences covering the entire taxonomic range.
• A biological database stores biological data of various kinds.
• SWISS-PROT is one of the most popular protein sequence resources because of the quality of its entries. It contains 70,000 entries from more than 5,000 different species.
• TrEMBL stands for Translated EMBL (European Molecular Biology Laboratory). A special feature of the TrEMBL format is that it contains translations of all coding sequences (CDS).
• Composite databases combine a variety of different primary sources, so a single search covers all of them, which makes them efficient to search.
• In secondary databases, homologous sequences may be gathered together in multiple alignments.
• The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications.
• DDBJ (the DNA Data Bank of Japan) is produced, maintained and distributed at the National Institute of Genetics.
• GenBank is another DNA database; it incorporates sequences from publicly available sources.
• The purpose of specialized resources is to focus on species-specific genomics and on particular sequencing techniques.
• UniGene represents genes from many organisms; each cluster relates to a unique gene and includes related information corresponding to that gene.
• The term DNA sequencing refers to methods for determining the order of the nucleotide bases adenine (A), thymine (T), guanine (G) and cytosine (C) in a molecule of DNA.
• Pairwise sequence alignment can compare only two sequences at a time, but it is very efficient at finding the similarities between them.
• A large part of currently available DNA data is made up of partial sequences. They are called expressed sequence tags (ESTs).
• The most basic method of comparing two sequences is a visual approach known as a dot plot. A dot plot is a biological sequence comparison plot.
• Within a dot plot two identical sequences are characterized by a single unbroken diagonal line across the plot.
• Global alignments attempt to align every residue in every sequence and they are most useful when the sequences in the query set are similar and of roughly equal size.
• Local alignment searches for regions of local similarity and need not include the entire length of the sequences.
• The Needleman-Wunsch algorithm is used for computing a global alignment between two sequences; it is based on dynamic programming.
• The Smith-Waterman algorithm is used to find regions of local similarity between two sequences.
• The most common task of sequence analysis is the detection of more distant relationships. BLOSUM matrices are derived in order to represent distant relationships more clearly.
• Multiple sequence alignment (MSA) is an extension of pairwise sequence alignment that incorporates more than two sequences at a time.
• There are different methods of producing an MSA. The most direct method uses a dynamic programming technique to identify the globally optimal alignment solution.
• Progressive alignment is the most widely used approach to multiple sequence alignments. It is also called hierarchical or tree method.
• A major problem with the progressive alignment method is that the accuracy of the alignment depends heavily on the accuracy of the initial pairwise alignments.
• A hidden Markov model (HMM) is a probabilistic model consisting of a number of interconnecting states.
• The FASTA and BLAST programs are local similarity search methods that concentrate on finding short identical matches between sequences.
• The advantage of using multiple sequence alignments in database searches is that more information is used, which results in higher sensitivity compared with pairwise searches.
• The main principle behind the development of secondary databases is that, by using them, we can share the structural and functional characteristics of the constituent sequences.
• Within a sequence alignment, we can find several motifs (motif means a consecutive string of amino acids in a protein sequence, whose general character is repeated).
• A profile is a pattern recognition method used in secondary databases. Profiles define which residues are allowed at given positions and which positions are highly conserved, so profiles help in defining full domain alignments.
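
The dot plot mentioned in the summary above can be sketched in a few lines of Python. This is only a minimal illustration, not any particular tool: the function name `dot_plot` and the example sequence are made up for this sketch, and the plot is drawn with text characters instead of graphics. Comparing a sequence with itself should show the unbroken diagonal described above.

```python
def dot_plot(seq_a, seq_b):
    """Return a list of text rows; row i has '*' in column j where seq_a[i] == seq_b[j]."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


if __name__ == "__main__":
    # Hypothetical example: a sequence compared with itself.
    for row in dot_plot("GATTACA", "GATTACA"):
        print(row)
```

Off-diagonal marks in the printed grid come from letters that repeat within the sequence, which is why real dot plots are usually filtered with a window before they are read.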

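
Global alignment by dynamic programming, as in the Needleman-Wunsch algorithm mentioned in the summary, can be sketched as follows. The scoring values (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch_score` are assumptions chosen for illustration. The sketch returns only the optimal global score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # Every residue of both sequences is included, so read the last cell.
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Under these assumed scores, "ACGT" against "AGT" gives three matches and one gap.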

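
Local alignment in the style of the Smith-Waterman algorithm differs from the global case in two ways: cell scores are never allowed to drop below zero, and the result is the best cell anywhere in the table. The sketch below assumes the same illustrative scoring (match +1, mismatch -1, gap -1); the function name `smith_waterman_score` is made up for this example, and only the best local score is returned.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between any region of a and any region of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart here instead of carrying a negative score.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical sequences that share only the short region "ACG".
    print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

Because the alignment need not span the whole sequences, the unrelated flanking residues in the example do not lower the score of the shared region.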

Author Bio: The author of this article, Sreejith, writes articles on Bioinformatics and its Applications and on Electronics and Communications.
