1. Write notes on the National Centre for Biotechnology Information (NCBI) database.
2. Describe how the principle of parsimony is employed in order to infer phylogenetic trees. Illustrate your answer with a hypothetical example.
3. When assembling a genome, what does the N50 score measure?
4. With respect to phylogenetic trees, write notes on bootstrap support values.
Please answer all 4 questions
Answer:
1. Answer:
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI):
NCBI is a multidisciplinary research group that serves as a resource for molecular biology information. It was established in 1988 as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Its facilities are located in Bethesda, Maryland, USA.
NCBI was originally created to aid in understanding the molecular mechanisms that affect human health and disease, with the following aims:
· To create and maintain public databases.
· To develop software to analyze genomic data.
· To conduct research in computational biology.
Later, with the spread of the internet, NCBI broadened its focus to include pure biological research. As molecular biology became prominent in biomedical research, NCBI created various specialized databases to complement those that deal directly with human health.
NCBI provides researchers and the public with access to analysis and computing tools.
NCBI established database standards, such as database nomenclature, that are also used by non-NCBI databases. Its best-known resource is GenBank, the model nucleic acid sequence database, which contains sequence information from nearly 200,000 (2 lakh) different organisms.
Role:
To maintain databases that are available and open to the public.
To keep data persistent, which is one of the key criteria for a biological database.
To provide a fast and inexpensive database.
GenBank, a database containing all known nucleic acid sequences, is one of the members of the “Triple Entente” of sequence databases.
The other two are the European Molecular Biology Laboratory (EMBL) database and the DNA Data Bank of Japan (DDBJ); all three are part of the International Nucleotide Sequence Database Collaboration (INSDC).
Various methods are used to generate the sequence information found in GenBank.
Around 70% of all sequences in GenBank are ESTs (Expressed Sequence Tags), which are generated by reverse transcribing mRNAs into complementary DNAs (cDNAs) and then performing single-pass sequencing on those cDNAs. ESTs thus represent segments of DNA that are transcribed into mRNA.
- NCBI offers software to help researchers submit sequence data to GenBank.
- Submitting sequence data to GenBank and publishing it are a coordinated effort: journals that publish sequence data require GenBank submission as a condition for publication.
- The submission tools are BankIt, Sequin and tbl2asn.
- BankIt is the simplest tool; it requires the author to enter the sequence and then add any biological annotations, such as coding regions.
- Sequin allows for the submission of multiple or complex sequences and has a more organized method of sequence submission.
- Genome centers use programs like tbl2asn, a more powerful command-line analog of Sequin.
Entrez is NCBI's search and retrieval system. Through it, the GenBank nucleotide database is linked to other databases such as PubMed, Protein Sequence, Genomes, Taxonomy, Structure, Population, Online Mendelian Inheritance in Man (OMIM), Books, and 3D Domains. Connections between entries within one database are called neighbours, and connections between entries in different databases are called hard links.
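Entrez can also be queried programmatically through NCBI's E-utilities web interface. The following is a minimal sketch, not part of the original answer, that only builds an `efetch` request URL and does not contact the network. The helper name `efetch_url` is hypothetical, and the accession `U49845` is used purely as an example identifier.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def efetch_url(accession, db="nucleotide", rettype="fasta"):
    """Build an efetch URL for one record (hypothetical helper, no network call)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"

# Example accession, chosen only for illustration.
url = efetch_url("U49845")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the record as FASTA text, subject to NCBI's usage guidelines.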
Other retrieval systems offered by NCBI include LocusLink, Gene and the Taxonomy Browser. LocusLink and Gene provide descriptive information about genes, based on curated data. The Taxonomy Browser offers information on the lineage of organisms that have corresponding sequences in GenBank.
2. Answer:
The principle of parsimony prefers the phylogenetic tree that explains the observed features with the fewest evolutionary changes.
Hypothetical example:
- The first observed feature is the tone of the species.
- The second observed feature is whether the species hums or sings.
Consider four species of hummingbird, all of which are well toned, but only three of which can easily hum or sing. If we look only at whether a species is well toned, the most parsimonious model is simply that all four species share one ancestor. When we add the second feature, the presence or absence of humming or singing, it is more likely that the three species that hum or sing share a common ancestor than that the trait arose along two different evolutionary paths. The most parsimonious tree would therefore have a branch linking the three humming or singing species to a single common ancestor, and then link that ancestor to the common ancestor of all four species, the root species.
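The comparison in this example can be sketched in code by counting the minimum number of trait changes each candidate tree requires. This is an illustrative sketch only: the species labels A to D, the nested-tuple tree encoding and the function name `min_changes` are hypothetical, and it assumes that the root ancestor did not hum or sing (consistent with the trait having "arisen" later).

```python
def min_changes(tree, traits, state):
    """Fewest trait changes below `tree` if its top node has `state` (0 or 1)."""
    if isinstance(tree, str):
        # A leaf must match its observed trait; a mismatch is impossible.
        return 0 if traits[tree] == state else float("inf")
    total = 0
    for child in tree:
        # Pick the cheapest state for each child; a differing state costs one change.
        total += min(
            min_changes(child, traits, s) + (s != state)
            for s in (0, 1)
        )
    return total

# 1 = hums or sings, 0 = does not. Species names are placeholders.
traits = {"A": 1, "B": 1, "C": 1, "D": 0}

grouped = ((("A", "B"), "C"), "D")   # the three singers share one ancestor
split = (("A", "D"), ("B", "C"))     # singers spread over two lineages

# Assumption: the root ancestor did not sing (state 0).
print(min_changes(grouped, traits, 0))  # 1 change
print(min_changes(split, traits, 0))    # 2 changes
```

The grouped tree needs one gain of the trait, while the split tree needs two, so parsimony prefers the grouped tree.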
3. Answer:
The N50 is defined as the minimum contig length needed to cover 50% of the assembled genome, where a contig is a set of overlapping DNA segments or sequence data.
N50 is a measure used to describe the quality of assembled genomes that are fragmented into contigs of different lengths.
This means that half of the assembled genome sequence is in contigs larger than or equal to the N50 contig size; equivalently, the sum of the lengths of all contigs of size N50 or longer is at least 50 percent of the total assembled sequence.
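As a minimal sketch of this definition, the N50 can be computed by sorting contigs from longest to shortest and accumulating lengths until half of the total is reached. The function name `n50` and the example contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")

# Hypothetical assembly: total length 290, so half is 145.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))  # 80 + 70 = 150 >= 145, so N50 is 70
```

Here the two longest contigs already cover more than half of the assembly, so the N50 is the length of the second one.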
4. Answer:
Bootstrapping is a procedure in which the data (the columns of the alignment) are randomly resampled with replacement and the phylogenetic analysis is re-run on each resampled dataset. The value reported for a node is the percentage of bootstrap replicates in which that node is recovered.
Bootstrap support values must be analyzed carefully. Most researchers consider 70% or above to be good support, but others consider values as low as 50% to be probably significant. However, you may obtain higher support if you include more information in your phylogenetic analysis, i.e. more loci or a longer fragment of the same gene. Interestingly, you can sometimes get bootstrap values below 70% for a node, yet when you analyze the same dataset with a different method, for example Bayesian analysis, you get clearly good statistical support (posterior probabilities of 0.95 or higher).
If support remains high after cross-checking with other methods, the bootstrap values are corroborated. A value of 100 means that the node is recovered in all bootstrap replicates.
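The resampling step can be sketched as follows. This is a simplified illustration, not a full phylogenetic analysis: instead of rebuilding a tree for each replicate, it uses a stand-in criterion (two taxa count as a supported group when they are strictly closer to each other than to any other taxon). The function names, the toy alignment and the taxon labels are all hypothetical.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage of replicates in which `pair` forms the closest group."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(next(iter(alignment.values())))
    a, b = pair
    others = [n for n in names if n not in pair]
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: [alignment[n][c] for c in cols] for n in names}
        d_ab = hamming(resampled[a], resampled[b])
        # Stand-in for tree building: is the pair closer than any alternative?
        if all(
            d_ab < hamming(resampled[a], resampled[o])
            and d_ab < hamming(resampled[b], resampled[o])
            for o in others
        ):
            hits += 1
    return 100 * hits / replicates

# Toy alignment in which every column groups A with B.
toy = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "CCCC"}
print(bootstrap_support(toy, ("A", "B")))  # 100.0
```

Because every column of the toy alignment supports the same grouping, every replicate recovers it and the support is 100. Real alignments contain conflicting columns, so the value varies with how consistently the data support the node.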
