DBpedia 2014

Matches in DBpedia 2014 for { <http://dbpedia.org/resource/Linguistic_sequence_complexity> ?p ?o. }

Showing items 1 to 30 of 30 with 100 items per page.

Linguistic_sequence_complexity abstract "Linguistic sequence complexity (LC) is a measure of the 'vocabulary richness' of a genetic text in gene sequences. When a nucleotide sequence is written as text using a four-letter alphabet, the repetitiveness of the text, that is, the repetition of its N-grams (words), can be calculated and serves as a measure of sequence complexity. Thus, the more complex a DNA sequence, the richer its oligonucleotide vocabulary, whereas repetitious sequences have relatively lower complexities. Subsequent work improved the original algorithm described in Trifonov (1990) without changing the essence of the linguistic complexity approach. The meaning of LC may be better understood by representing a sequence as a tree of all its subsequences. The most complex sequences have maximally balanced trees, and the degree of imbalance, or tree asymmetry, serves as a complexity measure. The number of nodes at tree level i is equal to the actual vocabulary size of words of length i in a given sequence; the number of nodes at tree level i in the most balanced tree, which corresponds to the most complex sequence of length N, is either 4^i or N-i+1, whichever is smaller. The complexity (C) of a sequence fragment (of length RW) can be calculated directly as the product of the vocabulary-usage measures (Ui) over the word lengths considered. Vocabulary usage for oligomers of a given size i is defined as the ratio of the actual vocabulary size of a given sequence to the maximal possible vocabulary size for a sequence of that length. For example, U2 for the sequence ACGGGAAGCTGATTCCA = 14/16, as it contains 14 of 16 possible different dinucleotides; U3 for the same sequence = 15/15, and U4 = 14/14. For the sequence ACACACACACACACACA, U1 = 1/2; U2 = 2/16 = 0.125, as it has a simple vocabulary of only two dinucleotides; U3 for this sequence = 2/15. k-tuples with k from two to W are considered, where W depends on RW. 
For RW values less than 18, W is equal to 3; for RW less than 67, W is equal to 4; for RW<260, W=5; for RW<1029, W=6, and so on. The value of C provides a measure of sequence complexity in the range 0<C<1 for various DNA sequence fragments of a given length. This formula differs from the original LC measure in two respects: in the way vocabulary usage Ui is calculated, and because i does not range from 2 to N-1 but only up to W. This limitation on the range of Ui makes the algorithm substantially more efficient without loss of power. This sequence-complexity calculation can be used to search for conserved regions between compared sequences for the detection of low-complexity regions including simple sequence repeats, imperfect direct or inverted repeats, polypurine and polypyrimidine triple-stranded DNA structures, and four-stranded structures (such as G-quadruplexes).".
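The vocabulary-usage calculation described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the function names (`vocabulary_usage`, `window_limit`, `complexity`) are hypothetical, the rule used to extend the W thresholds beyond those listed (18, 67, 260, 1029) is an assumption inferred from their pattern 4^k + k, and the `start` parameter is an assumption because the abstract gives a U1 example but also says k runs from two to W.

```python
def vocabulary_usage(seq: str, i: int, alphabet_size: int = 4) -> float:
    """Ratio of observed distinct words of length i to the maximum possible."""
    n = len(seq)
    if i < 1 or i > n:
        raise ValueError("word length i must satisfy 1 <= i <= len(seq)")
    # Actual vocabulary: distinct substrings of length i.
    observed = {seq[k:k + i] for k in range(n - i + 1)}
    # Maximal vocabulary: 4^i or N-i+1, whichever is smaller.
    max_vocab = min(alphabet_size ** i, n - i + 1)
    return len(observed) / max_vocab


def window_limit(rw: int) -> int:
    """Largest word length W for a fragment of length RW.

    Reproduces the listed thresholds (RW<18 -> 3, RW<67 -> 4, RW<260 -> 5,
    RW<1029 -> 6). Continuing with 4^k + k is an assumed extrapolation.
    """
    w = 3
    while rw >= 4 ** (w - 1) + (w - 1):
        w += 1
    return w


def complexity(seq: str, start: int = 1) -> float:
    """Product of U_i for i from `start` to W (start=1 is an assumption)."""
    w = min(window_limit(len(seq)), len(seq))
    c = 1.0
    for i in range(start, w + 1):
        c *= vocabulary_usage(seq, i)
    return c


if __name__ == "__main__":
    for s in ("ACGGGAAGCTGATTCCA", "ACACACACACACACACA"):
        print(s, [vocabulary_usage(s, i) for i in (1, 2, 3)], complexity(s))
```

Applied to the two example sequences from the abstract, the sketch reproduces the stated values U2 = 14/16 and U3 = 15/15 for the first sequence and U1 = 1/2, U2 = 2/16, U3 = 2/15 for the second.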
Linguistic_sequence_complexity wikiPageID "34986220".
Linguistic_sequence_complexity wikiPageRevisionID "602517190".
Linguistic_sequence_complexity hasPhotoCollection Linguistic_sequence_complexity.
Linguistic_sequence_complexity subject Category:Bioinformatics.
Linguistic_sequence_complexity subject Category:Nucleic_acids.
Linguistic_sequence_complexity type Abstraction100002137.
Linguistic_sequence_complexity type Chemical114806838.
Linguistic_sequence_complexity type Compound114818238.
Linguistic_sequence_complexity type Macromolecule114944888.
Linguistic_sequence_complexity type Material114580897.
Linguistic_sequence_complexity type Matter100020827.
Linguistic_sequence_complexity type Molecule114682133.
Linguistic_sequence_complexity type NucleicAcid114964129.
Linguistic_sequence_complexity type NucleicAcids.
Linguistic_sequence_complexity type OrganicCompound114727670.
Linguistic_sequence_complexity type Part113809207.
Linguistic_sequence_complexity type PhysicalEntity100001930.
Linguistic_sequence_complexity type Relation100031921.
Linguistic_sequence_complexity type Substance100019613.
Linguistic_sequence_complexity type Thing100002452.
Linguistic_sequence_complexity type Unit109465459.
Linguistic_sequence_complexity comment "Linguistic sequence complexity (LC) is a measure of the 'vocabulary richness' of a genetic text in gene sequences. When a nucleotide sequence is written as text using a four-letter alphabet, the repetitiveness of the text, that is, the repetition of its N-grams (words), can be calculated and serves as a measure of sequence complexity. Thus, the more complex a DNA sequence, the richer its oligonucleotide vocabulary, whereas repetitious sequences have relatively lower complexities.".
Linguistic_sequence_complexity label "Linguistic sequence complexity".
Linguistic_sequence_complexity sameAs m.0j67cyk.
Linguistic_sequence_complexity sameAs Q6554066.
Linguistic_sequence_complexity sameAs Q6554066.
Linguistic_sequence_complexity sameAs Linguistic_sequence_complexity.
Linguistic_sequence_complexity wasDerivedFrom Linguistic_sequence_complexity?oldid=602517190.
Linguistic_sequence_complexity isPrimaryTopicOf Linguistic_sequence_complexity.