Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2008/paper/50> ?p ?o. }
Showing items 1 to 16 of 16 with 100 items per page.
- 50 creator bingjun-sun.
- 50 creator c-lee-giles.
- 50 creator prasenjit-mitra.
- 50 type InProceedings.
- 50 label "Mining, Indexing, and Searching for Textual Chemical Molecule Information on the Web".
- 50 sameAs 50.
- 50 abstract "Current search engines do not recognize chemical entities (chemical names and formulae). A scientist, who seeks to search for information related to molecules from text documents cannot do so in any meaningful way except performing exact keyword search. We outline how a chemical-entity-aware search engine can be built and demonstrate empirically that it supports various chemical entity search with full document retrieval and also improves the relevance of the returned documents for search queries involving chemical entities. Our search engine first extracts chemical entities from text, performs novel indexing suitable for chemical names and formulae, and supports different query models that a scientist may require. We apply a hierarchical model of conditional random fields (HCRFs) for chemical formula tagging that considers long-term dependencies at the higher levels of documents like the sentence level. We propose an algorithm of independent frequent subsequence mining (IFSM) to discover sub-terms of chemical names and estimate their probabilities of occurrence. We also propose an unsupervised hierarchical text segmentation (HTS) method to represent a sequence with a tree structure based on discovered independent frequent subsequences. An index pruning method is proposed based on HTS. Query models with corresponding ranking functions are introduced for chemical name searches. Furthermore, we show that index pruning can reduce the index size and also query time without changing the returned ranked results much. Finally, experiments also show that our approaches out-perform traditional methods for document search with ambiguous chemical terms.".
- 50 hasAuthorList authorList.
- 50 hasTopic World_Wide_Web.
- 50 isPartOf proceedings.
- 50 keyword "chemical entity extraction".
- 50 keyword "hierarchical text segmentation".
- 50 keyword "independent frequent subsequence".
- 50 keyword "similarity search".
- 50 keyword "substring search".
- 50 title "Mining, Indexing, and Searching for Textual Chemical Molecule Information on the Web".
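
The listing above is the result of matching the triple pattern `{ <subject> ?p ?o. }`, where the subject is fixed and the predicate and object are variables. As a rough illustration only, the following minimal sketch shows how such a pattern selects triples from an in-memory list. The `Triple` type, the `match` function, the shortened predicate names, and the `example.org` subject are assumptions made for this example; they are not part of ScholarlyData or of any particular library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


PAPER = "https://w3id.org/scholarlydata/inproceedings/www2008/paper/50"

# A small hand-copied sample; predicates use the short labels shown in the listing.
triples = [
    Triple(PAPER, "creator", "bingjun-sun"),
    Triple(PAPER, "creator", "c-lee-giles"),
    Triple(PAPER, "keyword", "similarity search"),
    # Hypothetical second subject, included only to show that it is filtered out.
    Triple("https://example.org/other-paper", "keyword", "similarity search"),
]

# Equivalent of { <PAPER> ?p ?o. }: fix the subject, leave predicate and object open.
for t in match(triples, s=PAPER):
    print(t.predicate, t.obj)
```

Fixing a second position narrows the result further; for example, `match(triples, s=PAPER, p="creator")` keeps only the creator triples.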