Matches in ScholarlyData for { <https://w3id.org/scholarlydata/inproceedings/www2008/paper/299> ?p ?o. }
Showing items 1 to 15 of 15 with 100 items per page.
- 299 creator frank-neven.
- 299 creator geert-jan-bex.
- 299 creator stijn-vansummeren.
- 299 creator wouter-gelade.
- 299 type InProceedings.
- 299 label "Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data".
- 299 sameAs 299.
- 299 abstract "Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning \emph{deterministic} regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of regular expressions in which each alphabet symbol occurs at most $k$ times, for some small $k$. We refer to such expressions as $k$-occurrence regular expressions ($k\ores$ for short). Motivated by this observation, we provide a probabilistic algorithm that learns $k\ores$ for increasing values of $k$, and selects the one which best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated both on real world and synthetic data sets. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.".
- 299 hasAuthorList authorList.
- 299 hasTopic World_Wide_Web.
- 299 isPartOf proceedings.
- 299 keyword "XML".
- 299 keyword "regular expressions".
- 299 keyword "schema inference".
- 299 title "Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data".
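
The abstract above defines a k-occurrence regular expression as one in which each alphabet symbol occurs at most k times. A minimal sketch of that occurrence check follows. It assumes DTD-style content-model syntax, with element names as alphabet symbols and `, | ( ) * + ?` as operators. The function names and the sample content model are hypothetical, chosen for illustration. It is not the paper's learning algorithm and does not test determinism; it only counts symbol occurrences.

```python
import re
from collections import Counter

# Assumption: expressions use DTD-style content-model syntax, where element
# names are the alphabet symbols and , | ( ) * + ? are operators.
SYMBOL = re.compile(r"[A-Za-z_][\w.-]*")


def symbol_occurrences(expr: str) -> Counter:
    """Count how many times each alphabet symbol occurs in the expression."""
    return Counter(SYMBOL.findall(expr))


def smallest_k(expr: str) -> int:
    """Smallest k such that every symbol occurs at most k times (0 if no symbols)."""
    return max(symbol_occurrences(expr).values(), default=0)


def is_k_ore(expr: str, k: int) -> bool:
    """Return True if every alphabet symbol occurs at most k times in expr."""
    return smallest_k(expr) <= k


# Hypothetical content model: "author" occurs twice, all other symbols once.
example = "(title, author+, (author | editor)*, year?)"
```

For the sample content model, `is_k_ore(example, 1)` is false because `author` occurs twice, while `is_k_ore(example, 2)` is true. This is the kind of bound the abstract says stays small in practical DTDs and XSDs.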