Data Portal @ linkeddatafragments.org

ScholarlyData

Search ScholarlyData by triple pattern

Matches in ScholarlyData for { ?s ?p "In this paper, we present an approach for representing an email archive in the form of a network, capturing the communication among users and the relations between entities extracted from the textual part of the email messages. We showcase the method on the Enron email corpus, from which we extract various entities and a social network. Extracted entities are organized in a graph in which emails are connected with the named entities (NEs) extracted from them, such as people, email addresses, and telephone numbers. Edges in the graph denote relations between NEs, representing co-occurrence in the same email part, paragraph, sentence, or composite NE. We study mathematical properties of the graph structure created by the proposed approach and describe our hands-on experience with processing such a structure. The Enron Graph corpus contains a few million nodes and is a large corpus for experimenting with various graph-querying techniques, e.g. graph traversal or spreading activation. Due to its size, using traditional graph processing libraries might be problematic, as they keep the whole structure in memory. We describe our experience with the management of such data and with relation discovery among the extracted entities. The described experience might be valuable for practitioners and highlights several research challenges.". }

Showing items 1 to 1 of 1 with 100 items per page.