OPAL Conference Contributions 2019

The results of the OPAL project are published primarily as deliverables, which consist either of software components on GitHub or of reports. Beyond these results, the project also researches and presents underlying concepts from the Semantic Web domain, which makes OPAL known in the scientific community as well. The following section consists of abstracts of conference contributions published as part of OPAL, along with additional links.

Research Articles and Abstracts

The 18th International Semantic Web Conference (ISWC 2019)

LimesWebUI – Link Discovery Made Simple

Abstract: In this paper we present LimesWebUI, our web interface of Limes. Limes, the Link Discovery Framework for Metric Spaces, is a framework for discovering links between entities contained in Linked Data sources. LimesWebUI assists the end user during the link discovery process. By representing the link specifications (LS) as interlocking blocks, our interface eases the manual creation of links for users who already know which LS they would like to execute. However, most users do not know which LS suits their linking task best and therefore need help throughout this process. Hence, our interface provides wizards which allow the easy configuration of many link discovery machine learning algorithms and which do not require the user to enter a manual LS. We evaluate the usability of the interface by using the standard system usability scale questionnaire. Our overall usability score of 76.5 suggests that the online interface is consistent, easy to use, and the various functions of the system are well integrated.

Sherif, Mohamed Ahmed ; Pestryakova, Svetlana ; Dreßler, Kevin ; Ngomo, Axel-Cyrille Ngonga: LimesWebUI – Link Discovery Made Simple. In: 18th International Semantic Web Conference (ISWC 2019) : CEUR-WS.org, 2019
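
The usability score reported in the abstract comes from the System Usability Scale (SUS) questionnaire. As a minimal sketch of how such a score is usually computed from the ten 1-to-5 answers of each respondent, the standard scoring rule looks as follows. The function names and the sample answers are hypothetical and are not taken from the paper.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent.

    `responses` holds the ten answers on a 1-5 scale, in questionnaire order.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly ten answers between 1 and 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution is answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution is 5 - answer.
            total += 5 - answer
    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-respondent SUS scores into one overall score."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical answers from two respondents, not data from the paper.
answers = [[5, 1, 5, 1, 5, 1, 5, 1, 5, 1], [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]]
print(mean_sus(answers))  # 75.0
```

An overall score such as the 76.5 in the abstract would be the mean of these per-respondent values.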

THOTH: Neural Translation and Enrichment of Knowledge Graphs

Abstract: Knowledge Graphs are used in an increasing number of applications. Although considerable human effort has been invested into making knowledge graphs available in multiple languages, most knowledge graphs are in English. Additionally, regional facts are often only available in the language of the corresponding region. This lack of multilingual knowledge availability clearly limits the porting of machine learning models to different languages. In this paper, we aim to alleviate this drawback by proposing THOTH, an approach for translating and enriching knowledge graphs. THOTH extracts bilingual alignments between a source and target knowledge graph and learns how to translate from one to the other by relying on two different recurrent neural network models along with knowledge graph embeddings. We evaluated THOTH extrinsically by comparing the German DBpedia with the German translation of the English DBpedia on two tasks: fact checking and entity linking. In addition, we ran a manual intrinsic evaluation of the translation. Our results show that THOTH is a promising approach which achieves a translation accuracy of 88.56%. Moreover, its enrichment improves the quality of the German DBpedia significantly, as we report +18.4% accuracy for fact validation and +19% F1 for entity linking.

Moussallem, Diego ; Soru, Tommaso ; Ngomo, Axel-Cyrille Ngonga: THOTH: Neural Translation and Enrichment of Knowledge Graphs. In: International Semantic Web Conference, 2019, pp. 505–522

Semantic Web for Machine Translation: Challenges and Directions

Abstract: A large number of machine translation approaches have recently been developed to facilitate the fluid migration of content across languages. However, the literature suggests that many obstacles must still be dealt with to achieve better automatic translations. One of these obstacles is lexical and syntactic ambiguity. A promising way of overcoming this problem is using Semantic Web technologies. This article is an extended abstract of our systematic review on machine translation approaches that rely on Semantic Web technologies for improving the translation of texts. Overall, we present the challenges and opportunities in the use of Semantic Web technologies in Machine Translation. Moreover, our research suggests that while Semantic Web technologies can enhance the quality of machine translation outputs for various problems, the combination of both is still in its infancy.

Moussallem, D., Wauer, M., & Ngomo, A.C. (2019). Semantic Web for Machine Translation: Challenges and Directions. In International Semantic Web Conference (pp. 8).

Towards More Intelligent SPARQL Querying Interfaces

Abstract: Over the years, the Web of Data has grown significantly. Various interfaces such as SPARQL endpoints, data dumps, and Triple Pattern Fragments (TPF) have been proposed to provide access to this data. Studies show that many of the SPARQL endpoints have availability issues. The data dumps do not provide live querying capabilities. The TPF solution aims to provide a trade-off between availability and performance by dividing the workload among TPF servers and clients. In this solution, the TPF server only executes the triple patterns of the given SPARQL query, while the TPF client performs the joins between the triple patterns to compute the final result set of the SPARQL query. TPF achieves high availability, but the increase in network bandwidth and query execution time lowers the performance. We want to propose a more intelligent SPARQL querying server that keeps the high availability along with high query execution performance while minimizing the network bandwidth. The proposed server will offer query execution services (single triple patterns or even join execution) according to the current status of the workload. If a server is free, it should be able to execute the complete SPARQL query. Thus, the server will offer execution services while avoiding going beyond the maximum query processing limit, i.e., the point after which the performance starts decreasing or the service even shuts down. Furthermore, we want to develop a more intelligent client, which keeps track of a server’s processing capabilities and therefore avoids DoS attacks and crashes.

Khan, H. (2019). Towards More Intelligent SPARQL Querying Interfaces. International Semantic Web Conference.
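
The division of labour that the abstract describes for TPF (the server evaluates single triple patterns, the client joins the partial results) can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not the TPF protocol or the proposed server: the function names, the tuple representation of triples, and the `ex:` sample data are all hypothetical, and real TPF servers deliver paged results over HTTP.

```python
def match_pattern(triples, pattern):
    """Server side: return one variable binding per triple matching `pattern`.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in a pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def join(left, right):
    """Client side: nested-loop join of two binding lists on shared variables."""
    joined = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


# Hypothetical sample data.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:livesIn", "ex:Paderborn"),
    ("ex:carol", "ex:knows", "ex:dave"),
]

# The "server" answers each triple pattern separately ...
knows = match_pattern(triples, ("?x", "ex:knows", "?y"))
lives = match_pattern(triples, ("?y", "ex:livesIn", "?city"))
# ... and the "client" joins the two partial results on ?y.
result = join(knows, lives)
print(result)
```

Every intermediate binding list has to travel to the client before the join, which is the bandwidth cost the abstract refers to. A server that executes the join itself when it has spare capacity would only return `result`.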

Unsupervised Discovery of Corroborative Paths for Fact Validation

Abstract: Any data publisher can make RDF knowledge graphs available for consumption on the Web. This is a direct consequence of the decentralized publishing paradigm underlying the Data Web, which has led to more than 150 billion facts on more than 3 billion things being published on the Web in more than 10,000 RDF knowledge graphs over the last decade. However, the success of this publishing paradigm also means that the validation of the facts contained in RDF knowledge graphs has become more important than ever before. Several families of fact validation algorithms have been developed over the last years to address several settings of the fact validation problem. In this paper, we consider the following fact validation setting: Given an RDF knowledge graph, compute the likelihood that a given (novel) fact is true. None of the current solutions to this problem exploits RDFS semantics (especially domain, range and class subsumption information). We address this research gap by presenting an unsupervised approach dubbed COPAAL, which extracts paths from knowledge graphs to corroborate (novel) input facts. Our approach relies on a mutual information measure that takes the RDFS semantics underlying the knowledge graph into consideration. In particular, we use the information shared by predicates and paths within the knowledge graph to compute the likelihood of a fact being corroborated by the knowledge graph. We evaluate our approach extensively using 17 publicly available datasets. Our results indicate that our approach outperforms the state-of-the-art unsupervised approaches significantly by up to 0.15 AUC-ROC. We even outperform supervised approaches by up to 0.07 AUC-ROC. The source code of COPAAL is open-source and is available at https://github.com/dice-group/COPAAL.

Syed, Z. H., Röder, M. & Ngomo, A.-C. N. (2019). Unsupervised Discovery of Corroborative Paths for Fact Validation. In C. Ghidini, O. Hartig, M. Maleshkova, V. Svátek, I. Cruz, A. Hogan, J. Song, M. Lefrançois & F. Gandon (eds.), The Semantic Web – ISWC 2019 (pp. 630–646), Cham: Springer International Publishing. ISBN: 978-3-030-30793-6
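
The abstract does not spell out the mutual information measure. As an illustration only, the sketch below uses normalized pointwise mutual information (NPMI) to score how strongly a path co-occurs with a predicate. The function name and the four count parameters are assumptions, and how the counts would be derived from a knowledge graph (for example, restricted by the domain and range of the predicate) is left out.

```python
import math


def npmi(joint_count, path_count, predicate_count, total_pairs):
    """Normalized PMI in [-1, 1] between a path and a predicate.

    joint_count:     entity pairs connected by both the path and the predicate
    path_count:      entity pairs connected by the path
    predicate_count: entity pairs connected by the predicate
    total_pairs:     all entity pairs considered (assumed to be given)
    """
    if joint_count == 0:
        return -1.0  # the path never co-occurs with the predicate
    p_joint = joint_count / total_pairs
    p_path = path_count / total_pairs
    p_pred = predicate_count / total_pairs
    if p_joint == 1.0:
        return 1.0  # avoids dividing by -log(1) = 0
    pmi = math.log(p_joint / (p_path * p_pred))
    return pmi / -math.log(p_joint)


# Hypothetical counts: the path and the predicate always occur together.
print(npmi(joint_count=10, path_count=10, predicate_count=10, total_pairs=100))
# Hypothetical counts: the path and the predicate are statistically independent.
print(npmi(joint_count=25, path_count=50, predicate_count=50, total_pairs=100))
```

Values close to 1 mean that the path is strong corroborating evidence for the predicate, values around 0 indicate independence, and negative values indicate that the two rarely occur together.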

International Conference on Web Engineering (ICWE 2019)

Dragon: Decision Tree Learning for Link Discovery

Abstract: The provision of links across RDF knowledge bases is regarded as fundamental to ensure that knowledge bases can be used jointly to address real-world needs of applications. The growth of knowledge bases both with respect to their number and size demands the development of time-efficient and accurate approaches for the computation of such links. This is generally done with the aid of machine learning approaches, such as Decision Trees. While Decision Trees are known to be fast, they are generally outperformed in the link discovery task by the state of the art in terms of quality, i.e., F-measure. In this work, we present Dragon, a fast decision-tree-based approach that is both efficient and accurate. Our approach was evaluated by comparing it with state-of-the-art link discovery approaches as well as the common decision-tree-learning approach J48. Our results suggest that our approach achieves state-of-the-art performance with respect to its F-measure while being 18 times faster on average than existing algorithms for link discovery on RDF knowledge bases. Furthermore, we investigate why Dragon significantly outperforms J48 in terms of link accuracy. We provide an open-source implementation of our algorithm in the LIMES framework.

Obraczka, Daniel ; Ngonga Ngomo, Axel-Cyrille: Dragon: Decision Tree Learning for Link Discovery. In: Bakaev, Maxim ; Frasincar, Flavius ; Ko, In-Young (eds.): ICWE, Lecture Notes in Computer Science, vol. 11496 : Springer, 2019. – ISBN 978-3-030-19274-7, pp. 441–456

30th ACM Conference on Hypertext and Social Media

Ranking on Very Large Knowledge Graphs

Abstract: Ranking plays a central role in a large number of applications driven by RDF knowledge graphs. Over the last years, many popular RDF knowledge graphs have grown so large that rankings for the facts they contain cannot be computed directly using the currently common 64-bit platforms. In this paper, we tackle two problems: computing ranks on such large knowledge bases efficiently and incrementally. First, we present D-HARE, a distributed approach for computing ranks on very large knowledge graphs. D-HARE assumes the random surfer model and relies on data partitioning to compute matrix multiplications and transpositions on disk for matrices of arbitrary size. Moreover, the data partitioning underlying D-HARE allows the execution of most of its steps in parallel. As very large knowledge graphs are often updated periodically, we tackle the incremental computation of ranks on large knowledge bases as a second problem. We address this problem by presenting I-HARE, an approximation technique for calculating the overall ranking scores of a knowledge graph without the need to recalculate the ranking from scratch at each new revision. We evaluate our approaches by calculating ranks on the 3 × 10^9 and 2.4 × 10^9 triples from Wikidata and LinkedGeoData, respectively. Our evaluation demonstrates that D-HARE is the first holistic approach for computing ranks on very large RDF knowledge graphs. In addition, our incremental approach achieves a root mean squared error of less than 10^-7 in the best case. Both D-HARE and I-HARE are open-source and are available at: https://github.com/dice-group/incrementalHARE.

Desouki, Abdelmoneim Amer ; Röder, Michael ; Ngonga Ngomo, Axel-Cyrille: Ranking on Very Large Knowledge Graphs. In: Proceedings of the 30th ACM Conference on Hypertext and Social Media, 2019, pp. 163–171
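
The random surfer model that the abstract mentions can be illustrated with a small in-memory power iteration. This is a generic PageRank-style sketch and not the D-HARE or I-HARE implementation, which partitions the data and multiplies matrices on disk. The function name, the edge-list input, the damping factor of 0.85, and the fixed iteration count are assumptions made for the example.

```python
def random_surfer_ranks(edges, damping=0.85, iterations=50):
    """Approximate the stationary distribution of a random surfer.

    `edges` is a list of (source, target) pairs. With probability `damping`
    the surfer follows an outgoing link, otherwise it jumps to a random node.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the same share from random jumps.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "a" and "b" both link to "c".
ranks = random_surfer_ranks([("a", "c"), ("b", "c")])
print(max(ranks, key=ranks.get))  # c
```

Each iteration is one multiplication of the rank vector with the transition matrix. For graphs with billions of triples neither the matrix nor the vector fits comfortably into memory, which is the setting the abstract targets.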

International Conference on Knowledge Capture (K-Cap 2019)

Jointly Learning from Social Media and Environmental Data for Typhoon Intensity Prediction

Abstract: Existing technologies employ different machine learning approaches to predict disasters from historical environmental data. However, for short-term disasters (e.g., earthquakes), historical data alone has a limited prediction capability. In this work, we consider social media as a supplementary source of knowledge in addition to historical environmental data. Further, we build a joint model that learns from disaster-related tweets and environmental data to improve prediction. We propose the combination of semantically enriched word embedding to represent entities in tweets with their semantics representations computed with the traditional word2vec. Our experiments show that our proposed approach outperforms the accuracy of state-of-the-art models in disaster prediction.

Hamada M. Zahera, Mohamed Ahmed Sherif, & Axel-Cyrille Ngonga Ngomo (2019). Jointly Learning from Social Media and Environmental Data for Typhoon Intensity Prediction. In K-CAP 2019: Knowledge Capture Conference (pp. 4).

Do your Resources Sound Similar? On the Impact of Using Phonetic Similarity in Link Discovery

Abstract: An increasing number of heterogeneous datasets abiding by the Linked Data paradigm is published every day. Discovering links between these datasets is thus central to achieving the vision behind the Data Web. Declarative Link Discovery (LD) frameworks rely on complex Link Specifications (LS) to express the conditions under which two resources should be linked. Complex LS combine similarity measures with thresholds to determine whether a given predicate holds between two resources. State-of-the-art LD frameworks rely mostly on string-based similarity measures such as Levenshtein and Jaccard. However, string-based similarity measures often fail to catch the similarity of resources with phonetically similar property values when these property values are represented using different string representations (e.g., names and street labels). In this paper, we evaluate the impact of using phonetics-based similarities in the process of LD. Moreover, we evaluate the impact of phonetic-based similarity measures on a state-of-the-art machine learning approach used to generate LS. Our experiments suggest that the combination of string-based and phonetic-based measures can improve the F-measures achieved by LD frameworks on most datasets.

Abdullah Fathi Ahmed, Mohamed Ahmed Sherif, & Axel-Cyrille Ngonga Ngomo (2019). Do your Resources Sound Similar? On the Impact of Using Phonetic Similarity in Link Discovery. In K-CAP 2019: Knowledge Capture Conference (pp. 8).
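
To illustrate the point of the abstract, the following sketch combines a string-based measure (normalized Levenshtein similarity) with a phonetic one (American Soundex) in a toy link condition. It is a self-contained example and not the LIMES implementation: the choice of Soundex, the function names, the threshold of 0.8, and the sample names are assumptions.

```python
SOUNDEX_CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for char in group:
        SOUNDEX_CODES[char] = digit


def soundex(word):
    """Return the four-character American Soundex code of `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "hw":  # h and w do not separate equal codes
            previous = digit
    return (code + "000")[:4]


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def string_similarity(a, b):
    """Levenshtein distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def should_link(a, b, string_threshold=0.8):
    """Toy link condition: string similarity OR identical Soundex code."""
    return string_similarity(a, b) >= string_threshold or soundex(a) == soundex(b)


# Hypothetical example: two spellings of the same surname.
print(string_similarity("Meyer", "Maier"))  # 0.6
print(soundex("Meyer"), soundex("Maier"))   # M600 M600
print(should_link("Meyer", "Maier"))        # True
```

With the string measure alone, the pair stays below the threshold and would not be linked. The phonetic measure closes that gap, which is the kind of case the abstract describes.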

International Conference Recent Advances in Natural Language Processing

A Holistic Natural Language Generation Framework for the Semantic Web

Abstract: With the ever-growing generation of data for the Semantic Web comes an increasing demand for this data to be made available to non-Semantic Web experts. One way of achieving this goal is to translate the languages of the Semantic Web into natural language. We present LD2NL, a framework for verbalizing the three key languages of the Semantic Web, i.e., RDF, OWL, and SPARQL. Our framework is based on a bottom-up approach to verbalization. We evaluated LD2NL in an open survey with 86 persons. Our results suggest that our framework can generate verbalizations that are close to natural languages and that can be easily understood by non-experts. It thereby enables non-domain experts to interpret Semantic Web data with more than 91% of the accuracy of domain experts.

Ngonga Ngomo, A.-C., Moussallem, D. & Bühmann, L. (2019). A Holistic Natural Language Generation Framework for the Semantic Web. Proceedings of the International Conference Recent Advances in Natural Language Processing (pp. 8).

International Workshop on Chatbot Research (CONVERSATIONS 2019)

An Approach for Ex-Post-Facto Analysis of Knowledge Graph-Driven Chatbots – the DBpedia Chatbot

Abstract: As chatbots are gaining popularity for simplifying access to information and community interaction, it is essential to examine whether these agents are serving their intended purpose and catering to the needs of their users. Therefore, we present an approach to perform an ex-post-facto analysis over the logs of knowledge base-driven dialogue systems. Using the DBpedia Chatbot as our case study, we inspect three aspects of the interactions, (i) user queries and feedback, (ii) the bot’s response to these queries, and (iii) the overall flow of the conversations. We discuss key implications based on our findings. All the source code used for the analysis can be found at https://github.com/dicegroup/DBpedia-Chatlog-Analysis.

Rricha Jalota, Priyansh Trivedi, Gaurav Maheshwari, Axel-Cyrille Ngonga Ngomo, Ricardo Usbeck: An Approach for Ex-Post-Facto Analysis of Knowledge Graph-Driven Chatbots – the DBpedia Chatbot. Pre-print of full paper presented at CONVERSATIONS 2019 – an international workshop on chatbot research, November 19-20, Amsterdam, the Netherlands. The final version of the paper will be published in the post-workshop proceedings as part of Springer LNCS.


OPAL at conferences in 2019

The researchers of the DICE group / @DiceResearch / @CompScience_UPB / @unipb regularly post news from their research areas:

@AbdelmonemMAmer @Abdullah_Fathi_ @DiegoMoussallem @hamadazahera @hashimkhanwazi4 @kvndrsslr @NgongaAxel @MAhmedSherif @MatthiasWauer @mommi84 @Ricardo_Usbeck @RrichaJalota @zafarhabeeb