The AMW DATA SCIENCE School is a two-day "Summer School" preceding the Alberto Mendelzon Workshop on Foundations of Data Management 2019 to be held in Asunción, Paraguay. The event consists of multiple tutorials aimed at a mixed audience of students and other interested attendees.
The school aims to:
- Host tutorials targeted at students (advanced undergraduate or postgraduate level) or other early-career researchers interested in the area of Data Science;
- Provide a venue where young Latin American students and researchers can meet, discuss, learn, and seek feedback on their research topics, thus reinforcing research networks (of the future) in the area.
Natural language processing (NLP) is the field of designing methods and algorithms that take as input or produce as output unstructured, natural language data. Well-known applications of NLP include machine translation, sentiment analysis, and chatbots. Until 2014, many NLP models relied on shallow learning schemes based on hand-crafted features and linear machine learning models. Deep neural network architectures had become very popular in the computer vision community due to their success in detecting objects (“cat”, “bicycles”) regardless of their position in the image. These approaches have also been adopted for many NLP tasks with successful results. In this tutorial, we will introduce modern neural network architectures for NLP, including word embeddings, convolutional neural networks, and recurrent neural networks. No previous linguistic knowledge is required. Basic understanding of mathematical concepts such as functions, matrices, and derivatives may be helpful but is not essential.
Felipe Bravo-Marquez is an Assistant Professor with the Department of Computer Science at the University of Chile. He received his PhD degree from the University of Waikato, New Zealand, where he also held a research fellow position for two years in the machine learning group. Previously, he received two engineering degrees in the fields of computer science and industrial engineering, and a master's degree in computer science, all from the University of Chile. He worked for three years as a research engineer at Yahoo! Labs Latin America. His main areas of interest are natural language processing, machine learning, and information retrieval. His full list of publications is available here: https://felipebravom.com
Decisions are typically made based on insights that are derived from datasets. However, preparing data so that they are ready to be analyzed or integrated with other datasets is often a tedious process that involves many steps. The data preparation and integration pipeline often requires a number of manual data curation activities, such as deriving structure from text, schema matching, data cleaning, linking, and integration with other datasets.
In this tutorial, I will present the problems of data preparation and data integration, together with BigGorilla, an open-source resource for data scientists who need data preparation and integration tools. I will dive into some specific components in BigGorilla, such as Koko, a system for scalable semantic querying of text, and FlexMatcher, a schema matching tool.
Wang-Chiew Tan leads the research efforts at Megagon Labs. Prior to joining Megagon Labs, she was a professor of Computer Science at the University of California, Santa Cruz and she also spent two years at IBM Almaden Research Center. Her research interests include data provenance, data integration, and very recently, natural language processing. She is the recipient of an NSF CAREER award, a Google Faculty Award, and an IBM Faculty Award. She is a co-recipient of the 2014 ACM PODS Alberto O. Mendelzon Test-of-Time Award and a co-recipient of the 2018 ICDT Test-of-Time Award. She was the program committee chair of ICDT 2013 and PODS 2016. She is currently on the VLDB Board of Trustees and she is a Fellow of the ACM.
This tutorial will concentrate on the description of state-of-the-art approaches, representation models and tools for extracting, representing, querying and reasoning over knowledge graphs. By the end of the tutorial, attendees will be able to set up a state-of-the-art data management infrastructure to extract, store and perform semantic parsing over knowledge graphs extracted from different types of data sources. A Question Answering (QA) system will be demonstrated as a use case application over the knowledge graph.
André Freitas is a lecturer (assistant professor) at the School of Computer Science at the University of Manchester. Prior to Manchester, he was an associate researcher and lecturer at the Natural Language Processing and Semantic Computing Group at the University of Passau (Germany) at the Chair of Digital Libraries and Web Information Systems. Before joining Passau, he was part of the Digital Enterprise Research Institute (DERI) at the National University of Ireland, Galway where he did his PhD on Schema-agnostic Query Mechanisms for Large-Schema Databases. André holds a BSc. in Computer Science from the Federal University of Rio de Janeiro (UFRJ), Brazil (2005). His main research areas include Question Answering, Schema-agnostic Database Query Mechanisms, Natural Language Query Mechanisms over Large-Schema Databases, Distributional Semantics, Hybrid Symbolic-Distributional Models, Approximate Reasoning and Knowledge Graphs. Before joining DERI, André worked as a research assistant (trainee) at Siemens Corporate Research, Princeton, USA. André worked as a software engineer, product designer and project manager in different industries including Oil & Gas Exploration, IT Security, Medical, Healthcare, Banking, Mining and Telecom.
In this tutorial, we will learn the basics of two families of formalisms that play a fundamental role in knowledge-enriched data management: Datalog and Description Logics. We will introduce both formalisms and their most popular variants, as well as their main reasoning services. We will discuss some of their respective strengths and limitations for knowledge representation, and their applications in knowledge-enriched data management.
Magdalena Ortiz studied computer science in Mexico before moving to Europe to study computational logic in Italy and Austria. She is an assistant professor for Knowledge Representation and Reasoning at the Vienna University of Technology, where she works on the boundary between artificial intelligence and databases. Most of her research aims at using knowledge to make data-centric systems smarter and more reliable, especially using formalisms based on description logics.
Thanks to the generous support of the VLDB Endowment, we have travel grants available to offer students attending the AMW school and workshop. All students wanting to participate are encouraged to apply; however, preference will be given to students registered in a Latin American university. Grants will have a maximum value of US$1,000 to help cover flight, accommodation and registration for the main workshop. These grants will be awarded in cash at the event.
The students selected for the travel grants are expected to prepare a poster to be presented during the evening sessions of the school and the workshop. If you would like to receive travel support, please submit your application on the website https://easychair.org/conferences/?conf=amwschool2019. For the submission, you need to provide the title and keywords describing your poster and attach a short 1-page CV (including at least: name, address, universities attended, degrees achieved/in-progress, current position; and if applicable: theses and/or publications authored, supervisor, relevant professional experience, other important merits, etc.).
We will then select those students who we feel stand to gain the most from the experience of attending the school.