23/06/2021
Yesterday, Matteo Lissandrini presented a nice demo exploring the challenges of view materialization for query optimization, in joint work with Haridimos Kondylakis, Davide Mottin and Georgia Troullinou.
Conference website, incl. recording of the presentation:
https://lnkd.in/eQHvP6X
For the full paper, click here:
https://lnkd.in/evmrsUy
Abstract:
Analytical queries over RDF data are becoming prominent as a result of the proliferation of knowledge graphs. Yet, RDF databases are not optimized to perform such queries efficiently, leading to long processing times. A well-known technique to improve the performance of analytical queries is to exploit materialized views. Although popular in relational databases, view materialization for RDF and SPARQL has not yet transitioned into practice, due to the non-trivial application to the RDF graph model. Motivated by a lack of understanding of the impact of view materialization alternatives for RDF data, we demonstrate Sofos, a system that implements and compares several cost models for view materialization. Sofos is, to the best of our knowledge, the first attempt to adapt cost models, initially studied in relational data, to the generic RDF setting, and to propose new ones, analyzing their pitfalls and merits. Sofos takes an RDF dataset and an analytical query for some facet in the data, and compares and evaluates alternative cost models, displaying statistics and insights about time, memory consumption, and query characteristics.
18/06/2021
On Monday, June 21, at 8 AM (CET), our colleague Torben Bach Pedersen will give a keynote speech at the Tenth International Conference on Model & Data Engineering.
The presentation is entitled "Extreme-Scale Model-Based Time Series Management with ModelarDB". The conference will take place online.
For more information, visit: https://lnkd.in/erPHXqH
About Torben: Torben Bach Pedersen is Professor of Computer Science at Aalborg University, Denmark. His research interests include extreme-scale data analytics, data warehouses and data lakes, predictive and prescriptive analytics, with a focus on technologies for "Big Multidimensional Data" - the integration and analysis of large amounts of complex and highly dynamic multidimensional data.
His major application domain is digital energy, where he focuses on energy flexibility and analytics on extreme-scale energy time series. He is a Distinguished Scientist, and a member of the Danish Academy of Technical Sciences, the SSTD Endowment, and the SSDBM Steering Committee. He has served as Area Editor for IEEE Transactions on Big Data, Information Systems and Springer EDBS, PC (Co-)Chair for DaWaK, DOLAP, SSDBM, and DASFAA, and regularly serves on the PCs of the major database conferences like SIGMOD, PVLDB, ICDE and EDBT. He received Best Paper/Demo awards from ACM e-Energy and WWW. He is co-founder of the spin-out companies FlexShape and ModelarData.
26/04/2021
Our colleague Christian Aebeloe presented the paper "ColChain: Collaborative Linked Data Networks", written in collaboration with Gabriela Montoya and Katja Hose, on Tuesday at The Web Conference (https://www2021.thewebconf.org).
Find the full paper here:
https://relweb.cs.aau.dk/colchain/files/ColChain.pdf
a website with sources, experimental setup, and pre-prints here:
https://relweb.cs.aau.dk/colchain/
and the presentation here:
https://www.youtube.com/watch?v=xuI2hqyCZbQ
Abstract
One of the major obstacles that currently prevents the Semantic Web from exploiting its full potential is that the data it provides access to is sometimes not available or outdated. The reason is rooted deep within its architecture that relies on data providers to keep the data available, queryable, and up-to-date at all times – an expectation that many data providers, in reality, cannot live up to for an extended (or infinite) period of time. Hence, decentralized architectures have recently been proposed that use replication to keep the data available in case the data provider fails. Although this increases availability, it does not help keep the data up-to-date or allow users to query and access previous versions of a dataset. In this paper, we therefore propose ColChain (COLlaborative knowledge CHAINs), a novel decentralized architecture based on blockchains that not only lowers the burden for the data providers but at the same time also allows users to propose updates to faulty or outdated data, trace updates back to their origin, and query older versions of the data. Our extensive experiments show that ColChain reaches these goals while achieving query processing performance comparable to the state of the art.
23/04/2021
A Joint Ph.D. position in "Spatio-temporal data integration & analysis" by Aalborg University (Denmark) and Université Libre de Bruxelles (Belgium) is now available!
Objectives: Many companies collect very large quantities of spatio-temporal data, e.g., transport companies collect telemetric data that are currently used for scheduling, billing, and other administrative tasks. Efficiently managing, integrating, and analysing such spatio-temporal data together with other types of data is a very challenging task. Based on multiple large datasets, the project annotates a digital map with novel information in order to accurately quantify driving, e.g., in terms of fuel consumption, for use in both analytic and predictive analysis. This project will therefore develop scalable processing techniques for integrating heterogeneous data with spatio-temporal datasets.
Read more in ESR 3.1 at this link https://deds.ulb.ac.be/
The Data Engineering for Data Science (DEDS) Ph.D. positions are 3-year programs under the Horizon 2020 - Marie Skłodowska-Curie Innovative Training Networks (H2020-MSCA-ITN-2020) framework.
Find the research topics here: https://deds.ulb.ac.be/. Only a few positions are still available, so do not miss the chance!
DEDS - Data Engineering for Data Science
Data is a key asset in modern society. Data Science, which focuses on deriving valuable insight and knowledge from raw data, is indispensable for any economic, governmental, and scientific activity. Data Engineering provides the data ecosystem (i.e., data management pipelines, tools and services) th...
23/04/2021
A Joint Ph.D. position in "Model-based storage for time series" by Aalborg University (Denmark) and Université Libre de Bruxelles (Belgium) is now available!
Objectives: Industrial sensors, like those in wind turbines, generate large amounts of never-ending time series, but only small parts of them (e.g., averages over 15-minute windows) can be handled and stored for analysis. It is vital to develop new methods for incremental storage and fast retrieval that avoid accessing raw data to produce results. In this project, we focus on how to store such time series data by means of models and investigate how these models can be incrementally maintained and used both to access past data and to predict future data.
Read more in ESR 2.3 at this link https://deds.ulb.ac.be/
The Data Engineering for Data Science (DEDS) Ph.D. positions are 3-year programs under the Horizon 2020 - Marie Skłodowska-Curie Innovative Training Networks (H2020-MSCA-ITN-2020) framework.
Find the research topics here: https://deds.ulb.ac.be/. Only a few positions are still available, so do not miss the chance!
24/03/2021
Our colleague Katja Hose will give a keynote titled "The Quest for Knowledge" today at 14:15 at the 24th International Conference on Extending Database Technology (EDBT2021): https://edbticdt2021.cs.ucy.ac.cy/
Abstract:
Throughout the entire history of mankind, humans have always strived to acquire new knowledge. In a way, striving for knowledge is still what motivates researchers across all modern research disciplines. Within Computer Science, especially in the past couple of years, we have witnessed an increasing interest in structured knowledge in the form of graphs, so-called knowledge graphs; not only in academia but also in industry.
In this talk, I will sketch current advances across the knowledge life cycle – spanning from knowledge extraction via knowledge integration, management, and sharing to knowledge querying, and knowledge-enhanced applications. In particular, I will highlight different perspectives on querying, sharing, and (re)using knowledge with a focus on federated scenarios, and discuss open challenges as well as the roles of the different communities within Computer Science that are involved in advancing this vital area of research.
Katja Hose is a professor in Computer Science at Aalborg University. Her research is rooted in databases and Semantic Web technologies and spans theory, algorithms, and applications of Data Science and Web Science incl. knowledge management, querying, analytics, publishing, and extracting. She has co-authored more than 100 peer-reviewed scientific publications and regularly serves as a reviewer for databases and Semantic Web conferences and journals. She has served in many different roles for a broad range of international conferences incl. VLDB, SIGMOD, ICDE, TheWebConf/WWW, and ISWC. http://www.cs.aau.dk/~khose.
https://edbticdt2021.cs.ucy.ac.cy/keynote-speakers/
23/03/2021
Our colleague Mohsin Iqbal today presented the paper "A Foundation for Spatio-Textual-Temporal Cube Analytics", written in collaboration with Matteo Lissandrini and Torben Bach Pedersen, at the 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP2021): https://sites.google.com/view/dolap-2021/home
Find the paper at this link: http://www.info.univ-tours.fr/~marcel/dolap2021/paper3-long.pdf
ABSTRACT
Large amounts of spatial, textual, and temporal (STT) data are being produced daily. This is data containing an unstructured component (text), a spatial component (geographic position), and a time component (timestamp). Therefore, there is a need for a powerful and general way of analyzing STT data together. In this paper, we define and formalize the Spatio-Textual-Temporal Cube (STTCube) structure to enable combined effective and efficient analytical queries over STT data. Our novel data model over STT objects enables novel joint and integrated STT insights that are hard to obtain using existing methods. Moreover, we introduce the new concept of STT measures with associated novel STT-OLAP operators. To allow for efficient large-scale analytics, we present a pre-aggregation framework for exact and approximate computation of STT measures. Our comprehensive experimental evaluation on a real-world Twitter dataset confirms that our proposed methods reduce query response time by 1-5 orders of magnitude compared to the No Materialization baseline and decrease storage cost between 97% and 99.9% compared to the Full Materialization baseline while adding only a negligible overhead in the STTCube construction time. Moreover, approximate computation achieves an accuracy between 90% and 100% while reducing query response time by 3-5 orders of magnitude compared to No Materialization.
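To make the idea of pre-aggregation mentioned in the abstract more concrete, here is a minimal sketch in Python. It is not the STTCube implementation from the paper: the records, the cell granularity (city and day), and the function names `build_preaggregate` and `term_counts` are all assumptions made up for illustration. The sketch only shows the general principle of materializing per-cell term counts once and then answering slice and roll-up queries from those cells instead of the raw texts.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (city, day, text). Not data from the paper.
POSTS = [
    ("Aalborg", "2021-03-22", "wind energy data"),
    ("Aalborg", "2021-03-22", "wind turbines"),
    ("Aalborg", "2021-03-23", "data cubes"),
    ("Brussels", "2021-03-22", "energy data"),
]


def build_preaggregate(posts):
    """Materialize term counts once per (city, day) cell."""
    cells = defaultdict(Counter)
    for city, day, text in posts:
        cells[(city, day)].update(text.split())
    return cells


def term_counts(cells, city=None, day=None):
    """Answer a slice or roll-up by merging pre-aggregated cells only.

    A dimension left as None is rolled up (aggregated over all its values).
    """
    total = Counter()
    for (cell_city, cell_day), counts in cells.items():
        if city is not None and cell_city != city:
            continue
        if day is not None and cell_day != day:
            continue
        total.update(counts)
    return total


CELLS = build_preaggregate(POSTS)
AALBORG = term_counts(CELLS, city="Aalborg")    # slice on space, roll up time
MARCH_22 = term_counts(CELLS, day="2021-03-22")  # slice on time, roll up space
```

In this toy setting the raw texts are touched only once, in `build_preaggregate`; every later query works on the much smaller set of cells. Choosing which cells to materialize, and when approximate counts suffice, is the kind of trade-off the paper studies.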
23/03/2021
Our colleague Kashif Rabbani will present the paper "Optimizing SPARQL Queries using Shape Statistics" in collaboration with Matteo Lissandrini and Katja Hose at The 24th International Conference on Extending Database Technology (EDBT2021) https://edbticdt2021.cs.ucy.ac.cy in the Graph Management session on Friday at 17:45.
Find more details about the paper, including the presentation and the link to the source code here: https://relweb.cs.aau.dk/rdfshapes/
Find the PDF here: https://relweb.cs.aau.dk/rdfshapes/files/edbt2021.pdf
ABSTRACT
With the growing popularity of storing data in native RDF, we witness more and more diverse use cases with complex SPARQL queries. As a consequence, query optimization and in particular cardinality estimation and join ordering becomes even more crucial. Classical methods exploit global statistics covering the entire RDF graph as a whole, which naturally fails to correctly capture correlations that are very common in RDF datasets, which then leads to erroneous cardinality estimations and suboptimal query execution plans. The alternative of trying to capture correlations in a fine-granular manner, on the other hand, results in very costly preprocessing steps to create these statistics. Hence, in this paper, we propose shapes statistics, which extend the recent SHACL standard with statistic information to capture the correlation between classes and properties. Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with only little overhead without disadvantages in query runtime while leading to noticeable improvements in cardinality estimation.
Optimizing SPARQL Queries using Shape Statistics - EDBT 2021 - Teaser
A short video about this work.
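The contrast the abstract draws between global statistics and statistics that capture the correlation between classes and properties can be illustrated with a small Python sketch. This is not the estimator from the paper and does not use SHACL. The toy graph, the naive uniform estimate, and the function names `global_estimate` and `class_property_count` are assumptions chosen only to show why graph-wide counts can mislead when a property is unevenly distributed across classes.

```python
RDF_TYPE = "rdf:type"

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("s1", RDF_TYPE, "Student"), ("s1", "name", "Ann"),
    ("s2", RDF_TYPE, "Student"),
    ("p1", RDF_TYPE, "Professor"), ("p1", "name", "Bob"),
    ("p2", RDF_TYPE, "Professor"), ("p2", "name", "Cy"),
    ("p3", RDF_TYPE, "Professor"), ("p3", "name", "Di"),
]


def global_estimate(triples, cls, prop):
    """Naive estimate for '?x rdf:type cls . ?x prop ?v' from global counts.

    Assumes prop triples are spread uniformly over all subjects,
    regardless of their class (an assumption of this sketch).
    """
    subjects = {s for s, _, _ in triples}
    n_cls = sum(1 for _, p, o in triples if p == RDF_TYPE and o == cls)
    n_prop = sum(1 for _, p, _ in triples if p == prop)
    return n_cls * n_prop / len(subjects)


def class_property_count(triples, cls, prop):
    """Per-class statistic: number of prop triples whose subject is typed cls."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return sum(1 for s, p, _ in triples if p == prop and s in members)


STUDENT_GLOBAL = global_estimate(TRIPLES, "Student", "name")
STUDENT_PER_CLASS = class_property_count(TRIPLES, "Student", "name")
```

Here only one of the two students has a name, while every professor has one, so the uniform estimate overshoots for students and undershoots for professors. A statistic kept per class and property avoids that error, at the price of having to be precomputed and stored.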
21/12/2020
Data Engineering for Data Science (DEDS) Ph.D. positions (3 years) now available!
DEDS is jointly organized by Université Libre de Bruxelles (Belgium), Universitat Politècnica de Catalunya (Spain), Aalborg Universitet (Denmark), and the Athena Research and Innovation Centre (Greece). Partner organisations from research, industry and the public sector prominently contribute to the programme by training students and providing secondments in a wide range of domains including Energy, Finance, Health, Transport, and Customer Relationship and Support. https://deds.ulb.ac.be
DEDS operates under the Horizon 2020 - Marie Skłodowska-Curie Innovative Training Networks (H2020-MSCA-ITN-2020) framework.
Application deadline: February 7, 2021, midnight AoE (Anywhere on Earth)
Find the application details here: https://deds.ulb.ac.be/
and the research topics here: https://deds.ulb.ac.be/
21/12/2020
Associate Professor positions now available at Daisy!
As an Associate Professor in Computer Science, you will conduct research, teach, and engage with society at an international standard. DEADLINE: 10 February 2021
The Department of Computer Science at Aalborg University is highly regarded in Denmark among researchers, students, and external partners because of our leading research, education, and collaboration. The department currently has more than 130 scientific staff, 15 administrative staff and 1300 students.
The research at Daisy concerns data management and analytics. Within this broad area, substantial research concerns temporal, spatial, and spatio-temporal data; multidimensional data; time-series data; and metric data. Prominent more specific areas include business intelligence, data warehousing, OLAP, and data integration. In the context of analytics, the research covers query processing, data mining, and machine learning, while in the context of data management, the research covers data modeling and database design, data models, query languages, and indexing.
Find the application procedure, the qualifications needed, and details about the employment in this link:
ASSOCIATE PROFESSOR IN COMPUTER SCIENCE
Department of Computer Science at Aalborg University’s Technical Faculty of IT is looking to appoint a number of Associate Professors for its Aalborg and Copenhagen Campus, starting June 1, 2020 or soon hereafter.
03/12/2020
A postdoctoral position in graph databases and machine learning for microbial genome recovery now available at Daisy!
This is a joint position between the Center for Data-intensive Systems (Daisy) and the Distributed, Embedded, and Intelligent Systems group (DEIS), available immediately with a flexible starting date within the next couple of months.
The topic is embedded within the context of the VILLUM Synergy project "Data Science meets Microbial Dark Matter". In this project, we want to improve the rate of recovery of microbial genomes and ensure evidence-based analysis by leveraging, expanding, and combining state-of-the-art methods within several fields in exponential growth: DNA sequencing, machine learning, and graph-based analysis. This ambitious goal can only be achieved by the synergy of both data science and bioscience and will thus push the boundaries of both fields.
Read more about the qualifications needed and the application procedure here: http://people.cs.aau.dk/~khose/Vacancy_Synergy.html
About Daisy (Center for Data-Intensive Systems):
Research at Daisy focuses on data-intensive systems, Semantic Web technologies, Web Science and engineering, spatio-temporal data management, business intelligence, and applications of machine learning. International evaluations place Daisy in the global top tier. For example, an independent study of publication performance in the top database outlets in the 10-year period 2001-2010 ranks Daisy second among all research groups in Europe. More information about Daisy can be found at http://daisy.aau.dk.
About DEIS (Distributed, Embedded and Intelligent Systems):
The Distributed, Embedded and Intelligent Systems research group covers mathematical foundations, verification tools, validation methodologies, probabilistic graphical models and machine learning, focusing on distributed, embedded and intelligent systems. This includes the design, implementation and models for the analysis and construction of distributed, embedded and intelligent systems as well as probabilistic models and algorithms for intelligent decision making and machine learning.