Database Systems

Database systems allow the efficient management of structured data. However, they reach their limits when dealing with time-stamped, incrementally arriving records that must be analysed as soon as possible.

Projects

KYQ: Know Your Queries!

Term: 01/04/2015 - 31/12/2018
Project leader: Klaus Meyer-Wegener

"You should know your queries!" is the long version of the project title. It means that you should not just want to have a database, but should also think about the evaluations (written down as queries) that you actually want to run against that database. Creating a database takes considerable effort, not only to deploy the software on a computer, but even more to capture all the data that fills it. This effort should be spent with a goal in mind. The project will therefore collect queries, which can then even be used to design a database automatically. On the one hand this saves resources; on the other it supports the privacy goal of data minimization.

DAMSEL: Assessment of Data Management Systems

Term: 01/09/2014 - 31/12/2018
Project leader: Klaus Meyer-Wegener

The world of data-management systems has become somewhat confusing in recent years. Alongside the well-established relational database systems, so-called NoSQL systems have been developed, which claim to cope with much larger data volumes. At the same time, they offer only limited functionality for efficient data access and give only reduced consistency guarantees. This raises the question of when to stick with a relational database and when to move to a NoSQL system. This project collects the criteria that allow such a decision to be made on a well-founded basis.

E|ASY-Opt INF6: REAPER: A Framework for Materializing and Reusing Deep-Learning Models

Term: 01/01/2017 - 31/12/2020
Funding source: Other EU programmes (e.g. RFCS, DG Health, IMI, Artemis), Bavarian State Ministries
Project leader: Klaus Meyer-Wegener

Within the framework of the EFRE-E|ASY-Opt subproject, the potential of data-mining methods in the area of manufacturing is being investigated. The training of deep-learning models in particular is computationally intensive and may take hours or even several days. The training time can be shortened considerably by starting from an already trained model, as long as the source and target tasks are closely related. This relationship is not yet fully understood.

The aim of this research project is to implement a system called REAPER (Reusable Neural Network Pattern Repository) to support data scientists in storing and reusing already trained deep learning models.

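ReProVide: Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis

Term: 28/08/2017 - 01/03/2023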
Funding source: DFG / Priority Program (SPP)
Project leader: Stefan Wildermann, Jürgen Teich, Klaus Meyer-Wegener

This project is funded by the German Research Foundation (DFG) within the Priority Program SPP 2037 "Scalable Data Management for Future Hardware".

The goal of this project is to provide novel hardware and optimisation techniques for scalable, high-performance processing of Big Data. We particularly target huge datasets with flexible schemas (row-oriented, column-oriented, document-oriented, irregular, and/or non-indexed) as well as data streams such as those found in click-stream analysis, in enterprise sources like e-mails, software logs and discussion-forum archives, and those produced by sensors in the Internet of Things (IoT) and in Industrie 4.0. In this realm, the project investigates the potential of hardware-reconfigurable, FPGA-based Systems-on-Chip (SoCs) for near-data processing, where computations are pushed towards such heterogeneous data sources. Based on FPGA technology, and in particular its dynamic reconfiguration, we propose a generic architecture called ReProVide for low-cost processing of database queries.

SIML: Schema Inference and Machine Learning

Term: since 01/08/2018
Project leader: Richard Lenz

Within the project SIML (Schema Inference and Machine Learning), methods of topological data analysis and unsupervised learning are combined, applied and further developed in order to derive a conceptual schema from unstructured, multivariate data.

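ANANIA: Architecture of Non-Multiple Autoencoders for Non-Lossy Information Agglomeration (working title, preliminary)

Term: 02/01/2020 - 19/09/2022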
Project leader: Klaus Meyer-Wegener

The compression of data has played a decisive role in data management for a long time. Compressed data can be stored permanently in less space and sent over the network more efficiently. With ever-increasing volumes of data, the importance of good compression methods keeps growing.

Within the scope of project Anania (Architecture of Non-Multiple Autoencoders for Non-Lossy Information Agglomeration), we are investigating to what extent classical…

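ReProVide II INF6: Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis (Phase II)

Term: 01/08/2021 - 31/07/2024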
Funding source: German Research Foundation (DFG)
Project leader: Klaus Meyer-Wegener

Analysing petabytes of data in an affordable amount of time and energy requires massively parallel processing of data at their source. Active research is therefore directed towards emerging hardware architectures to reduce the volume of data close to their source and towards sound query analysis and optimisation techniques to exploit such novel architectures. The goal of the ReProVide project is to investigate FPGA-based solutions for smart storage and near-data processing together with novel query-optimisation…

FST: Generation of Symbol Tables for String Compression with Frequent-Substring Trees

Term: since 19/09/2022
Project leader: Klaus Meyer-Wegener

With the ongoing rise in global data volumes, database compression is becoming increasingly relevant. While the compression of numeric data types has been extensively researched, the compression of strings has only recently received renewed scientific attention.

A promising approach to string compression is the use of symbol tables, where recurring substrings within a database are substituted with short codes. A corresponding table enables the smooth reconstruction of the original data.…
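
The symbol-table idea described above can be illustrated with a minimal Python sketch. It is not the project's method: the table is built here with a naive heuristic (the most frequent fixed-length substrings), and all names (`build_symbol_table`, `compress`, `decompress`, `ESCAPE`) as well as the sample data are hypothetical. Codes 0 to 254 refer to table entries; code 255 is assumed to be an escape marker that precedes a literal byte.

```python
from collections import Counter

ESCAPE = 255  # assumption: this code marks "the next byte is a literal"


def build_symbol_table(values, length=3, max_symbols=255):
    """Pick the most frequent substrings of a fixed length as symbols."""
    counts = Counter()
    for value in values:
        for i in range(len(value) - length + 1):
            counts[value[i:i + length]] += 1
    # The position in the returned list is the one-byte code of the substring.
    return [sub for sub, _ in counts.most_common(max_symbols)]


def compress(value, table):
    """Replace known substrings with one-byte codes, escape everything else."""
    codes = {sub: code for code, sub in enumerate(table)}
    length = len(table[0]) if table else 0
    out = bytearray()
    i = 0
    while i < len(value):
        chunk = value[i:i + length]
        if length and chunk in codes:
            out.append(codes[chunk])
            i += length
        else:
            out += bytes([ESCAPE, value[i]])
            i += 1
    return bytes(out)


def decompress(data, table):
    """Reconstruct the original bytes by looking up each code in the table."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESCAPE:
            out.append(data[i + 1])
            i += 2
        else:
            out += table[data[i]]
            i += 1
    return bytes(out)


# Hypothetical column of similar strings.
column = [b"http://example.org/a", b"http://example.org/b", b"http://example.com/c"]
table = build_symbol_table(column)
packed = [compress(value, table) for value in column]
restored = [decompress(p, table) for p in packed]
```

Because each value is compressed on its own against a shared table, a single string can be decoded without touching its neighbours. The price is that substrings missing from the table cost two bytes instead of one, which is why the choice of table entries matters.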

SKYSHARK: SKYSHARK - Benchmarking Data Processing Systems Using Real-Time Flight Data

Term: since 16/05/2023
Project leader: Klaus Meyer-Wegener

To test and evaluate a heterogeneous stream-processing system consisting of an FPGA-based system-on-chip and a host, we develop a benchmark called SKYSHARK. It uses real-world data from air-traffic control that is publicly available. These data are enhanced for the purpose of the benchmark without changing their characteristics. They are further enriched with aircraft and airport data. We define 14 queries with respect to the particular requirements of our system. They should be useful for other…

Participating Scientists

Klaus Meyer-Wegener
Peter Schwab
Melanie Sigl
Maximilian Langohr
Dominik Probst
Richard Lenz
Julian Rith
Stefan Wildermann
Jürgen Teich
Tobias Hahn
Andreas Becher
Lekshmi Beena Gopalakrishnan Nair

Publications

Langohr M., Vogler T., Meyer-Wegener K.:
SKYSHARK: A Benchmark with Real-world Data for Line-rate Stream Processing with FPGAs
Lernen, Wissen, Daten, Analysen (LWDA) (Marburg, 09/10/2023 - 11/10/2023)
In: Leyer M, Wichmann J (eds.): Lernen, Wissen, Daten, Analysen (LWDA) Conference Proceedings, Marburg, Germany, October 9-11, 2023
Open Access: https://ceur-ws.org/Vol-3630/LWDA2023-paper9.pdf

Projects

KYQ: Know Your Queries!

DAMSEL: Assessment of Data Management Systems

E|ASY-Opt INF6: REAPER: A Framework for Materializing and Reusing Deep-Learning Models

ReProVide: Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis

SIML: Schema Inference and Machine Learning

ANANIA: Architecture of Non-Multiple Autoencoders for Non-Lossy Information Agglomeration (working title, preliminary)

ReProVide II INF6: Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis (Phase II)

FST: Generation of Symbol Tables for String Compression with Frequent-Substring Trees

SKYSHARK: SKYSHARK - Benchmarking Data Processing Systems Using Real-Time Flight Data
