Fall 2018

An 8-minute read, posted by Sebastian Büttrich on Sun, Sep 24, 2017

Project Topics (Fall 2018)

I/O Speculation [PB]

I/O speculation is an alternative to current I/O handling techniques that either block the processor for tens of nanoseconds or yield it only to be ready again milliseconds later. I/O speculation enables an application to continue running after issuing an I/O until it produces an externally visible side-effect. I/O speculation has not been studied in depth so far. The topic of the thesis is to explore how Hardware Transactional Memory can be used to implement I/O speculation.
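The core idea can be illustrated in plain software, without Hardware Transactional Memory. The following is a minimal sketch under stated assumptions, not a proposed design: the class `SpeculativeRun` and the helpers `slow_write`, `failing_write` and `run` are hypothetical names, a thread pool stands in for the device, and only buffered output is discarded on failure. Rolling back application state after a failed I/O is not modelled here; that is presumably where a mechanism such as HTM would come in.

```python
from concurrent.futures import ThreadPoolExecutor
import time


class SpeculativeRun:
    """Toy model of I/O speculation (hypothetical names, not a real API).

    The application assumes each issued I/O will succeed and keeps running.
    Externally visible output is buffered and released only once every
    outstanding I/O has actually completed.
    """

    def __init__(self, executor):
        self._executor = executor
        self._pending = []    # futures for I/Os issued but not yet confirmed
        self._buffered = []   # side effects withheld while speculating
        self.visible = []     # stands in for the outside world

    def issue_io(self, io_fn, *args):
        # Start the I/O in the background and return immediately.
        self._pending.append(self._executor.submit(io_fn, *args))

    def emit(self, message):
        # An externally visible side effect: defer it instead of blocking.
        self._buffered.append(message)

    def commit(self):
        # Wait for all I/Os; release the output only if all of them succeeded.
        try:
            for fut in self._pending:
                fut.result()           # re-raises if the I/O failed
        except Exception:
            self._buffered.clear()     # abort: discard speculative output
            return False
        finally:
            self._pending.clear()
        self.visible.extend(self._buffered)
        self._buffered.clear()
        return True


def slow_write(store, key, value):
    time.sleep(0.01)                   # simulated device latency
    store[key] = value


def failing_write(store, key, value):
    raise IOError("simulated device error")


def run(io_fn):
    store = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        spec = SpeculativeRun(pool)
        spec.issue_io(io_fn, store, "k", 1)
        total = sum(range(1000))       # useful work overlapping the I/O
        spec.emit(f"wrote k, total={total}")
        ok = spec.commit()
    return ok, spec.visible
```

Calling `run(slow_write)` releases the buffered message once the write completes, whereas `run(failing_write)` discards it, so nothing that depended on the failed I/O becomes visible.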

Computational Storage [PB]

Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 1990s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., Broadcom Stingray or NXP LS2). Thesis topics include (1) the design, implementation, and evaluation of a prototype key-value store running on the storage controller, (2) the design, implementation, and evaluation of a 100 Gb Ethernet-based RPC connection between host and storage controller, and (3) the development of a new recovery scheme for a user-space Flash Translation Layer embedded on the storage controller.

Database Performance Characteristics [PB]

Characterize the performance of DB2 PureScale on a cluster equipped with shared storage, using a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.

Storage System Performance Characteristics [PB]

New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionality (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not yet well understood. The topic of this thesis is to design and conduct experiments to fully characterize the performance of such SSDs.

FPGA-based Hardware Acceleration [PB]

Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can, for example, run customized FPGA instances on AWS. Projects focus on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, a database storage manager, or the database client. You will be able to experiment with FPGAs in the lab and on AWS.

Decentralized Cloud Management [PB]

In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (points of delivery) and a wireless core (5G + Wi-Fi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered directly by a renewable source, via a power conditioner, and is thus equipped with batteries. It is necessary to assume that such Pods are intermittently powered and intermittently connected to the core. This thesis focuses on the design and implementation of prototype Pods in the lab, followed by a deployment in Orkney. We also consider Pods as gateways to sensor nodes. Theses on this topic include projects on time-sensitive networking and projects on mobile gateways connecting to Pods.

Multimedia Analytics Data Services [BÞJ]

Media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for not only finding content in those collections, but also gaining insights into the collections and analyzing them. In this project, we propose to implement a media server and a media browser encapsulating a novel data model for analyzing media collections.

Deep Learning in Content-Based Music Classification [BÞJ]

At ITU, we have copies of nearly 1M songs in audio format, of which nearly 200K have reliable genre metadata. The goal of this project is to use an existing machine learning library to implement a deep learning system capable of classifying songs into the best genre, based on this large collection. Joint with Sami Brandt.

Impact of New Memory Technologies on Update Performance of High-Dimensional Indexing [BÞJ]

This project investigates the confluence of two recent developments, the advent of persistent memory and the rapid growth of multimedia collections. In this project, a simulation model will be used to investigate the impact of these new memory technologies on update performance of high-dimensional indexing.

Open-Source eCP Implementation [BÞJ]

In recent work, we have proposed and evaluated the extended cluster pruning (eCP) algorithm for high-dimensional retrieval. In this project, we propose to clean up and improve the C++ version of the code, both with an eye towards OS portability and code readability (for teaching purposes), and make it publicly available as an open-source project.
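For readers unfamiliar with the approach, the sketch below illustrates basic single-level cluster pruning, the idea that eCP extends. It is a minimal illustration under stated assumptions and not the project's C++ code: the function names `dist2`, `build_index` and `search`, the parameter `probes`, and the choice of roughly √n leaders are assumptions made for this example, and none of eCP's extensions are modelled.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(vectors, num_leaders, seed=0):
    """Pick random leaders and assign every vector to its nearest leader."""
    rng = random.Random(seed)
    leaders = rng.sample(vectors, num_leaders)
    clusters = [[] for _ in leaders]
    for v in vectors:
        nearest = min(range(num_leaders), key=lambda i: dist2(v, leaders[i]))
        clusters[nearest].append(v)
    return leaders, clusters


def search(index, query, k=1, probes=1):
    """Approximate k-NN: scan only the `probes` clusters with the nearest leaders."""
    leaders, clusters = index
    order = sorted(range(len(leaders)), key=lambda i: dist2(query, leaders[i]))
    candidates = [v for i in order[:probes] for v in clusters[i]]
    return sorted(candidates, key=lambda v: dist2(query, v))[:k]


# Illustrative data only: 200 random 8-dimensional vectors.
rng = random.Random(42)
vectors = [tuple(rng.random() for _ in range(8)) for _ in range(200)]
index = build_index(vectors, num_leaders=math.isqrt(len(vectors)))
query = tuple(rng.random() for _ in range(8))
approx = search(index, query, k=3, probes=2)
```

Raising `probes` trades speed for accuracy: when every cluster is probed, the search degenerates to an exact linear scan.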

Mobile Air Quality Monitoring [SB, PB]

New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile air quality sensors, e.g. on bicycles, vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for, e.g., the international citizen science project Luftdaten (https://luftdaten.info/en/home-en/), in which we are participating. Partly in collaboration with the Institute for Environmental Science.

Small Smart City [SB, PB]

Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!

Satellite Data [SB, PB]

Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.

Sensors on Water [SB, PB]

In our Mozilla-funded Orkney Cloud project, we see a need for a wide variety of marine sensors, which would be useful for oceanologists and environmentalists, among others. Have you ever put a sensor node on a surfboard? Here you can!

Building Occupancy [SB, PB]

Create meaningful applications based on our data on the movement of people and devices in our building, e.g. the correlation between building management, booking and actual utilization. Combine with sensor data to see the interaction between people and environment. In collaboration with Københavns Kommune.

Leveraging Heterogeneous Hardware [PT]

The computer architecture community is moving toward commoditized hardware specialization instead of general-purpose CPUs, and toward more agile hardware development instead of years-long production cycles, to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will also disrupt the way we design and maintain emerging data management systems. As the heterogeneity of hardware resources increases, it becomes essential for data management systems to decide on the optimal design options based on the processor types they are running on. This project targets identifying the granularity of data management tasks for different workloads that can be offloaded from a general-purpose CPU to specialized hardware (e.g., GPU, FPGA) or low-power cores (e.g., ARM), and figuring out how to perform this offloading efficiently. The work would be split into several sub-projects targeting specific workloads and hardware types.

What is HTAP? [PT]

The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on recent data. Some even need to run analytical queries (OLAP) as part of transactions. Efficient processing of individual transactional and analytical requests, however, leads to different optimizations and architectural decisions when building a data management system. For the kind of data processing that requires both analytics and transactions, Gartner recently coined the term Hybrid Transactional/Analytical Processing (HTAP). Many HTAP solutions targeting these new applications are emerging from both industry and academia. However, there is no standard set of capabilities that all of these systems support. The goal of this project is to understand the HTAP landscape and develop a benchmark suite that is representative of the different use cases that fall under the HTAP umbrella.

Workload Characterization for Big Data Management [PT]

The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing traditional analytical processing), are not suitable for explaining the behavior of emerging big data applications (with heavy ingest rates and complex analytical queries involving machine learning) on modern hardware. While the behavior of TPC-C and TPC-H on commodity multicore servers has been heavily studied, the behavior of the newer benchmarks is still a mystery to many people in the database community. A comprehensive workload characterization of these new benchmarks is crucial to understand in more detail how they differ from the older benchmarks and what they require from data management systems and hardware.

Micro-architectural Analysis of SystemML [PT]

Apache SystemML is an open-source platform that runs machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered state-of-the-art for running machine learning tasks (e.g., at ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at understanding how efficiently SystemML utilizes the resources of commodity server hardware, and how this differs from other widely used machine learning systems (e.g., Apache Spark MLlib).

Efficient OS-level Context-Switching for Thread Migration [PT]

Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration has been shown to improve instruction cache utilization drastically, since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context-switching overhead. To reduce this overhead, recent work has mainly proposed techniques at the hardware level. However, developing techniques at the level of the OS might be more effective in terms of the adoption of such thread migration mechanisms. The goal of this project is to investigate how to implement lightweight thread migration at the OS level, targeting data management workloads.