M.Sc. Topic: Generate a Multimodal High-Quality Dataset for Proximity Modeling
Modern smartphones are equipped with a range of powerful sensors, such as GPS, compass, light sensor, camera and microphone. In addition, the wide distribution of smart mobile devices opens new possibilities for creating high-quality datasets for research. In our case, we want to generate a multimodal dataset that includes a range of different data types as a basis for further research, such as proximity modeling of mobile users [3, 4] to find people and places in the nearby environment ("What's around us"). The better the underlying dataset (e.g. diversity of value types, number of participants, duration of collection), the more convincing and realistic the evaluation of a proposed solution that uses it. A further way to produce a high-quality dataset would be to generate synthetic data based on real-world data for a certain scenario.
The overall processing chain consists of three basic steps: (1) collecting raw sensor data, (2) extracting characteristic features and modeling, and (3) classification and reasoning for a specific scenario. The master's thesis covers the first two stages: collecting raw data and extracting meaningful features for proximity modeling.
Another interesting point is how to collect the data. The phone context problem describes the difficulty of deciding when to take a sensor reading. For instance, taking a sound sample that characterizes the environment is problematic when the phone is in a pocket or bag. There are two sensing paradigms. First, in participatory sensing the user is actively engaged in the data collection process; regarding the phone context problem, the user decides when the sensor reading happens. Second, in opportunistic sensing the data collection is fully automated without any user involvement. Each sensing paradigm has its own advantages and drawbacks.
The goal of this thesis is to design and implement a system for collecting a diverse set of mobile data on resource-restricted mobile devices, e.g. smartphones and tablets. The dataset must cover a range of different modalities, for example the position, speed, acceleration, sound, illumination and radio environment of a mobile user. Afterwards, characteristic features are to be extracted from the raw data for further proximity modeling.
- Identify which sensors (raw data types) are available on most modern mobile devices, e.g. smartphones.
- Develop a mobile application that collects the data efficiently (e.g. by choosing a suitable sensing frequency) with respect to hardware constraints such as computation, energy and storage.
- Develop a synchronization mechanism to upload the local data to a central server, where the sensor data of multiple mobile users is fused. In particular, consider privacy concerns, because the collected mobile data is highly user-specific and therefore sensitive. Which mechanisms are available to protect personal information?
- Feature selection and modeling: which features are most informative for the use case of proximity modeling?
- Consider sensor calibration when using devices from different hardware vendors. At the beginning we have a homogeneous environment with identical devices. Afterwards, use devices from different manufacturers and investigate how their value ranges differ. In this context, another question is how to detect a broken sensor, which should not be used for data collection.
- Evaluate frameworks that directly produce meaningful features without the need to collect raw sensor data. One advantage of collecting raw sensor data is the greater flexibility for later use. If only certain features are available, this inherently limits the usability of the dataset.
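As an illustration of the broken-sensor question raised in the list above, the following Python sketch flags a window of readings as suspect if it is empty, contains non-finite values, leaves an expected value range, or shows practically no variation. The function name `is_suspect_sensor`, the `valid_range` parameter and the `min_std` threshold are assumptions made for this example and are not part of the topic description.

```python
import math
from statistics import pstdev


def is_suspect_sensor(readings, valid_range, min_std=1e-6):
    """Heuristically decide whether a window of scalar readings looks broken.

    readings    -- list of floats from one sensor (hypothetical input format)
    valid_range -- (low, high) tuple of plausible values for this sensor
    min_std     -- assumed threshold below which the signal counts as "stuck"
    """
    if not readings:
        return True  # the sensor delivered no data at all
    if any(not math.isfinite(r) for r in readings):
        return True  # NaN or infinity in the window
    low, high = valid_range
    if any(r < low or r > high for r in readings):
        return True  # value outside the physically plausible range
    # A (nearly) constant signal may indicate a stuck sensor.
    return pstdev(readings) < min_std
```

Note that a legitimately constant signal, such as a light sensor in a dark room, would also be flagged by the last rule, so a real system would probably need sensor-specific thresholds and longer observation windows.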
There is some flexibility regarding the aforementioned goals and aspects depending on your interests and skills. However, the minimum goal is the collection of raw data and the synchronization mechanism (upload and data fusion) considering privacy concerns.
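One possible privacy mechanism for the upload step is to replace the device identifier with a keyed pseudonym before any data leaves the phone. The Python sketch below shows the idea with an HMAC-SHA256 pseudonym. The function names `pseudonymize` and `build_upload_payload`, the payload layout and the key handling are all assumptions for illustration. Choosing and evaluating suitable mechanisms is part of the thesis.

```python
import hashlib
import hmac
import json


def pseudonymize(device_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a device ID using a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_upload_payload(device_id: str, secret_key: bytes, samples: list) -> str:
    """Serialize samples for upload; the raw device ID is never included."""
    payload = {
        "user": pseudonymize(device_id, secret_key),  # pseudonym only
        "samples": samples,                           # hypothetical sample records
    }
    return json.dumps(payload, sort_keys=True)
```

Pseudonymization alone does not make the data anonymous: location traces in particular can still identify a person. The sketch also assumes that the payload is sent over an encrypted channel and that the key is stored securely.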
The candidate should have the following skills:
- Programming of mobile apps
- Native application for Android or iOS
- Hybrid app (e.g. PhoneGap) or Web application (platform independent), if all necessary sensor data are available
- Privacy-preserving communication protocols for the synchronization mechanism between client and server
- Experience in feature selection and modeling as part of machine learning
- Different modalities require different features.
- For example, many systems use MFCC features for sound data, or features derived from accelerometer data for activity recognition.
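To make the idea of modality-specific features concrete, the following Python sketch computes simple statistics over non-overlapping windows of three-axis accelerometer samples. Features of this kind are a common starting point for activity recognition. The function names, the window layout and the chosen statistics (mean, standard deviation and maximum of the acceleration magnitude) are assumptions for illustration, not a prescribed feature set.

```python
import math
from statistics import mean, pstdev


def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer sample."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def window_features(samples, window_size):
    """Compute per-window statistics of the acceleration magnitude.

    samples     -- list of (x, y, z) tuples (hypothetical input format)
    window_size -- number of samples per non-overlapping window
    Trailing samples that do not fill a whole window are ignored.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    features = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        mags = [magnitude(s) for s in samples[start:start + window_size]]
        features.append({"mean": mean(mags), "std": pstdev(mags), "max": max(mags)})
    return features
```

Using the magnitude instead of the individual axes makes the features less dependent on how the phone is oriented, which is relevant to the phone context problem described above.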
[1] R. Ganti, F. Ye, and H. Lei, “Mobile Crowdsensing: Current State and Future Challenges,” IEEE Communications Magazine, vol. 49, no. 11, pp. 32–39, 2011.
[2] N. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. Campbell, “A Survey of Mobile Phone Sensing,” IEEE Communications Magazine, vol. 48, no. 9, pp. 140–150, 2010.
[3] S. A. Hoseini-Tabatabaei, A. Gluhak, and R. Tafazolli, “A Survey on Smartphone-Based Systems for Opportunistic User Context Recognition,” ACM Computing Surveys, vol. 45, no. 3, pp. 1–51, 2013.
[4] A. A. de Freitas and A. K. Dey, “Using Multiple Contexts to Detect and Form Opportunistic Groups,” in Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 2015, pp. 1612–1621.
Supervisor & Contact
Michael Haus (M.Sc.), haus at in tum de