Hamed Haddadi

Guarantee

2024-04-21T00:00:00+00:00

Our work “GuaranTEE: Towards Attestable and Private ML with CCA” will appear at EuroMLSys 2024! 🎉

This is joint work with Sina Abdollahi, Mohammad Maheri, Marios Kogias, and Sandra Siby.

We explore how Arm’s Confidential Computing Architecture (CCA) can be used to deploy private and attestable ML models on end devices. We develop a prototype on Arm’s Fixed Virtual Platform (FVP) simulator.

As CCA is still under development, to facilitate further research in this space, we’re releasing code and a setup guide. Check out our GitHub link above!

Chat with us at EuroMLSys later this month, and stay tuned for (longer) follow-up work in this direction!

UKRI EPSRC Open Plus Fellowship on Securing the Next Billion Consumer Devices on the Edge

2021-10-26T00:00:00+00:00

Jan 2024: We are looking for a postdoctoral researcher to join us for my EPSRC fellowship. Please see the job ad here: https://www.imperial.ac.uk/jobs/description/ENG02969/research-associate-user-centred-systems’-securityprivacy. Deadline: 30 Apr.

I am really excited to announce that I have been selected for an EPSRC Open Fellowship (Plus) 2022-2027, with funding of around £2m from UKRI, industry, and Imperial College London. Over the next 5 years, I’ll be working on providing better security and privacy for edge devices (IoT & Browser), all the way from the on-device TEE to analytics done at the ISP end. The industrial supporters of the fellowship are Arm Research, Telefonica I+D, Samsung AI, and CISCO.

As part of the Plus component of the fellowship, I will be closely working with The Information Commissioner’s Office (ICO) addressing the privacy recommendations and regulatory challenges raised by the consumer IoT sector and its data collection practices.

In this fellowship, I aim to address a major challenge in the adoption of user-centred privacy-enhancing technologies by designing and evaluating an ecosystem where analytics from, and interaction with, consumer devices can happen with trust in the model and authenticity, while enabling auditing and personalisation, hence pushing today’s boundaries on all-or-nothing privacy and enabling new economic models. This approach requires designing for capabilities beyond the current trusted memory and processing limitations of the devices, and a cooperative dialogue and ecosystem involving service providers, ISPs, regulators, device manufacturers, and the end users. By designing our framework around the latest architectural and security features in edge devices, before they become commercially available, we provision for Model Privacy and a User-Centred ecosystem, where service providers can have trust in the authenticity, attestability, and trustworthiness of the valuable models running on user devices, without the users having to reveal sensitive personal information to these cloud-based centralised systems. This approach will enable advanced and sensitive edge-based analytics to be performed without jeopardising individuals’ privacy. Importantly, we aim to integrate mechanisms for data authenticity and attestation into our proposed framework, to enable trust in models and the data used by them. Such privacy-preserving technologies have the capacity to enable new forms of sensitive analytics without sharing raw data, thereby providing legal balancing capabilities that might enable certain sensitive (or currently unlawful) data analysis.

I am really excited about the next 5 years! As part of the fellowship team, I will be recruiting a postdoctoral researcher, an engineer, and 2 PhD students. Watch out for adverts coming out soon! If you are thinking of applying for a PhD with us, please get in touch and apply before February 2022.

Analytics on the Edge (Privacy, Utility, and Cost)

2018-05-24T00:00:00+00:00

The rapid rise in connected sensors, actuators, and their accompanying applications surrounding us, often collectively referred to as the Internet of Things (IoT), has led to growing interest and attention from governments, industry, and the scientific community, amongst others. The numerous opportunities presented by the IoT industry, however, often come at the cost of excessive energy usage, or privacy and security threats, in exchange for fine-grained sensing and data analytics. In this post, I advocate for the use of optimisation trade-offs between the utility and value gained from information, the privacy risks and security threats to the data subject, and the cost (e.g., energy and bandwidth) of performing the sensing and analytics. We argue for leveraging the network edge (i.e., the IoT device itself) to support this optimisation process and provide a cooperative framework between the edge and the cloud. Such an architecture will play a pivotal role in protecting individuals’ privacy, while reducing the cost of the operation and the privacy and security risks.

Introduction

Internet of Things (IoT) devices are rapidly entering our daily lives, from always-on voice-enabled home assistants such as Amazon Alexa and Google Home, to smart thermostats, plugs, toys, and remote monitoring devices. Gartner predicts that by 2020, we will have over 20 billion IoT devices in use and connected to the Internet. The presence of such a large number of devices will introduce new challenges in connectivity, data management, privacy, and security.

In parallel with this trend in IoT, advances in machine learning, particularly deep learning, on mobile and edge devices have enabled these devices to act as part of the whole data analytics ecosystem, performing a first set of local inferences (e.g., activity recognition on a smartphone), hence relieving the network from transmitting costly, raw sensor data to the cloud. The resulting challenges can be broadly categorised as the tension between data quality, the cost of obtaining such data, and the privacy (and arguably security) consequences.

Motivations

Personal IoT devices might collect a range of rich, sensitive data about individuals and households. These include data from autonomous vehicles, smart meters, home security systems, child monitors, and personal health and well-being devices. In addition to the privacy risks of exposing these data to their primary collectors, third parties with access to these data also pose security and privacy threats.

Today’s IoT ecosystem relies on continuous data collection and offloading to cloud services. As the number and complexity of these devices grow, this modus operandi can have dire network/energy costs and privacy consequences, especially considering the huge volumes of data generated by some of these systems. On the other hand, performing complex data analytics on the device, or relying on encryption-based methods, imposes resource constraints (e.g., storage and bandwidth constraints, energy limitations, or computational costs) and can jeopardise the user experience. Cryptography-based approaches for data encryption and analytics are often too costly and complex to implement on the IoT devices protecting these data. The promise of edge computing, i.e., relying on complete local analytics, can also introduce a burden for devices, as most machine learning models are large and complex.


Hybrid Analytic Engine

Here we advocate for a cooperative, hybrid approach between the edge and the cloud. The high-level overview of this scheme is visualised in the figure above: the raw sensor data goes through an initial layer of feature extraction on the device, using lightweight, simple models to perform dimensionality reduction and compression, while providing a privacy shield against detailed, invasive analysis using well-known privacy techniques. The more complex and intensive analytics take place at the cloud server. One of the primary objectives of this scheme is to separate the feature extraction and the inference phases between the edge device and the cloud. This approach can potentially lead to a reduction in data transmission to the cloud and the removal of potentially sensitive information during the feature extraction phase on the edge node. The extracted features are then transferred to the cloud server for post-processing, and finally the user receives the results from the cloud.
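The split between on-device feature extraction and cloud-side inference can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the per-window mean and spread stand in for the lightweight on-device model, a simple threshold stands in for the cloud model, and the function names, window size, and threshold are hypothetical. It shows the data flow only and offers no privacy guarantee by itself.

```python
from statistics import mean, pstdev


def edge_extract_features(raw_samples, window=50):
    """Runs on the device: reduce each window of raw samples to (mean, std)."""
    features = []
    for start in range(0, len(raw_samples) - window + 1, window):
        chunk = raw_samples[start:start + window]
        features.append((mean(chunk), pstdev(chunk)))
    return features


def cloud_infer(features, active_threshold=0.5):
    """Runs in the cloud: a placeholder model that only ever sees the features."""
    return ["active" if std > active_threshold else "idle" for _, std in features]


# 100 raw samples: a flat (idle) window followed by an oscillating (active) one.
raw = [0.0] * 50 + [(-1.0) ** i for i in range(50)]
features = edge_extract_features(raw)   # only this leaves the device
labels = cloud_infer(features)          # the cloud never receives `raw`
```

In this toy run, 100 raw samples are reduced to two feature pairs before anything is transmitted, which is the bandwidth side of the trade-off; which features are safe to share is the harder question that the privacy techniques mentioned above have to answer.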

Challenges and Opportunities

There are a number of future directions which naturally follow on from advances in edge computing for the IoT domains. Techniques such as Federated Learning can directly benefit from the proposed hybrid edge-cloud schemes, where nano-updates can be aggregated in a centralised fashion without the threat of re-identification faced by traditional machine learning models. Further, privacy-preserving approaches such as auto-encoders or differential privacy can be employed on edge devices to help protect against privacy threats facing traditional deep learning and federated learning models.
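As a rough illustration of how updates from edge devices can be aggregated centrally, here is a minimal sketch of weighted averaging in the style of federated averaging. The function name and the toy weight vectors are hypothetical, and the sketch leaves out local training, communication, and any differential privacy noise.

```python
def federated_average(client_updates):
    """Average client weight vectors, weighted by each client's sample count.

    client_updates is a list of (weights, n_samples) pairs; the raw training
    data never appears here, only the locally computed weights.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total
        for i in range(dim)
    ]


# Two hypothetical devices with 10 and 30 local samples respectively.
updates = [([1.0, 0.0], 10), ([0.0, 1.0], 30)]
global_weights = federated_average(updates)  # [0.25, 0.75]
```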

The approach proposed in this vision might not be the silver bullet in defending against IoT security and privacy attacks, or limiting all the bandwidth requirements of future IoT systems. However, research in this area provides a promising direction in improving the status quo and providing a framework for balancing the trade-offs between risks and threats (privacy-security), utility, and operational costs in a broad setting.


Wearables - the Last Year’s Hype

2018-03-23T00:00:00+00:00

More and more devices claiming to make us fitter, stronger, and healthier have flooded the market over the last couple of years. Those devices started out as mere ‘smart’ pedometers that count our steps during the day and magically transfer the collected data to the cloud for users to view in online dashboards or on their phone. Over the years, additional sensors and metrics, such as sleep quality, heart rate, VO2max, or even stress, have been added to those smart gadgets. But the question remains: how smart, accurate, and suitable are those wearables really?

We were especially interested in the claim that those wristwatches can measure and track stress. The newest generation of devices from Garmin promises to keep track of users’ stress levels. A lot happens within our body when we get stressed: our heart beats faster, muscle tension increases, our skin gets sweaty to cool down our body, and our surface temperature drops as blood flow is diverted to the centre of our body. All those things can be measured with unobtrusive sensors, such as heart rate monitors, skin conductance sensors, and skin surface thermometers. Consumer wearables already contain those sensors. But are they sensitive enough to detect subtle changes in the case of stress? Let’s find out.

We conducted a study [1] where we equipped participants with 3 consumer wearables and a professional laboratory device whilst they were performing some relaxing and stressful tasks (solving mental arithmetic tasks). To induce more real-world conditions, where people are moving around, we did this two times: once while participants were sitting still and once while walking on a treadmill. For the devices, we chose two fitness wristbands/smartwatches: the Microsoft Band and Apple Watch. We also tested the Polar H7, which is a chest strap to detect heart beats and send the data to the phone.

What we found first of all (and this is not a very new finding): wearables get more unreliable in measuring heart rate when people are moving as opposed to sitting still. While people were sitting, the mean error percentage compared to the laboratory device was between 3 and 5%. Walking on a treadmill increased this error to between 10 and 19%. The wrist devices were especially prone to a higher error rate. Those wrist devices use an optical sensor to detect heart beats; they emit a green light and optically pick up the blood flow under the skin. Most fitness trackers rely on this technology, but it has its faults (like the scandal about the first-generation Apple Watches, which had problems with heart rate readings on darker or tattooed skin [2], or the fact that it gets more unreliable under movement or when wristbands are worn too tight). The more reliable and older approach is to use electrical signals to detect the heart beats directly at the chest, like the Polar H7 does - but of course that is less convenient for a device which is worn all day.
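For readers who want to run this kind of comparison on their own recordings, a mean error percentage against a reference device can be computed as in the sketch below. It assumes the metric is the mean absolute percentage error over time-aligned heart rate readings; the exact definition used in the paper may differ, and the readings here are made up purely for illustration.

```python
def mean_error_percentage(device_bpm, reference_bpm):
    """Mean absolute percentage error of device readings against a reference."""
    errors = [
        abs(device - reference) / reference * 100.0
        for device, reference in zip(device_bpm, reference_bpm)
    ]
    return sum(errors) / len(errors)


# Made-up, time-aligned heart rate readings in beats per minute.
reference = [60.0, 80.0, 100.0]
wearable = [63.0, 76.0, 100.0]
error_pct = mean_error_percentage(wearable, reference)  # about 3.3 (%)
```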

Further, we found that the expensive reference device was the only one to pick up the stress responses (increased heart rate, increased skin sweatiness, and decreased skin temperature) during our stress task in a statistically valid manner, and only while participants were seated. We could not find any statistical effects for any of the devices when people were walking on the treadmill (due to all of the sensors, even the reference device, becoming more unreliable).

What do we learn from this? While those nifty smartwatches and fitness trackers are convenient and comfortable, their data has to be taken with a grain of salt, as the sensors are very sensitive to movement and various other factors. In particular, claims that a fitness tracker can tell you when you are stressed should be further investigated and evaluated.

[1] Katrin Hänsel, Romina Kettner, Hamed Haddadi, Akram Alomainy, Albrecht Schmidt, “What to Put on the User: Sensing Technologies for Studies and Physiology Aware Systems”, Proceedings of the ACM Conference on Human Factors in Computing Systems (ACM CHI’18), Montréal, Canada, April 21-26, 2018. (paper)

[2] People with tattoos report the Apple Watch is having trouble determining they are alive — Quartz


Illustrative schematic of the design space evaluation for our 4 test devices (Nexus, Polar, Apple Watch, Microsoft Band) in 5 criteria dimensions (data reliability, comfort of attachment, mobility, data richness, and data accessibility)

Privacy-Preserving time-Series Data Analysis

2018-02-27T00:00:00+00:00

An increasing number of sensors on mobile, Internet of things (IoT), and wearable devices generate time-series measurements of physical activities. Though access to the sensory data is critical to the success of many beneficial applications such as health monitoring or activity recognition, a wide range of potentially sensitive information about the individuals can also be discovered through access to sensory data and this cannot easily be protected using traditional privacy approaches.

Specifically, there are two ways of drawing sensitive inferences from time-series (sensor, IoT, etc.) data: (1) Temporal Inferences, where each section of a time series can be assigned to a specific inference, sensitive or non-sensitive (including the desired information that users gain utility from sharing); (2) Concurrent Inferences, where the information available in each section of a time series can be used to make both sensitive and non-sensitive inferences.


Temporal Inferences	Concurrent Inferences

Recently, we have been working on enabling privacy-preserving techniques for time-series data. This is an area where solutions are slightly more challenging to design than with traditional methods, such as differential privacy or k-anonymity, used for databases or spatial data. Through the work of PhD candidate Mohammad Malekzadeh, we have been investigating new methods to address the challenges in this space.

1) Replacement AutoEncoder: A Privacy-Preserving Algorithm for Sensory Data Analysis Paper: https://arxiv.org/abs/1710.06564

In this paper, we propose a privacy-preserving sensing framework for managing access to time-series data in order to protect temporal inferences. We introduce the Replacement AutoEncoder (RAE), a novel algorithm which learns how to transform discriminative features of data that correspond to sensitive inferences into features that have been more commonly observed in non-sensitive inferences, to protect users’ privacy. This is achieved by defining a user-customized objective function for deep autoencoders. We also used GANs to see if an attacker can detect when a non-sensitive activity inferred from data is actually a replacement of a sensitive one, rather than a real non-sensitive activity. We show that this will only be possible if a GAN is trained on the users’ original, unmodified data.
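One way to picture the replacement idea is through the training pairs such an autoencoder could be fitted on: windows tied to sensitive inferences are paired with a non-sensitive window as the reconstruction target, while all other windows are paired with themselves. The sketch below only builds those pairs and is an assumption about the setup rather than the paper's exact objective; the function name, activity labels, and values are hypothetical.

```python
import random


def build_replacement_pairs(windows, labels, sensitive, seed=0):
    """Return (input, target) pairs for training a replacement-style autoencoder.

    Sensitive windows get a randomly chosen non-sensitive window as target;
    non-sensitive windows are reconstructed unchanged.
    """
    rng = random.Random(seed)
    non_sensitive = [w for w, label in zip(windows, labels) if label not in sensitive]
    if not non_sensitive:
        raise ValueError("need at least one non-sensitive window to draw targets from")
    pairs = []
    for window, label in zip(windows, labels):
        target = rng.choice(non_sensitive) if label in sensitive else window
        pairs.append((window, target))
    return pairs


# Hypothetical two-sample windows with hypothetical activity labels.
windows = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25]]
labels = ["walking", "smoking", "walking"]
pairs = build_replacement_pairs(windows, labels, sensitive={"smoking"})
```

A network trained on such pairs would learn to output non-sensitive-looking data whenever it sees a sensitive section, which is the behaviour the GAN-based detection test described above is meant to probe.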


Replacement AutoEncoder (RAE)

2) Protecting Sensory Data against Sensitive Inferences Paper: https://arxiv.org/abs/1802.07802

In this paper, we propose a data transformation architecture inspired by GANs for protecting concurrent inferences. Here, we set up a GAN-like game between a data transformer model (Guardian) and an information extractor model (Estimator), in such a way that the Estimator helps the Guardian to efficiently transform data to provide a good utility-privacy tradeoff. As a use case, we show that it maintains the usefulness of the transformed data for activity recognition (with an average loss of around three percentage points) while almost eliminating the possibility of gender classification (from more than 90% to around 50%, the target random guess).


Guardian Estimator Neutralizer (GEN)

This work comes alongside the MotionSense dataset, which contains time-series data generated by accelerometer and gyroscope sensors (attitude, gravity, userAcceleration, and rotationRate). It was collected with an iPhone 6s kept in the participant’s front pocket, using SensingKit, which collects information from the Core Motion framework on iOS devices. A total of 24 participants, covering a range of gender, age, weight, and height, performed 6 activities in 15 trials in the same environment and conditions: downstairs, upstairs, walking, jogging, sitting, and standing. With this dataset, we aim to look for personal attribute fingerprints in time series of sensor data, i.e. attribute-specific patterns that can be used to infer the gender or personality of the data subjects in addition to their activities.

I hope you find these useful and interesting. We are always looking forward to comments and interesting ideas.

Privacy-Preserving Analytics using Edge Computing

2017-10-06T00:00:00+00:00

A recent NSF report and a number of security and privacy disasters in the IoT space (see the blog post on Schneier’s blog) highlighted the challenges and opportunities in Edge Computing, leveraging the high processing capabilities and low latency offered at the edge of the network (IoT devices, smartphones, cloudlets) for achieving scalable yet secure and private analytics. Recently we put a few papers on arXiv, focusing on Privacy-Preserving Analytics using smartphones and constrained devices on the network (such as a Raspberry Pi). I encourage privacy, machine learning, and mobile computing enthusiasts to read these papers and kindly provide us with any feedback on the analytics which can improve the research efforts in this space.

Seyed Ali Osia, Ali Shahin Shamsabadi, Ali Taheri, Kleomenis Katevas, Hamid R. Rabiee, Nicholas D. Lane, Hamed Haddadi, “Privacy-Preserving Deep Inference for Rich User Data on The Cloud”, Available on ArXiv, October 2017. (paper)

Deep neural networks are increasingly being used in a variety of machine learning applications applied to rich user data on the cloud. However, this approach introduces a number of privacy and efficiency challenges, as the cloud operator can perform secondary inferences on the available data. Recently, advances in edge processing have paved the way for more efficient, and private, data processing at the source for simple tasks and lighter models, though they remain a challenge for larger and more complicated models. In this paper, we present a hybrid approach for breaking down large, complex deep models for cooperative, privacy-preserving analytics. We do this by breaking down the popular deep architectures and fine-tuning them in a particular way. We then evaluate the privacy benefits of this approach based on the information exposed to the cloud service. We also assess the local inference cost of different layers on a modern handset for mobile applications. Our evaluations show that by using certain kinds of fine-tuning and embedding techniques, and at a small processing cost, we can greatly reduce the level of information available to unintended tasks applied to the data features on the cloud, hence achieving the desired tradeoff between privacy and performance.

Sandra Servia-Rodriguez, Liang Wang, Jianxin R. Zhao, Richard Mortier, Hamed Haddadi, “Personal Model Training under Privacy Constraints”, Available on ArXiv, March 2017. paper

Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is moved to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users’ current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also demonstrate the feasibility of our approach by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi).

Seyed Ali Osia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R. Rabiee, Nic Lane, Hamed Haddadi, “A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics”, Available on ArXiv, March 2017. paper

The increasing quality of smartphone cameras and variety of photo editing applications, in addition to the rise in popularity of image-centric social media, have all led to a phenomenal growth in mobile-based photography. Advances in computer vision and machine learning techniques provide a large number of cloud-based services with the ability to provide content analysis, face recognition, and object detection facilities to third parties. These inferences and analytics might come with undesired privacy risks to the individuals. In this paper, we address a fundamental challenge: Can we utilize the local processing capabilities of modern smartphones efficiently to provide desired features to approved analytics services, while protecting against undesired inference attacks and preserving privacy on the cloud? We propose a hybrid architecture for a distributed deep learning model between the smartphone and the cloud. We rely on the Siamese network and machine learning approaches for providing privacy based on defined privacy constraints. We also use transfer learning techniques to evaluate the proposed method. Using the latest deep learning models for Face Recognition, Emotion Detection, and Gender Classification techniques, we demonstrate the effectiveness of our technique in providing highly accurate classification results for the desired analytics, while providing strong privacy guarantees.

Workshop on Decentralized Machine Learning, Optimization and Privacy (Sep 11-12, 2017)

2017-09-12T00:00:00+00:00

Schedule and slides available on:

https://team.inria.fr/magnet/workshop-on-decentralized-machine-learning-optimization-and-privacy/

I attended the INRIA Lille Magnet (MAchine learninG in information NETworks) Workshop on Decentralized Machine Learning, Optimization and Privacy. Here’s a brief summary of some of the amazing talks on machine learning, edge processing, and privacy-preserving analytics.

Talk session (chair: Sébastien Gambs)

Stephen Hardy: Learning nothing but the model

Stephen talked about the problem of joint analysis of data across multiple organisations. They have done trials with organisations on this problem, for applications such as data sharing for organisations, cross-governance-boundary analytics, PDM, and analytics across device data. Their solutions include [partial] homomorphic encryption, a graph computation engine, and private entity resolution (PER). PER works by sharing a secret salt used in a hashing process.
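The salted-hash idea behind private entity resolution can be sketched as follows. This is an illustrative assumption, not the speaker's protocol: it uses HMAC-SHA256 as the keyed hash, a salt that both organisations have agreed on out of band, and email addresses as the matching identifier. All names and values are hypothetical.

```python
import hashlib
import hmac


def blind_ids(identifiers, shared_salt):
    """Map each normalised identifier to a keyed hash that can be compared safely."""
    return {
        hmac.new(
            shared_salt, ident.strip().lower().encode("utf-8"), hashlib.sha256
        ).hexdigest(): ident
        for ident in identifiers
    }


salt = b"secret-agreed-out-of-band"  # hypothetical shared secret
org_a = blind_ids(["alice@example.com", "bob@example.com"], salt)
org_b = blind_ids(["Bob@Example.com ", "carol@example.com"], salt)

# Only the hashed tokens are exchanged; each side maps matches back locally.
matched_tokens = set(org_a) & set(org_b)
matched_in_a = sorted(org_a[token] for token in matched_tokens)  # ['bob@example.com']
```

Without the salt, a party seeing the tokens cannot simply hash a list of guessed identifiers to reverse them, which is why the salt has to stay secret between the two organisations.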

One example is credit scoring, where sensitive data held by different organisations, on-premises or in the cloud, has to be analysed independently for predictions around credit failure. Hundreds of features can be included from the two organisations to run joint logistic regression (simple to do and explainable). However, the process still reveals whether the same entity exists in both datasets, hence it requires consent. Private learning is about 500x slower on encrypted data; however, a score can be generated in real time. Customer feedback indicates a strong emphasis on learning the model rather than finding common users. In order to deal with this, they have created an encrypted mask in the reordering and matching stage at the broker to break the linkage possibility. They use a Taylor approximation to the logistic loss in order to reduce feature space size and regularise them. Some implementation code is available on: https://github.com/n1analytics
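To give a feel for the Taylor approximation mentioned above, the sketch below compares the logistic loss log(1 + exp(-z)) with its second-order expansion around z = 0, which is log 2 - z/2 + z²/8. Whether the talk used exactly this order of expansion is an assumption; the sketch only shows how the approximation behaves.

```python
import math


def logistic_loss(z):
    """Exact logistic loss for margin z = y * (w . x)."""
    return math.log1p(math.exp(-z))


def taylor_logistic_loss(z):
    """Second-order Taylor expansion of the logistic loss around z = 0."""
    return math.log(2.0) - z / 2.0 + z * z / 8.0


# Absolute gap between the exact loss and the approximation at a few margins.
gaps = {
    z: abs(logistic_loss(z) - taylor_logistic_loss(z))
    for z in (-2.0, -0.5, 0.0, 0.5, 2.0)
}
```

The gap is negligible near zero and grows with the magnitude of the margin, so an approximation like this is most trustworthy when features are scaled so that margins stay small.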

Borja Balle: Secure Multi-Party Linear Regression on High-Dimensional Data

Borja presented a similar problem set, working with multi-party differential privacy to address the trusted third party challenge of access to training data. Some of the challenges have been framed as an optimisation problem. The optimisation has been implemented as a multi-party ridge regression. Borja presented the challenges in this space (MPC protocol, scalability, privacy guarantees, etc). This can be addressed by using separate crypto providers, data providers, and computing providers. Hence differential privacy can be applied at the output to help with scalability. The PETS paper and the open source implementation are available online.

Poster spotlights [poster list]

Talk session (chair: Joseph Salmon)

Mikael Johansson: Sparsity and asynchrony in distributed optimization: models and convergence results

Mikael covered optimization for large-scale learning and issues such as centralised versus distributed, or asynchronous versus synchronous.

Peter Richtárik: Privacy preserving randomized gossip algorithms

Talk session (chair: Morten Dahl)

Meilof Veeningen: Distributed Privacy-Preserving Data Mining in the Medical Domain

Meilof discussed the IoT and medical devices that Philips develops and their interest in data science. Decentralised ML is important for building personalised models where privacy and accuracy are both critical. Experiments are often useful for understanding workflows and hospital patient movement. Using MPC it is possible to perform privacy-preserving tracking and analytics. Examples can include multi-hospital analytics of patient data using differential privacy. Performance and acceptance by all parties remain a challenge.

Talk session (chair: Aurélien Bellet)

Keith Bonawitz: Federated Learning: Privacy-Preserving Collaborative Machine Learning without Centralized Training Data

Keith presented Google’s Federated Learning approach for learning on the device. Because the sensitive data stays on the phone, it is a unique challenge to build a central model that is equivalent to all the individually trained models. This can be useful for multitask learning, and learning to learn. He presented a number of papers on topics such as federated learning and distributed mean estimation with limited communication (ICML 2017). Secure aggregation (using SMPC) is an important aspect here in moving away from keeping large datasets to ephemeral data from focused trained models. Using pairwise Diffie-Hellman key agreements, they enable secret model sharing across users. Hence updates can be aggregated without being inspected. Differential privacy can be useful here for aggregating results from queries across multiple databases.
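The core trick that lets updates be summed without being inspected is pairwise masking: each pair of users agrees on a shared random mask, one adds it and the other subtracts it, so the masks cancel in the sum. The sketch below is a simplified simulation under stated assumptions: a hard-coded seed per pair stands in for the secret that a Diffie-Hellman key agreement would produce, Python's non-cryptographic `random` module stands in for a secure generator, updates are already integers, and dropped-out users are not handled.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo this value


def pairwise_mask(seed, length):
    """Expand a pair's shared seed into a mask vector (toy PRG, not secure)."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_update(user, update, pair_seeds):
    """Add the mask for pairs where `user` is first, subtract it where second."""
    masked = list(update)
    for (first, second), seed in pair_seeds.items():
        if user not in (first, second):
            continue
        mask = pairwise_mask(seed, len(update))
        sign = 1 if user == first else -1
        masked = [(m + sign * x) % MODULUS for m, x in zip(masked, mask)]
    return masked


updates = {0: [1, 2], 1: [10, 20], 2: [100, 200]}       # hypothetical integer updates
pair_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}        # stand-ins for agreed secrets
masked = {user: mask_update(user, vec, pair_seeds) for user, vec in updates.items()}

# The server only sees the masked vectors, yet their sum is the true sum.
aggregate = [sum(column) % MODULUS for column in zip(*masked.values())]  # [111, 222]
```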

Dan Alistarh: Quantized Stochastic Gradient Descent

Dan presented the computational challenges in large-scale model training.

Talk session (chair: George Giakkoupis)

Hamed Haddadi: Containing Personal Data Processing with the Databox

I presented the Databox Project.

Designing an open source IoT Hub with MQTT and Android

2017-09-01T00:00:00+00:00

With constantly evolving hardware and increased competition among manufacturers in the construction of the IoT-enabled home, managing and securing the multitude of internet-enabled devices at any individual’s disposal is becoming ever more difficult, with competing applications tailored to manage Bluetooth devices, Wi-Fi Direct, or NFC-enabled “things”. While the means of connectivity are ever increasing, the lack of a single standard for IoT connectivity, as well as the lack of a single interoperability solution, hinders consumer adoption of an internet-enabled home.

The solution to these issues is here presented in the form of a single, simple, user-friendly interface that can be intuitively used by any consumer. Pairing this interface with an optimal communication protocol will assist in bridging the interoperability gap and provide the necessary abstraction layer to facilitate the interchange of data regardless of which device is being used. This paper proposes that the solution for both these issues lies with leveraging the capabilities of mobile devices, in this case particularly targeting Android, paired with an integration of the lightweight communication protocol MQTT.
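To illustrate the kind of abstraction layer a topic-based protocol such as MQTT provides, here is a minimal in-process sketch of a hub that routes messages by topic, using MQTT-style `+` (single-level) and `#` (multi-level) wildcards. It is not the hub from the paper and does not speak the MQTT wire protocol; the `Hub` class, the topic names, and the payloads are hypothetical, and a real deployment would use an MQTT broker and client library instead.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":                 # multi-level wildcard matches the rest
            return True
        if i >= len(topic_levels):       # filter is longer than the topic
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class Hub:
    """Toy stand-in for a broker: delivers each message to matching subscribers."""

    def __init__(self):
        self.subscriptions = []

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)


received = []
hub = Hub()
hub.subscribe("home/+/temperature", lambda topic, payload: received.append((topic, payload)))
hub.publish("home/kitchen/temperature", "21.5")   # delivered
hub.publish("home/kitchen/humidity", "40")        # ignored by this subscriber
```

The subscriber never needs to know which device, radio, or vendor produced the reading, only the topic it cares about, which is the interoperability point the post makes.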

See the full text PDF for details:

Designing an open source IoT Hub: bridging interoperability and security gaps with MQTT and your Android device

The 2nd workshop on Personal Data Systems, Sommarøy, Norway (PDS 2017)

2017-08-17T00:00:00+00:00

Hosted by the Department of Computer Science, UiT – The Arctic University of Norway. Schedule and slides available on:

http://www.corporesano.no/eventspersonal-data-systems-workshop-2017pds-2017-program-preliminary/

Keynote: “Security, Privacy, and the Free Rider Problem: The Dark Side of the Internet of Things”, Stephen B. Wicker, School of Electrical and Computer Engineering, Cornell University

Stephen Wicker discussed the broad side of privacy and economics of the IoT space. Internet connectivity coming to everyday objects, apps, and apparatus (e.g., in health-related products) has opened up the potential for botnets (e.g., the Dyn attacks), attacks, and privacy threats at a fine level of granularity. The lack of security from manufacturers (some hard-coded into the devices) leads to large-scale DNS attacks in this space. Stephen discussed the economics of the Internet, and treating it as a “public good”, and the economic impacts of this philosophy. Though the “common good” does not necessarily mean open access, one needs cultural and societal governance norms and rules to avoid free-rider problems and tragedies. Solutions such as policy-based mechanisms (certification, regulation) and technology-based mechanisms (cheaper security solutions, automated vulnerability analysis tools, etc) could improve this situation.

“Biometric Key Generation from Body Impedance Data”, Kasper Bonne Rasmussen, Department of Computer Science, University of Oxford

Kasper presented a method for generating crypto keys from biometric data and the privacy implications of this approach. The permanent nature of biometric data makes it difficult to use as a key on its own in case of security flaws and leaks. Hence the pipeline acquires biometrics, extracts biometric samples, extracts features from the samples, and generates keys from those features, which makes key generation repeatable. Using Siamese neural networks to compare the feature vectors of individuals, it is possible to converge on quantised feature vectors from the same individual. These are then fed into a tokenhash and a keyhash function. This yields the equivalent of 59-bit keys to guarantee that a person was involved in a transaction.
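
The quantise-then-hash step can be illustrated with a minimal sketch. It is not Kasper's implementation: the feature vectors are assumed to be given (the talk obtained them with Siamese networks), and the bin width, the `quantise` and `derive` helpers, and the "token"/"key" labels are all hypothetical names chosen for illustration.

```python
import hashlib


def quantise(features, step=0.5):
    """Map each feature to a coarse bin so nearby readings agree (assumed scheme)."""
    return tuple(int(f // step) for f in features)


def derive(label, bins):
    """Hash the quantised vector under a label to get independent outputs."""
    data = label.encode() + b"|" + ",".join(map(str, bins)).encode()
    return hashlib.sha256(data).hexdigest()


# Two noisy readings from the same (hypothetical) person fall in the same bins.
reading_a = [1.10, 2.30, 0.70]
reading_b = [1.20, 2.40, 0.60]
other_person = [3.90, 0.20, 1.80]

key_a = derive("key", quantise(reading_a))
key_b = derive("key", quantise(reading_b))
key_other = derive("key", quantise(other_person))
token_a = derive("token", quantise(reading_a))
```

In this sketch the same person yields the same key across readings, while the token and the key stay distinct because the label separates the two hash domains.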

“WiFi Scanning, Crowd Monitoring, Privacy: An Experience Report”, Maarten van Steen, University of Twente, The Netherlands

WiFi scanners are mostly designed for MAC address collection and processing for data mining purposes (security, crowd analysis, visitor detection, configured network SSID list, location classification, etc.). There are major privacy challenges here: small MAC address space, identification by opt-out, etc. Research in this space also faces practical issues such as faulty scanners, irregular/dynamic transmission ranges, signal and timing issues, multiple addresses per device (or vice versa!), lost data, etc. Even so, using feature vector extraction methods it is possible to develop fingerprints just by looking at visiting patterns for location analysis. Others have already discovered that MAC randomization is a nuisance but not secure: the implementations are sloppy, and packet information still uniquely identifies a device. An observation Maarten made was that, apart from ad agencies, most others are interested in aggregate statistics, hence designing systems based on the questions asked might help with building useful and privacy-aware systems. Using client-side encryption and Bloom-filter-based hashing functions can be a solution in this space, though scalability is an issue.
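
As a rough illustration of the Bloom filter idea, the sketch below records which devices were seen without storing raw MAC addresses. It is a generic textbook Bloom filter, not the system Maarten described; the class name, the filter size, and the salted SHA-256 hashing are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: stores membership bits, never the raw items."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, but never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for mac in ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"]:
    seen.add(mac)
```

A scanner could then answer "was this device seen before?" from the bit array alone. The false-positive rate grows as the filter fills, which is one face of the scalability issue noted above.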

“SGX Enforcement of Use-Based Privacy”, Eleanor Birrell, Department of Computer Science, Cornell University

Eleanor discussed the privacy vs. utility conflicts of using personal data. Use-based privacy and contextual awareness are key to utilizing personal data. This mandates the presence of an expressive policy language, efficient policy associations, and pervasive enforcement. Examples include user-defined preferences for data sharing with researchers and legal data-use restrictions. The proposed Avenance policy language enables data processors to cope with these changes and expressions. They have evaluated the language on privacy preferences of Facebook and other use cases such as HIPAA. The challenges here include enforcement and policy checks, hence a few prototypes have been developed for understanding enforcement scenarios by the provider or by delegated monitoring. They have then extended the Ohmage mobile health app API system to provide policy enforcement in a few kLoC, using SGX for program attestation.

“Privacy in the Cloud, Hard-won Lessons from Shipping Information Retrieval and Discovery Experiences at Scale to Microsoft Office 365 Users”, Bjørn Olstad and Troels Walsted Hansen, Microsoft Development Center Norway

Microsoft has seen increasing success with its cloud-based model. The next step is to use data plus machine learning to reason about data while keeping the users’ trust (e.g., enterprise search). Understanding the types of data and interactions, and the aggregation of data for analytics, is important for organising a product like Office around its users. A social-network-based search for the enterprise (e.g., MS Delve) is an attempt in this space at making search relevant for an individual. The information can also include graphs from other sources such as LinkedIn. Addressing privacy perceptions and comfort levels is important here.

Some lessons learned: perceived privacy is equal to privacy for most consumers, hence it is important to communicate privacy to the consumer in an acceptable yet intuitive and simple way. There is also a shift towards simple, user-centric permission models. It’s important to have self-explanatory products and to communicate with the user at the time of actions. Signals and their perception can be different from the users’ point of view; for example, “editing” a document might be a public signal, while “viewing” it might be a private signal. Presenting the social network of collaborators to the user also eases these choices.

“Building and Measuring Privacy-Preserving Mobility Analytics”, Emiliano De Cristofaro, Department of Computer Science, University College London

Location analytics are important for assessing user behaviours in spaces, urban transport, or crowd management. Today, such analytics mainly rely on a trusted aggregator or other centralised approaches. Additively homomorphic encryption can be utilised to enable users to take part in such schemes, yet it might not scale to large user spaces or item sets. Aggregate analytics are useful here for statistics where individuals should not be identified. Data from 1 month of TfL users and 1 month of SF cab commutes is used to assess these. Aggregate stats can be used to forecast the traffic very well. Methods such as differential privacy also do not work well for such time series data due to utility loss. Prior knowledge about individuals can improve aggregate estimates and potentially identify individuals probabilistically.
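
To make the additively homomorphic aggregation concrete, here is a toy sketch using the Paillier cryptosystem. The talk did not say which scheme was used, so Paillier is an assumption, and the tiny fixed primes and helper names (`encrypt`, `add_encrypted`, `decrypt`) are for illustration only and offer no real security.

```python
import math
import random

# Toy primes for illustration only; real keys use primes of 1024+ bits.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
# Private key: lambda = lcm(P-1, Q-1), mu = lambda^-1 mod N (generator g = N + 1).
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)


def encrypt(m):
    """Encrypt an integer 0 <= m < N under the public key N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    # (N + 1)^m mod N^2 equals 1 + m*N, so no big exponentiation is needed for g.
    return ((1 + m * N) * pow(r, N, N_SQ)) % N_SQ


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


def decrypt(c):
    """Recover the plaintext with the private key (LAMBDA, MU)."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N) * MU % N


# Each user reports whether they visited a location (1) or not (0).
reports = [1, 0, 1, 1]
total = encrypt(0)
for bit in reports:
    total = add_encrypted(total, encrypt(bit))
```

Here the aggregator only ever handles ciphertexts, and decrypting `total` reveals the count of visitors rather than any individual report. The cost of the modular exponentiations per user and per item is where the scalability concern comes from.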

“User-Centric Personal Data Analytics on the Edge”, Hamed Haddadi, School of Electronic Engineering and Computer Science, Queen Mary University of London

Abstract: In this talk, I discuss the ways in which we can utilize edge-computing to improve the scalability and privacy of user-centered analytics in the context of Databox project. I present a hybrid framework where edge devices and resources centered around the user, collectively referred to as fog, can complement the cloud for providing privacy-aware, yet accurate and efficient analytics. I present the evaluations of the proposed framework on a number of exemplar applications, and discuss the broader implications of such approaches for future systems.

“Efficient Machine Learning for Disease Detection in the Human Digestive System”, Michael Riegler, Simula Research Laboratory, Oslo

Michael presented the results of using machine learning for medical image analysis, and the challenges with image quality and complexity of analysis. Using TensorFlow and CNNs they get 80% precision and 96% recall, while the global feature approach performs better (94% precision, 98% recall). They also release new datasets and tools for the vision community to improve detection and localisation of diseases. They found collaborations between the medical doctors and computer scientists challenging! http://datasets.simula.no

“META-pipe: marine metagenomics data analysis service”, Lars Ailo Bongo, Department of Computer Science, UiT – The Arctic University of Norway

Lars presented the methods and data acquisition and training for systems and the marine metagenomics data analysis pipelines developed in the ELIXIR Excelerate project. He presented the systems, security, and implementation challenges of the infrastructure, and possibilities for including human data.

“Safeguarding Analytics on Privacy Sensitive Data”, Anders Tungeland Gjerdrum, Department of Computer Science, UiT – The Arctic University of Norway

Pervasive storing of personal data for monetisation and analytics raises a number of privacy challenges. Trusted computing can be useful in running secure pieces of code and data in trusted environments. Intel SGX and ARM TrustZone provide such mechanisms today by reducing some of the functionality and inherent risks of interrupts, system calls, and illegal instructions using Virtual Enclave Memory. Anders presented a setup for evaluating latency and memory overheads of enclave memory. He provided recommendations for use of enclaves (data leaks versus provision costs) for larger applications and users. Diggi Analytics lets users implement isolation schemes, and performs security analytics for the configurations. Diggi provides asynchronous and synchronous communication schemes. Future work will focus on storage, caching, and fault tolerance.

“Do You See What I See? — Mutable Data for Localized Data Sharing”, Jörg Ott, Faculty of Informatics, Technische Universität München

Jörg presented Liberouter for DIY networking, which provides distributed networking for disconnected areas or cloudless operations. This can also enable localised content sharing. Challenges include distributed data storage management, replication and caching, and distributed access control. Mutable data, arising from the DTN & ICN community, can be utilised with such data sharing (e.g., opportunistic Wikipedia!). This approach has interesting problems with merging or adoption (git-like operations) but it can be useful for transient data sharing. Combined adoption and merging systems perform pretty well for the data. Write access control is more complex (think Google Docs) for a distributed system. Using middlebox-style local hubs these permissions can be managed and supported, though challenges remain here in ongoing work.

UW Allen School MSR Summer Institute 2017 (Day 2)

2017-08-02T00:00:00+00:00

Schedule and slides available on:

https://www.cs.washington.edu/mssi/2017/schedule.html

Session 4: Privacy & Security

Lynette Millett, National Academy of Sciences: Avoiding Predictable but Alarming Trajectories

Lynette has nearly 20 years of experience in the technology policy scene in Washington. She expanded on political and technical definitions of privacy and security in different domains and the importance of making IoT systems safe and secure, considering the existing shortcomings of the Internet. She presented historical perspectives from earlier reports on lessons learnt from major failures in critical systems, and noted that the scale of the IoT will pose major challenges in this area. Maintenance and oversight of deployments of these long-lived systems “out there” will be important.

Josh Siegel, MIT: Context and Cognition for a Secure and Efficient IoT

Josh touched upon data ownership, policy issues and technical issues around IoT systems. Standardised interfaces and familiar architectures (IP, AC electricity, etc) are not yet prevalent in IoT. A common, “human-inspired” architecture is needed to isolate the access to data, from access to actual devices, based on consent, requirements, granularity, etc. A “cognitive firewall” idea has been developed for mirroring the behaviour of these systems in the cloud to address fragmentation, openness, data ownership, security, and resource management. Using the same models behind the cognitive firewall, sampling rates (and consequently bandwidth and energy) may be optimized while complying with application requirements. Additionally, data flow visualisations have been successful in improving user trust, e.g., in the vehicular systems area for vehicle health and safety monitoring.

Dave Thaler, Microsoft: Trusted Cyber-Physical Systems

Dave discussed the economics of security in cyber-physical systems in large-scale environments (e.g., factories, hospitals, industry space, smart cities). These systems are vulnerable to state actors, malware, and rogue internal agents. Strong security promises are needed for making systems tamper-proof, auditable, accountable, and encryption-friendly, and Trusted Execution Environments (TEEs like SGX, TrustZone, TPM, etc.) can help provide them. Physical security and cryptographic security go hand in hand in the presented TEE-based trust model, which spans from a sensor to data to an actuator or a piece of code. These systems ensure a unique identifier per device, authorised code execution, hardware access, and access to trusted peripherals. Trusting the code is still a tough challenge. Though we already know how to solve it, we need it to actually happen!

Philip Levis, Stanford: Safely and Efficiently Programming a 64kB Computer

Phil focuses on securing the IoT, especially from the point of view of the operating system. He presented Tock, an operating system whose kernel is written in a type-safe language (Rust) for auditability and safety. Capsules are included in the kernel, written in Rust, and use Rust for isolation/safety. Userland processes can be written in any language and use hardware protection for isolation/safety. The kernel dynamically allocates memory from processes to ensure that a process can’t exhaust the kernel heap, and language mechanisms ensure references to these allocations don’t escape into the rest of the kernel. This allows programmability of edge devices (e.g., a smartwatch!) in a language of choice. Authentication of devices remains a challenge here, since app-to-app authentication in an untrusted zone remains difficult. See more on www.tockos.org

Ben Zorn, Microsoft: Building an Internet of Things we can Trust

Ben manages the research in software engineering at MS. He discussed how the security and privacy risks of many software and hardware platforms, even those by major brands and companies, might be hidden in millions of lines of code. He discussed the deepspec.org project for software verification and the Everest project on secure software development, aiming to replace untrusted code in services with verified trusted code. This has been evaluated on protocols such as TLS, OpenSSL, etc. For the next phase, Ben highlighted the importance of verifying ML/AI models, mathematical reasoning about the models, and understanding data quality and its effects. We need to address human understanding, adversaries, and failsafe operation. Perhaps we could have annual software/IoT vulnerability check-ups?

Roundtable Presentations #2

Peter Bodik, Microsoft Research

Deploying even relatively simple IoT applications is complex in current IoT frameworks; it requires stitching together containers running on the edge, through a communication pipeline to the cloud and through several cloud services. This introduces challenges in deployment, monitoring, estimating cost, and so on. Further, there are many different optimizations that customers have to implement manually. This calls for a declarative way of specifying IoT applications and for automatically optimizing and deploying them.

Prabal Dutta, Berkeley: Signpost: Sensors for Urban Monitoring

Prabal focused on urban sensing from pedestrians, sounds, etc in the city. Deployments face challenges in infrastructure and maintenance. Energy adaptivity, ease of installation, HCI, and storage/processing are important factors to consider in such deployments. Many of these problems re-emerge in different settings considering the rise of new hardware and applications.

Dan Lieberman, Pioneer Square Labs: Taking Enterprise IoT from Prototype to Production

Dan discussed the issues and challenges in moving from proof of concept to prototypes to commercial systems. He discussed the software design process and the path to development of devices from the idea to manufacturing.

Chenyang Lu, Washington University: Dependable Internet of Things

End-to-end latency from the sensor to the edge to the cloud to an actuator is an important challenge to address in the IoT space. Use of edge-cloud processing in virtual machines can help in addressing the demands of real-time applications and event processing. The work developed in the RT-Xen project demonstrates great performance improvement.

Steve Myers, Indiana University: Long-lived cryptography for long-lived devices and secure updating in the IoT

Steve brought up the issue of long-lived smart devices entering the home and the privacy-security challenges introduced in this space. Data encryption is of importance here, hence these deployments need to be done with the next decade in mind. Advances in faster computing methods such as quantum computing make this an imminent issue to deal with.

Thomas Pfenning, Microsoft

Thomas presented the Windows IoT Core platform for mobile and other enterprise devices. The platform aims to deliver secure systems capable of cloud-connected solutions, and to offer scalable device management from the cloud for updates and settings. Intelligence at the edge was highlighted as an area of importance, along with the security of gateway devices and device capabilities that give rise to local processing, which can be utilised through the platform.

Matt Reynolds, University of Washington: Millimeter Wave Imaging and Long-Range Wireless Power

The battery takes up 50% of the volume and wireless hardware takes over 40% of board area in a smartphone. Hence architectural innovations are needed to improve the status quo. IoT devices will extend to living things (people, animals), not just non-living sensors. Using backscatter comms and wireless power systems, Matt’s group has designed a neural data telemetry system to downlink dragonfly brain activity in real time. Matt also presented their sensing efforts in security domains using millimeter wave MIMO for imaging.

Stefan Thom, Microsoft: How to fight an adversary that is not bound by the linearity of time?

Updating and maintaining devices in person after firmware issues and leaks is a costly operation. The physical limitations of these systems make it difficult for a human operator to do malware analytics, software checking, or policy verification easily. Missing context and device state also make event analytics for applications challenging. Hence a focus on bootloader and boot-time code analysis per device was highlighted as a way of establishing identity and issuing certificates for devices. RIoT from MS brings more information on this.
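
The idea of tying device identity to boot-time code can be sketched as follows. This is a hedged illustration rather than the RIoT specification: the per-device secret, the `measure` and `derive_device_identity` helpers, and the HMAC-over-measurement construction are assumptions chosen to show the general pattern.

```python
import hashlib
import hmac


def measure(image):
    """Hash the boot code; any change to the image changes the measurement."""
    return hashlib.sha256(image).digest()


def derive_device_identity(device_secret, boot_image):
    """Bind a per-device secret to the measured boot code (assumed construction)."""
    return hmac.new(device_secret, measure(boot_image), hashlib.sha256).digest()


# Hypothetical values for illustration.
secret = b"per-device-secret-fused-at-manufacture"
firmware_v1 = b"bootloader v1"
firmware_tampered = b"bootloader v1 + implant"

id_good = derive_device_identity(secret, firmware_v1)
id_again = derive_device_identity(secret, firmware_v1)
id_bad = derive_device_identity(secret, firmware_tampered)
```

Because the derived value changes whenever the boot code changes, a certificate issued for `id_good` would no longer match a device running modified firmware.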

Invited Talk

Chris Diorio, CEO, Vice Chairman, and Founder at Impinj: The Littlest Biggest Internet Opportunity

Abstract: We are fast approaching a day when trillions of everyday items are connected to the Internet. This connectivity presents both challenges and opportunities for the IoT. In this talk I will review RAIN RFID’s significant role in connecting everyday items and will then propose a framework for addressing those IoT challenges and opportunities.

Chris talked about motivations for connecting items, future visions, and opportunities and challenges. How would we go about connecting trillions of items in the near future? History, ownership and services will be recorded. RAIN RFID allows unique, small, radio-identified battery-less tags. The RAIN RFID alliance has secured spectrum in many countries and the technology is used by many industries. Each tag might be read 10s or 100s of times. The value lies in putting the data, context and analytics together. Billions of these tags are already used in industries and in consumer sectors, such as Delta bag tags (a $50m investment) and tracking marathon runners.

Perhaps in the near future, these RAIN tags can be integrated into suitcases and, with a phone app and airline registration, enable a smoother check-in and collection process. Currently, though, the lack of an IoT backend is slowing the deployment of RAIN readers in phones.

Chris concluded by proposing some principles: items’ digital lives should mirror their physical lives, and their history, ownership and services must be kept in a journal. Services and history are atomic; ownership is chained. Applications access journals subject to persistent owner rights. Items will have digital twins in the cloud for journal keeping and data storage. Connecting items should hopefully improve all our lives!

Session 5: Experiencing the IoT

Speakers:

James Landay, Stanford: Out of body User Experience

James presented his group’s experience with, and visions for, interaction with AR and drones. These could include personalised tours and navigation. They observed a cultural dependency in gestures and interactions with drones. They are discovering natural interaction patterns for how people will one day interact intuitively with objects and public drones!

Rajalakshmi Nandakumar, UW Allen School: Interacting with small devices using active sonar

Interaction with IoT devices is a challenge! FingerIO is finger-tracking software that uses the speaker and microphone on a device to infer motion from reflected sound, turning any surface into an input surface with high accuracy.

Joe Paradiso, MIT Media Lab: Siri/Cortana/Alexa vs. the Other Me – Pre-Cognitive Human Extension as the Future of IoT

Joe presented the CHAIN-API, an early version of JSON-based open standards for sensor connectivity and data posting. As smart environments will become an extension of self and understand more context, naturally tunneling this information into perception becomes challenging. Joe demoed DoppelLab live visualizing different sensors which are in the Media Lab building with scrambled audio and social media. The demo highlighted the need for a unified control panel for sensor aggregation and bringing data together from different sources. Joe also demonstrated the Tidmarsh living observatory as a live environmental monitoring platform. The audio enables real time classification of different animals at the site. The aim is to understand the Human-Data Interaction and manifestation of data from sensors and the individuals in the environment. He also showed examples of a room that transformed via lighting and projected imagery as a function of a user’s context and affective state.

Gregory Abowd, Georgia Tech: Extreme Ubiquity: [De-emphasizing the importance of Moore’s Law](https://www.cs.washington.edu/mssi/2017/abstracts.html#abowdtalk)

Moore’s law is no longer applicable today, as the transistor packing space is saturating. However, we can now move from silicon-based ICs to sensors embedded in manufactured items, and to computational materials which can harvest information, compute, store data, and actuate. The aim of the COSMOS project is to get there soon!

Invited Talk

Ron Zahavi, Microsoft: IoT Success Factors & Business Models

Abstract: In this session I will describe how IoT combines elements of existing and new capabilities into systems of systems that can be highly complex, involving many new business models. I will then review issues and pitfalls to avoid, different business models, and the elements that existing organizations undergoing transformation, as well as startups, need to succeed in the new IoT connected world.


“Privacy in the Cloud, Hard-won Lessons from Shipping Information Retrieval and Discovery Experiences at Scale to Microsoft Office 365 Users“, Bjørn Olstad and Troels Walsted Hansen, Microsoft Development Center Norway

A legal challenge faced by Microsoft is the new EU GDPR. Microsoft services must be adapted to ensure GDPR compliance. The Office 365 privacy model gives strong promises on control, ownership and encryption to the consumers. The new MSFT privacy dashboard brings together a range of tools for awareness.


“Do You See What I See? — Mutable Data for Localized Data Sharing”, Jörg Ott, Faculty of Informatics, Technische Universität München.
