2024
Traditional malware-detection research has focused on techniques for detection on end hosts or passively on networks. In contrast, global malware detection on the Internet using active Internet scanning remains relatively unstudied, with research still relying on manual reverse engineering and handwritten scanning code.
In this work, we introduce BluePrint, the first end-to-end system for analyzing new samples of "server-like malware" and automatically preparing and executing Internet scans for them. BluePrint requires little human involvement: an analyst need only review results before launching Internet scans. Important high-level challenges that BluePrint must overcome include resiliency and scalability issues with symbolic execution, state explosion from some common networking code patterns, and a high number of duplicate symbolic network signatures with only small, inconsequential structural differences. We solve these with novel binary analysis techniques: respectively, "path sketches" for guided symbolic execution, new symbolic models for accept() and recv(), and an efficient and effective signature deduplication algorithm.
We evaluate BluePrint on a varied selection of server-like malware, and demonstrate that it successfully identifies infected devices both in simulated local network experiments and on the Internet. We then perform more detailed analyses to characterize the infected hosts found by our scanning experiments, and find that they span a wide range of usage scenarios and geographical locations. We also show that many of these devices seem to have poor general security posture, and that the infections persist for months.
@inproceedings{stevens:blueprint, title = {{BluePrint: Automatic Malware Signature Generation for Internet Scanning}}, author = {Kevin Stevens and Mert Erdemir and Hang Zhang and Taesoo Kim and Paul Pearce}, year = 2024, month = sep, doi = {10.1145/3678890.3678923}, booktitle = {Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (RAID)}, address = {Padua, Italy}, }
Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness levels and the adequacy of their risk mitigation strategies.
Based on a semi-structured interview study (N = 14) and a follow-up survey (N = 493), this paper investigates DeFi users' security perceptions and commonly adopted practices, and how those affected by previous scams or hacks (DeFi victims) respond and try to recover their losses. Our analysis shows that users often prefer DeFi over CeFi due to its decentralized nature and strong profitability. Despite being aware that DeFi, compared to CeFi, is prone to more severe attacks, users are willing to take those risks to explore new investment opportunities. Worryingly, most victims do not learn from previous experiences; unlike victims studied in traditional systems, DeFi victims tend to find new services, without revising their security practices, to recover their losses quickly. The abundance of various DeFi services and opportunities allows victims to continuously explore new financial opportunities, and this reality seems to cloud their security priorities. Indeed, our results indicate that DeFi users' strong financial motivations outweigh their security concerns – much like those who are addicted to gambling. Our observations about victims' post-incident behaviors suggest that stronger control in the form of industry regulations would be necessary to protect DeFi users from future breaches.
@inproceedings{liu:defi, title = {{I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures}}, author = {Mingyi Liu and Jun Ho Huh and HyungSeok Han and Jaehyuk Lee and Jihae Ahn and Frank Li and Hyoungshick Kim and Taesoo Kim}, booktitle = {Proceedings of the 33rd USENIX Security Symposium (Security)}, month = aug, year = 2024, address = {Philadelphia, PA}, }
ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential security risks posed by speculative execution attacks against MTE. Specifically, this paper identifies new TIKTAG gadgets capable of leaking the MTE tags from arbitrary memory addresses through speculative execution. With TIKTAG gadgets, attackers can bypass the probabilistic defense of MTE, increasing the attack success rate by close to 100%. We demonstrate that TIKTAG gadgets can be used to bypass MTE-based mitigations in real-world systems, Google Chrome and the Linux kernel. Experimental results show that TIKTAG gadgets can successfully leak an MTE tag with a success rate higher than 95% in less than 4 seconds. We further propose new defense mechanisms to mitigate the security risks posed by TIKTAG gadgets.
@inproceedings{kim:tiktag, title = {{Bypassing ARM's Memory Tagging Extension with a Side-Channel Attack}}, author = {Juhee Kim and Jinbum Park and Sihyeon Roh and Jaeyoung Chung and Youngjoo Lee and Taesoo Kim and Byoungyoung Lee}, booktitle = {Black Hat USA Briefings (Black Hat USA)}, month = aug, year = 2024, address = {Las Vegas, NV}, }
The rise of cloud computing, IoT, and edge computing has led users to often give up data control to third-party providers, raising security concerns. Trusted Execution Environments (TEEs), initially developed for cloud computing, create secure processor areas to protect sensitive data. However, TEEs are not yet integrated into emerging platforms due to their recency and ongoing development. Despite this, increasing security expectations and new privacy regulations necessitate adapting TEEs for these platforms. This thesis focuses on hardening and adapting TEEs for emerging platforms.
To harden existing TEEs, this thesis first presents PRIDWEN, a novel framework that dynamically synthesizes a secure TEE program that is optimally hardened against various side-channel attacks (SCAs) simultaneously. This thesis then presents SENSE, an architectural extension that allows TEE programs to subscribe to fine-grained microarchitectural events, thus improving the microarchitectural awareness of TEEs and enabling proactive defenses that were previously infeasible. To enable TEEs on emerging platforms, this thesis presents PORTAL, a secure and efficient device I/O interface for Arm Confidential Compute Architecture (CCA) on modern mobile Arm processors. PORTAL addresses challenges due to memory encryption in the architectural trend of an increasing number of integrated devices within Arm processors. By leveraging Arm CCA’s memory isolation mechanism, PORTAL enforces hardware-level access control without memory encryption. PORTAL offers robust security guarantees while eliminating the overhead of memory encryption, maintaining the performance and energy requirements crucial for emerging mobile platforms.
@phdthesis{sang:thesis, title = {{Hardening and Adapting Trusted Execution Environments for Emerging Platforms}}, author = {Fan Sang}, school = {{Georgia Institute of Technology}}, year = 2024, month = aug, }
Effectively mitigating side-channel attacks (SCAs) in Trusted Execution Environments (TEEs) remains challenging despite advances in existing defenses. Current detection-based defenses hinge on observing abnormal victim performance characteristics but struggle to detect attacks leaking smaller portions of the secret across multiple executions. Limitations of existing detection-based defenses stem from various factors, including the absence of a trusted microarchitectural data source in TEEs, low-quality available data, inflexibility of victim responses, and platform-specific constraints. We contend that the primary obstacles to effective detection techniques can be attributed to the lack of direct access to precise microarchitectural information within TEEs.
We propose Sense, a solution that actively exposes underlying microarchitectural information to userspace TEEs. Sense enables userspace software in TEEs to subscribe to fine-grained microarchitectural events and utilize the events as a means to contextualize the ongoing microarchitectural states. We initially demonstrate Sense’s capability by applying it to defeat the state-of-the-art cache-based side-channel attacks. We conduct a comprehensive security analysis to ensure that Sense does not leak more information than a system without it does. We prototype Sense on a gem5-based emulator, and our evaluation shows that Sense is secure, effectively defeats cache SCAs, and incurs negligible performance overhead (1.2%) under benign situations.
@inproceedings{sang:sense, title = {{Sense: Enhancing Microarchitectural Awareness for TEEs via Subscription-Based Notification}}, author = {Fan Sang and Jaehyuk Lee and Xiaokuan Zhang and Meng Xu and Scott Constable and Yuan Xiao and Michael Steiner and Mona Vij and Taesoo Kim}, booktitle = {Proceedings of the 2024 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2024, address = {San Diego, CA}, }
2023
Intel® Software Guard Extensions (Intel® SGX) supports the creation of shielded enclaves within unprivileged processes. While enclaves are architecturally protected against malicious system software, Intel SGX’s privileged attacker model could potentially expose enclaves to new powerful side-channel attacks. In this paper, we consider hardware-software co-design countermeasures to an important class of single-stepping attacks that use privileged timer interrupts to precisely step through enclave execution exactly one instruction at a time, as supported, e.g., by the open-source SGX-Step framework. This is a powerful deterministic attack primitive that has been employed in a broad range of high-resolution Intel SGX attacks, but so far remains unmitigated.
We propose AEX-Notify, a flexible hardware ISA extension that makes enclaves interrupt aware: enclaves can register a trusted handler to be run after an interrupt or exception. AEX-Notify can be used as a building block for implementing countermeasures against different types of interrupt-based attacks in software. With our primary goal to thwart deterministic single-stepping, we first diagnose the underlying hardware behavior to determine the root cause that enables it. We then apply the learned insights to remove this root cause by building an efficient software handler and constant-time disassembler to transparently determine and atomically prefetch the working set of the next enclave application instruction.
The ISA extension we propose in this paper has been incorporated into a revised version of the Intel SGX specification.
@inproceedings{constable:aex-notify, title = {{AEX-Notify: Thwarting Precise Single-Stepping Attacks through Interrupt Awareness for Intel SGX Enclaves}}, author = {Scott Constable and Jo Van Bulck and Xiang Cheng and Yuan Xiao and Cedric Xing and Ilya Alexandrovich and Taesoo Kim and Frank Piessens and Mona Vij and Mark Silberstein}, booktitle = {Proceedings of the 32nd USENIX Security Symposium (Security)}, month = aug, year = 2023, address = {Anaheim, CA}, }
Blockchains with smart contracts are distributed ledger systems that achieve block-state consistency among distributed nodes by only allowing deterministic operations of smart contracts. However, the power of smart contracts is enabled by interacting with stochastic off-chain data, which in turn opens the possibility to undermine the block-state consistency. To address this issue, an oracle smart contract is used to provide a single consistent source of external data; but, simultaneously, this introduces a single point of failure, which is called the oracle problem. To address the oracle problem, we propose an adaptive conformal consensus (ACon2) algorithm that derives consensus from multiple oracle contracts via the recent advance in online uncertainty quantification learning. In particular, the proposed algorithm returns a consensus set, which quantifies the uncertainty of data and achieves a desired correctness guarantee in the presence of Byzantine adversaries and distribution shift. We demonstrate the efficacy of the proposed algorithm on two price datasets and an Ethereum case study. In particular, the Solidity implementation of the proposed algorithm shows the potential practicality of the proposed algorithm, implying that online machine learning algorithms are applicable to address issues in blockchains.
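As background for the conformal-prediction machinery (this is the basic split conformal set, not ACon2's adaptive, multi-source consensus construction), the prediction set for a new input can be written as:

% Background: split conformal prediction (not the ACon2 algorithm itself).
% Given a nonconformity score s, calibration scores s_1, ..., s_n, and a
% miscoverage level alpha:
\[
  \hat{q} = \text{the } \lceil (1-\alpha)(n+1) \rceil\text{-th smallest of } s_1, \dots, s_n,
  \qquad
  C(x) = \{\, y : s(x, y) \le \hat{q} \,\}.
\]
% Under exchangeability, Pr[ y \in C(x) ] >= 1 - alpha; ACon2 adapts this kind of
% guarantee online, under distribution shift and Byzantine data sources.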
@inproceedings{park:acon2, title = {{ACon2: Adaptive Conformal Consensus for Provable Blockchain Oracles}}, author = {Sangdon Park and Osbert Bastani and Taesoo Kim}, booktitle = {Proceedings of the 32nd USENIX Security Symposium (Security)}, month = aug, year = 2023, address = {Anaheim, CA}, }
Fuzzing has gained in popularity for software vulnerability detection by virtue of the tremendous effort to develop a diverse set of fuzzers. Thanks to various fuzzing techniques, most of the fuzzers have been able to demonstrate great performance on their selected targets. However, paradoxically, this diversity in fuzzers also made it difficult to select fuzzers that are best suited for complex real-world programs, which we call selection burden. Communities attempted to address this problem by creating a set of standard benchmarks to compare and contrast the performance of fuzzers for a wide range of applications, but the result was always a suboptimal decision: the best-performing fuzzer on average does not guarantee the best outcome for the target of a user's interest.
To overcome this problem, we propose an automated, yet non-intrusive meta-fuzzer, called autofz, to maximize the benefits of existing state-of-the-art fuzzers via dynamic composition. To an end user, this means that, instead of spending time on selecting which fuzzer to adopt (similar in concept to hyperparameter tuning in ML), one can simply hand all of the available fuzzers to autofz (similar in concept to AutoML) and achieve the best result. The key idea is to monitor the runtime progress of the fuzzers, called trends (similar in concept to gradient descent), and make fine-grained adjustments to the resource allocation (e.g., CPU time) of each fuzzer. This is in stark contrast to existing approaches that statically combine a set of fuzzers or rely on exhaustive pre-training per target program: autofz deduces a suitable set of fuzzers for the active workload in a fine-grained manner at runtime. Our evaluation shows that, given the same amount of computation resources, autofz outperforms the best-performing individual fuzzer in 11 out of 12 available benchmarks and beats the best collaborative fuzzing approaches in 19 out of 20 benchmarks in terms of coverage, without any prior knowledge. Moreover, on average, autofz found 152% more bugs than individual fuzzers on unifuzz and fts, and 415% more bugs than collaborative fuzzing on unifuzz.
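To make the trend-based scheduling concrete, here is a minimal sketch in C of proportional CPU-time allocation driven by recent coverage growth; the fuzzer names and coverage numbers are hypothetical, and this illustrates the idea rather than autofz's actual algorithm.

/* Sketch: allocate CPU time to fuzzers in proportion to their recent
 * coverage growth ("trend"). Hypothetical numbers, not autofz itself. */
#include <stdio.h>

#define NFUZZERS 3
#define EPOCH_SECS 60.0

int main(void) {
    const char *name[NFUZZERS] = {"afl", "libfuzzer", "honggfuzz"};
    double prev_cov[NFUZZERS] = {1000, 1200, 900};   /* coverage last epoch */
    double cur_cov[NFUZZERS]  = {1100, 1210, 990};   /* coverage this epoch */

    double trend[NFUZZERS], total = 0.0;
    for (int i = 0; i < NFUZZERS; i++) {
        trend[i] = cur_cov[i] - prev_cov[i];          /* recent progress */
        if (trend[i] < 0) trend[i] = 0;
        total += trend[i];
    }
    for (int i = 0; i < NFUZZERS; i++) {
        /* fall back to an even split when no fuzzer is making progress */
        double share = (total > 0) ? trend[i] / total : 1.0 / NFUZZERS;
        printf("%s gets %.1f s of the next epoch\n", name[i], share * EPOCH_SECS);
    }
    return 0;
}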
@inproceedings{fu:autofz, title = {{autofz: Automated Fuzzer Composition at Runtime}}, author = {Yu-Fu Fu and Jaehyuk Lee and Taesoo Kim}, booktitle = {Proceedings of the 32nd USENIX Security Symposium (Security)}, month = aug, year = 2023, address = {Anaheim, CA}, }
Decompilation is a crucial capability in forensic analysis, facilitating analysis of unknown binaries. The recent rise of Python malware has brought attention to Python decompilers that aim to obtain source code representation from a Python binary. However, Python decompilers fail to handle various binaries, limiting their capabilities in forensic analysis.
This paper proposes a novel solution that transforms a decompilation error-inducing Python binary into a decompilable binary. Our key intuition is that we can resolve the decompilation errors by transforming error-inducing code blocks in the input binary into another form. The core of our approach is the concept of Forensically Equivalent Transformation (FET), which allows non-semantics-preserving transformations in the context of forensic analysis. We carefully define the FETs to minimize their undesirable consequences while fixing various error-inducing instructions that are difficult to solve when preserving the exact semantics. We evaluate the prototype of our approach with 17,117 real-world Python malware samples causing decompilation errors in five popular decompilers. It successfully identifies and fixes 77,022 errors. Our case studies demonstrate that our approach can handle various anti-analysis techniques, including opcode remapping, and migrate Python 3.9 binaries to 3.8 binaries.
@inproceedings{ahad:pyfet, title = {{PyFET: Forensically Equivalent Transformation for Python Binary Decompilation}}, author = {Ali Ahad and Chijung Jung and Ammar Askar and Doowon Kim and Taesoo Kim and Yonghwi Kwon}, booktitle = {Proceedings of the 44th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2023, address = {San Francisco, CA}, }
@inproceedings{jeong:utopia, title = {{UTOPIA: Automatic Fuzz Driver Generation using Unit Tests}}, author = {Bokdeuk Jeong and Joonun Jang and Hayoon Yi and Jiin Moon and Junsik Kim and Intae Jeon and Taesoo Kim and WooChul Shim and Yong Ho Hwang}, booktitle = {Proceedings of the 44th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2023, address = {San Francisco, CA}, }
Recovering contextual meaning from machine code is required by a wide range of binary analysis applications, such as bug discovery, malware analysis, and code clone detection. To accomplish this, advances in binary code analysis borrow techniques from natural language processing to automatically infer the underlying semantics of a binary, rather than relying on manual analysis. One of the crucial steps in this process is instruction normalization, which helps to reduce the number of tokens and to avoid an out-of-vocabulary (OOV) problem. However, existing approaches often substitute the operand(s) of an instruction with a common token (e.g., callee target → FOO), inevitably resulting in the loss of important information. In this paper, we introduce well-balanced instruction normalization (WIN), a novel approach that retains rich code information while minimizing the downsides of code normalization. With large swaths of binary code, our findings show that the instruction distribution follows Zipf's Law like a natural language, a function conveys contextually meaningful information, and the same instruction at different positions may require diverse code representations. To show the effectiveness of WIN, we present DeepSemantic, which harnesses the BERT architecture with two training phases: pre-training for generic assembly code representation, and fine-tuning for building a model tailored to a specialized task. We define a downstream task of binary code similarity detection, which requires underlying code semantics. Our experimental results show that our binary similarity model with WIN outperforms two state-of-the-art binary similarity tools, DeepBinDiff and SAFE, with an average improvement of 49.8% and 15.8%, respectively.
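For illustration, the naive "common token" substitution criticized above looks like the following toy normalizer in C over textual instructions; WIN itself is more selective about which operands carry contextual meaning and should be kept.

/* Toy normalizer: any operand token that looks like a hex immediate becomes
 * the common placeholder "FOO". Illustration of naive normalization only. */
#include <stdio.h>
#include <string.h>

static void normalize(const char *insn, char *out, size_t outsz) {
    char buf[128];
    strncpy(buf, insn, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    out[0] = '\0';
    for (char *tok = strtok(buf, " ,"); tok; tok = strtok(NULL, " ,")) {
        const char *emit = (strncmp(tok, "0x", 2) == 0) ? "FOO" : tok;
        if (out[0] != '\0')
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, emit, outsz - strlen(out) - 1);
    }
}

int main(void) {
    char norm[128];
    normalize("call 0x401000", norm, sizeof norm);
    printf("%s\n", norm);   /* prints: call FOO */
    normalize("mov eax, 0x10", norm, sizeof norm);
    printf("%s\n", norm);   /* prints: mov eax FOO (commas dropped in this toy) */
    return 0;
}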
@article{koo:deepsemantic, title = {{Binary Code Representation With Well-Balanced Instruction Normalization}}, author = {Hyungjoon Koo and Soyeon Park and Daejin Choi and Taesoo Kim}, journal = {IEEE Access}, year = 2023, }
Memory protection keys (MPK) offer per-thread memory protection with affordable overhead, which has prompted many new research studies. With protection key extensions, MPK provides more fine-grained protection and better functionality. Although not yet widely adopted, MPK can be an attractive option for memory protection in industry.
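For readers unfamiliar with the interface, here is a minimal sketch of the Linux userspace MPK API using the glibc pkey wrappers; it illustrates the baseline feature only, not the key-extension perspectives discussed in the paper, and requires a CPU and kernel with protection-key support.

/* Minimal MPK sketch: tag a page with a protection key, then revoke write
 * access for the current thread without changing page tables. */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    size_t len = 4096;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    int pkey = pkey_alloc(0, 0);               /* fails on CPUs/kernels without MPK */
    if (pkey < 0) { perror("pkey_alloc"); return 1; }
    if (pkey_mprotect(buf, len, PROT_READ | PROT_WRITE, pkey)) {
        perror("pkey_mprotect"); return 1;
    }

    strcpy(buf, "writable");                   /* allowed */
    pkey_set(pkey, PKEY_DISABLE_WRITE);        /* per-thread: reads only from now on */
    printf("%s\n", buf);                       /* still readable */
    /* A write to buf here would fault with SIGSEGV in this thread. */
    return 0;
}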
@article{park:mpkfacts, title = {{Memory Protection Keys: facts, key extension perspectives, and discussions}}, author = {Soyeon Park and Sangho Lee and Taesoo Kim}, journal = {IEEE Security \& Privacy}, year = 2023, }
2022
External vendors develop a significant percentage of Windows kernel drivers, and Microsoft relies on these vendors to handle all aspects of driver security. Unfortunately, device vendors are not immune to software bugs, which in some cases can be exploited to gain elevated privileges. Testing the security of kernel drivers remains challenging: the lack of source code, the requirement of the presence of a physical device, and the need for a functional kernel execution environment are all factors that can prevent thorough security analysis. As a result, there are no binary analysis tools that can scale and accurately find bugs at the Windows kernel level.
To address these challenges, we introduce POPKORN, a lightweight framework that harnesses the power of taint analysis and targeted symbolic execution to automatically find security bugs in Windows kernel drivers at scale. Our system focuses on a class of bugs that affect security-critical Windows API functions used in privilege-escalation exploits. POPKORN analyzes drivers independently of both the kernel and the device, avoiding the complexity of performing a full-system analysis.
We evaluate our system on a diverse dataset of 212 unique signed Windows kernel drivers. When run against these drivers, POPKORN reported 38 high impact bugs in 27 unique drivers, with manual verification revealing no false positives. Among the bugs we found, 31 were previously unknown vulnerabilities that potentially allow for Elevation of Privilege (EoP). During this research, we have received two CVEs and six acknowledgments from different driver vendors, and we continue to work with vendors to fix the issues that we identified.
@inproceedings{gupta:popkorn, title = {{POPKORN: Popping Windows Kernel Drivers At Scale}}, author = {Rajat Gupta and Lukas Dresel and Noah Spahn and Giovanni Vigna and Christopher Kruegel and Taesoo Kim}, booktitle = {Proceedings of the Annual Computer Security Applications Conference (ACSAC)}, month = dec, year = 2022, address = {Austin, TX}, }
Autonomous driving has become real; semi-autonomous driving vehicles in an affordable price range are already on the streets, and major automotive vendors are actively developing full self-driving systems to deploy them in this decade. Before rolling the products out to the end-users, it is critical to test and ensure the safety of the autonomous driving systems, consisting of multiple layers intertwined in a complicated way. However, while safety-critical bugs may exist in any layer and even across layers, relatively little attention has been given to testing the entire driving system across all the layers. Prior work mainly focuses on white-box testing of individual layers and preventing attacks on each layer.
In this paper, we aim at holistic testing of autonomous driving systems that have a whole stack of layers integrated in their entirety. Instead of looking into the individual layers, we focus on the vehicle states that the system continuously changes in the driving environment. This allows us to design DriveFuzz, a new systematic fuzzing framework that can uncover potential vulnerabilities regardless of their locations. DriveFuzz automatically generates and mutates driving scenarios based on diverse factors leveraging a high-fidelity driving simulator. We build novel driving test oracles based on the real-world traffic rules to detect safety-critical misbehaviors, and guide the fuzzer towards such misbehaviors through driving quality metrics referring to the physical states of the vehicle.
DriveFuzz has discovered 30 new bugs in various layers of two autonomous driving systems (Autoware and CARLA Behavior Agent) and three additional bugs in the CARLA simulator. We further analyze the impact of these bugs and how an adversary may exploit them as security vulnerabilities to cause critical accidents in the real world.
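To give a flavor of the traffic-rule oracles, here is a simplified per-frame vehicle-state check in C; the state fields and rules are hypothetical and only illustrate the oracle concept, not DriveFuzz's actual oracles or quality metrics.

/* Sketch of a traffic-rule test oracle over vehicle state.
 * Fields and thresholds are hypothetical, for illustration only. */
#include <stdbool.h>
#include <stdio.h>

struct vehicle_state {
    double speed_kmh;
    double speed_limit_kmh;
    bool   collided;
    bool   crossed_solid_line;
    bool   ran_red_light;
};

/* Returns a short description of the first violated rule, or NULL. */
static const char *check_oracles(const struct vehicle_state *s) {
    if (s->collided)                       return "collision";
    if (s->ran_red_light)                  return "ran a red light";
    if (s->crossed_solid_line)             return "crossed a solid line";
    if (s->speed_kmh > s->speed_limit_kmh) return "speeding";
    return NULL;
}

int main(void) {
    struct vehicle_state s = { 62.0, 50.0, false, false, false };
    const char *violation = check_oracles(&s);
    printf("misbehavior: %s\n", violation ? violation : "none");
    return 0;
}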
@inproceedings{kim:drivefuzz, title = {{DriveFuzz: Discovering Autonomous Driving Bugs through Driving Quality-Guided Fuzzing}}, author = {Seulbae Kim and Major Liu and Junghwan Rhee and Yuseok Jeon and Yonghwi Kwon and Chung Hwan Kim}, booktitle = {Proceedings of the 29th ACM Conference on Computer and Communications Security (CCS)}, month = nov, year = 2022, address = {Los Angeles, CA}, }
Robotic systems are becoming an integral part of human lives. Responding to the increased demands for robot productions, Robot Operating System (ROS), an open-source middleware suite for robotic development, is gaining traction by providing practical tools and libraries for quickly developing robots. In this paper, we are concerned with a relatively less-tested class of bugs in ROS and ROS-based robotic systems, called semantic correctness bugs, including the violation of specification, violation of physical laws, and cyber-physical discrepancy. These bugs often stem from the cyber-physical nature of robotic systems, in which noisy hardware components are intertwined with software components, and thus cannot be detected by existing fuzzing approaches that mostly focus on finding memory-safety bugs.
We propose RoboFuzz, a feedback-driven fuzzing framework that integrates with ROS and enables testing of the correctness bugs. RoboFuzz features (1) data type-aware mutation for effectively stressing data-driven ROS systems, (2) hybrid execution for acquiring robotic states from both the real world and a simulator, capturing unforeseen cyber-physical discrepancies, (3) an oracle handler that identifies correctness bugs by checking the execution states against predefined correctness oracles, and (4) a semantic feedback engine for providing augmented guidance to the input mutator, complementing classic code coverage-based feedback, which is less effective for distributed, data-driven robots. By encoding the correctness invariants of ROS and four ROS-compatible robotic systems into specialized oracles, RoboFuzz detected 30 previously unknown bugs, of which 25 are acknowledged and six have been fixed.
@inproceedings{kim:robofuzz, title = {{RoboFuzz: Fuzzing Robotic Systems over Robot Operating System (ROS) for Finding Correctness Bugs}}, author = {Seulbae Kim and Taesoo Kim}, booktitle = {Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)}, month = nov, year = 2022, address = {Singapore}, }
Fuzzing is a practical technique to automatically find vulnerabilities in software. It is also well suited to scaling on distributed computing platforms thanks to its parallelizability. Therefore, individual researchers and companies typically set up fuzzing platforms on multiple servers and run fuzzers in parallel. However, as such resources are private, they suffer from financial/physical limits. In this paper, we propose Fuzzing@Home: the first public collaborative fuzzing network, based on heterogeneous machines owned by potentially untrusted users. Using our system, multiple organizations (or individuals) can easily collaborate to fuzz software of common interest in an efficient way. For the general public, one can participate and earn economic benefits if the fuzzing network is tied to a bug-bounty program, or simply donate spare computing power as a volunteer. If the network compensates collaborators, system fairness becomes an issue. In this light, we tailor fuzzers to make the computation result verifiable and devise cheat detection techniques to ensure integrity and fairness in collaboration. In terms of performance, we devise a technique to effectively sync the global coverage state, hence minimizing the overhead for verifying computation results. Finally, as an experimental approach, Fuzzing@Home uses WebAssembly to run fuzzers inside the web browser engine, allowing anyone to instantly join a fuzzing network with a single click on their mobile phone, tablet, or any modern computing device. To evaluate our system, we bootstrapped Fuzzing@Home with 72 open-source projects and ran experimental fuzzing networks for 330 days with 826 collaborators as beta testers.
@inproceedings{jang:fuzzcoin, title = {{Fuzzing@Home: Distributed Fuzzing on Untrusted Heterogeneous Clients}}, author = {Daehee Jang and Ammar Askar and Insu Yun and Stephen Tong and Yiqin Cai and Taesoo Kim}, booktitle = {Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID)}, month = oct, year = 2022, address = {Limassol, Cyprus}, }
This paper presents an in-kernel, hardware-based control-flow integrity (CFI) protection, called PAL, that utilizes ARM's Pointer Authentication (PA). It provides three important benefits over commercial, state-of-the-art PA-based CFIs like iOS's: 1) enhancing CFI precision via automated refinement techniques, 2) addressing hindsight problems of PA for in-kernel uses such as preemptive hijacking and brute-forcing attacks, and 3) assuring the algorithmic or implementation correctness via post validation.
PAL achieves these goals in an OS-agnostic manner, so it can be applied to commodity OSes like Linux and FreeBSD. The precision of the CFI protection can be adjusted for better performance or improved for better security with minimal engineering efforts. Our evaluation shows that PAL incurs negligible performance overhead: e.g., <1% overhead for Apache benchmark and 3-5% overhead for Linux perf benchmark on the latest Mac mini (M1). Our post-validation approach helps us ensure the security invariant required for the safe uses of PA inside the kernel, which also reveals new attack vectors on the iOS kernel. PAL as well as the CFI-protected kernels will be open-sourced.
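As background on the primitive PAL builds on (a user-level view of return-address signing, not PAL's in-kernel CFI), the sketch below shows in its comments roughly what a PA-enabled AArch64 compiler emits for a non-leaf function when built with -mbranch-protection=pac-ret.

/*
 * Background sketch of PA-based return-address signing.
 * Building for AArch64 with:
 *   cc -O2 -mbranch-protection=pac-ret -c pacret.c
 * makes the compiler emit, roughly, for a non-leaf function:
 *
 *   paciasp                      // sign LR (x30) with the A-key, SP as modifier
 *   stp  x29, x30, [sp, #-16]!   // spill the signed return address
 *   ...                          // function body
 *   ldp  x29, x30, [sp], #16
 *   autiasp                      // authenticate LR; a corrupted return address is caught
 *   ret
 */
#include <stdio.h>

void callee(const char *msg) {
    puts(msg);   /* non-leaf call forces LR to be saved, hence signed */
}

int main(void) {
    callee("pac-ret sketch");
    return 0;
}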
@inproceedings{yoo:pal, title = {{In-Kernel Control-Flow Integrity on Commodity OSes using ARM Pointer Authentication}}, author = {Sungbae Yoo and Jinbum Park and Seolheui Kim and Yeji Kim and Taesoo Kim}, booktitle = {Proceedings of the 31st USENIX Security Symposium (Security)}, month = aug, year = 2022, address = {Boston, MA}, }
While there exist many consistency models for distributed systems, most of those models seek to provide the basic guarantee of convergence: given enough time and no further inputs, all replicas in the system should eventually converge to the same state. However, because of Convergence Failure Bugs (CFBs), many distributed systems do not provide even this basic guarantee. The violation of the convergence property can be crucial to safety-critical applications collectively working together with a shared distributed system. Indeed, many CFBs are reported as major issues by developers. Our key insight is that CFBs are caused by divergence, or differences between the state of replicas, and that a focused exploration of divergence states can reveal bugs in the convergence logic of real distributed systems while avoiding state explosion. Based on this insight, we have designed and implemented Modulo, the first Model-Based Testing tool using Divergence Resync Models (DRMs) to systematically explore divergence and convergence in real distributed systems. Modulo uses DRMs to explore an abstract state machine of the system and derive schedules, the intermediate representation of test cases, which are then translated into test inputs and injected into systems under test (SUTs). We ran Modulo to check ZooKeeper, MongoDB, and Redis and found 11 bugs (including 6 previously unknown ones).
@inproceedings{kim:modulo, title = {{Modulo: Finding Convergence Failure Bugs in Distributed Systems with Divergence Resync Models}}, author = {Beom Heyn Kim and Taesoo Kim and David Lie}, booktitle = {Proceedings of the 2022 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2022, address = {Carlsbad, CA}, }
A growing class of threats to Intel Software Guard Extensions (SGX) is Side-Channel Attacks (SCAs). As a response, numerous countermeasures have been proposed. However, it is hard to incorporate them to protect SGX programs against multiple SCAs simultaneously. A naive combination of distinct countermeasures does not work in practice because some of them are 1) undeployable in target environments lacking dependent hardware features, 2) redundant if there are already defenses with similar functionalities, and 3) incompatible with each other by design or implementation. Identifying all such conditions and preparing potential workarounds before deployment are challenging, primarily when an SGX program targets multiple platforms that abstract or manipulate their configurations.
Pridwen is a framework that selectively applies essential SCA countermeasures when loading an SGX program based on the configurations of the target execution platform. Pridwen allows a developer to deploy a program in the form of WebAssembly (Wasm). Upon receiving a Wasm binary, Pridwen probes the current hardware configuration, synthesizes a program (i.e., a native binary) with an optimal set of countermeasures, and validates the final binary. Pridwen supports both software-only and hardware-assisted countermeasures, and our evaluations show that Pridwen efficiently and faithfully synthesizes multiple benchmark programs and real-world applications while securing them against multiple SCAs.
@inproceedings{sang:pridwen, title = {{Pridwen: Universally Hardening SGX Programs via Load-Time Synthesis}}, author = {Fan Sang and Ming-Wei Shih and Sangho Lee and Xiaokuan Zhang and Michael Steiner and Mona Vij and Taesoo Kim}, booktitle = {Proceedings of the 2022 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2022, address = {Carlsbad, CA}, }
Kernel synchronization primitives are the backbone of any OS design. Kernel locks, for instance, are crucial for both application performance and correctness. However, unlike application locks, kernel locks are far from the reach of application developers, who have minimal insight into the kernel's behavior and cannot control or influence the policies that govern kernel synchronization behavior. This disconnect between the kernel and applications can lead to pathological scenarios in which optimizing the kernel synchronization primitives under one context, such as high contention, leads to adversarial effects under a context with no lock contention. In addition, rapidly evolving heterogeneous hardware makes kernel lock development too slow for modern applications with stringent performance requirements and frequent deployment timelines.
This paper addresses the above issues with application-informed kernel synchronization primitives. We allow application developers to deploy workload-specific and hardware-aware kernel lock policies to boost application performance, resolve pathological usage of kernel locks, and even enable dynamic profiling of locks of interest. To showcase this idea, we design SynCord, a framework to modify kernel locks without recompiling or rebooting the kernel. SynCord abstracts key behaviors of kernel locks and exposes them as APIs for designing user-defined kernel locks. SynCord provides the mechanisms to customize kernel locks safely and correctly from the user space. We design five lock policies specialized for new heterogeneous hardware and specific software requirements. Our evaluation shows that SynCord incurs minimal runtime overhead and generates kernel locks with performance comparable to that of the state-of-the-art locks.
@inproceedings{park:syncord, title = {{Application-Informed Kernel Synchronization Primitives}}, author = {Sujin Park and Diyu Zhou and Yuchen Qian and Irina Calciu and Taesoo Kim and Sanidhya Kashyap}, booktitle = {Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}, month = jul, year = 2022, address = {Carlsbad, CA}, }
In this paper, we present DynaGraph, a system that supports dynamic Graph Neural Networks (GNNs) efficiently. Based on the observation that existing proposals for dynamic GNN architectures combine techniques for structural and temporal information encoding independently, DynaGraph proposes novel techniques that enable cross optimizations across these tasks. It uses cached message passing and timestep fusion to significantly reduce the overhead associated with dynamic GNN processing. It further proposes a simple distributed data-parallel dynamic graph processing strategy that enables scalable dynamic GNN computation. Our evaluation of DynaGraph on a variety of dynamic GNN architectures and use cases shows a speedup of up to 2.7X compared to existing state-of-the-art frameworks.
@inproceedings{guan:dynagraph, title = {{DynaGraph: Dynamic Graph Neural Networks at Scale}}, author = {Mingyu Guan and Anand Padmanabha Iyer and Taesoo Kim}, booktitle = {Proceedings of the 2022 Joint Workshop on Graph Data Management Experiences and Systems (GRADES) and Network Data Analytics (NDA)}, month = jun, year = 2022, address = {Philadelphia, PA, USA}, }
Dynamic program analyses are essential to creating safe, reliable, and productive computing environments. However, these analyses are challenging and time-consuming to construct due to the low-level optimization required to achieve acceptable performance. Consequently, many analyses are never realized, or have inefficient implementations. In this work we argue that many analyses can and should be constructed with a high-level description language, leaving the burden of low-level optimizations to the analysis instrumentation system itself. We propose a novel language for dynamic analysis called ALDA. ALDA leverages common structuring of dynamic analyses to provide a simple and high-level description for dynamic analyses. Through restricting the supported behaviors to only essential operations in dynamic analyses, an optimizing compiler for ALDA can create analysis implementations with performance on par with hand-tuned analyses. To demonstrate ALDA's universality and efficiency, we create an optimizing compiler for ALDA targeting the LLVM instrumentation framework, named ALDAcc. We use ALDAcc to construct 8 different dynamic analysis algorithms, including the popular MemorySanitizer analysis, and show their construction is succinct and simple. By comparing two of them (Eraser and MemorySanitizer) with their hand-tuned implementations, we show that ALDAcc's optimized analyses are comparable to hand-tuned implementations.
@inproceedings{cheng:alda, title = {{Creating Concise and Efficient Dynamic Analyses with ALDA}}, author = {Xiang Cheng and David Devecsery}, booktitle = {Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, month = mar, year = 2022, address = {Lausanne, Switzerland}, }
2021
Secure allocators have been extensively studied to mitigate heap vulnerabilities. They employ safe designs and randomized mechanisms to stop or mitigate heap exploitation. Despite extensive research efforts, secure allocators can only be evaluated with theoretical analysis or pre-defined data sets, which are insufficient to effectively reflect powerful adversaries in the real world.
In this paper, we present HardsHeap, an automatic tool for evaluating secure allocators. The key idea of HardsHeap is to use random testing (i.e., fuzzing) to evaluate secure allocators. To handle the diverse properties of secure allocators, HardsHeap supports an extensible framework, making it easy to write a validation logic for each property. Moreover, HardsHeap employs sampling-based testing, which enables us to evaluate a probabilistic mechanism prevalent in secure allocators. To eliminate redundancy in findings from HardsHeap, we devise a new technique called Statistical Significance Delta Debugging (SSDD), which extends the existing delta debugging for stochastically reproducible test cases.
We evaluated HardsHeap on 10 secure allocators. Consequently, we found 56 interesting test cases, including several insecure yet underestimated behaviors for handling large objects in secure allocators. Moreover, we discovered 10 implementation bugs. One of the bugs is an integer overflow in secure allocators, making them even more vulnerable than ordinary allocators. Our evaluation also shows that SSDD successfully reduces test cases by 37.2% on average without a loss of reproducibility.
@inproceedings{yun:hardsheap, title = {{HardsHeap: A Universal and Extensible Framework for Evaluating Secure Allocators}}, author = {Insu Yun and Woosun Song and Seunggi Min and Taesoo Kim}, booktitle = {Proceedings of the 28th ACM Conference on Computer and Communications Security (CCS)}, month = nov, year = 2021, address = {Seoul, South Korea}, }
Coverage-guided fuzzing is considered one of the most efficient bug-finding techniques, given the number of bugs it has reported. However, coverage tracing provided by existing software-based approaches, such as source instrumentation and dynamic binary translation, can incur large overhead. Hindered by the significantly lowered execution speed, it also becomes less beneficial to improve coverage feedback by incorporating additional execution states.
In this paper, we propose SNAP, a customized hardware platform that implements hardware primitives to enhance the performance and precision of coverage-guided fuzzing. By sitting at the bottom of the computer stack, SNAP leverages the existing CPU pipeline and micro-architectural features to provide coverage tracing and rich execution semantics with near-zero cost regardless of source code availability. Prototyped as a synthesized RISC-V BOOM processor on FPGA, SNAP incurs barely a 3.1% tracing overhead on the SPEC benchmarks while achieving a 228x higher fuzzing throughput than the existing software-based solution. Posing only a 4.8% area and 6.5% power overhead, SNAP is highly practical and can be adopted by existing CPU architectures with minimal changes.
@inproceedings{ding:snap, title = {{Hardware Support to Improve Fuzzing Performance and Precision}}, author = {Ren Ding and Yonghae Kim and Fan Sang and Wen Xu and Gururaj Saileshwar and Taesoo Kim}, booktitle = {Proceedings of the 28th ACM Conference on Computer and Communications Security (CCS)}, month = nov, year = 2021, address = {Seoul, South Korea}, }
Rust is a promising system programming language that guarantees memory safety at compile time. To support diverse requirements for system software such as accessing low-level hardware, Rust allows programmers to perform operations that are not protected by the Rust compiler with the unsafe keyword. However, Rust's safety guarantee relies on the soundness of all unsafe code in the program as well as the standard and external libraries, making it hard to reason about their correctness. In other words, a single bug in any unsafe code breaks the whole program's safety guarantee.
In this paper, we introduce RUDRA, a program that analyzes and reports potential memory safety bugs in unsafe Rust. Since a bug in unsafe code threatens the foundation of Rust's safety guarantee, our primary focus is to scale our analysis to all the packages hosted in the Rust package registry. RUDRA can scan the entire registry (43k packages) in 6.5 hours; it identified 264 previously unknown memory safety bugs, leading to 76 CVEs and 112 RustSec advisories being filed, which represent 51.6% of the memory safety bugs reported to RustSec since 2016. The new bugs RUDRA found are non-trivial, subtle, and often made by Rust experts: two in the Rust standard library, one in the official futures library, and one in the Rust compiler. RUDRA is open-source, and part of its algorithm is integrated into the official Rust linter.
@inproceedings{bae:rudra, title = {{Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale}}, author = {Yechan Bae and Youngsuk Kim and Ammar Askar and Jungwon Lim and Taesoo Kim}, booktitle = {Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2021, address = {Virtual}, }
Memory-unsafe languages are widely used to implement critical systems like kernels and browsers, leading to thousands of memory safety issues every year. A use-after-free bug is a temporal memory error where the program accidentally visits a freed memory location. Recent studies show that use-after-free is one of the most exploited memory vulnerabilities. Unfortunately, previous efforts to mitigate use-after-free bugs are not widely deployed in real-world programs due to either inadequate accuracy or high performance overhead.
In this paper, we propose to resurrect the idea of one-time allocation (OTA) and provide a practical implementation with efficient execution and moderate memory overhead. With one-time allocation, the memory manager always returns a distinct memory address for each request. Since memory locations are not reused, attackers cannot reclaim freed objects, and thus cannot exploit use-after-free bugs. We utilize two techniques to render OTA practical: batch page management and the fusion of bump-pointer and fixed-size bins memory allocation styles. Batch page management helps reduce the number of system calls which negatively impact performance, while blending the two allocation methods mitigates the memory overhead and fragmentation issues. We implemented a prototype, called FFmalloc, to demonstrate our techniques. We evaluated FFmalloc on widely used benchmarks and real-world large programs. FFmalloc successfully blocked all tested use-after-free attacks while introducing moderate overhead. The results show that OTA can be a strong and practical solution to thwart use-after-free threats.
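A minimal sketch in C of the one-time-allocation idea itself: a bump allocator that maps address space in batches and never reuses it, which is why dangling pointers can never alias new objects. This is an illustration only, not FFmalloc's actual batch-page or size-class design.

/* One-time allocation sketch: every request gets a fresh address and free()
 * never returns memory to the allocator. Illustration only, not FFmalloc. */
#include <stddef.h>
#include <sys/mman.h>

#define POOL_SIZE (64UL << 20)   /* grab address space in large batches */

static char *pool_cur, *pool_end;

void *ota_malloc(size_t n) {
    n = (n + 15) & ~(size_t)15;                        /* 16-byte alignment */
    if (!pool_cur || (size_t)(pool_end - pool_cur) < n) {
        /* out of space: map a new batch (leftover space is simply abandoned) */
        size_t sz = n > POOL_SIZE ? n : POOL_SIZE;
        void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return NULL;
        pool_cur = p;
        pool_end = (char *)p + sz;
    }
    void *ret = pool_cur;
    pool_cur += n;                                     /* bump pointer, never rewound */
    return ret;
}

void ota_free(void *p) {
    (void)p;   /* never reuse; a real system would release cold pages to the OS */
}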
@inproceedings{wickman:ffmalloc, title = {{Preventing Use-After-Free Attacks with Fast Forward Allocation}}, author = {Brian Wickman and Hong Hu and Insu Yun and Daehee Jang and JungWon Lim and Sanidhya Kashyap and Taesoo Kim}, booktitle = {Proceedings of the 30th USENIX Security Symposium (Security)}, month = aug, year = 2021, address = {Vancouver, B.C., Canada}, }
Ethereum is the second-largest blockchain platform next to Bitcoin. In the Ethereum network, decentralized Ethereum clients reach consensus through transitioning to the same blockchain states according to the Ethereum specification. Consensus bugs are bugs that make Ethereum clients transition to incorrect blockchain states and fail to reach consensus with other clients. Consensus bugs are extremely rare but can be exploited for network split and theft, which cause reliability and security-critical issues in the Ethereum ecosystem.
We describe fluffy, a multi-transaction differential fuzzer for finding consensus bugs in Ethereum. First, fluffy mutates and executes multi-transaction test cases to find consensus bugs which cannot be found using existing fuzzers for Ethereum. Second, fluffy uses multiple existing Ethereum clients that independently implement the specification as cross-referencing oracles. Compared to a state-of-the-art fuzzer, fluffy improves the fuzzing throughput by 510X and the code coverage by 2.7X with various optimizations: in-process fuzzing, fuzzing harnesses for Ethereum clients, and semantic-aware mutation that reduces erroneous test cases.
fluffy found two new consensus bugs in the most popular Geth Ethereum client which were exploitable on the live Ethereum mainnet. Four months after we reported the bugs to Geth developers, one of the bugs was triggered on the mainnet, and caused nodes using a stale version of Geth to hard fork the Ethereum blockchain. The blockchain community considers this hard fork the greatest challenge since the infamous 2016 DAO hack. We have open sourced fluffy at https://github.com/snuspl/fluffy to contribute to the security of Ethereum.
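The cross-referencing oracle can be sketched as a differential check over post-execution states, as below; the client callbacks and state hashing are toy placeholders, not fluffy's actual harnesses for real Ethereum clients.

/* Differential-oracle sketch: run the same transaction batch on two
 * independent client implementations and flag any state divergence. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t (*exec_fn)(const uint8_t *txs, size_t len);  /* returns post-state hash */

int differential_check(exec_fn client_a, exec_fn client_b,
                       const uint8_t *txs, size_t len) {
    uint64_t ha = client_a(txs, len);
    uint64_t hb = client_b(txs, len);
    if (ha != hb) {
        fprintf(stderr, "consensus divergence: %016llx vs %016llx\n",
                (unsigned long long)ha, (unsigned long long)hb);
        return 1;   /* save the test case for triage */
    }
    return 0;
}

/* Toy stand-ins for two independent clients (FNV-1a over the transaction bytes). */
static uint64_t toy_client_a(const uint8_t *t, size_t n) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) { h ^= t[i]; h *= 1099511628211ULL; }
    return h;
}
static uint64_t toy_client_b(const uint8_t *t, size_t n) {
    return toy_client_a(t, n);   /* agrees, so no divergence is reported */
}

int main(void) {
    uint8_t txs[] = {0xde, 0xad, 0xbe, 0xef};
    return differential_check(toy_client_a, toy_client_b, txs, sizeof txs);
}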
@inproceedings{yang:fluffy, title = {{Finding Consensus Bugs in Ethereum via Multi-transaction Differential Fuzzing}}, author = {Youngseok Yang and Taesoo Kim and Byung-Gon Chun}, booktitle = {Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}, month = jul, year = 2021, address = {(virtual)}, }
Kernel synchronization primitives are of paramount importance to achieving good performance and scalability for applications. However, they are usually invisible and out of the reach of application developers. Instead, kernel developers and synchronization experts make all the decisions regarding kernel lock design.
In this paper, we propose contextual concurrency control (C3), a new paradigm that enables userspace applications to tune concurrency control in the kernel. C3 allows developers to change the behavior and parameters of kernel locks, to switch between different lock implementations and to dynamically profile one or multiple locks for a specific scenario of interest.
To showcase this idea, we designed and implemented CONCORD, a framework that allows a privileged userspace process to modify kernel locks on the fly without re-compiling the existing code base. We performed a preliminary evaluation on two locks showing that CONCORD allows userspace tuning of kernel locks without incurring significant overhead.
@inproceedings{park:c3, title = {{Contextual Concurrency Control}}, author = {Sujin Park and Irina Calciu and Taesoo Kim and Sanidhya Kashyap}, booktitle = {Proceedings of the 18th Workshop on Hot Topics in Operating Systems (HotOS XVIII)}, month = may, year = 2021, address = {(virtual)}, }
API-based systems, a large group of security-critical software programs including web browsers and OS kernels, accept program inputs composed of API calls. Considering the scale and complexity of an API-based system, fuzzing proves to be the most effective approach for bug detection in practice. To effectively discover new bugs in an API-based system nowadays, a fuzzer needs to generate syntactically and semantically correct API calls, which are not rejected at an early stage.
Grammar-based API fuzzers generate random API calls in various syntaxes described by context-free grammars. Nevertheless, context-free grammars are unable to deliver certain API semantics in an API program, especially how an API call interacts with the objects in the program. Therefore, the random API calls generated by such fuzzers largely have reference errors, type errors or state errors.
To effectively fuzz an API-based system, we present a context-aware fuzzing approach, which relies on RPG IR to generate random API calls. RPG IR is a formal and contextual representation that defines an object-based context for an API program and models not only the syntax but also the context-based semantics of every API call in the program. Hence, the generated API calls in RPG IR have reduced semantic errors and are more likely to trigger bugs in an API-based system. To evaluate the effectiveness of RPG IR in API fuzzing, we present FreeDom and Janus, two IR-based context-aware fuzzers targeting web browsers and file systems, respectively. In particular, FreeDom has revealed 24 previously unknown bugs in Apple Safari, Mozilla Firefox, and Google Chrome, 10 of which have been assigned CVEs. Meanwhile, FreeDom largely outperforms the grammar-based DOM fuzzer, Domato, with 3× more unique crashes. On the other hand, Janus visits at most 4.19× more code paths compared to the state-of-the-art system call fuzzer, Syzkaller, by generating context-aware file operations. More importantly, Janus has found 90 bugs in eight Linux file systems with 32 CVEs assigned.
We further present RPG (Random Program Generator), a more generic approach to conduct context-aware API fuzzing via RPG IR against different API-based systems. In particular, RPG accepts API description in ASL (API Specification Language), a formal language for developers to describe APIs that can be modeled by RPG IR. RPG manages to compile ASL files into a context-aware API fuzzer based on RPG IR specifically targeting the described APIs. We implement a prototype of RPG, which is evaluated by fuzzing WebKit with the ASL files that describe DOM and SVG specifications. As a domain-agnostic approach, RPG manages to discover a similar number of code blocks and unique crashes compared to FreeDom.
@phdthesis{xuwen:thesis, title = {{An IR-based Fuzzing Approach for Finding Context-aware Bugs in API-based Systems}}, author = {Wen Xu}, school = {{Georgia Institute of Technology}}, year = 2021, month = may, }
With a booming number of applications and end-users in the past decade, software security has been emphasized more than ever. Nonetheless, a consistent increase of security-critical bugs has been observed along the way, mainly due to the variety and complexity of existing software pieces. To mitigate the situation, software hardening in the daily development cycle typically involves three phases, including bug finding, runtime security enforcement, and fault analyses in case the prior steps have failed. Among the various software hardening techniques proposed, a considerable number of works have relied on available hardware support to achieve their goals. The reasons behind the noticeable trend are threefold. First, the performance benefit from hardware can be substantial compared to a purely software-based solution. Second, compatibility and ease of use are also keys for more solutions to adopt hardware features besides the performance gain. Last, implementation with hardware support can consequently present a smaller codebase, thus introducing less attack surface for attackers.
In this dissertation, I present three hardware-assisted solutions for performant software hardening. The first one is PITTYPAT, a runtime enforcement for path-sensitive control-flow integrity. By utilizing Intel PT, it computes branch targets with points-to analyses in an efficient and precise manner. The second one is SNAP, a customized hardware platform that implements hardware primitives to enhance the performance of coverage-guided fuzzing. Given the program states originated from the existing CPU pipeline, our prototype on the FPGA platform enables a transparent support of fuzzing with near-zero tracing overhead. Finally, I will present a nested virtualization framework for fuzzing non-user applications, such as hypervisors. With a snapshot mechanism supported by the x86 virtualization extension and a customized kernel for fuzzing execution, our system demonstrates a 72x improvement on the fuzzing throughput compared to the prior solutions, and finds 14 zero-day bugs among the real-world hypervisors.
@phdthesis{ding:thesis, title = {{Performant Software Hardening under Hardware Support}}, author = {Ren Ding}, school = {{Georgia Institute of Technology}}, year = 2021, month = may, }
Fuzzing is an emerging technique to automatically validate programs and uncover bugs. It has been widely used to test many programs and has found thousands of security vulnerabilities. However, existing fuzzing efforts are mainly centered around Unix-like systems, as Windows imposes unique challenges for fuzzing: a closed-source ecosystem, the heavy use of graphical interfaces and the lack of fast process cloning machinery.
In this paper, we propose two solutions to address the challenges Windows fuzzing faces. Our system, WINNIE, first tries to synthesize a harness for the application, a simple program that directly invokes target functions, based on sample executions. It then tests the harness, instead of the original complicated program, using an efficient implementation of fork on Windows. Using these techniques, WINNIE can bypass irrelevant GUI code to test logic deep within the application. We used WINNIE to fuzz 59 closed-source Windows programs, and it successfully generated valid fuzzing harnesses for all of them. In our evaluation, WINNIE supported 2.2x more programs than existing Windows fuzzers could, identified 3.9x more program states, and achieved 26.6x faster execution. In total, WINNIE found 61 unique bugs in 32 Windows applications.
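In spirit, a synthesized harness is a small program like the following sketch; the "target.dll" library and "ParseDocument" export are hypothetical stand-ins, and WINNIE derives such harnesses automatically from sample executions of the real application.

/* Sketch of a Windows fuzzing harness: load the target library, resolve one
 * parsing routine, and feed it a file from the command line (minimal error
 * handling; names are hypothetical). */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (__cdecl *parse_fn)(const unsigned char *buf, unsigned int len);

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s <input>\n", argv[0]); return 1; }

    HMODULE mod = LoadLibraryA("target.dll");
    if (!mod) { fprintf(stderr, "LoadLibraryA failed: %lu\n", GetLastError()); return 1; }
    parse_fn parse = (parse_fn)GetProcAddress(mod, "ParseDocument");
    if (!parse) { fprintf(stderr, "export not found\n"); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    size_t n = fread(buf, 1, (size_t)len, f);
    fclose(f);

    parse(buf, (unsigned int)n);   /* call the target function directly, no GUI */
    free(buf);
    return 0;
}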
@inproceedings{jung:winnie, title = {{WINNIE: Fuzzing Windows Applications with Harness Synthesis and Fast Cloning}}, author = {Jinho Jung and Stephen Tong and Hong Hu and Jungwon Lim and Yonghwi Jin and Taesoo Kim}, booktitle = {Proceedings of the 2021 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2021, address = {Virtual}, }
A function recognition problem serves as a basis for further binary analysis and many applications. Although common challenges for function detection are well known, prior works have repeatedly claimed noticeable results with high precision and recall. In this paper, we aim to fill the void of what has been overlooked or misinterpreted by closely looking into the previous datasets, metrics, and evaluations with varying case studies. Our major findings are that i) a common corpus like GNU utilities is insufficient to represent the effectiveness of function identification, ii) it is difficult to claim, at least in the current form, that an ML-oriented approach is scientifically superior to deterministic ones like IDA or Ghidra, iii) the current metrics may not be reasonable enough to measure varying function detection cases, and iv) the capability of recognizing functions depends on each tool's strategic or peculiar choice. We perform a re-evaluation of existing approaches on our own dataset, demonstrating that not a single state-of-the-art tool dominates all the others. In conclusion, the function detection problem has not yet been fully addressed, and we need a better methodology and metric to make advances in the field of function identification.
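For reference, the metrics in question are the standard detection metrics computed over ground-truth function starts (or boundaries):

% Standard metrics over detected function starts/boundaries (TP, FP, FN).
\[
  \text{Precision} = \frac{TP}{TP + FP}, \qquad
  \text{Recall} = \frac{TP}{TP + FN}, \qquad
  F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\]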
@inproceedings{koo:func-identification, title = {{A Look Back on a Function Identification Problem}}, author = {Hyungjoon Koo and Soyeon Park and Taesoo Kim}, booktitle = {Proceedings of the Annual Computer Security Applications Conference (ACSAC)}, year = 2021, }
DRAMs in modern computers and hand-held devices store private or security-sensitive data. Unfortunately, one known attack vector, the cold boot attack, remains threatening and easy to exploit, especially when attackers have physical access to the device. It exploits a fundamental property of current DRAMs: the remanence effect, which retains the stored contents for a certain period of time even after powering off. To magnify the remanence effect, cold boot attacks typically freeze the victim DRAM, providing a chance to detach, move, and reattach it to an attacker's computer. Once power is on, attackers can steal all the security-critical information from the victim's DRAM, such as a master decryption key for encrypted disk storage. Two types of defenses have been proposed in the past: 1) CPU-bound cryptography, where keys are stored in CPU registers and caches instead of in DRAMs, and 2) full or partial memory encryption, where sensitive data are stored encrypted. However, both methods impose non-negligible performance or energy overheads on the running systems and, worse, significantly increase hardware and software manufacturing costs. We found that these solutions address cold boot attacks passively: either by avoiding or by indirectly addressing the root cause of the problem, the remanence effect. In this paper, we propose and evaluate a proactive defense mechanism, Amnesiac DRAM, that comprehensively prevents cold boot attacks. The key idea is to discard the contents of the DRAM when attackers attempt to retrieve them (i.e., power on the stolen DRAM). When Amnesiac DRAM senses a physical separation, it locks itself and deletes all the remaining contents, making it amnesiac. Amnesiac DRAM causes neither performance nor energy overhead in ordinary operations (e.g., load and store) and can be easily implemented with negligible area overhead in commodity DRAM architectures.
@article{seol:amnesiac, title = {{Amnesiac DRAM: A Proactive Defense Mechanism Against Cold Boot Attacks}}, author = {Hoseok Seol and Minhye Kim and Taesoo Kim and Yongdae Kim and Lee-Sup Kim}, journal = {IEEE Transactions on Computers (ToC)}, volume = 70, number = 4, year = 2021, }
2020
Recently, hybrid fuzzing, which combines fuzzing and concolic execution, has been highlighted as a way to overcome the limitations of both techniques. Despite its success on contrived programs such as the DARPA Cyber Grand Challenge (CGC), it still falls short in finding bugs in real-world software due to the low performance of existing concolic executors.
To address this issue, this dissertation proposes and demonstrates concolic execution tailored for hybrid fuzzing with two systems: QSYM and Hybridra. First, we present QSYM, a binary-only concolic executor tailored for hybrid fuzzing. It significantly improves the performance of conventional concolic executors by removing redundant symbolic emulation of a binary. Moreover, to efficiently produce test cases for fuzzing, even at the cost of soundness, QSYM introduces two key techniques: optimistic solving and basic block pruning. As a result, QSYM outperforms state-of-the-art fuzzers and, more importantly, found 13 new bugs in eight real-world programs, including file, ffmpeg, and OpenJPEG.
Enhancing the key idea of QSYM, we then discuss Hybridra, a new concolic executor for file systems. To apply hybrid fuzzing to file systems, which are gigantic and convoluted, Hybridra employs compilation-based concolic execution, leveraging the available source code to boost concolic execution. Moreover, Hybridra introduces a new technique called staged reduction, which combines existing heuristics to efficiently generate test cases for file systems. Consequently, Hybridra outperforms a state-of-the-art file system fuzzer, Hydra, by achieving higher code coverage, and successfully discovered four new bugs in btrfs, which has been heavily tested by other fuzzers.
@phdthesis{yun:thesis, title = {{Concolic Execution Tailored for Hybrid Fuzzing}}, author = {Insu Yun}, school = {{Georgia Institute of Technology}}, year = 2020, month = dec, }
The DOM engine of a web browser is a popular attack surface and has been thoroughly fuzzed during its development. A common approach adopted by the latest DOM fuzzers is to generate new inputs based on context-free grammars. However, such a generative approach fails to capture the data dependencies in the inputs of a DOM engine, namely, HTML documents. Meanwhile, it is unclear whether coverage-guided mutation, which is well known to be effective in fuzzing numerous software systems, remains effective against DOM engines. Worse yet, existing DOM fuzzers cannot adopt a coverage-guided approach because they are unable to fully support HTML mutation and suffer from low browser throughput.
To scientifically understand the effectiveness and limitations of the two approaches, we propose FreeDom, a full-fledged cluster-friendly DOM fuzzer that works in both generative and coverage-guided modes. FreeDom relies on a context-aware intermediate representation to describe HTML documents with proper data dependencies. FreeDom also exhibits up to 3.74x higher throughput through browser self-termination. FreeDom has found 24 previously unknown bugs in commodity browsers including Safari, Firefox, and Chrome, and 10 CVEs have been assigned so far. With context-aware generation, FreeDom finds 3x more unique crashes in WebKit than the state-of-the-art DOM fuzzer, Domato. FreeDom guided by coverage is more effective in revealing new code blocks (2.62%) and finds three complex bugs that its generative approach fails to find. However, coverage-guided mutation that bootstraps with an empty corpus triggers 3.8x fewer unique crashes than the generative approach. The newly revealed coverage, more often than not, negatively affects the effectiveness of DOM fuzzers in bug finding. Therefore, we consider context-aware generation the best practice for finding more DOM engine bugs and expect further improvement in coverage-guided DOM fuzzing facilitated by FreeDom.
@inproceedings{xu:freedom, title = {{FREEDOM: Engineering a State-of-the-Art DOM Fuzzer}}, author = {Wen Xu and Soyeon Park and Taesoo Kim}, booktitle = {Proceedings of the 27th ACM Conference on Computer and Communications Security (CCS)}, month = nov, year = 2020, address = {Orlando, FL}, }
Today, a web browser plays a crucial role in offering a broad spectrum of web experiences. The most popular browser, Chromium, has become an extremely complex application to meet ever-increasing user demands, unavoidably exposing a large attack surface due to its large code base. Code debloating attracts attention as a means of reducing this potential attack surface by eliminating unused code. However, it is very challenging to perform sophisticated code removal without breaking needed functionality, because Chromium operates on a large number of closely connected and complex components, such as the renderer and the JavaScript engine. In this paper, we present Slimium, a debloating framework for a browser (i.e., Chromium) that harnesses a hybrid approach for fast and reliable binary instrumentation. The main idea behind Slimium is to determine a set of features as the debloating unit on top of a hybrid (i.e., static, dynamic, heuristic) analysis, and then leverage feature subsetting for code debloating. This aids in i) focusing on security-oriented features, ii) discarding unneeded code simply and without complications, and iii) reasonably addressing the non-deterministic path problem arising from code complexity. To this end, we generate a feature-code map with a relation vector technique and prompt webpage profiling results. Our experimental results demonstrate the practicality and feasibility of Slimium on 40 popular websites: on average, it removes 94 CVEs (61.4%) by cutting 23.85 MB of code (53.1%) from the defined features (21.7% of the whole) in Chromium.
@inproceedings{qian:slimium, title = {{Slimium: Debloating the Chromium Browser with Feature Subsetting}}, author = {Chenxiong Qian and Hyungjoon Koo and Changseok Oh and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 27th ACM Conference on Computer and Communications Security (CCS)}, month = nov, year = 2020, address = {Orlando, FL}, }
Exploitation techniques that abuse the metadata of heap allocators have been widely studied because of their generality (i.e., application independence) and power (i.e., bypassing modern mitigations). However, such techniques are commonly considered an art, and the ways to discover them remain ad hoc, manual, and allocator-specific.
In this paper, we present an automatic tool, ArcHeap, to systematically discover unexplored heap exploitation primitives, regardless of their underlying implementations. The key idea of ArcHeap is to let the computer autonomously explore the space, similar in concept to fuzzing, by specifying a set of common designs of modern heap allocators and root causes of vulnerabilities as models, and by providing heap operations and attack capabilities as actions. During the exploration, ArcHeap checks whether combinations of these actions can potentially be used to construct exploitation primitives, such as arbitrary write or overlapped chunks. As a proof, ArcHeap generates a working PoC that demonstrates the discovered exploitation technique.
We evaluated ArcHeap with ptmalloc2 and 10 other allocators, and discovered five previously unknown exploitation techniques in ptmalloc2 as well as several techniques against seven out of 10 allocators including the security-focused allocator, DieHarder. To show the effectiveness of ArcHeap's approach in other domains, we also studied how security features and exploit primitives evolve across different versions of ptmalloc2.
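As a rough illustration of the impact checks described above, the sketch below flags "overlapped chunks" among live allocations, assuming the exploration loop logs each live (pointer, size) pair; it is not ArcHeap's actual implementation.

```c
/*
 * A minimal sketch (not ArcHeap itself) of one impact check the paper
 * describes: detecting "overlapped chunks" among live allocations.
 * Assumes the exploration loop records every live (pointer, size) pair.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct chunk { uintptr_t start; size_t size; };

/* Two live chunks overlap if their address ranges intersect. */
static int chunks_overlap(const struct chunk *a, const struct chunk *b)
{
    return a->start < b->start + b->size &&
           b->start < a->start + a->size;
}

/* Scan all pairs of live chunks; any overlap is an exploitation primitive. */
static int find_overlap(const struct chunk *live, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (chunks_overlap(&live[i], &live[j]))
                return 1;
    return 0;
}

int main(void)
{
    /* Hypothetical log produced after a random sequence of heap actions. */
    struct chunk live[] = {
        { 0x1000, 0x40 },
        { 0x1020, 0x40 },   /* overlaps the first chunk */
    };
    printf("overlap found: %d\n", find_overlap(live, 2));
    return 0;
}
```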
@inproceedings{yun:archeap, title = {{Automatic Techniques to Systematically Discover New Heap Exploitation Primitives}}, author = {Insu Yun and Dhaval Kapil and Taesoo Kim}, booktitle = {Proceedings of the 29th USENIX Security Symposium (Security)}, month = aug, year = 2020, address = {Boston, MA}, }
Compromising a kernel through a browser is the ultimate goal for offensive security researchers. Because of continuous efforts to eliminate vulnerabilities and introduce various mitigations, a remote kernel exploit from a browser becomes extremely difficult, seemingly impossible.
In this talk, we will share our Safari exploit submitted to Pwn2Own 2020. Combining SIX different vulnerabilities, our exploit successfully compromises the macOS kernel starting from the Safari browser. It breaks every mitigation in macOS including ASLR, DEP, sandbox, and even System Integrity Protection (SIP). Inspecting every vulnerability used in this exploit, we will show not only state-of-the-art hacking techniques but also challenges in protecting complicated systems (i.e., browsers and operating systems) and in introducing their mitigations. Moreover, we will introduce a new technique that reliably exploits a TOCTOU vulnerability in macOS.
@inproceedings{jin:pwn2own2020-safari, title = {{Compromising the macOS kernel through Safari by chaining six vulnerabilities}}, author = {Yonghwi Jin and Jungwon Lim and Insu Yun and Taesoo Kim}, booktitle = {Black Hat USA Briefings (Black Hat USA)}, month = aug, year = 2020, address = {Las Vegas, NV}, }
The practical art of constructing database management systems (DBMSs) involves a morass of trade-offs among query execution speed, query optimization speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that DBMSs, like all complex software systems, contain bugs that can adversely affect their performance. The performance of DBMSs is an important metric as it determines how quickly an application can take in new information and use it to make new decisions.
Both developers and users face challenges while dealing with performance regression bugs. First, developers usually find it challenging to manually design test cases to uncover performance regressions since DBMS components tend to have complex interactions. Second, users encountering performance regressions are often unable to report them, as the regression-triggering queries could be complex and database-dependent. Third, developers have to expend a lot of effort on localizing the root cause of the reported bugs, due to the system complexity and software development complexity.
Given these challenges, this paper presents the design of APOLLO, a toolchain for automatically detecting, reporting, and diagnosing performance regressions in DBMSs. We demonstrate that APOLLO automates the generation of regression-triggering queries, simplifies the bug reporting process for users, and enables developers to quickly pinpoint the root cause of performance regressions. By automating the detection and diagnosis of performance regressions, APOLLO reduces the labor cost of developing efficient DBMSs.
@inproceedings{jung:apollo, title = {{APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems}}, author = {Jinho Jung and Hong Hu and Joy Arulraj and Taesoo Kim and Woonhak Kang}, booktitle = {Proceedings of the 46th International Conference on Very Large Data Bases (VLDB)}, month = aug, year = 2020, address = {Tokyo, Japan}, }
The scale and pervasiveness of concurrent software pose challenges for security researchers: race conditions are more prevalent than ever, and the growing software complexity keeps exacerbating the situation -- expanding the arms race between security practitioners and attackers beyond memory errors. As a consequence, we need a new generation of bug hunting tools that not only scale well with increasingly larger codebases but also catch up with the growing importance of race conditions.
In this thesis, two complementary race detection frameworks for OS kernels are presented: multi-dimensional fuzz testing and symbolic checking. Fuzz testing turns bug finding into a probabilistic search, but current practices restrict themselves to one dimension only (sequential executions). This thesis illustrates how to explore the concurrency dimension and extend the bug scope beyond memory errors to the broad spectrum of concurrency bugs. On the other hand, conventional symbolic executors face challenges when applied to OS kernels, such as path explosions due to branching and loops. They also lack a systematic way of modeling and tracking constraints in the concurrency dimension (e.g., to enforce a particular schedule for thread interleavings). This gap can be partially filled with the novel techniques for symbolic execution presented in this thesis.
@phdthesis{xu:thesis, title = {{Finding Race Conditions in Kernels: the Symbolic Way and the Fuzzy Way}}, author = {Meng Xu}, school = {{Georgia Institute of Technology}}, year = 2020, month = jul, }
We propose ecoTLB, software-based eventual translation lookaside buffer (TLB) coherence, which eliminates the overhead of the synchronous TLB shootdown mechanism in operating systems that use address space identifiers (ASIDs). With eventual TLB coherence, ecoTLB improves the performance of free and page swap operations by removing the inter-processor interrupt (IPI) overheads incurred to invalidate TLB entries. We show that the TLB shootdown has implications for page swapping, in particular in emerging, disaggregated data centers, and demonstrate that ecoTLB can improve both the performance and the specific swapping policy decisions using its asynchronous mechanism. We demonstrate that ecoTLB improves the performance of real-world applications, such as Memcached and Make, that perform page swapping using Infiniswap, a solution for next-generation data centers that use disaggregated memory, by up to 17.2%. Moreover, ecoTLB improves the 99th percentile tail latency of Memcached by up to 70.8% due to its asynchronous scheme and improved policy decisions. Furthermore, we show that recent features to improve security in the Linux kernel, like kernel page table isolation (KPTI), can result in significant performance overheads on architectures that lack instructions to clear single entries in tagged TLBs and must fall back to full TLB flushes. In this scenario, ecoTLB is able to recover the performance lost to supporting KPTI due to its asynchronous shootdown scheme and its support for tagged TLBs. Finally, we demonstrate that ecoTLB improves the performance of free operations by up to 59.1% on a 120-core machine and improves the performance of Apache on a 16-core machine by up to 13.7% compared to baseline Linux, and by up to 48.2% compared to ABIS, a recent state-of-the-art research prototype that reduces the number of IPIs.
@article{maass:ecotlb, title = {{EcoTLB: Eventually Consistent TLBs}}, author = {Steffen Maass and Mohan Kumar and Taesoo Kim and Tushar Krishna and Abhishek Bhattacharjee}, journal = {ACM Transactions on Architecture and Code Optimization (TACO '20)}, month = jul, year = 2020, }
Over the past decade, multicore machines have become the norm. A single machine is capable of having thousands of hardware threads or cores. Even cloud providers offer such large multicore machines for data processing engines and databases. Thus, a fundamental question arises: how efficient are the existing synchronization primitives, timestamping and locking, that developers use for designing concurrent, scalable, and performant applications? This dissertation focuses on understanding the scalability aspect of these primitives, and presents new algorithms and approaches that leverage either the hardware or application domain knowledge to scale up to hundreds of cores.
First, the thesis presents Ordo, a scalable ordering, or timestamping, primitive that forms the basis for designing scalable timestamp-based concurrency control mechanisms. Ordo relies on invariant hardware clocks and provides the notion of a globally synchronized clock within a machine. We use the Ordo primitive to redesign a synchronization mechanism and the concurrency control mechanisms in databases and software transactional memory.
Later, this thesis focuses on the scalability aspect of locks in both virtualized and non-virtualized scenarios. In a virtualized environment, we identify that these locks suffer from various preemption issues due to a semantic gap between the hypervisor scheduler and a virtual machine scheduler---the double scheduling problem. We address this problem by bridging this gap, in which both the hypervisor and virtual machines share minimal scheduling information to avoid the preemption problems.
Finally, we focus on the design of lock algorithms in general. We find that locks in practice have discrepancies from locks in design. For example, popular spinlocks suffer from excessive cache-line bouncing in multicore (NUMA) systems, while state-of-the-art locks exhibit sub-par single-thread performance. We classify several dominating factors that impact the performance of lock algorithms. We then propose a new technique, shuffling, that can dynamically accommodate all these factors, without slowing down the critical path of the lock. The key idea of shuffling is to re-order the queue of threads waiting to acquire the lock with some pre-established policy. Using shuffling, we propose a family of locking algorithms, called SHFLLOCKS that respect all factors, efficiently utilize waiters, and achieve the best performance.
@phdthesis{kashyap:thesis, title = {{Scaling Synchronization Primitives}}, author = {Sanidhya Kashyap}, school = {{Georgia Institute of Technology}}, year = 2020, month = jun, }
File systems are too large to be bug free. Although handwritten test suites have been widely used to stress file systems, they can hardly keep up with the rapid increase in file system size and complexity, leading to new bugs being introduced. These bugs come in various flavors, from buffer overflows to complicated semantic bugs. Although bug-specific checkers exist, they generally lack a way to explore file system states thoroughly. More importantly, no turnkey solution exists that unifies the checking effort of various aspects of a file system under one umbrella.
In this article, to highlight the potential of applying fuzzing to find any type of file system bugs in a generic way, we propose Hydra, an extensible fuzzing framework. Hydra provides building blocks for file system fuzzing, including input mutators, feedback engines, test executors, and bug post-processors. As a result, developers only need to focus on building the core logic for finding bugs of their interests. We showcase the effectiveness of Hydra with four checkers that hunt crash inconsistency, POSIX violations, logic assertion failures, and memory errors. So far, Hydra has discovered 157 new bugs in Linux file systems, including three in verified file systems (FSCQ and Yxv6).
@article{kim:hydra-tos, title = {{Finding Bugs in File Systems with an Extensible Fuzzing Framework}}, author = {Seulbae Kim and Meng Xu and Sanidhya Kashyap and Jungyeon Yoon and Wen Xu and Taesoo Kim}, journal = {ACM Transactions on Storage (ToS)}, volume = 16, number = 10, month = may, year = 2020, }
Fuzzing is a practical, widely deployed technique to find bugs in complex, real-world programs like JavaScript engines. We observed, however, that existing fuzzing approaches, either generative or mutational, fall short in fully harvesting high-quality input corpora such as known proof-of-concept (PoC) exploits or unit tests. Existing fuzzers tend to destroy the subtle semantics or conditions encoded in the input corpus in order to generate new test cases, because this approach helps in discovering new code paths of the program. Nevertheless, for JavaScript-like complex programs, such a conventional design leads to test cases that tackle only shallow parts of the complex codebase and fails to reach deep bugs effectively due to the huge input space.
In this paper, we advocate a new technique, called an aspect-preserving mutation, that stochastically preserves the desirable properties, called aspects, that we prefer to be maintained across mutation. We demonstrate the aspect preservation with two mutation strategies, namely, structure and type preservation, in our fully-fledged JavaScript fuzzer, called DIE. Our evaluation shows that DIE's aspect-preserving mutation is more effective in discovering new bugs (5.7x more unique crashes) and producing valid test cases (2.4x fewer runtime errors) than the state-of-the-art JavaScript fuzzers. DIE newly discovered 48 high-impact bugs in ChakraCore, JavaScriptCore, and V8 (38 fixed with 12 CVEs assigned as of today). The source code of DIE is publicly available as an open-source project.
@inproceedings{park:die, title = {{Fuzzing JavaScript Engines with Aspect-preserving Mutation}}, author = {Soyeon Park and Wen Xu and Insu Yun and Daehee Jang and Taesoo Kim}, booktitle = {Proceedings of the 41st IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2020, address = {San Francisco, CA}, }
Data races occur when two threads fail to use proper synchronization when accessing shared data. In kernel file systems, which are highly concurrent by design, data races are common mistakes and often wreak havoc on users, causing inconsistent states or data losses. Prior fuzzing practices on file systems have been effective in uncovering hundreds of bugs, but they mostly focus on the sequential aspect of file system execution and do not comprehensively explore the concurrency dimension; hence, they forgo the opportunity to catch data races.
In this paper, we bring coverage-guided fuzzing to the concurrency dimension with three new constructs: 1) a new coverage tracking metric, alias coverage, specially designed to capture the exploration progress in the concurrency dimension; 2) an evolution algorithm for generating, mutating, and merging multi-threaded syscall sequences as inputs for concurrency fuzzing; and 3) a comprehensive lockset and happens-before modeling for kernel synchronization primitives for precise data race detection. These components are integrated into KRACE, an end-to-end fuzzing framework that has discovered 23 data races in ext4, btrfs, and the VFS layer so far, and 9 are confirmed to be harmful.
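The sketch below gives a rough, simplified picture of how an alias-coverage-style metric could be tracked, assuming the instrumentation reports each memory access as a (thread, pc, is_write) tuple; KRACE's real bookkeeping is more involved.

```c
/*
 * Conceptual sketch of an "alias coverage" style metric. Assumptions:
 * instrumentation reports each memory access as (thread, pc, is_write);
 * this is a simplification, not KRACE's actual implementation.
 */
#include <stdio.h>
#include <stdint.h>

#define MAP_SIZE (1 << 16)
static uint8_t alias_map[MAP_SIZE];   /* coverage bitmap, AFL-style */

struct access { int tid; uintptr_t pc; int is_write; };

/* Record an alias edge between two instructions from different threads
 * that touched the same address, at least one of them being a write. */
static void record_alias(const struct access *prev, const struct access *cur)
{
    if (prev->tid == cur->tid)
        return;                       /* same thread: not a concurrency edge */
    if (!prev->is_write && !cur->is_write)
        return;                       /* read-read pairs cannot race */
    uint32_t h = (uint32_t)((prev->pc >> 4) ^ (cur->pc << 7));
    alias_map[h % MAP_SIZE] = 1;      /* mark the edge as covered */
}

int main(void)
{
    struct access a = { .tid = 1, .pc = 0x401234, .is_write = 1 };
    struct access b = { .tid = 2, .pc = 0x405678, .is_write = 0 };
    record_alias(&a, &b);             /* accesses to the same address */

    unsigned covered = 0;
    for (unsigned i = 0; i < MAP_SIZE; i++)
        covered += alias_map[i];
    printf("alias edges covered: %u\n", covered);
    return 0;
}
```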
@inproceedings{xu:krace, title = {{Krace: Data Race Fuzzing for Kernel File Systems}}, author = {Meng Xu and Sanidhya Kashyap and Hanqing Zhao and Taesoo Kim}, booktitle = {Proceedings of the 41st IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2020, address = {San Francisco, CA}, }
Software vendors collect crash reports from end users to assist in the debugging and testing of their products. However, crash reports may contain users' private information, like names and passwords, rendering users hesitant to share the reports with developers. We need a mechanism to protect users' privacy in crash reports on the client side while keeping sufficient information to support server-side debugging and analysis.
In this paper, we propose the DESENSITIZATION technique, which generates privacy-aware and attack-preserving crash reports from crashed executions. Our tool adopts lightweight methods to identify bug-related and attack-related data in memory, and removes other data to protect users' privacy. Since a large portion of the desensitized memory contains null bytes, we store crash reports in sparse files to save network bandwidth and server-side storage. We prototype DESENSITIZATION and apply it to a large number of crashes of real-world programs, like browsers and the JavaScript engine. The results show that our DESENSITIZATION technique can eliminate 80.9% of non-zero bytes from coredumps and 49.0% from minidumps. The desensitized crash report can be 50.5% smaller than the original one, which significantly saves resources for report submission and storage. Our DESENSITIZATION technique is a push-button solution for privacy-aware crash reporting.
@inproceedings{ding:desensitization, title = {{DESENSITIZATION: Privacy-Aware and Attack-Preserving Crash Report}}, author = {Ren Ding and Hong Hu and Wen Xu and Taesoo Kim}, booktitle = {Proceedings of the 2020 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2020, address = {San Diego, CA}, }
2019
System software commonly uses indirect calls to realize dynamic program behaviors. However, indirect calls also bring challenges to constructing a precise control-flow graph, which is a standard prerequisite for many static program-analysis and system-hardening techniques. Unfortunately, identifying indirect-call targets is a hard problem. In particular, modern compilers do not recognize indirect-call targets by default. Existing approaches identify indirect-call targets based on type analysis that matches the types of function pointers with those of address-taken functions. Such approaches, however, suffer from a high false-positive rate, as many irrelevant functions may share the same types.
In this paper, we propose a new approach, namely Multi-Layer Type Analysis (MLTA), to effectively refine indirect-call targets for C/C++ programs. MLTA relies on the observation that function pointers are commonly stored into objects whose types have a multi-layer type hierarchy; before indirect calls, function pointers will be loaded from objects with the same type hierarchy "layer by layer". By matching the multi-layer types of function pointers and functions, MLTA can dramatically refine indirect-call targets. MLTA is effective because multi-layer types are more restrictive than single-layer types. It does not introduce false negatives, because it conservatively tracks target propagation between multi-layer types, and the layered design allows MLTA to safely fall back whenever the analysis for a layer becomes infeasible. We have implemented MLTA in a system, namely TypeDive, based on LLVM and extensively evaluated it with the Linux kernel, the FreeBSD kernel, and the Firefox browser. Evaluation results show that TypeDive can eliminate 86% to 98% more indirect-call targets than existing approaches do, without introducing new false negatives. We also demonstrate that TypeDive not only improves the scalability of static analysis but also benefits semantic-bug detection. With TypeDive, we have found 35 new deep semantic bugs in the Linux kernel.
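The toy C program below illustrates the kind of multi-layer type hierarchy the analysis exploits; the struct and function names are invented for illustration and do not come from the paper.

```c
/*
 * A small, made-up example of a "multi-layer type hierarchy" around a
 * function pointer, in the spirit of what MLTA analyzes.
 */
#include <stdio.h>

struct file_ops {
    int (*read)(int fd);              /* layer 1: function pointer field */
};

struct device {
    const char *name;
    struct file_ops ops;              /* layer 2: file_ops nested in device */
};

static int ramdisk_read(int fd) { return fd + 1; }

/* The address-taken function is stored through the full type chain
 * device -> file_ops -> read, so a multi-layer analysis can restrict the
 * indirect-call targets of dev->ops.read to functions stored via that same
 * chain, instead of every address-taken int(*)(int) in the program. */
static struct device ramdisk = {
    .name = "ramdisk",
    .ops  = { .read = ramdisk_read },
};

int main(void)
{
    struct device *dev = &ramdisk;
    printf("%d\n", dev->ops.read(41));   /* indirect call to be refined */
    return 0;
}
```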
@inproceedings{lu:typedive, title = {{Where Does It Go? Refining Indirect-Call Targets with Multi-Layer Type Analysis}}, author = {Kangjie Lu and Hong Hu}, booktitle = {Proceedings of the 26th ACM Conference on Computer and Communications Security (CCS)}, month = nov, year = 2019, address = {London, UK}, }
The hardware security module (HSM) has been used as a root of trust for various key management services. At the same time, rapid innovation in emerging industries, such as container-based microservices, accelerates demand for scalable security services. However, current on-premises HSMs struggle to meet such demands due to their restricted scalability and high deployment cost. This paper presents ScaleTrust, a framework for scaling security services by utilizing HSMs together with an SGX-based key management service (KMS) in a collaborative, yet secure, manner. Based on a hierarchical model, we design a cryptographic workload distribution between HSMs and KMS enclaves to achieve both the elasticity of cloud software and the hardware-based security of HSM appliances. We demonstrate the practical implications of ScaleTrust using two case studies that require secure cryptographic operations with low latency and high scalability.
@inproceedings{han:sgxhsm, title = {{Toward Scaling Hardware Security Module for Emerging Cloud Services}}, author = {Juhyeng Han and Seongmin Kim and Taesoo Kim and Dongsu Han}, booktitle = {Proceedings of the 4th Workshop on System Software for Trusted Execution (SysTEX)}, month = oct, year = 2019, address = {Ontario, Canada}, }
File systems are too large to be bug free. Although handwritten test suites have been widely used to stress file systems, they can hardly keep up with the rapid increase in file system size and complexity, leading to new bugs being introduced and reported regularly. These bugs come in various flavors, from simple buffer overflows to sophisticated semantic bugs. Although bug-specific checkers exist, they generally lack a way to explore file system states thoroughly. More importantly, no turnkey solution exists that unifies the checking effort of various aspects of a file system under one umbrella.
In this paper, we highlight the potential of applying fuzzing to find not just memory errors but, in theory, any type of file system bugs with an extensible fuzzing framework: Hydra. Hydra provides building blocks for file system fuzzing, including input mutators, feedback engines, a libOS-based executor, and a bug reproducer with test case minimization. As a result, developers only need to focus on building the core logic for finding bugs of their own interests. We showcase the effectiveness of Hydra with four checkers that hunt crash inconsistency, POSIX violations, logic assertion failures, and memory errors. So far, Hydra has discovered 91 new bugs in Linux file systems, including one in a verified file system (FSCQ), as well as four POSIX violations.
@inproceedings{kim:hydra, title = {{Finding Semantic Bugs in File Systems with an Extensible Fuzzing Framework}}, author = {Seulbae Kim and Meng Xu and Sanidhya Kashyap and Jungyeon Yoon and Wen Xu and Taesoo Kim}, booktitle = {Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2019, address = {Ontario, Canada}, }
Locks are an essential building block for high-performance multicore system software. To meet performance goals, lock algorithms have evolved towards specialized solutions for architectural characteristics (e.g., NUMA). However, in practice, applications run on different server platforms and exhibit widely diverse behaviors that evolve with time (e.g., number of threads, number of locks). This creates performance and scalability problems for locks optimized for a single scenario and platform. For example, popular spinlocks suffer from excessive cache-line bouncing in NUMA systems, while scalable, NUMA-aware locks exhibit sub-par single-thread performance.
In this paper, we identify four dominating factors that impact the performance of lock algorithms. We then propose a new technique, shuffling, that can dynamically accommodate all these factors, without slowing down the critical path of the lock. The key idea of shuffling is to re-order the queue of threads waiting to acquire the lock in accordance with some pre-established policy. For best performance, this work is done off the critical path, by the waiter threads. Using shuffling, we demonstrate how to achieve NUMA-awareness and implement an efficient parking/wake-up strategy, without any auxiliary data structure, mostly off the critical path. The evaluation shows that our family of locks based on shuffling improves the throughput of real-world applications up to 12.5x, with impressive memory footprint reduction compared with the recent lock algorithms.
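The sketch below illustrates only the reordering step in isolation, as a plain list manipulation without any atomics or lock state; it is not the actual SHFLLOCK algorithm, and the notion of grouping waiters by NUMA socket is one example policy the paper describes.

```c
/*
 * Conceptual sketch of the shuffling idea only (no atomics, no lock
 * acquisition path): a waiter walks the wait queue and groups waiters
 * from a chosen NUMA socket together, off the critical path of the lock.
 */
#include <stdio.h>
#include <stddef.h>

struct waiter {
    int tid;
    int socket;            /* NUMA socket the waiter runs on */
    struct waiter *next;
};

/* Stable-partition the wait queue: waiters on `socket` keep their relative
 * order and move ahead of waiters on other sockets. */
static struct waiter *shuffle(struct waiter *head, int socket)
{
    struct waiter *same_h = NULL, *same_t = NULL;
    struct waiter *other_h = NULL, *other_t = NULL;

    for (struct waiter *w = head; w; ) {
        struct waiter *next = w->next;
        w->next = NULL;
        if (w->socket == socket) {
            if (same_t) same_t->next = w; else same_h = w;
            same_t = w;
        } else {
            if (other_t) other_t->next = w; else other_h = w;
            other_t = w;
        }
        w = next;
    }
    if (same_t) { same_t->next = other_h; return same_h; }
    return other_h;
}

int main(void)
{
    struct waiter w3 = { 3, 0, NULL };
    struct waiter w2 = { 2, 1, &w3 };
    struct waiter w1 = { 1, 0, &w2 };

    /* Group socket-0 waiters ahead of the rest: prints tids 1, 3, 2. */
    for (struct waiter *w = shuffle(&w1, 0); w; w = w->next)
        printf("tid=%d socket=%d\n", w->tid, w->socket);
    return 0;
}
```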
@inproceedings{kashyap:shfllock, title = {{Scalable and Practical Locking With Shuffling}}, author = {Sanidhya Kashyap and Irina Calciu and Xiaohe Cheng and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2019, address = {Ontario, Canada}, }
We present Recipe, a principled approach for converting concurrent DRAM indexes into crash-consistent indexes for persistent memory (PM). The main insight behind Recipe is that the isolation provided by a certain class of concurrent in-memory indexes can be translated with small changes to crash-consistency when the same index is used in PM. We present a set of conditions that enable the identification of this class of DRAM indexes, and the actions to be taken to convert each index to be persistent. Based on these conditions and conversion actions, we modify five different DRAM indexes based on B+ trees, tries, radix trees, and hash tables to their crash-consistent PM counterparts. The effort involved in this conversion is minimal, requiring 30--200 lines of code. We evaluated the converted PM indexes on Intel DC Persistent Memory, and found that they outperform state-of-the-art, hand-crafted PM indexes in multi-threaded workloads by up to 5.2x. For example, we built P-CLHT, our PM implementation of the CLHT hash table, by modifying only 30 LOC. When running YCSB workloads, P-CLHT performs up to 2.4x better than Cacheline-Conscious Extendible Hashing (CCEH), the state-of-the-art PM hash table.
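As a rough sketch of what one conversion action can look like, the snippet below persists an insert by following the committing store with a cache-line write-back and a fence; the data structure and commit protocol are simplified assumptions for illustration, not code from the paper.

```c
/*
 * Sketch of a DRAM-to-PM conversion action in the spirit of RECIPE: after
 * the store that commits an insert, write back the cache line and order it
 * with a fence so the update becomes durable on PM. Illustrative only;
 * compile with -mclwb, or substitute _mm_clflushopt/_mm_clflush on CPUs
 * without CLWB support.
 */
#include <stdint.h>
#include <immintrin.h>

struct bucket {
    uint64_t key;
    uint64_t value;
};

static inline void pm_persist(const void *addr)
{
    _mm_clwb((void *)addr);   /* write back the cache line holding addr */
    _mm_sfence();             /* order the write-back before later stores */
}

/* DRAM version: a plain store commits the insert.
 * PM version: the same store followed by a flush and a fence. */
void bucket_insert_pm(struct bucket *b, uint64_t key, uint64_t value)
{
    b->value = value;
    b->key   = key;           /* the key store acts as the commit point */
    pm_persist(b);
}
```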
@inproceedings{lee:recipe, title = {{RECIPE: Converting Concurrent DRAM Indexes to Persistent-Memory Indexes}}, author = {Se Kwon Lee and Jayashree Mohan and Sanidhya Kashyap and Taesoo Kim and Vijay Chidambaram}, booktitle = {Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2019, address = {Ontario, Canada}, }
We present SplitFS, a file system for persistent memory (PM) that reduces software overhead significantly compared to state-of-the-art PM file systems. SplitFS presents a novel split of responsibilities between a user-space library file system and an existing kernel PM file system. The user-space library file system handles data operations by intercepting POSIX calls, memory-mapping the underlying file, and serving reads and overwrites using processor loads and stores. Metadata operations are handled by the kernel PM file system (ext4 DAX). SplitFS introduces a new primitive termed relink to efficiently support file appends and atomic data operations. SplitFS provides three consistency modes, which different applications can choose from, without interfering with each other. SplitFS reduces software overhead by up to 4x compared to the NOVA PM file system, and 17x compared to ext4 DAX. On a number of micro-benchmarks and applications such as the LevelDB key-value store running the YCSB benchmark, SplitFS increases application performance by up to 2x compared to ext4 DAX and NOVA while providing similar consistency guarantees.
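A minimal sketch of the user-space interception idea, not SplitFS itself, is shown below: an LD_PRELOAD library that hooks read() and could serve it from a memory-mapped region, here simply forwarding to the real syscall.

```c
/*
 * Minimal illustration of a user-space interception layer (not SplitFS):
 * an LD_PRELOAD library that intercepts read(). A real library file system
 * would check whether fd maps to a file it manages and, if so, memcpy()
 * from the mmap()ed region instead of issuing a syscall.
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>
#include <stddef.h>

typedef ssize_t (*read_fn)(int, void *, size_t);

ssize_t read(int fd, void *buf, size_t count)
{
    static read_fn real_read;
    if (!real_read)
        real_read = (read_fn)dlsym(RTLD_NEXT, "read");

    /* Here we simply forward to the kernel implementation. */
    return real_read(fd, buf, count);
}
```

Such a library would be built as a shared object (e.g., gcc -shared -fPIC -ldl) and injected into an unmodified application via LD_PRELOAD.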
@inproceedings{kadekodi:splitfs, title = {{SplitFS: Reducing Software Overhead in File Systems for Persistent Memory}}, author = {Rohan Kadekodi and Se Kwon Lee and Sanidhya Kashyap and Taesoo Kim and Aasheesh Kolli and Vijay Chidambaram}, booktitle = {Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2019, address = {Ontario, Canada}, }
@inproceedings{das:mlsploit-kdd, title = {{MLsploit: A Framework for Interactive Experimentation with Adversarial Machine Learning Research}}, author = {Nilaksh Das and Siwei Li and Chanil Jeon and Jinho Jung and Shang-Tse Chen and Carter Yagemann and Evan Downing and Haekyu Park and Evan Yang and Li Chen and Michael Kounavis and Ravi Sahita and David Durham and Scott Buck and Duen Horng Chau and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 1st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) Project Showcase}, month = aug, year = 2019, address = {Anchorage, AK}, }
VMware ESXi is an enterprise-class, bare-metal hypervisor dedicated to providing state-of-the-art private-cloud infrastructure. Accordingly, the design and implementation of ESXi are of interest to our community, yet they lack a thorough evaluation of their security internals. In this paper, we give a comprehensive analysis of the guest-to-host attack surfaces of ESXi and its recent security mitigation (i.e., the vSphere sandbox). In particular, we introduce an effective and reliable approach to chaining multiple vulnerabilities for exploitation, and demonstrate our approach by leveraging two new bugs (i.e., uninitialized stack usages), namely, CVE-2018-6981 and CVE-2018-6982. Our exploit chain is the first public demonstration of a virtual machine escape against VMware ESXi.
@inproceedings{zhao:esxi, title = {{Breaking Turtles All the Way Down: An Exploitation Chain to Break out of VMware ESXi}}, author = {Hanqing Zhao and Yanyu Zhang and Kun Yang and Taesoo Kim}, booktitle = {Proceedings of the 13th USENIX Workshop on Offensive Technologies (WOOT)}, month = aug, year = 2019, address = {Santa Clara, CA}, }
A new breed of low-latency I/O devices, such as emerging remote memory access and high-speed Ethernet NICs, is becoming ubiquitous in current data centers. For example, big data center operators such as Amazon, Facebook, Google, and Microsoft are already migrating their networks to 100G. However, the overhead incurred by system software, such as the protocol stack and synchronous operations, is dominant with these faster I/O devices. This dissertation approaches the above problem by redesigning the protocol stack to provide an interface for latency-sensitive operations, and by redesigning synchronous operations, such as the TLB shootdown and consensus, in operating systems and distributed systems, respectively.
First, the dissertation presents an extensible protocol stack, Xps to address the software overhead incurred in protocol stacks such as TCP and UDP. Xps provides the abstractions to allow an application-defined, latency-sensitive operation to run immediately after the protocol processing (called the fast path) in various protocol stacks: in a commodity OS protocol stack (Linux), a user space protocol stack (mTCP), as well as recent smartNICs. For all other operations, Xps retains the popular, well-understood socket interface. Xps' approach is practical: rather than proposing a new OS or removing the socket interface completely, our goal is to provide stack extensions for latency-sensitive operations and use the existing socket layer for all other operations.
Second, the dissertation provides a lazy, asynchronous mechanism to address the system software overhead incurred by a synchronous operation, the TLB shootdown. The key idea of the lazy shootdown mechanism, called LATR, is to use lazy memory reclamation and lazy page table unmapping to perform an asynchronous TLB shootdown. By handling TLB shootdowns in a lazy fashion, LATR can eliminate the performance overheads associated with IPI mechanisms as well as the waiting time for acknowledgments from remote cores. By proposing an asynchronous mechanism, LATR provides an eventually consistent solution.
Finally, the dissertation untangles the logically coupled consensus mechanism from the application, which alleviates the overhead incurred by consensus algorithms such as Multi-Paxos and Viewstamped Replication (VR). Through physical isolation, DYAD prevents the consensus component from competing for system resources with the application, which improves application performance. To provide physical isolation, DYAD defines the abstraction needed from the SmartNIC and the operations performed on the application running on the host processor. With the resulting consensus mechanism, the host processor handles only the client requests in the normal case, while the messages needed for consensus are handled on the SmartNIC.
@phdthesis{kumar:thesis, title = {{Taming Latency in Data Center Applications}}, author = {Mohan Kumar Kumar}, school = {{Georgia Institute of Technology}}, year = 2019, month = aug, }
Large-scale internet services, such as Facebook or Google, are using clusters of many servers for problems such as search, machine learning, and social networks. However, while it may be possible to apply the tools used at this scale to smaller, more common problems as well, this dissertation presents approaches to large-scale data processing on only a single machine. This approach has obvious cost benefits and lowers the barrier of entrance to large-scale data processing. This dissertation approaches this problem by redesigning applications to enable trillion-scale graph processing on a single machine while also enabling the processing of evolving, billion-scale graphs.
First, this dissertation presents a new out-of-core graph processing engine, called Mosaic, for executing graph algorithms on trillion-scale datasets on a single machine. Mosaic makes use of many-core processors and fast I/O devices coupled with a novel graph encoding scheme to allow processing of graphs of up to one trillion edges on a single machine. Mosaic also employs a locality-preserving, space-filling curve to allow for high compression and high locality when storing graphs and executing algorithms. Our evaluation shows that for smaller graphs, Mosaic consistently outperforms other state-of-the-art out-of-core engines by 3.2x–58.6x and shows comparable performance to distributed graph engines. Furthermore, Mosaic can complete one iteration of the PageRank algorithm on a trillion-edge graph in 21 minutes, outperforming a distributed disk-based engine by 9.2x.
Second, while Mosaic addresses the setting of processing static graphs, this dissertation also presents Cytom, a new engine for processing billion-scale evolving graphs, based on insights about achieving high compression and locality while improving load balancing when processing a graph that changes rapidly. Cytom introduces a novel programming model that takes advantage of its subgraph-centric approach coupled with the setting of evolving graphs. This is an important enabling step for emerging workloads that process graphs that change over time. Cytom's programming model allows algorithm developers to quickly react to graph updates, discarding uninteresting ones while focusing on updates that, in fact, change the algorithmic result. We show that Cytom is effective in scaling to billion-edge graphs, providing higher throughput when updating the graph structure (2.0x–192x) and when additionally running an algorithm (1.5x–200x).
@phdthesis{maass:thesis, title = {{Systems Abstractions for Big Data Processing on a Single Machine}}, author = {Steffen Maass}, school = {{Georgia Institute of Technology}}, year = 2019, month = aug, }
Commodity software typically includes a large number of functionalities for a broad user population. However, each individual user usually only needs a small subset of all supported functionalities. The bloated code not only hinders optimal execution, but also leads to a larger attack surface. Recent works have explored program debloating as an emerging solution to this problem. Unfortunately, these works require program source code, limiting their real-world deployability.
In this paper, we propose a practical debloating framework, RAZOR, that performs code reduction for deployed binaries. Based on users' specifications, our tool customizes the binary to generate a functional program with minimal code size. Instead of only supporting given test cases, RAZOR uses several control-flow heuristics to infer complementary code that is necessary to support user-expected functionalities. We evaluated RAZOR on commonly used benchmarks and real-world applications, including the web browser Firefox and the closed-source PDF reader FoxitReader. The results show that RAZOR is able to remove over 70% of the code from the bloated binary. It produces functional programs and does not introduce any security issues. RAZOR is thus a practical framework for debloating real-world programs.
@inproceedings{qian:razor, title = {{RAZOR: A Framework for Post-deployment Software Debloating}}, author = {Chenxiong Qian and Hong Hu and Mansour A Alharthi and Pak Ho Chung and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 28th USENIX Security Symposium (Security)}, month = aug, year = 2019, address = {Santa Clara, CA}, }
Fuzzing is a software testing technique that quickly and automatically explores the input space of a program without knowing its internals. Therefore, developers commonly use fuzzing as part of test integration throughout the software development process. Unfortunately, it also means that the blackbox and automatic nature of fuzzing is appealing to adversaries who are looking for zero-day vulnerabilities.
To solve this problem, we propose a new mitigation approach, called Fuzzification, that helps developers protect released, binary-only software from attackers who are capable of applying state-of-the-art fuzzing techniques. Given a performance budget, this approach aims to hinder the fuzzing process by adversaries as much as possible. We propose three Fuzzification techniques: 1) SpeedBump, which makes fuzzed executions hundreds of times slower while keeping the slowdown in normal executions small, 2) BranchTrap, which interferes with feedback logic by hiding paths and polluting coverage maps, and 3) AntiHybrid, which hinders taint analysis and symbolic execution. Each technique is designed with best-effort, defensive measures that attempt to hinder adversaries from bypassing Fuzzification.
Our evaluation on popular fuzzers and real-world applications shows that Fuzzification effectively reduces the number of discovered paths by 70.3%, decreases the number of identified crashes by 93.0% on real-world binaries, and decreases the number of detected bugs by 67.5% on the LAVA-M dataset, while staying under user-specified overheads for common workloads. We discuss the robustness of Fuzzification techniques against adversarial analysis techniques. We open-source our Fuzzification system to foster future research.
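The snippet below is a conceptual sketch of the SpeedBump idea described above, under assumed parameters (the delay value and the choice of "cold" path are illustrative): a small delay on a rarely taken path is negligible for ordinary users but dominates when a fuzzer executes the binary millions of times.

```c
/*
 * Conceptual sketch of the SpeedBump idea only (not the paper's
 * implementation): inject an artificial delay on a rarely taken path.
 */
#define _POSIX_C_SOURCE 199309L
#include <time.h>
#include <stdio.h>
#include <stddef.h>

static void speed_bump(long nsec)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = nsec };
    nanosleep(&ts, NULL);     /* amplifies per-execution cost under fuzzing */
}

static int parse_header(const unsigned char *buf, size_t len)
{
    if (len < 4 || buf[0] != 'M' || buf[1] != 'Z') {
        speed_bump(100000);   /* 0.1 ms on the error path, rarely hit normally */
        return -1;
    }
    return 0;
}

int main(void)
{
    unsigned char bad[4] = { 0 };
    printf("%d\n", parse_header(bad, sizeof(bad)));
    return 0;
}
```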
@inproceedings{jung:fuzzification, title = {{Fuzzification: Anti-Fuzzing Techniques}}, author = {Jinho Jung and Hong Hu and David Solodukhin and Daniel Pagan and Kyu Hyung Lee and Taesoo Kim}, booktitle = {Proceedings of the 28th USENIX Security Symposium (Security)}, month = aug, year = 2019, address = {Santa Clara, CA}, }
Intel Memory Protection Keys (MPK) is a new hardware primitive to support thread-local permission control on groups of pages without requiring modification of page tables. Unfortunately, its current hardware implementation and software support suffer from security, scalability, and semantic problems: (1) it is vulnerable to protection-key-use-after-free; (2) it provides only a limited number of protection keys; and (3) it is incompatible with mprotect()'s process-based permission model. In this paper, we propose libmpk, a software abstraction for MPK. It virtualizes the hardware protection keys to eliminate the protection-key-use-after-free problem while providing access to an unlimited number of virtualized keys. To support legacy applications, it also provides lazy inter-thread key synchronization. To enhance the security of MPK itself, libmpk restricts unauthorized writes to its metadata. We apply libmpk to three real-world applications: OpenSSL, a JavaScript JIT compiler, and Memcached, for memory protection and isolation. Our evaluation shows that it introduces negligible performance overhead (<1%) compared with the original, unprotected versions and improves performance by 8.1x compared with the secure equivalents using mprotect(). The source code of libmpk is publicly available and maintained as an open-source project.
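For context, the sketch below exercises the raw Linux pkey_* interface that libmpk builds on (it is not libmpk's own API); it assumes a CPU and kernel with protection-key support and glibc 2.27 or later.

```c
/*
 * Raw Intel MPK usage via the Linux pkey_* interface (the hardware/kernel
 * primitive that libmpk virtualizes, not libmpk itself).
 */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t len = 4096;
    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }

    int pkey = pkey_alloc(0, 0);              /* allocate a protection key */
    if (pkey < 0) { perror("pkey_alloc"); return 1; }

    /* Tag the page with the key; the page table is not touched afterwards. */
    if (pkey_mprotect(page, len, PROT_READ | PROT_WRITE, pkey) < 0) {
        perror("pkey_mprotect"); return 1;
    }

    page[0] = 'A';                            /* allowed: key permits access */

    /* Thread-locally revoke write access by updating PKRU, no syscall. */
    pkey_set(pkey, PKEY_DISABLE_WRITE);
    printf("read still allowed: %c\n", page[0]);

    pkey_set(pkey, 0);                        /* restore full access */
    pkey_free(pkey);
    munmap(page, len);
    return 0;
}
```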
@inproceedings{park:libmpk, title = {{libmpk: Software Abstraction for Intel Memory Protection Keys (Intel MPK)}}, author = {Soyeon Park and Sangho Lee and Wen Xu and Hyungon Moon and Taesoo Kim}, booktitle = {Proceedings of the 2019 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2019, address = {Renton, WA}, }
The Internet of Things (IoT) is rapidly emerging as one of the dominant computing paradigms of this decade. Applications range from in-home entertainment to large-scale industrial deployments such as controlling assembly lines and monitoring traffic. While IoT devices are in many respects similar to traditional computers, user expectations and deployment scenarios as well as cost and hardware constraints are sufficiently different to create new security challenges as well as new opportunities. This is especially true for large-scale IoT deployments in which a central entity deploys and controls a large number of IoT devices with minimal human interaction.
Like traditional computers, IoT devices are subject to attack and compromise. Large IoT deployments consisting of many nearly identical devices are especially attractive targets. At the same time, recovery from root compromise by conventional means becomes costly and slow, even more so if the devices are dispersed over a large geographical area. In the worst case, technicians have to travel to all devices and manually recover them. Data center solutions such as the Intelligent Platform Management Interface (IPMI) which rely on separate service processors and network connections are not only not supported by existing IoT hardware, but are unlikely to be in the foreseeable future due to the cost constraints of mainstream IoT devices.
This paper presents Cider, a system that can recover IoT devices within a short amount of time, even if attackers have taken root control of every device in a large deployment. The recovery requires minimal manual intervention. After the administrator has identified the compromise and produced an updated firmware image, he/she can instruct Cider to force the devices to reset and to install the patched firmware on the devices. We demonstrate the universality and practicality of Cider by implementing it on three popular IoT platforms (HummingBoard Edge, Raspberry Pi Compute Module 3, and Nucleo-L476RG) spanning the range from high to low end. Our evaluation shows that the performance overhead of Cider is generally negligible.
@inproceedings{xu:cider, title = {{Dominance as a New Trusted Computing Primitive for the Internet of Things}}, author = {Meng Xu and Manuel Huber and Zhichuang Sun and Paul England and Marcus Peinado and Sangho Lee and Andrey Marochko and Dennis Mattoon and Rob Spiger and Stefan Thom}, booktitle = {Proceedings of the 40th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2019, address = {San Francisco, CA}, }
File systems, a basic building block of an OS, are too big and too complex to be bug free. Nevertheless, file systems rely on regular stress-testing tools and formal checkers to find bugs, which are limited due to the ever-increasing complexity of both file systems and OSes. Thus, fuzzing, proven to be an effective and practical approach, becomes a preferable choice, as it does not need much knowledge about a target. However, three main challenges exist in fuzzing file systems: mutating a large image blob degrades overall performance, file operations must be image-dependent, and reproducing found bugs is difficult for existing OS fuzzers.
Hence, we present JANUS, the first feedback-driven fuzzer that explores the two-dimensional input space of a file system, i.e., mutating metadata on a large image while emitting image-directed file operations. In addition, JANUS relies on a library OS rather than on traditional VMs for fuzzing, which enables JANUS to load a fresh copy of the OS, thereby leading to better reproducibility of bugs. We evaluated JANUS on eight file systems and found 90 bugs in the upstream Linux kernel, 62 of which have been acknowledged. Forty-three bugs have been fixed with 32 CVEs assigned. In addition, JANUS achieves higher code coverage on all the file systems after fuzzing for 12 hours, compared with the state-of-the-art fuzzer Syzkaller for fuzzing file systems. JANUS visits 4.19x and 2.01x more code paths in Btrfs and ext4, respectively. Moreover, JANUS is able to reproduce 88–100% of the crashes, while Syzkaller fails on all of them.
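The declarations below sketch one plausible in-memory layout for such a two-dimensional input; all names are illustrative assumptions, not JANUS's actual structures. One dimension covers only the metadata extents of the seed image, the other a sequence of image-directed file operations.

```c
/*
 * Illustrative layout of a two-dimensional file-system fuzzing input:
 * dimension 1 mutates only the metadata bytes of a large image blob,
 * dimension 2 emits file operations directed at paths in that image.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Dimension 1: offsets into the seed image that hold on-disk metadata,
 * so the fuzzer mutates kilobytes instead of the whole multi-MB blob. */
struct meta_extent { uint64_t offset; uint32_t size; };

/* Dimension 2: a file operation against files/dirs that exist in the
 * (mutated) image, e.g., open, read, mkdir, rename, ... */
struct fs_op { uint16_t opcode; uint16_t path_idx; uint64_t args[3]; };

struct fs_input {
    struct meta_extent *extents;   size_t nr_extents;  /* dimension 1 */
    uint8_t            *meta_bytes;  /* mutated copy of the metadata only */
    struct fs_op       *ops;       size_t nr_ops;      /* dimension 2 */
};

int main(void)
{
    struct meta_extent ext = { .offset = 1024, .size = 64 };
    struct fs_op op = { .opcode = 1 /* e.g., open */, .path_idx = 0 };
    uint8_t meta[64] = { 0 };
    struct fs_input in = { &ext, 1, meta, &op, 1 };
    printf("extents=%zu ops=%zu\n", in.nr_extents, in.nr_ops);
    return 0;
}
```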
@inproceedings{xu:janus, title = {{Fuzzing File Systems via Two-Dimensional Input Space Exploration}}, author = {Wen Xu and Hyungon Moon and Sanidhya Kashyap and Po-Ning Tseng and Taesoo Kim}, booktitle = {Proceedings of the 40th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2019, address = {San Francisco, CA}, }
This paper presents multi-version read-log-update (MV-RLU), an extension of the read-log-update (RLU) synchronization mechanism. While RLU has many merits, including an intuitive programming model and excellent performance for read-mostly workloads, we observed that the performance of RLU drops significantly in workloads with more write operations. The core problem is that RLU manages only two versions. To overcome this limitation, we extend RLU to support multi-versioning and propose new techniques to make multi-versioning efficient. At the core of the MV-RLU design is concurrent autonomous garbage collection, which prevents the reclamation of invisible versions from becoming a bottleneck and reduces the version traversal overhead, the main overhead of a multi-version design. We extensively evaluate MV-RLU against state-of-the-art synchronization mechanisms, including RCU, RLU, software transactional memory (STM), and lock-free approaches, on concurrent data structures and real-world applications (database concurrency control and an in-memory key-value store). Our evaluation shows that MV-RLU significantly outperforms other techniques for a wide range of workloads with varying contention levels and data-set sizes.
@inproceedings{kim:mvrlu, title = {{MV-RLU: Scaling Read-Log-Update with Multi-Versioning}}, author = {Jaeho Kim and Ajit Mathew and Sanidhya Kashyap and Madhava Krishnan Ramanathan and Changwoo Min}, booktitle = {Proceedings of the 24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, month = apr, year = 2019, address = {Providence, RI}, }
@inproceedings{das:mlsploit, title = {{MLsploit: A Cloud-Based Framework for Adversarial Machine Learning Research}}, author = {Nilaksh Das and Siwei Li and Chanil Jeon and Jinho Jung and Shang-Tse Chen and Carter Yagemann and Evan Downing and Haekyu Park and Evan Yang and Li Chen and Michael Kounavis and Ravi Sahita and David Durham and Scott Buck and Polo Chau and Taesoo Kim and Wenke Lee}, booktitle = {Black Hat Asia Briefings (Black Hat Asia Arsenal)}, month = mar, year = 2019, address = {Singapore}, }
2018
With the wide deployment of security mechanisms such as Address Space Layout Randomization (ASLR), memory disclosures have become a prerequisite for critical memory-corruption attacks (e.g., code-reuse attack) — adversaries are forced to exploit memory disclosures to circumvent ASLR as the first step. As a result, the security threats of memory disclosures are now significantly aggravated — they break not only data confidentiality but also the effectiveness of security mechanisms.
In this paper, we propose a general detection methodology and develop a system to stop memory disclosures. We observe that memory disclosures are not root causes but rather consequences of a variety of hard-to-detect program errors such as memory corruption and uninitialized reads. We thus propose a replicated-execution-based methodology to detect memory disclosures in general, regardless of their causes. We realize this methodology with BUDDY: by seamlessly maintaining two identical running instances of a target program and diversifying only its target data, BUDDY can accurately detect disclosures of that data, as any such disclosure results in the two instances outputting different values. Extensive evaluation results show that BUDDY is reliable and efficient while stopping real memory disclosures such as the Heartbleed leak.
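The detection principle can be illustrated with a small Python sketch (a toy model under our own assumptions, not BUDDY's engine): two instances run with diversified copies of the target data, and any output that depends on that data makes the instances diverge.

    def vulnerable_handler(request: str, secret: str) -> str:
        # Hypothetical bug: an over-read appends secret data to the response.
        if request == "overread":
            return "payload:" + secret
        return "payload:ok"

    def detect_disclosure(request: str) -> bool:
        out_a = vulnerable_handler(request, secret="AAAA")   # instance 1
        out_b = vulnerable_handler(request, secret="BBBB")   # instance 2, diversified
        return out_a != out_b                                # divergence => leak

    assert not detect_disclosure("normal")
    assert detect_disclosure("overread")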
@article{lu:buddy, title = {{Stopping Memory Disclosures via Diversification and Replicated Execution}}, author = {Kangjie Lu and Meng Xu and Chengyu Song and Taesoo Kim and Wenke Lee}, journal = {IEEE Transactions on Dependable and Secure Computing (TDSC)}, volume = {preprint}, month = oct, year = 2018, }
With Tor being a popular anonymity network, many attacks have been proposed to break its anonymity or leak information of a private communication on Tor. However, guaranteeing complete privacy in the face of an adversary on Tor is especially difficult, because Tor relays are under complete control of world-wide volunteers. Currently, one can gain private information, such as circuit identifiers and hidden service identifiers, by running Tor relays and can even modify their behaviors with malicious intent. This paper presents a practical approach to effectively enhancing the security and privacy of Tor by utilizing Intel SGX, a commodity trusted execution environment. We present a design and implementation of Tor, called SGX-Tor, that prevents code modification and limits the information exposed to untrusted parties. We demonstrate that our approach is practical and effectively reduces the power of an adversary to a traditional network-level adversary. Finally, SGX-Tor incurs moderate performance overhead; the end-to-end latency and throughput overheads for HTTP connections are 3.9% and 11.9%, respectively.
@article{kim:sgx-tor2, title = {{SGX-Tor: A Secure and Practical Tor Anonymity Network With SGX Enclaves}}, author = {Seongmin Kim and Juhyeng Han and Jaehyeong Ha and Taesoo Kim and Dongsu Han}, journal = {IEEE/ACM Transactions on Networking (ToN)}, volume = 26, number = 5, pages = {2174--2187}, month = oct, year = 2018, }
The goal of control-flow integrity (CFI) is to stop control-hijacking attacks by ensuring that each indirect control-flow transfer (ICT) jumps to its legitimate target. However, existing implementations of CFI have fallen short of this goal because their approaches are inaccurate and as a result, the set of allowable targets for an ICT instruction is too large, making illegal jumps possible.
In this paper, we propose the Unique Code Target (UCT) property for CFI. Namely, for each invocation of an ICT instruction, there should be one and only one valid target. We develop a prototype called uCFI to enforce this new property. During compilation, uCFI identifies the sensitive instructions that influence ICT and instruments the program to record the necessary execution context. At runtime, uCFI monitors the program execution in a different process, and performs points-to analysis by interpreting sensitive instructions using the recorded execution context in a memory-safe manner. It checks runtime ICT targets against the analysis results to detect CFI violations. We apply uCFI to SPEC benchmarks and 2 servers (nginx and vsftpd) to evaluate its efficacy in enforcing UCT and its overhead. We also test uCFI against control-hijacking attacks, including 5 real-world exploits, 1 proof-of-concept COOP attack, and 2 synthesized attacks that bypass existing defenses. The results show that uCFI strictly enforces the UCT property for protected programs, successfully detects all attacks, and introduces less than 10% performance overhead.
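As a toy illustration of the Unique Code Target check (not uCFI's monitor), the sketch below recomputes the single valid target from recorded context and compares it with the target actually taken; the handler table and recorded index are hypothetical.

    HANDLERS = {0: "handle_get", 1: "handle_put"}

    def compute_unique_target(recorded_index: int) -> str:
        # With the concrete recorded context, points-to analysis collapses
        # to exactly one legitimate target.
        return HANDLERS[recorded_index]

    def check_ict(recorded_index: int, taken_target: str) -> bool:
        return taken_target == compute_unique_target(recorded_index)

    assert check_ict(0, "handle_get")        # legitimate transfer
    assert not check_ict(0, "handle_put")    # hijacked transfer -> CFI violation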
@inproceedings{hu:ucfi, title = {{Enforcing Unique Code Target Property for Control-Flow Integrity}}, author = {Hong Hu and Chenxiong Qian and Carter Yagemann and Simon Pak Ho Chung and William R. Harris and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2018, address = {Toronto, ON, Canada}, }
Debugging software failures in deployed systems is important because they impact real users and customers. However, debugging such failures is notoriously hard in practice because developers have to rely on limited information such as memory dumps. The execution history is usually unavailable because high-fidelity program tracing is not affordable in deployed systems.
In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed systems. REPT reconstructs the execution history with high fidelity by combining online lightweight hardware tracing of a program's control flow with offline binary analysis that recovers its data flow. It is seemingly impossible to recover data values thousands of instructions before the failure due to information loss and concurrent execution. REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction.
We design and implement REPT, deploy it on Microsoft Windows, and integrate it into Windows Debugger. We evaluate REPT on 16 real-world bugs and show that it can recover data values accurately (92% on average) and efficiently (less than 20 seconds) for these bugs. We also show that it enables effective reverse debugging for 14 bugs.
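The following Python sketch gives a highly simplified, hypothetical flavor of the backward data recovery described above (it is not REPT's algorithm): walking a recorded instruction trace backward from the final state, invertible instructions are undone and overwritten values are marked unknown.

    trace = [("add", "rax", 5), ("mov", "rbx", 7), ("add", "rax", 3)]
    final_state = {"rax": 10, "rbx": 7}

    def backward_recover(trace, state):
        state = dict(state)
        history = [dict(state)]
        for op, reg, imm in reversed(trace):
            if op == "add" and state.get(reg) is not None:
                state[reg] -= imm                 # invertible: undo the addition
            elif op == "mov":
                state[reg] = None                 # destination overwritten: unknown before
            history.append(dict(state))
        return history                            # states from the failure back to the start

    assert backward_recover(trace, final_state)[-1]["rax"] == 2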
@inproceedings{cui:rept, title = {{REPT: Reverse Debugging of Failures in Deployed Software}}, author = {Weidong Cui and Xinyang Ge and Baris Kasikci and Ben Niu and Upamanyu Sharma and Ruoyu Wang and Insu Yun}, booktitle = {Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}, month = oct, year = 2018, address = {Carlsbad, CA}, }
Recently, hybrid fuzzing has been proposed to address the limitations of fuzzing and concolic execution by combining both approaches. The hybrid approach has shown its effectiveness in various synthetic benchmarks such as DARPA Cyber Grand Challenge (CGC) binaries, but it still suffers from scaling to find bugs in complex, real-world software. We observed that the performance bottleneck of the existing concolic executor is the main limiting factor for its adoption beyond a small-scale study.
To overcome this problem, we design a fast concolic execution engine, called QSYM, to support hybrid fuzzing. The key idea is to tightly integrate the symbolic emulation with the native execution using dynamic binary translation, making it possible to implement more fine-grained, and thus faster, instruction-level symbolic emulation. Additionally, QSYM loosens the strict soundness requirements of conventional concolic executors for better performance, yet takes advantage of a faster fuzzer for validation, providing unprecedented opportunities for performance optimizations, e.g., optimistically solving constraints and pruning uninteresting basic blocks.
Our evaluation shows that QSYM not only outperforms state-of-the-art fuzzers (i.e., it found 14× more bugs than VUzzer on the LAVA-M dataset and outperformed Driller on 104 of 126 binaries), but also found 13 previously unknown security bugs in eight real-world programs, including Dropbox Lepton, ffmpeg, and OpenJPEG, which had already been intensively tested by the state-of-the-art fuzzers AFL and OSS-Fuzz.
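The optimistic-solving idea can be sketched in a few lines of Python (an illustration with a brute-force stand-in for the SMT solver, not QSYM's engine): if the full path constraint cannot be satisfied, only the last branch condition is solved, and the resulting input is handed to the fuzzer for validation.

    def solve(constraints, domain=range(256)):
        for x in domain:                        # toy stand-in for an SMT solver
            if all(c(x) for c in constraints):
                return x
        return None

    def generate_input(path_constraints):
        candidate = solve(path_constraints)
        if candidate is not None:
            return candidate, "sound"
        # Optimistic: drop everything but the last (target) branch condition.
        candidate = solve(path_constraints[-1:])
        return candidate, "optimistic (fuzzer will validate)"

    path = [lambda x: x % 2 == 0, lambda x: x > 300, lambda x: x == 42]  # unsat as a whole
    print(generate_input(path))                  # -> (42, 'optimistic (fuzzer will validate)')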
@inproceedings{yun:qsym, title = {{QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing}}, author = {Insu Yun and Sangho Lee and Meng Xu and Yeongjin Jang and Taesoo Kim}, booktitle = {Proceedings of the 27th USENIX Security Symposium (Security)}, month = aug, year = 2018, address = {Baltimore, MD}, }
Investigating attacks across multiple hosts is challenging. The true dependencies between security-sensitive files, network endpoints, or memory objects from different hosts can be easily concealed by dependency explosion or undefined program behavior (e.g., memory corruption). Dynamic information flow tracking (DIFT) is a potential solution to this problem, but existing DIFT techniques only track information flow within a single host and lack an efficient mechanism to maintain and synchronize the data flow tags globally across multiple hosts.
In this paper, we propose RTAG, an efficient data flow tagging and tracking mechanism that enables practical cross-host attack investigations. RTAG is based on three novel techniques. First, by using a record-and-replay technique, it decouples the dependencies between different data flow tags from the analysis, enabling lazy synchronization between independent and parallel DIFT instances of different hosts. Second, it takes advantage of system-call-level provenance information to calculate and allocate the optimal tag map in terms of memory consumption. Third, it embeds tag information into network packets to track cross-host data flows with less than 0.05% network bandwidth overhead. Evaluation results show that RTAG is able to recover the true data flows of realistic cross-host attack scenarios. Performance-wise, RTAG reduces the memory consumption of DIFT-based analysis by up to 90% and decreases the overall analysis time by 60%-90% compared with previous investigation systems.
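As an illustration of carrying tags across hosts, the Python sketch below uses a hypothetical wire format (not RTAG's): the sender prepends a small tag header to the payload, and the receiver strips it and records the provenance of the bytes it just ingested, so per-host DIFT instances can later be stitched together.

    import json

    def send(payload: bytes, tags: dict[int, str]) -> bytes:
        header = json.dumps({str(k): v for k, v in tags.items()}).encode()
        return len(header).to_bytes(4, "big") + header + payload

    def receive(packet: bytes, local_tag_map: dict) -> bytes:
        hlen = int.from_bytes(packet[:4], "big")
        tags = json.loads(packet[4:4 + hlen])
        payload = packet[4 + hlen:]
        for offset, tag in tags.items():
            local_tag_map[int(offset)] = tag      # remember provenance of these bytes
        return payload

    tag_map: dict[int, str] = {}
    pkt = send(b"secret-report", {0: "hostA:/etc/shadow"})
    assert receive(pkt, tag_map) == b"secret-report" and tag_map[0] == "hostA:/etc/shadow"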
@inproceedings{ji:rtag, title = {{Enabling Refinable Cross-host Attack Investigation with Efficient Data Flow Tagging and Tracking}}, author = {Yang Ji and Sangho Lee and Mattia Fazzini and Joey Allen and Evan Downing and Taesoo Kim and Alessandro Orso and Wenke Lee}, booktitle = {Proceedings of the 27th USENIX Security Symposium (Security)}, month = aug, year = 2018, address = {Baltimore, MD}, }
The Linux kernel contains a large number of features that not all systems need, yet Linux distributors enable as many features as possible to make their distributions generic, leading to a severe bloating problem. Intuitively, we can use the existing configuration system to remove unnecessary features. However, it is unclear whether this system is adequate for kernel debloating. In this study, we perform an analysis to understand how much security benefit a user can obtain by debloating the kernel through compile-time configuration. Our study shows that the existing configuration system provides a convenient and effective vector to reduce the attack surface while producing a functional kernel. Based on this result, we plan to spend more effort on building a secure kernel through compile-time debloating.
@inproceedings{alharthi:debloat-study, title = {{On the Effectiveness of Kernel Debloating via Compile-time Configuration}}, author = {Mansour Alharthi and Hong Hu and Hyungon Moon and Taesoo Kim}, booktitle = {Proceedings of the 1st Workshop on SoftwAre debLoating And Delayering}, month = jul, year = 2018, address = {Amsterdam, Netherlands}, }
Multi-core virtual machines (VMs) are now the norm in data center environments. However, one of the well-known problems that VMs suffer from is the vCPU scheduling problem, which causes poor scalability behavior. More specifically, the symptoms of this problem appear as preemption problems in both under- and over-committed scenarios. Although prior research efforts attempted to alleviate these symptoms separately, they fail to address the common root cause of these problems: the semantic gap that arises when a guest OS is preempted while executing its own critical section, thereby degrading application scalability.
In this work, we strive to address all preemption problems together by bridging the semantic gap between guest OSes and the hypervisor: the hypervisor now knows whether a guest OS is running in a critical section, and the guest OS has access to the hypervisor's scheduling context. We annotate critical sections using lightweight para-virtualized APIs, which we call enlightened critical sections (eCS), that provide scheduling hints to both the hypervisor and VMs. The hypervisor uses these hints to reschedule a vCPU so as to fundamentally overcome the double-scheduling problem for the annotated critical sections, and VMs use the hypervisor-provided hints to further mitigate the blocked-waiter wake-up problem. Our evaluation results show that eCS guarantees the forward progress of a guest OS by 1) decreasing preemption counts by 85-100% while 2) improving the throughput of applications by up to 2.5X in an over-committed scenario and 1.6X in an under-committed scenario for various real-world workloads on an 80-core machine.
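A toy Python model of the eCS hint (our own simplification of the mechanism, not the kernel or hypervisor code) looks like this: the guest sets a per-vCPU flag in memory shared with the hypervisor while inside a critical section, and the hypervisor consults the flag before preempting that vCPU.

    shared_hint = {"vcpu0": {"in_critical_section": False}}

    def guest_lock(vcpu):
        shared_hint[vcpu]["in_critical_section"] = True    # annotate entry

    def guest_unlock(vcpu):
        shared_hint[vcpu]["in_critical_section"] = False   # annotate exit

    def hypervisor_should_preempt(vcpu) -> bool:
        # Defer preemption (e.g., grant one extra time slice) while the
        # guest reports being inside a critical section.
        return not shared_hint[vcpu]["in_critical_section"]

    guest_lock("vcpu0")
    assert hypervisor_should_preempt("vcpu0") is False
    guest_unlock("vcpu0")
    assert hypervisor_should_preempt("vcpu0") is True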
@inproceedings{kashyap:ecs, title = {{Scaling Guest OS Critical Sections with eCS}}, author = {Sanidhya Kashyap and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the 2018 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2018, address = {Boston, MA}, }
During system call execution, it is common for operating system kernels to read userspace memory multiple times (multi-reads). A critical bug may exist if the fetched userspace memory is subject to change across these reads, i.e., a race condition, which is known as a double-fetch bug. Prior works have attempted to detect these bugs both statically and dynamically. However, due to their improper assumptions and imprecise definitions regarding double-fetch bugs, their multi-read detection is inherently limited and suffers from significant false positives and false negatives. For example, their approach is unable to support device emulation, inter-procedural analysis, loop handling, etc. More importantly, they completely leave the task of finding real double-fetch bugs from the haystack of multi-reads to manual verification, which is expensive if possible at all.
In this paper, we first present a formal and precise definition of double-fetch bugs and then implement a static analysis system, Deadline, to automatically detect double-fetch bugs in OS kernels. Deadline uses static program analysis techniques to systematically find multi-reads throughout the kernel and employs specialized symbolic checking to vet each multi-read for double-fetch bugs. We apply Deadline to the Linux and FreeBSD kernels and find 23 new bugs in Linux and one new bug in FreeBSD. We further propose four generic strategies to patch and prevent double-fetch bugs based on our study and discussions with kernel maintainers.
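The pattern Deadline looks for can be modeled in Python as follows (a deterministic toy, not kernel code): the "kernel" fetches a user-controlled length twice, and a value that changes between the two fetches defeats the earlier validation.

    class UserMemory:
        """Simulates userspace memory that another thread mutates between reads."""
        def __init__(self, values):
            self.values = iter(values)
        def fetch_len(self):
            return next(self.values)             # each fetch may observe a new value

    def buggy_copy(user, kernel_buf_size=8):
        n = user.fetch_len()                     # first fetch: validation
        if n > kernel_buf_size:
            return "rejected"
        n = user.fetch_len()                     # second fetch: use (race window!)
        return "overflow" if n > kernel_buf_size else "ok"

    def fixed_copy(user, kernel_buf_size=8):
        n = user.fetch_len()                     # fetch once, then validate and use
        return "rejected" if n > kernel_buf_size else "ok"

    assert buggy_copy(UserMemory([4, 4096])) == "overflow"   # TOCTOU exploited
    assert fixed_copy(UserMemory([4, 4096])) == "ok"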
@inproceedings{xu:deadline, title = {{Precise and Scalable Detection of Double-Fetch Bugs in OS Kernels}}, author = {Meng Xu and Chenxiong Qian and Kangjie Lu and Michael Backes and Taesoo Kim}, booktitle = {Proceedings of the 39th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2018, address = {San Francisco, CA}, }
We propose Solros, a new operating system architecture for heterogeneous systems that comprise fast host processors, slow but massively parallel co-processors, and fast I/O devices. There is a general consensus that fully driving such a hardware system requires tight integration among processors and I/O devices. Thus, in the Solros architecture, a co-processor OS (data-plane OS) delegates its services, specifically I/O stacks, to the host OS (control-plane OS). Our observation behind this design is that global coordination with system-wide knowledge (e.g., PCIe topology, the load on each co-processor) and the best use of heterogeneous processors are critical to achieving high performance. Hence, we fully harness these specialized processors by delegating complex I/O stacks to fast host processors, which leads to efficient global coordination at the level of the control-plane OS.
We developed Solros with Xeon Phi co-processors and implemented three core OS services: transport, file system, and network services. Our experimental results show significant performance improvement compared with the stock Xeon Phi running the Linux kernel. For example, Solros improves the throughput of file system and network operations by 19x and 7x, respectively. Moreover, it improves the performance of two realistic applications: 19x for text indexing and 2x for image search.
@inproceedings{min:solros, title = {{SOLROS: A Data-Centric Operating System Architecture for Heterogeneous Computing}}, author = {Changwoo Min and Woon-Hak Kang and Mohan Kumar and Sanidhya Kashyap and Steffen Maass and Heeseung Jo and Taesoo Kim}, booktitle = {Proceedings of the 13th European Conference on Computer Systems (EuroSys)}, month = apr, year = 2018, address = {Porto, Portugal}, }
Timestamping is an essential building block for designing concurrency control mechanisms and concurrent data structures. Various algorithms either employ physical timestamping, assuming that they have access to synchronized clocks, or maintain a logical clock with the help of atomic instructions. Unfortunately, these approaches have two problems. First, hardware developers do not guarantee that the available hardware clocks are exactly synchronized, which is difficult to achieve in practice. Second, atomic instructions hinder scalability because of cache-line contention. This paper addresses these problems by proposing and designing a scalable ordering primitive, called Ordo, that relies on invariant hardware clocks. Ordo not only enables the correct use of these clocks, by providing a notion of a global hardware clock, but also frees various logical-timestamp-based algorithms from the burden of a software logical clock, while also simplifying their design. We use the Ordo primitive to redesign 1) a concurrent data structure library that we apply to the Linux kernel; 2) a synchronization mechanism for concurrent programming; 3) two database concurrency control mechanisms; and 4) a clock-based software transactional memory algorithm. Our evaluation shows that clocks may not be synchronized on two architectures (Intel and ARM) and that Ordo generally improves the efficiency of several algorithms by 1.2-39.7x on various architectures.
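A minimal sketch of an Ordo-style ordering primitive follows, under the assumption that time.monotonic_ns() stands in for a per-core invariant hardware clock and ORDO_BOUNDARY_NS for the measured worst-case offset between clocks (both hypothetical here): two timestamps can be ordered only when they differ by more than the uncertainty window.

    import time

    ORDO_BOUNDARY_NS = 1_000                    # hypothetical measured uncertainty

    def new_time():
        return time.monotonic_ns()              # stand-in for an invariant hardware clock

    def cmp_time(t1: int, t2: int):
        """Return -1/1 if t1 definitely precedes/follows t2, else 0 (uncertain)."""
        if t1 < t2 - ORDO_BOUNDARY_NS:
            return -1
        if t1 > t2 + ORDO_BOUNDARY_NS:
            return 1
        return 0                                 # within the uncertainty window

    a = new_time()
    time.sleep(0.001)                            # 1 ms >> ORDO_BOUNDARY_NS
    b = new_time()
    assert cmp_time(a, b) == -1 and cmp_time(b, a) == 1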
@inproceedings{kashyap:ordo, title = {{A Scalable Ordering Primitive For Multicore Machines}}, author = {Sanidhya Kashyap and Changwoo Min and Kangnyeon Kim and Taesoo Kim}, booktitle = {Proceedings of the 13th European Conference on Computer Systems (EuroSys)}, month = apr, year = 2018, address = {Porto, Portugal}, }
Large-scale graphs and their analytics are common in many applications such as social networks, data mining, and machine learning. Many approaches to operate on static graphs with sizes in the trillions of edges have been proposed and adopted. However, real-world graphs are evolving in nature and are constantly changing in many application domains. We observe that current systems for processing evolving graphs show three problems: they impose a storage overhead when adding new edges and vertices, they involve a synchronous compaction step to keep their internal data structures optimized, and they exhibit poor cache locality.
We propose Kaleidoscope, a graph processing engine for evolving graphs, that tackles these three problems by using a localized graph data structure, which helps reduce both the storage overhead and the overhead of the synchronous compaction step. Kaleidoscope also enables the use of a locality-optimizing space-filling curve, the Hilbert order, when traversing the subgraphs built by Kaleidoscope, to allow for better cache locality.
@inproceedings{maass:kaleidoscope, title = {{Kaleidoscope: Graph Analytics on Evolving Graphs}}, author = {Steffen Maass and Taesoo Kim}, booktitle = {12th EuroSys Doctoral Workshop (EuroDW)}, month = apr, year = 2018, address = {Porto, Portugal}, }
We propose LATR (lazy TLB coherence), a software-based TLB shootdown mechanism that alleviates the overhead of the synchronous TLB shootdown mechanism in existing operating systems. By handling TLB coherence lazily, LATR avoids the expensive IPIs required to deliver a shootdown signal to remote cores, as well as the performance overhead of the associated interrupt handlers. Therefore, virtual memory operations, such as free and page-migration operations, benefit significantly from LATR's mechanism. For example, LATR improves the latency of munmap by 70.8% on a 2-socket machine, a widely used configuration in modern data centers. Real-world, performance-critical applications such as web servers can also benefit from LATR: without any application-level changes, LATR improves Apache by 59.9% compared to Linux, and by 37.9% compared to ABIS, a highly optimized, state-of-the-art TLB coherence technique.
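The lazy scheme can be illustrated with a toy Python model (not the LATR kernel code): the unmapping core flushes locally, enqueues the stale entry on per-core lazy lists instead of sending IPIs, and each remote core drains its list at its next scheduler tick.

    CORES = [0, 1, 2, 3]
    lazy_lists = {c: [] for c in CORES}          # per-core pending invalidations
    tlbs = {c: {0x1000: "pte"} for c in CORES}   # toy per-core TLBs

    def munmap_lazy(vaddr, initiator):
        tlbs[initiator].pop(vaddr, None)         # flush locally right away
        for c in CORES:
            if c != initiator:
                lazy_lists[c].append(vaddr)      # no IPI: just enqueue
        return "returned without waiting"

    def scheduler_tick(core):
        while lazy_lists[core]:
            tlbs[core].pop(lazy_lists[core].pop(), None)   # deferred local flush

    munmap_lazy(0x1000, initiator=0)
    scheduler_tick(1)
    assert 0x1000 not in tlbs[1] and 0x1000 in tlbs[2]     # core 2 flushes later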
@inproceedings{kumar:latr, title = {{LATR: Lazy Translation Coherence}}, author = {Mohan Kumar and Steffen Maass and Sanidhya Kashyap and J\'{a}n Vesel\'{y} and Zi Yan and Taesoo Kim and Abhishek Bhattacharjee and Tushar Krishna}, booktitle = {Proceedings of the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, month = mar, year = 2018, address = {Williamsburg, VA}, }
Updating applications is an important mechanism to enhance their availability, functionality, and security. However, without careful considerations, application updates can bring other security problems. In this paper, we consider a novel attack that exploits application updates on Android: a cross-update privacy-leak attack called COUPLE. The COUPLE attack allows an application to secretly leak sensitive data through the cross-update interaction between its old and new versions; each version only has permissions and logic for either data collection or transmission to evade detection. We implement a runtime security system, BREAKUP, that prevents cross-update sensitive data transactions by tracking permission-use histories of individual applications. Evaluation results show that BREAKUP’s time overhead is below 5%. We further show the feasibility of the COUPLE attack by analyzing the versions of 2,009 applications (28,682 APKs).
@article{cho:breakup, title = {{Prevention of Cross-update Privacy Leaks on Android}}, author = {Beumjin Cho and Sangho Lee and Meng Xu and Sangwoo Ji and Taesoo Kim and Jong Kim}, journal = {Computer Science and Information Systems}, volume = 15, number = 1, pages = {111--137}, month = jan, year = 2018, }
2017
Intel Software Guard Extensions (SGX) provides a strongly isolated memory space, known as an enclave, for a user process, ensuring confidentiality and integrity against software and hardware attacks. Even the operating system and hypervisor cannot access the enclave because of the hardware-level isolation. Further, hardware attacks can neither disclose plaintext data from the enclave, because its memory is always encrypted, nor modify it, because its integrity is always verified using an integrity tree. When the processor detects any integrity violation, it locks itself to prevent further damage; that is, a system reboot is necessary. The processor lock seems a reasonable solution against such a powerful hardware attacker; however, if a software attacker has a way to trigger an integrity violation, the lock could result in a severe denial-of-service (DoS) attack.
In this paper, we introduce the SGX-Bomb attack, which launches the Rowhammer attack against enclave memory to trigger the processor lockdown. The SGX-Bomb attack is simple yet alarming. Inside an enclave, this attack first finds conflicting row addresses at the same DRAM bank, and then repeatedly accesses them while bypassing the cache. If arbitrary bit flips occur inside the enclave because of the Rowhammer attack, any read attempt to the enclave memory results in an integrity-check failure, so the processor is locked and the system must be rebooted. The SGX-Bomb attack is a serious threat especially to public cloud providers, who are expected to run unknown enclave programs received from their clients; such an attack could shut down their servers shared with other clients. We evaluate the effectiveness of the SGX-Bomb attack in a real environment with DDR4 DRAM; it takes 283 s to hang the entire system with the default DRAM refresh rate of 64 ms.
@inproceedings{jang:sgx-bomb, title = {{SGX-Bomb: Locking Down the Processor via Rowhammer Attack}}, author = {Yeongjin Jang and Jaehyuk Lee and Sangho Lee and Taesoo Kim}, booktitle = {Proceedings of the 2nd Workshop on System Software for Trusted Execution (SysTEX)}, month = oct, year = 2017, address = {Shanghai, China}, }
In recent years, various organizations and communities have been putting numerous computing resources into automated fuzzing, which has proven to be a highly efficient approach to finding security bugs in complicated software and OS kernels. Thus the performance of a fuzzer becomes crucial, and a fuzzer that hits more security issues in less running time saves significant cost. Existing research focuses on producing input data that is likely to explore more states of a targeted application while ignoring the performance overhead that originates from the operating system side. In fact, the system components that generic fuzzers rely on cause serious performance bottlenecks. In particular, when fuzzing on multiple cores, the state-of-the-art fuzzer (AFL) slows down by 24x because it overuses the file system as a communication channel, intensively invokes the fork() system call, and heavily interacts with the file system.
In this paper, we design and implement three new operating primitives specialized for fuzzing that solve these performance bottlenecks and achieve scalable performance on multi-core machines. Our experiment shows that the proposed primitives speed up AFL and LibFuzzer by 6.1 to 28.9x and 1.1 to 735.7x, respectively, in terms of the overall number of executions per second when targeting Google's fuzzer test suite with 120 cores. In addition, the primitives improve AFL's throughput by up to 7.7x with 30 cores, which is a more common setting in data centers. Our fuzzer-agnostic primitives can be easily applied to any fuzzer for fundamental performance improvements and directly benefit large-scale fuzzing and cloud-based fuzzing services.
@inproceedings{xu:os-fuzz, title = {{Designing New Operating Primitives to Improve Fuzzing Performance}}, author = {Wen Xu and Sanidhya Kashyap and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2017, address = {Dallas, TX}, }
As modern attacks become more stealthy and persistent, detecting or preventing them at their early stages becomes virtually impossible. Instead, an attack investigation or provenance system aims to continuously monitor and log interesting system events with minimal overhead. Later, if the system observes any anomalous behavior, it analyzes the log to identify who initiated the attack and which resources were affected by the attack and then assess and recover from any damage incurred. However, because of a fundamental tradeoff between log granularity and system performance, existing systems typically record system-call events without detailed program-level activities (e.g., memory operation) required for accurately reconstructing attack causality or demand that every monitored program be instrumented to provide program-level information.
To address this issue, we propose RAIN, a Refinable Attack INvestigation system based on a record-replay technology that records system-call events during runtime and performs instruction-level dynamic information flow tracking (DIFT) during on-demand process replay. Instead of replaying every process with DIFT, RAIN conducts system-call-level reachability analysis to filter out unrelated processes and to minimize the number of processes to be replayed, making inter-process DIFT feasible. Evaluation results show that RAIN effectively prunes out unrelated processes and determines attack causality with negligible false positive rates. In addition, the runtime overhead of RAIN is similar to existing system-call level provenance systems and its analysis overhead is much smaller than full-system DIFT.
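The pruning step can be illustrated with a small Python sketch over a hypothetical provenance graph (not RAIN's log format): only processes reachable from the suspicious entry point are selected for DIFT replay, instead of replaying every recorded process.

    from collections import deque

    edges = {                                     # "A -> B": A's output reached B
        "browser": ["dropper"],
        "dropper": ["backdoor"],
        "backdoor": ["ssh-client"],
        "cron": ["logrotate"],                    # unrelated activity
    }

    def reachable(start):
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    replay_set = reachable("browser")             # processes worth replaying with DIFT
    assert "backdoor" in replay_set and "cron" not in replay_set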
@inproceedings{ji:rain, title = {{RAIN: Refinable Attack Investigation with On-demand Inter-Process Information Flow Tracking}}, author = {Yang Ji and Sangho Lee and Evan Downing and Weiren Wang and Mattia Fazzini and Taesoo Kim and Alessandro Orso and Wenke Lee}, booktitle = {Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2017, address = {Dallas, TX}, }
With millions of apps available to users, the mobile app market is rapidly becoming very crowded. Given the intense competition, the time to market is a critical factor for the success and profitability of an app. In order to shorten the development cycle, developers often focus their efforts on the unique features and workflows of their apps and rely on third-party Open Source Software (OSS) for the common features. Unfortunately, despite their benefits, careless use of OSS can introduce significant legal and security risks, which if ignored can not only jeopardize security and privacy of end users, but can also cause app developers high financial loss. However, tracking OSS components, their versions, and interdependencies can be very tedious and error-prone, particularly if an OSS is imported with little to no knowledge of its provenance.
We therefore propose OSSPolice, a scalable and fully automated tool for mobile app developers to quickly analyze their apps and identify free software license violations as well as usage of known vulnerable versions of OSS. OSSPolice introduces a novel hierarchical indexing scheme to achieve both high scalability and accuracy, and is capable of efficiently comparing the similarity of app binaries against a database of hundreds of thousands of OSS sources (billions of lines of code). We populated OSSPolice with 60K C/C++ and 77K Java OSS sources and analyzed 1.6M free Google Play Store apps. Our results show that 1) over 40K apps potentially violate GPL/AGPL licensing terms, and 2) over 100K apps use known vulnerable versions of OSS. Further analysis shows that developers violate GPL/AGPL licensing terms due to lack of alternatives, and use vulnerable versions of OSS despite efforts from companies like Google to improve app security. OSSPolice is available on GitHub.
@inproceedings{duan:osspolice, title = {{Checking Open-Source License Violation and 1-day Security Risk at Large Scale}}, author = {Ruian Duan and Ashish Bijlani and Meng Xu and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2017, address = {Dallas, TX}, }
The number of cores in a system is crucial for maximizing the parallelism of a program. Processor manufacturers have increased the number of cores in a chip, such that the latest manycore processors have up to 76 physical cores and 304 hardware threads. On the other hand, the evolution of OS schedulers that manage processes has been slow to catch up with emerging manycore processors.
In this paper, we show how much CFS, the default Linux scheduler, can degrade the performance of parallel applications on manycore processors (e.g., Intel Xeon Phi). Then, we propose a novel scheduler named FLsched, which is designed as a lockless implementation with fewer context switches and more efficient scheduling decisions. In our evaluations on Xeon Phi, FLsched outperforms CFS by up to 1.73x for HPC applications and 3.12x for micro-benchmarks.
@inproceedings{jo:flsched, title = {{FLSCHED: A Lockless and Lightweight Approach to OS Scheduler for Xeon Phi}}, author = {Heeseung Jo and Woonhak Kang and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the 8th Asia-Pacific Workshop on Systems (APSys)}, month = sep, year = 2017, address = {Mumbai, India}, }
User input plays an essential role in computer security because it can control system behavior and inform security decisions in the system. System output to users, or user output, is also important because it often contains security-critical information whose integrity and confidentiality must be protected, such as passwords and users' private data. Despite the importance of user input and output (I/O), modern computer systems often fail to provide the necessary security guarantees for them, which can result in serious security breaches.
This dissertation aims to build trust in the user I/O in computer systems to keep the systems secure from attacks on the user I/O. To this end, we analyze the user I/O paths on popular platforms, including desktop operating systems, mobile operating systems, and trusted execution environments such as Intel SGX, and identify that threats and attacks on the user I/O can be blocked by guaranteeing three key security properties of user I/O: integrity, confidentiality, and authenticity.
First, GYRUS addresses the integrity of user input by matching the user's original input with the content of outgoing network traffic to authorize user-intended network transactions. Second, M-AEGIS addresses the confidentiality of user I/O by implementing an encryption layer on top of the user interface layer that provides user-to-user encryption. Third, the A11Y ATTACK addresses the importance of verifying user I/O authenticity by demonstrating twelve new attacks, all of which stem from missing security checks that verify input sources and output destinations on alternative user I/O paths in operating systems. Finally, to establish trust in the user I/O of a commodity computer system, I built a system called SGX-USB, which combines all three security properties to ensure the assurance of user I/O. SGX-USB establishes a trusted communication channel between the USB controller and an enclave instance of Intel SGX. The implemented system supports common user input devices such as a keyboard and a mouse over the trusted channel, which guarantees the assurance of user input.
Having assurance in user I/O allows the computer system to securely handle commands and data from the user by eliminating attack pathways to a system’s I/O paths.
@phdthesis{jang:thesis, title = {{Building Trust in the User I/O in Computer Systems}}, author = {Yeongjin Jang}, school = {{Georgia Institute of Technology}}, year = 2017, month = aug, }
Foundational software systems such as operating systems and web servers are implemented in unsafe programming languages for efficiency, and system designers often prioritize performance over security. Hence, these systems inherently suffer from a variety of vulnerabilities and insecure designs that have been exploited by adversaries to launch critical system attacks. Two typical goals of these attacks are to leak sensitive data and to control victim systems.
This thesis aims to defeat both data leaks and control attacks. We first identify that, in modern systems, preventing information leaks can be a general defense that not only stops data leaks but also defeats control attacks. We then investigate three ways to prevent information leaks: eliminating information-leak vulnerabilities, re-designing system mechanisms against information leaks, and protecting certain sensitive data from information leaks. We have developed multiple tools for each way. While automatically and reliably securing complex systems, all these tools impose negligible performance overhead. Our extensive evaluation results show that preventing information leaks can be a general and practical approach to securing complex software systems.
@phdthesis{lu:thesis, title = {{Securing Software Systems by Preventing Information Leaks}}, author = {Kangjie Lu}, school = {{Georgia Institute of Technology}}, year = 2017, month = aug, }
Intel has introduced a hardware-based trusted execution environment, Intel Software Guard Extensions (SGX), that provides a secure, isolated execution environment, or enclave, for a user program without trusting any underlying software (e.g., an operating system) or firmware. Researchers have demonstrated that SGX is vulnerable to a page-fault-based attack. However, the attack only reveals page-level memory accesses within an enclave.
In this paper, we explore a new, yet critical, side-channel attack, branch shadowing, that reveals fine-grained control flows (branch granularity) in an enclave. The root cause of this attack is that SGX does not clear branch history when switching from enclave to non-enclave mode, leaving fine-grained traces for the outside world to observe, which gives rise to a branch-prediction side channel. However, exploiting this channel in practice is challenging because 1) measuring branch execution time is too noisy for distinguishing fine-grained control-flow changes and 2) pausing an enclave right after it has executed the code block we target requires sophisticated control. To overcome these challenges, we develop two novel exploitation techniques: 1) a last branch record (LBR)-based history-inferring technique and 2) an advanced programmable interrupt controller (APIC)-based technique to control the execution of an enclave in a fine-grained manner. An evaluation against RSA shows that our attack infers each private key bit with 99.8% accuracy. Finally, we thoroughly study the feasibility of hardware-based solutions (i.e., branch history flushing) and propose a software-based approach that mitigates the attack.
@inproceedings{lee:sgx-branch-shadow, title = {{Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing}}, author = {Sangho Lee and Ming-Wei Shih and Prasun Gera and Taesoo Kim and Hyesoon Kim and Marcus Peinado}, booktitle = {Proceedings of the 26th USENIX Security Symposium (Security)}, month = aug, year = 2017, address = {Vancouver, Canada}, }
Intel Software Guard Extensions (SGX) is a hardware-based Trusted Execution Environment (TEE) that is widely seen as a promising solution to traditional security threats. While SGX promises strong protection to bug-free software, decades of experience show that we have to expect vulnerabilities in any non-trivial application. In a traditional environment, such vulnerabilities often allow attackers to take complete control of vulnerable systems. Efforts to evaluate the security of SGX have focused on side-channels. So far, neither a practical attack against a vulnerability in enclave code nor a proof-of-concept attack scenario has been demonstrated. Thus, a fundamental question remains: What are the consequences and dangers of having a memory corruption vulnerability in enclave code?
To answer this question, we comprehensively analyze exploitation techniques against vulnerabilities inside enclaves. We demonstrate a practical exploitation technique, called Dark-ROP, which can completely disarm the security guarantees of SGX. Dark-ROP exploits a memory corruption vulnerability in the enclave software through return-oriented programming (ROP). However, Dark-ROP differs significantly from traditional ROP attacks because the target code runs under solid hardware protection. We overcome the problem of exploiting SGX-specific properties and obstacles by formulating a novel ROP attack scheme against SGX under practical assumptions. Specifically, we build several oracles that inform the attacker about the status of enclave execution. This enables the attacker to launch the ROP attack while both code and data are hidden. In addition, we exfiltrate the enclave's code and data into a shadow application to fully control the execution environment. This shadow application emulates the enclave under the complete control of the attacker, using the enclave (through ROP calls) only to perform SGX operations such as reading the enclave's SGX crypto keys.
The consequences of Dark-ROP are alarming; the attacker can completely breach the enclave’s memory protections and trick the SGX hardware into disclosing the enclave’s encryption keys and producing measurement reports that defeat remote attestation. This result strongly suggests that SGX research should focus more on traditional security mitigations rather than on making enclave development more convenient by expanding the trusted computing base and the attack surface (e.g., Graphene, Haven).
@inproceedings{lee:darkrop, title = {{Hacking in Darkness: Return-oriented Programming against Secure Enclaves}}, author = {Jaehyuk Lee and Jinsoo Jang and Yeongjin Jang and Nohyun Kwak and Yeseul Choi and Changho Choi and Taesoo Kim and Marcus Peinado and Brent B. Kang}, booktitle = {Proceedings of the 26th USENIX Security Symposium (Security)}, month = aug, year = 2017, address = {Vancouver, Canada}, }
Control-Flow Integrity (CFI), as a means to prevent control-flow hijacking attacks, enforces that each instruction transfers control to an address in a set of valid targets. The security guarantee of CFI thus depends on the definition of valid targets, which conventionally are defined as the result of a static analysis. Unfortunately, previous research has demonstrated that such a definition, and thus any implementation that enforces it, still allows practical control-flow attacks.
In this work, we present a path-sensitive variation of CFI that utilizes runtime path-sensitive points-to analysis to compute the legitimate control transfer targets. We have designed and implemented a runtime environment, PITTYPAT, that enforces path-sensitive CFI efficiently by combining commodity, low-overhead hardware monitoring and a novel runtime points-to analysis. Our formal analysis and empirical evaluation demonstrate that, compared to CFI based on static analysis, PITTYPAT ensures that applications satisfy stronger security guarantees, with acceptable overhead for security-critical contexts.
@inproceedings{ding:pittypat, title = {{Efficient Protection of Path-Sensitive Control Security}}, author = {Ren Ding and Chenxiong Qian and Chengyu Song and Bill Harris and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 26th USENIX Security Symposium (Security)}, month = aug, year = 2017, address = {Vancouver, Canada}, }
Due to the continued exploitation of Adobe Reader, malicious document (maldoc) detection has become a pressing problem. Although many solutions have been proposed, recent works have highlighted some common drawbacks, such as parser-confusion and classifier-evasion attacks.
In response to this, we propose a new perspective for maldoc detection: platform diversity. In particular, we identify eight factors in OS design and implementation that could cause behavioral divergences under attack, ranging from syscall semantics (more obvious) to heap object metadata structure (more subtle) and further show how they can thwart attackers from finding bugs, exploiting bugs, or performing malicious activities.
We further prototype PlatPal to systematically harvest platform diversity. PlatPal hooks into Adobe Reader to trace internal PDF processing and also uses sandboxed execution to capture a maldoc's impact on the host system. Execution traces on different platforms are compared, and maldoc detection is based on the observation that a benign document behaves the same across platforms, while a maldoc behaves differently during exploitation. Evaluations show that PlatPal raises no false alarms in benign samples, detects a variety of behavioral discrepancies in malicious samples, and is a scalable and practical solution.
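The comparison logic reduces to a simple idea, sketched below in Python with hypothetical traces (not PlatPal's tracer output): identical cross-platform behavior suggests a benign document, while any divergence flags it as suspicious.

    def classify(trace_windows: list[str], trace_macos: list[str]) -> str:
        return "suspicious" if trace_windows != trace_macos else "likely benign"

    benign = ["parse", "render_page(1)", "close"]
    maldoc_win = ["parse", "heap_spray", "spawn(cmd.exe)"]
    maldoc_mac = ["parse", "crash"]               # exploit misfires on the other OS

    assert classify(benign, list(benign)) == "likely benign"
    assert classify(maldoc_win, maldoc_mac) == "suspicious"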
@inproceedings{xu:platpal, title = {{PlatPal: Detecting Malicious Documents with Platform Diversity}}, author = {Meng Xu and Taesoo Kim}, booktitle = {Proceedings of the 26th USENIX Security Symposium (Security)}, month = aug, year = 2017, address = {Vancouver, Canada}, }
AVPASS is a tool for leaking the detection model of Android antivirus (AV) programs and bypassing AV detection by using the leaked information coupled with APK perturbation techniques. AVPASS is able to infer not only the detection features, but also the hierarchy of detection rule chains. With help from the leaked model and the built-in APK perturbation functions, AVPASS is able to disguise any Android malware as a benign application. Furthermore, using our novel additive mode, AVPASS supports safe querying and guarantees that one can test whether an application will be detected by the AV without sending the whole application or its core parts. As a result, AVPASS leaked significant detection features of commercial AVs and achieved almost zero detection from VirusTotal when tested with more than 5,000 malware samples.
In this talk, we present the entire pipeline of the APK perturbation process, the model-leaking process, and the auto-bypassing process. In addition, we show findings about commercial AVs, including their detection features and hierarchy, and inform the attendees about the potential weaknesses of modern AVs.
AVPASS will be demonstrated, showing that it modifies real-world malware precisely and allows it to bypass all AVs following the leaked model. AVPASS will be released with every tool that we have built, including the original source code and the related test data, to enable researchers to replicate the research on their own.
@inproceedings{jung:avpass, title = {{AVPASS: Leaking and Bypassing Antivirus Detection Model Automatically}}, author = {Jinho Jung and Chanil Jeon and Max Wolotsky and Insu Yun and Taesoo Kim}, booktitle = {Black Hat USA Briefings (Black Hat USA)}, month = jul, year = 2017, address = {Las Vegas, NV}, }
Application scalability is a critical aspect to efficiently use NUMA machines with many cores. To achieve that, various techniques ranging from task placement to data sharding are used in practice. However, from the perspective of an operating system, these techniques often do not work as expected because various subsystems in the OS interact and share data structures among themselves, resulting in scalability bottlenecks. Although current OSes attempt to tackle this problem by introducing a wide range of synchronization primitives such as spinlock and mutex, the widely used synchronization mechanisms are not designed to handle both under- and over-subscribed scenarios in a scalable fashion. In particular, the current blocking synchronization primitives that are designed to address both scenarios are NUMA oblivious, meaning that they suffer from cache-line contention in an under-subscribed situation, and even worse, inherently spur long scheduler intervention, which leads to sub-optimal performance in an over-subscribed situation.
In this work, we present several design choices to implement scalable blocking synchronization primitives that can address both under- and over-subscribed scenarios. Such design decisions include memory-efficient NUMA-aware locks (favorable for deployment) and scheduling-aware, scalable parking and wake-up strategies. To validate our design choices, we implement two new blocking synchronization primitives, which are variants of mutex and read-write semaphore in the Linux kernel. Our evaluation shows that these locks can scale real-world applications by 1.2-1.6X and some file system operations by up to 4.7X in both under- and over-subscribed scenarios. Moreover, they use 1.5-10X less memory than the state-of-the-art NUMA-aware locks on a 120-core machine.
@inproceedings{kashyap:cst, title = {{Scalable NUMA-aware Blocking Synchronization Primitives}}, author = {Sanidhya Kashyap and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the 2017 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2017, address = {Santa Clara, CA}, }
Discovering the security vulnerabilities of commercial off-the-shelf (COTS) operating systems (OSes) is challenging because they not only are huge and complex, but also lack detailed debug information. Concolic testing, which generates all feasible inputs of a program by using symbolic execution and tests the program with the generated inputs, is one of the most promising approaches to solve this problem. Unfortunately, the state-of-the-art concolic testing tools do not scale well for testing COTS OSes because of state explosion. Indeed, they often fail to find a single bug (or crash) in COTS OSes despite their long execution time.
In this paper, we propose CAB-Fuzz (Context-Aware and Boundary-focused), a practical concolic testing tool to quickly explore interesting paths that are highly likely to trigger real bugs, without debug information. First, CAB-Fuzz prioritizes the boundary states of arrays and loops, inspired by the fact that many vulnerabilities originate from a lack of proper boundary checks. Second, CAB-Fuzz exploits real programs interacting with COTS OSes to construct proper contexts to explore deep and complex kernel states without debug information. We applied CAB-Fuzz to Windows 7 and Windows Server 2008 and found 21 undisclosed unique crashes, including two local privilege escalation vulnerabilities (CVE-2015-6098 and CVE-2016-0040) and one information disclosure vulnerability in a cryptography driver (CVE-2016-7219). CAB-Fuzz found vulnerabilities that are non-trivial to discover; five vulnerabilities have existed for 14 years, and we could trigger them even in the initial version of Windows XP (August 2001).
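The boundary-focused prioritization can be illustrated with a small Python sketch (hypothetical scoring, not CAB-Fuzz's scheduler): candidate values closest to array or loop boundaries are explored first.

    def boundary_score(value: int, array_len: int, loop_bound: int) -> int:
        # Distance to the nearest interesting boundary; 0 means exactly on one.
        boundaries = {0, array_len - 1, array_len, loop_bound - 1, loop_bound}
        return min(abs(value - b) for b in boundaries)

    candidates = [3, 7, 8, 100, 15, 16]
    candidates.sort(key=lambda v: boundary_score(v, array_len=8, loop_bound=16))
    print(candidates)        # boundary values (7, 8, 15, 16) come first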
@inproceedings{kim:cab-fuzz, title = {{CAB-Fuzz: Practical Concolic Testing Techniques for COTS Operating Systems}}, author = {Su Yong Kim and Sangho Lee and Insu Yun and Wen Xu and Byoungyoung Lee and Youngtae Yun and Taesoo Kim}, booktitle = {Proceedings of the 2017 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2017, address = {Santa Clara, CA}, }
A number of security mechanisms have been proposed to harden programs written in unsafe languages, each of which mitigates a specific type of memory error. Intuitively, enforcing multiple security mechanisms on a target program will improve its overall security. However, this is not yet a viable approach in practice because the execution slowdown caused by various security mechanisms is often non-linearly accumulated, making the combined protection prohibitively expensive; further, most security mechanisms are designed for independent or isolated uses and thus are often in conflict with each other, making it impossible to fuse them in a straightforward way.
In this paper, we present Bunshin, an N-version based system that enables different and even conflicting security mechanisms to be combined to secure a program while at the same time reducing the execution slowdown. In particular, we propose an automated mechanism to distribute runtime security checks in multiple program variants in such a way that conflicts between security checks are inherently eliminated and execution slowdown is minimized with parallel execution. We also present an N-version execution engine to seamlessly synchronize these variants so that all distributed security checks work together to guarantee the security of a target program.
@inproceedings{xu:bunshin, title = {{Bunshin: Compositing Security Mechanisms through Diversification}}, author = {Meng Xu and Kangjie Lu and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 2017 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2017, address = {Santa Clara, CA}, }
Processing a one-trillion-edge graph has recently been demonstrated by distributed graph engines running on clusters of tens to hundreds of nodes. In this paper, we employ a single heterogeneous machine with fast storage media (e.g., NVMe SSD) and massively parallel coprocessors (e.g., Xeon Phi) to reach similar dimensions. By fully exploiting the heterogeneous devices, we design a new graph processing engine, named MOSAIC, for a single machine. We propose a new locality-optimizing, space-efficient graph representation, Hilbert-ordered tiles, and a hybrid execution model that enables vertex-centric operations on fast host processors and edge-centric operations on massively parallel coprocessors. Our evaluation shows that for smaller graphs, MOSAIC consistently outperforms other state-of-the-art out-of-core engines by 3.2–58.6x and shows comparable performance to distributed graph engines. Furthermore, MOSAIC can complete one iteration of the Pagerank algorithm on a trillion-edge graph in 21 minutes, outperforming a distributed disk-based engine by 9.2x.
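To illustrate the Hilbert-ordered layout, here is a standalone Python sketch (not Mosaic's code) that maps tile coordinates to their Hilbert index and sorts tiles accordingly, so tiles adjacent on the curve are also near each other in the 2D adjacency space, which improves locality when they are traversed in that order.

    def hilbert_index(n: int, x: int, y: int) -> int:
        """Position of tile (x, y) on an n x n grid (n a power of two) along
        the Hilbert curve (standard iterative bit-interleaving algorithm)."""
        d, s = 0, n // 2
        while s > 0:
            rx = 1 if x & s else 0
            ry = 1 if y & s else 0
            d += s * s * ((3 * rx) ^ ry)
            if ry == 0:                          # rotate/flip the quadrant
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s //= 2
        return d

    tiles = [(3, 0), (0, 0), (1, 1), (0, 1)]     # (column, row) tile coordinates
    tiles.sort(key=lambda t: hilbert_index(4, *t))
    print(tiles)                                  # locality-friendly processing order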
@inproceedings{maass:mosaic, title = {{Mosaic: Processing a Trillion-Edge Graph on a Single Machine}}, author = {Steffen Maass and Changwoo Min and Sanidhya Kashyap and Woonhak Kang and Mohan Kumar and Taesoo Kim}, booktitle = {Proceedings of the 12th European Conference on Computer Systems (EuroSys)}, month = apr, year = 2017, address = {Belgrade, RS}, }
With Tor being a popular anonymity network, many attacks have been proposed to break its anonymity or leak information of a private communication on Tor. However, guaranteeing complete privacy in the face of an adversary on Tor is especially difficult because Tor relays are under complete control of world-wide volunteers. Currently, one can gain private information, such as circuit identifiers and hidden service identifiers, by running Tor relays and can even modify their behaviors with malicious intent.
This paper presents a practical approach to effectively enhancing the security and privacy of Tor by utilizing Intel SGX, a commodity trusted execution environment. We present a design and implementation of Tor, called SGX-Tor, that prevents code modification and limits the information exposed to untrusted parties. We demonstrate that our approach is practical and effectively reduces the power of an adversary to a traditional network-level adversary. Finally, SGX-Tor incurs moderate performance overhead; the end-to-end latency and throughput overheads for HTTP connections are 3.9% and 11.9%, respectively.
@inproceedings{kim:sgx-tor, title = {{Enhancing Security and Privacy of Tor's Ecosystem by using Trusted Execution Environments}}, author = {Seongmin Kim and Juhyeng Han and Jaehyeong Ha and Taesoo Kim and Dongsu Han}, booktitle = {Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI)}, month = mar, year = 2017, address = {Boston, MA}, }
Traditional execution environments deploy Address Space Layout Randomization (ASLR) to defend against memory corruption attacks. However, Intel Software Guard Extensions (SGX), a new trusted execution environment designed to serve security-critical applications on the cloud, lacks such an effective, well-studied feature. In fact, we find that applying ASLR to SGX programs raises non-trivial issues beyond simple engineering and poses fundamental challenges for a number of reasons: 1) SGX is designed to defeat a stronger adversary than the traditional model, which requires the address space layout to be hidden from the kernel; 2) the limited memory used in SGX programs presents a new challenge in providing a sufficient degree of entropy; 3) remote attestation conflicts with the dynamic relocation required for ASLR; and 4) the SGX specification relies on known and fixed addresses for key data structures that cannot be randomized.
This paper presents SGX-Shield, a new ASLR scheme designed for SGX environments. SGX-Shield is built on a secure in-enclave loader to secretly bootstrap the memory space layout with a finer-grained randomization. To be compatible with SGX hardware (e.g., remote attestation, fixed addresses), SGX-Shield is designed with a software-based data execution protection mechanism through an LLVM-based compiler. We implement SGX-Shield and thoroughly evaluate it on real SGX hardware. It shows a high degree of randomness in memory layouts and stops memory corruption attacks with a high probability. SGX-Shield shows 7.61% performance overhead in running common micro-benchmarks and 2.25% overhead in running a more realistic workload of an HTTPS server.
@inproceedings{seo:sgx-shield, title = {{SGX-Shield: Enabling Address Space Layout Randomization for SGX Programs}}, author = {Jaebaek Seo and Byoungyoung Lee and Sungmin Kim and Ming-Wei Shih and Insik Shin and Dongsu Han and Taesoo Kim}, booktitle = {Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2017, address = {San Diego, CA}, }
Intel Software Guard Extensions (SGX) is a hardware-based trusted execution environment (TEE) that enables secure execution of a program in an isolated environment, an enclave. SGX hardware protects the running enclave against malicious software, including an operating system (OS), a hypervisor, and even low-level firmware. This strong security property allows the trustworthy execution of programs in a hostile environment, such as a public cloud, without trusting anyone (e.g., a cloud provider) between the enclave and the SGX hardware. However, recent studies have demonstrated that enclave programs are vulnerable to an accurate controlled-channel attack: since enclaves rely on the underlying OS, a curious or potentially malicious OS can observe a sequence of accessed addresses by intentionally triggering page faults.
In this paper, we propose T-SGX, a complete mitigation solution to the controlled-channel attack in terms of compatibility, performance, and ease of use. T-SGX relies on a commodity component of the Intel processor (since Haswell), Transactional Synchronization Extensions (TSX), which implements a restricted form of hardware transactional memory. Because TSX is implemented as an extension (i.e., by snooping the cache protocol), any unusual event, such as an exception or interrupt, that must be handled in its core component results in an abort of the ongoing transaction. One interesting property is that a TSX abort suppresses the notification of errors to the underlying OS, which means that the OS cannot know whether a page fault has occurred during the transaction. By utilizing this property, T-SGX can carefully isolate the effects of attempts to tap running enclaves, thereby completely eradicating the known controlled-channel attack.
We have implemented T-SGX as a compiler-level scheme that automatically transforms a normal enclave program into a secured one. We not only evaluate the security properties of T-SGX, but also demonstrate that it applies to all the previously demonstrated attack targets including libjpeg, Hunspell, and FreeType. In addition, we evaluate the performance of T-SGX by porting ten benchmark programs of nbench to the SGX environment. The results are promising; that is, T-SGX incurs on average 50% runtime overhead, which is an order of magnitude faster than state-of-the-art mitigation schemes.
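As a rough illustration of the TSX property described above, the following user-level C sketch wraps a code region in a hardware transaction; an exception raised inside the region aborts the transaction instead of being delivered to the OS. This is our own minimal example (built with gcc -mrtm on a TSX-capable CPU), not the code the T-SGX compiler emits.

/* Minimal sketch of the TSX behavior T-SGX builds on; illustrative only. */
#include <immintrin.h>
#include <stdio.h>

static int run_in_transaction(void (*region)(void))
{
    if (_xbegin() == _XBEGIN_STARTED) {
        region();   /* protected code; a page fault here aborts the transaction */
        _xend();
        return 0;   /* committed: no exception was reported to the OS */
    }
    return -1;      /* aborted: the caller may retry or fall back */
}

static void sensitive_region(void) { /* enclave-like computation */ }

int main(void)
{
    printf("transaction %s\n",
           run_in_transaction(sensitive_region) == 0 ? "committed" : "aborted");
    return 0;
}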
@inproceedings{shih:tsgx, title = {{T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs}}, author = {Ming-Wei Shih and Sangho Lee and Taesoo Kim and Marcus Peinado}, booktitle = {Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2017, address = {San Diego, CA}, }
2016
Kernel hardening has been an important topic since many applications and security mechanisms often consider the kernel as part of their Trusted Computing Base (TCB). Among various hardening techniques, Kernel Address Space Layout Randomization (KASLR) is the most effective and widely adopted defense mechanism that can practically mitigate various memory corruption vulnerabilities, such as buffer overflow and use-after-free. In principle, KASLR is secure as long as no memory leak vulnerability exists and high entropy is ensured.
In this paper, we introduce a highly stable timing attack against KASLR, called DrK, that can precisely de-randomize the memory layout of the kernel without violating any such assumptions. DrK exploits a hardware feature called Intel Transactional Synchronization Extension (TSX) that is readily available in most modern commodity CPUs. One surprising behavior of TSX, which is essentially the root cause of this security loophole, is that it aborts a transaction without notifying the underlying kernel even when the transaction fails due to a critical error, such as a page fault or an access violation, which traditionally requires kernel intervention. DrK turned this property into a precise timing channel that can determine the mapping status (i.e., mapped versus unmapped) and execution status (i.e., executable versus non-executable) of the privileged kernel address space. In addition to its surprising accuracy and precision, DrK is universally applicable to all OSes, even in virtualized environments, and generates no visible footprint, making it difficult to detect in practice. We demonstrated that DrK can break the KASLR of all major OSes (i.e., Windows, Linux, and OS X) with near-perfect accuracy in under a second. Finally, we propose potential countermeasures that can effectively prevent or mitigate the DrK attack.
We urge our community to be aware of the potential threat of having Intel TSX, which is present in most recent Intel CPUs -- 100% in workstation and 60% in high-end Intel CPUs since Skylake -- and is even available on Amazon EC2 (X1).
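The probing primitive described above can be sketched in a few lines of C, assuming an x86-64 CPU with TSX/RTM (build with gcc -mrtm). The kernel address below is a placeholder, and this is a simplified illustration rather than the DrK implementation:

/* Time how long a transaction takes to abort when touching a kernel address. */
#include <immintrin.h>
#include <x86intrin.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t probe_abort_latency(volatile const char *addr)
{
    uint64_t start = __rdtsc();
    if (_xbegin() == _XBEGIN_STARTED) {
        (void)*addr;   /* faults, but the page fault is never delivered to the OS */
        _xend();       /* unreachable for a privileged address */
    }
    return __rdtsc() - start;   /* mapped and unmapped pages abort at
                                   measurably different latencies */
}

int main(void)
{
    volatile const char *kaddr = (volatile const char *)0xffffffff81000000ULL; /* placeholder */
    printf("abort latency: %llu cycles\n",
           (unsigned long long)probe_abort_latency(kaddr));
    return 0;
}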
@inproceedings{jang:drk-ccs, title = {{Breaking Kernel Address Space Layout Randomization with Intel TSX}}, author = {Yeongjin Jang and Sangho Lee and Taesoo Kim}, booktitle = {Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2016, address = {Vienna, Austria}, }
The operating system kernel is the de facto trusted computing base for most computer systems. To secure the OS kernel, many security mechanisms, e.g., kASLR and StackGuard, have been increasingly deployed to defend against attacks (e.g., code reuse attacks). However, the effectiveness of these protections has been proven to be inadequate---there are many information leak vulnerabilities in the kernel that leak the randomized pointer or canary, thus bypassing kASLR and StackGuard. Other sensitive data in the kernel, such as cryptographic keys and file caches, can also be leaked. According to our study, most kernel information leaks are caused by uninitialized data reads. Unfortunately, existing techniques like memory safety enforcements and dynamic access tracking tools are not adequate or efficient enough to mitigate this threat.
In this paper, we propose UniSan, a novel, compiler-based approach to eliminate all information leaks caused by uninitialized reads in the OS kernel. UniSan achieves this goal using byte-level, flow-sensitive, context-sensitive, and field-sensitive initialization analysis and reachability analysis to check whether an allocation has been fully initialized when it leaves kernel space; if not, it automatically instruments the kernel to initialize this allocation. UniSan's analyses are conservative to avoid false negatives and are robust by preserving the semantics of the OS kernel. We have implemented UniSan as passes in LLVM and applied it to the latest Linux kernel (x86_64) and Android kernel (AArch64). Our evaluation showed that UniSan can successfully prevent 43 known and many new uninitialized data leak vulnerabilities. Further, 19 new vulnerabilities in the latest kernels have been confirmed by Linux and Google. Our extensive performance evaluation with LMBench, ApacheBench, Android benchmarks, and the SPEC benchmarks also showed that UniSan imposes a negligible performance overhead.
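The bug class UniSan targets can be illustrated with a hypothetical, kernel-style snippet; the struct, field names, and the copy_to_user stand-in below are ours, not from the paper:

/* An allocation leaves the kernel with bytes that were never initialized. */
#include <string.h>
#include <stdint.h>

struct dev_info {
    uint32_t id;
    uint32_t flags;    /* never written below */
    char     name[16]; /* only partially written below */
};

/* user-space stand-in for the kernel's copy_to_user() */
static long copy_to_user(void *dst, const void *src, unsigned long n)
{
    memcpy(dst, src, n);
    return 0;
}

long get_dev_info(void *user_buf)
{
    struct dev_info info;        /* stack allocation, not zeroed */
    info.id = 42;
    strcpy(info.name, "eth0");
    /* info.flags and the trailing bytes of name[] still hold stale stack
     * data, so copying the whole struct leaks them to user space. UniSan's
     * instrumentation direction: zero-initialize the allocation when the
     * analysis cannot prove every byte is written before it leaves the kernel. */
    return copy_to_user(user_buf, &info, sizeof(info));
}

int main(void)
{
    char buf[sizeof(struct dev_info)];
    return (int)get_dev_info(buf);
}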
@inproceedings{lu:unisan, title = {{UniSan: Proactive Kernel Memory Initialization to Eliminate Data Leakages}}, author = {Kangjie Lu and Chengyu Song and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2016, address = {Vienna, Austria}, }
This paper argues for the utility of back-end driven onloading to the edge as a way to address bandwidth use and latency challenges for future device-cloud interactions. Supporting such edge functions (EFs) requires solutions that can provide (i) fast and scalable EF provisioning and (ii) strong guarantees for the integrity of the EF execution and confidentiality of the state stored at the edge. In response to these goals, we (i) present a detailed design space exploration of the current technologies that can be leveraged in the design of edge function platforms (EFPs); (ii) develop a solution to address security concerns of EFs that leverages emerging hardware support for OS agnostic trusted execution environments such as Intel SGX enclaves; and (iii) propose and evaluate AirBox, a platform for fast, scalable and secure onloading of edge functions.
@inproceedings{bhardwaj:airbox, title = {{Fast, Scalable and Secure Onloading of Edge Functions using AirBox}}, author = {Ketan Bhardwaj and Ming-Wei Shih and Pragya Agarwal and Ada Gavrilovska and Taesoo Kim and Karsten Schwan}, booktitle = {Proceedings of the 1st IEEE/ACM Symposium on Edge Computing (SEC 2016)}, month = oct, year = 2016, }
There have been tremendous efforts to build fully secure computer systems, but it is not an easy goal. Making a simple mistake introduces a vulnerability, which can critically endanger a whole system's security.
This thesis aims at protecting computer systems from vulnerabilities. We take two complementary approaches in achieving this goal: eliminating or analyzing vulnerabilities. In the vulnerability elimination approach, we eliminate a certain class of memory corruption vulnerabilities to completely close attack vectors from such vulnerabilities. In particular, we develop tools DangNull and CaVer, each of which eliminates popular and emerging vulnerabilities, use-after-free and bad-casting, respectively. DangNull relies on the key observation that the root cause of use-after-free is that pointers are not nullified after the target object is freed. Thus, DangNull instruments a program to trace the object's relationships via pointers and automatically nullifies all pointers when the target object is freed. Similarly, CaVer relies on the key observation that the root cause of bad-casting is that casting operations are not properly verified. Thus, CaVer uses a new runtime type tracing mechanism to overcome the limitation of existing approaches, and performs efficient verification on all type casting operations dynamically. We have implemented these protection solutions and successfully applied them to Chrome and Firefox browsers. Our evaluation showed that DangNull and CaVer impose 29% and 7.6% benchmark overheads in Chrome, respectively. We have also tested seven use-after-free and five bad-casting exploits in Chrome, and DangNull and CaVer safely prevented them all.
In the vulnerability analysis approach, we focus on a timing-channel vulnerability which allows an attacker to learn information about a program's sensitive data without causing the program to perform unsafe operations. It is challenging to test and further confirm a timing-channel vulnerability as it typically involves complex algorithmic operations. We implemented SideFinder, an assistant tool for identifying timing-channel vulnerabilities in a hash table. Empowered with symbolic execution techniques, SideFinder semi-automatically synthesizes inputs that attack timing channels, and thus confirms the vulnerability. Using SideFinder, we analyzed and further synthesized two real-world attacks in the Linux kernel, and showed that they can break one important security mechanism, Address Space Layout Randomization (ASLR).
@phdthesis{lee:thesis, title = {{Protecting Computer Systems through Eliminating or Analyzing Vulnerabilities}}, author = {Byoungyoung Lee}, school = {{Georgia Institute of Technology}}, year = 2016, month = aug, }
The most common cyber-attack vector is the exploitation of software vulnerabilities. Despite much effort toward building secure software, software systems of even modest complexity still routinely have serious vulnerabilities. More alarmingly, even the trusted computing base (e.g., the OS kernel) still contains vulnerabilities that would allow attackers to subvert security mechanisms such as the application sandbox on smartphones. Among all vulnerabilities, memory corruption is one of the most ancient, prevalent, and devastating vulnerabilities. This thesis proposed three projects on mitigating this threat. There are three popular ways to exploit a memory corruption vulnerability---attacking the code (a.k.a. code injection attack), the control data (a.k.a. control-flow hijacking attack), and the non-control data (a.k.a. data-oriented attack). Theoretically, code injection attacks can be prevented with the executable XOR writable policy; but in practice, this policy is undermined by another important technique---dynamic code generation (e.g., JIT engines). In the first project, we first showed that this conflict is actually non-trivial to resolve, then we introduced a new design paradigm to fundamentally solve this problem, by relocating the dynamic code generator to a separate process. In the second project, we focused on preventing data-oriented attacks against the operating system kernel. Using privilege escalation attacks as an example, we (1) demonstrated that data-oriented attacks are realistic threats and hard to prevent; (2) discussed two important challenges for preventing such attacks (i.e., completeness and performance); and (3) presented a system that combines program analysis techniques and system designs to solve these challenges. During these two projects, we found that the lack of sufficient hardware support imposes many unnecessary difficulties in building robust and efficient defense mechanisms. In the third project, we proposed HDFI (hardware-assisted data-flow isolation) to overcome this limitation. HDFI is a new fine-grained isolation mechanism that enforces isolation at the machine word granularity, by virtually extending each memory unit with an additional tag that is defined by data-flow. This capability allows HDFI to enforce a variety of security models such as the Biba Integrity Model and the Bell--LaPadula Model. For demonstration, we developed and ported several security mechanisms to leverage HDFI, including stack protection, standard library enhancement, virtual function table protection, code pointer protection, kernel data protection, and information leak prevention. The evaluation results showed that HDFI is easy to use, imposes low performance overhead, and allows us to create simpler and more secure solutions.
@phdthesis{song:thesis, title = {{Preventing Exploits against Memory Corruption Vulnerabilities}}, author = {Chengyu Song}, school = {{Georgia Institute of Technology}}, year = 2016, month = aug, }
Memory corruption attacks may lead to complete takeover of systems. There are numerous works offering protection mechanisms for this important problem. But the security guarantees that are offered by most works are only heuristic and, furthermore, most solutions are designed for protecting the local memory. In this paper we initiate the study of provably secure remote memory attestation; we concentrate on provably detecting heap-based overflow attacks and consider the setting where we aim to protect the memory in a remote system. We present two protocols offering various efficiency and security trade-offs (but all solutions are efficient enough for practical use as our implementation shows) that detect the presence of injected malicious code or data in remotely-stored heap memory. While our solutions offer protection only against a specific class of attacks, our novel formalization of threat models is general enough to cover a wide range of attacks and settings.
@inproceedings{boldyreva:notion, title = {{Provably-Secure Remote Memory Attestation for Heap Overflow Protection}}, author = {Alexandra Boldyreva and Taesoo Kim and Richard J. Lipton and Bogdan Warinschi}, booktitle = {Proceedings of the 10th Conference on Security and Cryptography for Networks (SCN 2016)}, month = aug, year = 2016, }
The openness and extensibility of Android have made it a popular platform for mobile devices and a strong candidate to drive the Internet-of-Things. Unfortunately, these properties also leave Android vulnerable, attracting attacks for profit or fun. To mitigate these threats, numerous issue-specific solutions have been proposed. With the increasing number and complexity of security problems and solutions, we believe this is the right moment to step back and systematically re-evaluate the Android security architecture and security practices in the ecosystem. We organize the most recent security research on the Android platform into two categories: the software stack and the ecosystem. For each category, we provide a comprehensive narrative of the problem space; highlight the limitations of the proposed solutions; and identify open problems for future research. Based on our collection of knowledge, we envision a blueprint for engineering a secure, next-generation Android ecosystem.
@article{xu:android-survey, title = {{Toward Engineering a Secure Android Ecosystem: A Survey of Existing Techniques}}, author = {Meng Xu and Chengyu Song and Yang Ji and Ming-Wei Shih and Kangjie Lu and Cong Zheng and Ruian Duan and Yeongjin Jang and Byoungyoung Lee and Chenxiong Qian and Sangho Lee and Taesoo Kim}, journal = {ACM Computing Surveys (CSUR)}, volume = 49, number = 2, month = aug, year = 2016, }
API misuse is a well-known source of bugs. Some of them (e.g., incorrect use of SSL APIs and integer overflow of memory allocation sizes) can cause serious security vulnerabilities (e.g., man-in-the-middle (MITM) attacks and privilege escalation). Moreover, modern APIs, which are large, complex, and fast evolving, are error-prone. However, existing techniques to help find bugs require manual effort by developers (e.g., providing specifications or models) or are not scalable to large real-world software comprising millions of lines of code.
In this paper, we present APISan, a tool that automatically infers correct API usages from source code without manual effort. The key idea in APISan is to extract likely correct usage patterns in four different aspects (e.g., causal relations and semantic relations on arguments) by considering semantic constraints. APISan is tailored to check various properties with security implications. We applied APISan to 92 million lines of code, including the Linux kernel and OpenSSL, found 76 previously unknown bugs, and provided patches for all the bugs.
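One concrete shape such an inferred usage pattern can take is shown below: if most call sites guard an allocation size against integer overflow, the call site that does not is a likely bug. The code is purely illustrative and not taken from the paper:

/* Majority usage vs. deviant usage of a size computation before malloc(). */
#include <stdlib.h>
#include <stdint.h>

void *alloc_array_checked(size_t n, size_t elem)
{
    if (elem != 0 && n > SIZE_MAX / elem)   /* the common pattern: overflow check */
        return NULL;
    return malloc(n * elem);
}

void *alloc_array_buggy(size_t n, size_t elem)
{
    return malloc(n * elem);   /* deviant pattern: n * elem may wrap around,
                                  yielding an undersized buffer */
}

int main(void)
{
    void *a = alloc_array_checked(16, sizeof(int));
    void *b = alloc_array_buggy(16, sizeof(int));
    free(a);
    free(b);
    return 0;
}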
@inproceedings{yun:apisan, title = {{APISan: Sanitizing API Usages through Semantic Cross-checking}}, author = {Insu Yun and Changwoo Min and Xujie Si and Yeongjin Jang and Taesoo Kim and Mayur Naik}, booktitle = {Proceedings of the 25th USENIX Security Symposium (Security)}, month = aug, year = 2016, address = {Austin, TX}, }
Kernel hardening has been an important topic, as many applications and security mechanisms often consider the kernel as part of their Trusted Computing Base (TCB). Among various hardening techniques, Kernel Address Space Layout Randomization (KASLR) is the most effective and widely adopted defense mechanism that can practically mitigate various memory corruption vulnerabilities, such as buffer overflow and use-after-free. In principle, KASLR is secure as long as no memory leak vulnerability exists and high entropy is ensured.
In this paper, we introduce a novel timing attack against KASLR, called DrK, that can precisely de-randomize the memory layout of the kernel without violating any such assumptions. DrK exploits a hardware feature called Intel Transactional Synchronization Extension (TSX) that is readily available in most modern commodity CPUs. One surprising behavior of TSX, which is essentially the root cause of this security loophole, is that it aborts a transaction without notifying the underlying kernel even when the transaction fails due to a critical error, such as a page fault or an access violation, which traditionally requires kernel intervention. DrK turned this property into a precise timing channel that can determine the mapping status (i.e., mapped versus unmapped) and execution status (i.e., executable versus non-executable) of the privileged kernel address space. In addition to its surprising accuracy and precision, DrK is universally applicable to all OSes, even in a virtualized environment, and generates no visible footprint, making it difficult to detect in practice.
We demonstrated that DrK could break KASLR of all major OSes (i.e., Windows, Linux, and OS X) with near-perfect accuracy in under a second. Finally, we propose a few potential countermeasures that can effectively prevent or mitigate the DrK attack. We urge our community to be aware of the potential threat of having Intel TSX, which is present in most recent Intel CPUs -- 100% in workstation and 60% in high-end Intel CPUs since Skylake -- and is even available on Amazon EC2 (X1).
@inproceedings{jang:drk-bh, title = {{Breaking Kernel Address Space Layout Randomization with Intel TSX}}, author = {Yeongjin Jang and Sangho Lee and Taesoo Kim}, booktitle = {Black Hat USA Briefings (Black Hat USA)}, month = aug, year = 2016, address = {Las Vegas, NV}, }
In recent years, operating systems have become increasingly complex and thus prone to security and performance issues. Accordingly, system updates to address these issues have become more frequently available and increasingly important. To complete such updates, users must reboot their systems, resulting in unavoidable downtime and further loss of the states of running applications.
We present KUP, a practical OS update mechanism that employs a userspace checkpoint-and-restart mechanism, using an optimized data structure for checkpointing and a memory persistence mechanism across updates, combined with a fast in-place kernel switch. This allows for instant kernel updates spanning major kernel versions without any kernel modifications.
Our evaluation shows that KUP can support any type of real kernel patch (e.g., security fixes, minor updates, or even major releases) with large-scale applications such as memcached and mysql, or even in the middle of a Linux kernel compilation, unlike well-known dynamic hot-patching techniques (e.g., ksplice). Not only that, KUP can update a running Linux kernel in 3 seconds (overall downtime).
@inproceedings{kashyap:kup, title = {{Instant OS Updates via Userspace Checkpoint-and-Restart}}, author = {Sanidhya Kashyap and Changwoo Min and Byoungyoung Lee and Taesoo Kim and Pavel Emelyanov}, booktitle = {Proceedings of the 2016 USENIX Annual Technical Conference (ATC)}, month = jun, year = 2016, address = {Denver, CO}, }
We analyze the manycore scalability of five widely-deployed file systems, namely, ext4, XFS, btrfs, F2FS, and tmpfs, by using our open source benchmark suite, FxMark. FxMark implements 19 microbenchmarks to stress specific components of each file system and includes three application benchmarks to measure the macroscopic scalability behavior. We observe that file systems are hidden scalability bottlenecks in many I/O-intensive applications even when there is no apparent contention at the application level. We found 25 scalability bottlenecks in file systems, many of which are unexpected or counterintuitive. We draw a set of observations on file system scalability behavior and unveil several core aspects of file system design that systems researchers must address.
@inproceedings{min:fxmark, title = {{Understanding Manycore Scalability of File Systems}}, author = {Changwoo Min and Sanidhya Kashyap and Steffen Maass and Woonhak Kang and Taesoo Kim}, booktitle = {Proceedings of the 2016 USENIX Annual Technical Conference (ATC)}, month = jun, year = 2016, address = {Denver, CO}, }
Memory corruption vulnerabilities are the root cause of many modern attacks. Existing defense mechanisms are inadequate; in general, the software-based approaches are not efficient and the hardware-based approaches are not flexible. In this paper, we present hardware-assisted data-flow isolation, or, Hdfi, a new fine-grained data isolation mechanism that is broadly applicable and very efficient. Hdfi enforces isolation at the machine word granularity by virtually extending each memory unit with an additional tag that is defined by data-flow. This capability allows Hdfi to enforce a variety of security models such as the Biba Integrity Model and the Bell--LaPadula Model. We implemented Hdfi by extending the RISC-V instruction set architecture (ISA) and instantiating it on the Xilinx Zynq ZC706 evaluation board. We ran several benchmarks including the SPEC CINT 2000 benchmark suite. Evaluation results show that the performance overhead caused by our modification to the hardware is low (<2%). We also developed or ported several security mechanisms to leverage Hdfi, including stack protection, standard library enhancement, virtual function table protection, code pointer protection, kernel data protection, and information leak prevention. Our results show that Hdfi is easy to use, imposes low performance overhead, and allows us to create more elegant and more secure solutions.
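A software strawman can convey the word-granularity tagging idea, although HDFI enforces it in hardware through the ISA extension; the shadow array and helper names below are ours:

/* Emulate a 1-bit tag per memory word in a shadow array: stores set the tag
 * according to the data-flow they belong to, and checked loads verify it. */
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define WORDS 1024
static uint64_t mem[WORDS];   /* "memory" */
static uint8_t  tag[WORDS];   /* shadow tag per word */

static void store_sensitive(size_t i, uint64_t v) { mem[i] = v; tag[i] = 1; }
static void store_normal(size_t i, uint64_t v)    { mem[i] = v; tag[i] = 0; }
static uint64_t load_checked(size_t i)            { assert(tag[i] == 1); return mem[i]; }

int main(void)
{
    store_sensitive(0, 0x1234);   /* e.g., a return address or vtable pointer */
    (void)load_checked(0);        /* OK: the tag matches */
    store_normal(0, 0xdead);      /* an overwrite through the normal data-flow */
    (void)load_checked(0);        /* detected: the tag no longer matches */
    return 0;
}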
@inproceedings{song:hdfi, title = {{HDFI: Hardware-Assisted Data-Flow Isolation}}, author = {Chengyu Song and Hyungon Moon and Monjur Alam and Insu Yun and Byoungyoung Lee and Taesoo Kim and Wenke Lee and Yunheung Paek}, booktitle = {Proceedings of the 37th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2016, address = {San Jose, CA}, }
With increasing demand for big-data processing and faster in-memory databases, cloud providers are moving towards large virtualized instances in addition to focusing on horizontal scalability.
However, our experiments reveal that such instances in popular cloud services (e.g., 32 vCPUs with 208 GB supported by Google Compute Engine) do not achieve the desired scalability with increasing core count even with a simple, embarrassingly parallel job (e.g., Linux kernel compile). On a serious note, the internal synchronization scheme (e.g., paravirtualized ticket spinlock) of the virtualized instance on a machine with higher core count (e.g., 80-core) dramatically degrades its overall performance. Our finding is different from the previously well-known scalability problem (i.e., the lock contention problem) and occurs because of the sophisticated optimization techniques implemented in the hypervisor, which we call the sleepy spinlock anomaly. To solve this problem, we design and implement OTICKET, a variant of the paravirtualized ticket spinlock that effectively scales the virtualized instances in both undersubscribed and oversubscribed environments.
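The FIFO hand-off that makes paravirtualized ticket locks sensitive to a descheduled ("sleepy") waiter can be seen in a plain ticket spinlock; the C11 sketch below is a generic user-level illustration, not the OTICKET implementation:

/* Generic ticket spinlock: tickets are served strictly in order, so if the
 * vCPU holding the next ticket is asleep, every waiter behind it stalls. */
#include <stdatomic.h>
#include <stdio.h>

struct ticket_lock {
    atomic_uint next;    /* ticket dispenser */
    atomic_uint owner;   /* ticket currently being served */
};

static void ticket_lock(struct ticket_lock *l)
{
    unsigned me = atomic_fetch_add(&l->next, 1);   /* take a ticket */
    while (atomic_load(&l->owner) != me)
        ;   /* spin until our ticket is served */
}

static void ticket_unlock(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);                /* hand off in FIFO order */
}

int main(void)
{
    struct ticket_lock l = { 0, 0 };
    ticket_lock(&l);
    puts("in critical section");
    ticket_unlock(&l);
    return 0;
}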
@inproceedings{kashyap:oppspinlocks, title = {{Opportunistic Spinlocks: Achieving Virtual Machine Scalability in the Clouds}}, author = {Sanidhya Kashyap and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the ACM SIGOPS Operating System Review}, journal = {SIGOPS Oper. Syst. Rev.}, year = 2016, month = mar, volume = 50, number = 1, }
Network Function Virtualization (NFV) applications are stateful. For example, a Content Distribution Network (CDN) caches web contents from remote servers and serves them to clients. Similarly, an Intrusion Detection System (IDS) and an Intrusion Prevention System (IPS) have both per-flow and multi-flow (shared) states to properly react to intrusions. On today's NFV infrastructures, security vulnerabilities may allow attackers to steal and manipulate the internal states of NFV applications that share a physical resource. In this paper, we propose a new protection scheme, S-NFV, that incorporates Intel Software Guard Extensions (Intel SGX) to securely isolate the states of NFV applications.
@inproceedings{shih:snfv, title = {{S-NFV: Securing NFV states by using SGX}}, author = {Ming-Wei Shih and Mohan Kumar and Taesoo Kim and Ada Gavrilovska}, booktitle = {Proceedings of the 1st ACM International Workshop on Security in SDN and NFV}, month = mar, year = 2016, address = {New Orleans, LA}, }
Hardware technologies for trusted computing, or trusted execution environments (TEEs), have rapidly matured over the last decade. In fact, TEEs are at the brink of widespread commoditization with the recent introduction of Intel Software Guard Extensions (Intel SGX). Despite such rapid development of TEEs, software technologies for TEEs significantly lag behind their hardware counterparts, and currently only a select group of researchers has the privilege of accessing this technology. To address this problem, we develop an open source platform, called OpenSGX, that emulates Intel SGX hardware components at the instruction level and provides new system software components required for full TEE exploration. We expect that the OpenSGX framework can serve as an open platform for SGX research, with the following contributions. First, we develop a fully functional, instruction-compatible emulator of Intel SGX for enabling the exploration of software/hardware design space, and development of enclave programs. OpenSGX provides a platform for SGX development, meaning that it provides not just emulation but also operating system components, an enclave program loader/packager, an OpenSGX user library, debugging, and performance monitoring. Second, to show OpenSGX's use cases, we applied OpenSGX to protect sensitive information (e.g., directory) of Tor nodes, and evaluated the potential performance impacts. Therefore, we believe OpenSGX has great potential to spark new research on soon-to-be-commodity Intel SGX across broader communities.
@inproceedings{jain:opensgx, title = {{OpenSGX: An Open Platform for SGX Research}}, author = {Prerit Jain and Soham Desai and Seongmin Kim and Ming-Wei Shih and JaeHyuk Lee and Changho Choi and Youjung Shin and Taesoo Kim and Brent B. Kang and Dongsu Han}, booktitle = {Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2016, address = {San Diego, CA}, }
The operating system kernel is the foundation of the whole system, and is often the de facto trusted computing base for many higher-level security mechanisms. Unfortunately, kernel vulnerabilities are not rare and are continuously being introduced with new kernel features. Once the kernel is compromised, attackers can bypass any access control checks, escalate their privileges, and hide the evidence of attacks. Many protection mechanisms have been proposed and deployed to prevent kernel exploits. However, a majority of these techniques have focused only on preventing control-flow hijacking attacks; techniques that can mitigate non-control-data attacks either only apply to drivers/modules or impose too much overhead. The goal of our research is to develop a principled defense mechanism against memory-corruption-based privilege escalation attacks. Towards this end, we leverage data-flow integrity to enforce security invariants of the kernel access control system. In order for our protection mechanism to be practical, we develop two new techniques: one for automatically inferring data that are critical to the access control system without manual annotation, and the other for efficient DFI enforcement over the inference results. We have implemented a prototype of our technology for the ARM64 Linux kernel on an Android device. The evaluation results of our prototype implementation show that our technology can mitigate a majority of privilege escalation attacks, while imposing a moderate amount of performance overhead.
@inproceedings{song:kenali, title = {{Enforcing Kernel Security Invariants with Data Flow Integrity}}, author = {Chengyu Song and Byoungyoung Lee and Kangjie Lu and William R. Harris and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2016, address = {San Diego, CA}, }
Mobile applications are increasingly integrating third-party libraries to provide various features, such as advertising, analytics, social networking, and more. Unfortunately, such integration with third-party libraries comes with the cost of potential privacy violations of users, because Android always grants third-party libraries the same set of permissions as their host applications. Unintended accesses to users’ private data are underestimated threats to users’ privacy, as complex and often obfuscated third-party libraries make it hard for application developers to estimate the correct behaviors of third-party libraries. More critically, the wide adoption of native code (JNI) and dynamic code execution, such as Java reflection or dynamic code reloading, makes it even harder to apply state-of-the-art security analysis. In this work, we propose FLEXDROID, a new Android security model and isolation mechanism that provides dynamic, fine-grained access control for third-party libraries. With FLEXDROID, application developers not only can gain full control of third-party libraries (e.g., which permissions to grant or not), but also can specify how to make them behave after detecting a privacy violation (e.g., providing mock user information or killing the library). To achieve such goals, we define a new notion of principals for third-party libraries, and develop a novel security mechanism, called inter-process stack inspection, that is effective for JNI as well as dynamic code execution. Our usability study shows that developers can easily adopt FLEXDROID’s policy to their existing applications. Finally, our evaluation shows that FLEXDROID can effectively restrict the permissions of third-party libraries with negligible overheads.
@inproceedings{seo:flexdroid, title = {{FlexDroid: Enforcing In-App Privilege Separation in Android}}, author = {Jaebaek Seo and Daehyeok Kim and Donghyun Cho and Taesoo Kim and Insik Shin}, booktitle = {Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2016, address = {San Diego, CA}, }
Existing techniques for memory randomization such as the widely explored Address Space Layout Randomization (ASLR) perform a single, per-process randomization that is applied before or at the process' load-time. The efficacy of such upfront randomizations crucially relies on the assumption that an attacker has only one chance to guess the randomized address, and that this attack succeeds only with a very low probability. Recent research results have shown that this assumption is not valid in many scenarios; e.g., daemon servers fork child processes that inherit the state (and, if applicable, the randomization) of their parents, and thereby create clones with the same memory layout. This enables so-called clone-probing attacks, where an adversary repeatedly probes different clones in order to increase its knowledge about their shared memory layout. In this paper, we propose RuntimeASLR--the first approach that prevents clone-probing attacks without altering the intended semantics of child process forking. The paper makes the following three contributions. First, we propose a semantics-preserving and runtime-based approach for preventing clone-probing attacks by re-randomizing the address space of every child after fork() at runtime while keeping the parent's state. We achieve this by devising a novel, automated pointer tracking policy generation process that has to be run just once, followed by a pointer tracking mechanism that is only applied to the parent process. Second, we propose a systematic and holistic pointer tracking mechanism that correctly identifies pointers inside memory space. This mechanism constitutes the central technical building block of our approach. Third, we provide an open-source implementation of our approach based on Intel's Pin on an x86-64 Linux platform, which supports COTS server binaries directly. We have also evaluated our system on the Nginx web server. The results show that RuntimeASLR identifies all pointers and effectively prevents clone-probing attacks. Although it takes a longer time for RuntimeASLR to start the server program (e.g., 35 seconds for Nginx), RuntimeASLR imposes no runtime performance overhead on the worker processes that provide actual services.
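The clone-probing premise above is easy to reproduce: children created by fork() inherit the parent's randomized layout, so a local symbol has the same address in every child. The small demo below is ours, illustrating the problem RuntimeASLR addresses rather than its solution:

/* Every forked child reports the same address for target(). */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static void target(void) { }

int main(void)
{
    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            printf("child %d: target() at %p\n", i, (void *)target);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
    }
    return 0;   /* RuntimeASLR's goal: re-randomize each child after fork() */
}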
@inproceedings{lu:runtimeaslr, title = {{How to Make ASLR Win the Clone Wars: Runtime Re-Randomization}}, author = {Kangjie Lu and Stefan N{\"u}rnberger and Michael Backes and Wenke Lee}, booktitle = {Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2016, address = {San Diego, CA}, }
Cloud-based file synchronization services such as Dropbox are a worldwide resource for many millions of users. However, individual services often have tight resource limits, suffer from temporary outages or shutdowns, and sometimes silently corrupt or leak user data. As a solution, the authors design, implement, and evaluate MetaSync, a secure and reliable file synchronization service using multiple cloud synchronization services as untrusted storage providers. To provide global consistency among the storage providers, the authors devise a novel variant of Paxos that enables efficient updates on top of the unmodified APIs exported by each service. MetaSync provides better availability and performance, stronger confidentiality and integrity, and larger storage for end users.
@article{han:metasync2, title = {{MetaSync: Coordinating Storage Across Multiple File Synchronization Services}}, author = {Seungyeop Han and Haichen Shen and Taesoo Kim and Arvind Krishnamurthy and Thomas Anderson and David Wetherall}, journal = {IEEE Internet Computing}, year = 2016, }
2015
Network applications and protocols are increasingly adopting security and privacy features, as these are becoming primary requirements. The widespread use of transport layer security (TLS) and the growing popularity of anonymity networks, such as Tor, exemplify this trend. Motivated by the recent movement towards commoditization of trusted execution environments (TEEs), this paper explores alternative design choices that application and protocol designers should consider. In particular, we explore the possibility of using Intel SGX to provide security and privacy in a wide range of network applications. We show that leveraging hardware protection of TEEs opens up new possibilities, often with the benefit of a much simplified application/protocol design. We demonstrate its practical implications by exploring the design space for SGX-enabled software-defined inter-domain routing, peer-to-peer anonymity networks (Tor), and middleboxes. Finally, we quantify the potential overheads of the SGX-enabled design by implementing it on top of OpenSGX, an open source SGX emulator.
@inproceedings{kim:sgxtor, title = {{A First Step Towards Leveraging Commodity Trusted Execution Environments for Network Applications}}, author = {Seongmin Kim and Youjung Shin and Jaehyung Ha and Taesoo Kim and Dongsu Han}, booktitle = {Proceedings of the 14th ACM Workshop on Hot Topics in Networks (HotNets)}, month = nov, year = 2015, address = {Philadelphia, PA}, }
While private browsing is a standard feature, its implementation has been inconsistent among the major browsers. More seriously, it often fails to provide the adequate or even the intended privacy protection. For example, as shown in prior research, browser extensions and add-ons often undermine the goals of private browsing. In this paper, we first present our systematic study of private browsing. We developed a technical approach to identify browser traces left behind by a private browsing session, and showed that Chrome and Firefox do not correctly clear some of these traces. We analyzed the source code of these browsers and discovered that the current implementation approach is to decide the behaviors of a browser based on the current browsing mode (i.e., private or public); but such decision points are scattered throughout the code base. This implementation approach is very problematic because developers are prone to make mistakes given the complexities of browser components (including extensions and add-ons). Based on this observation, we propose a new and general approach to implement private browsing. The main idea is to overlay the actual filesystem with a sandbox filesystem when the browser is in private browsing mode, so that no unintended leakage is allowed and no persistent modification is stored. This approach requires no change to browsers and the OS kernel because the layered sandbox filesystem is implemented by interposing system calls. We have implemented a prototype system called UCognito on Linux. Our evaluations show that UCognito, when applied to Chrome and Firefox, stops all known privacy leaks identified by prior work and our current study. More importantly, UCognito incurs only negligible performance overhead: e.g., 0%-2.5% in benchmarks for standard JavaScript and webpage loading.
@inproceedings{xu:ucognito, title = {{UCognito: Private Browsing without Tears}}, author = {Meng Xu and Yeongjin Jang and Xinyu Xing and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2015, address = {Denver, Colorado}, }
A general prerequisite for a code reuse attack is that the attacker needs to locate code gadgets that perform the desired operations and then direct the control flow of a vulnerable application to those gadgets. Address Space Layout Randomization (ASLR) attempts to stop code reuse attacks by making the first part of the prerequisite unsatisfiable. However, research in recent years has shown that this protection is often defeated by commonly existing information leaks, which provide attackers clues about the whereabouts of certain code gadgets. In this paper, we present ASLR-Guard, a novel mechanism that completely prevents the leaks of code pointers and renders other information leaks (e.g., those of data pointers) useless for deriving code addresses. The main idea behind ASLR-Guard is to render data-pointer leaks useless for deriving code addresses by separating code and data, providing secure storage for code pointers, and encoding code pointers whenever they are treated as data. ASLR-Guard can either prevent code pointer leaks or render their leaks harmless. That is, ASLR-Guard makes it impossible to overwrite code pointers with values that point to or will hijack the control flow to a desired address when the code pointers are dereferenced. We have implemented a prototype of ASLR-Guard, including a compilation toolchain and a C/C++ runtime. Our evaluation results show that (1) ASLR-Guard supports normal operations correctly; (2) it completely stops code address leaks and can resist recent sophisticated attacks; (3) it imposes almost no runtime overhead (< 1%) for C/C++ programs in the SPEC benchmark. Therefore, ASLR-Guard is very practical and can be applied to secure many applications.
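The idea of encoding code pointers whenever they are stored as data can be sketched with a simple XOR scheme; the fixed key and helper names below are ours, and the real ASLR-Guard design is a compiler- and runtime-level mechanism rather than this toy:

/* Code pointers are stored in encoded form and decoded only at call time,
 * so leaking the stored value does not reveal a code address. */
#include <stdint.h>
#include <stdio.h>

static uintptr_t key;   /* per-process secret */

static uintptr_t encode_ptr(void (*fp)(void)) { return (uintptr_t)fp ^ key; }
static void (*decode_ptr(uintptr_t e))(void)  { return (void (*)(void))(e ^ key); }

static void handler(void) { puts("called via an encoded pointer"); }

int main(void)
{
    key = 0xdeadbeefcafe1234ULL;            /* placeholder; derive from a CSPRNG */
    uintptr_t stored = encode_ptr(handler); /* what actually lands in memory */
    decode_ptr(stored)();                   /* decoded only when dereferenced */
    return 0;
}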
@inproceedings{lu:aslrguard, title = {{ASLR-Guard: Stopping Address Space Leakage for Code Reuse Attacks}}, author = {Kangjie Lu and Chengyu Song and Byoungyoung Lee and Simon P. Chung and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2015, address = {Denver, Colorado}, }
Long Term Evolution (LTE) is becoming the dominant cellular networking technology, shifting the cellular network away from its circuit-switched legacy towards a packet-switched network that resembles the Internet. To support voice calls over the LTE network, operators have introduced Voice-over-LTE (VoLTE), which dramatically changes how voice calls are handled, both from user equipment and infrastructure perspectives. We find that this dramatic shift opens up a number of new attack surfaces that have not been previously explored. To call attention to this matter, this paper presents a systematic security analysis.
Unlike the traditional call setup, the VoLTE call setup is controlled and performed at the Application Processor (AP), using SIP over IP. A legitimate user who has control over the AP can potentially control and exploit the call setup process to establish a VoLTE channel. This, combined with the legacy accounting policy (e.g., unlimited voice and the separation of data and voice), leads to a number of free data channels. In the process of unveiling the free data channels, we identify a number of additional vulnerabilities of early VoLTE implementations, which lead to serious exploits, such as caller spoofing, over-billing, and denial-of-service attacks. We identify the nature of these vulnerabilities and concrete exploits that directly result from the adoption of VoLTE. We also propose immediate countermeasures that can be employed to alleviate the problems. However, we believe that the nature of the problem calls for a more comprehensive solution that eliminates the root causes at mobile devices, mobile platforms, and the core network.
@inproceedings{kim:volte, title = {{Breaking and Fixing VoLTE: Exploiting Hidden Data Channels and Mis-implementations}}, author = {Hongil Kim and Dongkwan Kim and Minhee Kwon and Hyungseok Han and Yeongjin Jang and Dongsu Han and Taesoo Kim and Yongdae Kim}, booktitle = {Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS)}, month = oct, year = 2015, address = {Denver, Colorado}, }
Today, systems software is too complex to be bug-free. To find bugs in systems software, developers often rely on code checkers, like Linux's Sparse. However, the capability of existing tools used in commodity, large-scale systems is limited to finding only shallow bugs that tend to be introduced by simple programmer mistakes, and so do not require a deep understanding of code to find them. Unfortunately, the majority of bugs as well as those that are difficult to find are semantic ones, which violate high-level rules or invariants (e.g., missing a permission check). Thus, it is difficult for code checkers lacking the understanding of a programmer's true intention to reason about semantic correctness.
To solve this problem, we present Juxta, a tool that automatically infers high-level semantics directly from source code. The key idea in Juxta is to compare and contrast multiple existing implementations that obey latent yet implicit high-level semantics. For example, the implementation of open() at the file system layer expects to handle an out-of-space error from the disk in all file systems. We applied Juxta to 54 file systems in the stock Linux kernel (680K LoC), found 118 previously unknown semantic bugs (one bug per 5.8K LoC), and provided corresponding patches to 39 different file systems, including mature, popular ones like ext4, btrfs, XFS, and NFS. These semantic bugs are not easy to locate, as all the ones found by Juxta have existed for over 6.2 years on average. Not only do our empirical results look promising, but the design of Juxta is generic enough to be extended easily beyond file systems to any software that has multiple implementations, like Web browsers or protocols at the same layer of a network stack.
@inproceedings{min:juxta, title = {{Cross-checking Semantic Correctness: The Case of Finding File System Bugs}}, author = {Changwoo Min and Sanidhya Kashyap and Byoungyoung Lee and Chengyu Song and Taesoo Kim}, booktitle = {Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2015, address = {Monterey, CA}, }
Many applications such as the Chrome and Firefox browsers are largely implemented in C++ for its performance and modularity. Type casting, which converts one type of an object to another, plays an essential role in enabling polymorphism in C++ because it allows a program to utilize certain general or specific implementations in the class hierarchies. However, if not correctly used, it may return unsafe and incorrectly casted values, leading to so-called bad-casting or type-confusion vulnerabilities. Since a bad-casted pointer violates a programmer's intended pointer semantics and enables an attacker to corrupt memory, bad-casting has critical security implications similar to those of other memory corruption vulnerabilities. Despite the increasing number of bad-casting vulnerabilities, the bad-casting detection problem has not been addressed by the security community.
In this paper, we present CaVer, a runtime bad-casting detection tool. It performs program instrumentation at compile time and uses a new runtime type tracing mechanism, the type hierarchy table, to overcome the limitation of existing approaches and efficiently verify type casting dynamically. In particular, CaVer can be easily and automatically adopted to target applications, achieves broader detection coverage, and incurs reasonable runtime overhead. We have applied CaVer to large-scale software including the Chrome and Firefox browsers, and CaVer discovered 11 previously unknown security vulnerabilities: nine in GNU libstdc++ and two in Firefox, all of which have been confirmed and subsequently fixed by vendors. Our evaluation showed that CaVer imposes up to 7.6% and 64.6% overhead for performance-intensive benchmarks on the Chromium and Firefox browsers, respectively.
@inproceedings{lee:caver, title = {{Type Casting Verification}: Stopping an Emerging Attack Vector}, author = {Byoungyoung Lee and Chengyu Song and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 24th USENIX Security Symposium (Security)}, month = aug, year = 2015, address = {Washington, DC}, }
With increasing demand for big-data processing and faster in-memory databases, cloud providers are gearing towards large virtualized instances rather than horizontal scalability.
However, our experiments reveal that such instances in popular cloud services (e.g., 32 vCPUs with 208 GB supported by Google Compute Engine) do not achieve the desired scalability with increasing core count even with a simple, embarrassingly parallel job (e.g., kernel compile). On a serious note, the internal synchronization scheme (e.g., paravirtualized ticket spinlock) of the virtualized instance on a machine with higher core count (e.g., 80-core) dramatically degrades its overall performance. Our finding is different from a previously well-known scalability problem (the lock contention problem), and occurs because of the sophisticated optimization techniques implemented in the hypervisor, what we call the sleepy spinlock anomaly. To solve this problem, we design and implement oticket, a variant of the paravirtualized ticket spinlock that effectively scales the virtualized instances in both undersubscribed and oversubscribed environments.
@inproceedings{kashyap:oticket, title = {{Scalability In The Clouds! A Myth Or Reality?}}, author = {Sanidhya Kashyap and Changwoo Min and Taesoo Kim}, booktitle = {Proceedings of the 6th Asia-Pacific Workshop on Systems (APSys)}, month = jul, year = 2015, address = {Tokyo, Japan}, }
Applications implement their own update protocols to ensure consistency of data on the file system. However, since current file systems provide only a preliminary ordering guarantee, notably fsync(), these update protocols become complex, slow, and error-prone.
We present a new file system, CFS, that supports a native interface for applications to maintain crash consistency of their data. Using CFS, applications can achieve crash consistency of data by declaring code regions that must operate atomically. By utilizing transactional flash storage (SSD/X-FTL), CFS implements a lightweight mechanism for crash consistency. Without using any heavyweight mechanisms based on redundant writes and ordering, CFS can atomically write multiple data pages and their relevant metadata to storage.
We made three technical contributions to develop a crash consistency interface with SSD/X-FTL in CFS: selective atomic propagation of dirty pages, in-memory metadata logging, and delayed deallocation. Our evaluation of five real-world applications shows that CFS-based applications significantly outperform ordering versions: 2-5x faster by reducing disk writes 1.9-4.1x and disk cache flushing 1.1-17.6x. Importantly, our porting effort is minimal: CFS requires 317 lines of modifications from 3.5 million lines of ported applications.
@inproceedings{min:cfs, title = {{Lightweight Application-Level Crash Consistency on Transactional Flash Storage}}, author = {Changwoo Min and Woon-Hak Kang and Taesoo Kim and Sang-Won Lee and Young Ik Eom}, booktitle = {Proceedings of the 2015 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2015, address = {Santa Clara, CA}, }
Cloud-based file synchronization services, such as Dropbox, are a worldwide resource for many millions of users. However, individual services often have tight resource limits, suffer from temporary outages or even shutdowns, and sometimes silently corrupt or leak user data.
We design, implement, and evaluate MetaSync, a secure and reliable file synchronization service that uses multiple cloud synchronization services as untrusted storage providers. To make MetaSync work correctly, we devise a novel variant of Paxos that provides efficient and consistent updates on top of the unmodified APIs exported by existing services. Our system automatically redistributes files upon reconfiguration of providers.
Our evaluation shows that MetaSync provides low update latency and high update throughput while being more trustworthy and available. MetaSync outperforms its underlying cloud services by 1.2-10x on three realistic workloads.
@inproceedings{han:metasync, title = {{MetaSync}: File Synchronization Across Multiple Untrusted Storage Services}, author = {Seungyeop Han and Haichen Shen and Taesoo Kim and Arvind Krishnamurthy and Thomas Anderson and David Wetherall}, booktitle = {Proceedings of the 2015 USENIX Annual Technical Conference (ATC)}, month = jul, year = 2015, address = {Santa Clara, CA}, }
Many system components and network applications are written in languages that are prone to memory corruption vulnerabilities. There have been countless cases where simple mistakes by developers resulted in memory corruption vulnerabilities and consequently security exploits. While there have been tremendous research efforts to mitigate these vulnerabilities, use-after-free still remains one of the most critical and popular attack vectors because existing proposals have not adequately addressed the challenging program analysis and runtime performance issues.
In this paper we present DangNull, a system that detects temporal memory safety violations --- in particular, use-after-free and double-free --- during runtime. DangNull relies on the key observation that the root cause of these violations is that pointers are not nullified after the target object is freed. Based on this observation, DangNull automatically traces the object's relationships via pointers and automatically nullifies all pointers when the target object is freed. DangNull offers several benefits. First, DangNull addresses the root cause of temporal memory safety violations. It does not rely on the side effects of violations, which can vary and may be masked by attacks. Thus, DangNull is effective against even the most sophisticated exploitation techniques. Second, DangNull checks object relationship information using runtime object range analysis on pointers, and thus is able to keep track of pointer semantics more robustly even in complex and large scale software. Lastly, DangNull does not require numerous explicit sanity checks on memory accesses because it can detect a violation with implicit exception handling, and thus its detection capabilities only incur moderate performance overhead.
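The nullification idea above can be conveyed with a toy, single-slot registry; DangNull itself tracks object relationships automatically through instrumentation, whereas the bookkeeping below is manual and purely illustrative:

/* Track a pointer to an object and nullify it when the object is freed, so a
 * later use-after-free dereferences NULL instead of attacker-controlled memory. */
#include <stdlib.h>
#include <stdio.h>

static void **tracked;                 /* one tracked pointer, for simplicity */

static void track(void **pp)           { tracked = pp; }

static void safe_free(void *p)
{
    free(p);
    if (tracked && *tracked == p)
        *tracked = NULL;               /* nullify the dangling pointer */
}

int main(void)
{
    int *obj = malloc(sizeof *obj);
    track((void **)&obj);
    safe_free(obj);
    if (obj == NULL)                   /* a dangling use now fails safely */
        puts("pointer was nullified; the use-after-free becomes a NULL access");
    return 0;
}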
@inproceedings{lee:dangnull, title = {Preventing Use-after-free with Dangling Pointers Nullification}, author = {Byoungyoung Lee and Chengyu Song and Yeongjin Jang and Tielei Wang and Taesoo Kim and Long Lu and Wenke Lee}, booktitle = {Proceedings of the 2015 Annual Network and Distributed System Security Symposium (NDSS)}, month = feb, year = 2015, address = {San Diego, CA}, }
2014
Rail is a framework for building web applications that can precisely identify inappropriately disclosed data after a vulnerability is discovered. To do so, Rail introduces retroactive disclosure auditing: re-running the application with previous inputs once the vulnerability is fixed to determine what data should have been disclosed. A key challenge for Rail is to reconcile state divergence between the original and replay executions, so that the differences between executions precisely correspond to inappropriately disclosed data. Rail provides application developers with APIs to address this challenge, by identifying sensitive data, assigning semantic names to non-deterministic inputs, and tracking dependencies.
Results from a prototype of Rail built on top of the Meteor framework show that Rail can quickly and precisely identify data disclosure from complex attacks, including programming bugs, administrative mistakes, and stolen passwords. Rail incurs up to 22% throughput overhead and 0.5 KB storage overhead per request. Porting three existing web applications required fewer than 25 lines of code changes per application.
@inproceedings{chen:rail, title = {Identifying information disclosure in web applications with retroactive auditing}, author = {Haogang Chen and Taesoo Kim and Xi Wang and Nickolai Zeldovich and M. Frans Kaashoek}, booktitle = {Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}, month = oct, year = 2014, address = {Broomfield, Colorado}, }
The primary goal of ASLR is to effectively randomize a program's memory layout so that adversaries cannot easily infer the layout. As ASLR is a critical defense against exploitation, there have been tremendous efforts to evaluate the mechanism's security. To date, previous attacks that bypass ASLR have focused mostly on exploiting memory leak vulnerabilities, or abusing non-randomized data structures.
In this presentation, we leverage vulnerabilities introduced by performance-oriented software design to reveal new ways in which ASLR can be bypassed. In addition to describing how vulnerabilities originate from such designs, we will present real attacks that exploit them.
First, we analyze general hash table designs for various programming languages (JavaScript, Python, Ruby). To optimize object tracking, the interpreters for these languages may leak address information. Some hash table implementations directly store the address information in the table, while others permit inference of address information through repeated table scanning. We exhaustively examined several popular languages to see whether each has one or both of these problems, and present how they can be leveraged. As a concrete example, we demonstrate how address information can be leaked in the Safari web browser by simply running some JavaScript.
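As a rough illustration of the first class of leak (a toy example, not code from any of the interpreters studied), the C snippet below hashes an object by its address, so the bucket it lands in, which an attacker can observe through iteration order or probing, reveals bits of that address.

```c
/* Toy illustration of an address-based hash leaking address bits. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define NBUCKETS 256   /* power of two, as in many real hash tables */

static unsigned bucket_of(const void *obj) {
    /* Address-based hash: low-order bits of the pointer pick the bucket. */
    return (unsigned)(((uintptr_t)obj >> 4) & (NBUCKETS - 1));
}

int main(void) {
    void *secret = malloc(32);          /* stand-in for a heap object */
    unsigned b = bucket_of(secret);

    /* Anyone who learns `b` (e.g., from iteration order or by probing the
     * table) recovers 8 bits of the object's heap address. */
    printf("object at %p hashed to bucket %u\n", secret, b);
    printf("=> bits 4..11 of the address are 0x%02x\n", b);

    free(secret);
    return 0;
}
```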
Second, we present an analysis of the Zygote process creation model, which is an Android operating system design for speeding up application launches. The results of our examination show that Zygote weakens ASLR because all applications are created with largely identical memory layouts. To highlight the severity of this issue, we demonstrate two different ASLR bypass attacks using real applications - Google Chrome and VLC Media Player.
@inproceedings{lee:breakaslr, title = {Abusing Performance Optimization Weaknesses to Bypass {ASLR}}, author = {Byoungyoung Lee and Yeongjin Jang and Tielei Wang and Chengyu Song and Long Lu and Taesoo Kim and Wenke Lee}, booktitle = {Black Hat USA Briefings (Black Hat USA)}, year = 2014, month = aug, address = {Las Vegas, NV}, }
There have been many research efforts to secure Android applications and high-level system mechanisms. Low-level operating system designs have been overlooked, partly due to the belief that security issues at this level are similar to those on Linux, which are well studied. However, we identify that certain Android modifications are at odds with security and result in serious vulnerabilities that need to be addressed immediately.
In this paper, we analyze the Zygote process creation model, an Android operating system design for speeding up application launches. Zygote weakens Address Space Layout Randomization (ASLR) because all application processes are created with largely identical memory layouts.
We design both remote and local attacks capable of bypassing the weakened ASLR and executing return-oriented programming on Android. We demonstrate the attacks using real applications, such as the Chrome Browser and VLC Media Player. Further, we design and implement Morula, a secure replacement for Zygote. Morula introduces a small amount of code to the Android operating system and can be easily adopted by device vendors. Our evaluation shows that, compared to Zygote, Morula incurs a 13MB memory increase for each running application but allows each Android process to have an individually randomized memory layout and even a slightly shorter average launch time.
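The core weakness is easy to reproduce with plain fork(): every child inherits the parent's randomized layout, so addresses repeat across "applications". The sketch below is ordinary Linux C, not Android code; Zygote additionally preloads the framework before forking.

```c
/* Why a fork-based process creation model weakens ASLR: children share
 * the parent's (once-randomized) memory layout. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static int marker;   /* lives in the randomized data segment of a PIE binary */

int main(void) {
    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Both children print the same address, even though the parent
             * itself was loaded at a randomized base. */
            printf("child %d: &marker = %p\n", i, (void *)&marker);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
    }
    printf("parent:  &marker = %p\n", (void *)&marker);
    return 0;
}
```

Morula's fix, as the abstract notes, is to give each application an individually randomized layout rather than a copy of the template process's.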
@inproceedings{lee:morula, title = {{From Zygote to Morula}: Fortifying weakened {ASLR} on {Android}}, author = {Byoungyoung Lee and Long Lu and Tielei Wang and Taesoo Kim and Wenke Lee}, booktitle = {Proceedings of the 35th IEEE Symposium on Security and Privacy (Oakland)}, month = may, year = 2014, address = {San Jose, CA}, }
2013
Recovering from attacks in an interconnected system is difficult, because an adversary that gains access to one part of the system may propagate to many others, and tracking down and recovering from such an attack requires significant manual effort. Web services are an important example of an interconnected system, as they are increasingly using protocols such as OAuth and REST APIs to integrate with one another. This paper presents Aire, an intrusion recovery system for such web services. Aire addresses several challenges, such as propagating repair across services when some servers may be unavailable, and providing appropriate consistency guarantees when not all servers have been repaired yet. Experimental results show that Aire can recover from four realistic attacks, including one modeled after a recent Facebook OAuth vulnerability; that porting existing applications to Aire requires little effort; and that Aire imposes a 19-30% CPU overhead and 6-9 KB/request storage cost for Askbot, an existing web application.
@inproceedings{chandra:aire, title = {Asynchronous Intrusion Recovery for Interconnected Web Services}, author = {Ramesh Chandra and Taesoo Kim and Nickolai Zeldovich}, booktitle = {Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP)}, month = nov, year = 2013, address = {Farmington, PA}, }
Because embedded interpreters offer flexibility and performance, they are becoming more prevalent, and can be found at nearly every level of the software stack. As one example, the Linux kernel defines languages to describe packet filtering rules and uses embedded interpreters to filter packets at run time. As another example, the RAR archive format allows embedding bytecode in compressed files to describe reversible transformations for decompression. This paper presents an analysis of common pitfalls in embedded interpreter implementations, which can lead to security vulnerabilities, and their impact. We hope that these results are useful both in augmenting existing embedded interpreters and in aiding developers in building new, more secure embedded interpreters.
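The snippet below shows a synthetic example of one such pitfall class: an opcode handler that trusts an operand and indexes interpreter state without a bounds check. It is illustrative only and not drawn from any interpreter analyzed in the paper.

```c
/* Synthetic embedded-interpreter pitfall: unchecked operand used as index. */
#include <stddef.h>
#include <stdint.h>

#define OP_HALT   0
#define OP_STORE  1   /* STORE idx val : regs[idx] = val */

static int32_t regs[8];

static void run(const uint8_t *code, size_t len) {
    size_t pc = 0;
    while (pc < len) {
        switch (code[pc]) {
        case OP_HALT:
            return;
        case OP_STORE: {
            uint8_t idx = code[pc + 1];
            int8_t  val = (int8_t)code[pc + 2];
            /* BUG (the pitfall): idx comes from untrusted bytecode and is
             * never checked against the size of regs[], so a malicious
             * program writes out of bounds. A safe interpreter rejects
             * idx >= 8, either here or by validating the program up front. */
            regs[idx] = val;
            pc += 3;
            break;
        }
        default:
            return;   /* unknown opcode: stop */
        }
    }
}

int main(void) {
    const uint8_t evil[] = { OP_STORE, 200, 42, OP_HALT };  /* idx 200! */
    run(evil, sizeof(evil));   /* corrupts memory past regs[] */
    return 0;
}
```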
@inproceedings{chen:vmsec, title = {Security Bugs in Embedded Interpreters}, author = {Haogang Chen and Cody Cutler and Taesoo Kim and Yandong Mao and Xi Wang and Nickolai Zeldovich and M. Frans Kaashoek}, booktitle = {Proceedings of the 4th Asia-Pacific Workshop on Systems (APSys)}, month = jul, year = 2013, address = {Singapore}, }
Tao is a system that optimizes the execution of unit tests in large software programs and reduces the programmer wait time from minutes to seconds. Tao is based on two key ideas: First, Tao focuses on efficiency, unlike past work that focused on avoiding false negatives. Tao implements simple and fast function-level dependency tracking that identifies tests to run on a code change; any false negatives missed by this dependency tracking are caught by running the entire test suite on a test server once the code change is committed. Second, to make it easy for programmers to adopt Tao, it incorporates the dependency information into the source code repository. This paper describes an early prototype of Tao and demonstrates that Tao can reduce unit test execution time in two large Python software projects by over 96% while incurring few false negatives.
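The selection step can be pictured as a simple set intersection over recorded dependencies. The sketch below is written in C purely for illustration (Tao itself targets Python test suites, and all names here are invented): it runs only the tests whose recorded function set contains a changed function.

```c
/* Illustrative function-level dependency tracking for test selection. */
#include <stdio.h>
#include <string.h>

struct test {
    const char *name;
    const char *deps[4];   /* functions this test was observed to exercise */
};

static const struct test tests[] = {
    { "test_login",  { "hash_password", "check_session", NULL } },
    { "test_search", { "tokenize", "rank_results", NULL } },
};

static int depends_on(const struct test *t, const char *fn) {
    for (int i = 0; t->deps[i]; i++)
        if (strcmp(t->deps[i], fn) == 0)
            return 1;
    return 0;
}

int main(void) {
    const char *changed = "hash_password";   /* function touched by a diff */
    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++)
        if (depends_on(&tests[i], changed))
            printf("run %s\n", tests[i].name);   /* only test_login runs */
    return 0;
}
```

Any tests the coarse dependency map misses are, as the abstract notes, still caught by the full suite run on the test server after commit.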
@inproceedings{kim:tao, title = {Optimizing Unit Test Execution in Large Software Programs using Dependency Analysis}, author = {Taesoo Kim and Ramesh Chandra and Nickolai Zeldovich}, booktitle = {Proceedings of the 4th Asia-Pacific Workshop on Systems (APSys)}, month = jul, year = 2013, address = {Singapore}, }
Mbox is a lightweight sandboxing mechanism for non-root users in commodity OSes. Mbox's sandbox usage model executes a program in the sandbox and prevents the program from modifying the host filesystem by layering the sandbox filesystem on top of the host filesystem. At the end of program execution, the user can examine changes in the sandbox filesystem and selectively commit them back to the host filesystem. Mbox implements this by interposing on system calls and provides a variety of useful applications: installing system packages as a non-root user, running unknown binaries safely without network accesses, checkpointing the host filesystem instantly, and setting up a virtual development environment without special tools. Our performance evaluation shows that Mbox imposes CPU overheads of 0.1–45.2% for various workloads. In this paper, we present Mbox's design, efficient techniques for interposing on system calls, our experience avoiding common system call interposition pitfalls, and Mbox's performance evaluation.
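To give a flavor of the filesystem-layering policy, the preload shim below diverts write opens into a private directory. This is only a sketch of the idea: Mbox itself interposes on system calls rather than libc wrappers, and the SANDBOX path and redirection rule here are invented for illustration.

```c
/* Sketch of the layering policy: writes go to a sandbox copy of the tree.
 *
 * Build:  gcc -shared -fPIC -o sandbox.so sandbox.c -ldl
 * Use:    LD_PRELOAD=./sandbox.so some-program
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

#define SANDBOX "/tmp/sandbox"   /* hypothetical sandbox root */

int open(const char *path, int flags, ...) {
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    mode_t mode = 0;
    if (flags & O_CREAT) {            /* a mode is only passed with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    if (flags & (O_WRONLY | O_RDWR | O_CREAT)) {
        /* Divert write access under the sandbox root (absolute paths
         * assumed for brevity); reads still hit the host filesystem. */
        char redirected[4096];
        snprintf(redirected, sizeof(redirected), "%s%s", SANDBOX, path);
        return real_open(redirected, flags, mode);
    }
    return real_open(path, flags, mode);
}
```

At the end of a run, the user inspects what accumulated under the sandbox root and decides what, if anything, to copy back to the host.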
@inproceedings{kim:mbox, title = {Practical and Effective Sandboxing for Non-root Users}, author = {Taesoo Kim and Nickolai Zeldovich}, booktitle = {Proceedings of the 2013 USENIX Annual Technical Conference (ATC)}, month = jun, year = 2013, address = {San Jose, CA}, }
2012
Poirot is a system that, given a patch for a newly discovered security vulnerability in a web application, helps administrators detect past intrusions that exploited the vulnerability. Poirot records all requests to the server during normal operation, and given a patch, re-executes requests using both patched and unpatched software, and reports to the administrator any request that executes differently in the two cases. A key challenge with this approach is the cost of re-executing all requests, and Poirot introduces several techniques to reduce the time required to audit past requests, including filtering requests based on their control flow and memoization of intermediate results across different requests.
A prototype of Poirot for PHP accurately detects attacks on older versions of MediaWiki and HotCRP, given subsequently released patches. Poirot's techniques allow it to audit past requests 12-51 times faster than the time it took to originally execute the same requests, for patches to code executed by every request, under a realistic MediaWiki workload.
@inproceedings{kim:poirot, title = {Efficient Patch-based Auditing for Web Application Vulnerabilities}, author = {Taesoo Kim and Ramesh Chandra and Nickolai Zeldovich}, booktitle = {Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}, month = oct, year = 2012, address = {Hollywood, CA}, }
Cloud services are rapidly gaining adoption due to the promises of cost efficiency, availability, and on-demand scaling. To achieve these promises, cloud providers share physical resources to support multi-tenancy of cloud platforms. However, the possibility of sharing the same hardware with potential attackers makes users reluctant to offload sensitive data into the cloud. Worse yet, researchers have demonstrated side-channel attacks via shared memory caches that recover full encryption keys of AES, DES, and RSA.
We present StealthMem, a system-level protection mechanism against cache-based side-channel attacks in the cloud. StealthMem manages a set of locked cache lines per core, which are never evicted from the cache, and efficiently multiplexes them so that each VM can load its own sensitive data into the locked cache lines. Thus, any VM can hide its memory access patterns on confidential data from other VMs. Unlike existing state-of-the-art mitigation methods, StealthMem works with existing commodity hardware and does not require profound changes to application software. We also present a novel idea and prototype for isolating cache lines while fully utilizing memory by exploiting architectural properties of set-associative caches. StealthMem imposes a 1.05% performance overhead on the SPEC 2006 CPU benchmark, and a 3.4-5.8% overhead on secured implementations of AES, DES, and Blowfish, requiring only 3-34 lines of code changes to the original implementations.
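The set-associative property that such partitioning builds on can be summarized with a little arithmetic: the cache set an address maps to is fixed by a few address bits, so physical pages fall into a small number of "colors" that never contend with one another. The sketch below uses illustrative cache parameters, not figures from the paper.

```c
/* Back-of-the-envelope page-coloring arithmetic for a set-associative cache. */
#include <stdio.h>
#include <stdint.h>

#define CACHE_SIZE (2u * 1024 * 1024)  /* 2 MiB last-level cache (example) */
#define LINE_SIZE  64u
#define ASSOC      16u
#define PAGE_SIZE  4096u

int main(void) {
    uint32_t n_sets   = CACHE_SIZE / (LINE_SIZE * ASSOC);   /* 2048 sets  */
    uint32_t n_colors = (n_sets * LINE_SIZE) / PAGE_SIZE;   /* 32 colors  */
    printf("sets: %u, colors: %u\n", n_sets, n_colors);

    /* Two pages of the same color can evict each other's cache lines; pages
     * of different colors cannot. Reserving a color for sensitive data keeps
     * it out of reach of co-resident VMs. */
    uint64_t page_a = 0x12345000, page_b = 0x6789a000;
    printf("color(page_a) = %llu\n",
           (unsigned long long)((page_a / PAGE_SIZE) % n_colors));
    printf("color(page_b) = %llu\n",
           (unsigned long long)((page_b / PAGE_SIZE) % n_colors));
    return 0;
}
```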
@inproceedings{kim:stealthmem, title = {{StealthMem}: System-Level Protection Against Cache-based Side Channel Attacks in the Cloud}, author = {Taesoo Kim and Marcus Peinado and Gloria Mainar-Ruiz}, booktitle = {Proceedings of the 21st USENIX Security Symposium (Security)}, month = aug, year = 2012, address = {Bellevue, WA}, }
Dare is a system that recovers system integrity after intrusions that spread between machines in a distributed system. Dare extends the rollback-and-reexecute recovery model of Retro to distributed system recovery by solving the following challenges: tracking dependencies across machines, repairing network connections, minimizing distributed repair, and dealing with long-running daemon processes. This paper describes an early prototype of Dare, presents some preliminary results, and discusses open problems.
@inproceedings{kim:dare, title = {Recovering from Intrusions in Distributed Systems with {Dare}}, author = {Taesoo Kim and Ramesh Chandra and Nickolai Zeldovich}, booktitle = {Proceedings of the 3rd Asia-Pacific Workshop on Systems (APSys)}, month = jul, year = 2012, address = {Seoul, South Korea}, }
2011
Warp is a system that helps users and administrators of web applications recover from intrusions such as SQL injection, cross-site scripting, and clickjacking attacks, while preserving legitimate user changes. Warp repairs from an intrusion by rolling back parts of the database to a version before the attack, and replaying subsequent legitimate actions. Warp allows administrators to retroactively patch security vulnerabilities - i.e., apply new security patches to past executions - to recover from intrusions without requiring the administrator to track down or even detect attacks. Warp's time-travel database allows fine-grained rollback of database rows, and enables repair to proceed concurrently with normal operation of a web application. Finally, Warp captures and replays user input at the level of a browser's DOM, to recover from attacks that involve a user's browser. For a web server running MediaWiki, Warp requires no application source code changes to recover from a range of common web application vulnerabilities with minimal user input at a cost of 24-27% in throughput and 2-3.2 GB/day in storage.
@inproceedings{chandra:warp, title = {Intrusion Recovery for Database-backed Web Applications}, author = {Ramesh Chandra and Taesoo Kim and Meelap Shah and Neha Narula and Nickolai Zeldovich}, booktitle = {Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP)}, month = oct, year = 2011, address = {Cascais, Portugal}, }
2010
Retro repairs a desktop or server after an adversary compromises it, by undoing the adversary's changes while preserving legitimate user actions, with minimal user involvement. During normal operation, Retro records an action history graph, which is a detailed dependency graph describing the system's execution. Retro uses refinement to describe graph objects and actions at multiple levels of abstraction, which allows for precise dependencies. During repair, Retro uses the action history graph to undo an unwanted action and its indirect effects by first rolling back its direct effects, and then re-executing legitimate actions that were influenced by that change. To minimize user involvement and re-execution, Retro uses predicates to selectively re-execute only actions that were semantically affected by the adversary's changes, and uses compensating actions to handle external effects.
An evaluation of a prototype of Retro for Linux with 2 real-world attacks, 2 synthesized challenge attacks, and 6 attacks from previous work, shows that Retro can often repair the system without user involvement, and avoids false positives and negatives from previous solutions. These benefits come at the cost of 35-127% in execution time overhead and of 4-150 GB of log space per day, depending on the workload. For example, a HotCRP paper submission web site incurs 35% slowdown and generates 4 GB of logs per day under the workload from 30 minutes prior to the SOSP 2007 deadline.
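The repair step can be pictured as a forward walk over the action history graph from the adversary's action, collecting everything it influenced for rollback and selective re-execution. The sketch below uses an invented, much-simplified graph; Retro's real graph records files, processes, and system calls at multiple levels of refinement, and its predicates prune actions that were not semantically affected.

```c
/* Simplified walk over an action history graph to find affected actions. */
#include <stdio.h>

#define MAX_ACTIONS 8

struct action {
    const char *name;
    int deps[MAX_ACTIONS];   /* indices of actions that consumed our output */
    int ndeps;
};

static struct action graph[] = {
    { "attacker: edit /etc/passwd", { 1, 2 }, 2 },
    { "login daemon reads passwd",  { 3 },    1 },
    { "backup job copies passwd",   { 0 },    0 },
    { "user alice logs in",         { 0 },    0 },
};

static int affected[MAX_ACTIONS];

static void mark_affected(int node) {
    if (affected[node])
        return;
    affected[node] = 1;
    for (int i = 0; i < graph[node].ndeps; i++)
        mark_affected(graph[node].deps[i]);   /* follow dependency edges */
}

int main(void) {
    mark_affected(0);   /* start from the intrusion's direct action */
    for (size_t i = 0; i < sizeof(graph) / sizeof(graph[0]); i++)
        if (affected[i])
            printf("roll back / re-execute: %s\n", graph[i].name);
    return 0;
}
```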
@inproceedings{kim:retro, title = {Intrusion recovery using selective re-execution}, author = {Taesoo Kim and Xi Wang and Nickolai Zeldovich and M. Frans Kaashoek}, booktitle = {Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}, month = oct, year = 2010, address = {Vancouver, Canada}, }
UserFS provides egalitarian OS protection mechanisms in Linux. UserFS allows any user - not just the system administrator - to allocate Unix user IDs, to use chroot, and to set up firewall rules in order to confine untrusted code. One key idea in UserFS is representing user IDs as files in a /proc-like file system, thus allowing applications to manage user IDs like any other files, by setting permissions and passing file descriptors over Unix domain sockets. UserFS addresses several challenges in making user IDs egalitarian, including accountability, resource allocation, persistence, and UID reuse. We have ported several applications to take advantage of UserFS; by changing just tens to hundreds of lines of code, we prevented attackers from exploiting application-level vulnerabilities, such as code injection or missing ACL checks in a PHP-based wiki application. Implementing UserFS requires minimal changes to the Linux kernel - a single 3,000-line kernel module - and incurs no performance overhead for most operations, making it practical to deploy on real systems.
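The "user IDs as files" model leans on a standard kernel facility: an open descriptor can be handed to another process over a Unix domain socket with SCM_RIGHTS. The generic sender below illustrates that mechanism; it is not UserFS code.

```c
/* Generic SCM_RIGHTS sender: pass an open fd to another process. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send file descriptor `fd` over the connected Unix domain socket `sock`. */
static int send_fd(int sock, int fd) {
    char dummy = 'x';                                   /* 1 byte of payload */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    char ctrl[CMSG_SPACE(sizeof(int))];
    memset(ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;                      /* fd-passing message */
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```

The receiving side uses recvmsg() and reads the duplicated descriptor out of the SCM_RIGHTS control message, at which point it can be treated like any other open file.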
@inproceedings{kim:userfs, title = {Making {Linux} Protection Mechanisms Egalitarian with {UserFS}}, author = {Taesoo Kim and Nickolai Zeldovich}, booktitle = {Proceedings of the 19th USENIX Security Symposium (Security)}, month = aug, year = 2010, address = {Washington, DC}, }