Georgia Tech at ACM CCS 2017


Atlanta, Oct. 30, 2017

Cybersecurity researchers from the Georgia Institute of Technology brought more discoveries than any other organization worldwide to the peer-reviewed Association for Computing Machinery's Conference on Computer and Communications Security (ACM CCS) 2017, held in Dallas, Oct. 30 - Nov. 2.

Out of 836 cybersecurity research papers submitted by universities and by technology giants such as Microsoft and Google, just 18 percent were accepted into the conference. Georgia Tech led all organizations with eight accepted papers, a total matched only by Cornell University and Cornell Tech combined.

Among the research findings from Georgia Tech are: an in-depth analysis of abusive "combosquatting," the use of intentionally misleading domain names to lure web users onto malicious sites; a tool that helps mobile app developers quickly identify software license violations; new fuzzing techniques; and a new technology that provides forensic investigators with a detailed record of an intrusion, even if attackers attempted to cover their tracks.

Nearly 200 organizations and 900 attendees gathered for ACM CCS 2017 -- the flagship annual conference of the Special Interest Group on Security, Audit and Control (SIGSAC) of ACM.



"Designing New Operating Primitives to Improve Fuzzing Performance"

in collaboration with Virginia Tech
Wen Xu and Sanidhya Kashyap (Georgia Tech); Changwoo Min (Virginia Tech); Taesoo Kim (Georgia Tech)

Fuzzing is a software testing technique that finds bugs by repeatedly injecting mutated inputs into a target program. Known to be a highly practical approach, fuzzing is more popular than ever. Current research on fuzzing has focused on generating inputs that are more likely to trigger a vulnerability.

In this paper, we tackle another way to improve fuzzing performance: shortening the execution time of each iteration. We observe that AFL, a state-of-the-art fuzzer, slows down by 24× when it runs on 120 cores in parallel, because of file system contention and the poor scalability of the fork() system call. Other fuzzers are expected to suffer from the same scalability bottlenecks because they follow a similar design pattern. To improve fuzzing performance, we design and implement three new operating primitives specialized for fuzzing that remove these bottlenecks and achieve scalable performance on multi-core machines. Our experiments show that the proposed primitives speed up AFL and LibFuzzer by 6.1 to 28.9× and 1.1 to 735.7×, respectively, in overall executions per second when targeting Google's fuzzer test suite with 120 cores. In addition, the primitives improve AFL's throughput by up to 7.7× with 30 cores, a more common setting in data centers. Our fuzzer-agnostic primitives can easily be applied to any fuzzer, yield fundamental performance improvements, and directly benefit large-scale fuzzing and cloud-based fuzzing services.
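To make the fuzzing loop and the executions-per-second metric concrete, here is a minimal mutation-fuzzing sketch. It is not AFL, LibFuzzer, or the paper's primitives: `target`, `mutate`, and `fuzz` are hypothetical names, the target is an in-process toy that "crashes" on a leading zero byte, and the only mutation is a single random byte overwrite. Because it calls the target in-process, it also sidesteps the per-execution fork() that AFL relies on and that the paper identifies as a scalability bottleneck.

```python
import random
import time


def target(data: bytes) -> None:
    """Hypothetical program under test: "crashes" when the first byte is 0x00."""
    if data and data[0] == 0x00:
        raise ValueError("simulated crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng_seed: int = 0):
    """Mutate the seed repeatedly, run the target, and collect crashing inputs."""
    rng = random.Random(rng_seed)
    crashes = []
    start = time.perf_counter()
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            crashes.append(candidate)
    elapsed = time.perf_counter() - start
    # Throughput metric: executions per second.
    execs_per_sec = iterations / elapsed if elapsed > 0 else float("inf")
    return crashes, execs_per_sec


crashes, eps = fuzz(b"hello", 50_000)
print(f"{len(crashes)} crashing inputs, {eps:,.0f} execs/sec")
```

Real fuzzers keep a corpus of interesting inputs and use coverage feedback to choose what to mutate next; this sketch always mutates the original seed.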

"FlashGuard: Leveraging Intrinsic Flash Properties to Defend Against Encryption Ransomware"

in collaboration with Pennsylvania State University
Jian Huang (Georgia Tech); Jun Xu, Xinyu Xing, and Peng Liu (Pennsylvania State University); Moinuddin K. Qureshi (Georgia Tech)

Encryption ransomware is malicious software that stealthily encrypts user files and demands a ransom in exchange for access to those files. Several prior studies have developed systems that detect ransomware by monitoring the activities that typically occur during a ransomware attack. Unfortunately, by the time the ransomware is detected, some files have already been encrypted, and the user must still pay a ransom to regain access to them. Furthermore, ransomware variants can obtain kernel privilege, which allows them to terminate software-based defenses such as anti-virus. While periodic backups have been explored as a means to mitigate ransomware, such backups incur storage overheads and remain vulnerable, because ransomware with kernel privilege can stop or destroy them. Ideally, we would like to defend against ransomware without relying on software-based solutions and without incurring the storage overheads of backups.

To that end, this paper proposes FlashGuard, a ransomware-tolerant Solid State Drive (SSD) with a firmware-level recovery system that allows quick and effective recovery from encryption ransomware without relying on explicit backups. FlashGuard leverages the observation that existing SSDs already perform out-of-place writes to mitigate the long erase latency of flash memory. Therefore, when a page is updated or deleted, the older copy of that page remains in the SSD until garbage collection reclaims it. FlashGuard slightly modifies the SSD's garbage collection mechanism to retain the copies of data encrypted by ransomware and ensure effective data recovery. Our experiments with 1,447 manually labeled ransomware samples show that FlashGuard can efficiently restore files encrypted by ransomware. In addition, we demonstrate that FlashGuard has a negligible impact on the performance and lifetime of the SSD.
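The idea that out-of-place writes leave old copies behind, and that garbage collection can be taught to keep some of them, can be sketched with a toy page-mapped flash translation layer (FTL). This is a hypothetical illustration, not FlashGuard's firmware: real SSDs erase whole blocks, the class and method names are invented, and the retention rule (keep the old copy of a page that was read before being overwritten, roughly the read-then-encrypt pattern of ransomware) is an assumption standing in for FlashGuard's actual policy, which this sketch also simplifies by omitting any time-based expiry.

```python
class ToyFTL:
    """Toy page-mapped FTL with out-of-place writes (illustrative only)."""

    def __init__(self, num_pages: int):
        self.flash = [None] * num_pages       # physical page contents
        self.free = list(range(num_pages))    # free physical page numbers
        self.l2p = {}                         # logical page -> physical page
        self.read_since_write = set()         # logical pages read since last write
        self.stale = {}                       # physical page -> (logical page, retained?)

    def read(self, lpn: int) -> bytes:
        self.read_since_write.add(lpn)
        return self.flash[self.l2p[lpn]]

    def write(self, lpn: int, data: bytes) -> None:
        if not self.free:
            self.garbage_collect()
        ppn = self.free.pop(0)                # out-of-place: always a fresh page
        self.flash[ppn] = data
        old = self.l2p.get(lpn)
        if old is not None:
            # The previous copy stays on flash; retain it if it was read first
            # (assumed stand-in for FlashGuard's retention rule).
            self.stale[old] = (lpn, lpn in self.read_since_write)
        self.read_since_write.discard(lpn)
        self.l2p[lpn] = ppn

    def garbage_collect(self) -> None:
        # Reclaim stale pages, but skip the retained ones.
        for ppn, (_, retained) in list(self.stale.items()):
            if not retained:
                self.flash[ppn] = None
                self.free.append(ppn)
                del self.stale[ppn]
        if not self.free:
            raise RuntimeError("flash full: only retained pages remain")

    def recover(self, lpn: int) -> list:
        # Older retained versions of a logical page, oldest first.
        return [self.flash[ppn] for ppn, (owner, retained) in self.stale.items()
                if owner == lpn and retained]


ftl = ToyFTL(num_pages=8)
ftl.write(0, b"quarterly report")
ftl.read(0)                       # ransomware reads the file...
ftl.write(0, b"<ciphertext>")     # ...then overwrites it with ciphertext
ftl.write(1, b"log v1")
ftl.write(1, b"log v2")           # overwrite without a prior read: not retained
print(ftl.recover(0))             # [b'quarterly report']
print(ftl.recover(1))             # []
```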

"Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse"

in collaboration with Stony Brook University and London South Bank University
Panagiotis Kintis (Georgia Tech); Najmeh Miramirkhani (Stony Brook University); Charles Lever, Yizheng Chen, and Roza Romero-Gómez (Georgia Tech); Nikolaos Pitropakis (London South Bank University); Nick Nikiforakis (Stony Brook University); Manos Antonakakis (Georgia Tech)

Domain squatting is a common adversarial practice in which attackers register domain names that are purposefully similar to popular domains. In this work, we study a specific type of domain squatting called "combosquatting," in which attackers register domains that combine a popular trademark with one or more phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first large-scale empirical study of combosquatting by analyzing more than 468 billion DNS records, collected from passive and active DNS data sources over almost six years. We find that almost 60% of abusive combosquatting domains live for more than 1,000 days, and, worse still, we observe increasing combosquatting activity year over year. Moreover, we show that combosquatting is used to perform a spectrum of different types of abuse, including phishing, social engineering, affiliate abuse, trademark abuse, and even advanced persistent threats. Our results suggest that combosquatting is a real problem that requires increased scrutiny by the security community.
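The definition above (a popular trademark combined with extra text) can be sketched as a simple check. This is a hypothetical illustration, not the study's detection pipeline: `registered_label` and `combosquatted_brand` are invented names, the code assumes a single-label public suffix such as `.com` (production code should consult the Public Suffix List), and plain substring matching will also flag legitimate brand-owned domains, which a real study must filter out.

```python
from typing import List, Optional


def registered_label(domain: str) -> str:
    """Return the label just left of the TLD, e.g. 'youtube-live' for 'www.youtube-live.com'.

    Assumes a single-label public suffix (such as .com) and accepts
    defanged notation like 'betterfacebook[.]com'.
    """
    labels = domain.lower().replace("[.]", ".").strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def combosquatted_brand(domain: str, trademarks: List[str]) -> Optional[str]:
    """Return the trademark that the domain combines with extra text, or None."""
    label = registered_label(domain)
    for mark in trademarks:
        # The exact trademark must appear, plus something else; a misspelled
        # trademark (typosquatting) does not qualify.
        if mark in label and label != mark:
            return mark
    return None


brands = ["facebook", "youtube"]
for d in ["betterfacebook[.]com", "youtube-live.com", "www.facebook.com", "faceboook.com"]:
    print(d, "->", combosquatted_brand(d, brands))
```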

"Identifying Open-Source License Violation and 1-day Security Risk at Large Scale"

Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, and Wenke Lee (Georgia Tech)

With millions of apps available to users, the mobile app market is rapidly becoming very crowded. Given the intense competition, time to market is a critical factor in the success and profitability of an app. To shorten the development cycle, developers often focus their efforts on the unique features and workflows of their apps and rely on third-party Open Source Software (OSS) for common features. Unfortunately, despite its benefits, careless use of OSS can introduce significant legal and security risks, which, if ignored, can not only jeopardize the security and privacy of end users but also cause app developers high financial losses. However, tracking OSS components, their versions, and their interdependencies can be very tedious and error-prone, particularly if an OSS component is imported with little or no knowledge of its provenance.

We therefore propose OSSPolice, a scalable and fully automated tool that lets mobile app developers quickly analyze their apps and identify free software license violations as well as uses of known vulnerable versions of OSS. OSSPolice introduces a novel hierarchical indexing scheme to achieve both high scalability and accuracy, and it can efficiently compare app binaries against a database of hundreds of thousands of OSS sources (billions of lines of code). We populated OSSPolice with 60K C/C++ and 77K Java OSS sources and analyzed 1.6M free Google Play Store apps. Our results show that 1) over 40K apps potentially violate GPL/AGPL licensing terms, and 2) over 100K apps use known vulnerable versions of OSS. Further analysis shows that developers violate GPL/AGPL licensing terms due to a lack of alternatives, and that they use vulnerable versions of OSS despite efforts by companies like Google to improve app security. OSSPolice is available on GitHub.
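A two-level matching idea (first decide whether a library is present at all, then decide which version) can be sketched over sets of features extracted from a binary, such as string literals or exported function names. This is a simplified, hypothetical stand-in for OSSPolice's hierarchical index: the `OSSLibrary` record, the 0.5 presence threshold, the feature sets, and the library names `libfoo`/`libbar` are all invented for illustration, and a "copyleft" hit means only a potential license issue, since whether GPL/AGPL terms are actually violated depends on how the app is distributed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

COPYLEFT = {"GPL", "AGPL"}


@dataclass
class OSSLibrary:
    name: str
    license: str
    common_features: Set[str]              # features shared by every version
    version_features: Dict[str, Set[str]]  # version -> version-specific features
    vulnerable_versions: Set[str] = field(default_factory=set)


def detect_version(app: Set[str], lib: OSSLibrary, threshold: float = 0.5) -> Optional[str]:
    """Level 1: is the library present? Level 2: which version fits best?"""
    if not lib.common_features:
        return None
    coverage = len(app & lib.common_features) / len(lib.common_features)
    if coverage < threshold:
        return None  # library not detected
    best, best_hits = "unknown", 0
    for version, feats in lib.version_features.items():
        hits = len(app & feats)
        if hits > best_hits:
            best, best_hits = version, hits
    return best


def audit(app: Set[str], libraries: List[OSSLibrary]) -> List[dict]:
    findings = []
    for lib in libraries:
        version = detect_version(app, lib)
        if version is None:
            continue
        findings.append({
            "library": lib.name,
            "version": version,
            "copyleft": lib.license in COPYLEFT,
            "known_vulnerable": version in lib.vulnerable_versions,
        })
    return findings


libraries = [
    OSSLibrary("libfoo", "GPL", {"foo_init", "foo_parse", "foo_free"},
               {"1.0": {"libfoo 1.0"}, "1.1": {"libfoo 1.1", "foo_fast"}},
               vulnerable_versions={"1.0"}),
    OSSLibrary("libbar", "MIT", {"bar_open", "bar_close"}, {"2.0": {"libbar 2.0"}}),
]
app_features = {"foo_init", "foo_parse", "libfoo 1.0", "main"}
print(audit(app_features, libraries))
```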

"Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection"

in collaboration with Shanghai Jiao Tong University, University of California, Berkeley, Samsung Research America, and University of California, Riverside
Xiaojun Xu (Shanghai Jiao Tong University); Chang Liu (University of California, Berkeley); Qian Feng (Samsung Research America); Heng Yin (University of California, Riverside); Le Song (Georgia Tech); Dawn Song (University of California, Berkeley)

Cross-platform binary code similarity detection aims to determine whether two binary functions from different platforms are similar. It has many security applications, including plagiarism detection, malware detection, and vulnerability search. Existing approaches rely on approximate graph-matching algorithms, which are inevitably slow, sometimes inaccurate, and hard to adapt to new tasks. To address these issues, we propose a novel neural network-based approach that computes an embedding (i.e., a numeric vector) from the control flow graph of each binary function, so that similarity detection can be performed efficiently by measuring the distance between the embeddings of two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms state-of-the-art approaches by large margins in similarity detection accuracy. Further, Gemini speeds up embedding generation by three to four orders of magnitude compared with prior art and reduces the required training time from more than one week to between 30 minutes and 10 hours. Our real-world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state of the art (i.e., Genius). Our research showcases a successful application of deep learning to computer security problems.
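The embed-then-compare idea can be sketched in plain Python with a Structure2vec-style update, which, to our understanding, is the graph embedding network Gemini builds on: each basic block's vector is repeatedly recomputed as tanh(W1·x_v + P·(sum of neighbor vectors)), and the function's embedding is W2 applied to the sum of block vectors. Assumptions, all ours: the three per-block attributes are hypothetical counts, edges are treated as undirected, the neighbor transform is a single linear layer `P`, and the weights are random rather than learned from training pairs, so only structural properties (not accuracy) can be demonstrated here.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def embed_acfg(features: List[List[float]], edges: List[Tuple[int, int]],
               W1: Matrix, P: Matrix, W2: Matrix, iterations: int = 3) -> List[float]:
    """Structure2vec-style embedding of an attributed control flow graph."""
    n, dim = len(features), len(W1)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    mu = [[0.0] * dim for _ in range(n)]
    for _ in range(iterations):
        new_mu = []
        for i in range(n):
            agg = [0.0] * dim
            for j in neighbors[i]:
                agg = [a + m for a, m in zip(agg, mu[j])]
            pre = [a + b for a, b in zip(matvec(W1, features[i]), matvec(P, agg))]
            new_mu.append([math.tanh(x) for x in pre])
        mu = new_mu
    pooled = [sum(col) for col in zip(*mu)]  # sum over all blocks
    return matvec(W2, pooled)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


rng = random.Random(42)
W1, P, W2 = random_matrix(8, 3, rng), random_matrix(8, 8, rng), random_matrix(8, 8, rng)
f1 = [[3.0, 1.0, 0.0], [5.0, 0.0, 2.0], [2.0, 0.0, 1.0]]
e1 = [(0, 1), (1, 2), (0, 2)]
# Same graph with its blocks listed in a different order (new k = old [2, 0, 1][k]).
f1_perm, e1_perm = [f1[2], f1[0], f1[1]], [(1, 2), (2, 0), (1, 0)]
f2, e2 = [[10.0, 4.0, 0.0], [1.0, 0.0, 0.0]], [(0, 1)]
emb1 = embed_acfg(f1, e1, W1, P, W2)
emb1_perm = embed_acfg(f1_perm, e1_perm, W1, P, W2)
emb2 = embed_acfg(f2, e2, W1, P, W2)
print(round(cosine(emb1, emb1_perm), 6))  # prints 1.0
print(round(cosine(emb1, emb2), 6))
```

Summing block vectors makes the embedding independent of block ordering, and comparing two functions then costs one vector distance instead of a graph-matching run.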

"Practical Attacks Against Graph-based Clustering"

in collaboration with University of North Carolina at Chapel Hill, University of Georgia, and Symantec CAML Group
Yizheng Chen, Yacin Nadji, and Athanasios Kountouras (Georgia Tech); Fabian Monrose (University of North Carolina at Chapel Hill); Roberto Perdisci (University of Georgia); Manos Antonakakis (Georgia Tech); Nikolaos Vasiloglou (Symantec CAML Group)

Graph modeling allows numerous security problems to be tackled in a general way; however, little work has been done to understand how well graph-based systems withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically graph-based clustering techniques and a global feature space, in which defenders, to be practical, must account for realistic attackers who lack perfect knowledge. Even though less-informed attackers can evade graph clustering at low cost, we show that some practical defenses are possible.
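The following toy, of our own construction and not the paper's attacks or target system, illustrates why graph clustering can be fragile. A hypothetical detector groups hosts and the domains they query into connected components of a bipartite graph, then flags every unknown domain in a component whose share of blocklisted domains reaches a threshold (0.3 here, an arbitrary choice). A single injected query from an infected host to a popular benign domain merges the malicious component into a large benign one and dilutes that share below the threshold. All names (`h:`/`d:` node prefixes, `.example` domains, function names) are illustrative.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Union-find over an undirected edge list."""
    parent = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def flag_unknown_domains(edges, blocklist: Set[str], threshold: float = 0.3) -> Set[str]:
    """Flag unknown domains that share a cluster with enough blocklisted ones."""
    flagged = set()
    for comp in connected_components(edges):
        domains = {n for n in comp if n.startswith("d:")}
        if domains and len(domains & blocklist) / len(domains) >= threshold:
            flagged |= domains - blocklist
    return flagged


benign = [("h:alice", "d:news.example"), ("h:bob", "d:news.example"),
          ("h:bob", "d:mail.example"), ("h:carol", "d:mail.example"),
          ("h:carol", "d:video.example"), ("h:dave", "d:video.example"),
          ("h:dave", "d:shop.example"), ("h:erin", "d:shop.example"),
          ("h:erin", "d:search.example")]
malicious = [("h:infected1", "d:bad1.example"), ("h:infected1", "d:bad2.example"),
             ("h:infected2", "d:bad2.example"), ("h:infected2", "d:bad3.example")]
blocklist = {"d:bad1.example"}

before = flag_unknown_domains(benign + malicious, blocklist)
noise = [("h:infected1", "d:news.example")]  # one injected benign-looking query
after = flag_unknown_domains(benign + malicious + noise, blocklist)
print(sorted(before))  # ['d:bad2.example', 'd:bad3.example']
print(sorted(after))   # []
```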

"RAIN: Refinable Attack Investigation with On-demand Inter-Process Information Flow Tracking"

Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee (Georgia Tech)

As modern attacks become stealthier and more persistent, detecting or preventing them in their early stages becomes virtually impossible. Instead, an attack investigation or provenance system aims to continuously monitor and log interesting system events with minimal overhead. Later, if the system observes any anomalous behavior, it analyzes the log to identify who initiated the attack and which resources the attack affected, and then to assess and recover from any damage incurred. However, because of a fundamental tradeoff between log granularity and system performance, existing systems either record only system-call events, without the detailed program-level activities (e.g., memory operations) required to accurately reconstruct attack causality, or demand that every monitored program be instrumented to provide program-level information.

To address this issue, we propose RAIN, a Refinable Attack INvestigation system based on a record-replay technology that records system-call events during runtime and performs instruction-level dynamic information flow tracking (DIFT) during on-demand process replay. Instead of replaying every process with DIFT, RAIN conducts system-call-level reachability analysis to filter out unrelated processes and minimize the number of processes to be replayed, making inter-process DIFT feasible. Evaluation results show that RAIN effectively prunes out unrelated processes and determines attack causality with negligible false positive rates. In addition, the runtime overhead of RAIN is similar to that of existing system-call-level provenance systems, and its analysis overhead is much smaller than that of full-system DIFT.
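RAIN's coarse pruning step, system-call-level reachability analysis that decides which processes are worth replaying with DIFT, can be sketched as a time-ordered forward traversal over information-flow events. The event format (`time`, `source`, `destination`), the `proc:`/`file:`/`sock:` naming, and the function name are assumptions for illustration; the sketch covers only forward reachability and omits the instruction-level DIFT refinement that RAIN performs during replay. Processing events in time order matters: a process that read a file before a tainted process wrote it is not affected.

```python
from typing import Iterable, Set, Tuple

Event = Tuple[int, str, str]  # (time, information source, information destination)


def reachable_processes(events: Iterable[Event], start: str) -> Set[str]:
    """Processes that information from `start` could have reached, respecting time order."""
    tainted = {start}
    for _, src, dst in sorted(events):
        if src in tainted:
            tainted.add(dst)
    return {node for node in tainted if node.startswith("proc:")}


events = [
    (0, "file:/home/u/notes.txt", "proc:editor"),    # unrelated activity
    (0, "file:/tmp/payload", "proc:backup"),         # read before payload was written
    (1, "sock:203.0.113.5:443", "proc:browser"),     # browser receives remote data
    (2, "proc:browser", "file:/tmp/payload"),        # browser writes a file
    (3, "file:/tmp/payload", "proc:sh"),             # shell reads the payload
    (4, "file:/etc/passwd", "proc:sh"),
    (5, "proc:sh", "sock:198.51.100.7:80"),          # shell sends data out
    (6, "proc:editor", "file:/home/u/notes.txt"),
]
to_replay = reachable_processes(events, "sock:203.0.113.5:443")
print(sorted(to_replay))  # ['proc:browser', 'proc:sh']
```

Only the processes in `to_replay` would then be candidates for expensive replay with DIFT.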

"Tail Attacks on Web Applications"

in collaboration with Louisiana State University
Huasong Shan and Qingyang Wang (Louisiana State University); Calton Pu (Georgia Tech)

As Distributed Denial-of-Service (DDoS) attacks have extended to the application layer in recent years, researchers have paid close attention to these new variants, whose low-volume, intermittent patterns make them stealthier and render state-of-the-art DDoS detection and defense mechanisms ineffective. We describe a new type of low-volume application-layer DDoS attack: Tail Attacks on Web Applications. This attack exploits a newly identified system vulnerability of n-tier web applications (millibottlenecks with sub-second duration, and resource contention with strong dependencies among distributed nodes). Its goal is to cause a long-tail latency problem in the target web application (e.g., 95th percentile response time > 1 second) and damage the long-term business of the service provider while all system resources remain far from saturation, which makes the cause of the performance degradation difficult to trace.

We present a modified queueing network model to analyze the impact of our attacks on n-tier architecture systems, and we numerically solve for the optimal attack parameters. We adopt a feedback control-theoretic framework (e.g., a Kalman filter) that allows attackers to fit the dynamics of background requests or system state by dynamically adjusting attack parameters. To evaluate the practicality of such attacks, we conduct extensive validation through analytical, numerical, and simulation results, as well as experiments in a real cloud production setting using a representative benchmark website equipped with state-of-the-art DDoS defense tools. We further propose a solution to detect and defend against the proposed attacks, involving three stages: fine-grained monitoring, identifying bursts, and blocking bots.
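On the defense side, the three stages named above can be sketched in Python. Everything specific here is an assumption for illustration, not a parameter or rule from the paper: `find_suspects` is a hypothetical name, the monitoring window is 50 ms, a burst is any window whose request count exceeds the mean plus three standard deviations, and a client is suspicious when at least 80% of its requests fall inside burst windows. The last stage only flags clients; actual blocking is left to the deployment.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Set, Tuple

Request = Tuple[float, str]  # (timestamp in seconds, client id)


def find_suspects(requests: List[Request], window: float = 0.05,
                  k: float = 3.0, min_share: float = 0.8) -> Tuple[Set[int], Set[str]]:
    # Stage 1: fine-grained monitoring, i.e. request counts per short window.
    idx = [int(t // window) for t, _ in requests]
    counts = Counter(idx)
    lo, hi = min(idx), max(idx)
    series = [counts.get(w, 0) for w in range(lo, hi + 1)]  # include empty windows
    # Stage 2: identify bursts, i.e. windows far above the average rate.
    threshold = mean(series) + k * pstdev(series)
    bursts = {w for w in range(lo, hi + 1) if counts.get(w, 0) > threshold}
    # Stage 3: flag clients whose traffic falls mostly inside burst windows.
    total, in_burst = Counter(), Counter()
    for (_, client), w in zip(requests, idx):
        total[client] += 1
        if w in bursts:
            in_burst[client] += 1
    suspects = {c for c in total if in_burst[c] / total[c] >= min_share}
    return bursts, suspects


# Synthetic traffic: ten users spread evenly over 10 s, one bot sending
# five tight bursts of 20 requests each.
requests = [(c * 0.01 + i * 0.1, f"user{c}") for c in range(10) for i in range(100)]
requests += [(s + j * 0.001, "bot") for s in (1.02, 3.02, 5.02, 7.02, 9.02) for j in range(20)]
bursts, suspects = find_suspects(requests)
print(len(bursts), sorted(suspects))
```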