Static Analysis


A typed slicing compilation of the polymorphic RPC calculus

Kwanghoon Choi, James Cheney, Sam Lindley, Bob Reynders, PPDP, 6 September 2021.

The polymorphic RPC calculus allows programmers to write succinct multitier programs using polymorphic location constructs. However, until now it lacked an implementation. We develop an experimental programming language based on the polymorphic RPC calculus. We introduce a polymorphic Client-Server (CS) calculus with the client and server parts separated. In contrast to existing untyped CS calculi, our calculus is not only able to resolve polymorphic locations statically, but it is also able to do so dynamically. We design a type-based slicing compilation of the polymorphic RPC calculus into this CS calculus, proving type and semantic correctness. We propose a method to erase types unnecessary for execution but retaining locations at runtime by translating the polymorphic CS calculus into an untyped CS calculus, proving semantic correctness.

Available in: DOI, ArXiv


A Polymorphic RPC Calculus

Kwanghoon Choi, James Cheney, Simon Fowler, and Sam Lindley, Science of Computer Programming, Vol.197(102499), October 2020.

The RPC calculus is a simple semantic foundation for multi-tier programming languages such as Links in which located functions can be written for the client-server model. Subsequently, the typed RPC calculus is designed to capture the location information of functions by types and to drive location type-directed slicing compilations. However, the use of locations is currently limited to monomorphic ones, which is one of the gaps to overcome to put into practice the theory of RPC calculi for client-server model. This paper proposes a polymorphic RPC calculus to allow programmers to write succinct multi-tier programs using polymorphic location constructs. Then the polymorphic multi-tier programs can be automatically translated into programs only containing location constants amenable to the existing slicing compilation methods. We formulate a type system for the polymorphic RPC calculus, and prove its type soundness. Also, we design a monomorphization translation together with proofs on its type and semantic correctness for the translation.

Available in: LINK arXiv


A theory of RPC calculi for client-server model

Kwanghoon Choi, Byeong-Mo Chang, Journal of Functional Programming (JFP), Vol.29, pp.1-39, Cambridge University Press, March 2019.

With multi-tier programming languages, programmers can specify the locations of code to run in order to reduce development efforts for the web-based client-server model where programmers write client and server programs separately and test the multiple programs together. The RPC calculus, one of the foundations of those languages by Cooper and Wadler, has the feature of symmetric communication in programmer's writing arbitrarily deep nested client-server interactions. However, the existing research only considers dynamically typed locations. We propose a typed RPC calculus where locations are tracked in type-level. A new located type system paves the way for a theory of RPC calculi for the client-server model.
(In the following papers published in SCP2020 and PPDP2021, the typed RPC calculus will be enhanced with polymorphic locations and a type-based slicing compilation.)

Available in: ArXiv, LINK


자체수정 코드를 탐지하는 정적 분석 방법의 LLVM 프레임워크 기반 구현 및 실험 (An LLVM-Based Implementation of Static Analysis for Detecting Self-Modifying Code and Its Evaluation)

유재일, 최광훈, 한국정보보호학회 논문지, 32권, 2호, Pages 171-179, 2022년 4월.

자체 수정 코드(Self-Modifying-Code)란 실행 시간 동안 스스로 실행 코드를 변경하는 코드를 말한다. 이런 기법은 특히 악성코드가 정적 분석을 우회하는 데 악용된다. 따라서 이러한 악성코드를 효과적으로 검출하려면 자체 수정 코드를 파악하는 것이 중요하다. 그동안 동적 분석 방법으로 자체 수정 코드를 분석해왔으나 이는 시간과 비용이 많이 든다. 만약 정적 분석으로 자체 수정 코드를 검출할 수 있다면 악성코드 분석에 큰 도움이 될 것이다. 본 논문에서는 LLVM IR로 변환한 바이너리 실행 프로그램을 대상으로 자체 수정 코드를 탐지하는 정적 분석방법을 제안하고, 자체 수정 코드 벤치마크를 만들어 이 방법을 적용했다. 본 논문의 실험 결과 벤치마크 프로그램을 컴파일로 변환한 최적화된 형태의 LLVM IR 프로그램에 대해서는 설계한 정적 분석 방법이 효과적이었다. 하지만 바이너리를 리프팅 변환한 비정형화된 LLVM IR 프로그램에 대해서는 자체 수정 코드를 검출하기 어려운 한계가 있었다. 이를 극복하기 위해 바이너리를 리프팅 하는 효과적인 방법이 필요하다.

Self-Modifying-Code is a code that changes the code by itself during execution time. This technique is particularly abused by malicious code to bypass static analysis. Therefor, in order to effectively detect such malicious codes, it is important to identify self-modifying-codes. In the meantime, Self-modify-codes have been analyzed using dynamic analysis methods, but this is time-consuming and costly. If static analysis can detect self-modifying-code it will be of great help to malicious code analysis. In this paper, we propose a static analysis method to detect self-modified code for binary executable programs converted to LLVM IR and apply this method by making a self-modifying-code benchmark. As a result of the experiment in this paper, the designed static analysis method was effective for the standardized LLVM IR program that was compiled and converted to the benchmark program. However, there was a limitation in that it was difficult to detect the self-modifying-code for the unstructured LLVM IR program in which the binary was lifted and transformed. To overcome this, we need an effective way to lift the binary code.

Available in: LINK


A GQM Approach to Evaluation of the Quality of SmartThings Applications Using Static Analysis

Byeong-Mo Chang, Janine Cassandra Son, Kwanghoon Choi KSII Transactions on Internet and Information Systems, 14(6), June 2020.

SmartThings is one of the most popular open platforms for home automation IoT solutions that allows users to create their own applications called SmartApps for personal use or for public distribution. The nature of openness demands high standards on the quality of SmartApps, but there have been few studies that have evaluated this thoroughly yet. As part of software quality practice, code reviews are responsible for detecting violations of coding standards and ensuring that best practices are followed. The purpose of this research is to propose systematically designed quality metrics under the well-known Goal/Question/Metric methodology and to evaluate the quality of SmartApps through automatic code reviews using a static analysis. We first organize our static analysis rules by following the GQM methodology, and then we apply the rules to real-world SmartApps to analyze and evaluate them. A study of 105 officially published and 74 community-created real-world SmartApps found a high ratio of violations in both types of SmartApps, and of all violations, security violations were most common. Our static analysis tool can effectively inspect reliability, maintainability, and security violations. The results of the automatic code review indicate the common violations among SmartApps.

Available in: PDF


유해사이트를 접속하는 안드로이드 앱을 문자열 분석으로 검사하는 시스템

최광훈, 고광만, 박희완, 윤종희, 정보처리학회 논문지A,Vol.19-A, No.4, pp187-194, August 2012. 문현아,박수용,최광훈, 정보과학회 컴퓨팅의 실제 논문지(KTCP), 2018년2월.

안드로이드 기반 스마트폰 앱의 바이너리 코드를 오프라인 상에서 분석하여 유해 사이트 목록에 포함된 서버에 접속하는지 여부를판단하는 시스템을 제안하고, 실제 앱에 대해 적용한 실험 결과를 제시한다. 주어진 앱의 바이너리 코드를 Java 바이트 코드로 역 컴파일하고, 문자열분석을 적용하여 프로그램에서 사용하는 모든 문자열 집합을 계산한 다음, 유해 매체물을 제공하는 사이트 URL을 포함하는지 확인하는방법이다. 이 시스템은 앱을 실행하지 않고 배포 단계에서 검사할 수 있고 앱 마켓 관리에서 유해 사이트를 접속하는 앱을 분류하는 작업을자동화할 수 있는 장점이 있다. DNS 서버를 이용하거나 스마트폰에 모니터링 모듈을 설치하여 차단하는 기존 방법들과 서로 다른 단계에서유해앱을 차단함으로써 상호 보완할 수 있는 방법이 될 수 있다.

This paper proposes a string analysis based system for classifying Android Apps that may access so called harmful sites, and shows an experiment result for real Android apps on the market. The system first transforms Android App binary codes into Java byte codes, it performs string analysis to compute a set of strings at all program points, and it classifies the Android App as bad ones if the computed set contains URLs that are classified because the sites provide inappropriate contents. In the proposed approach, the system performs such a classification in the stage of distribution before installing and executing the Apps. Furthermore, the system is suitable for the automatic management of Android Apps in the market. The proposed system can be combined with the existing methods using DNS servers or monitoring modules to identify harmful Android apps better in different stages.

Available in: PDF


A Type and Effect System for Activation Flow of Components in Android Programs

Kwanghoon Choi, Byeong-Mo Chang, Information Processing Letters, 114(11):620-627, November 2014. (PDF)

This paper proposes a type and effect system for analyzing activation flow between components through intents in Android programs. The activation flow information is necessary for all Android analyses such as a secure information flow analysis for Android programs. We first design a formal semantics for a core of featherweight Android/Java, which can address interaction between components through intents. Based on the formal semantics, we design a type and effect system for analyzing activation flow between components and demonstrate the soundness of the system.

Available in: PDF


Typing Zinc Machine with Generalized Algebraic Data Types

Kwanghoon Choi and Seog Park, IEICE Transactions on Information and Systems, Vol.E94-D, No.6, pp.1190-1200, June 2011. (PDF)

The Krivine-style evaluation mechanism is well-known in the implementation of higher-order functions, allowing to avoid some useless closure building. There have been a few type systems that can verify the safety of the mechanism. The incorporation of the proposed ideas into an existing compiler, however, would require significant changes in the type system of the compiler due to the use of some dedicated form of types and typing rules in the proposals. This limitation motivates us to propose an alternative light-weight Krivine typing mechanism that does not need to extend any existing type system significantly. This paper shows how GADTs (Generalized algebraic data types) can be used for typing a ZINC machine following the Krivine-style evaluation mechanism. This idea is new as far as we know. Some existing typed compilers like GHC (Glasgow Haskell compiler) already support GADTs; they can benefit from the Krivine-style evaluation mechanism in the operational semantics with no particular extension in their type systems for the safety. We show the GHC type checker allows to prove mechanically that ZINC instructions are well-typed, which highlights the effectiveness of GADTs.

Available in: PDF


A Type System for the Push-Enter Model

Kwanghoon Choi and Taisook Han, Information Processing Letters, 87(4):205-211, August 31, 2003 (PDF)

In this paper, we first develop a simple compilation method based on the push-enter model. The compilation method uses its own states to indicate dynamic argument status. We design generic types for the states, and we develop a type system on the basis of these generic types. We then demonstrate how the compilation method is done within our type system.

Available in: PDF