Master Thesis
Automatic Refactorization.
My Master Thesis topic is source code refactoring and its automation. I’ll survey existing tools that support source code analysis and help introduce code improvements. I’ll also describe the current state of research in this field and outline various theoretical approaches. The project part will include writing a tool that detects bad and good code and automatically applies corrections to bad code.
Why automatic?
Refactoring code can be a deadly boring task, especially when applied to very bad legacy code. It takes a lot of time and therefore introduces high costs. These costs are unavoidable, since without refactoring the maintenance of existing code might become impossible. To keep software reliable, it has to be modified continually. Although transformations require an understanding of the design before and after the change, some of them are quite repetitive. If we could automate them, it would lower the costs of maintenance and let developers focus more on functionality.
Automatic refactorization includes both the recognition of bad and good coding patterns and code transformations. In particular, I want to focus on the detection of bad-code-smells and design-patterns.
We can benefit from automatic refactorization in these ways:
- Lower costs of code maintenance
- Quick recognition of common problems in code
- Help with measuring code quality
- Better understanding of the system design through design-pattern detection
- Help with documenting the system
- Automatic fixes for simple common problems and gotchas
- Detection of malformed implementations of complex abstract concepts such as design-patterns
- Suggested modifications to correct malformed patterns
Current state: tools
There is a set of open-source tools available online that provide some kinds of automatic refactorization:
- PMD, Checkstyle - these tools provide code metrics (like line count per class) and help discover bad-code-smells and anti-patterns within code. These include, for instance:
  - Eaten (silently swallowed) exceptions
  - Interface without methods (should be an Enum)
  - A class containing only static methods (should be a Singleton)
  - Nested ternary (a?b:c) operators
  - Switch statements without a default case
  - Unreleased resources (missing X.close())
  - A final class with protected methods
  - Overly large classes and methods
- Google Singleton Detector - can find some instances of the Singleton pattern by analyzing Java bytecode
- Eclipse Refactorings - cannot find problems, but assist with transformations through the UI
- Jackpot - finds problems in code expressed as patterns but cannot modify code
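To make a few of the smells listed above concrete, here is a minimal sketch of how three of them could be checked mechanically. PMD and Checkstyle analyze parsed source code; this sketch swaps in plain Java reflection on compiled classes only to stay self-contained. All names (SmellSketch, inspect, MathUtil, Sealed, Marker, Plain) are invented for this illustration and are not part of any of the tools above.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: three simple smells detected via reflection. */
class SmellSketch {

    // Sample classes to inspect (invented for this example).
    static class MathUtil {                    // contains only static methods
        static int twice(int x) { return 2 * x; }
    }

    static final class Sealed {                // final class with a protected method
        protected void hook() { }
    }

    interface Marker { }                       // interface without methods

    static class Plain {                       // no smell expected
        public void run() { }
    }

    /** Returns a list of findings for one class; empty if nothing matched. */
    static List<String> inspect(Class<?> type) {
        List<String> findings = new ArrayList<String>();
        int declared = 0;
        int statics = 0;
        boolean hasProtected = false;

        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue;                      // skip compiler-generated methods
            }
            declared++;
            if (Modifier.isStatic(m.getModifiers())) {
                statics++;
            }
            if (Modifier.isProtected(m.getModifiers())) {
                hasProtected = true;
            }
        }

        if (type.isInterface() && declared == 0) {
            findings.add("interface without methods");
        }
        if (!type.isInterface() && declared > 0 && statics == declared) {
            findings.add("only static methods");
        }
        if (Modifier.isFinal(type.getModifiers()) && hasProtected) {
            findings.add("final class with protected methods");
        }
        return findings;
    }

    public static void main(String[] args) {
        Class<?>[] samples = { MathUtil.class, Sealed.class, Marker.class, Plain.class };
        for (Class<?> c : samples) {
            System.out.println(c.getSimpleName() + ": " + inspect(c));
        }
    }
}
```

Smells such as eaten exceptions or nested ternary operators concern the inside of method bodies and cannot be seen through reflection; they need a source or bytecode level analysis.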
Current state: research
There is also some theoretical research on the subject. It builds on the following formalisms and techniques:
- Formal Concept Analysis - especially representing inheritance hierarchies as a lattice
- rho-Calculus
- Logic Metaprogramming - using Prolog to find bad-code patterns in code
- Constraint Satisfaction Problems (CSP) - bad-code patterns are represented as constraints and logic engines search for them
- Home-made algorithms - these are “brute-force” and inflexible, and break down when a pattern is even slightly distorted
The project: goals
My Master Thesis goals include:
- Finding patterns in code
  - Finding simple (exact) patterns in source code - bad-code-smells and simple anti-patterns, like PMD & Checkstyle do
  - Finding complex (exact) patterns in source code, like design-patterns, that require more in-depth code analysis, looking at relationships between code constructs
  - Finding complex but distorted patterns in source code, especially malformed design-patterns
- Applying transformations
  - Applying user-defined transformations to defined matches - e.g. finding where exceptions are eaten and inserting code to log (or re-throw) them
  - Creating suggestions to correct distorted design-patterns - for instance, finding a Singleton with a non-private constructor and making it private
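The last goal, finding a Singleton with a non-private constructor, can be illustrated with a small sketch. It is only a heuristic under stated assumptions: a class is treated as a Singleton candidate if it declares a static field of its own type, and reflection stands in for the AST and Prolog based analysis planned for the project. The names SingletonCheckSketch, looksLikeSingleton, suggestions, Registry and Config are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: detect a distorted Singleton and suggest a fix. */
class SingletonCheckSketch {

    // Malformed Singleton: the public constructor lets clients bypass getInstance().
    static class Registry {
        private static final Registry INSTANCE = new Registry();
        public Registry() { }
        public static Registry getInstance() { return INSTANCE; }
    }

    // Well-formed Singleton: the constructor is private.
    static class Config {
        private static final Config INSTANCE = new Config();
        private Config() { }
        public static Config getInstance() { return INSTANCE; }
    }

    /** Heuristic (an assumption of this sketch): a static field of the class's own type. */
    static boolean looksLikeSingleton(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == type) {
                return true;
            }
        }
        return false;
    }

    /** Returns one suggestion per non-private constructor of a Singleton candidate. */
    static List<String> suggestions(Class<?> type) {
        List<String> out = new ArrayList<String>();
        if (!looksLikeSingleton(type)) {
            return out;
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isSynthetic()) {
                continue;                      // skip compiler-generated constructors
            }
            if (!Modifier.isPrivate(c.getModifiers())) {
                out.add("make constructor of " + type.getSimpleName() + " private");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(suggestions(Registry.class));
        System.out.println(suggestions(Config.class));
    }
}
```

In this sketch the result is only a textual suggestion; actually rewriting the constructor's modifier would be the job of a Transformation Rule applied to the AST.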
The project: concept design
There will be a repository of user-defined Prolog rules. These rules will fall into two categories:
- Matching Rules - these will express bad-code-smells, anti-patterns, design-patterns
- Transformation Rules - these will express code transformations applied to code that is found with Matching Rules
The procedure for automatically refactoring code is as follows:
- The source code is parsed into a Java AST.
- A Visitor traverses the AST and builds up a repository of Prolog facts about the code.
- The user selects the code patterns of interest by choosing Matching Rules.
- The Prolog inference engine applies the selected Matching Rules to the facts already gathered and returns a list of all matched source code.
- The user selects which matched code to refactor, then for each code snippet selects the Transformation Rules that will be applied automatically.
- The Prolog inference engine applies the Transformation Rules to the AST of the selected code snippets.
- The AST is transformed back into source code.
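The fact-gathering and matching steps of this procedure can be sketched in a few lines. This is a toy model under stated assumptions, not the planned implementation: the facts are written by hand instead of being emitted by an AST visitor, plain Java loops stand in for the Prolog inference engine, and the predicate names catch_clause and statement_in as well as the class FactBaseSketch are invented for illustration. The Matching Rule for an eaten exception is shown in Prolog notation in a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a tiny fact base and one matching rule. */
class FactBaseSketch {

    /** A ground fact such as catch_clause(c1, load, IOException). */
    static final class Fact {
        final String predicate;
        final List<String> args;

        Fact(String predicate, String... args) {
            this.predicate = predicate;
            this.args = Arrays.asList(args);
        }
    }

    /** Facts a visitor might emit for two catch blocks (hand-written here). */
    static List<Fact> sampleFacts() {
        return Arrays.asList(
            new Fact("catch_clause", "c1", "load", "IOException"),  // empty catch block
            new Fact("catch_clause", "c2", "save", "IOException"),
            new Fact("statement_in", "c2", "method_call")           // c2 logs the error
        );
    }

    /*
     * Matching rule, in Prolog notation:
     *   eaten(C) :- catch_clause(C, _, _), \+ statement_in(C, _).
     * Returns the ids of all catch clauses that contain no statement.
     */
    static List<String> eatenExceptions(List<Fact> facts) {
        List<String> matches = new ArrayList<String>();
        for (Fact c : facts) {
            if (!c.predicate.equals("catch_clause")) {
                continue;
            }
            String id = c.args.get(0);
            boolean hasBody = false;
            for (Fact s : facts) {
                if (s.predicate.equals("statement_in") && s.args.get(0).equals(id)) {
                    hasBody = true;
                    break;
                }
            }
            if (!hasBody) {
                matches.add(id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(eatenExceptions(sampleFacts()));  // prints [c1]
    }
}
```

A Transformation Rule would then take each matched id, locate the corresponding AST node and insert, for example, a logging statement into the empty catch block.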
The project: technologies involved
As a Prolog Engine I will use one of these:
- tuProlog
- GNU Prolog for Java
For AST manipulation Eclipse JDT facilities will be used:
- org.eclipse.jdt.core.dom
- org.eclipse.jdt.core.dom.rewrite
JDT ASTRewrite applies only minimal changes to the code, so there are no unnecessary modifications - for instance, comments are preserved.
