DMS Software Reengineering Toolkit
The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools for automating custom source program analysis, modification, translation, or generation of large-scale software systems written in arbitrary mixtures of source languages.

DMS has been used to implement a wide variety of practical tools, including domain-specific languages (such as code generation for factory control), test coverage and profiling tools, clone detection, language migration tools, and C++ component reengineering.

The toolkit provides a means for defining language grammars and produces parsers that automatically construct abstract syntax trees (ASTs), along with prettyprinters that convert original or modified ASTs back into compilable source text. The parse trees capture, and the prettyprinters regenerate, complete detail about the original source program, including source position, comments, radix and format of numbers, etc. As a result, regenerated source text is as recognizable to a programmer as the original text, modulo any applied transformations.
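
To make format-preserving trees concrete, here is a minimal Python sketch. It is not DMS's actual node representation. A hypothetical IntLiteral node keeps the numeric value that analyses and transformations use. It also keeps the radix, source line, and attached comments that a prettyprinter needs to regenerate recognizable text. The names IntLiteral, parse_int_literal and prettyprint are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntLiteral:
    """Hypothetical AST leaf for a C-style integer literal."""
    value: int                    # semantic value, used by analyses and rewrites
    radix: int = 10               # 8, 10 or 16: how the literal was written
    line: int = 0                 # source position
    comments: List[str] = field(default_factory=list)  # comments attached to this node


def parse_int_literal(text: str, line: int = 0,
                      comments: Optional[List[str]] = None) -> IntLiteral:
    """Build a node from literal text, recording its radix alongside its value."""
    t = text.lower()
    attached = list(comments or [])
    if t.startswith("0x"):
        return IntLiteral(int(t[2:], 16), 16, line, attached)
    if len(t) > 1 and t.startswith("0"):
        return IntLiteral(int(t[1:], 8), 8, line, attached)
    return IntLiteral(int(t), 10, line, attached)


def prettyprint(lit: IntLiteral) -> str:
    """Regenerate source text in the literal's original radix, with its comments.

    This sketch normalizes hex digits to lower case; a fuller version would
    also record the original letter case and suffixes such as U or L.
    """
    if lit.radix == 16:
        digits = "0x" + format(lit.value, "x")
    elif lit.radix == 8:
        digits = "0" + format(lit.value, "o")
    else:
        digits = str(lit.value)
    return "".join(f"/* {c} */ " for c in lit.comments) + digits
```

Suppose a transformation increments the value of the node parsed from 0x1f. This prettyprinter then emits 0x20, not the decimal 32 that a tree discarding the original format would produce.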

Many program analysis and transformation tools are limited to ASCII or Western European character sets such as ISO-8859; DMS can handle these as well as UTF-8, UTF-16, EBCDIC, Shift-JIS and a variety of Microsoft character encodings.
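
Decoding at the input boundary means everything downstream works on Unicode text, whatever the file's encoding. A minimal Python sketch of that idea follows. It uses Python's standard codecs, not anything from DMS. The codec names cp037 (an EBCDIC code page), shift_jis and cp1252 (a Microsoft code page) are Python's own. The toy tokens lexer is a hypothetical stand-in for a real language lexer.

```python
import re
from typing import List

# Toy lexer: words (Unicode-aware \w), double-quoted strings, or any other single non-space char.
TOKEN = re.compile(r'\w+|"[^"]*"|\S')


def decode_source(data: bytes, encoding: str) -> str:
    """Decode raw source bytes once, normalizing Windows line endings."""
    return data.decode(encoding).replace("\r\n", "\n")


def tokens(text: str) -> List[str]:
    """Lex already-decoded text; this step never sees the original encoding."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    src = "total = price * 2"
    for enc in ["ascii", "iso-8859-1", "cp1252", "utf-8", "utf-16", "cp037", "shift_jis"]:
        print(enc, tokens(decode_source(src.encode(enc), enc)))
```

Every encoding in the loop yields the same five tokens, even though the EBCDIC (cp037) bytes differ from the ASCII bytes for the same letters.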

DMS uses GLR parsing technology, enabling it to handle all practical context-free grammars. Semantic predicates extend this capability to interesting non-context-free grammars. For example, Fortran requires matching multiple DO loops that share a CONTINUE statement by label; GLR with semantic predicates enables the DMS Fortran parser to produce ASTs for correctly nested loops as it parses.
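
In classic Fortran, DO 10 I = 1, N followed by DO 10 J = 1, M can both be terminated by a single statement labelled 10 CONTINUE. Loop nesting therefore depends on label values, not just on grammar structure. DMS resolves this during GLR parsing with semantic predicates. The Python sketch below works differently: it does the label matching in a separate stack-based pass over already-recognized (label, statement) pairs. It only shows the nested result that the predicates make possible. Stmt, DoLoop and nest_do_loops are hypothetical names, and statements are assumed to be blank-separated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Stmt:
    text: str


@dataclass
class DoLoop:
    label: int                      # label of the statement that terminates the loop
    header: str                     # e.g. "DO 10 I = 1, N"
    body: List[Union["DoLoop", Stmt]] = field(default_factory=list)


def nest_do_loops(lines: List[Tuple[Optional[int], str]]) -> List[Union[DoLoop, Stmt]]:
    """Build nested loop nodes from (label, statement text) pairs."""
    root: List[Union[DoLoop, Stmt]] = []
    open_loops: List[DoLoop] = []   # innermost open loop is last

    def current_body() -> List[Union[DoLoop, Stmt]]:
        return open_loops[-1].body if open_loops else root

    for label, text in lines:
        words = text.split()
        if len(words) >= 2 and words[0] == "DO" and words[1].isdigit():
            loop = DoLoop(label=int(words[1]), header=text)
            current_body().append(loop)
            open_loops.append(loop)
            continue
        # Ordinary statement (possibly the labelled terminator) joins the innermost body.
        current_body().append(Stmt(text))
        # A labelled statement closes every open loop that names its label, innermost first.
        while label is not None and open_loops and open_loops[-1].label == label:
            open_loops.pop()
        # An outer loop still waiting on this label means the loops cross.
        if label is not None and any(loop.label == label for loop in open_loops):
            raise ValueError(f"improperly nested DO loops at label {label}")
    if open_loops:
        raise ValueError("unterminated DO loop")
    return root
```

For the two-loop example above, the result is one outer DoLoop whose body holds the inner DoLoop. The assignment and the shared CONTINUE sit in the inner body.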

DMS provides attribute grammar evaluators for computing custom analyses over ASTs, such as metrics, with special support for symbol table construction. Other program facts can be extracted by built-in control-flow and data-flow analysis engines, local and global pointer analysis, whole-program call graph extraction, and symbolic range analysis by abstract interpretation.
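
In DMS such analyses are specified as attribute grammars. The following hand-written Python sketch only imitates the evaluation, for a hypothetical toy block-structured language. The node classes Decl, Use and Block and the function undeclared are assumptions. The sketch computes two kinds of attribute:

- an inherited attribute: the set of names visible from enclosing scopes, which plays the role of a flattened symbol table;
- a synthesized attribute: the list of uses that have no visible declaration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass
class Decl:
    name: str


@dataclass
class Use:
    name: str


@dataclass
class Block:
    items: List["Node"]


Node = Union[Decl, Use, Block]


def undeclared(node: Node, visible: FrozenSet[str] = frozenset()) -> List[str]:
    """Synthesized attribute: names used without a visible declaration.

    `visible` is the inherited attribute: names declared in enclosing scopes
    (and earlier in the current block) at the point where `node` appears.
    """
    if isinstance(node, Use):
        return [] if node.name in visible else [node.name]
    if isinstance(node, Decl):
        return []
    # A Block opens a new scope; a declaration is visible to later siblings only.
    local: FrozenSet[str] = frozenset()
    result: List[str] = []
    for item in node.items:
        result += undeclared(item, visible | local)
        if isinstance(item, Decl):
            local = local | {item.name}
    return result
```

Because `local` is discarded when a block ends, names declared in an inner block do not leak into the enclosing scope.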

Changes to ASTs can be made both by procedural methods coded in PARLANSE and by source-to-source tree transformations. These transformations are coded as rewrite rules, written in surface syntax and conditioned on any extracted program facts. The rewrite rule engine handles associative and commutative rules. A rewrite rule for C that replaces a conditional assignment statement with an equivalent use of the ?: operator could be written as:

  rule simplify_conditional_assignment(v:left_hand_side,e1:expression,e2:expression,e3:expression)
    :statement->statement
  = " if (\e1) \v=\e2; else \v=\e3; "
  -> " \v=\e1?\e2:\e3; "
  if no_side_effects(v);

Rewrite rules have names, e.g., simplify_conditional_assignment. Each rule has a "match this" pattern and a "replace by that" pattern separated by ->; in this example they are on separate lines for readability. The patterns must correspond to language syntax categories. Here both patterns must be of syntax category statement, as declared by the clause :statement->statement, which uses -> in parallel with the patterns. Target-language (e.g., C) surface syntax is written inside meta-quotes " to separate it from rewrite-rule syntax. Backslashes inside meta-quotes are domain escapes that mark pattern metavariables (e.g., \v, \e1, \e2, \e3). Each metavariable matches any language construct of the syntactic category given in its declaration in the signature line; e.g., e1 must be of syntactic category expression. If a metavariable is mentioned multiple times in the match pattern, every occurrence must match an identical subtree; in this example, the same left-hand side v must occur in both assignments. Metavariables in the replace pattern are replaced by the corresponding matches from the left side. An optional if clause states an additional condition that must hold for the rule to apply. Here the matched metavariable v, being an arbitrary left-hand side, must have no side effects; for example, it cannot be of the form a[i++]. The no_side_effects predicate is defined by an analyzer built with other DMS mechanisms.
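
The matching behaviour described above can be sketched in Python over a toy AST of nested tuples. The sketch covers metavariables that bind subtrees, repeated metavariables that must bind identical subtrees, substitution into the replacement, and a side condition. It is not DMS's rule engine, and it differs in several ways:

- patterns are tuple trees rather than surface syntax;
- metavariables are written ("?", name);
- there is no associative-commutative matching;
- no_side_effects is a hypothetical, crude stand-in for a real analysis.

```python
from typing import Any, Dict, Optional

# Toy AST: a node is a tuple (kind, child, ...); leaves are strings.
# In patterns, ("?", name) is a metavariable that matches any subtree.
Bindings = Dict[str, Any]


def match(pattern: Any, tree: Any, bindings: Bindings) -> Optional[Bindings]:
    """Return extended bindings if `pattern` matches `tree`, else None."""
    if isinstance(pattern, tuple) and pattern[0] == "?":
        name = pattern[1]
        if name in bindings:  # repeated metavariable: must bind an identical subtree
            return bindings if bindings[name] == tree else None
        return {**bindings, name: tree}
    if isinstance(pattern, tuple):
        if (not isinstance(tree, tuple) or len(tree) != len(pattern)
                or tree[0] != pattern[0]):
            return None
        for p_child, t_child in zip(pattern[1:], tree[1:]):
            bindings = match(p_child, t_child, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None


def substitute(template: Any, bindings: Bindings) -> Any:
    """Instantiate the replacement pattern with the matched subtrees."""
    if isinstance(template, tuple) and template[0] == "?":
        return bindings[template[1]]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(t, bindings) for t in template[1:])
    return template


def no_side_effects(tree: Any) -> bool:
    """Hypothetical stand-in: treats increments, calls and assignments as side effects."""
    if isinstance(tree, tuple):
        if tree[0] in ("postinc", "preinc", "call", "assign"):
            return False
        return all(no_side_effects(child) for child in tree[1:])
    return True


# if (e1) v = e2; else v = e3;   ->   v = e1 ? e2 : e3;
MATCH = ("if", ("?", "e1"), ("assign", ("?", "v"), ("?", "e2")),
                            ("assign", ("?", "v"), ("?", "e3")))
REPLACE = ("assign", ("?", "v"), ("cond", ("?", "e1"), ("?", "e2"), ("?", "e3")))


def simplify_conditional_assignment(stmt: Any) -> Any:
    """Apply the rule at the root of `stmt`; return `stmt` unchanged if it does not apply."""
    bindings = match(MATCH, stmt, {})
    if bindings is not None and no_side_effects(bindings["v"]):
        return substitute(REPLACE, bindings)
    return stmt
```

Statements whose two branches assign to different left-hand sides fail the repeated-metavariable check and are left alone. So is a left-hand side such as a[i++].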

A complex transformation is achieved by providing a set of rules that cooperate to produce the desired effect. Metaprograms coded in PARLANSE focus the ruleset on selected portions of the program.
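
How cooperating rules combine can be sketched with two tiny algebraic rules over a tuple-style toy AST. Neither rule alone reduces ("+", ("*", "x", "1"), "0") to "x", but applying both bottom-up until nothing changes does. The rule names and the exhaustive bottom-up strategy are assumptions for illustration. In DMS, PARLANSE metaprograms decide where and how a ruleset is applied.

```python
from typing import Callable, List, Optional, Union

Tree = Union[str, tuple]                 # leaf strings or (operator, left, right)
Rule = Callable[[Tree], Optional[Tree]]  # rewritten tree, or None if the rule does not apply


def add_zero(t: Tree) -> Optional[Tree]:
    """Rule: e + 0 -> e"""
    if isinstance(t, tuple) and t[0] == "+" and t[2] == "0":
        return t[1]
    return None


def mul_one(t: Tree) -> Optional[Tree]:
    """Rule: e * 1 -> e"""
    if isinstance(t, tuple) and t[0] == "*" and t[2] == "1":
        return t[1]
    return None


def rewrite(t: Tree, rules: List[Rule]) -> Tree:
    """Apply rules bottom-up, re-normalizing after each successful rewrite.

    Terminates for these rules because every rewrite makes the tree smaller.
    """
    if isinstance(t, tuple):
        t = (t[0],) + tuple(rewrite(child, rules) for child in t[1:])
    for rule in rules:
        result = rule(t)
        if result is not None:
            return rewrite(result, rules)
    return t
```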

A complete example, in which a language definition and source-to-source transformation rules are defined and applied, is shown using high school algebra and a bit of calculus as a domain-specific language.
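
To give the flavor of such a domain, here is a minimal Python sketch of symbolic differentiation followed by algebraic simplification over a tuple-based expression tree. It is a hand-written illustration, not the rules of the referenced DMS example. The expression encoding and the functions derivative and simplify are assumptions.

```python
from typing import Union

Expr = Union[int, str, tuple]   # integer constant, variable name, or (op, left, right) with op "+" or "*"


def derivative(e: Expr, x: str) -> Expr:
    """d/dx of e using the sum and product rules (no simplification)."""
    if isinstance(e, int):
        return 0                                      # d/dx c = 0
    if isinstance(e, str):
        return 1 if e == x else 0                     # d/dx x = 1; other variables are constants
    op, a, b = e
    if op == "+":
        return ("+", derivative(a, x), derivative(b, x))            # sum rule
    if op == "*":
        return ("+", ("*", derivative(a, x), b),                    # product rule
                     ("*", a, derivative(b, x)))
    raise ValueError(f"unknown operator {op!r}")


def simplify(e: Expr) -> Expr:
    """Bottom-up cleanup using algebraic identities and constant folding."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if op == "+":
        if isinstance(a, int) and isinstance(b, int):
            return a + b                              # fold constants
        if a == 0:
            return b                                  # 0 + e -> e
        if b == 0:
            return a                                  # e + 0 -> e
    if op == "*":
        if isinstance(a, int) and isinstance(b, int):
            return a * b                              # fold constants
        if a == 0 or b == 0:
            return 0                                  # 0 * e -> 0
        if a == 1:
            return b                                  # 1 * e -> e
        if b == 1:
            return a                                  # e * 1 -> e
    return (op, a, b)
```

For x*x + 3*x, simplify(derivative(...)) yields ("+", ("+", "x", "x"), 3), that is x + x + 3. Collecting x + x into 2*x would take one more rule.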

DMS has a variety of predefined language front ends. They cover most real dialects of C and C++, including C++0x (since standardized as C++11), as well as C#, Java, Python, PHP, EGL, Fortran, COBOL, Visual Basic, Verilog, VHDL and some 20 or more other languages. Predefined languages let customizers focus immediately on their reengineering task rather than on the details of the languages to be processed.

DMS is additionally unusual in being implemented in a parallel programming language, PARLANSE, that uses symmetric multiprocessors available on commodity workstations. This enables DMS to provide faster answers for large system analyses and conversions.

DMS was originally motivated by a theory for maintaining designs of software called Design Maintenance Systems.

(DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.)

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 