Find code duplication, standard libraries and inlined functions

To find reused code, like in-lined functions or partially duplicated functions, we could normalize basic blocks and save an anonymous representation of it so we later can identify matching basic blocks in a control flow graph. This representation can be a list of instruction names eg. "add phi div icmp"

First we could create a data-dependence graph (find all instructions that depend on other instructions) to find possible orderings inside a basic block.
We also have to identify SubBlock orderings too (using the term "SubBlock" with lack of a better word to refer to a number of instructions that have a dependency chain connecting them). 
Perhaps we can use something from current instruction scheduling algorithms?

Example of SubBlock's:
basic block:
```
1,2,3,4,5,6,7,8,9 // 1-9 represent 9 instructions
```
Found dependencies:
```
[3 [1][2]    ]	// to run 3 we need to have executed 1 and 2 before
[5 [3][4]    ]  // to run 5 we need to have executed 3 and 4 before 
[9 [6][7][8] ]  // to run 9 we need to have executed 6, 7 and 8 before 
```
Note that 1 and 2 can be executed in any order, same is true for: 
3 and 4
6, 7 and 8
but also the SubBlock 1 to 5 can be executed before or after the SubBlock 6 to 9

To normalize this we follow predefined rules. 
Example:
- SubBlock with least number of instructions first.
- If two SubBlocks have same number of instructions take alphabetical name of first non-matching instruction 'a' first and 'z' last
- Instructions alphabetical name 'a' first and 'z' last

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Find code duplication, standard libraries and inlined functions #2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Find code duplication, standard libraries and inlined functions #2

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions