-
Notifications
You must be signed in to change notification settings - Fork 2
Description
To find reused code, like in-lined functions or partially duplicated functions, we could normalize basic blocks and save an anonymous representation of it so we later can identify matching basic blocks in a control flow graph. This representation can be a list of instruction names eg. "add phi div icmp"
First we could create a data-dependence graph (find all instructions that depend on other instructions) to find possible orderings inside a basic block.
We also have to identify SubBlock orderings too (using the term "SubBlock" with lack of a better word to refer to a number of instructions that have a dependency chain connecting them).
Perhaps we can use something from current instruction scheduling algorithms?
Example of SubBlock's:
basic block:
1,2,3,4,5,6,7,8,9 // 1-9 represent 9 instructions
Found dependencies:
[3 [1][2] ] // to run 3 we need to have executed 1 and 2 before
[5 [3][4] ] // to run 5 we need to have executed 3 and 4 before
[9 [6][7][8] ] // to run 9 we need to have executed 6, 7 and 8 before
Note that 1 and 2 can be executed in any order, same is true for:
3 and 4
6, 7 and 8
but also the SubBlock 1 to 5 can be executed before or after the SubBlock 6 to 9
To normalize this we follow predefined rules.
Example:
- SubBlock with least number of instructions first.
- If two SubBlocks have same number of instructions take alphabetical name of first non-matching instruction 'a' first and 'z' last
- Instructions alphabetical name 'a' first and 'z' last