Skip to content

Find code duplication, standard libraries and inlined functions #2

@7i

Description

@7i

To find reused code, like in-lined functions or partially duplicated functions, we could normalize basic blocks and save an anonymous representation of it so we later can identify matching basic blocks in a control flow graph. This representation can be a list of instruction names eg. "add phi div icmp"

First we could create a data-dependence graph (find all instructions that depend on other instructions) to find possible orderings inside a basic block.
We also have to identify SubBlock orderings too (using the term "SubBlock" with lack of a better word to refer to a number of instructions that have a dependency chain connecting them).
Perhaps we can use something from current instruction scheduling algorithms?

Example of SubBlock's:
basic block:

1,2,3,4,5,6,7,8,9 // 1-9 represent 9 instructions

Found dependencies:

[3 [1][2]    ]	// to run 3 we need to have executed 1 and 2 before
[5 [3][4]    ]  // to run 5 we need to have executed 3 and 4 before 
[9 [6][7][8] ]  // to run 9 we need to have executed 6, 7 and 8 before 

Note that 1 and 2 can be executed in any order, same is true for:
3 and 4
6, 7 and 8
but also the SubBlock 1 to 5 can be executed before or after the SubBlock 6 to 9

To normalize this we follow predefined rules.
Example:

  • SubBlock with least number of instructions first.
  • If two SubBlocks have same number of instructions take alphabetical name of first non-matching instruction 'a' first and 'z' last
  • Instructions alphabetical name 'a' first and 'z' last

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions