This is the status report for June 18 - June 23
Accomplished this week:
This week I continued with the Duplicated Code smell. I'm trying to detect duplicated code by reading the IL instructions and comparing them against others. It is a simple approach, and I can detect some duplicated code. I'm also impressed with Cecil; it's a great library and I love the Visitor pattern.
But I'm not happy with the results, because detecting branches is difficult and I can't detect all code paths.
For example, suppose a simple if statement:
if (myList.Contains ("MoreFoo"))
    myList.Remove ("MoreFoo");
The generated IL code is the following:
IL_0000: ldarg.0
IL_0001: ldfld class [mscorlib]System.Collections.IList Test.Rules.Smells.ClassWithCodeDuplicated::myList
IL_0006: ldstr "MoreFoo"
IL_000b: callvirt instance bool class [mscorlib]System.Collections.IList::Contains(object)
IL_0010: brfalse IL_0025
IL_0015: ldarg.0
IL_0016: ldfld class [mscorlib]System.Collections.IList Test.Rules.Smells.ClassWithCodeDuplicated::myList
IL_001b: ldstr "MoreFoo"
IL_0020: callvirt instance void class [mscorlib]System.Collections.IList::Remove(object)
IL_0025: ret
I drop the ldarg, ldfld, ldstr and other instructions, and keep the information needed to compare instruction pairs. And if I write only:
myList.Contains ("MoreFoo");
myList.Remove ("MoreFoo");
The only difference in the IL code is that there is no branch instruction. I can detect simple duplications, but detecting only these won't be useful.
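The filtering step above can be sketched roughly as follows. This is a standalone Python illustration, not the Gendarme rule and not Cecil's API: the helper names (parse, signature, calls_only) are hypothetical, and the IL for the version without the if statement is written by hand as an assumption, not taken from compiler output.

```python
# Branch opcode families recognised by this sketch (suffixes like .s/.un are ignored).
BRANCHES = {"br", "brfalse", "brtrue", "beq", "bne", "bge", "bgt", "ble", "blt", "switch"}

def parse(il_text):
    """Turn 'IL_0000: opcode operand' lines into (opcode, operand) tuples."""
    result = []
    for line in il_text.strip().splitlines():
        _, _, rest = line.partition(":")      # drop the IL_xxxx label
        parts = rest.strip().split(None, 1)
        result.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return result

def signature(instructions):
    """Keep calls (with their target) and branches (without their offset)."""
    sig = []
    for opcode, operand in instructions:
        if opcode.startswith("call"):
            sig.append((opcode, operand))
        elif opcode.split(".")[0] in BRANCHES:
            sig.append((opcode, ""))
    return sig

def calls_only(sig):
    """Ignore branches completely, leaving only the sequence of calls."""
    return [item for item in sig if item[0].startswith("call")]

WITH_IF = """
IL_0000: ldarg.0
IL_0001: ldfld class [mscorlib]System.Collections.IList Test.Rules.Smells.ClassWithCodeDuplicated::myList
IL_0006: ldstr "MoreFoo"
IL_000b: callvirt instance bool class [mscorlib]System.Collections.IList::Contains(object)
IL_0010: brfalse IL_0025
IL_0015: ldarg.0
IL_0016: ldfld class [mscorlib]System.Collections.IList Test.Rules.Smells.ClassWithCodeDuplicated::myList
IL_001b: ldstr "MoreFoo"
IL_0020: callvirt instance void class [mscorlib]System.Collections.IList::Remove(object)
IL_0025: ret
"""

# Assumption: hand-written IL for the two plain statements, for illustration only.
WITHOUT_IF = """
IL_0000: ldarg.0
IL_0001: ldfld class [mscorlib]System.Collections.IList Test.Rules.Smells.ClassWithCodeDuplicated::myList
IL_0006: ldstr "MoreFoo"
IL_000b: callvirt instance bool class [mscorlib]System.Collections.IList::Contains(object)
IL_0010: pop
IL_0011: ldarg.0
IL_0012: ldfld class [mscorlib]System.Collections.IList Test.Rules.Smells.ClassWithCodeDuplicated::myList
IL_0017: ldstr "MoreFoo"
IL_001c: callvirt instance void class [mscorlib]System.Collections.IList::Remove(object)
IL_0021: ret
"""

sig_with_if = signature(parse(WITH_IF))
sig_without_if = signature(parse(WITHOUT_IF))
same_calls = calls_only(sig_with_if) == calls_only(sig_without_if)
same_with_branches = sig_with_if == sig_without_if
```

In this sketch the two snippets make the same calls, but their signatures differ once the branch is taken into account, which is the limitation described above: a flat list of instructions says nothing about which code runs on which path.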
That led me to the decision described in the plans below.
Plans for the next week.
Following on from the problem above, I have decided to change the code representation: I will use Control Flow Graphs to do some flow analysis and obtain more information, so that the detection is more complete.
From the CFG I can get the blocks and compare them. Even better, I can try to detect common subsets for each block. With this approach I believe I can write a more complete analysis and detect code duplication better and more easily.
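To make the idea of blocks concrete, here is a minimal Python sketch of splitting an instruction list into basic blocks and linking them. It is not the Cecil.FlowAnalysis API; the function names and the (offset, opcode, target) tuples are assumptions made for the illustration, and it ignores switch instructions and exception handlers.

```python
def basic_blocks(instructions):
    """Split (offset, opcode, branch_target_or_None) tuples into basic blocks.

    A new block starts at the first instruction, at every branch target,
    and at the instruction that follows a branch.
    """
    leaders = {instructions[0][0]}
    for i, (_, _, target) in enumerate(instructions):
        if target is not None:
            leaders.add(target)
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])
    blocks = []
    for instruction in instructions:
        if instruction[0] in leaders:
            blocks.append([])
        blocks[-1].append(instruction)
    return blocks

def successors(blocks):
    """Map each block index to the indexes of the blocks that can follow it."""
    start_to_index = {block[0][0]: i for i, block in enumerate(blocks)}
    edges = {}
    for i, block in enumerate(blocks):
        _, opcode, target = block[-1]
        following = set()
        if target is not None:
            following.add(start_to_index[target])
        # ret, throw and unconditional branches never fall through.
        if opcode not in ("ret", "throw", "br", "br.s") and i + 1 < len(blocks):
            following.add(i + 1)
        edges[i] = sorted(following)
    return edges

# The if statement from the example above, reduced to offsets and opcodes.
METHOD = [
    (0x00, "ldarg.0", None),
    (0x01, "ldfld", None),
    (0x06, "ldstr", None),
    (0x0B, "callvirt", None),
    (0x10, "brfalse", 0x25),
    (0x15, "ldarg.0", None),
    (0x16, "ldfld", None),
    (0x1B, "ldstr", None),
    (0x20, "callvirt", None),
    (0x25, "ret", None),
]

blocks = basic_blocks(METHOD)
edges = successors(blocks)
```

For the example method this gives three blocks: the condition, the body of the if, and the final ret. The condition block leads to both of the others, so the branch becomes an edge in the graph and no longer an instruction that has to be matched.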
Challenges or Problems.
Yes, a lot! First, I have decided to use the Cecil.FlowAnalysis assembly (located in the cecil module in the Mono svn) to obtain the CFG; using this library will save me a lot of work. But I have a little trouble.
Yesterday afternoon I started working on the CFG analysis and building the graphs, and I got a surprise:
1) Test.Rules.Smells.DetectCodeDuplicatedInSameClassTest.TestClassWithCodeDuplicated : System.ArgumentException : Exception handlers are not supported.
Parameter name: body
at Cecil.FlowAnalysis.Impl.ControlFlow.FlowGraphBuilder..ctor (Mono.Cecil.MethodDefinition method) [0x00000]
at Cecil.FlowAnalysis.FlowGraphFactory.CreateControlFlowGraph (Mono.Cecil.MethodDefinition method) [0x00000]
at Gendarme.Rules.Smells.DetectCodeDuplicatedInSameClassRule.ContainsDuplicatedCode (Mono.Cecil.MethodDefinition currentMethod, Mono.Cecil.MethodDefinition targetMethod) [0x00000]
at Gendarme.Rules.Smells.DetectCodeDuplicatedInSameClassRule.CheckType (Mono.Cecil.TypeDefinition typeDefinition, Gendarme.Framework.Runner runner) [0x00000]
at Test.Rules.Smells.DetectCodeDuplicatedInSameClassTest.TestClassWithCodeDuplicated () [0x00000]
at <0x00000>
at (wrapper managed-to-native) System.Reflection.MonoMethod:InternalInvoke (object,object[])
at System.Reflection.MonoMethod.Invoke (System.Object obj, BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000]
Ouch! So I believe I have to write a patch to support exception handlers when building the graph.
Interesting resources.
This week I have only one interesting resource, which I hope will help me with this work: the Dragon Book.
Finally, thanks to Sebastien for resolving my doubts. Thanks also to all the Mono people; every day I enjoy working with these great people a bit more.
I also want to offer congratulations on the Moonlight accomplishment; I want to try this technology!

2 comments
First, have you checked out this book: http://www.amazon.com/gp/product/052182060X/002-4619524-6769607?ie=UTF8&tag=matthargettbl-20&linkCode=xm2&camp=1789&creativeASIN=052182060X
I found it very useful when I was doing my binary code analysis product.
Second, for finding duplicated code, I worked on a detector for this in my binary code analysis product. I checked for duplicated code in basic blocks by doing a hash on an abstraction of the block. I abstracted away details of load sources and store destinations, which helped find opportunities for method extraction and parameterization.
It worked very well on the analyzer’s code itself, with the addition of weighting based upon similar/same blocks.
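A minimal Python sketch of the block-hashing idea described in this comment, under stated assumptions: it is not the code of the product mentioned here, the function names are hypothetical, and the abstraction rule (treat every ld* opcode as a load and every st* opcode as a store, dropping the operand) is only one possible choice.

```python
import hashlib
from collections import defaultdict

def abstract(block):
    """Hide load sources and store destinations; keep everything else as is."""
    tokens = []
    for opcode, operand in block:
        if opcode.startswith("ld"):
            tokens.append("load")
        elif opcode.startswith("st"):
            tokens.append("store")
        else:
            tokens.append(f"{opcode} {operand}".strip())
    return tokens

def fingerprint(block):
    """Hash the abstracted block so that equal shapes get equal keys."""
    joined = "\n".join(abstract(block))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

def duplicated_groups(named_blocks):
    """Group block names by fingerprint and return the groups with duplicates."""
    groups = defaultdict(list)
    for name, block in named_blocks.items():
        groups[fingerprint(block)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical blocks: a and b differ only in what they load, c calls another method.
BLOCKS = {
    "a": [("ldarg.0", ""), ("ldfld", "myList"), ("ldstr", '"MoreFoo"'),
          ("callvirt", "IList::Remove(object)")],
    "b": [("ldarg.0", ""), ("ldfld", "otherList"), ("ldstr", '"Bar"'),
          ("callvirt", "IList::Remove(object)")],
    "c": [("ldarg.0", ""), ("ldfld", "myList"), ("ldstr", '"MoreFoo"'),
          ("callvirt", "IList::Contains(object)")],
}

groups = duplicated_groups(BLOCKS)
```

Here blocks a and b end up in the same group because only the loaded field and string differ, which is the kind of match that suggests extracting a method with parameters.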
One thing that would be very handy for projects would be identifying opportunities for extracting a template method. For instance, this applies when the duplicated code is the same except for object creation *and* the methods called on the object have similar signatures (same parameter types, regardless of order).
This kind of detection would be a wonderful step beyond what ReSharper 3.0 and FxCop already give, even though those don't run on mono/Linux.
Hey Matt,
Yes, I know the book too.
Yes, this is my main idea. In the first approach I filtered the IL and kept only the call, callvirt and branch instructions. I believe I will do the same with this new approach.
Yes, that would be very handy, and I will try to do it.
Okay, I will review the existing FxCop rules and the ReSharper features.
Thanks for your comment!