⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 diffscript.vb

📁 an lcs algorithm for finding common string
💻 VB
📖 第 1 页 / 共 2 页
字号:
Namespace We.See.Diff

    ''' the diffscript class represents a list of changes to be made to reach list2 by modifying list1
    ''' an instance of this class is the result of the diff process
    ''' 
    ''' a word on the indexes
    ''' 
    ''' note, the description below is 1 based whereas the actual implementation is 0 based
    ''' this should probably be corrected at some stage
    ''' (for the confusion :P) 
    ''' 
    ''' the indexation of the modifications pose a problem;
    ''' at the time of execution of the diffscript, the only known indexes are the one of the source
    ''' since the modifications have not been performed yet
    ''' they are related to eachother
    ''' 
    ''' to illustrate:
    ''' aaa, abcaa
    ''' +b2 +c3
    ''' 
    ''' if we want to reverse the order of execution, the script should be changed:
    ''' aaa, abcaa
    ''' +c2 +b2 is the same script in reverse order
    ''' this is because the +b2 in the first script changes the index
    ''' 
    ''' the inverse scenario:
    ''' abcaa, aaa
    ''' -c3 -b2
    ''' 
    ''' becomes
    ''' -b2 -c2
    ''' 
    ''' at first glance it appear thar this can be resolved by executing the script in the
    ''' correct order however, moving modifications complicate the issue
    ''' abaa, acaba
    ''' >b2,3 +c2
    ''' and
    ''' +c2 >b3,4 or +c3 >b2,4
    ''' 
    ''' abaa, aacba
    ''' >b2,3 +c3
    ''' +c3, >b2,4
    ''' 
    ''' the solution is to maintain a list of the original indexes
    ''' abaa, acaba
    ''' 1234, 1 3 4
    ''' 
    ''' keep the list of modifications in the order at which they need to be applied to the original indexes
    ''' and translate the modifications to the new indexes before application
    ''' 
    ''' >b2,3 +c2
    ''' 
    ''' abaa
    ''' 1234
    ''' 
    ''' aaba
    ''' 13 4
    ''' acaba
    ''' 1 3 4
    ''' 
    ''' +c2 >b2,3 now becomes (means move the one that was originally at 2 to a new position after the original 3)
    ''' acbaa
    ''' 1 234
    ''' acaba
    ''' 
    ''' the order is which modifications are performed is now not relevant anymore
    ''' 
    ''' how this is implemented
    ''' 
    ''' the new implementation depends on the modifications being performed in the correct order
    ''' what happens is that modifications are interpreted and modified as such:
    ''' 
    ''' abcde +i2 +j4: expected result:
    ''' abicdje
    ''' 
    ''' to accomplish this, it is necessary to detect that when +i2 is performed, the index for modification 2 should increase by 1
    ''' likewise, when -i2 is performed, any +j4 or -j4 should now point to +j3 and -j3
    ''' 
    ''' first of all, the decision is made to have the diffscript contain the modifications from left to right
    ''' this seems to be most intuitive in reading the script as a human
    ''' the diffscript is originally determined from right to left though, due to the implementation of the LCS.
    ''' to overcome this, the diffscript is reversed without transposing the indexes
    ''' 
    ''' (below is the description of the old (not working) solution:
    ''' the reason it is discarded is that interpretation depends on the modifications pointing to the element at 
    ''' time of performing the modifications, otherwise,
    ''' 
    ''' abcde
    ''' abCDe
    ''' 
    ''' -d3-c2 +D2+C2 is not recognized as c>C d>D because the indexes differ
    ''' 
    ''' instead of using the list position mechanism described below in the Interpret function as well,
    ''' chosen is to modify the modifications to reflect their position in the diff script.
    ''' see above for the implementation
    ''' 
    ''' we need a mechanism to translate the index of the orininal element to the index in an intermediate state
    ''' i.e, in the intermediate step
    ''' acbaa
    ''' 1 234
    ''' 
    ''' we need to know that f(2)->3
    ''' to do this, we create an array of positions (effectively inversing the described mechanism above)
    ''' 1234
    ''' when inserting the c before position 2, we increase f(2,3,4)
    ''' 1345
    ''' 
    ''' likewise, when removing a element at position 2, we decrease f(2,3,4)
    ''' 
    ''' alternatively, we could just keep the deltas
    ''' 0000 -> 0100 or 0111
    ''' 
    ''' the performance implications of both the solutions are similar,
    ''' in the first scenario, we need recursion to determine the delta at f(3), adding the elements f(1,2,3) to get delta for 3
    ''' in the original and second scenarios, all elements of the lookup after the modification
    ''' need to be updated. the original solution is chosen because the loop required to update the index list
    ''' should execute faster than the recursive solution determining predecessor deltas
    ''' 
    ''' reversing the change script
    ''' 
    ''' we can also see that the modifications
    ''' +c2 +d3
    ''' 
    ''' can be reversed to
    ''' +d2 +c2
    ''' 
    ''' to accomplish this, 
    ''' result in acd
    ''' 
    ''' testing this:
    ''' create string (a)
    ''' make a diffscript with 2 changes -> c
    ''' apply c to (a) -> r1
    ''' reverse c -> c'
    ''' apply c' to a -> r2
    ''' compare r2 to r1, they should be equal
    ''' 
    Public Class DiffScript(Of T)
        Inherits List(Of Modification)

        ' The actions here are elements in the NotifyCollectionChangedAction enum, in the same order
        Public Enum Operation ' formerly known as Action
            Add
            Remove
            Replace
            Move
        End Enum

        Public Structure Modification

            Public Operation As Operation
            Public Item As T ' used in Add and Replace operations only. in debugging mode, this contains the item in Remove and Move operations too

            ' OLD:
            ' note, the term index is not used since we're talking about the place in the original array,
            ' irrespective of the modifications that have already been perfomed. 
            ' (see "a word on the indexes" above for the explanation)

            ' note, the term index is used since we're talking about the place in the intermediate array,
            ' reflective of the modifications that have already been perfomed. 
            ' (see "a word on the indexes" above for the explanation)
#If DEBUG Then
            Public IndexInOriginal As Integer ' this is used for debugging purposes only
#End If
            Public Index As Integer
            Public FromIndex As Integer ' used in move operations only

            Public Sub New(ByVal Operation As Operation, ByVal Item As T, ByVal Index As Integer)
                Me.Operation = Operation
                Me.Item = Item
                Me.Index = Index
                Me.IndexInOriginal = Index
            End Sub

#If DEBUG Then
            Public Sub ApplyTo(ByVal list As List(Of T))
                Select Case Operation
                    Case Operation.Add
                        list.Insert(Index, Item)
                    Case Operation.Remove
                        Debug.Assert(list(Index).Equals(Item))
                        list.RemoveAt(Index)
                    Case Operation.Replace
                        list(Index) = Item
                    Case Operation.Move
                        ' abcde
                        Debug.Assert(list(FromIndex).Equals(Item))
                        If FromIndex > Index Then
                            ' move from 4 to 2
                            ' adbce
                            list.Insert(Index, list(FromIndex))
                            list.RemoveAt(FromIndex + 1) '+1 because we inserted an item before the item to be removed
                        Else
                            ' move from 2 to 4
                            ' acdbe
                            list.Insert(Index, list(FromIndex))
                            list.RemoveAt(FromIndex)
                        End If
                End Select
            End Sub

#Region "    ApplyToOld"
#If False Then

            Public Sub ApplyToOld(ByVal list As List(Of String), ByRef locations As Integer())
                Select Case Operation
                    Case Operation.Add
                        list.Insert(locations(Origin), Item)
                        ' inserted an element, elements that were originally after the element are now moved down even further
                        For i As Integer = Origin To locations.Length - 1
                            locations(i) += 1
                        Next
                    Case Operation.Remove
                        Debug.Assert(list(locations(Origin)) = Item)
                        list.RemoveAt(locations(Origin))
                        ' removed an element, elements that were originally after the element are now moved up
                        ' there cannot be another modification at the origin location, but there can be 
                        ' an insert after for this purpose, we need to change the lookup at the origin too
                        ' abcde -c
                        ' abde
                        ' 12334 +f3 or +f4 are now the same
                        ' abfde
                        ' for this purpose, we add the +1
                        For i As Integer = Origin + 1 To locations.Length - 1
                            locations(i) -= 1
                        Next
                    Case Operation.Replace
                        list(locations(Origin)) = Item
                        ' no modifications in indexes with item replacement
                    Case Operation.Move
                        ' abcde
                        Debug.Assert(locations(FromOrigin) = Item)
                        If FromOrigin > Origin Then
                            ' move from 4 to 2
                            ' adbce
                            ' 1 235 'origs
                            ' 13425 'locations
                            list.Insert(locations(Origin), list(locations(FromOrigin)))
                            list.RemoveAt(locations(FromOrigin) + 1)
                            For i As Integer = Origin To FromOrigin
                                locations(i) += 1
                            Next
                        Else
                            ' move from 2 to 4
                            ' acdbe
                            ' 12345 'origs
                            ' 14235 'locations
                            list.Insert(locations(Origin) + 1, list(locations(FromOrigin)))
                            list.RemoveAt(locations(FromOrigin))
                            For i As Integer = FromOrigin To Origin
                                locations(i) -= 1
                            Next
                        End If
                End Select
            End Sub
#End If

#End Region
#End If

        End Structure


        ' the diffscript created by FindLCS only contains add and remove
        ' this is translated to add, remove, replace and move by calling Interpret
        Public Sub Interpret()

            Dim mods As New List(Of Modification)
            mods.AddRange(Me)

            Dim i, j As Integer
            Dim mod1 As Modification
            Dim mod2 As Modification
            Dim mod3 As Modification

            i = 0
            While i < mods.Count ' array size is modified in the loop so we cant use a for loop

                mod1 = mods(i)
                ' we only want to process add and remove here, leave modifications that have already been
                ' translated alone

                ' replace

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -