Generating diff packages by text comparison: diff-match-patch

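###0.Preface
This post tests the basic usage of diff-match-patch, with the goal of generating diff packages for react-native bundle files.

All tests in this post use the Java version of the library.

diff-match-patch library: GitHub repository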

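###1.The diff_match_patch library
A high-performance library for working with plain text.

Main features:

  1. Diff: compare two blocks of plain text and efficiently return a list of differences.
  2. Match: find the best fuzzy match of a string within a text.
  3. Patch: apply a list of patches to a string.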

###1.1 diff_main (compare)
/**
 * Find the differences between two texts.
 * Run a faster, slightly less optimal diff.
 * This method allows the 'checklines' of diff_main() to be optional.
 * Most of the time checklines is wanted, so default to true.
 * @param text1 Old string to be diffed.
 * @param text2 New string to be diffed.
 * @return Linked List of Diff objects.
 */
public LinkedList<Diff> diff_main(String text1, String text2) {
  return diff_main(text1, text2, true);
}

Compares two texts and returns their differences as a diff list (LinkedList<Diff>).
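To make the idea of a diff list concrete, here is a minimal, self-contained sketch. It is not the algorithm diff-match-patch uses, and the names `ToyDiff`, `Op` and `Entry` are hypothetical. It only strips the common prefix and suffix of the two strings and reports the remaining middle part as one deletion and one insertion.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of what a diff list contains.
// NOT the algorithm used by diff-match-patch; all names here are hypothetical.
class ToyDiff {
    enum Op { EQUAL, DELETE, INSERT }

    // One entry of the diff list: an operation and the text it covers.
    static final class Entry {
        final Op op;
        final String text;

        Entry(Op op, String text) {
            this.op = op;
            this.text = text;
        }

        @Override
        public String toString() {
            return op + "(\"" + text + "\")";
        }
    }

    // Diffs two strings by stripping their common prefix and suffix;
    // whatever remains in the middle becomes one DELETE and one INSERT.
    static List<Entry> diff(String oldText, String newText) {
        int max = Math.min(oldText.length(), newText.length());

        // Length of the shared prefix.
        int prefix = 0;
        while (prefix < max && oldText.charAt(prefix) == newText.charAt(prefix)) {
            prefix++;
        }

        // Length of the shared suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && oldText.charAt(oldText.length() - 1 - suffix)
                        == newText.charAt(newText.length() - 1 - suffix)) {
            suffix++;
        }

        List<Entry> result = new ArrayList<>();
        addIfNotEmpty(result, Op.EQUAL, oldText.substring(0, prefix));
        addIfNotEmpty(result, Op.DELETE, oldText.substring(prefix, oldText.length() - suffix));
        addIfNotEmpty(result, Op.INSERT, newText.substring(prefix, newText.length() - suffix));
        addIfNotEmpty(result, Op.EQUAL, oldText.substring(oldText.length() - suffix));
        return result;
    }

    // Skips empty segments so the list only holds meaningful entries.
    private static void addIfNotEmpty(List<Entry> list, Op op, String text) {
        if (!text.isEmpty()) {
            list.add(new Entry(op, text));
        }
    }

    public static void main(String[] args) {
        System.out.println(diff("var a = 1;", "var a = 2;"));
    }
}
```

For the two inputs in `main`, the sketch produces four entries: an unchanged `var a = `, a deleted `1`, an inserted `2`, and an unchanged `;`. The real library returns the same kind of list, but it can find many separate changes in one pass rather than a single changed region.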

###1.2 patch_make (patch generation)
/**
 * Compute a list of patches to turn text1 into text2.
 * A set of diffs will be computed.
 * @param text1 Old text.
 * @param text2 New text.
 * @return LinkedList of Patch objects.
 */
public LinkedList<Patch> patch_make(String text1, String text2) {
  if (text1 == null || text2 == null) {
    throw new IllegalArgumentException("Null inputs. (patch_make)");
  }
  // No diffs provided, compute our own.
  LinkedList<Diff> diffs = diff_main(text1, text2, true);
  if (diffs.size() > 2) {
    diff_cleanupSemantic(diffs);
    diff_cleanupEfficiency(diffs);
  }
  return patch_make(text1, diffs);
}

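Generates a patch list from the diff list. This overload computes the diff list itself, so the caller does not need to call diff_main first.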

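###1.3 Others

  • String patch_toText(List<Patch> patches): patch list -> string
  • List<Patch> patch_fromText(String textline): string -> patch list
  • Object[] patch_apply(LinkedList<Patch> patches, String text1): applies the patches to text1, turning it into text2

Additional notes:

Return value of Object[] patch_apply(patches, text1):

  • Object[0]: the resulting string (text2)
  • Object[1]: a boolean[] that corresponds one-to-one with the patch list; an entry is true if the corresponding patch was applied.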
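Because the result is an untyped Object[], unpacking it in one place keeps the casts out of the calling code. The sketch below is an illustration only: the class `PatchResult` and its methods are hypothetical helpers, and the array in `main` is a hand-built stand-in with the same shape as a patch_apply result, not real library output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for unpacking a { String, boolean[] } result array.
class PatchResult {

    // Index 0 holds the patched text.
    static String patchedText(Object[] result) {
        return (String) result[0];
    }

    // Index 1 holds one flag per patch; collect the positions that are false.
    static List<Integer> failedPatchIndexes(Object[] result) {
        boolean[] applied = (boolean[]) result[1];
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < applied.length; i++) {
            if (!applied[i]) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Hand-built stand-in, not produced by the library.
        Object[] result = { "patched text", new boolean[] { true, false, true } };
        System.out.println(patchedText(result));
        System.out.println(failedPatchIndexes(result));
    }
}
```

Knowing which patch indexes failed can be more useful for logging than a single success flag, for example when deciding whether to fall back to downloading the full bundle.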

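###1.4 Workflow

Steps to generate a diff package:

  1. Define the original string text1 and the new string text2
  2. Compare text1 with text2
  3. Produce the diff list
  4. Convert the diff list into a patch list
  5. Convert the patch list into plain text (serialization)

Steps to apply a diff package:

  1. Read the patch text
  2. Convert the patch text back into a patch list (deserialization)
  3. Apply the patch list to the original string text1
  4. Obtain text2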

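Notes:

  1. patch_apply(LinkedList<Patch> patches, String text) means: text1 has to pass through a series of steps (the patches) to become text2. In other words, both LinkedList<Diff> and LinkedList<Patch> describe the one-way transformation text1 -> text2, so the same patches cannot be used to turn text2 back into text1.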

###1.5 Code

Generating the diff package:

diff_match_patch dmp = new diff_match_patch();
// text1 is the old text, text2 is the new text
LinkedList<diff_match_patch.Patch> patches = dmp.patch_make(text1, text2);
// Serialize the patch list into a string
String patchesStr = dmp.patch_toText(patches);
// IO omitted

Applying the diff package:

diff_match_patch dmp = new diff_match_patch();
String patchesText = getPatchesText(); // read the patch file into a string; IO omitted
// patch_fromText is declared to return List<Patch>, so copy it into a LinkedList instead of casting
LinkedList<diff_match_patch.Patch> patches = new LinkedList<>(dmp.patch_fromText(patchesText));
Object[] resultArr = dmp.patch_apply(patches, text1);
String text2 = (String) resultArr[0];
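The IO omitted in the two snippets above could look roughly like the sketch below. It is a minimal example under assumptions: the class `PatchFileIO` and its method names are hypothetical, UTF-8 is assumed as the file encoding, and the patch text in `main` is a placeholder string rather than real patch_toText output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for storing and loading serialized patch text.
class PatchFileIO {

    // Writes the patch text to a file as UTF-8, replacing any existing content.
    static void writePatchText(Path file, String patchText) {
        try {
            Files.write(file, patchText.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back as a UTF-8 string.
    static String readPatchText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file, reads it back, then deletes the file.
    static String roundTrip(String patchText) {
        try {
            Path file = Files.createTempFile("bundle", ".patch");
            try {
                writePatchText(file, patchText);
                return readPatchText(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder content standing in for the output of patch_toText.
        System.out.println(roundTrip("placeholder patch text"));
    }
}
```

Writing and reading with the same explicit charset matters here: if the two sides used different platform defaults, non-ASCII characters in the patch text could be corrupted and the patches might no longer apply.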

Checking that every patch was applied:

public boolean isCompleted(Object[] results) {
    // results[1] is a primitive boolean[]; casting it to Boolean[] would throw a ClassCastException
    boolean[] resultStatusArr = (boolean[]) results[1];
    for (boolean applied : resultStatusArr) {
        // Stop at the first patch that failed to apply
        if (!applied) return false;
    }
    return true;
}

END

–Nowy

–2018.11.21
