How to generate a set of similar strings in Python

Question

I am wondering how to generate a set of similar strings based on Levenshtein distance (string edit distance). Ideally, I would like to pass in three parameters: a source string (i.e. the string used to generate other strings similar to it), the number of strings to generate, and a threshold, such that the similarity among the strings in the generated set is greater than the threshold. Which Python package(s) should I use to achieve that? Or does anyone have an idea how to implement this?

Recommended answer

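I think you can approach the problem the other way round (in reverse): rather than searching for strings that happen to be close to the source, build them by editing the source.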

Take a source string, say sittin.
Take a threshold (a maximum edit distance), say k.
Then apply a combination of different "edits" to the source string, in at most k steps.

For example, let's say k = 2, and assume the allowed edit modes are:

delete a character
add a character
substitute one character for another.

The logic is then as follows:

source = 'sittin'
for num in range(n):  # suppose you want n strings generated
    candidate = source
    # the edit distance should be smaller than or equal to k,
    # but greater than or equal to one
    for i in range(randint(1, k)):
        pick a random edit mode from (delete, add, substitute)
        do it, and update candidate
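The pseudocode above could be turned into runnable Python roughly as follows. This is only a minimal sketch: the names `random_edit` and `similar_strings`, the lowercase ASCII alphabet, the `seed` argument and the `max_tries` cap are all assumptions made for illustration, not part of the original answer. Note that k random edits give an edit distance of at most k, not exactly k, because edits can cancel each other out; the sketch therefore discards candidates that end up identical to the source.

```python
import random
import string


def random_edit(s, rng, alphabet=string.ascii_lowercase):
    """Apply one random edit (delete, add, or substitute) to s."""
    modes = ["add"]
    if s:  # delete and substitute need at least one character
        modes += ["delete", "substitute"]
    mode = rng.choice(modes)
    if mode == "add":
        i = rng.randrange(len(s) + 1)  # may insert at either end
        return s[:i] + rng.choice(alphabet) + s[i:]
    i = rng.randrange(len(s))
    if mode == "delete":
        return s[:i] + s[i + 1:]
    # substitute (may pick the same character, leaving s unchanged)
    return s[:i] + rng.choice(alphabet) + s[i + 1:]


def similar_strings(source, n, k, seed=None, max_tries=10_000):
    """Return up to n distinct strings, each at most k edits from source."""
    rng = random.Random(seed)
    results = set()
    for _ in range(max_tries):  # hypothetical cap so the loop always ends
        if len(results) == n:
            break
        candidate = source
        for _ in range(rng.randint(1, k)):  # between 1 and k edits
            candidate = random_edit(candidate, rng)
        if candidate != source:  # edits cancelled out; distance would be 0
            results.add(candidate)
    return sorted(results)


print(similar_strings("sittin", n=5, k=2, seed=0))
```

Passing a fixed `seed` makes the output reproducible, which is handy for testing; leave it as `None` to get different strings on each run.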

If you need to stick to a pre-defined dictionary, that adds some complexity, but it is still doable. In that case, each edit must be valid, i.e. the result has to be an entry in the dictionary.
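For a small dictionary, one simple way to handle this is to filter instead of generate: compute the edit distance from the source to every dictionary word and keep those within k. This is a different technique from the random-edit loop described in the answer, and it only checks that the final string is in the dictionary, not every intermediate step. The function names and the sample word list below are made up for illustration.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similar_words(source, dictionary, k):
    """Return dictionary words whose distance to source is between 1 and k."""
    return sorted(w for w in dictionary if 1 <= levenshtein(source, w) <= k)


# Hypothetical word list, used only for this example.
words = {"sitting", "kitten", "sit", "sitter", "mitten", "sittin"}
print(similar_words("sittin", words, 2))
# ['kitten', 'mitten', 'sitter', 'sitting']
```

This costs one distance computation per dictionary word, so for a large dictionary you may prefer an optimized library over the pure-Python function shown here.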
