I'm converting spoken-form text to its written form. For example, "he owes me two-thousand dollars" should be converted to "he owes me $2,000". I want an automatic check to judge whether the conversion was right. Can I use sentence transformers to compare the embeddings of "two-thousand dollars" and "$2,000" to check whether the spoken-to-written conversion was right? For example, a cosine similarity close to 1 would mean a correct conversion. Is there a better way to do this?
asked Mar 10 at 18:55 by user3113173

1 Answer
Cosine similarity is a weak check for numerical values: the embeddings for "2000" and "4000" may both be very similar to "two thousand", so a wrong amount can still score close to 1. You can still try it as an auxiliary metric.
You can run multiple converters and check that their outputs agree, and you can verify a conversion by back-transforming it (digits back to words, or words to digits) and checking equivalence with the original text. The text2num and num2words packages can do this, for instance.
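A minimal sketch of the back-transformation idea: normalize both the spoken phrase and the written form to an integer amount and compare them exactly, so "$4,000" fails where an embedding score might not. `spoken_to_int`, `written_to_int` and `money_conversion_ok` are hypothetical helpers written for illustration, not part of text2num or num2words. The sketch assumes simple English cardinals ending in "dollar(s)" and written amounts of the form `$` plus digits with optional comma grouping; a real library would cover far more cases.

```python
import re

# Hypothetical minimal word-to-number parser, a stand-in for a library like text2num.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

# Assumed written format: "$" followed by digits, optionally grouped with commas.
WRITTEN_MONEY = re.compile(r"^\$(\d{1,3}(?:,\d{3})*|\d+)$")


def spoken_to_int(text: str) -> int:
    """Parse a spoken English cardinal such as 'two-thousand' into an int."""
    total, current = 0, 0
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            # Close off the current group, e.g. "two" + "thousand" -> 2000.
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current


def written_to_int(text: str) -> int:
    """Parse a written amount such as '$2,000' into an int."""
    match = WRITTEN_MONEY.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized written amount: {text!r}")
    return int(match.group(1).replace(",", ""))


def money_conversion_ok(spoken: str, written: str) -> bool:
    """Back-transform both sides to an integer amount and compare exactly."""
    words = spoken.lower().replace("-", " ").split()
    if len(words) < 2 or words[-1] not in ("dollar", "dollars"):
        return False
    try:
        return spoken_to_int(" ".join(words[:-1])) == written_to_int(written)
    except ValueError:
        return False


print(money_conversion_ok("two-thousand dollars", "$2,000"))  # True
print(money_conversion_ok("two-thousand dollars", "$4,000"))  # False
```

Because the check is an exact comparison of amounts, it has no threshold to tune. The same comparison can also be run on the outputs of several converters to test whether they agree.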