fix an edge case in skeletonization

'm' skeletonizes to 'rn' (but is exempted by the isBoring check),
but the fullwidth 'm' (U+FF4D) does not skeletonize to anything. The root
cause is the (still unexplained) patchiness of the skeleton mapping
for fullwidth -> standard-width Latin characters; the fix is to perform
width mapping first, before both skeletonization and the isBoring check.
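
The ordering problem can be reproduced with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: `foldWidth` is a simplified stand-in for `width.Fold` that handles only the fullwidth ASCII block, `toyIsBoring` treats any all-ASCII name as boring, and `toySkeleton` is a one-entry confusables table ('m' -> "rn") with no fullwidth entries, mimicking the patchy mapping. None of these are the real helpers in irc/strings.go.

```go
package main

import (
	"fmt"
	"strings"
)

// foldWidth maps fullwidth ASCII variants (U+FF01..U+FF5E) onto their
// standard-width counterparts (U+0021..U+007E). Simplified stand-in for
// width.Fold; it ignores every other width variant.
func foldWidth(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 0xFF01 && r <= 0xFF5E {
			return r - 0xFEE0
		}
		return r
	}, s)
}

// toyIsBoring is a hypothetical stand-in for isBoring: a name is "boring"
// (exempt from skeletonization) when it contains only ASCII.
func toyIsBoring(s string) bool {
	for _, r := range s {
		if r > 0x7F {
			return false
		}
	}
	return true
}

// toySkeleton is a toy confusables table with a single entry, 'm' -> "rn".
// It has no entry for the fullwidth 'm', which is left untouched.
func toySkeleton(s string) string {
	return strings.ReplaceAll(s, "m", "rn")
}

// skeletonOld applies the pre-fix order: boring check and skeleton, then fold.
func skeletonOld(name string) string {
	if !toyIsBoring(name) {
		name = toySkeleton(name)
	}
	return foldWidth(name)
}

// skeletonNew applies the post-fix order: fold first, then boring check and skeleton.
func skeletonNew(name string) string {
	name = foldWidth(name)
	if !toyIsBoring(name) {
		name = toySkeleton(name)
	}
	return name
}

func main() {
	plain := "mm"      // two ASCII 'm'
	mixed := "m\uff4d" // ASCII 'm' followed by fullwidth 'm' (U+FF4D)

	// Old order: the two lookalike names get different skeletons.
	fmt.Printf("old: %q vs %q\n", skeletonOld(plain), skeletonOld(mixed))
	// New order: both names collapse to the same skeleton.
	fmt.Printf("new: %q vs %q\n", skeletonNew(plain), skeletonNew(mixed))
}
```

Under these assumptions the old order yields "mm" for the plain name but "rnm" for the mixed one, while the new order yields "mm" for both, which is the property the skeleton needs in order to detect lookalike names.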
tags/v1.1.0
Shivaram Lingamneni, 5 years ago
Parent:
Commit: be4d098945
2 changed files with 8 additions and 4 deletions
  1. irc/strings.go (+4, -4)
  2. irc/strings_test.go (+4, -0)

irc/strings.go (+4, -4)

@@ -163,15 +163,15 @@ func isIdent(name string) bool {
 // from the original (unfolded) identifier and stored/tracked separately from the
 // casefolded identifier.
 func Skeleton(name string) (string, error) {
-	if !isBoring(name) {
-		name = confusables.Skeleton(name)
-	}
-
 	// XXX the confusables table includes some, but not all, fullwidth->standard
 	// mappings for latin characters. do a pass of explicit width folding,
 	// same as PRECIS:
 	name = width.Fold.String(name)
 
+	if !isBoring(name) {
+		name = confusables.Skeleton(name)
+	}
+
 	// internationalized lowercasing for skeletons; this is much more lenient than
 	// Casefold. In particular, skeletons are expected to mix scripts (which may
 	// violate the bidi rule). We also don't care if they contain runes

irc/strings_test.go (+4, -0)

@@ -181,6 +181,10 @@ func TestSkeleton(t *testing.T) {
 		t.Errorf("after skeletonizing, we should casefold")
 	}
 
+	if skeleton("smt") != "smt" {
+		t.Errorf("our friend lover successfully tricked the skeleton algorithm!")
+	}
+
 	if skeleton("еvan") != "evan" {
 		t.Errorf("we must protect against cyrillic homoglyph attacks")
 	}
