Diff - 9aa120f7ada592ed03b37f4de8ee413c5385f123^! - android_external_vim

commit	9aa120f7ada592ed03b37f4de8ee413c5385f123
author	Yee Cheng Chin <ychin.git@gmail.com>	Fri Apr 04 19:16:21 2025 +0200
committer	Christian Brabandt <cb@256bit.org>	Fri Apr 04 19:16:21 2025 +0200
tree	fda6de1c317402b17444ce43d76a6ddd110140ac
parent	b8d5c8509998f3a97ffe42f674352b07749cd119

patch 9.1.1276: inline word diff treats multibyte chars as word char

Problem:  inline word diff treats multibyte chars as word char
          (after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
          (Yee Cheng Chin)

Previously, inline word diff simply used Vim's definition of a keyword to
determine what counts as a word. As a result, multi-byte character classes
such as emojis and CJK (Chinese/Japanese/Korean) characters were all
classified as word characters, so entire sentences were grouped as a
single word, which does not provide meaningful information in a diff
highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters like
Cyrillic and Greek letters will continue to be considered words.

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".
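
The difference between the two definitions can be sketched as follows. This
is only an illustration of the rule described above, not the code from the
patch: both helper names are hypothetical, and the class numbers are
assumed to follow the convention documented for mb_get_class() (0 blank,
1 punctuation, 2 alphanumeric word character, >2 other word characters).

```c
#include <stdbool.h>

/*
 * Hypothetical helpers, not taken from the Vim source.
 * "cls" is assumed to be a value as returned by mb_get_class().
 */

/* Usual definition elsewhere in Vim: any class >= 2 counts as a word. */
static bool is_word_class_general(int cls)
{
    return cls >= 2;
}

/*
 * Definition used for inline word diff after this patch, as described
 * in the commit message: only class 2 (alphanumeric) counts as a word;
 * classes above 2 (CJK, emoji, ...) are treated as non-word characters.
 */
static bool is_word_class_inline_diff(int cls)
{
    return cls == 2;
}
```

Under these assumptions, a character whose class is above 2 is a word
character for the general check but not for the inline diff check, while
classes 0, 1 and 2 are treated identically by both.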

related: #16881 (diff inline highlight)
closes: #17050

Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>

diff --git a/src/mbyte.c b/src/mbyte.c
index a38ab24..cc8d628 100644
--- a/src/mbyte.c
+++ b/src/mbyte.c

@@ -828,8 +828,8 @@
  * Get class of pointer:
  * 0 for blank or NUL
  * 1 for punctuation
- * 2 for an (ASCII) word character
- * >2 for other word characters
+ * 2 for an alphanumeric word character
+ * >2 for other word characters, including CJK and emoji
  */
     int
 mb_get_class(char_u *p)