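Sensitive-word filtering has become an essential feature of modern internet applications, especially in social media, comment moderation, and other settings where content must be kept clean. Building on the open-source library https://github.com/houbb/sensitive-word, this article walks through how to implement efficient sensitive-word filtering with custom word lists and a small utility class.
1. Project dependencies
First, add the sensitive-word Maven dependency: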
<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
    <version>1.4.1</version>
</dependency>
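2. Configuring the sensitive-word filter
The core configuration is shown below. It builds the filter with SensitiveWordBs and loads the custom deny words and allow words on top of the library defaults.
Configuration class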
package cn.yujky.study.sensitive.config;

import cn.yujky.study.sensitive.service.impl.MyWordAllowImpl;
import cn.yujky.study.sensitive.service.impl.MyWordDenyImpl;
import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import com.github.houbb.sensitive.word.support.allow.WordAllows;
import com.github.houbb.sensitive.word.support.deny.WordDenys;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Sensitive-word configuration.
 */
@Slf4j
@Configuration
public class SensitiveWordConfig {

    @Autowired
    private MyWordDenyImpl myWordDeny;

    @Autowired
    private MyWordAllowImpl myWordAllow;

    /**
     * Initializes the sensitive-word filter.
     *
     * @return the configured SensitiveWordBs bootstrap instance
     */
    @Bean
    public SensitiveWordBs sensitiveWordBs() {
        log.info("Initializing the local sensitive-word list...");
        SensitiveWordBs init = SensitiveWordBs.newInstance()
                // Chain the library's default lists with the custom ones.
                .wordDeny(WordDenys.chains(WordDenys.defaults(), myWordDeny))
                .wordAllow(WordAllows.chains(WordAllows.defaults(), myWordAllow))
                .init();
        log.info("Local sensitive-word list initialized");
        return init;
    }
}
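3. Custom sensitive-word lists
Implementing the IWordDeny and IWordAllow interfaces lets you supply your own deny words and allow words, respectively. Example code follows:
3.1 Custom deny words (MyWordDenyImpl)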
package cn.yujky.study.sensitive.service.impl;

import com.github.houbb.sensitive.word.api.IWordDeny;
import lombok.AllArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Custom deny list loaded from a text file on the classpath.
 *
 * @author yujky
 * @date 2024/12/27 11:18
 */
@Slf4j
@Service
@AllArgsConstructor
public class MyWordDenyImpl implements IWordDeny {

    private final ResourceLoader resourceLoader;

    @Override
    public List<String> deny() {
        // Load the deny words from sensiticeWord.txt in the resources directory.
        Resource resource = resourceLoader.getResource("classpath:sensiticeWord.txt");
        // Read through the stream rather than Paths.get(resource.getURI()),
        // which fails once the application is packaged as a jar.
        try (InputStream in = resource.getInputStream()) {
            String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // One word per line; \R also handles Windows line endings.
            List<String> words = Arrays.stream(content.split("\\R"))
                    .map(String::trim)
                    .filter(word -> !word.isEmpty())
                    .distinct()
                    .toList();
            log.info("Deny list loaded, number of words: {}", words.size());
            log.info("Deny list loaded, words: {}", words);
            return words;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
Here the deny list lives in sensiticeWord.txt under the resources directory. You could just as well read it from a database or any other storage.
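Whatever the storage, the raw text usually needs the same normalization before it is returned from deny(). The following is a minimal sketch of that step using only the JDK. DenyListParser and its parse method are hypothetical names introduced for illustration and are not part of sensitive-word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: turns raw word-list text into a clean list of deny words. */
final class DenyListParser {

    private DenyListParser() {
    }

    /**
     * Splits on any line terminator, trims each entry, drops blank lines,
     * and removes duplicates while keeping first-seen order.
     */
    static List<String> parse(String content) {
        return Arrays.stream(content.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String raw = "foo\r\nbar\n\n  foo  \nbaz";
        System.out.println(parse(raw)); // prints [foo, bar, baz]
    }
}
```

Keeping the parsing separate from the I/O means the same method can serve a file, a database column, or a remote config value.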
3.2 Custom allow words (MyWordAllowImpl)
package cn.yujky.study.sensitive.service.impl;

import com.github.houbb.sensitive.word.api.IWordAllow;
import org.springframework.stereotype.Service;

import java.util.Arrays;
import java.util.List;

/**
 * Custom allow list.
 *
 * @author yujky
 * @date 2024/12/27 11:20
 */
@Service
public class MyWordAllowImpl implements IWordAllow {

    @Override
    public List<String> allow() {
        return Arrays.asList("五星紅旗");
    }
}
4. Text-cleaning utility class
Before running detection, it usually pays to preprocess the text, for example by stripping special characters and emoji. An example cleaning utility:
package cn.yujky.study.sensitive;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class SensitiveTextCleaner {

    /**
     * Removes emoji.
     *
     * @param text input text
     * @return cleaned text
     */
    public static String removeEmojis(String text) {
        String emojiRegex = "[\\x{1F600}-\\x{1F64F}\\x{1F300}-\\x{1F5FF}\\x{1F680}-\\x{1F6FF}\\x{1F700}-\\x{1F77F}\\x{1F780}-\\x{1F7FF}\\x{1F800}-\\x{1F8FF}\\x{1F900}-\\x{1F9FF}\\x{1FA00}-\\x{1FA6F}\\x{1FA70}-\\x{1FAFF}\\x{2600}-\\x{26FF}\\x{2700}-\\x{27BF}]";
        return text.replaceAll(emojiRegex, "");
    }

    /**
     * Removes special characters, keeping only ASCII letters, digits,
     * and CJK ideographs in the range U+4E00 to U+9FA5.
     *
     * @param text input text
     * @return cleaned text
     */
    public static String removeSpecialCharacters(String text) {
        return text.replaceAll("[^a-zA-Z0-9\u4e00-\u9fa5]", "");
    }

    /**
     * Full cleaning pass (removes emoji and special characters).
     *
     * @param text input text
     * @return cleaned text
     */
    public static String cleanText(String text) {
        text = removeEmojis(text); // remove emoji
        text = removeSpecialCharacters(text); // remove special characters
        return text.trim().toLowerCase(); // trim and convert to lower case
    }
}
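To make the effect of the whitelist concrete, here is a small standalone sketch that applies the same character class to a mixed input. CleanTextDemo and clean are hypothetical names used only for this illustration. The input is written with Unicode escapes so the example does not depend on source-file encoding.

```java
import java.util.Locale;

/** Hypothetical standalone demo of the whitelist-based cleaning step. */
final class CleanTextDemo {

    private CleanTextDemo() {
    }

    /** Keeps ASCII letters, digits and CJK ideographs in U+4E00..U+9FA5, then lower-cases. */
    static String clean(String text) {
        return text.replaceAll("[^a-zA-Z0-9\\u4e00-\\u9fa5]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "He&llo, " + two CJK characters (U+4E16 U+754C) + "!" + horse emoji (U+1F40E)
        String input = "He&llo, \u4e16\u754c!\uD83D\uDC0E";
        // Prints "hello" followed by the two CJK characters.
        System.out.println(clean(input));
    }
}
```

Two things follow from the whitelist approach. Emoji fall outside the allowed ranges, so this single pass already removes them. The same is true of every other script, such as kana or Cyrillic, so the range would need widening if those should survive cleaning.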
5. Testing the filter
In a Spring Boot project, a unit test is enough to verify the filter. The complete test code:
package cn.yujky.study.sensitive;

import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import lombok.extern.slf4j.Slf4j;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@Slf4j
@SpringBootTest
class YujkySensitiveApplicationTests {

    @Autowired
    private SensitiveWordBs sensitiveWordBs;

    @Test
    void contextLoads() {
        String text = "操&他🐎";
        String cleanText = SensitiveTextCleaner.cleanText(text);
        log.info("Original text: {}, cleaned text: {}", text, cleanText);

        // Check both forms for sensitive words.
        boolean containsOriginal = sensitiveWordBs.contains(text);
        boolean containsCleaned = sensitiveWordBs.contains(cleanText);
        log.info("Contains sensitive words (original text): {}", containsOriginal);
        log.info("Contains sensitive words (cleaned text): {}", containsCleaned);

        // Console output
        System.out.println("Original text result: " + containsOriginal);
        System.out.println("Cleaned text result: " + containsCleaned);
    }
}
5.1 Sample test output
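Assuming the deny list contains the phrase “操他” as a single entry (if “操” or “他” were listed individually, the original text would match as well):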
Original text: 操&他🐎, cleaned text: 操他
Contains sensitive words (original text): false
Contains sensitive words (cleaned text): true
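It is worth checking both the original text and the cleaned text: cleaning catches words that were split up with symbols or emoji, while the original text can still match entries that themselves contain characters the cleaner strips.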
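One way to package that recommendation is a helper that flags text when either form matches. The sketch below uses only the JDK. DualCheck and containsSensitive are hypothetical names, and the lambdas in main are stand-ins for SensitiveTextCleaner::cleanText and sensitiveWordBs::contains.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical helper that flags text if either the raw or the cleaned form matches. */
final class DualCheck {

    private DualCheck() {
    }

    static boolean containsSensitive(String text,
                                     UnaryOperator<String> cleaner,
                                     Predicate<String> detector) {
        // Short-circuits: the cleaned form is only computed when the raw form passes.
        return detector.test(text) || detector.test(cleaner.apply(text));
    }

    public static void main(String[] args) {
        // Stand-ins for SensitiveTextCleaner::cleanText and sensitiveWordBs::contains.
        UnaryOperator<String> cleaner = s -> s.replaceAll("[^a-zA-Z0-9]", "");
        Predicate<String> detector = s -> s.contains("badword");

        System.out.println(containsSensitive("bad&word", cleaner, detector));  // prints true
        System.out.println(containsSensitive("fine text", cleaner, detector)); // prints false
    }
}
```

Passing the cleaner and detector as functions keeps the helper testable without a Spring context.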
If you run into other requirements or problems during development, feel free to get in touch!
https://web.yujky.cn/
Username: cxks
Password: cxks123