From 800d05606f762c276c22db6d0bc52648e952477f Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Sat, 14 Jun 2025 06:02:05 +0000
Subject: [PATCH] feat: Add new content moderation project

This commit introduces a new project, `deepseek_content_moderation`, designed
to detect sensitive content in text based on configurable keyword lists.

Key features include:
- Customizable categories of sensitive words stored in `config.json`.
- A `Moderator` class (`moderator.py`) that loads the configuration and uses
  regex for case-insensitive, whole-word matching.
- The `analyze_text` method returns a dictionary of triggered categories and
  the specific words found.
- Comprehensive unit tests (`tests/test_moderator.py`) using pytest ensure the
  functionality of the `Moderator` class.
- A detailed `README.md` provides an overview, setup instructions, usage
  examples, and testing guidelines.

The project structure has been set up to be a valid Python package, with the
main directory named `deepseek_content_moderation`.

This project serves as a foundational component for applications requiring
basic content filtering capabilities.
---
 deepseek_content_moderation/README.md | 100 ++++++++++++++++
 deepseek_content_moderation/__init__.py | 0
 .../__pycache__/__init__.cpython-310.pyc | Bin 0 -> 137 bytes
 .../__pycache__/moderator.cpython-310.pyc | Bin 0 -> 2114 bytes
 deepseek_content_moderation/config.json | 57 +++++++++
 deepseek_content_moderation/moderator.py | 55 +++++++++
 deepseek_content_moderation/tests/__init__.py | 0
 .../__pycache__/__init__.cpython-310.pyc | Bin 0 -> 143 bytes
 ...est_moderator.cpython-310-pytest-8.3.5.pyc | Bin 0 -> 7424 bytes
 .../tests/test_moderator.py | 110 ++++++++++++++++++
 10 files changed, 322 insertions(+)
 create mode 100644 deepseek_content_moderation/README.md
 create mode 100644 deepseek_content_moderation/__init__.py
 create mode 100644 deepseek_content_moderation/__pycache__/__init__.cpython-310.pyc
 create mode 100644 deepseek_content_moderation/__pycache__/moderator.cpython-310.pyc
 create mode 100644 deepseek_content_moderation/config.json
 create mode 100644 deepseek_content_moderation/moderator.py
 create mode 100644 deepseek_content_moderation/tests/__init__.py
 create mode 100644 deepseek_content_moderation/tests/__pycache__/__init__.cpython-310.pyc
 create mode 100644 deepseek_content_moderation/tests/__pycache__/test_moderator.cpython-310-pytest-8.3.5.pyc
 create mode 100644 deepseek_content_moderation/tests/test_moderator.py

diff --git a/deepseek_content_moderation/README.md b/deepseek_content_moderation/README.md
new file mode 100644
index 0000000..64feac3
--- /dev/null
+++ b/deepseek_content_moderation/README.md
@@ -0,0 +1,100 @@
+# DeepSeek Content Moderation Tool
+
+## Overview
+
+This project provides a Python tool for detecting potentially sensitive content in text. It uses a configurable list of keywords and phrases across various categories to analyze input text and flag any matches. This tool is intended as a basic building block for more complex content moderation systems.
+
+## Features
+
+- **Configurable Categories**: Define your own categories of sensitive content and the keywords/phrases for each.
+- **JSON Configuration**: Sensitive word lists are managed in an easy-to-edit `config.json` file.
+- **Regex-Based Matching**: Uses regular expressions for case-insensitive, whole-word matching.
+- **Returns Matched Categories and Words**: Provides a dictionary of categories that were triggered and the specific words found.
+- **Extensible**: Designed to be integrated into larger applications or workflows.
+
+## Project Structure
+
+```
+deepseek_content_moderation/
+├── __init__.py
+├── config.json
+├── moderator.py
+├── README.md
+└── tests/
+    ├── __init__.py
+    └── test_moderator.py
+```
+
+## Setup and Installation
+
+1. **Prerequisites**:
+   * Python 3.7+
+   * `pytest` (for running tests): `pip install pytest`
+
+2. **Configuration (`config.json`)**:
+   The `config.json` file stores the categories and lists of sensitive words. You can edit this file to add, remove, or modify categories and their associated terms.
+
+   Example structure:
+   ```json
+   {
+     "Profanity": ["example_swear", "another_bad_word"],
+     "HateSpeech": ["example_slur", "derogatory_term"],
+     // ... other categories
+   }
+   ```
+   *Initially, the file contains a predefined set of categories and example terms based on common types of sensitive content.*
+
+## Usage
+
+The core functionality is provided by the `Moderator` class in `moderator.py`.
+
+```python
+from deepseek_content_moderation.moderator import Moderator
+
+# Initialize the moderator (it will load from config.json by default)
+# You can also provide a custom path to a config file:
+# moderator = Moderator(config_path="path/to/your/custom_config.json")
+moderator = Moderator()
+
+text_to_analyze = "This text contains an example_swear and a derogatory_term."
+analysis_result = moderator.analyze_text(text_to_analyze)
+
+if analysis_result:
+    print("Sensitive content found:")
+    for category, words in analysis_result.items():
+        print(f"  Category: {category}, Words: {', '.join(words)}")
+else:
+    print("No sensitive content detected.")
+
+# Example Output:
+# Sensitive content found:
+#   Category: Profanity, Words: example_swear
+#   Category: HateSpeech, Words: derogatory_term
+```
+
+### `analyze_text(text: str) -> dict`
+
+- **Input**: A string of text to analyze.
+- **Output**: A dictionary where keys are the names of sensitive categories found in the text. The value for each key is a list of unique words/phrases from the input text that matched the sensitive terms in that category. If no sensitive content is found, an empty dictionary is returned.
+
+## Running Tests
+
+To run the unit tests, navigate to the parent directory of `deepseek_content_moderation` and run:
+
+```bash
+python -m pytest
+```
+Or, navigate into the `deepseek_content_moderation` directory and run `pytest`. Ensure `pytest` is installed.
+
+## Disclaimer
+
+This tool provides basic keyword-based detection. It is not a comprehensive solution for content moderation, which often requires more sophisticated NLP techniques, contextual understanding, and human oversight. The initial lists of sensitive words in `config.json` are illustrative and will likely need significant expansion and refinement for any practical application.
+
+## Contributing
+
+Feel free to expand upon this project. Suggestions for improvement include:
+- More sophisticated matching algorithms (e.g., Levenshtein distance for typos); a rough sketch follows below.
+- Support for multiple languages.
+- Integration with machine learning models for nuanced detection.
+- More granular reporting.
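+
+As a rough illustration of the first suggestion, a typo-tolerant pre-check can be layered on top of the existing keyword lists using only the Python standard library. The `fuzzy_find` helper below is a hypothetical sketch and is not part of the current `Moderator` API:
+
+```python
+import difflib
+
+def fuzzy_find(text, keywords, cutoff=0.85):
+    """Return keywords that approximately match some token in the text.
+
+    Hypothetical helper, not part of the Moderator class.
+    """
+    tokens = text.lower().split()
+    hits = []
+    for keyword in keywords:
+        # get_close_matches keeps tokens whose SequenceMatcher ratio against
+        # the keyword is at least `cutoff`.
+        if difflib.get_close_matches(keyword.lower(), tokens, n=1, cutoff=cutoff):
+            hits.append(keyword)
+    return hits
+
+# "badwrod" is a close misspelling of "badword", so it is reported here.
+print(fuzzy_find("This has a badwrod in it", ["badword", "swear"]))
+```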
+``` diff --git a/deepseek_content_moderation/__init__.py b/deepseek_content_moderation/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/deepseek_content_moderation/__pycache__/__init__.cpython-310.pyc b/deepseek_content_moderation/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1c7783c235c65aa44cf95b1194713b5128a72980 GIT binary patch literal 137 zcmd1j<>g`k0x3S<3=sVoL?8o3AjbiSi&=m~3PUi1CZpdejFrEF25)kp|R5TxCHa6qf7OS>&aY>K*-5TjP3Y3wA^PG+XI z(``o+2`PI=+>k1yY$Is92{oZ>{Tdg{RcBZ{I`qf3~ zPdzC&2PW5{i$6e6L~)G#>IP9Wi24bl<~IzY0f?qSGz_A7f~fl~gJ=PwZ4fPk zXrCY&{+dCo0b<=C)(m3(1cCR#uYJ^M{slgwUSG>{vte=_y0{2}AsY z>;h#h6O289(}`q>48}_bqcpj2JWP2vd!gJQh>}PKflec#{cGYj{_vkB$u~|&>8?P1 z41iVBCv-Q2+=D>`&IQmP0~rf+6FoflfK2g}P)x}oD#%SV!&8ewIObMiE%0eXozrM) zOMC7VARkv21Wp63+vurl2&(j6yHdwPU%3xmY=RiVbq}jkEYTbnc!7T*Ps#q0i4IYO zGgV=tA7)I?x)+swkaB8>T8cL4Q1$IN5>oSgZA~lIw``)kTk`*nAau9mv>(Re8VpuP zbL+vQPmgesTb&1wI#spa4<#F>{KYDAT(hg3NV2hjSf^3K*Y!&ESh=d-x!t>aZ}-N| z{%*(Ny6Tm?QlcDfxqw4I@`hu)1}d}~(F9S!8@O$4;7wv58FPfeNTOiTEv;L}*G1EGh+%t|K?cChqS++^wly*iaM@l&MoV+>@5H zC5&cRe+4Nlh;l=p?BHIf!gVn65xR3%BE3Y@{jf6 zE4$1{=OWj+r(ZKuiY}^S*V(L_RejkwjFKS8zr8<-ga^OSlT1kOAd(|b9I%kxNCCvhUK~AP-ZwkMB z=v(?R3lpPMjM7O=y(|=>o8Q_wzJfgu$63slX6)_K=tlTmuovw|y+L|!yd$H#$reMM6_w`yb^uE7yXK7`|IJ|^myd-y}9=u`Q{7YlnC=?#$lVLCs zI*xpN>~t*B|Ev#zv~K4euCv!voS~C{0#m;RL0dljenV4faw6C=@$u^>>Of7frJRgM zi7bOImG&={H!hbpE|)j1ls2xEH?Ed8uD&fTl)Jsmu50=M7N3D|k&D~JJzF6* dict: + found_sensitivities = {} + if not text: + return found_sensitivities + + for category, regex_pattern in self.category_regexes.items(): + matches = regex_pattern.findall(text) + if matches: + # Store unique matches + found_sensitivities[category] = sorted(list(set(matches))) + + return found_sensitivities + +if __name__ == '__main__': + # Example Usage (optional, for basic testing) + moderator = Moderator() + + test_text_1 = "This is a test with swearword1 and another bad term like HATEspeech_slur1." + analysis_1 = moderator.analyze_text(test_text_1) + print(f"Analysis for '{test_text_1}': {analysis_1}") + + test_text_2 = "This text is clean and should pass." + analysis_2 = moderator.analyze_text(test_text_2) + print(f"Analysis for '{test_text_2}': {analysis_2}") + + test_text_3 = "Another example with MEdiCaL_MiSiNfoRmAtiOn1 and suggestive_innuendo1." + analysis_3 = moderator.analyze_text(test_text_3) + print(f"Analysis for '{test_text_3}': {analysis_3}") + + test_text_4 = "Testing PII like personal_name_example here." + analysis_4 = moderator.analyze_text(test_text_4) + print(f"Analysis for '{test_text_4}': {analysis_4}") + + test_text_5 = "This has drug_use_term1 and also drug_use_term1 again." 
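+    # Note: analyze_text() de-duplicates matches (set() then sorted()), so the
+    # repeated drug_use_term1 above is reported only once for its category.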
+ analysis_5 = moderator.analyze_text(test_text_5) + print(f"Analysis for '{test_text_5}': {analysis_5}") diff --git a/deepseek_content_moderation/tests/__init__.py b/deepseek_content_moderation/tests/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/deepseek_content_moderation/tests/__pycache__/__init__.cpython-310.pyc b/deepseek_content_moderation/tests/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3fc4f9f81411d762371e71e0129c331b2cbea705 GIT binary patch literal 143 zcmd1j<>g`k0x3S<3=sVoL?8o3AjbiSi&=m~3PUi1CZpda>dhMYHU}MLk6~vuQV7^@nu9MNuG|c3UqB80h0Whm?lW_=yHVfXsl0_sMh4 zz2|)A+^b-6($?_z*wmG^*Y{}JXH?1m8K}I3C;S75)TC%>b-{0a-Ox3mtrzM=k@lE% zlY2__62Gmwg}2@+uiJH7Xuf%MLK?F0v35h$D_ko|lWUW*B(0CN`jjk78}Dg3AuD*# z$VoYc_pF?jGkDL*SviOIyquSN@ZKZ$%6)k6mHXuZy!Xk2@-e*kOYyc=ef%GgSzFo` zN%=}3edR@gY6xn}{t9?ocwWI1E`vxR^;MmO1TC%#Y2GyYBGzQ-ajmaMTC7X!hW@MB zz7dIAdMsjNOBdREP``lx8-~y#Bi3(#`GOW6-4Fr`B((SO4BD&ZrKGg%$&EnCq!@1a zo=PffUgWQITW;j5^`sECx~jgX+irf;^|{Gi2^0!pR5gCv6gsSE zz2kPgXstexPI9FedC8!%(ro#OSlQOpJl5&X;xVt&IVOF-6Z-y##3u6Fk-MHQb2Dfk zBlaQFL2ICDoy`Ptfly57Z$X>z+?1bmZ>CC z9Mt+IxSE+XUCbCrQ>OO3G|bBcFZ-sB@0ZBy;u7<+GG5j$UZ$L6UgZ(3|D5A9bsw#N zi*c#ezi00Mm)2POGHWbian1u_aiFGS*;@ z*V`gwi7j3OJ6i*Ld=0SXdcQ0+7~I|^MF;|n(VOk6=+%xK?QAYoL#NqxsCc>>+CR+- z&N*&6Sq*Ec{o8#uGPt#YNkmQ(Io+G`!q8U{4?kNCsqw5Tf)6;%jd3_t4Yzf55S=va z)Sdj?GwIwjs+`VU9GiPD=`s*;mA~p=_d~nioBv?>=yD2MF{4)XgXL{;M?ZHCA|0d8 zLakI&6f#Y*4mCI-V6AQUopY)Bkg000yy`JXr5unEg%?wy7poJAnXXe}ztq`eJOf(J zx(%<@f@mGp2-Z8E@{_U~dMiG)Bo&Nwm%GhY1eZ?CR)E1_a=;DOf^JLlEO)gPEPE}t z?XCOC)CFE?Al5~t0+q}Y*sgn#>tFA*ytWsir#{mdTacubsa2ie9ZV*OZMtd4^OaNi z%89g+GB}g3Ear!u%=etmSI*{#ovt&}^lQ(LW7~WV{@qsK$!2@?09HCAYyCvD#3w)y zbrzKoh%?bJfVkVgKqc000p_*^ zZ)OH>1UV0aH-a{DGi#}L@SU|L+$wZDr_u7gwi6=a_`vL%dJ2MWXP$AKtZ(ROiG@9E zZH7(w2)V~`wz&}>fe)z1=>z4Y!h?Ghb+1S22pZMXM2-@v68R>PXNWvY1aG_k-jGEW*_~8?nahsAA8$hQh{ETcG zIrnhW{)x)Yrb$M^SY**4-85_;5U~&=Y7G&QBhD~3Jw4bm^(;ypxe%L>t#OW;P$;d%Cn)f(dHROTfG+9#yhVJ}m1F~IK=c&A7Lu@o-hO}ct<(v)GXd)SsdYK5@%tj6ah~%Nq+4GadorVPY~B#aRB%pM)%zCe z7V71=#8Kal?U3>cT~5R{Y#iB48ujJm(6(FX$9DisWQzusVB3=e+v9f$Hn=n6=ffGF z?3#i6h-Xj+F?G`fq)pISC7$36qLO70m0Sj4!fH!+Y&=$$Ma(Yss-xDs=7j^Zb-cE8 z5XAzA18>fX)g|mw3OwJX(wMbU!oVhr5d9{FXxM2QqEW~~G~n_mb)7&|-igr*Xii5` zDI1Ew%|j8NF*Gs5Kt;Z+n(77cA#gpR2&mZp)i+S87m(E)?@i~DnFa9S<`Y%>kvv3Q z!3V384{(zU#Gia6jg2Yev-!&N>I`*rnvjOhg@uvY$#Lg8{$S@??;_pkz}?8bLh|%* z;9q|g>J8LK{A&cS|MF$>uP+9#rztxh#-u;Oe{ke}U`+Zoc_AkhzhRF{6N*QRNz)vY zX2xRDXDq?zACutK&t4d&Db7YSS{wCgCx7sNc)zA}M_nc|8gw|zS7dqWH6q_5G8QV1 z?6&5i^=82W2?ymhx9Mn2A)=Lri65YPPcv3;qPGf#4>ODrdU?*v7hZkujkjK9{JMPM zor{;=ecgTM;#*f9M$>vJo6P`rPi^a!{D<$Y_Tk-XH@wiNbN?{W+9cpmZxMNWTvuAu zL5igoQ9(wUq~NNarx~36eTParAp9a#Um`NL2~X^n2A3kyri-f>#BKi8t2FOjkdZYX zxq?{36~vb&G=Cw`!P-vaG*Xr`3?k($vHjB+i0mBoDY2a$27iJMGp6AGI|h+7_|HB< zVhhd;B6DLP^7qdlL|EH-Xlu(otgU@b(An!#5!RU7UIxv508=}kM$7#JeKB%BYm6(5 zPlfs#>x?oIp8XH6Gbh*wmU>TK_UZh^SqRM3SI%-ba@s-k2#LusC%KzT_l^zDcvrh& z1Oy>Jd5$@FBUtHp=xw!XjCjf1;Ed;n-R0DU7zKNmcIQ;|O|?E0JX-UeFuOW+?snx0 zJ!^`Puqy7Dq$=O?>>J#pq%DJMon&?>365~MMQE%sBxSb*DM&m-djCW6{T~q-1CGVr zz`-epOR=TdPV*0UV=c&XkyHlgKu{FUfY>?#x@|DkaRu-`8b=@|pPi_S=%usOQC`XH z9F-oz2NiD>r>sF-dl+cnIZEBl2JVgFYz=Pyh)&lzK2A6?Rh&{p^*WLIxKgx(1N;vE z_LWPyzFN4+cs0M&b6yJ_xJB*;;JoublAD3CKDBcn@PL!ly)v%e)Dli*_`T*iC-~6Y zJfGsxq^RC!U{;5yphM_ zl>Y%5ss|Ela5u`ATFF5-y;!ZKMs{jB1 literal 0 HcmV?d00001 diff --git a/deepseek_content_moderation/tests/test_moderator.py b/deepseek_content_moderation/tests/test_moderator.py new file mode 100644 index 0000000..ca1e586 --- /dev/null +++ 
b/deepseek_content_moderation/tests/test_moderator.py
@@ -0,0 +1,110 @@
+import pytest
+import json
+from deepseek_content_moderation.moderator import Moderator
+
+# Helper to create a temporary config file for testing
+@pytest.fixture
+def temp_config_file(tmp_path):
+    config_data = {
+        "Profanity": ["badword", "swear"],
+        "HateSpeech": ["hateful_term", "slur"],
+        "SpecificCategory": ["unique_term_for_test"]
+    }
+    config_file = tmp_path / "test_config.json"
+    with open(config_file, 'w') as f:
+        json.dump(config_data, f)
+    return str(config_file)  # Return path as string
+
+@pytest.fixture
+def moderator_instance(temp_config_file):
+    # Ensure the moderator uses the temp config by passing the path
+    return Moderator(config_path=temp_config_file)
+
+def test_config_loading(moderator_instance):
+    assert "Profanity" in moderator_instance.config
+    assert "swear" in moderator_instance.config["Profanity"]
+    assert "HateSpeech" in moderator_instance.category_regexes
+    assert moderator_instance.category_regexes["Profanity"].pattern == r"\b(badword|swear)\b"
+
+def test_analyze_text_no_sensitivities(moderator_instance):
+    analysis = moderator_instance.analyze_text("This is a clean sentence.")
+    assert analysis == {}
+
+def test_analyze_text_single_category_single_word(moderator_instance):
+    analysis = moderator_instance.analyze_text("This sentence contains a badword.")
+    assert "Profanity" in analysis
+    assert analysis["Profanity"] == ["badword"]
+
+def test_analyze_text_single_category_multiple_words(moderator_instance):
+    analysis = moderator_instance.analyze_text("This sentence has badword and also swear.")
+    assert "Profanity" in analysis
+    assert sorted(analysis["Profanity"]) == sorted(["badword", "swear"])
+
+def test_analyze_text_multiple_categories(moderator_instance):
+    analysis = moderator_instance.analyze_text("A sentence with badword and a hateful_term.")
+    assert "Profanity" in analysis
+    assert analysis["Profanity"] == ["badword"]
+    assert "HateSpeech" in analysis
+    assert analysis["HateSpeech"] == ["hateful_term"]
+
+def test_analyze_text_case_insensitivity(moderator_instance):
+    analysis = moderator_instance.analyze_text("This has a BADWORD and HATEFUL_TERM.")
+    assert "Profanity" in analysis
+    assert analysis["Profanity"] == ["BADWORD"]  # The regex returns the found casing
+    assert "HateSpeech" in analysis
+    assert analysis["HateSpeech"] == ["HATEFUL_TERM"]
+
+def test_analyze_text_empty_string(moderator_instance):
+    analysis = moderator_instance.analyze_text("")
+    assert analysis == {}
+
+def test_analyze_text_words_within_words_whole_word_matching(moderator_instance):
+    # 'swear' is a keyword, 'swearinger' is not.
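+    # The compiled patterns wrap each keyword in \b word boundaries (e.g.
+    # r"\b(badword|swear)\b"), so "swear" inside "swearinger" does not match,
+    # while the standalone "swear." at the end of the sentence does.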
+ analysis = moderator_instance.analyze_text("He is swearinger but not swear.") + assert "Profanity" in analysis + assert analysis["Profanity"] == ["swear"] + + # Test with a word that is a substring of a sensitive word, but not a whole word match + analysis_substring = moderator_instance.analyze_text("This is just a test, not a hateful_term at all.") + assert "HateSpeech" in analysis_substring + assert analysis_substring["HateSpeech"] == ["hateful_term"] + + analysis_no_match = moderator_instance.analyze_text("This sentence has a term but not the specific unique_term_for_testing.") + assert "SpecificCategory" not in analysis_no_match + + +def test_analyze_text_repeated_words(moderator_instance): + analysis = moderator_instance.analyze_text("This badword is a badword again badword.") + assert "Profanity" in analysis + assert analysis["Profanity"] == ["badword"] # Should only list unique matches + +def test_analyze_text_with_punctuation(moderator_instance): + analysis = moderator_instance.analyze_text("Is this a badword? Yes, badword!") + assert "Profanity" in analysis + assert analysis["Profanity"] == ["badword"] + + analysis_slur = moderator_instance.analyze_text("No slur, okay?") + assert "HateSpeech" in analysis_slur + assert analysis_slur["HateSpeech"] == ["slur"] + +# It's good practice to ensure the test file can be found and run. +# Create a dummy __init__.py in the parent directory of 'tests' if moderator.py is in the root +# and tests are in a subdirectory, to make Python treat 'deepseek-content-moderation' as a package. +# For this subtask, we assume the structure is: +# deepseek-content-moderation/ +# moderator.py +# config.json +# tests/ +# test_moderator.py + +# To run these tests, navigate to the `deepseek-content-moderation` directory and run `pytest`. +# Ensure `pytest` is installed (`pip install pytest`). +# If `moderator.py` is in the root of `deepseek-content-moderation`, the import +# `from ..moderator import Moderator` is for when tests are run as part of a package. +# If running `pytest` directly from within the `tests` directory, or if `deepseek-content-moderation` +# is not treated as a package, a simple `from moderator import Moderator` might be needed, +# and `sys.path` manipulation or running pytest with `python -m pytest` from the root. + +# For this tool, we will ensure the structure supports `from ..moderator import Moderator`. +# This requires an `__init__.py` in the `deepseek-content-moderation` directory.
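
If the tests ever need to run from a working directory where `deepseek_content_moderation` is not already importable, one common approach is a minimal `conftest.py` placed in the parent directory of the package. This file is a hypothetical addition and is not part of the patch above:

```python
# conftest.py -- hypothetical helper, not included in this patch.
# Placing it next to the deepseek_content_moderation/ package directory puts that
# directory on sys.path, so `from deepseek_content_moderation.moderator import
# Moderator` resolves regardless of where pytest is started from.
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
```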