author Ivan Gabaldon <igabaldon@inetol.net> 2025-06-27 17:52:12 +0200
committer GitHub <noreply@github.com> 2025-06-27 17:52:12 +0200
commit 49fdf4edd9b5c0eab0e8baa7417bce74c4a6b81e
tree a9a6cc51d4e7f55dd4d1acfd961e17c2bdec203e /searx/utils.py
parent a76ccba9c5519113987c25b02dad270ecfca3119
[fix] utils: truncated result (#4949)
Make sure to parse everything before returning. Related:

```
FAIL: test_html_to_text (tests.unit.test_utils.TestUtils.test_html_to_text)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/searxng/searxng/tests/unit/test_utils.py", line 53, in test_html_to_text
    self.assertEqual(utils.html_to_text(r"regexp: (?<![a-zA-Z]"), "regexp: (?<![a-zA-Z]")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'regexp: (?' != 'regexp: (?<![a-zA-Z]'
- regexp: (?
+ regexp: (?<![a-zA-Z]
```
Diffstat (limited to 'searx/utils.py')
-rw-r--r-- searx/utils.py | 2
1 files changed, 2 insertions, 0 deletions
diff --git a/searx/utils.py b/searx/utils.py
index 3c60851fa..7b7cd8f5d 100644
--- a/searx/utils.py
+++ b/searx/utils.py
@@ -161,9 +161,11 @@ def html_to_text(html_str: str) -> str:
     s = _HTMLTextExtractor()
     try:
         s.feed(html_str)
+        s.close()
     except AssertionError:
         s = _HTMLTextExtractor()
         s.feed(escape(html_str, quote=True))
+        s.close()
     except _HTMLTextExtractorException:
         logger.debug("HTMLTextExtractor: invalid HTML\n%s", html_str)
     return s.get_text()
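
The added `s.close()` calls matter because `html.parser.HTMLParser.feed()` may hold back trailing input that could still be completed by a later chunk, and only `close()` forces that buffered data through. The sketch below illustrates the buffering with a different trigger than the one in the failing test: a trailing `&` that might be the start of a character reference. Which inputs get buffered can differ between Python versions, so this is only an illustration of the mechanism. `TextCollector` and `collect_text` are hypothetical names, not part of `searx/utils.py`.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for _HTMLTextExtractor: collects text nodes only."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True is the default
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def collect_text(html_str, finalize):
    """Feed html_str to a fresh parser; call close() only if finalize is true."""
    parser = TextCollector()
    parser.feed(html_str)
    if finalize:
        # Process whatever feed() kept buffered, as if EOF had been reached.
        parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    # Without close(), the text ending in "&T" stays in the parser's buffer.
    print(repr(collect_text("AT&T", finalize=False)))
    # With close(), the buffered text is delivered to handle_data().
    print(repr(collect_text("AT&T", finalize=True)))
```

Calling `close()` inside the `try` block, as the patch does, also means that any exception raised while flushing the buffer is handled by the same `except` clauses as one raised by `feed()`.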