| author | Ivan Gabaldon <igabaldon@inetol.net> | 2025-06-27 17:52:12 +0200 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-06-27 17:52:12 +0200 |
| commit | 49fdf4edd9b5c0eab0e8baa7417bce74c4a6b81e (patch) | |
| tree | a9a6cc51d4e7f55dd4d1acfd961e17c2bdec203e /searx/utils.py | |
| parent | a76ccba9c5519113987c25b02dad270ecfca3119 (diff) | |
[fix] utils: truncated result (#4949)
Make sure to parse everything before returning.
Related: \
```
FAIL: test_html_to_text (tests.unit.test_utils.TestUtils.test_html_to_text)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/searxng/searxng/tests/unit/test_utils.py", line 53, in test_html_to_text
self.assertEqual(utils.html_to_text(r"regexp: (?<![a-zA-Z]"), "regexp: (?<![a-zA-Z]")
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'regexp: (?' != 'regexp: (?<![a-zA-Z]'
- regexp: (?
+ regexp: (?<![a-zA-Z]
```
Diffstat (limited to 'searx/utils.py')
| -rw-r--r-- | searx/utils.py | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/searx/utils.py b/searx/utils.py
index 3c60851fa..7b7cd8f5d 100644
--- a/searx/utils.py
+++ b/searx/utils.py
@@ -161,9 +161,11 @@ def html_to_text(html_str: str) -> str:
     s = _HTMLTextExtractor()
     try:
         s.feed(html_str)
+        s.close()
     except AssertionError:
         s = _HTMLTextExtractor()
         s.feed(escape(html_str, quote=True))
+        s.close()
     except _HTMLTextExtractorException:
         logger.debug("HTMLTextExtractor: invalid HTML\n%s", html_str)
     return s.get_text()
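
Why the added `s.close()` matters can be sketched with the standard library alone. `HTMLParser.feed()` only processes complete elements and buffers anything that might still be incomplete; `close()` forces the buffered remainder through. The `TextCollector` class and `extract` helper below are hypothetical stand-ins, not the real `_HTMLTextExtractor`, and the input `"AT&T"` is an assumption chosen because its tail looks like a cut-off character reference. It is not the string from the failing test, whose handling may differ between Python versions.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for _HTMLTextExtractor: collects text nodes only."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for every piece of text it has fully processed.
        self.chunks.append(data)

    def get_text(self):
        return "".join(self.chunks)


def extract(html_str, call_close):
    """Feed html_str to a fresh parser, optionally flushing it with close()."""
    parser = TextCollector()
    parser.feed(html_str)
    if call_close:
        # Force processing of whatever feed() left in the internal buffer.
        parser.close()
    return parser.get_text()


# The trailing "&T" could be the start of a character reference, so feed()
# holds the text back until more input arrives or close() is called.
without_close = extract("AT&T", call_close=False)
with_close = extract("AT&T", call_close=True)
print(repr(without_close), repr(with_close))
```

Under these assumptions, the call without `close()` returns less text than the call with it, which is the same class of truncation the commit addresses.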