path: root/searx/engines/__init__.py
Age | Commit message | Author
2021-06-01 | [fix] sys.exit(1) when there is a duplicate engine name | Alexandre Flament
2021-06-01 | [mod] searx.engines.load_engine return None instead of sys.exit(1) | Markus Heiser
Loading an engine should not exit the application (*). Instead of exiting, return None.
(*) RuntimeErrors still exit the application: syntax errors, etc.
BTW: add documentation and normalize indentation (no functional change).
Suggested-by: @dalf https://github.com/searxng/searxng/pull/116#issuecomment-851865627
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
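The behavior described in the two 2021-06-01 commits above can be sketched as follows. This is a minimal sketch, not the searx code: engine settings are plain dicts, the hypothetical `load_engine` only checks for `name` and `engine` keys instead of importing a module, and names are lower-cased as in the 2019-07-27 fix. A broken engine yields None and is skipped, while a duplicate name still ends the process with `sys.exit(1)`.

```python
import logging
import sys
from types import SimpleNamespace
from typing import Optional

logger = logging.getLogger("searx.engines")


def load_engine(engine_data: dict) -> Optional[SimpleNamespace]:
    """Return an engine object, or None when the engine cannot be loaded."""
    try:
        # Hypothetical check standing in for importing the engine module.
        name = engine_data["name"].lower()
        module_name = engine_data["engine"]
    except KeyError as exc:
        logger.error("cannot load engine %r: missing key %s", engine_data.get("name"), exc)
        return None  # a broken engine must not stop the application
    return SimpleNamespace(name=name, module_name=module_name)


def load_engines(engine_list: list) -> dict:
    engines = {}
    for engine_data in engine_list:
        engine = load_engine(engine_data)
        if engine is None:
            continue  # skip engines that failed to load
        if engine.name in engines:
            # a duplicate name is a configuration error: stop searx
            logger.error("engine name %r is not unique", engine.name)
            sys.exit(1)
        engines[engine.name] = engine
    return engines
```

With `[{"name": "Bing", "engine": "bing"}, {"name": "broken"}]`, only the `bing` engine is returned; two entries named `a` and `A` end the process.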
2021-06-01 | [mod] searx.engines.__init__: refactoring | Alexandre Flament
2021-06-01 | [mod] move all default settings into searx.settings_defaults | Alexandre Flament
2021-05-05 | [mod] multithreading only in searx.search.* packages | Alexandre Flament
This prepares the new architecture change: everything related to multithreading is moved into the searx.search.* packages. Previously, the "init" function of each engine was called from searx.engines, so:
* the network was not set (requests were not sent through the configured proxy)
* the code had to be monkey-patched to avoid HTTP requests during the tests
2021-04-21 | [enh] rewrite and enhance metrics | Alexandre Flament
2021-04-21 | [mod] refactoring: processors | Alexandre Flament
Report suspended engines to the user.
searx.search.processor.abstract:
* manages the suspend time (per network)
* reports the suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
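The per-network suspension described above can be sketched like this. It is a minimal sketch under stated assumptions: `SuspendedStatus`, `get_status`, and the simplified `ResultContainer` are hypothetical stand-ins, and only the method name `extend_container_if_suspended` comes from the commit message. Engines sharing a network share one status object, so a CAPTCHA on one of them suspends all of them.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SuspendedStatus:
    """Hypothetical suspend state, shared by all engines of one network."""
    suspend_end_time: float = 0.0
    suspend_reason: str = ""

    @property
    def is_suspended(self) -> bool:
        return self.suspend_end_time >= time.time()

    def suspend(self, suspended_time: float, reason: str) -> None:
        self.suspend_end_time = time.time() + suspended_time
        self.suspend_reason = reason


@dataclass
class ResultContainer:
    """Simplified stand-in for searx's ResultContainer."""
    results: list = field(default_factory=list)
    unresponsive_engines: set = field(default_factory=set)


_status_by_network = {}  # network name -> SuspendedStatus


def get_status(network_name: str) -> SuspendedStatus:
    return _status_by_network.setdefault(network_name, SuspendedStatus())


def extend_container_if_suspended(engine_name: str, network_name: str, container: ResultContainer) -> bool:
    """Report the engine as unresponsive and return True if its network is suspended."""
    status = get_status(network_name)
    if status.is_suspended:
        container.unresponsive_engines.add((engine_name, status.suspend_reason))
        return True
    return False
```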
2021-04-12 | [httpx] replace searx.poolrequests by searx.network | Alexandre Flament
settings.yml:
* outgoing.networks:
  * can contain network definitions
  * properties: enable_http, verify, http2, max_connections, max_keepalive_connections, keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
  * retries: 0 by default; the number of times searx retries sending the HTTP request (using a different IP & proxy each time)
  * local_addresses can be "192.168.0.1/24" (IPv6 is supported too)
  * support_ipv4 & support_ipv6: both True by default, see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
  * either a full network description
  * or a reference to an existing network
* all HTTP requests of an engine use the same HTTP configuration (this was not the case before, see the proxy configuration in master)
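An engine's network section is either a full description or the name of a network declared under outgoing.networks. The sketch below resolves an engine's `network` value for those two cases. `resolve_network`, `DEFAULT_NETWORK`, and the `"default"` entry are hypothetical; only the defaults stated in these commit messages are filled in (retries 0, IPv4 and IPv6 enabled, plain HTTP disabled per the 2021-03-08 commit).

```python
from typing import Optional, Union

# Defaults stated in the commit messages: retries 0, IPv4 and IPv6 enabled,
# and plain HTTP disabled (HTTPS only, per the 2021-03-08 commit).
DEFAULT_NETWORK = {"enable_http": False, "retries": 0, "support_ipv4": True, "support_ipv6": True}


def resolve_network(engine_network: Optional[Union[str, dict]], networks: dict) -> dict:
    """Return the network settings an engine should use (hypothetical helper)."""
    if engine_network is None:
        return networks["default"]  # hypothetical name of the shared default network
    if isinstance(engine_network, str):
        return networks[engine_network]  # reference to an outgoing.networks entry
    return {**DEFAULT_NETWORK, **engine_network}  # full description overrides defaults
```

For example, `network: tor` would reuse a named network, while `network: {retries: 2}` would define a one-off network that keeps the remaining defaults.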
2021-03-08 | [mod] by default allow only HTTPS, not HTTP | Alexandre Flament
Related to https://github.com/searx/searx/pull/2373
2021-03-06 | [enh] add ability to send engine data to subsequent requests | Adam Tauber
2021-03-05 | [mod] don't dump traceback of SearxEngineResponseException on init | Markus Heiser
When initializing engines, a "SearxEngineResponseException" is logged very verbosely, including the full traceback:
    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000
For SearxEngineResponseException this is not needed. These types of exceptions can be a normal use case, e.g. CAPTCHA errors like the one shown in the example above. It should be enough to log a warning for such issues:
    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000
closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
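The logging change can be sketched as two except clauses: a one-line warning for the expected SearxEngineResponseException family and a full traceback for everything else. The exception classes below are local stand-ins for those in searx.exceptions, and `engine_init` is simplified to take the init callable directly.

```python
import logging

logger = logging.getLogger("searx.engines")


class SearxEngineResponseException(Exception):
    """Local stand-in for searx.exceptions.SearxEngineResponseException."""


class SearxEngineCaptchaException(SearxEngineResponseException):
    """Local stand-in; the real class also records a suspended_time."""


def engine_init(engine_name: str, init_fn) -> bool:
    """Run an engine's init function and log failures; return True on success."""
    try:
        init_fn()
    except SearxEngineResponseException as exc:
        # expected failure (CAPTCHA, HTTP error, ...): one warning line, no traceback
        logger.warning("%s engine: Fail to initialize // %s", engine_name, exc)
        return False
    except Exception:
        # unexpected failure: keep the full traceback for debugging
        logger.exception("%s engine: Fail to initialize", engine_name)
        return False
    return True
```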
2021-02-25 | fix fetch_languages for bing | Marc Abonce Seguin
Bing has a list of regions that it supports, and some of these regions may have more than one possible language. In some cases, like Switzerland, these languages are always shown as options, so there is no issue. But in other cases, like Andorra, Bing only shows one language at a time: either the region's default or the request's language if the latter is supported by that region. For example, if the HTTP request is in French, Andorra appears as fr-AD, but if the same page is requested in any other language, Andorra appears as ca-AD. This is especially a problem when Bing assumes that the request is in English, because it overrides enough language codes to make several major languages like Arabic disappear from the languages.py file. To avoid that issue, I set the Accept-Language header to a language that's only supported in one region, to hopefully avoid these overrides.
2021-02-01 | [mod] dynamically set language_support variable | Alexandre Flament
The language_support variable is set to True by default and set to False in only 5 engines. Apart from the documentation and the /config URL, this variable is not used. This commit removes the variable definition from the engines and sets its value according to the length of supported_languages: False when the length is 0, True otherwise.
Close #2485
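The rule from this commit fits in one line. A sketch, assuming engines are plain objects and that a missing `supported_languages` attribute counts as empty; `set_language_support` is a hypothetical helper name.

```python
def set_language_support(engine) -> None:
    """language_support is True when the engine declares at least one language."""
    # supported_languages may be a list or a dict; a missing attribute counts as empty
    engine.language_support = len(getattr(engine, "supported_languages", [])) > 0
```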
2020-12-17 | [mod] split searx.search into different processors | Alexandre Flament
See searx.search.processors.abstract.EngineProcessor. First, searx calls the get_params method. If the return value is not None, searx then calls the search method.
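The two-step flow (call get_params, then call search only when parameters are returned) can be sketched as below. The class is a simplified, hypothetical stand-in for searx.search.processors.abstract.EngineProcessor, and the rule for returning None (empty query or page above 10) is invented for illustration.

```python
from typing import Optional


class EngineProcessor:
    """Minimal sketch of the get_params -> search flow."""

    def __init__(self, engine_name: str):
        self.engine_name = engine_name

    def get_params(self, query: str, pageno: int) -> Optional[dict]:
        # hypothetical rule: this engine cannot handle empty queries or pages above 10
        if not query or pageno > 10:
            return None
        return {"query": query, "pageno": pageno}

    def search(self, params: dict) -> list:
        return [f"{self.engine_name}: result for {params['query']!r} (page {params['pageno']})"]


def run(processor: EngineProcessor, query: str, pageno: int = 1) -> list:
    params = processor.get_params(query, pageno)
    if params is None:
        return []  # the engine is skipped for this request
    return processor.search(params)
```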
2020-12-11 | [enh] add raise_for_httperror | Alexandre Flament
Check the HTTP response:
* detect some common CAPTCHA challenges (no solving). In this case the engine is suspended for a long time.
* otherwise raise HTTPError as before
The check is done in poolrequests.py (it was previously in search.py).
Update qwant, wikipedia, and wikidata to use raise_for_httperror instead of raise_for_status.
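A sketch of the check described above, assuming a minimal response stand-in. The Cloudflare detection rule here (status 403 or 503 plus a `Server` header starting with "cloudflare") is a hypothetical heuristic, not necessarily the rule searx uses; the 15-day suspension (3600 * 24 * 15 seconds) is taken from the traceback in the 2021-03-05 entry. Both exception classes are local stand-ins.

```python
from dataclasses import dataclass


class SearxEngineCaptchaException(Exception):
    """Local stand-in for the searx exception of the same name."""

    def __init__(self, message: str, suspended_time: int):
        super().__init__(f"{message}, suspended_time={suspended_time}")
        self.suspended_time = suspended_time


class HTTPError(Exception):
    """Local stand-in for the HTTP client's HTTPError."""


@dataclass
class Response:
    """Hypothetical, minimal HTTP response."""
    status_code: int
    headers: dict


def raise_for_httperror(resp: Response) -> None:
    # hypothetical heuristic for a Cloudflare challenge page
    if resp.status_code in (403, 503) and resp.headers.get("Server", "").startswith("cloudflare"):
        # CAPTCHA detected (not solved): the engine gets suspended for 15 days
        raise SearxEngineCaptchaException("Cloudflare CAPTCHA", suspended_time=3600 * 24 * 15)
    if resp.status_code >= 400:
        raise HTTPError(f"HTTP {resp.status_code}")
```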
2020-12-09 | display if an engine does not support https | Noémi Ványi
Closes #302
2020-12-03 | [enh] record details exception per engine | Alexandre Flament
Add a new API: /stats/errors
2020-12-01 | [fix] /stats: report error percentage instead of error count | Alexandre Flament
This bug has existed since PR https://github.com/searx/searx/pull/751
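A percentage is more useful than a raw count because two engines with 5 errors each can differ a lot in reliability if one was queried 20 times and the other 200 times. A hypothetical helper, assuming the denominator is the number of searches sent to the engine:

```python
def error_percentage(error_count: int, search_count: int) -> float:
    """Share of an engine's searches that ended in an error, in percent."""
    if search_count == 0:
        return 0.0  # no searches yet: report 0 instead of dividing by zero
    return 100.0 * error_count / search_count
```

With the numbers above, `error_percentage(5, 20)` is 25.0 while `error_percentage(5, 200)` is 2.5.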
2020-11-20 | [enh] Add multiple outgoing proxies | Alexandre Flament
Credits go to @bauruine, see https://github.com/searx/searx/pull/1958
2020-10-25 | [enh] Add onions category with Ahmia, Not Evil and Torch | a01200356
The XPath engine and results template were changed to account for the fact that archive.org doesn't cache .onion sites, though some onion engines might have their own cache. Disabled by default. Can be enabled by setting the SOCKS proxies to wherever Tor is listening and setting using_tor_proxy to True. Requires Tor and updated packages. To avoid manually adding the timeout on each engine, you can set extra_proxy_timeout to account for the extra time taken by Tor (or whichever proxy is used).
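How using_tor_proxy and extra_proxy_timeout might combine, as a sketch: the `EngineConfig` dataclass, its `onion` flag, and `apply_tor_settings` are hypothetical, and the sketch assumes that when using_tor_proxy is on, every engine's timeout is extended by extra_proxy_timeout (since all traffic then goes through the proxy), while onion engines are left out without Tor.

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Hypothetical, simplified engine settings."""
    name: str
    timeout: float
    onion: bool = False  # hypothetical flag for engines in the onions category


def apply_tor_settings(engines, using_tor_proxy: bool, extra_proxy_timeout: float = 0.0) -> dict:
    """Map each usable engine name to its effective timeout in seconds."""
    timeouts = {}
    for engine in engines:
        if engine.onion and not using_tor_proxy:
            continue  # onion engines are unusable without a Tor proxy
        extra = extra_proxy_timeout if using_tor_proxy else 0.0
        timeouts[engine.name] = engine.timeout + extra
    return timeouts
```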
2020-10-07 | [mod] Add searx.data module | Alexandre Flament
Instead of loading the data/*.json files in different locations, load them in the new searx.data module.
2020-09-22 | add language names in qwant's fetch languages function | Marc Abonce Seguin
2020-09-07 | [enh] stop searx when an engine raises a SyntaxError exception (#2177) | Alexandre Flament
and some other exceptions:
* KeyboardInterrupt
* SystemExit
* RuntimeError
* SystemError
* ImportError: an engine with an unmet dependency will stop everything.
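The split between fatal and recoverable loading errors can be sketched as follows. `load_engine_module` and its `loader` argument are hypothetical; the fatal exception list is the one given in the commit message, and every other exception only disables the engine.

```python
import sys

# Exceptions that stop searx instead of only disabling one engine (list from the commit).
FATAL_EXCEPTIONS = (SyntaxError, KeyboardInterrupt, SystemExit, RuntimeError, SystemError, ImportError)


def load_engine_module(loader, name: str):
    """Load an engine module with `loader`; exit on fatal errors, return None otherwise."""
    try:
        return loader(name)
    except FATAL_EXCEPTIONS as exc:
        print(f"{name}: fatal error while loading ({exc!r}), exiting", file=sys.stderr)
        sys.exit(1)
    except Exception as exc:
        print(f"{name}: cannot load ({exc}), engine disabled", file=sys.stderr)
        return None
```

Passing `importlib.import_module` as the loader means an engine with an unmet dependency raises ModuleNotFoundError, a subclass of ImportError, and therefore stops the process.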
2020-08-31 | Revert "[enh] test: load each engine to check for syntax errors" | Alexandre Flament
This reverts commit 4fb3ed2c6335b68f6b28ebc68d5d22f2fd621648.
2020-08-28 | [enh] test: load each engine to check for syntax errors | Dalf
2020-05-31 | add display_error_messages option to engine settings | Noémi Ványi
A new option is added to engines to hide error messages from users. It is called `display_error_messages` and is set to `True` by default. If it is set to `False`, error messages do not show up in the UI. Keep in mind that engines are still suspended if needed, regardless of this setting.
Closes #1828
2020-02-08 | [enh] introduce private engines | Noémi Ványi
This PR adds a new engine setting named `tokens`. It expects a list of tokens, which lets searx validate whether a request should be accepted or not.
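A sketch of the token check, assuming (this is not stated in the commit) that an engine with an empty `tokens` list is public and that a request is accepted when it carries at least one of the engine's tokens. `is_engine_allowed` is a hypothetical name.

```python
def is_engine_allowed(engine_tokens, request_tokens) -> bool:
    """Return True if a request carrying request_tokens may use the engine."""
    if not engine_tokens:
        return True  # public engine: no tokens configured
    # private engine: at least one token of the request must match
    return bool(set(engine_tokens) & set(request_tokens))
```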
2019-10-16 | fix pep 8 check | Noémi Ványi
2019-10-16 | add initial support for offline engines && command engine | Noémi Ványi
2019-07-27 | [fix] make sure the engine name is lower case | Dalf
Minor fix: "%s engine initialized" displays the right engine name
2019-01-06 | [fix] always set language_aliases even if it's empty | Marc Abonce Seguin
2018-03-27 | refactor engine's search language handling | Marc Abonce Seguin
Add a match_language function in utils to match any user-given language code against an engine's list of supported languages. Also add a language_aliases dict to each engine to translate standard language codes into the custom codes used by the engine.
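A simplified sketch of the two pieces: language matching followed by alias translation. The matching order (exact code, then bare language, then any region of that language, then a fallback), the `en-US` fallback, and the example alias that maps `nb-NO` to `no` are assumptions for illustration; the real utils.match_language may differ.

```python
def match_language(locale_code: str, supported: list, aliases: dict, fallback: str = "en-US") -> str:
    """Pick the engine language closest to locale_code, then apply the engine's aliases."""
    lang = locale_code.split("-")[0]
    if locale_code in supported:
        match = locale_code  # exact match, e.g. "en-US"
    elif lang in supported:
        match = lang  # same language without region, e.g. "fr-CA" matches "fr"
    else:
        # any region of the same language, else the fallback
        match = next((code for code in supported if code.split("-")[0] == lang), fallback)
    # translate the standard code into the engine's custom code, if it has one
    return aliases.get(match, match)
```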
2018-02-17 | [fix] fix engine initialization | Adam Tauber
2018-01-16 | [fix] read utf-8 files (settings, languages, currency) with python3.5 | Marc Abonce Seguin
Related to the discussion in #1124. The io.open import is necessary for Python 2.
2017-12-21 | Make Python 3 able to read settings files with Unicode characters | Joseph Nuthalapati
SearX currently doesn't start when run with Python 3, as it tries to parse the settings.yml file with the ASCII codec. There are similar problems with engines_languages.json and currencies.json. Python 3 requires that files with Unicode characters be read with a 'b' flag. This also works with Python 2 and hence can be integrated into the main source code. Tested with the latest Python 3.6.4rc1 on Debian unstable.
Signed-off-by: Joseph Nuthalapati <njoseph@thoughtworks.com>
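Reading in binary mode sidesteps the platform's default codec. A minimal sketch for the JSON data files (`load_json_utf8` is a hypothetical helper); it relies on `json.loads` accepting bytes, which holds from Python 3.6 on. Passing `encoding="utf-8"` to `open()` is the other common way to get the same result.

```python
import json


def load_json_utf8(path):
    """Load a JSON data file regardless of the platform's default encoding."""
    # Binary mode hands raw UTF-8 bytes to json.loads (accepted since Python 3.6),
    # so the locale's codec (possibly ASCII) is never used to decode the file.
    with open(path, "rb") as f:
        return json.loads(f.read())
```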
2017-07-21 | [mod] separate engine load and initialization | Adam Tauber
2017-07-20 | [enh] add "inactive" attribute to engines | Adam Tauber
This modification allows us to deactivate engines in settings.yml without commenting them out
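In settings.yml, an engine entry can then carry `inactive: True` instead of being commented out. A sketch of the filtering step, assuming engine entries are dicts as parsed from YAML; `active_engines` is a hypothetical helper.

```python
def active_engines(engine_settings: list) -> list:
    """Names of the engines to load; entries marked inactive stay in the config."""
    return [entry["name"] for entry in engine_settings if not entry.get("inactive", False)]
```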
2017-06-06 | [fix] pep8 | Adam Tauber
2017-06-06 | [enh] add init function to engines which loads parallel | Adam Tauber
2017-05-15 | [enh] py3 compatibility | Adam Tauber
2017-04-08 | [mod] searx doesn't crash at startup when an engine can't be loaded (see #884) | Alexandre Flament
2016-12-28 | Merge branch 'master' into languages | Adam Tauber
2016-12-27 | [fix] proper engine init | Adam Tauber
2016-12-27 | [enh] explicit engine init | Adam Tauber
2016-12-15 | tests for _fetch_supported_languages in engines | marc
and refactor the method to make it testable without making requests
2016-12-13 | [mod] fetch supported languages for several engines | marc
utils/fetch_languages.py gets the languages supported by each engine and generates engines_languages.json with each engine's supported languages.
2016-12-13 | [enh] add supported_languages on engines and auto-generate languages.py | marc
2016-12-09 | Merge branch 'master' into searchpy2 | Alexandre Flament
2016-11-19 | [mod] move load_module function to utils | Adam Tauber
2016-11-05 | Simplify search.py, basically updated PR #518 | Alexandre Flament
The timeouts in settings.yml are about the total time (not only the HTTP request but also preparing the request and parsing the response). It was more or less the case before, since the threaded_requests function ignores the thread after the timeout even if the HTTP request has ended.
New / changed stats:
* page_load_time: records the HTTP request time
* page_load_count: the number of HTTP requests
* engine_time: the total execution time of an engine
* engine_time_count: the number of "engine_time" measurements
The avg response times in the preferences are the engine response times (engine_load_time / engine_load_count).
To sum up:
* Search.search() filters out the engines that can't process the request
* Search.search() calls the search_multiple_requests function
* search_multiple_requests creates one thread per engine; each thread runs the search_one_request function
* search_one_request calls the request function, makes the HTTP request, calls the response function, and extends the result_container
* search_multiple_requests waits for the threads to finish (or time out)
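The flow summarized above can be sketched with the standard threading module. This is a simplified sketch, not the searx code: engines are plain callables, `search_one_request` folds the request, HTTP call, and response parsing into one call, and each result is stored with its measured engine time. The timeout is one shared deadline for all threads, and threads still running at the deadline are ignored.

```python
import threading
import time


def search_one_request(engine_name, engine_fn, query, results, lock):
    """Run one engine and record its results together with its engine time."""
    start = time.monotonic()
    engine_results = engine_fn(query)  # stands for request + HTTP call + response parsing
    with lock:
        results[engine_name] = (engine_results, time.monotonic() - start)


def search_multiple_requests(engines, query, timeout):
    """Start one thread per engine and wait at most `timeout` seconds in total."""
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=search_one_request, args=(name, fn, query, results, lock), daemon=True)
        for name, fn in engines.items()
    ]
    for thread in threads:
        thread.start()
    deadline = time.monotonic() + timeout
    for thread in threads:
        # each join only gets the time left before the shared deadline
        thread.join(max(0.0, deadline - time.monotonic()))
    with lock:
        return dict(results)  # engines that have not finished by now are ignored
```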