path: root/searx/engines
Age | Commit message | Author
2021-03-08 | [fix] rewrite Yahoo-News engine | Markus Heiser
Many things have changed since the last review of this engine. This patch fixes the xpath selectors, implements suggestions, and is a complete review / rewrite of the engine.
Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-06 | [enh] add ability to send engine data to subsequent requests | Adam Tauber
2021-03-05 | [mod] don't dump traceback of SearxEngineResponseException on init | Markus Heiser
When initializing engines, a "SearxEngineResponseException" is logged very verbosely, including the full traceback:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed. Exceptions of this type can be a normal use case, e.g. for CAPTCHA errors as shown in the example above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
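The logging change described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the exception class is a local stand-in for searx.exceptions.SearxEngineResponseException, and the simplified `engine_init(engine_name, init_fn)` signature and its boolean return value are assumptions made for the example.

```python
import logging

logger = logging.getLogger("searx.engines")


class SearxEngineResponseException(Exception):
    """Local stand-in for searx.exceptions.SearxEngineResponseException."""


def engine_init(engine_name, init_fn):
    """Run an engine's init function (hypothetical, simplified signature).

    Returns True when initialization succeeded, False otherwise.
    """
    try:
        init_fn()
    except SearxEngineResponseException as exc:
        # Expected failure mode (e.g. a CAPTCHA): one-line warning, no traceback.
        logger.warning("%s engine: Fail to initialize // %s", engine_name, exc)
        return False
    except Exception:
        # Unexpected failure: keep the full traceback for debugging.
        logger.exception("%s engine: Fail to initialize", engine_name)
        return False
    return True
```

With this split, a CAPTCHA during init produces a single WARNING line, while programming errors still log a full traceback.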
2021-03-01 | [enh] google scholar - python implementation of the engine | Markus Heiser
The old xpath configuration for google scholar did not work and is replaced by a python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-01 | Merge pull request #2602 from MarcAbonce/fix-bing-fetch-languages | Alexandre Flament
Fix fetch_languages for Bing
2021-03-01 | Add Freesound engine (#2596) | GazoilKerozen
Add freesound engine with player.
Co-authored-by: Gazoil <maildeguzel@gmail.com>
2021-02-25 | remove articles number from engines_languages.json | Marc Abonce Seguin
2021-02-25 | fix fetch_languages for bing | Marc Abonce Seguin
Bing has a list of regions that it supports, and some of these regions may have more than one possible language. In some cases, like Switzerland, these languages are always shown as options, so there is no issue. But in other cases, like Andorra, Bing will only show one language at a time, either the region's default or the request's language if the latter is supported by that region. For example, if the HTTP request is in French, Andorra will appear as fr-AD, but if the same page is requested in any other language, Andorra will appear as ca-AD. This is especially a problem when Bing assumes that the request is in English, because it overrides enough language codes to make several major languages like Arabic disappear from the languages.py file. To avoid that issue, I set the Accept-Language header to a language that's only supported in one region to hopefully avoid these overrides.
2021-02-22 | Fix paging of Bing Images | Noémi Ványi
2021-02-20 | Added rumble.com video search engine. TODO video embedding. | datagram1
- Update rumble.py: some lines too long.
- Disable Rumble engine: disabled : True
- PEP8 fix: change line spacing
2021-02-16 | Merge pull request #2573 from unixfox/yggtorrent | Alexandre Flament
update yggtorrent url + add it back
2021-02-15 | fix yggtorrent url + add it back | Emilien Devos
2021-02-13 | Improve peertube searching | Thorben Günther
At the moment, videos without a description are not shown - setting the default content to "" fixes this. Another current bug is that thumbnails are not displayed. This is caused by a double slash in the URL. To fix this, every trailing slash is now stripped from the configured URL (for backwards compatibility) and the API response is correctly parsed.
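The two fixes described above can be illustrated with a small sketch. The function name `build_result` and the JSON field names (`name`, `description`, `thumbnailPath`) are assumptions chosen for the example, not taken from the commit.

```python
def build_result(base_url, video):
    """Build a result dict from one video entry (hypothetical field names)."""
    # Strip trailing slashes so that base + "/path" never yields "//path".
    base = base_url.rstrip("/")
    return {
        "title": video.get("name", ""),
        # A missing or null description falls back to an empty string,
        # so the video is still shown.
        "content": video.get("description") or "",
        "thumbnail": base + video["thumbnailPath"],
    }
```

Because the trailing slash is stripped at use time, instance URLs configured with or without it keep working.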
2021-02-12 | Merge pull request #2566 from dalf/remove-yandex | Alexandre Flament
[remove] yandex engine
2021-02-12 | [fix] duckduckgo engine: "!ddg !g" do not redirect to google | Alexandre Flament
* searx understands "!ddg !g time" as: send "!g time" to DDG
* !g is a DDG bang for Google: DDG returns an HTTP redirect to Google
This commit adds the allows_redirect param so that the HTTP redirect is not followed. The DDG engine returns an empty result as before, without the HTTP redirect.
2021-02-12 | Merge pull request #2562 from dalf/mod-json-engine | Alexandre Flament
[mod] json_engine: add content_html_to_text and title_html_to_text
2021-02-12 | Merge pull request #2565 from dalf/upd-wikipedia | Alexandre Flament
[upd] wikipedia engine: return an empty result on query with illegal characters
2021-02-12 | Merge pull request #2564 from dalf/fix-seznam | Alexandre Flament
[fix] fix seznam engine
2021-02-12 | Merge pull request #2560 from dalf/fix-duckduckgo | Alexandre Flament
Fix duckduckgo
2021-02-11 | Merge pull request #2541 from return42/mediathekviewweb | Alexandre Flament
[enh] add engine MediathekViewWeb (API)
2021-02-11 | [remove] yandex engine | Alexandre Flament
2021-02-11 | [fix] fix seznam engine | Alexandre Flament
no paging support
2021-02-11 | [upd] wikipedia engine: return an empty result on query with illegal characters | Alexandre Flament
On some queries (like an IT error message), Wikipedia returns an HTTP error 400. This commit returns an empty result instead of showing an error to the user.
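A rough sketch of this behaviour, under stated assumptions: the `Response` class is a stand-in for the HTTP response object, and the real engine may inspect the error payload more carefully before deciding to return an empty result.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Minimal stand-in for an HTTP response object."""
    status_code: int
    text: str = ""


def response(resp):
    """Turn a response into a list of results (simplified sketch)."""
    if resp.status_code == 400:
        # The query contained characters the API rejects:
        # report "no results" rather than surfacing an error.
        return []
    if resp.status_code >= 400:
        raise RuntimeError(f"HTTP error {resp.status_code}")
    return [{"title": resp.text}]
```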
2021-02-10 | [mod] json_engine: add content_html_to_text and title_html_to_text | Alexandre Flament
Some JSON APIs return HTML in either the title or the content. This commit adds two new parameters to the json_engine: content_html_to_text and title_html_to_text, False by default. If True, searx.utils.html_to_text removes the HTML tags. Update the crossref, openairedatasets and openairepublications engines.
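The effect of the two flags can be sketched as below. The `html_to_text` function here is a simplified stand-in built on the standard library, not the searx.utils implementation, and `build_result` is a hypothetical helper that only exists for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html_str):
    """Simplified stand-in for searx.utils.html_to_text."""
    parser = _TextExtractor()
    parser.feed(html_str)
    parser.close()
    return "".join(parser.parts).strip()


def build_result(title, content,
                 title_html_to_text=False, content_html_to_text=False):
    """Apply the optional HTML stripping to title and content independently."""
    if title_html_to_text:
        title = html_to_text(title)
    if content_html_to_text:
        content = html_to_text(content)
    return {"title": title, "content": content}
```

Keeping both flags False by default leaves engines whose API already returns plain text unaffected.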
2021-02-10 | Merge pull request #2544 from mrwormo/congresslibrary | Alexandre Flament
[Engine] Add Library of Congress engine
2021-02-09 | [mod] duckduckgo engine: better support of the language preference | Alexandre Flament
After the main request, send a second one to https://duckduckgo.com/t/sl_h
See https://github.com/searx/searx/issues/2259
2021-02-09 | [enh] add engine MediathekViewWeb (API) | Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-09 | Add Library of Congress engine | mrwormo
2021-02-09 | [fix] fix apk_mirror engine | Alexandre Flament
2021-02-08 | add support for Chinese variants in Wikipedia | Marc Abonce Seguin
2021-02-07 | [feat] recoll: paged json support | Hermógenes Oliveira
2021-02-04 | Add Creative Commons search engine | mrwormo
2021-02-01 | [mod] dynamically set language_support variable | Alexandre Flament
The language_support variable is set to True by default, and set to False in only 5 engines. Apart from the documentation and the /config URL, this variable is not used. This commit removes the variable definition from the engines and sets the value according to the length of supported_languages: False when the length is 0, True otherwise.
Close #2485
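The rule above amounts to a one-line derivation. In this sketch the engine is modelled as a plain namespace object and `set_language_support` is a hypothetical helper name; where exactly searx performs this assignment is not shown in the commit message.

```python
from types import SimpleNamespace


def set_language_support(engine):
    """Derive language_support from supported_languages (hypothetical helper)."""
    supported = getattr(engine, "supported_languages", [])
    # False when the list/dict is empty, True otherwise.
    engine.language_support = len(supported) > 0
    return engine


with_langs = set_language_support(SimpleNamespace(supported_languages=["en", "fr"]))
without_langs = set_language_support(SimpleNamespace(supported_languages=[]))
```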
2021-01-28 | [fix] google: avoid unnecessary SearxEngineXPathException errors | Markus Heiser
Avoid SearxEngineXPathException errors when parsing invalid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
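One common way to avoid this kind of error is to ask for a default value instead of raising, and to skip results where the value is missing. The sketch below shows that pattern on plain lists; `getindex` is a stand-in written for this example, and whether the actual fix uses a default argument in this way is an assumption.

```python
_NOTSET = object()


class XPathIndexError(Exception):
    """Stand-in for searx.exceptions.SearxEngineXPathException."""


def getindex(items, index, default=_NOTSET):
    """Return items[index]; raise only when no default was given."""
    if -len(items) <= index < len(items):
        return items[index]
    if default is _NOTSET:
        raise XPathIndexError(f"index {index} not found")
    return default


def extract_urls(results):
    """Collect the first href of each result, skipping invalid results."""
    urls = []
    for hrefs in results:
        url = getindex(hrefs, 0, None)
        if url is None:
            # Not a regular result (no link found): skip it quietly.
            continue
        urls.append(url)
    return urls
```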
2021-01-28 | [fix] normalize the language & region aspects of all google engines | Markus Heiser
BTW: make the engines ready for search.checker:
- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 | [fix] google-videos: parse values for 'length' & 'author' | Markus Heiser
The 'video.html' template from the 'oscar' design supports replacements for *author* and *length*. Google-videos does not have an author; the publisher info is used for the *author* instead. Hint: these replacements are not supported by the 'simple' design.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 | [fix] revise of the google-Video engine | Markus Heiser
This revision is based on the methods developed in the revision of the google engine (see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 | [fix] google_news: avoid one HTTP redirect except for the English results | Alexandre Flament
Also add params['soft_max_redirects'] = 1 to avoid false error reporting in /stats/errors
2021-01-23 | [fix] google-news: query uses locale without country tag | Markus Heiser
Without a country-region tag, google will redirect to correct the country tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en... HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 | [fix] revise of the google-news engine | Markus Heiser
This revision is based on the methods developed in the revision of the google engine (see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-16 | Merge pull request #2451 from mrwormo/invidious-engine | Alexandre Flament
[Fix] Invidious Engine
2021-01-14 | [enh] engines: add about variable | Alexandre Flament
Move the meta information from the comments to the about variable, so that the preferences and the documentation can show this information.
2021-01-14 | [fix] Invidious engine by enabling requests by randomly picking amongst working instances | mrwormo
2020-12-20 | [fix] pylint: use "raise ... from ..." | Alexandre Flament
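For readers unfamiliar with the pattern pylint asks for here, explicit exception chaining looks like this. The exception class and the parsing function are hypothetical examples, not code from the commit.

```python
class EngineParseError(Exception):
    """Hypothetical engine-level error used only for this example."""


def parse_count(raw):
    """Parse a result count, chaining the low-level error as the cause."""
    try:
        return int(raw)
    except ValueError as exc:
        # "from exc" records the original error in __cause__, so the
        # traceback shows both exceptions and how they relate.
        raise EngineParseError(f"invalid count: {raw!r}") from exc
```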
2020-12-20 | [fix] Python 3.9: use html.unescape instead of HTMLParser.unescape | Alexandre Flament
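The HTMLParser.unescape method was removed in Python 3.9; html.unescape from the standard library is the replacement. A minimal illustration, where `clean_title` is a hypothetical helper name:

```python
from html import unescape


def clean_title(raw):
    """Decode HTML character references in a result title."""
    return unescape(raw)


title = clean_title("Tom &amp; Jerry")
```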
2020-12-17 | [mod] dictzone, translated, currency_convert: use engine_type online_curency and online_dictionnary | Alexandre Flament
2020-12-17 | [mod] split searx.search into different processors | Alexandre Flament
See searx.search.processors.abstract.EngineProcessor. First, searx calls the get_params method. If the return value is not None, searx then calls the search method.
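The two-step call sequence described above can be sketched as follows. The method signatures are deliberately simplified and are assumptions for this example; `EchoProcessor` and `run` are hypothetical names, and the real EngineProcessor methods take more arguments.

```python
from abc import ABC, abstractmethod


class EngineProcessor(ABC):
    """Simplified sketch of the processor interface."""

    @abstractmethod
    def get_params(self, query):
        """Return request parameters, or None to skip this engine."""

    @abstractmethod
    def search(self, query, params):
        """Run the search and return a list of results."""


class EchoProcessor(EngineProcessor):
    """Toy processor that can be switched off via `paused`."""

    def __init__(self, name, paused=False):
        self.name = name
        self.paused = paused

    def get_params(self, query):
        if self.paused:
            return None
        return {"query": query}

    def search(self, query, params):
        return [f"{self.name}: {params['query']}"]


def run(processors, query):
    """Call get_params first; call search only when params is not None."""
    results = []
    for processor in processors:
        params = processor.get_params(query)
        if params is None:
            continue
        results.extend(processor.search(query, params))
    return results
```

Returning None from get_params gives each processor a single place to opt out of a query before any request is made.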
2020-12-16 | Fix the StartPage result title is showing the url | lucky13820
Fix issue 2395, where the StartPage result title shows the URL. https://github.com/searx/searx/issues/2395
2020-12-14 | Merge pull request #2385 from joshu9h/patch-1 | Alexandre Flament
[Fix] Startpage
2020-12-13 | Merge pull request #2372 from dalf/remove-broken-engines | Alexandre Flament
[remove] remove searchcode_doc and twitter