diff options
Diffstat (limited to 'docs/dev/engine_overview.rst')
| -rw-r--r-- | docs/dev/engine_overview.rst | 315 |
1 files changed, 315 insertions, 0 deletions
diff --git a/docs/dev/engine_overview.rst b/docs/dev/engine_overview.rst new file mode 100644 index 000000000..a6867b5d0 --- /dev/null +++ b/docs/dev/engine_overview.rst @@ -0,0 +1,315 @@ +Engine overview +=============== + + +searx is a `metasearch-engine <https://en.wikipedia.org/wiki/Metasearch_engine>`__, +so it uses different search engines to provide better results. + +Because there is no general search API which could be used for every +search engine, an adapter has to be built between searx and the +external search engines. Adapters are stored under the folder +`searx/engines +<https://github.com/asciimoo/searx/tree/master/searx/engines>`__. + + +.. contents:: + :depth: 3 + +general engine configuration +---------------------------- + +It is required to tell searx the type of results the engine provides. The +arguments can be set in the engine file or in the settings file +(normally ``settings.yml``). The arguments in the settings file override +the ones in the engine file. + +It does not matter if an option is stored in the engine file or in the +settings. However, the standard way is the following: + + +engine file +~~~~~~~~~~~ + ++----------------------+-----------+-----------------------------------------+ +| argument | type | information | ++======================+===========+=========================================+ +| categories | list | pages, in which the engine is working | ++----------------------+-----------+-----------------------------------------+ +| paging | boolean | support multible pages | ++----------------------+-----------+-----------------------------------------+ +| language\_support | boolean | support language choosing | ++----------------------+-----------+-----------------------------------------+ +| time\_range\_support | boolean | support search time range | ++----------------------+-----------+-----------------------------------------+ +| offline | boolean | engine runs offline | ++----------------------+-----------+-----------------------------------------+ + +settings.yml +~~~~~~~~~~~~ + ++------------+----------+-----------------------------------------------+ +| argument | type | information | ++============+==========+===============================================+ +| name | string | name of search-engine | ++------------+----------+-----------------------------------------------+ +| engine | string | name of searx-engine (filename without .py) | ++------------+----------+-----------------------------------------------+ +| shortcut | string | shortcut of search-engine | ++------------+----------+-----------------------------------------------+ +| timeout | string | specific timeout for search-engine | ++------------+----------+-----------------------------------------------+ + +overrides +~~~~~~~~~ + +A few of the options have default values in the engine, but are +often overwritten by the settings. If ``None`` is assigned to an option +in the engine file, it has to be redefined in the settings, +otherwise searx will not start with that engine. + +The naming of overrides is arbitrary. But the recommended +overrides are the following: + ++-----------------------+----------+----------------------------------------------------------------+ +| argument | type | information | ++=======================+==========+================================================================+ +| base\_url | string | base-url, can be overwritten to use same engine on other URL | ++-----------------------+----------+----------------------------------------------------------------+ +| number\_of\_results | int | maximum number of results per request | ++-----------------------+----------+----------------------------------------------------------------+ +| language | string | ISO code of language and country like en\_US | ++-----------------------+----------+----------------------------------------------------------------+ +| api\_key | string | api-key if required by engine | ++-----------------------+----------+----------------------------------------------------------------+ + +example code +~~~~~~~~~~~~ + +.. code:: python + + # engine dependent config + categories = ['general'] + paging = True + language_support = True + +making a request +---------------- + +To perform a search an URL have to be specified. In addition to +specifying an URL, arguments can be passed to the query. + +passed arguments +~~~~~~~~~~~~~~~~ + +These arguments can be used to construct the search query. Furthermore, +parameters with default value can be redefined for special purposes. + ++----------------------+------------+------------------------------------------------------------------------+ +| argument | type | default-value, information | ++======================+============+========================================================================+ +| url | string | ``''`` | ++----------------------+------------+------------------------------------------------------------------------+ +| method | string | ``'GET'`` | ++----------------------+------------+------------------------------------------------------------------------+ +| headers | set | ``{}`` | ++----------------------+------------+------------------------------------------------------------------------+ +| data | set | ``{}`` | ++----------------------+------------+------------------------------------------------------------------------+ +| cookies | set | ``{}`` | ++----------------------+------------+------------------------------------------------------------------------+ +| verify | boolean | ``True`` | ++----------------------+------------+------------------------------------------------------------------------+ +| headers.User-Agent | string | a random User-Agent | ++----------------------+------------+------------------------------------------------------------------------+ +| category | string | current category, like ``'general'`` | ++----------------------+------------+------------------------------------------------------------------------+ +| started | datetime | current date-time | ++----------------------+------------+------------------------------------------------------------------------+ +| pageno | int | current pagenumber | ++----------------------+------------+------------------------------------------------------------------------+ +| language | string | specific language code like ``'en_US'``, or ``'all'`` if unspecified | ++----------------------+------------+------------------------------------------------------------------------+ + +parsed arguments +~~~~~~~~~~~~~~~~ + +The function ``def request(query, params):`` always returns the +``params`` variable. Inside searx, the following paramters can be +used to specify a search request: + ++------------+-----------+---------------------------------------------------------+ +| argument | type | information | ++============+===========+=========================================================+ +| url | string | requested url | ++------------+-----------+---------------------------------------------------------+ +| method | string | HTTP request method | ++------------+-----------+---------------------------------------------------------+ +| headers | set | HTTP header information | ++------------+-----------+---------------------------------------------------------+ +| data | set | HTTP data information (parsed if ``method != 'GET'``) | ++------------+-----------+---------------------------------------------------------+ +| cookies | set | HTTP cookies | ++------------+-----------+---------------------------------------------------------+ +| verify | boolean | Performing SSL-Validity check | ++------------+-----------+---------------------------------------------------------+ + +example code +~~~~~~~~~~~~ + +.. code:: python + + # search-url + base_url = 'https://example.com/' + search_string = 'search?{query}&page={page}' + + # do search-request + def request(query, params): + search_path = search_string.format( + query=urlencode({'q': query}), + page=params['pageno']) + + params['url'] = base_url + search_path + + return params + +returned results +---------------- + +Searx is able to return results of different media-types. +Currently the following media-types are supported: + +- default +- images +- videos +- torrent +- map + +To set another media-type as default, the parameter +``template`` must be set to the desired type. + +default +~~~~~~~ + ++--------------------+---------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++====================+===============================================================================================================+ +| url | string, url of the result | ++--------------------+---------------------------------------------------------------------------------------------------------------+ +| title | string, title of the result | ++--------------------+---------------------------------------------------------------------------------------------------------------+ +| content | string, general result-text | ++--------------------+---------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish | ++--------------------+---------------------------------------------------------------------------------------------------------------+ + +images +~~~~~~ + +to use this template, the parameter + ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++====================+=======================================================================================================================================+ +| template | is set to ``images.html`` | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| url | string, url to the result site | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, title of the result *(partly implemented)* | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| content | *(partly implemented)* | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish *(partly implemented)* | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| img\_src | string, url to the result image | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| thumbnail\_src | string, url to a small-preview image | ++--------------------+---------------------------------------------------------------------------------------------------------------------------------------+ + +videos +~~~~~~ + ++--------------------+--------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++====================+==============================================================================================================+ +| template | is set to ``videos.html`` | ++--------------------+--------------------------------------------------------------------------------------------------------------+ +| url | string, url of the result | ++--------------------+--------------------------------------------------------------------------------------------------------------+ +| title | string, title of the result | ++--------------------+--------------------------------------------------------------------------------------------------------------+ +| content | *(not implemented yet)* | ++--------------------+--------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish | ++--------------------+--------------------------------------------------------------------------------------------------------------+ +| thumbnail | string, url to a small-preview image | ++--------------------+--------------------------------------------------------------------------------------------------------------+ + +torrent +~~~~~~~ + ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++==================+=======================================================================================================================================+ +| template | is set to ``torrent.html`` | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| url | string, url of the result | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| title | string, title of the result | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| content | string, general result-text | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish *(not implemented yet)* | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| seed | int, number of seeder | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| leech | int, number of leecher | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| filesize | int, size of file in bytes | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| files | int, number of files | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| magnetlink | string, `magnetlink <https://en.wikipedia.org/wiki/Magnet_URI_scheme>`__ of the result | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +| torrentfile | string, torrentfile of the result | ++------------------+---------------------------------------------------------------------------------------------------------------------------------------+ + + +map +~~~ + ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| result-parameter | information | ++=========================+==============================================================================================================+ +| url | string, url of the result | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| title | string, title of the result | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| content | string, general result-text | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| latitude | latitude of result (in decimal format) | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| longitude | longitude of result (in decimal format) | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| boundingbox | boundingbox of result (array of 4. values ``[lat-min, lat-max, lon-min, lon-max]``) | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| geojson | geojson of result (http://geojson.org) | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| osm.type | type of osm-object (if OSM-Result) | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| osm.id | id of osm-object (if OSM-Result) | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| address.name | name of object | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| address.road | street name of object | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| address.house\_number | house number of object | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| address.locality | city, place of object | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| address.postcode | postcode of object | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| address.country | country of object | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ + |