3.1.1.3. karr_lab_aws_manager.elasticsearch_kl package

3.1.1.3.1. Submodules

3.1.1.3.2. karr_lab_aws_manager.elasticsearch_kl.index_setting_file module

class karr_lab_aws_manager.elasticsearch_kl.index_setting_file.IndexUtil(filter_dir=None, analyzer_dir=None, mapping_properties_dir=None)[source]

Bases: object

Build the JSON settings file for an index.

combine_files(**kwargs)[source]

Combine various settings for index to form a coherent description. (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html)

Parameters
  • _filter (bool) – whether to include filter info.

  • analyzer (bool) – whether to include analyzer info.

  • mappings (bool) – whether to include mappings info.

Returns

(dict)
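As a rough illustration of the kind of dictionary combine_files might return, the sketch below merges filter, analyzer, and mapping fragments into one index description. The helper name `combine_settings`, its arguments, the sample field and analyzer names, and the nesting under `settings.analysis` and `mappings.properties` are assumptions modeled on the Elasticsearch create-index body, not the package's actual implementation.

```python
def combine_settings(filters=None, analyzers=None, properties=None):
    """Hypothetical helper: merge index fragments into one description."""
    body = {}
    analysis = {}
    if filters:
        analysis["filter"] = dict(filters)
    if analyzers:
        analysis["analyzer"] = dict(analyzers)
    # Only emit a "settings" section when there is analysis content.
    if analysis:
        body["settings"] = {"analysis": analysis}
    if properties:
        body["mappings"] = {"properties": dict(properties)}
    return body


# Illustrative fragments; the names below are made up for this sketch.
example = combine_settings(
    filters={"english_stop": {"type": "stop", "stopwords": "_english_"}},
    analyzers={"my_analyzer": {"tokenizer": "standard",
                               "filter": ["lowercase", "english_stop"]}},
    properties={"name": {"type": "text", "analyzer": "my_analyzer"}},
)
```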

read_file(_dir)[source]

Read in json file.

Parameters

_dir (str) – directory of the json file.

Returns

(dict)

3.1.1.3.3. karr_lab_aws_manager.elasticsearch_kl.query_builder module

class karr_lab_aws_manager.elasticsearch_kl.query_builder.QueryBuilder(profile_name=None, credential_path=None, config_path=None, elastic_path=None, cache_dir=None, service_name='es', max_entries=inf, verbose=False)[source]

Bases: karr_lab_aws_manager.elasticsearch_kl.util.EsUtil

build_bool_query_body(must=None, _filter=None, should=None, must_not=None, minimum_should_match=0)[source]

Build a boolean query body. (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html)

Parameters
  • must (list or dict, optional) – Body for must. Defaults to None.

  • _filter (list or dict, optional) – Body for filter. Defaults to None.

  • should (list or dict, optional) – Body for should. Defaults to None.

  • must_not (list or dict, optional) – Body for must_not. Defaults to None.

  • minimum_should_match (int) – The number or percentage of should clauses returned documents must match. Defaults to 0.

Returns

boolean query body

Return type

(dict)
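The parameters above map onto the clauses of an Elasticsearch bool query. The following is a minimal sketch of how such a body could be assembled; the function `bool_query_body`, the outer `"query"` wrapper, and the sample field names are assumptions for illustration, and the real method may shape its output differently.

```python
def bool_query_body(must=None, _filter=None, should=None, must_not=None,
                    minimum_should_match=0):
    """Hypothetical sketch of a bool query body; not the package's code."""
    clauses = {"must": must, "filter": _filter,
               "should": should, "must_not": must_not}
    # Keep only the clauses that were actually supplied.
    bool_part = {key: val for key, val in clauses.items() if val is not None}
    bool_part["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_part}}


# "name" and "species" are made-up field names.
body = bool_query_body(
    must={"match": {"name": "kinase"}},
    _filter=[{"term": {"species": "human"}}],
)
```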

build_simple_query_string_body(query_message, **kwargs)[source]

Builds query portion of the body in request body search (https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/full-text/#simple-query-string)

Parameters

query_message (str) – string to be queried for.

Returns

query request body

Return type

(dict)
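For orientation, a simple_query_string request body generally has the shape sketched below. The function name `simple_query_string_body` and the sample field names are hypothetical; `fields` and `default_operator` are standard simple_query_string options, but whether the real method forwards `**kwargs` in exactly this way is an assumption.

```python
def simple_query_string_body(query_message, **kwargs):
    """Hypothetical sketch: wrap a query string and extra options."""
    query = {"query": query_message}
    # Extra keyword arguments become simple_query_string options.
    query.update(kwargs)
    return {"query": {"simple_query_string": query}}


body = simple_query_string_body(
    "glucose + kinase",
    fields=["name", "synonyms"],   # made-up field names
    default_operator="and",
)
```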

3.1.1.3.4. karr_lab_aws_manager.elasticsearch_kl.util module

class karr_lab_aws_manager.elasticsearch_kl.util.ComplexEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: json.encoder.JSONEncoder

default(o)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
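The types that ComplexEncoder itself handles are not listed here, so the following is only a sketch of the general pattern: a JSONEncoder subclass whose default method converts otherwise unserializable objects. The class name `SketchEncoder` and the choice of dates and sets are assumptions for illustration.

```python
import datetime
import json


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize dates and sets."""

    def default(self, o):
        if isinstance(o, (datetime.date, datetime.datetime)):
            return o.isoformat()
        if isinstance(o, (set, frozenset)):
            # Sort for a deterministic list representation.
            return sorted(o)
        # Let the base class raise TypeError for anything else.
        return super().default(o)


encoded = json.dumps(
    {"when": datetime.date(2020, 1, 2), "tags": {"b", "a"}},
    cls=SketchEncoder,
)
```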

class karr_lab_aws_manager.elasticsearch_kl.util.EsUtil(profile_name=None, credential_path=None, config_path=None, elastic_path=None, cache_dir=None, service_name='es', max_entries=inf, verbose=False)[source]

Bases: karr_lab_aws_manager.config.config.establishES

add_field_to_index(index, field=None, value=None, query={'match_all': {}}, script_type='inline', script_complete=None)[source]

Add a field with the given value to all documents in the index.

Parameters
  • index (str) – name of index.

  • field (str) – name of field.

  • value (Obj) – value of field.

  • query (Obj) – query of index.

  • script_type (str) – type of script, inline or store.

  • script_complete (str) – content of script.

Returns

elasticsearch update status description.

Return type

(HTTPResponse)
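Adding a field to every matching document is typically done with an update-by-query request carrying a Painless script. The sketch below builds such a body under that assumption; the helper name `add_field_body`, the script source, and the sample field and value are hypothetical and may differ from what add_field_to_index sends.

```python
def add_field_body(field, value, query=None):
    """Hypothetical sketch of an _update_by_query request body."""
    if query is None:
        # A None sentinel avoids sharing one mutable default dict.
        query = {"match_all": {}}
    return {
        "script": {
            "lang": "painless",
            # Field name and value are passed as script parameters.
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": field, "value": value},
        },
        "query": query,
    }


body = add_field_body("reviewed", True)  # made-up field and value
```

Passing the field name and value through `params` keeps the script source constant, which lets Elasticsearch reuse the compiled script across calls.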

allocation_explain()[source]

Choose the first unassigned shard found and explain why it cannot be allocated to a node.

Returns

http response

Return type

(HTTPResponse)

build_es(suffix=None)[source]

Build an Elasticsearch query object.

Parameters

suffix (str) – string trailing es endpoint

Returns

Elasticsearch object

Return type

(Elasticsearch)

change_field_name(pipeline_name, pipeline_description, src_field, target_field, src_idx, dest_idx)[source]

Change field name. (https://www.elastic.co/guide/en/elasticsearch/reference/current/rename-processor.html)

Parameters
  • pipeline_name (str) – Name of pipeline.

  • pipeline_description (str) – Description of pipeline.

  • src_field (str) – Name of the field before change.

  • target_field (str) – Name of the field after change.

  • src_idx (str) – Name of index before change.

  • dest_idx (str) – Name of index after changes.
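Based on the rename processor documentation linked above, renaming a field usually involves two request bodies: an ingest pipeline with a rename processor, and a reindex request that routes documents through that pipeline. The helper names and sample values below are hypothetical, and whether change_field_name issues exactly these two requests is an assumption.

```python
def rename_pipeline_body(pipeline_description, src_field, target_field):
    """Hypothetical sketch: ingest pipeline with one rename processor."""
    return {
        "description": pipeline_description,
        "processors": [
            {"rename": {"field": src_field, "target_field": target_field}}
        ],
    }


def reindex_body(src_idx, dest_idx, pipeline_name):
    """Hypothetical sketch: reindex src_idx into dest_idx via a pipeline."""
    return {
        "source": {"index": src_idx},
        "dest": {"index": dest_idx, "pipeline": pipeline_name},
    }


# All names below are made up for illustration.
pipeline = rename_pipeline_body("rename old_name", "old_name", "new_name")
reindex = reindex_body("proteins_v1", "proteins_v2", "rename_pipeline")
```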

create_index(index, mappings=None, setting={'settings': {'number_of_replicas': 0, 'number_of_shards': 1}}, additional_settings=None)[source]

Create index. (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)

Parameters
  • index (str) – name of index

  • setting (dict, optional) – index settings. Defaults to {“settings”: {“number_of_replicas”: 0, “number_of_shards”: 1}}.

  • mappings (dict, optional) – index mappings. Defaults to None.

  • additional_settings (dict) – additional settings. Defaults to None.

create_index_with_file(index, _file, num_shard=1, num_replica=0)[source]

Create index with an index description file. (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)

Parameters
  • index (str) – name of index

  • _file (dict) – index setting description.

  • num_shard (int, optional) – number of shards. Defaults to 1.

  • num_replica (int, optional) – number of replicas. Defaults to 0.

Returns

(requests.Response)

data_to_es_bulk(cursor, index='test', count=None, bulk_size=100, _id='uniprot_id', headers={'Content-Type': 'application/json'})[source]

Load data into elasticsearch service

Parameters
  • count (int) – cursor size

  • cursor (pymongo.Cursor or iter) – documents to be PUT/POST to es

  • index (str) – name of unique key to be used as index for es

  • bulk_size (int) – number of documents in one PUT

  • headers (dict) – http header

  • _id (str) – key in mongo collection for identification

Returns

set of status codes

Return type

(set)
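The bulk_size parameter implies that the cursor is consumed in fixed-size groups, one group per request. A minimal sketch of that batching step follows; the generator name `batches` is hypothetical, and the real method may track its position differently (for example with the count argument).

```python
from itertools import islice


def batches(cursor, bulk_size=100):
    """Hypothetical sketch: yield lists of at most bulk_size documents."""
    iterator = iter(cursor)
    while True:
        batch = list(islice(iterator, bulk_size))
        if not batch:
            return
        yield batch


# 250 stand-in documents split into groups of 100.
sizes = [len(batch) for batch in batches(range(250), bulk_size=100)]
```

Because it only calls iter() on its input, the same generator works for a pymongo cursor or any other iterable without loading all documents into memory.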

data_to_es_single(count, cursor, index, _id='uniprot_id', headers={'Content-Type': 'application/json'})[source]

Load data into elasticsearch service

Parameters
  • count (int) – cursor size

  • cursor (pymongo.Cursor or iter) – documents to be PUT to es

  • index (str) – name of unique key to be used as index for es

  • headers (dict) – http header information

  • _id (str) – key in mongo collection for identification

Returns

set of status codes

Return type

(set)

delete_index(index, _id=None)[source]

Delete elasticsearch index

Parameters
  • index (str) – name of index in es

  • _id (int) – id of the doc in index (optional)

enable_fielddata(index, _type, field)[source]

Enable fielddata for text fields.

Parameters
  • index (str) – Index in which the operation will be done

  • _type (str) – Existing mapping for field.

  • field (str) – name of the field.

get_index_mapping(index='.kibana_1')[source]

Get the mapping of one or more indices.

Parameters

index (str, optional) – Comma-separated list or wildcard expression of index names. Defaults to ‘.kibana_1’.

Returns

(requests.Response)

index_health_status()[source]

Show the health status, number of documents, and disk usage for each index.

Returns

http response

Return type

(HTTPResponse)

index_settings(index, number_of_replicas, number_of_shards=1, other_settings={}, headers={'Content-Type': 'application/json'})[source]

Set the number of shards and replicas for an index in the ES cluster.

Parameters
  • index (str) – name of index to be set

  • number_of_replicas (int) – number of replica shards to be used for the index

  • number_of_shards (int) – number of primary shards contained in the es cluster

  • other_settings (dict) – other index settings.

  • headers (dict) – http request content header description

Returns

http response

Return type

(HTTPResponse)

make_action_and_metadata(index, _id)[source]

Make action_and_metadata obj for bulk loading e.g. { “index”: { “_index” : “index”, “_id” : “id” } }

Parameters
  • index (str) – name of index on ES

  • _id (str) – unique id for document

Returns

metadata that conforms to ES bulk load requirement

Return type

(dict)
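The Elasticsearch bulk API expects newline-delimited JSON in which each document line is preceded by an action-and-metadata line, and the payload must end with a newline. The sketch below shows how the dictionary described above could be used to build such a payload; `action_and_metadata`, `bulk_payload`, and the sample document are hypothetical stand-ins rather than the package's code.

```python
import json


def action_and_metadata(index, _id):
    """Hypothetical stand-in for the method described above."""
    return {"index": {"_index": index, "_id": _id}}


def bulk_payload(docs, index, id_key="uniprot_id"):
    """Hypothetical sketch: build an NDJSON body for the bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps(action_and_metadata(index, doc[id_key])))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


payload = bulk_payload([{"uniprot_id": "P12345", "name": "example"}],
                       index="test")
```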

migrate_index(old_index, new_index, headers={'Content-Type': 'application/json'}, number_of_shards=1, number_of_replicas=0)[source]

Migrate old index to new index whilst changing shard and replica setting

Parameters
  • old_index (str) – name of the old index.

  • new_index (str) – name of the new index

  • headers (HTTP.header, optional) – header. Defaults to { “Content-Type”: “application/json” }.

  • number_of_shards (int, optional) – number of shards for the index. Defaults to 1.

  • number_of_replicas (int, optional) – number of replicas for the index. Defaults to 0.

Returns

(list of requests.Response)

put_mapping(index, body)[source]

Put index mapping to an existing index. (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html)

Parameters
  • index (str) – mapping for the index.

  • body (dict) – mapping description.

Returns

(requests.Response)

test_analyzer(msg, tokenizer='standard', index=None)[source]

Test ES analyzer / tokenizer results. https://www.elastic.co/guide/en/elasticsearch/reference/6.8/analysis-standard-tokenizer.html

Parameters
  • msg (str) – Message to be analyzed.

  • tokenizer (str, optional) – analyzer to be used.

  • index (str, optional) – Index in which custom analyzer resides.

Returns

http response

Return type

(HTTPResponse)
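An analyze request in Elasticsearch is a small JSON body sent to the `_analyze` endpoint, optionally scoped to an index. The sketch below shows one plausible way to derive the path and body from the parameters above; the helper name `analyze_request`, the sample index name, and the exact body keys used by test_analyzer are assumptions.

```python
def analyze_request(msg, tokenizer="standard", index=None):
    """Hypothetical sketch: path and body for an _analyze request."""
    # Scope the request to an index only when one is given.
    path = "/_analyze" if index is None else "/{}/_analyze".format(index)
    body = {"tokenizer": tokenizer, "text": msg}
    return path, body


path, body = analyze_request("Quick brown fox")
scoped_path, _ = analyze_request("Quick brown fox", index="my_index")
```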

unassigned_reason()[source]

Send an HTTP request to find out why a shard is unassigned.

Returns

http response

Return type

(HTTPResponse)

update_alias_to_idx(idx, alias, action='add')[source]

Add an alias to, or remove an alias from, one or more indices. (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html)

Parameters
  • idx (str or list of str) – indices official name / names.

  • alias (str) – index alias.

  • action (str) – add or remove
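Following the aliases API linked above, an alias update is expressed as a list of actions, one per index. A minimal sketch of building that body is shown here; the function name `alias_actions_body` and the sample index and alias names are hypothetical, and the validation of `action` is an assumption rather than documented behavior.

```python
def alias_actions_body(idx, alias, action="add"):
    """Hypothetical sketch: request body for the aliases API."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    # Accept either a single index name or a list of names.
    indices = [idx] if isinstance(idx, str) else list(idx)
    return {
        "actions": [{action: {"index": name, "alias": alias}}
                    for name in indices]
    }


body = alias_actions_body(["proteins_v1", "proteins_v2"], "proteins")
```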

karr_lab_aws_manager.elasticsearch_kl.util.main()[source]

3.1.1.3.5. Module contents