Get tokens from text analysis (Generally available)

POST /_analyze

The analyze API performs analysis on a text string and returns the resulting tokens.

Generating an excessive number of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error is thrown. The _analyze endpoint without a specified index always uses 10000 as its limit.
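For example, the limit can be raised for a specific index by setting index.analyze.max_token_count in the index settings. The index name my-index below is only a placeholder:

PUT /my-index
{
  "settings": {
    "index.analyze.max_token_count": 20000
  }
}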

Required authorization

  • Index privileges: index

Query parameters

  • index string

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.

Body (application/json)

  • analyzer string

    The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer, or an analyzer that’s been configured in the index.

  • attributes array[string]

    Array of token attributes used to filter the output of the explain parameter.

  • char_filter array

    Array of character filters used to preprocess characters before the tokenizer.

  • explain boolean

    If true, the response includes token attributes and additional details.

  • field string

    Path to a field or an array of field paths. Some APIs support wildcards in the path to select multiple fields. To derive the analyzer from a field mapping, you must also specify an index.

  • filter array

    Array of token filters to apply after the tokenizer.

  • normalizer string

    Normalizer to use to convert text into a single token.

  • text string | array[string]

    Text to analyze. If an array of strings is provided, it is analyzed as a multi-value field.

  • tokenizer string | object

    Tokenizer to use to convert text into tokens.

Responses

  • 200 application/json
    • detail object
      • analyzer object
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • charfilters array[object]
        • filtered_text array[string] Required
        • name string Required
      • custom_analyzer boolean Required
      • tokenfilters array[object]
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • tokenizer object
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
    • tokens array[object]
      • end_offset number Required
      • position number Required
      • positionLength number
      • start_offset number Required
      • token string Required
      • type string Required
Console:

GET /_analyze
{
  "analyzer": "standard",
  "text": "this is a test"
}

Python:

resp = client.indices.analyze(
    analyzer="standard",
    text="this is a test",
)

JavaScript:

const response = await client.indices.analyze({
  analyzer: "standard",
  text: "this is a test",
});

Ruby:

response = client.indices.analyze(
  body: {
    "analyzer": "standard",
    "text": "this is a test"
  }
)

PHP:

$resp = $client->indices()->analyze([
    "body" => [
        "analyzer" => "standard",
        "text" => "this is a test",
    ],
]);

curl:

curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"analyzer":"standard","text":"this is a test"}' "$ELASTICSEARCH_URL/_analyze"
You can apply any of the built-in analyzers to the text string without specifying an index.
{
  "analyzer": "standard",
  "text": "this is a test"
}
If the text parameter is provided as an array of strings, it is analyzed as a multi-value field.
{
  "analyzer": "standard",
  "text": [
    "this is a test",
    "the second text"
  ]
}
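With the Python client, the same multi-value request can be written like the single-value example earlier on this page:

# Analyze multiple strings as a multi-value field
resp = client.indices.analyze(
    analyzer="standard",
    text=[
        "this is a test",
        "the second text",
    ],
)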
You can test a custom transient analyzer built from tokenizers, token filters, and character filters. Token filters are specified with the filter parameter.
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ],
  "char_filter": [
    "html_strip"
  ],
  "text": "this is a <b>test</b>"
}
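For this request, the html_strip character filter removes the <b> tags and the keyword tokenizer emits the remaining text as a single token, so the response should look roughly like the following (offsets refer to the original, unfiltered text):

{
  "tokens": [
    {
      "token": "this is a test",
      "start_offset": 0,
      "end_offset": 21,
      "type": "word",
      "position": 0
    }
  ]
}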
Custom tokenizers, token filters, and character filters can be specified in the request body.
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": [
        "a",
        "is",
        "this"
      ]
    }
  ],
  "text": "this is a test"
}
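Here the whitespace tokenizer produces four tokens, the lowercase filter lowercases them, and the custom stop filter then removes "a", "is", and "this", so the response should contain roughly one remaining token:

{
  "tokens": [
    {
      "token": "test",
      "start_offset": 10,
      "end_offset": 14,
      "type": "word",
      "position": 3
    }
  ]
}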
Run `GET /analyze_sample/_analyze` to run an analysis on the text using the default index analyzer associated with the `analyze_sample` index. Alternatively, the analyzer can be derived based on a field mapping.
{
  "field": "obj1.field1",
  "text": "this is a test"
}
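With the Python client, the same field-based request can be expressed as follows, assuming the analyze_sample index and the obj1.field1 mapping from the example above exist:

# Derive the analyzer from the mapping of obj1.field1 in analyze_sample
resp = client.indices.analyze(
    index="analyze_sample",
    field="obj1.field1",
    text="this is a test",
)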
Run `GET /analyze_sample/_analyze` with a normalizer to analyze the text using a normalizer associated with the `analyze_sample` index, such as one configured for a keyword field.
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
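This assumes a normalizer named my_normalizer is defined in the analyze_sample index settings. A minimal sketch of such a definition follows; the lowercase filter here is an assumption, not part of the original example:

PUT /analyze_sample
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

With that definition, analyzing "BaR" returns the single token "bar".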
If you want to get more advanced details, set `explain` to `true`. The response will then include all token attributes for each token. You can filter which token attributes are included in the output by setting the `attributes` option. NOTE: The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
{
  "tokenizer": "standard",
  "filter": [
    "snowball"
  ],
  "text": "detailed output",
  "explain": true,
  "attributes": [
    "keyword"
  ]
}
Response examples (200)
A successful response for an analysis with `explain` set to `true`.
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {
          "token": "detailed",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "output",
          "start_offset": 9,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {
            "token": "detail",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
            "keyword": false
          },
          {
            "token": "output",
            "start_offset": 9,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1,
            "keyword": false
          }
        ]
      }
    ]
  }
}