Elasticsearch Reindex Change Field Type

Working with databases is rewarding but can be challenging, especially when dealing with existing data.

For example, if you want to change the type of a specific field, it might require you to take the service down, which can have grave repercussions, especially in services that process large amounts of data.

Fortunately, Elasticsearch features such as reindexing, ingest nodes, pipelines, and processors make such tasks much easier.

This tutorial will show you how to change the type of a field in a specific index using Elasticsearch ingest nodes. This approach avoids service downtime while still performing the field type change.

Introduction to Ingest Nodes

Elasticsearch’s ingest node allows you to pre-process documents before they are indexed.

An Elasticsearch node is a specific instance of Elasticsearch; connected nodes (more than one) make a single cluster.

You can view the nodes available in the running cluster with the request:

GET /_nodes/

The cURL command for this is:

curl -XGET "http://localhost:9200/_nodes/"

Executing this command returns detailed information about the nodes, as shown below (truncated output):

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "22e0bee6ef91461d82d9b0f1b4b13b4a",
  "nodes" : {
    "gSlMjTKyTemoOX-EO7Em4w" : {
      "name" : "instance-0000000003",
      "transport_address" : "172.28.86.133:19925",
      "host" : "172.28.86.133",
      "ip" : "172.28.86.133",
      "version" : "7.10.2",
      "build_flavor" : "default",
      "build_type" : "docker",
      "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
      "total_indexing_buffer" : 214748364,
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "logical_availability_zone" : "zone-0",
        "server_name" : "instance-0000000003.22e0bee6ef91461d82d9b0f1b4b13b4a",
        "availability_zone" : "us-west-1c",
        "xpack.installed" : "true",
        "instance_configuration" : "aws.data.highio.i3",
        "transform.node" : "true",
        "region" : "us-west-1"
      },
      "settings" : {
        "s3" : {
          "client" : {
            "elastic-internal-22e0be" : {
              "endpoint" : "s3-us-west-1.amazonaws.com"
            }
          }
        },
--------------------------------output truncated---------------------

By default, every Elasticsearch node has the ingest role enabled and can handle ingest operations. For heavy ingest workloads, however, you can create a node dedicated to ingest only.

To pre-process documents before indexing, we need to define a pipeline that specifies a series of processors.

Processors are instructions wrapped in a pipeline and executed one at a time, in order.

The following is the general syntax of how to define a pipeline:

{
  "description" : "Convert me",
  "processors" : [
    {
      "convert" : {
        "field" : "id",
        "type": "integer"
      }
    }
  ]
}

The description property states what the pipeline does. The processors parameter is a list of processors, given in the order in which they run.
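As a rough illustration of this structure, the following Python sketch assembles a pipeline definition as a dictionary and serializes it to JSON. The helper name build_convert_pipeline is hypothetical, only the standard library is used, and nothing is sent to Elasticsearch.

```python
import json


def build_convert_pipeline(field, target_type, description):
    """Return a pipeline definition with a single convert processor.

    Hypothetical helper for illustration; it only builds the request body.
    """
    return {
        "description": description,
        # Processors run in list order, one at a time.
        "processors": [
            {"convert": {"field": field, "type": target_type}},
        ],
    }


pipeline = build_convert_pipeline("id", "integer", "Convert me")
print(json.dumps(pipeline, indent=2))
```

Building the body in code rather than by hand makes it harder to leave a brace unbalanced, since the JSON is generated from a valid data structure.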

Create a Convert Pipeline

To create the pipeline that will convert the field type, send a PUT request to the _ingest API endpoint:

PUT _ingest/pipeline/convert_pipeline
{
  "description": "converts the dayOfWeek field to a long from integer",
  "processors" : [
    {
      "convert" : {
        "field" : "dayOfWeek",
        "type": "long"
      }
    }
  ]
}

For cURL, use the command:

curl -XPUT "http://localhost:9200/_ingest/pipeline/convert_pipeline" -H 'Content-Type: application/json' -d'{  "description": "converts the dayOfWeek field to a long from integer",  "processors" : [    {      "convert" : {        "field" : "dayOfWeek",        "type": "long"      }    }  ]}'
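Before reindexing, it can be useful to try the pipeline on a sample document. The sketch below builds a request for the pipeline simulate endpoint (POST _ingest/pipeline/convert_pipeline/_simulate) using only Python's standard library. The host is taken from the cURL examples; the sample document value and the helper name build_simulate_request are assumptions made for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # same host as the cURL examples


def build_simulate_request(pipeline_id, sample_source):
    """Build (but do not send) a simulate request for one sample document.

    Hypothetical helper; sample_source stands in for a real document.
    """
    body = {"docs": [{"_source": sample_source}]}
    return urllib.request.Request(
        url=f"{ES_URL}/_ingest/pipeline/{pipeline_id}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_simulate_request("convert_pipeline", {"dayOfWeek": 3})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To run it against a live cluster (assumed reachable at ES_URL):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```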

Reindex and Convert Type

Once the pipeline exists, all we need to do is call the reindex API and pass the pipeline name in the dest section of the request body:

POST _reindex
{
  "source": {
    "index": "kibana_sample_data_flights"
  },
  "dest": {
    "index": "kibana_sample_type_diff",
    "pipeline": "convert_pipeline"
  }
}

For cURL:

curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'{  "source": {    "index": "kibana_sample_data_flights"  },  "dest": {    "index": "kibana_sample_type_diff",    "pipeline": "convert_pipeline"  }}'
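The same reindex request can be assembled from a script. The following is a minimal Python sketch using only the standard library; the helper name build_reindex_request is hypothetical, and the index and pipeline names are the ones used in this tutorial. The request is built and printed, and sending it is left commented out so that nothing runs against a cluster by accident.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # same host as the cURL examples


def build_reindex_request(source_index, dest_index, pipeline_id):
    """Build (but do not send) a _reindex request that routes documents
    through an ingest pipeline. Hypothetical helper for illustration."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }
    return urllib.request.Request(
        url=f"{ES_URL}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reindex_request = build_reindex_request(
    "kibana_sample_data_flights", "kibana_sample_type_diff", "convert_pipeline"
)
print(reindex_request.get_method(), reindex_request.full_url)
print(reindex_request.data.decode("utf-8"))

# To start the reindex on a live cluster (assumed reachable at ES_URL):
# with urllib.request.urlopen(reindex_request) as response:
#     print(response.read().decode("utf-8"))
```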

Verify Conversion

To verify that the pipeline was applied correctly, use GET requests to fetch the mapping of that field in both indices:

GET /kibana_sample_data_flights/_mapping/field/dayOfWeek
GET /kibana_sample_type_diff/_mapping/field/dayOfWeek

This should return the data as:

-----------------------ORIGINAL INDEX---------------------------
{
  "kibana_sample_data_flights" : {
    "mappings" : {
      "dayOfWeek" : {
        "full_name" : "dayOfWeek",
        "mapping" : {
          "dayOfWeek" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}
 
-------------------------REINDEXED DATA-------------------------------
{
  "kibana_sample_type_diff" : {
    "mappings" : {
      "dayOfWeek" : {
        "full_name" : "dayOfWeek",
        "mapping" : {
          "dayOfWeek" : {
            "type" : "long"
          }
        }
      }
    }
  }
}
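If you want to check the result from a script rather than by eye, the comparison can be sketched in Python as below. The function names are hypothetical, and the dictionaries are rebuilt from the two responses shown above rather than fetched from a cluster.

```python
def mapped_type(response, index, field):
    """Return the mapped type of a field from a field-mapping response.

    Assumes the response layout shown above; for a dotted field name the
    inner key is assumed to be the last path segment.
    """
    leaf = field.split(".")[-1]
    return response[index]["mappings"][field]["mapping"][leaf]["type"]


def sample_response(index, field, field_type):
    """Rebuild the response shape shown above for a given index and type."""
    return {
        index: {
            "mappings": {
                field: {
                    "full_name": field,
                    "mapping": {field: {"type": field_type}},
                }
            }
        }
    }


original = sample_response("kibana_sample_data_flights", "dayOfWeek", "integer")
reindexed = sample_response("kibana_sample_type_diff", "dayOfWeek", "long")

before = mapped_type(original, "kibana_sample_data_flights", "dayOfWeek")
after = mapped_type(reindexed, "kibana_sample_type_diff", "dayOfWeek")
print(before, "->", after)
```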

Conclusion

In this guide, we have looked at how to work with Elasticsearch Ingest nodes to pre-process documents before indexing, thus converting a field from one type to another.

Consult the documentation to learn more:

https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html



from Linux Hint https://ift.tt/3oWREZn
