Indices automatically created while es.index.auto.create = false #2370

Open
@frensjan

es.index.auto.create should govern whether elasticsearch-hadoop automatically creates indices. At least for Hadoop MapReduce, the check for whether the index exists is done in org.elasticsearch.hadoop.mr.EsOutputFormat#init, which is called when a job is submitted. After that check, however, the auto-creation setting is no longer consulted.

As a result, if an index is deleted while it is being written to, it can be recreated in org.elasticsearch.hadoop.mr.EsOutputFormat.EsRecordWriter#init, which runs on the first write to the EsRecordWriter.

If, for instance, action.auto_create_index is disabled for the Elasticsearch cluster and an index is deleted, writes to it will fail. However, if e.g. a MapReduce task is retried because of this, the check in EsOutputFormat#init is not repeated, so the index is (re-)created in EsRecordWriter#init. In the case of a bare index (i.e., one not managed by index templates), the index is created without a mapping, which can cause all sorts of trouble.

A partial stacktrace is included for reference below:

"REDACTED" prio=5 tid=0x215 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:556)
	  at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:373)
	  at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:658)
	  at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:634)
	  at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
	  at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:150)
...

A possible solution would be to check es.index.auto.create in or around org.elasticsearch.hadoop.rest.RestRepository#touch.
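To make the proposal concrete, here is a minimal, self-contained sketch of the guard I have in mind. It does not use the real elasticsearch-hadoop classes: `FakeCluster` and `GuardedTouch` are hypothetical stand-ins, and the `autoCreate` parameter stands for the value of es.index.auto.create. The actual change would need to fit whatever RestRepository#touch really does.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for an Elasticsearch cluster (not an es-hadoop class). */
final class FakeCluster {
    private final Set<String> indices = new HashSet<>();

    boolean indexExists(String name) { return indices.contains(name); }
    void createIndex(String name) { indices.add(name); }
    void deleteIndex(String name) { indices.remove(name); }
}

/** Hypothetical sketch of a touch() that honours the auto-create setting. */
final class GuardedTouch {
    private GuardedTouch() {}

    /**
     * Creates the index only if it is missing AND auto-creation is allowed.
     *
     * @param autoCreate stands for the value of es.index.auto.create
     * @return true if the index was created, false if it already existed
     * @throws IllegalStateException if the index is missing and auto-creation is disabled
     */
    static boolean touch(FakeCluster cluster, String index, boolean autoCreate) {
        if (cluster.indexExists(index)) {
            return false; // nothing to do
        }
        if (!autoCreate) {
            // Fail the write instead of silently recreating an index without a mapping.
            throw new IllegalStateException(
                "Index [" + index + "] is missing and auto-creation is disabled");
        }
        cluster.createIndex(index);
        return true;
    }
}
```

With this guard, a retried task that finds its target index deleted would fail fast rather than recreating it, which matches what the check at job submission already does.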

I'd be happy to do the coding and provide a PR. But I'd like to get some feedback first.
