Description
es.index.auto.create should govern whether elasticsearch-hadoop automatically creates indices. At least for Hadoop MapReduce, the check for whether the index exists is done in org.elasticsearch.hadoop.mr.EsOutputFormat#init, which is called when a job is submitted. After that check, however, the auto-creation setting is never consulted again.
As a result, if an index is deleted while it is being written to, it can be recreated in org.elasticsearch.hadoop.mr.EsOutputFormat.EsRecordWriter#init, which runs on the first write to the EsRecordWriter.
For instance, if action.auto_create_index is disabled for an Elasticsearch cluster when an index is deleted, writes to that index will fail. However, if e.g. a MapReduce task is retried because of this, the check in EsOutputFormat#init is not repeated, so the index is (re-)created in EsRecordWriter#init. For a bare index (e.g., one not managed by index templates), the index is created without a mapping, which can cause all sorts of trouble.
A partial stacktrace is included for reference below:
"REDACTED" prio=5 tid=0x215 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
    at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:556)
    at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:373)
    at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:658)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:634)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:150)
    ...
A possible solution could be to check es.index.auto.create somewhere in or around org.elasticsearch.hadoop.rest.RestRepository#touch.
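To make the idea concrete, here is a rough, self-contained sketch of what such a guard could look like. It does not use the real elasticsearch-hadoop classes: FakeCluster, touch and the autoCreate flag are hypothetical stand-ins I made up for illustration, and the actual change would have to fit into however RestRepository reads its settings.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: none of these types exist in elasticsearch-hadoop.
final class AutoCreateGuardSketch {

    /** Hypothetical in-memory stand-in for the Elasticsearch cluster. */
    static final class FakeCluster {
        private final Set<String> indices = new HashSet<>();

        boolean exists(String index) { return indices.contains(index); }
        void create(String index) { indices.add(index); }
        void delete(String index) { indices.remove(index); }
    }

    /**
     * Hypothetical guarded "touch": creates a missing index only when
     * auto-creation is enabled (modelling es.index.auto.create).
     * Returns true if the index was created by this call.
     */
    static boolean touch(FakeCluster cluster, String index, boolean autoCreate) {
        if (cluster.exists(index)) {
            return false; // index is already there, nothing to do
        }
        if (!autoCreate) {
            // Fail instead of silently re-creating an index without a mapping.
            throw new IllegalStateException(
                "Index [" + index + "] is missing and es.index.auto.create is false");
        }
        cluster.create(index);
        return true;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        cluster.create("logs");   // index exists when the job is submitted
        cluster.delete("logs");   // index is deleted while the job is running

        try {
            touch(cluster, "logs", false); // first write of a retried task
        } catch (IllegalStateException e) {
            System.out.println("write rejected: " + e.getMessage());
        }
        System.out.println("index re-created: " + cluster.exists("logs"));
    }
}
```

With a guard like this, a retried task would fail on its first write instead of bringing the deleted index back without a mapping. Whether failing is the right behaviour, or whether the check belongs in touch itself or in one of its callers, is exactly the kind of feedback I'm looking for.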
I'd be happy to do the coding and provide a PR, but I'd like to get some feedback first.