I've been trying to set up an OpenTelemetry Collector (v0.121.0, to be exact) so that it collects logs via the OTLP receiver and sends them to our already long-standing production Elasticsearch.
However, I've been unable to successfully send any data to Elastic so far with this setup.
I will try to list my attempts in logical rather than chronological order - I hope it will be easier to follow that way. First, the versions of the components used:

- Elasticsearch: 8.16.1
- OpenTelemetry Collector Contrib: v0.121.0
- Docker: v4.38.0
- Python: 3.13.1
I deployed a local Docker setup with Elasticsearch, Kibana, and OpenTelemetry Collector Contrib containers to make sure the config is correct and that I can send logs to Elastic. The compose file for that setup:
# version: "3"
services:
  # ElasticSearch
  els:
    image: elasticsearch:8.16.1
    container_name: els
    hostname: els
    environment:
      - xpack.security.enrollment.enabled=false
      - xpack.security.enabled=false
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-XX:UseSVE=0
      - CLI_JAVA_OPTS=-XX:UseSVE=0
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elastic
    volumes:
      - esdata1:/usr/share/elasticsearch/data
      - eslog:/usr/share/elasticsearch/logs
      - /Users/***/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:8.16.1
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: http://els:9200
    networks:
      - elastic
    depends_on:
      - els
  # OtelCollector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.121.0
    container_name: otel-collector
    volumes:
      - type: bind
        source: /Users/***/otel_config_test.yaml
        target: /etc/otelcol-contrib/config.yaml
    ports:
      - "13133:13133" # health_check extension
      - "4318:4318"   # OTLP HTTP receiver
      - "4317:4317"   # OTLP gRPC receiver
    networks:
      - elastic

volumes:
  esdata1:
  eslog:

networks:
  elastic:
    driver: bridge
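Before sending anything, a quick way to confirm that both Elasticsearch and the collector's OTLP/HTTP receiver are reachable from the host - a minimal sketch with requests, assuming the port mappings from the compose file above:

import requests

# Elasticsearch from the compose file (security is disabled there, so no credentials needed).
print(requests.get("http://localhost:9200").json()["version"]["number"])

# OTLP/HTTP receiver - it only accepts POSTs on /v1/logs, so any HTTP response
# at all (e.g. a 405) at least proves the port mapping and the receiver are up.
print(requests.get("http://localhost:4318/v1/logs").status_code)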
and the OpenTelemetry Collector config file:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200
exporters:
  elasticsearch:
    endpoint: :9200
    tls:
      insecure: true
      insecure_skip_verify: true
    logs_index: test
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_retries: 5
      retry_on_status:
        - 429
        - 500
    sending_queue:
      enabled: true
    compression: none
  debug:
connectors:
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [health_check, pprof, zpages]
  telemetry:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [elasticsearch, debug]
(I'm on an M4 Mac, so forgive the weird localhost in the config.) That worked beautifully, and I was able to send data to the OTLP receiver using this script:
import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider()
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint=":4317/v1/logs", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)
logging.getLogger().addHandler(handler)

logging.info("Python to ES collector")
logger_provider.shutdown()
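As a sanity check on the Elasticsearch side (rather than relying only on Kibana), a direct search against the index shows whether the records actually landed - a minimal sketch, again assuming the local security-disabled Elastic from the compose file:

from elasticsearch import Elasticsearch

# Local single-node Elastic from the compose file; security is disabled, so no auth needed.
es = Elasticsearch(hosts=["http://localhost:9200"])

# Look at what ended up in the "test" index.
resp = es.search(index="test", query={"match_all": {}}, size=5)
print(resp["hits"]["total"])
for hit in resp["hits"]["hits"]:
    print(hit["_source"])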
After that, the logs were successfully sent to Elastic and I could see them in Kibana (Analytics > Discover, with a filter created for my "test" index). So far so good - I was quite sure everything would work the same way for the prod Elastic, given proper credentials. That's not what happened... I adjusted the OpenTelemetry config with the proper Elastic API endpoint and credentials:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200
exporters:
  elasticsearch:
    endpoint:
    tls:
      insecure: true
      insecure_skip_verify: true
    auth:
      authenticator: basicauth
    logs_index: test
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_retries: 5
      retry_on_status:
        - 429
        - 500
    sending_queue:
      enabled: true
    compression: none
  debug:
connectors:
extensions:
  health_check:
  pprof:
  zpages:
  basicauth:
    client_auth:
      username: "*******"
      password: "*******"
service:
  extensions: [basicauth, health_check, pprof, zpages]
  telemetry:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [elasticsearch, debug]
and deployed it again (using the same compose.yaml, just with the ES/Kibana parts commented out). Now I got an error from the OpenTelemetry Collector:
otel-collector | 2025-03-17T07:21:36.889Z info service@v0.121.0/service.go:281 Everything is ready. Begin running and processing data.
otel-collector | 2025-03-17T07:21:43.781Z info Logs {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "resource logs": 1, "log records": 1}
otel-collector | 2025-03-17T07:23:06.970Z error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "failed to execute the request: dial tcp 10.46.10.163:18926: i/o timeout"}
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.flushBulkIndexer
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:346
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).flush
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:329
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).run
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:322
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.newAsyncBulkIndexer.func1
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:216
Now, a note on the 10.46.10.163:18926 address - it looks completely random, and I have no clue where it's coming from. What I did manage to establish is that it's not the result of /etc/hosts or similar local config: both the IP and the port keep changing for consecutive logs sent to the receiver, and my /etc/hosts file is pristine. Another clue that it's not a name-resolution problem is that when I intentionally provide a wrong password in the auth config, I get a 401 rather than a timeout/no-host error:

error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "flush failed (401): {\"error\":{\"type\":\"security_exception\",\"caused_by\":{}}}"}
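One thing I haven't been able to rule out is the cluster itself advertising those addresses: the exporter config has discover.on_start: true, which (if I read the elasticsearchexporter docs right) makes it discover the cluster's nodes on startup and use whatever addresses they publish. A minimal sketch for checking what the cluster actually advertises - the endpoint is a redacted placeholder, and the credentials are the same ones used in the collector config:

from elasticsearch import Elasticsearch

# Placeholder for the (redacted) production endpoint used in the exporter config.
PROD_ES_URL = "https://********"

es = Elasticsearch(hosts=[PROD_ES_URL], basic_auth=("********", "********"))

# Print the HTTP publish addresses the cluster advertises for its nodes.
# If these are private 10.x.x.x addresses with odd ports, they would match
# the targets showing up in the "dial tcp ... i/o timeout" errors above.
for node_id, node in es.nodes.info()["nodes"].items():
    print(node_id, node.get("name"), node.get("http", {}).get("publish_address"))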
To any rational mind (without prior knowledge of what this issue is), this would surely look like some sort of connectivity issue. At least that was my thought, so I proceeded to test the idea. I wrote a simple Python script that sends a log directly to Elastic using the same auth method as the OpenTelemetry config:
from elasticsearch import Elasticsearch

username = '********'
password = '********'

es = Elasticsearch(
    hosts=["********"],  # prod Elasticsearch endpoint (redacted)
    basic_auth=(username, password),
)

# Example log message
log_message = {
    '@timestamp': '2025-03-14T08:49:07Z',
    'level': 'INFO',
    'message': 'Test log directly from python',
}

# Index the log message
response = es.index(index='test', document=log_message)
print(response)
And this was again successful... OK, perhaps it's some Docker shenanigans I don't understand - let's remove that layer to see if that's it. I downloaded the OpenTelemetry Collector Contrib v0.121.0 binary and started it with the exact same config (once with correct and once with incorrect credentials) and got exactly the same errors as with the Docker deployment.
As a Hail Mary I also tried a cloudid instead of an endpoint, but that only produced an even more confusing result:
error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "failed to execute the request: dial tcp: lookup *******$****.hostname: no such host"}
Now, I obfuscated the hell out of that, but my point is that the cloudid was decoded correctly, and I triple-checked that the one I provided is correct. I also tried that cloudid in my Python script for sending logs directly to Elastic and, of course, it worked perfectly...
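For reference, decoding a cloudid by hand is straightforward - a minimal sketch, assuming the standard Elastic Cloud ID layout of name:base64(host$es_uuid$kibana_uuid); the cloud_id below is a made-up placeholder, constructed in-line so the example is self-contained:

import base64

# Made-up placeholder, encoded here only so the example runs;
# the real cloud_id is the (redacted) one from the exporter config.
cloud_id = "my-deployment:" + base64.b64encode(b"eu-west-1.aws.example.com$esuuid1234$kbuuid5678").decode()

# Elastic Cloud IDs are "<name>:<base64 of host$es_uuid$kibana_uuid>";
# the Elasticsearch endpoint is then https://<es_uuid>.<host>.
name, encoded = cloud_id.split(":", 1)
host, es_uuid, kibana_uuid = base64.b64decode(encoded).decode().split("$")
print(f"https://{es_uuid}.{host}")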
At some point I also tried (mostly) the same setup with Azure Container Groups - it yielded the same fake-IP issue as my local Docker and binary setups.
One last thing about our Elastic deployment: there is no traffic filter (I think traffic filters only apply to deployments and not to connecting to the API anyway, but I felt it was better to mention it).