
Elasticsearch exporter bulk indexer flush error with random IP address


I've been trying to set up an OpenTelemetry Collector (v0.121.0, to be exact) so that it collects logs using the OTLP receiver and sends them to our long-standing production Elasticsearch.

However, I've been unable to send any data to Elastic with this setup so far.

I will try to list my attempts in logical rather than chronological order - I hope it will be easier to follow that way. First, the versions of the components used:

Elasticsearch: 8.16.1
OpenTelemetry Collector Contrib: v0.121.0
Docker: v4.38.0
Python: 3.13.1

I deployed a local Docker setup with Elasticsearch, Kibana and OpenTelemetry Collector Contrib containers to make sure the config is correct and that I can send logs to Elastic. The compose file for that setup:

# version: "3"

services:

  # ElasticSearch
  els:
    image: elasticsearch:8.16.1
    container_name: els
    hostname: els
    environment:
      - xpack.security.enrollment.enabled=false
      - xpack.security.enabled=false
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-XX:UseSVE=0
      - CLI_JAVA_OPTS=-XX:UseSVE=0
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elastic
    volumes:
      - esdata1:/usr/share/elasticsearch/data
      - eslog:/usr/share/elasticsearch/logs
      - /Users/***/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml

  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:8.16.1
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: http://els:9200
    networks:
      - elastic
    depends_on:
      - els

  # OtelCollector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.121.0
    container_name: otel-collector
    volumes:
      - type: bind
        source: /Users/***/otel_config_test.yaml
        target: /etc/otelcol-contrib/config.yaml
    ports:
      - "13133:13133" # Health_check extension
      - "4318:4318"   # OTLP http receiver
      - "4317:4317"   # OTLP gRPC receiver
    networks:
      - elastic

volumes:
  esdata1:
  eslog:

networks:
  elastic:
    driver: bridge

and the OpenTelemetry Collector config file:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200

exporters:
  elasticsearch:
    endpoint: :9200
    tls:
      insecure: true
      insecure_skip_verify: true

    logs_index: test
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_retries: 5
      retry_on_status:
        - 429
        - 500
    sending_queue:
      enabled: true
    compression: none
  debug:


connectors:

extensions:
  health_check:
  pprof:
  zpages:


service:
  extensions: [health_check, pprof, zpages]
  telemetry:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [elasticsearch, debug]

(I'm on an M4 Mac, so forgive the weird localhost in the config.) That worked beautifully and I was able to send data to the OTLP receiver using this script:

import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider()
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint=":4317/v1/logs", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

logging.getLogger().addHandler(handler)

logging.info("Python to ES collector")

logger_provider.shutdown()
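
(Side note: to double-check outside Kibana that documents from this script actually land in the local cluster, a minimal sketch along these lines works, assuming the default localhost:9200 port mapping from the compose file above:)

from elasticsearch import Elasticsearch

# Local single-node cluster from the compose file; security is disabled,
# so no credentials are needed here.
es = Elasticsearch(hosts=["http://localhost:9200"])

# How many log records have reached the "test" index so far
print(es.count(index="test"))

# Peek at one of the indexed documents
print(es.search(index="test", size=1))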

After that, the logs were successfully sent to Elastic and I could see them in Kibana (Analytics > Discover, with a filter created for my "test" index). So far so good - I was quite sure everything would work the same for the prod Elastic given proper credentials. That's not what happened... I adjusted the opentelemetry config with the proper Elastic API endpoint and credentials:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200

exporters:
  elasticsearch:
    endpoint: 
    tls:
      insecure: true
      insecure_skip_verify: true
    auth:
      authenticator: basicauth
    logs_index: test
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_retries: 5
      retry_on_status:
        - 429
        - 500
    sending_queue:
      enabled: true
    compression: none
  debug:


connectors:

extensions:
  health_check:
  pprof:
  zpages:
  basicauth:
    client_auth:
      username: "*******"
      password: "*******"

service:
  extensions: [basicauth, health_check, pprof, zpages]
  telemetry:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [elasticsearch, debug]

and deployed it again (I used the same compose.yaml, just commented out the es/kibana parts). Now I got an error from opentelemetry:

otel-collector  | 2025-03-17T07:21:36.889Z      info    service@v0.121.0/service.go:281 Everything is ready. Begin running and processing data.
otel-collector  | 2025-03-17T07:21:43.781Z      info    Logs    {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "resource logs": 1, "log records": 1}
otel-collector  | 2025-03-17T07:23:06.970Z      error   elasticsearchexporter@v0.121.0/bulkindexer.go:346       bulk indexer flush error        {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "failed to execute the request: dial tcp 10.46.10.163:18926: i/o timeout"}
otel-collector  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.flushBulkIndexer
otel-collector  |       github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:346
otel-collector  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).flush
otel-collector  |       github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:329
otel-collector  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).run
otel-collector  |       github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:322
otel-collector  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.newAsyncBulkIndexer.func1
otel-collector  |       github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:216

Now a note on the 10.46.10.163:18926 address - this is some random crap, and I have no clue where it's coming from. What I did manage to establish is that it's not the result of some local /etc/hosts or similar config, since both the IP and the port keep changing for consecutive logs being sent to the receiver. That, and the fact that I have a pristine /etc/hosts file. Another clue that it's not about that is that when I intentionally provide a wrong password in the auth config, I get a 401 and not a timeout/no-host error: error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "flush failed (401): {"error":{"type":"security_exception","caused_by":{}}}"}
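
To rule out local name resolution entirely, a minimal sketch like the one below (with a hypothetical placeholder for the real, redacted Elasticsearch hostname) prints what the OS resolver returns for the endpoint host - one more way to confirm the bogus 10.46.x.x addresses are not coming from DNS:

import socket

# Hypothetical placeholder for the real (redacted) Elasticsearch host and port
ES_HOST = "es.example.com"
ES_PORT = 443

# Print every address the OS resolver returns for the endpoint -
# these are the addresses the exporter should be dialing.
for family, _, _, _, sockaddr in socket.getaddrinfo(ES_HOST, ES_PORT, proto=socket.IPPROTO_TCP):
    print(family, sockaddr)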

To any rational mind (without prior knowledge of what this issue is) this would surely look like some sort of connectivity issue. At least that was my thought, so I proceeded to test that idea. I wrote a simple Python script that sends a log directly to Elastic using the same auth method as the opentelemetry config.

from elasticsearch import Elasticsearch


username = '********'
password = '********'


es = Elasticsearch(
    hosts=["********"],  # production Elastic endpoint redacted
    basic_auth=(username, password)
)

# Example log message
log_message = {
    '@timestamp': '2025-03-14T08:49:07Z',
    'level': 'INFO',
    'message': 'Test log directly from python'
}

# Index the log message
response = es.index(index='test', document=log_message)

print(response)

And this was again successful... OK, perhaps it's some Docker shenanigans I don't understand; let's remove that layer to see if that's it. I downloaded the OpenTelemetry Collector Contrib v0.121.0 binary and started it with the exact same config (once with correct and once with incorrect credentials) and got exactly the same errors as with the Docker deployment.

As a hail mary I also tried a cloudid instead of an endpoint, but I just got an even more confusing result:

error   elasticsearchexporter@v0.121.0/bulkindexer.go:346       bulk indexer flush error        {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "failed to execute the request: dial tcp: lookup *******$****.hostname: no such host"}

Now I obfuscated the hell out of that, but my point is that it was decoded correctly, and I triple-checked that the cloudid I provided is correct. I also tried that cloudid in my Python script for sending logs directly to Elastic and, of course, it worked perfectly...
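
For completeness, the cloud_id variant of that direct-to-Elastic test is, in rough sketch form (id and credentials redacted the same way as above):

from elasticsearch import Elasticsearch

# Same direct-to-Elastic check as before, but using the deployment's cloud id
# instead of an explicit endpoint (values redacted).
es = Elasticsearch(
    cloud_id="********",
    basic_auth=("********", "********"),
)

# Returns cluster metadata if the cloud id decodes and auth succeeds
print(es.info())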

At some point I also tried (mostly) the same setup with Azure Container Groups, which yielded the same fake-IP issue as my local Docker and binary setups.

One last thing about our Elastic deployment: there is no traffic filter (I think that only applies to deployments and not to connecting to the API anyway, but I felt it was better to mention it).
