I've been trying to set up an OpenTelemetry Collector (v0.121.0, to be exact) so that it collects logs via the OTLP receiver and sends them to our already long-standing production Elasticsearch.
However, I've been unable to successfully send any data to Elastic so far with this setup.
I will try to list my attempts in logical rather than chronological order - I hope it will be easier to follow that way. First, the versions of the components used:

- Elasticsearch: 8.16.1
- OpenTelemetry Collector Contrib: v0.121.0
- Docker: v4.38.0
- Python: 3.13.1
I deployed a local Docker setup with Elasticsearch, Kibana, and OpenTelemetry Collector Contrib containers to make sure the config is correct and that I can send logs to Elastic. The compose file for that setup:
# version: "3"
services:
  # ElasticSearch
  els:
    image: elasticsearch:8.16.1
    container_name: els
    hostname: els
    environment:
      - xpack.security.enrollment.enabled=false
      - xpack.security.enabled=false
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-XX:UseSVE=0
      - CLI_JAVA_OPTS=-XX:UseSVE=0
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elastic
    volumes:
      - esdata1:/usr/share/elasticsearch/data
      - eslog:/usr/share/elasticsearch/logs
      - /Users/***/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:8.16.1
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: http://els:9200
    networks:
      - elastic
    depends_on:
      - els
  # OtelCollector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.121.0
    container_name: otel-collector
    volumes:
      - type: bind
        source: /Users/***/otel_config_test.yaml
        target: /etc/otelcol-contrib/config.yaml
    ports:
      - "13133:13133" # health_check extension
      - "4318:4318"   # OTLP HTTP receiver
      - "4317:4317"   # OTLP gRPC receiver
    networks:
      - elastic

volumes:
  esdata1:
  eslog:

networks:
  elastic:
    driver: bridge
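Before sending anything, a quick way to confirm that both Elasticsearch and the collector's OTLP/HTTP receiver are reachable from the host - a minimal sketch with requests, assuming the port mappings from the compose file above:

import requests

# Elasticsearch from the compose file (security is disabled there, so no credentials needed).
print(requests.get("http://localhost:9200").json()["version"]["number"])

# OTLP/HTTP receiver - it only accepts POSTs on /v1/logs, so any HTTP response
# at all (e.g. a 405) at least proves the port mapping and the receiver are up.
print(requests.get("http://localhost:4318/v1/logs").status_code)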
and the OpenTelemetry Collector config file:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200
exporters:
  elasticsearch:
    endpoint: :9200
    tls:
      insecure: true
      insecure_skip_verify: true
    logs_index: test
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_retries: 5
      retry_on_status:
        - 429
        - 500
    sending_queue:
      enabled: true
    compression: none
  debug:
connectors:
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [health_check, pprof, zpages]
  telemetry:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [elasticsearch, debug]
(I'm on an M4 Mac, so forgive the weird localhost in the config.) That worked beautifully, and I was able to send data to the OTLP receiver using this script:
import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider()
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint=":4317/v1/logs", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)
logging.getLogger().addHandler(handler)

logging.info("Python to ES collector")
logger_provider.shutdown()
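As a sanity check on the Elasticsearch side (rather than relying only on Kibana), a direct search against the index shows whether the records actually landed - a minimal sketch, again assuming the local security-disabled Elastic from the compose file:

from elasticsearch import Elasticsearch

# Local single-node Elastic from the compose file; security is disabled, so no auth needed.
es = Elasticsearch(hosts=["http://localhost:9200"])

# Look at what ended up in the "test" index.
resp = es.search(index="test", query={"match_all": {}}, size=5)
print(resp["hits"]["total"])
for hit in resp["hits"]["hits"]:
    print(hit["_source"])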
After that, the logs were successfully sent to Elastic and I could see them in Kibana (Analytics > Discover, with a filter created for my "test" index). So far so good - I was quite sure everything would work the same way for the prod Elastic, given proper credentials. That's not what happened... I adjusted the OpenTelemetry config with the proper Elastic API endpoint and credentials:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200
exporters:
  elasticsearch:
    endpoint:
    tls:
      insecure: true
      insecure_skip_verify: true
    auth:
      authenticator: basicauth
    logs_index: test
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_retries: 5
      retry_on_status:
        - 429
        - 500
    sending_queue:
      enabled: true
    compression: none
  debug:
connectors:
extensions:
  health_check:
  pprof:
  zpages:
  basicauth:
    client_auth:
      username: "*******"
      password: "*******"
service:
  extensions: [basicauth, health_check, pprof, zpages]
  telemetry:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [elasticsearch, debug]
and deployed it again (using the same compose.yaml, just with the ES/Kibana parts commented out). Now I got an error from the OpenTelemetry Collector:
otel-collector | 2025-03-17T07:21:36.889Z info service@v0.121.0/service.go:281 Everything is ready. Begin running and processing data.
otel-collector | 2025-03-17T07:21:43.781Z info Logs {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "resource logs": 1, "log records": 1}
otel-collector | 2025-03-17T07:23:06.970Z error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "failed to execute the request: dial tcp 10.46.10.163:18926: i/o timeout"}
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.flushBulkIndexer
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:346
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).flush
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:329
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).run
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:322
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.newAsyncBulkIndexer.func1
otel-collector | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.121.0/bulkindexer.go:216
Now, a note on the 10.46.10.163:18926 address - it looks completely random, and I have no clue where it's coming from. What I did manage to establish is that it's not the result of /etc/hosts or similar local config: both the IP and the port keep changing for consecutive logs sent to the receiver, and my /etc/hosts file is pristine. Another clue that it's not a name-resolution problem is that when I intentionally provide a wrong password in the auth config, I get a 401 rather than a timeout/no-host error:

error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "flush failed (401): {\"error\":{\"type\":\"security_exception\",\"caused_by\":{}}}"}
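One thing I haven't been able to rule out is the cluster itself advertising those addresses: the exporter config has discover.on_start: true, which (if I read the elasticsearchexporter docs right) makes it discover the cluster's nodes on startup and use whatever addresses they publish. A minimal sketch for checking what the cluster actually advertises - the endpoint is a redacted placeholder, and the credentials are the same ones used in the collector config:

from elasticsearch import Elasticsearch

# Placeholder for the (redacted) production endpoint used in the exporter config.
PROD_ES_URL = "https://********"

es = Elasticsearch(hosts=[PROD_ES_URL], basic_auth=("********", "********"))

# Print the HTTP publish addresses the cluster advertises for its nodes.
# If these are private 10.x.x.x addresses with odd ports, they would match
# the targets showing up in the "dial tcp ... i/o timeout" errors above.
for node_id, node in es.nodes.info()["nodes"].items():
    print(node_id, node.get("name"), node.get("http", {}).get("publish_address"))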
To any rational mind (without prior knowledge of what this issue is), this would surely look like some sort of connectivity issue. At least that was my thought, so I proceeded to test the idea. I wrote a simple Python script that sends a log directly to Elastic using the same auth method as the OpenTelemetry config:
from elasticsearch import Elasticsearch

username = '********'
password = '********'

es = Elasticsearch(
    hosts=["********"],  # prod Elasticsearch endpoint (redacted)
    basic_auth=(username, password),
)

# Example log message
log_message = {
    '@timestamp': '2025-03-14T08:49:07Z',
    'level': 'INFO',
    'message': 'Test log directly from python',
}

# Index the log message
response = es.index(index='test', document=log_message)
print(response)
And this was again successful... OK, perhaps it's some Docker shenanigans I don't understand - let's remove that layer to see if that's it. I downloaded the OpenTelemetry Collector Contrib v0.121.0 binary and started it with the exact same config (once with correct and once with incorrect credentials) and got exactly the same errors as with the Docker deployment.
As a Hail Mary I also tried a cloudid instead of an endpoint, but that only produced an even more confusing result:
error elasticsearchexporter@v0.121.0/bulkindexer.go:346 bulk indexer flush error {"otelcol.component.id": "elasticsearch", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs", "error": "failed to execute the request: dial tcp: lookup *******$****.hostname: no such host"}
Now, I obfuscated the hell out of that, but my point is that the cloudid was decoded correctly, and I triple-checked that the one I provided is correct. I also tried that cloudid in my Python script for sending logs directly to Elastic and, of course, it worked perfectly...
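For reference, decoding a cloudid by hand is straightforward - a minimal sketch, assuming the standard Elastic Cloud ID layout of name:base64(host$es_uuid$kibana_uuid); the cloud_id below is a made-up placeholder, constructed in-line so the example is self-contained:

import base64

# Made-up placeholder, encoded here only so the example runs;
# the real cloud_id is the (redacted) one from the exporter config.
cloud_id = "my-deployment:" + base64.b64encode(b"eu-west-1.aws.example.com$esuuid1234$kbuuid5678").decode()

# Elastic Cloud IDs are "<name>:<base64 of host$es_uuid$kibana_uuid>";
# the Elasticsearch endpoint is then https://<es_uuid>.<host>.
name, encoded = cloud_id.split(":", 1)
host, es_uuid, kibana_uuid = base64.b64decode(encoded).decode().split("$")
print(f"https://{es_uuid}.{host}")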
At some point I also tried (mostly) the same setup with Azure Container Groups - it yielded the same fake-IP issue as my local Docker and binary setups.
One last thing about our Elastic deployment: there is no traffic filter (I think traffic filters only apply to deployments and not to connecting to the API anyway, but I felt it was better to mention it).