Observability Platform Implementation (ELK Stack / Prometheus & Grafana)

Job Overview

Budget

$5,500.00

Level

Senior

Location

India

Job Posted

30 Jul, 2025

Category

DevOps

Total Proposals

0

Job Description

Basic Monitoring Setup:

Overview:

This package provides a foundational monitoring and logging solution for a single application or a small set of servers, enabling basic visibility into system health and performance. Ideal for startups or projects requiring quick setup.

Deliverables:

  • Installation and basic configuration of either ELK Stack (Elasticsearch, Logstash, Kibana) or Prometheus & Grafana.
  • Agent deployment for collecting logs and metrics from up to 5 servers/VMs.
  • Creation of 3-5 essential dashboards for key metrics (CPU, Memory, Disk, Network) and log overview.
  • Configuration of 3-5 basic alerts for critical issues (e.g., server down, high resource utilization).
  • Basic documentation for accessing dashboards and understanding alerts.
  • Full installation, configuration, and optimization of a scalable ELK Stack or Prometheus & Grafana cluster.
  • Agent deployment for collecting logs and metrics from up to 20 servers/VMs or a small Kubernetes cluster.
  • Implementation of advanced log parsing, filtering, and metric aggregation.
  • Development of 10-15 custom, actionable dashboards tailored to application and infrastructure needs.
  • Configuration of 10-15 advanced alerting rules with integration to notification channels (e.g., Slack, PagerDuty).
  • Basic distributed tracing setup (e.g., Jaeger/OpenTelemetry integration).
  • Detailed documentation including architecture, configuration, and operational runbooks.
  • One-day training session for your operations team.

Required Qualifications:

  • At least 3 years of hands-on experience implementing and managing large-scale monitoring and logging solutions, including at least 3 years specifically with either ELK Stack or Prometheus/Grafana.
  • Deep expertise in Elasticsearch cluster management, Logstash pipeline configuration, and Kibana dashboarding, OR profound knowledge of Prometheus metric collection, PromQL, Alertmanager, and Grafana dashboarding.
  • Strong understanding of logging best practices, metric collection strategies, and distributed tracing concepts.
  • Proficiency in Linux server administration and scripting (Bash, Python) for automation and data collection.
  • Experience with containerized environments (Docker, Kubernetes) and collecting metrics/logs from them.
  • Familiarity with cloud platforms (AWS, Azure, GCP) and their native monitoring services.
  • Excellent analytical skills to interpret complex data and identify root causes of issues.
  • Strong communication and collaboration skills to work with diverse technical teams.

Key Skills:

  • Observability
  • Monitoring
  • Logging
  • Alerting
  • ELK Stack
  • Elasticsearch
  • Logstash
  • Kibana
  • Prometheus
  • Grafana
  • Alertmanager
  • PromQL
  • Distributed Tracing (Jaeger, OpenTelemetry)
  • Metrics
  • Logs
  • Dashboards
  • Cloud Monitoring (CloudWatch, Azure Monitor, Google Cloud Monitoring)

Expectations for Support from Freelancer:

  • Responsiveness: Prompt communication and response to inquiries (within 24 hours on weekdays).
  • Availability: Willingness to be available for urgent issues or critical updates, potentially outside standard business hours, with prior arrangement.
  • Troubleshooting: Ability to quickly diagnose and resolve any post-implementation issues that may arise.
  • Documentation Updates: Keep documentation current with any changes or optimizations made during the support phase.
  • Advisory: Provide expert advice on future scaling, security enhancements, or new feature implementations.

Project Goals:

  • Enhanced Visibility: Provide real-time, actionable insights into system health, performance, and operational issues.
  • Proactive Problem Detection: Enable early identification of issues before they impact users.
  • Faster Incident Response: Streamline troubleshooting and reduce Mean Time To Resolution (MTTR).
  • Improved System Reliability: Support data-driven decisions for system optimization and stability.
  • Operational Efficiency: Reduce manual effort in monitoring and alerting.

Skills

  • Monitoring and logging tools and technologies (e.g., Prometheus, Grafana, ELK Stack)

Author Spotlight

Priya Nair

Client

India


Member Since
Oct 26, 2024
Total Created Jobs
7