Cloud Reliability Engineer — Infios

Job Description

Yukerja Summary

The Cloud Reliability Engineer role at Infios is curated from Himalayas (category: Technology & IT). This role is marked as remote; check timezone and location requirements on the official listing. Yukerja.com is not the employer; applications are handled on the official source site.

If you are looking for a meaningful career where people work and act with passion, rethink the status quo, and always strive to find the best solution, you have come to the right place. We develop future technologies to relentlessly make supply chains better.

We are a leader in supply chain software solutions, helping organizations streamline operations, reduce costs, and improve efficiency.

Key Responsibilities

• Cloud Infrastructure Operations

  ◦ Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments.

  ◦ Manage and optimize Kubernetes clusters, including deployment, scaling, patching, and upgrades.

  ◦ Ensure system availability, scalability, and performance through proactive monitoring and optimization.

  ◦ Maintain infrastructure-as-code (IaC) for consistent and repeatable deployments.

• Automation & Continuous Improvement

  ◦ Identify opportunities for operational automation to eliminate manual processes (“reduce toil”).

  ◦ Build and maintain automated pipelines for deployments, configuration, and remediation.

  ◦ Develop self-healing mechanisms to automatically detect and resolve common service issues.

  ◦ Participate in continuous improvement initiatives around reliability, performance, and efficiency.

• Reliability Engineering

  ◦ Implement SRE principles: define and track SLIs, SLOs, and error budgets.

  ◦ Perform incident analysis and postmortems to identify root causes and prevent recurrence.

  ◦ Design proactive monitoring, alerting, and observability dashboards (e.g., Dynatrace, Datadog).

  ◦ Collaborate with DevOps and development teams to build reliable, observable, and resilient systems.

• CI/CD and Release Operations

  ◦ Manage and optimize CI/CD pipelines to ensure reliable and consistent delivery.

  ◦ Support deployment strategies (blue/green, canary, rolling) to reduce downtime risk.

  ◦ Collaborate with Product and DevOps teams on release readiness and rollback automation.

• Incident Response & Troubleshooting

  ◦ Monitor, troubleshoot, and resolve infrastructure and application issues.

  ◦ Respond to production incidents and ensure rapid mitigation and resolution.

  ◦ Troubleshoot complex cloud, container, and networking issues across distributed systems.

  ◦ Drive a culture of proactive monitoring, data-driven analysis, and preventive action.

Required Qualifications

• Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

• 5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles.

• Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP).

• Strong knowledge of Kubernetes deployment, management, and troubleshooting.

• Solid understanding of observability and monitoring tools (e.g., Dynatrace, Datadog) and incident management platforms.

• Proficiency in scripting and automation (e.g., Python, Bash, Terraform, Ansible).

• Strong troubleshooting and analytical skills across infrastructure and applications.

• Experience with incident response, root cause analysis (RCA), and postmortem processes.

• A mindset of continuous improvement, reliability, and self-healing automation.

• Understanding of SRE principles, SLAs/SLOs/SLIs, and chaos engineering practices.

Preferred Skills

• Experience conducting resilience assessments and recovery drills.

• Familiarity with ServiceNow, Dynatrace, or other ITSM and observability tools.

• Experience with chaos engineering or resiliency testing frameworks.

• Background in networking, load balancing, and performance tuning.

• Strong communication and stakeholder management skills.

Soft Skills & Mindset

• Strong collaboration skills; comfortable working with developers, operations, and management.

• Clear communicator, able to translate technical issues into business impact.

• Self-starter with a problem-solving, automation-first mentality.

• Resilient under pressure; thrives in a dynamic, fast-paced environment.

• Passionate about operational excellence and continuous learning.

Key Success Metrics

• SLA/SLO compliance for critical services

• Reduction in MTTR (Mean Time to Recover)

• Increase in automated incident resolution rates

• Reduction in customer-impacting incidents

• Frequency and outcomes of resilience testing exercises

• Service uptime and availability

Why join us?

At Infios, we're not just looking for employees; we're looking for partners in innovation, growth, and purpose. Meeting you where you are to create the future you need is at the core of who we are and what we do. Whether you're at the beginning of your career or a seasoned expert, we meet you on your journey, equipping you with the tools and opportunities to build the future you envision. Together, we will relentlessly work toward one common goal: making supply chains better.

We believe the future is better when supply chains work better.

We are an equal-opportunity employer and committed to inclusion in the workplace.

At Infios, we believe that inclusion is a fundamental cornerstone of our success. We are committed to creating a safe and welcoming environment where every individual’s unique experiences and perspectives are valued—whether they look, think, move, believe, or love differently.

All qualified applicants will receive consideration for employment without regard to race, color, ethnicity, national origin, sex, sexual orientation, gender identity, marital status, pregnancy, religion, age, disability, veteran status, genetic information, or any other characteristic protected by law.

Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions of this role. If you require assistance or accommodation due to a disability during the recruiting process, please let us know at jobs@infios.com.

Disclaimer: This job advertisement is not designed to cover a comprehensive listing of all duties or responsibilities that are required for this job. Please note that any salary information is a general guideline only. Individual compensation will be determined by various factors such as the scope and responsibilities of the position, experience, education, skills, location, and market and business considerations. Applications must be submitted via our career site.

Originally posted on Himalayas