Customer Job

Major Incident Engineer

Job ID: 26-01197
Pay rate range - $60/hr. to $65/hr.

Job Description

The Problem Management Engineer is responsible for leading and maturing the Problem Management function to prevent incident recurrence, reduce operational risk, and improve service resiliency.
This role owns the quality and effectiveness of root cause analysis, ensures permanent fixes are validated, and drives continuous improvement across IT services in alignment with ITIL best practices.

This position requires a strong technical background to credibly engage engineering teams, challenge root cause conclusions, and ensure solutions are durable, evidence-based, and measurable.

Problem Management Leadership

Lead and oversee the end-to-end Problem Management lifecycle, including detection, logging, classification, investigation, resolution, validation, and closure
Ensure problems are closed only when defined closure criteria are met, including validated resolution, preventive controls, and monitoring improvements
Prevent premature or superficial closure of problems by enforcing quality and evidence standards

Root Cause Analysis & Technical Oversight

Lead and validate structured Root Cause Analysis (RCA) using methodologies such as 5 Whys, Fishbone, and fault tree analysis
Challenge assumptions and ensure true root causes are identified for major incidents and recurring issues
Review and validate the technical feasibility and effectiveness of permanent fixes

Cross-Functional Collaboration

Partner closely with Incident Management, Change Management, Resiliency/Reliability Engineering, and Service Owners
Coordinate permanent fixes through formal change processes
Work with vendors and external partners to track dependencies and ensure accountability

Governance, Metrics & Reporting

Establish and enforce Problem Management governance and quality standards
Track and report on key metrics, including overdue problems, SLA compliance, recurrence trends, and systemic risks
Provide clear, actionable updates and insights to senior leadership and executive forums

Knowledge & Continuous Improvement

Maintain and improve the Known Error Database (KEDB) and Problem-related Knowledge Articles
Identify opportunities for proactive problem management, automation, and improved monitoring and alerting
Continuously refine Problem Management processes, tools, and standards to increase effectiveness and efficiency


Required Qualifications
Strong understanding of ITIL Problem Management processes and best practices
Proven experience leading or performing Root Cause Analysis in complex technical environments
Technical background (infrastructure, applications, cloud, or enterprise platforms) sufficient to engage and challenge engineering teams
Hands-on experience with ServiceNow or comparable enterprise ITSM platforms
Strong communication and stakeholder management skills, including executive-level communication
Ability to analyze trends, identify systemic risk, and drive proactive improvements


Preferred Qualifications
ITIL Foundation certification or higher
Experience in large-scale enterprise environments
Experience supporting Major Incident or executive outage review forums
Familiarity with automation, observability, and proactive problem management techniques
Experience working with vendors and external service providers
