Job Details
Location:
Romania
AFI Park 2, Bulevardul General Vasile Milea 4, București 061344, Romania
Posted:
Apr 10, 2024
Job Description
The Production Infrastructure & Engineering (PI&E) organization provides the essential platforms and infrastructure hosting solutions that power EA's live services. Our charter is to make EA's games and services available to all players anytime and anywhere. To do this, we focus on the high availability of infrastructure, primary services, and studio services. We aim to help developers experiment and build new games quickly with on-demand infrastructure services and workflows that promote rapid development in the cloud. In all of this, we focus on being there for players where and when they want to play.
- The Mission Control Center (MCC) resides within the EA PI&E team and plays a key role in driving online 'always on' services, keeping a watchful eye over all monitored endpoints to ensure 24x7 uptime for EA games.
- The MCC Incident Manager will report to the MCC Manager.
- Shifts: 8 hours a day, within the time interval 6:00 AM - 6:30 PM (RO time) including weekends and holidays.
- You can work either hybrid or on-site from our office in Bucharest.
Responsibilities
- You will identify incidents by evaluating alerts, performing checks and engaging in conversations with partner teams.
- Coordinate incidents while maintaining command and control of the incident response.
- Be responsible for the timeline, documentation, engagement, escalation and communication of incidents.
- Maintain alignment among the response teams, stakeholders and leadership through audio/video bridges, active messaging, and posted updates.
- Be the first point of escalation for MCC team members and partners.
- Help with building runbooks for daily incident resolution.
- Partner with other EA Operational teams to reduce systems downtime.
- Success is measured through peer review, partner review, operational indicators, personal and team goals.
Qualifications
- You have a passion for the IT and gaming industries.
- 1-3 years of experience with Systems Operations/Engineering organizational responsibilities, including ownership and management of incident escalation, resolution tracking and resolution reporting, with at least 1 year in an Incident Manager role.
- Experience coordinating groups from multiple disciplines and levels towards a common goal and resolution.
- Experience with ITIL best practices, including Incident, Change, and Problem Management: their purpose and how they are connected.
- Basic knowledge of cloud technology offerings, networking, virtualization, databases and security fundamentals.
- Excellent English verbal and written communication skills and confidence to communicate under crisis conditions.
- You understand the rigorous demands of a 24x7, real-time, online operational environment.