Company Description
Standard Bank Group is a leading Africa-focused financial services group, and an innovative player on the global stage, that offers a variety of career-enhancing opportunities – plus the chance to work alongside some of the sector’s most talented, motivated professionals. Our clients range from individuals and businesses of all sizes to high net worth families and large multinational corporates and institutions. We’re passionate about creating growth in Africa, bringing true, meaningful value to our clients and the communities we serve, and creating a real sense of purpose for you.
Job Description
We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team in Johannesburg, South Africa. This role is pivotal in bridging development and operations, and requires a strong software engineering mindset applied to system reliability, scalability, and performance. The ideal candidate will demonstrate deep expertise in automating CI/CD pipelines, implementing infrastructure as code, and designing resilient systems across both legacy and containerised platforms. A solid understanding of service-level indicators, performance metrics, and proactive issue resolution is essential to ensure optimal system availability and speed of delivery. We are looking for someone who can lead reliability improvements, collaborate across teams, and drive innovation in monitoring, security, and system performance.
Qualifications
Qualifications:
- Bachelor’s or Master’s degree in Computer Engineering or Software Engineering
- Site Reliability Engineer Certification
Experience:
- 5–7 years of hands-on experience in Site Reliability Engineering or a closely related field, with a strong focus on system reliability and performance.
- Advanced coding skills in at least one high-level language (e.g., Python, Java, JavaScript, Ruby, or C++), including experience beyond basic scripting.
- Expertise in cloud computing, containerisation (Docker, Kubernetes), and DevOps practices, with a solid understanding of Linux/Unix environments and incident management.
- Strong analytical and troubleshooting abilities, with a proactive approach to identifying and resolving system bottlenecks and reliability issues.
- Proficiency in reliability-centered maintenance principles, infrastructure as code, and monitoring tools, with a drive for continuous improvement.
- Strong attention to detail and the ability to manage multiple projects efficiently.
Additional Information
Behavioural Competencies:
- Adopting Practical Approaches
- Articulating Information
- Checking Things
- Developing Expertise
- Documenting Facts
Technical Competencies:
- Application Knowledge for Support
- Business Continuity and Disaster Recovery Planning
- Information Technology Architecture
- Infrastructure and Platforms Support
- IT Design Driven Development