jeff on 3 Feb 2019 17:31:44 -0800
[PLUG] job opening
Long-term contract or contract-to-hire.
Job Profile – Site Reliability Engineer
Linux administration
PowerShell
Understanding of Windows/IIS
ALM tools such as VSTS
Business Area: Global Technology Services
Job Description
Our mission is to deliver services that matter and to achieve and sustain operational excellence. You will be at the heart of fulfilling that mission, bringing your software development experience to the table to own and advance our vision of end-to-end reliability engineering. You will design and implement continuous improvements to the management, design, and function of our operational environments, achieving the speed and reliability that enable business agility and happy users. Unlike any other organization in the industry, we are creating roles and teams that combine deep software knowledge with operations to drive unmatched service reliability.
You will be part of our technology organization and will have a great opportunity to work across the business, including our development teams and other stakeholders, to drive reliability upstream in the application lifecycle and across our operational environments.
Technical expertise is critical to imagining and driving improvements across our database, networking, and infrastructure teams, and to partnering with our application teams to implement more robust and performant applications for our internal solutions and business solutions (Tax, Audit, Consulting, Finance and Advisory Services).
You should be someone who is excited by the challenge of bringing new thinking to operations, is passionate about imagining and implementing improvements, relentlessly pursues excellence, is a deep and broad technical expert, and can build trusting relationships across teams.
This is a new and exciting role that will drive our organization further toward world-class operations.
Responsibilities
Role-Specific Responsibilities
Ensure user-visible uptime and quality, providing operational and development expertise so that our systems fail rarely and are quick to fix when they do fail.
Administer daily server operations, including log review with escalation, patching and upgrading applications, and implementing and testing backup and restoration.
Install, configure, and manage multiple Red Hat Enterprise Linux (RHEL/CentOS) systems in a distributed High Availability configuration on physical and virtual hardware.
Participate in architecture and design reviews, recommending improvements to the development teams that increase the reliability and performance of applications.
Participate in a 24/7 on-call rotation, providing third-level incident response alongside other Information Technology team members.
Minimize manual involvement by imagining and implementing continuous improvements to the operating environment, including the development of new tools, dynamic monitoring and alerting, and automated self-healing and recovery.
Identify and/or analyze problems relating to mission-critical services and implement automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
Be capable of presenting analyses and recommendations to leadership and of discussing the technical merits of solutions with engineers and architects.
Own the day-to-day health, uptime, monitoring, and reliability of services and server infrastructure.
Practice Agile and Scrum methodologies.
Qualifications
Strong experience with software engineering
Strong working knowledge of Linux
Strong experience with VSTS or a similar ALM tool
Working knowledge of Azure services, especially ARM templates
Strong experience with PowerShell
Understanding of the concepts and principles behind DevOps, Continuous Delivery, Agile, Lean, etc.
Experience using DevOps tools (e.g., Chef, New Relic, Puppet) to deliver and operate end-user services is a plus
Experience with and knowledge of database technologies, particularly MS SQL
Knowledge of virtualization and its benefits for improving reliability
Strong experience with instrumentation, monitoring, alerting, and response relating to application performance and availability
Capable of technical deep dives into infrastructure, databases, and applications, specifically in designing, coding, operating, and supporting high-performance, highly available services and infrastructure
Experience designing for failure, including disaster recovery and business continuity planning
Experience operating and supporting mission-critical applications (e.g., incident and outage management)
Experience solving problems on globally distributed systems and in critical product service environments
Knows what is possible with the latest networking, infrastructure, database, and application technologies for driving automation and reliability improvements
Excellent at building relationships across teams
Firm sense of accountability and ownership
Desire to understand our businesses and users
COMPETENCIES
Specialized Competencies
Below are the key specialized competencies required for Site Reliability Engineers.
Competency: Proficiency Level
Understands Technology Design & Implementation: Advanced
Understands Systems Architecture: Foundation
Understands Technology and How it Supports Business Needs: Advanced
Understands Business Functions / Departments as they Relate to Technology Packages: Foundation
Understands Operational Solutions: Advanced
Understands Transition & Knowledge Transfer: Advanced
Understands Technical Trends & Best Practices: Foundation

Joel Polin
215-968-3303
www.polinassociates.com
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug