





Site Reliability Engineer

Genuent Global, LLC

This is a Full-time position in Alpharetta, GA posted February 5, 2021.

Genuent is hiring a Site Reliability Engineer. This is a long-term contract opportunity located in Atlanta, GA. If you are interested, please send your updated resume to Mike Sabo. Due to client restrictions, we are only accepting candidates who are able to work on our W2 and are authorized to work in the US without visa sponsorship now or in the future.

Site Reliability Engineering (SRE)

Site Reliability Engineering (SRE) applies software engineering techniques and discipline to production operations to attack major problems and fix them for good. Our customers count on us to provide extraordinary availability, scalability, and security for our services. An SRE should be comfortable with taking on new engineering challenges, defining potential solutions, and implementing designs in a team environment. This position will play an important role in our organization’s evolution towards contemporary application and infrastructure management practices and will be expected to both guide and support the team’s growth and learning. SRE is new at the company, and members of this team will have the chance to influence the direction of a critical and global SRE organization. SRE will also focus on addressing the hot tactical engineering issues that are impacting the ongoing integration activities within company Technology Services (FTS).

Responsibilities

- Build holistic visibility into SLIs, SLOs, and SLAs, dependency graphs, and the past performance of software, network, and systems to ensure that we can continue to scale without increasing operational burden or toil.
- Assess the current state of the environment and drive “SWAT” initiatives in collaboration with the rest of the organization to ensure transparency, resiliency, stability, and reliability across both the application and infrastructure stacks.
- SWAT initiatives for the future state can range from incident analysis leveraging ML/AI and assisting with the datacenter stability/consolidation effort to application transformation (monolithic to microservices, PaaS, etc.).
- Enable the adoption and implementation of cloud-based application reliability, resiliency, and observability deployment best practices for production and non-production environments, including public cloud migration of our mission-critical applications from the on-prem data centers.
- Build infrastructure and drive projects that break things with the aim of improving the robustness of production systems.
- Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform.
- Step back to observe patterns and develop innovative tools and automation to minimize toil. Use those learnings to drive the best operational practices.
- Monitor and report on service level objectives for a given application's services.
- Work with business and product owners to establish key performance indicators.
- Partner with security engineers to develop plans and automation to aggressively and safely respond to new risks and vulnerabilities.
- Partner with the broader organization to build a culture of rigorously learning from incidents.
- Share your knowledge by giving brown bags and tech talks, and by evangelizing appropriate tech and engineering best practices.
- Unblock, support, and effectively communicate across teams to achieve results.
- Define roadmap and architecture based on technology and business outcomes.

Experience

- 4+ years of software engineering experience, including development best practices and code management.
- Experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation).
- Experience with high-level programming languages (Python, Go, Java, etc.).
- Experience designing solutions for Canary and/or Blue/Green deployments.
- Experience designing, debugging, and running fault-tolerant, large-scale distributed systems.
- Experience working with public cloud platforms (e.g., AWS, Google Cloud Platform, Microsoft Azure).
- Experience creating and improving documented procedures and/or playbooks.
- Knowledge of open-source configuration, orchestration, and CI/CD tools.
- Knowledge of Kubernetes, PCF, and/or Docker.
- Deep understanding of cloud architecture and operations.
- Strong troubleshooting and debugging skills.
- Experience with tools and technologies such as Prometheus, Grafana, AppDynamics, Dynatrace, Splunk, and Moogsoft is a plus.
- Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, Ansible, or Salt.
- Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies.
