Senior or Staff Site Reliability Engineer AWS in Constanţa

ClickUp is the world’s only all-in-one productivity platform that flexes to the way people want to work. It replaces all individual workplace productivity tools with a single, unified platform that includes project management, document collaboration, whiteboards, spreadsheets, and role-based AI tools. Our dedication to enhancing productivity has earned us recognition on prestigious lists including the Forbes Cloud 100, Fast Company's Most Innovative Companies, Inc. Power Partners, and #1 on two of G2's Best Software Products Lists for 2023: #1 Project Management Product and #1 Collaboration and Productivity Product. With our headquarters based in San Diego and a rapidly expanding global presence, we are shaping the future of work. Join our team at ClickUp, one of the fastest-growing SaaS companies worldwide, and help millions of users be more productive, saving them at least one day every week.

We are seeking a Staff-level SRE with extensive AWS experience. This is a remote role based in Poland, so the candidate must live and work in Poland.

Salary: 25000 PLN to 49000 PLN per month
Plus 34 Days Paid Time Off (25 days vacation/sick, plus 9 paid Polish holidays)
Plus Equity in Stock Options

We are looking for driven and innovative software engineers with a strong site reliability engineering (SRE) discipline, or an interest in this area, to help us make ClickUp the "one app to rule them all". As an SRE at ClickUp, your primary role will be improving the stability, availability and reliability of the globally distributed, cloud-based infrastructure that powers our app for thousands of users daily. If you are a rockstar engineer with an entrepreneurial, fast-paced mindset who is ready to own, drive and tackle some of the most complex problems out there, we would love to hear from you!

What you'll do:
- Build a deep understanding of how ClickUp's systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
- Own, drive and improve the incident management process across the engineering org and participate in the team's follow-the-sun model
- Define SLOs and SLIs for all of our services and introduce error budgeting
- Own and improve observability across all of our services
- Build software solutions to enable reliability and operability of large scale distributed systems handling petabytes of data and serving
- Build tools and automation to eliminate toil and reduce operational overhead
- Create frameworks, processes and best practices to be used across ClickUp Engineering
- Automate critical portions of ClickUp engineering processes to minimize risk and maximize the speed of innovation
- Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world

What we’re looking for:
- Software engineering: At the very core, we are looking for strong software engineers with an operational, infrastructural or SRE mentality who can design and build systems for the platform and infrastructure layers
- Cloud experience: Production experience in a major cloud environment, covering CI/CD deployments, managed services, bootstrapping and provisioning services via infrastructure-as-code (IaC) systems, automation and operations
- Infrastructure management: You have worked with and managed production-grade infrastructure using IaC or configuration management tools
- Operating systems: Strong knowledge of *nix-based operating systems, their internals and advanced troubleshooting commands
- Compute: Experience working with VMs, containers and container orchestration systems
- Database: Experience working with RDBMS and NoSQL storage solutions in production, and you know your way around running and inspecting queries. A good understanding of indexing, locking, replication and sharding is a bonus!
- Observability: You have worked with logging, monitoring and alerting tools and you know how logs are collected, aggregated and ingested. You have set up monitors and alerts for production services and know your way around concepts such as SLOs and SLIs

Bonus points:
We believe strong engineers can pick up any technology or tool fast and hit the ground running. Therefore, we avoid listing specific technologies. However, experience with at least one of the technologies in our stack would definitely be a bonus:
- CloudFormation/CDK, ECS, Elastic Beanstalk
- PostgreSQL, DynamoDB, AuroraDB
- TypeScript or any JavaScript-based framework

Working Hours
EU SREs typically work a standard 40-hour workweek (9:00-17:00), Monday through Friday. However, occasional overtime may be required to respond to incidents, meet deadlines or cover late meetings (up to 19:00-20:00, but not every day).

On-Call Rotation
- Monday-Friday: 12 hours of support for ClickUp's infrastructure (5am-5pm). The on-call person changes every 2 days (for example, Kamil on Mon/Tue, Marcin on Wed/Thu, Adam on Fri/Mon).
- Weekends: We currently have 8 people. Each person covers the entire weekend 24/7 (two weekend days). As a result, each person's weekend shift comes up about every 2 months, roughly 6 times a year. The team will eventually grow to 10 people, so this will soon be about every 2.5 months.

A high-level overview of some exciting projects:
- Kubernetes: We will move our services to Kubernetes soon. This gives SRE room to adapt monitoring, the disaster recovery environment, and cost breakdowns for clusters, services and pods.
- CDN & Edge: The EU SRE team owns the whole Content Delivery Network. This is a huge project in progress (IaC, process, ClickUp standards), and we are making many tweaks and performance improvements to make our platform better, faster and more reliable.
- Cost management for AWS (FinOps)
- Monitoring (observability) with Datadog
- More to come...

The role also involves team lead or tech lead type work. The expectations below are drawn from the document "Pathways for eng positions in EPD":
- Defines the technical roadmap for the team, identifying areas that need improvement and leading cross-team solutions.
- Exemplifies execution- and delivery-focused leadership.
- Serves as a role model for team members through their contributions to ClickUp's Engineering culture.
- Helps create a long-term, high-impact technical roadmap for the team.
- Delivers multiple projects towards long-term initiatives to completion.
- Identifies technical requirements in collaboration with partnering teams and drives smooth execution of features and projects.
- Holds self and team members accountable for making the right decisions for their systems’ reliability.
- Is a key part of the team's execution and can be relied on to run day-to-day technical operations with little or no guidance for multiple weeks without losing focus or momentum.
- Works with leadership and product towards (re)prioritization of the roadmap.
- Is informed and insightful about all team deliverables.
- Makes proper tradeoffs between long-term foundational investment and short-term tactical goals.
- Identifies needs and creates processes for improved quality standards in the team.
- Demonstrates technical maturity, insight and leadership for the team.
- Leads complex multi-team incidents.
- Enables collaboration across multiple teams to deliver on cross-team initiatives, while delegating ownership and uplifting those around them.
- Sets an example for an inclusive engineering culture.
- Helps improve team practices to foster an inclusive environment.
- Works across teams to propose solutions that are inclusive of others’ ideas and suggestions.
- Proactively partners across teams to break down silos.
- Champions collaboration and trust in achieving results over individual heroics.
- Prioritizes the team’s velocity and impact over their own.
- Effectively communicates proposals or decisions to achieve buy-in or create alignment.
- Leads by example to guide the team through complex situations.
- Partners cross-functionally in defining product direction by providing technical insight, inspiration, vision and vetting of ideas.
- Mentors team members across multiple teams.
- Identifies, architects, and builds the right systems and products to improve the team's goals and vision.
- Is responsible for a significant part of a goal.
- Supervises major product changes and redesigns through careful experimentation and rollout.
- Identifies major areas of the team's codebase that need attention and leads solutions to improve them.
- Holds self and team accountable to a high bar for quality and reliability standards.
- Performs meaningful code reviews for engineers outside of their team.
- Designs architecture with an eye for reliability, performance and observability.
- Anticipates and proactively addresses any impact their changes may have on other systems' reliability and performance.

ClickUp was founded on a culture of hard work, consistent growth, and a desire to break norms. We’re a values-driven company and hire based on ambition, merit, and a willingness to do what it takes to succeed.
We don’t care where you’re from, what you look like, or who you’re in a relationship with. We hire the best people for the job and create an environment that supports employees on their journey to do the most exciting work of their lives!

ClickUp is an Equal Opportunity Employer, and qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.

ClickUp collects and processes personal data in accordance with applicable data protection laws. If you are a European Job Applicant, see our privacy policy for further details.
