Blockdaemon is hiring a
Web3 Site Reliability Engineer
Los Angeles, California, United States
Blockdaemon is looking for an experienced Site Reliability Engineer (SRE) to harden infrastructure and customer resources and provide visibility into them, making both more robust and secure.
Responsibilities include but are not limited to:
Monitor alerts and respond to outages or performance degradation
Develop tools to streamline alert response and automate as much of it as possible
Reduce manual, repetitive, error-prone work, freeing up engineering to take on longer-term projects and become more proactive than reactive
Reduce alert fatigue by ensuring the monitoring system minimizes false positives and prioritizes clearly actionable, timely alerts
Continuously improve and refine both monitoring systems and deployment workflows
Prioritize both security and the end user experience
Provide value not only to customers but also to the rest of the Blockdaemon team
Be part of an on-call rota
Perform other duties and responsibilities as assigned
Skills and Required Qualifications:
Software engineering background (2–5 years) with one of the following languages:
Golang
Rust
Solidity
Desirable:
Linux service administration (systemd, Docker, etc.)
Linux shell (bash, ssh, etc.)
Certificate management (SSH keys, TLS, CA, PKI, etc.)
Linux troubleshooting (curl, tcpdump, ps, top, swap, memory, CPU usage, kernel logs, etc.)
Cloud VM/network provisioning and administration (AWS, GCE, Azure)
Beats, Elasticsearch, Logstash, Kibana
Prometheus, Loki, Grafana
Bare metal provisioning, including system performance tuning, RAID provisioning, disk partitioning, NVMe/SSD wear monitoring and mitigation, etc.
Terraform and/or Ansible; Vault a plus
Git and continuous integration
K8s experience a plus
Nginx/HTTP/HTTPS/JSON-RPC/REST a plus
Strong verbal and written communication skills, including presentation skills
Able to prioritize work, pivot as necessary, and meet deadlines