We couldn’t be more excited to share our latest announcement with you! We are introducing a new product to the Yhat suite: a centralized system of record for data science teams called Bandit!
We’re also inviting you, our blog readers, to apply to our limited Beta program! So what exactly does Bandit do??
I’m so glad you asked.
Most data science teams use Git for version control (and if they don’t, they really should). Git is great for tracking changes to your code (e.g. R and Python models), but there is a lot more to a data science project than the training code. There’s also the input data, the output data, and all kinds of statistical information about both the datasets and the model fits (ROC, AUC, R², etc.) for each version of every model.
All of these artifacts are hard to keep track of at an individual level, and are exponentially harder to track, organize and share as a team. Good luck to the poor soul who is asked to recreate or revise a colleague’s analysis. What data was the model trained on? What version of package xyz did she use?? What was the ROC?! How did that change over the past quarter?!?
For most data science teams, the clues and answers to these questions are scattered across notebooks, IDEs, dashboards, emails, Slack messages, and noggins.
This is the first problem Bandit solves. Bandit gathers and stores all of the inputs and outputs associated with every model the data science team builds and commits to Git. Each time a branch is merged to master, Bandit runs a new job and stores all the accompanying artifacts, plus any metatags your team has deemed important, such as the statistical fit of models.
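To make the idea concrete, here is a minimal sketch of the kind of record a system like this might keep for each run. It is not Bandit’s API: the function names (`file_sha256`, `build_run_record`, `save_run_record`) and the field names are hypothetical, chosen only to show which pieces of information need to travel together with a commit.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(commit, input_paths, output_paths, metrics, packages):
    """Bundle what you'd need to reproduce or audit one model run.

    All field names here are illustrative assumptions, not a real schema.
    """
    return {
        "commit": commit,  # Git commit the job ran against
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "packages": dict(packages),  # e.g. {"scikit-learn": "0.18"}
        "inputs": {p: file_sha256(p) for p in input_paths},
        "outputs": {p: file_sha256(p) for p in output_paths},
        "metrics": dict(metrics),  # e.g. {"auc": 0.91}
    }


def save_run_record(record, path):
    """Write the record as JSON so it can be searched and diffed later."""
    with open(path, "w") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
```

With a record like this saved next to every merge, “What data was the model trained on?” becomes a lookup of the input hashes, and “How did the ROC change over the past quarter?” becomes a comparison of the `metrics` fields across records.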
Bandit provides a system of record with a clean and clear structure for reviewing and auditing your data science team’s work. Every member’s work and output is saved and searchable so that your team’s data science efforts are safe, organized, and reproducible. Bandit makes it easy for data science teams to track their predictive models over time. There’s even a discussion tab where you can ask teammates about their code or job results.
Most data scientists (and probably most humans, generally) hate tedious, repetitive tasks. Yet many data scientists regularly find themselves rerunning the same jobs, time and time again.
The process goes something like this:
- It’s Monday! Time for me to do the thing I do every Monday.
- Postpone. Check HackerNews. Browse the Yhat blog.
- Submit IT ticket for remote resources.
- Waiit. Get coffee. Waiiiiiit. Check in. Wait some more.
- Run analysis and write predictions to database.
- Take a walk. Waiiiit. Should I have a snack?
- Email reports to team members.
- Push new results to dashboard.
- Submit an IT ticket to spin down your servers.
Bandit automates both job scheduling and the provisioning of remote resources. With Bandit, you can schedule recurring analyses for any Git project at any interval (daily, weekly, etc.). You can also select the server size you’d like to run your job on, from as small as 8 CPUs to as large as 64 CPUs. Bandit autoscales compute resources so that you never have to worry about spinning servers up or down, or imploring your IT team for more compute.
How do I know if I have a use case for Bandit? Bandit is designed for corporate data science teams. If that sounds like you, and you’ve dealt with one or both of the problems we described, you’ve got a good use case for Bandit.
What do I do to apply to join the Beta? We’re doing a limited Beta through mid-March. If you’d like to participate, you can request to join at https://www.yhat.com/products/bandit. Our expectation is that your data science team has a serious interest in Bandit and will give us feedback in exchange for a few weeks of testing out the product.
Why is it called Bandit? I don’t really have a good answer for that. I think we were on a western kick after talking about Rodeo. Bandit reminds me of raccoons, which are adorable. For a brief period, the name Sushi was also entertained.
What sayeth the Twittersphere?