Notebooks have become an essential tool for data scientists, analysts and developers working with data. SingleStore Notebooks, based on the popular Jupyter Notebook, allow users to create interactive documents containing live code, visualizations and narrative text.
They provide an efficient way to explore, analyze and visualize data, making it easier to communicate insights and results.
Building upon the success of our Notebook feature, we're excited to introduce Scheduled Jobs — a powerful new capability that takes the utility of Notebooks to the next level. Scheduled Jobs allow users to automate the execution of their notebooks at predetermined intervals or specific times, bringing several key benefits:
- Automation. Eliminate manual intervention for routine data processing tasks
- Efficiency. Save time and resources by running jobs during off-peak hours
- Consistency. Ensure regular updates to your data pipelines and reports
This feature builds on our previous enhancements to SingleStore Notebooks, including improved performance, enhanced usability and the introduction of secure credential management through Helios® Secrets.
Scheduled Jobs address a common pain point for users who need to run recurring data tasks, like daily data imports, regular report generation or periodic model retraining. By automating these processes, we're empowering our users to focus on deriving insights rather than managing repetitive tasks. We've also used Scheduled Jobs ourselves to streamline our own monitoring and incident response, as described in the following use case.
Internal use case for Scheduled Jobs: Streamlining incident response
At SingleStore, we've leveraged our Scheduled Jobs feature to create an automated incident response system. This system significantly improves our ability to detect, diagnose and respond to issues quickly, ensuring minimal disruption for our customers.
The challenge
With our growing cloud user base at SingleStore, we are always looking for ways to reduce TimeToDetect, TimeToMitigate and TimeToResolve metrics. We needed a solution that could:
- Quickly gather relevant information about affected customers
- Perform automated health checks and diagnostics
- Provide context and AI-driven, actionable insights to our support team and on-call engineers
- Streamline the incident response process
Our solution: Automated incident response system
We developed an integrated system using Scheduled Jobs, our OpsAPI, and a custom Slackbot. Here's how it works:
- Incident creation. When an incident is reported (either by a customer or internally), it's logged in our incident management tool, incident.io
- Slackbot activation. Our custom Slackbot listens for incident notifications and triggers a series of Scheduled Notebook jobs
- Initial response job. Upon detecting a new incident, the Slackbot triggers an initial automated response job designed to quickly gather essential information and perform preliminary diagnostics
The initial response job performs several crucial tasks:
First, it parses the incident details from the webhook payload, extracting key information such as the incident ID, title and severity. Then, using our OpsAPI, it retrieves comprehensive details about the affected customer's cluster, including the cluster name and relevant dashboard links. Next, it conducts a series of automated health checks on the customer's cluster, assessing its overall stability and performance. The job then compiles all of this information into a concise yet informative summary.
Finally, it posts this summary to a dedicated Slack channel created specifically for the incident, ensuring that our support team has immediate access to critical context.
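To make this concrete, here's a minimal sketch of what such an initial response job could look like in a notebook cell. The OpsAPI routes, webhook payload fields and environment variable names are illustrative placeholders, not our actual internal APIs:

```python
import os
import requests

# Placeholder configuration; the real job uses internal OpsAPI endpoints and Slack credentials
OPS_API_BASE = os.environ["OPS_API_BASE"]            # e.g. base URL of the internal OpsAPI
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # incoming webhook for the incident channel

def initial_response(payload: dict) -> None:
    # 1. Parse key incident details from the (illustrative) webhook payload
    incident_id = payload["incident"]["id"]
    title = payload["incident"]["title"]
    severity = payload["incident"]["severity"]
    cluster_id = payload["incident"]["cluster_id"]

    # 2. Retrieve details about the affected cluster through the OpsAPI (hypothetical route)
    cluster = requests.get(f"{OPS_API_BASE}/clusters/{cluster_id}", timeout=30).json()

    # 3. Run preliminary health checks (hypothetical route)
    health = requests.get(f"{OPS_API_BASE}/clusters/{cluster_id}/health", timeout=60).json()

    # 4. Compile a concise summary and post it to the incident's Slack channel
    summary = (
        f"*Incident {incident_id}: {title}* (severity: {severity})\n"
        f"Cluster: {cluster.get('name')} | Dashboard: {cluster.get('dashboard_url')}\n"
        f"Health checks: {health.get('status', 'unknown')}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": summary}, timeout=30)
```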
Following the initial response, a more in-depth diagnostic job is automatically triggered. This secondary job is designed to perform a comprehensive analysis of the cluster's state and gather detailed technical information.
This in-depth diagnostic job executes a series of important tasks:
It runs a set of predefined SQL queries specifically designed to probe various aspects of the cluster's health and performance. The job collects detailed information about the cluster's configuration, current resource utilization and key performance metrics. It then retrieves relevant system logs and error reports, which are crucial for identifying the root cause of issues. The job also checks the status of other scheduled jobs running on the cluster, which can help identify potential conflicts or resource contention.
All of this technical data is then compiled into a comprehensive debug report. The report is formatted for readability and posted to the same Slack channel, providing our support team with a deep, technical overview of the cluster's state.
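A similarly simplified sketch of the follow-up diagnostic job, using the singlestoredb Python client. The queries shown are illustrative only (the real job runs an internal, curated set of probes), and the connection URL is assumed to arrive as a job parameter:

```python
import os
import requests
import singlestoredb as s2

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

# Illustrative probes only; the real job runs a curated internal set of diagnostic queries
DIAGNOSTIC_QUERIES = {
    "version": "SELECT @@memsql_version AS version",
    "active_queries": (
        "SELECT ID, USER, TIME, STATE FROM information_schema.PROCESSLIST "
        "WHERE COMMAND != 'Sleep'"
    ),
}

def run_diagnostics(connection_url: str, incident_note: str) -> None:
    sections = []
    conn = s2.connect(connection_url)
    try:
        cur = conn.cursor()
        for name, query in DIAGNOSTIC_QUERIES.items():
            try:
                cur.execute(query)
                rows = cur.fetchall()
                sections.append(f"*{name}*: {len(rows)} row(s)")
            except Exception as exc:  # a failed probe should not abort the whole report
                sections.append(f"*{name}*: failed ({exc})")
    finally:
        conn.close()

    # Compile the findings into a readable debug report and post it to Slack
    report = f"Debug report for {incident_note}\n" + "\n".join(sections)
    requests.post(SLACK_WEBHOOK_URL, json={"text": report}, timeout=30)
```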
These automated jobs work in tandem to provide our support team with both immediate context and in-depth technical details, enabling them to start addressing the incident with a wealth of relevant information at their fingertips.
Key components of the automated system
- OpsAPI integration. OpsAPI is a set of internal Operational APIs responsible for live site troubleshooting and mitigation. Both notebook jobs interact with SingleStore's OpsAPI to retrieve cluster information and execute diagnostic queries.
- Slack integration. The system posts messages to Slack channels, providing real-time updates to the support team.
- Parameterization. The notebooks accept parameters like cluster ID and Slack channel, making them flexible for use with different incidents and clusters.
- Error handling. The notebooks include try-except blocks to handle potential errors gracefully, ensuring that even if part of the process fails, the system can still provide valuable information (a simplified sketch of this pattern follows below)
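Here's a small, self-contained illustration of that error-handling pattern; the helper functions are hypothetical stand-ins for the real OpsAPI calls:

```python
import json

# Hypothetical stand-ins for the real OpsAPI calls made by the notebooks
def fetch_cluster_details(cluster_id):
    raise TimeoutError("OpsAPI did not respond")  # simulate one step failing

def run_health_checks(cluster_id):
    return {"memory": "ok", "disk": "ok", "replication": "ok"}

def safe_step(name, fn, results):
    """Run one step of the job; record failures instead of aborting the whole run."""
    try:
        results[name] = fn()
    except Exception as exc:
        results[name] = f"step failed: {exc}"

results = {}
cluster_id = "example-cluster-id"  # in the real notebooks this arrives as a job parameter
safe_step("cluster details", lambda: fetch_cluster_details(cluster_id), results)
safe_step("health checks", lambda: run_health_checks(cluster_id), results)

# Even with a failed step, the partial summary is still useful to the support team
print(json.dumps(results, indent=2))
```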
Benefits realized
By automating the initial information gathering and diagnostics, we've dramatically reduced our time-to-first-action on incidents. Support teams now have a complete picture of the situation from the moment they begin working on an incident. The automated process ensures all necessary checks are performed consistently for every incident.
By automating routine checks and information gathering, we've reduced the cognitive load on our support team, allowing them to focus on problem-solving. Faster, more effective incident response translates directly into improved customer satisfaction and reduced downtime. This use case demonstrates the power of Scheduled Jobs when integrated into a broader ecosystem of tools and APIs.
Why choose Scheduled Jobs
When considering solutions for automating recurring data tasks, organizations often evaluate multiple options. Here's why Scheduled Jobs in SingleStore Notebooks stands out from the alternatives:
- Integrated environment. Scheduled Jobs are integrated into the SingleStore ecosystem, eliminating the need for external scheduling tools or complex integrations. This tight integration ensures smooth execution and reduces potential points of failure.
- Familiar interface. Users already comfortable with SingleStore Notebooks can easily set up Scheduled Jobs without learning a new system. This familiarity accelerates adoption and reduces the learning curve.
- Performance. SingleStore's high-performance database engine is readily available to power Scheduled Jobs, ensuring rapid execution even for complex data processing tasks. This is particularly beneficial for data-intensive operations, where moving data out of the database for processing is costly.
- Security. Utilizing Helios Secrets, Scheduled Jobs can securely access and manage credentials, maintaining robust security practices without compromising automation.
- Cost-effective. By leveraging existing SingleStore infrastructure, Scheduled Jobs often prove more cost-effective than implementing and maintaining separate scheduling solutions.
Parameterized Scheduled Jobs
Users can specify parameters that are passed to a scheduled job when it runs, so a single notebook can be reused to execute different logic based on the parameters it receives. This significantly reduces the need for duplicate notebooks and simplifies maintenance. It also lets users create more dynamic and adaptable workflows. For example, you could schedule a single data processing job to run for different date ranges or data sources simply by changing the parameters.
All this translates to greater simplicity, enabling a single job definition to handle multiple use cases, reducing the overall number of jobs needed. Parameterized jobs ensure consistency in your processes, while allowing for variability where needed.
This new capability addresses one of the key limitations we previously faced with complex scheduling patterns. Users can now create more sophisticated workflows without the need for multiple, nearly identical notebooks.
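As a minimal sketch, assuming for illustration that job parameters are surfaced to the notebook as environment variables (the parameter names here are hypothetical; see the Scheduled Jobs documentation for the exact mechanism), a single notebook could adapt its source table and date range per run:

```python
import os
from datetime import date, timedelta

# Hypothetical parameter names; how parameters reach the notebook should be
# confirmed against the Scheduled Jobs documentation
source_table = os.environ.get("SOURCE_TABLE", "events")
end_date = date.fromisoformat(os.environ.get("END_DATE", date.today().isoformat()))
lookback_days = int(os.environ.get("LOOKBACK_DAYS", "7"))
start_date = end_date - timedelta(days=lookback_days)

# The same notebook can now be scheduled several times with different parameter
# sets, e.g. a daily 7-day rollup and a monthly 30-day rollup
query = f"""
    SELECT DATE(event_time) AS day, COUNT(*) AS events
    FROM {source_table}
    WHERE event_time >= '{start_date}' AND event_time < '{end_date}'
    GROUP BY day
    ORDER BY day
"""
print(query)  # in the real job this query would be executed, e.g. via the notebook's SQL support
```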
To make the most of this new feature, we recommend:
- Identifying common patterns in your notebooks that could be parameterized
- Refactoring existing notebooks to accept parameters for variable elements
- Setting up scheduled jobs with different parameter sets to cover various use cases
We're excited to see how our users leverage this new capability to create more powerful and flexible data workflows.
Areas of focus moving forward
To provide a balanced view of the Scheduled Jobs feature, it's important to address its current limitations and areas where we're actively working on improvements. Based on user feedback and our own assessments, we've identified several key areas:
- Concurrency limits. Currently, there's a cap on the number of jobs that can run simultaneously
- Notification system. While jobs can be scheduled, the current notification system for job completion or failure is limited. Users have expressed a desire for more robust alerting and monitoring mechanisms
- Complex scheduling patterns. At present, the scheduling options are somewhat basic. Users have requested more advanced capabilities, including conditional execution, event-driven triggers and cron-style schedule specifications
- Resource management. Some users have reported challenges in managing resources effectively when multiple large jobs are scheduled close together. We plan to support higher-tier compute, including GPUs for gen AI-specific tasks that see large performance gains on GPUs, and the platform will be able to recommend and optimize compute usage based on previous executions
- Integration with external tools. There's limited ability to integrate Scheduled Jobs with external monitoring or orchestration tools, which some enterprise users have requested
Our Notebooks team is constantly working to improve this product. Notebooks and Scheduled Jobs are part of our free shared tier, so give them a try, and reach out to pm@singlestore.com if you have any feature requests or suggestions for improvement.