Roberto A. Vitillo: Dashboards made simple with Spark and Plotly
In my previous post about our new Spark infrastructure, I went into the details of how to launch a Spark cluster on AWS to perform custom analyses on Telemetry data. Sometimes, though, one needs to rerun an analysis periodically over a certain timeframe, usually to feed data into dashboards of various kinds. We are going to roll out a new feature that allows users to upload an IPython notebook to the self-serve data analysis dashboard and run it on a schedule. The notebook will be executed at the chosen frequency, and the result will be made available as an updated IPython notebook.
To schedule a Spark job:
Once a new scheduled job is created, it will appear in the top listing of the scheduling dashboard. When the job runs, its result will be made available as an IPython notebook, visible by clicking on the “View Data” entry of your job.
As I briefly mentioned at the beginning, periodic jobs are typically used to feed data to dashboards. Writing a dashboard for a custom job isn’t very pleasant, and in the past I wrote a simple tool to help with that. It turns out, though, that thanks to IPython one doesn’t necessarily need to write a dashboard from scratch but can simply re-use the notebook as the dashboard itself! I mean, why not? That might not be good enough for management-facing dashboards, but it is acceptable for ones aimed at engineers.
In fact, with IPython we are not limited at all to matplotlib’s static charts. Thanks to Plotly, it’s easy enough to generate interactive plots, which allow one to:
Plotly comes with its own API, but if you already have a matplotlib-based chart then it’s trivial to convert it to an interactive plot. As a concrete example, I updated my Spark Hello World example with a Plotly chart.
fig = plt.figure(figsize=(18, 7))
frame["WINNT"].plot(kind="hist", bins=50)
plt.title("startup distribution for Windows")
plt.ylabel("count")
plt.xlabel("log(firstPaint)")
py.iplot_mpl(fig, strip_style=True)
As you can see, just a single extra line of code is needed for the conversion.
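For readers who want to see what the chart is actually counting, here is a minimal pure-Python sketch of a 50-bin histogram over log-transformed values. It is only an illustration under stated assumptions: the first_paint_ms values are synthetic numbers drawn from a made-up log-normal distribution rather than real Telemetry data, and the histogram helper is a hypothetical name introduced here, not part of pandas, matplotlib or Plotly.

```python
import math
import random

# Synthetic stand-in for firstPaint measurements in milliseconds.
# ASSUMPTION: these numbers are made up for illustration, not Telemetry data.
random.seed(42)
first_paint_ms = [random.lognormvariate(7.0, 0.8) for _ in range(10000)]


def histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Hypothetical helper for illustration only.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value sits on the right edge, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# The chart plots log(firstPaint), so take the logarithm before binning.
log_values = [math.log(v) for v in first_paint_ms]
counts, edges = histogram(log_values, bins=50)

# Print the lower edge and count of the first few bins.
for left, count in list(zip(edges, counts))[:5]:
    print("%.2f  %d" % (left, count))
```

Each bar in the chart corresponds to one entry of counts, and edges gives the boundaries of the bins along the log(firstPaint) axis.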
As WordPress doesn’t support iframes, you are going to have to click on the image and follow the link to see the interactive plot in action.
http://robertovitillo.com/2015/03/13/simple-dashboards-with-scheduled-spark-jobs-and-plotly/