Getting Started Guide for New Developers

Submitted by mmacieje on

When new people join a project, it is a good moment to review existing tools and methodologies. They bring a fresh perspective and new skills, and they often question the status quo. All that helps to improve the project and set it on a new trajectory. In this post, we provide a quick introduction to the tools we use for development, along with valuable references for learning.

Project Website - https://cern.ch/sigmon

Our website is the go-to place for all information about the project: descriptions of the individual modules, links to additional references, repositories, and documentation, as well as our technical blog.

Project Overview

In order to support our main use cases, the Signal Monitoring project architecture consists of four elements:
  1. API for logging database queries and signal processing - https://gitlab.cern.ch/lhcdata/lhc-sm-api
  2. Signal Monitoring notebooks - https://gitlab.cern.ch/lhcdata/lhc-sm-apps
  3. HWC and Operation notebooks - https://gitlab.cern.ch/lhcdata/lhc-sm-hwc
  4. Scheduler for execution of HWC notebooks and monitoring applications - https://gitlab.cern.ch/lhcdata/lhc-sm-scheduler

A full description of the project is available at: https://cern.ch/sigmon

Development Environment

The tools and services mentioned below are provided by the IT department and do not require any local installation:

  • GitLab CI/CD - we use a GitLab repository for code versioning and for running the continuous integration pipeline of the API
  • SWAN is a platform for development and prototyping of analysis and signal monitoring notebooks: https://swan.web.cern.ch
  • The NXCALS cluster is used for code execution with Apache Spark; make sure to request NXCALS access at http://nxcals-docs.web.cern.ch/current/user-guide/data-access/nxcals-access-request/
  • Apache Airflow schedules the execution of signal monitoring applications
  • EOS, InfluxDB, and HDFS are used as persistent storage for computation results.

Below we describe three tools that need to be installed locally in order to work with our code (primarily for API development, as the notebooks are mostly created in SWAN).

Python

Python is currently the lingua franca of data and signal analysis. We use version 3.6 (install it locally from https://www.python.org/downloads/release/python-368/) and benefit from existing libraries for scientific computing (pandas and numpy) and visualisation (matplotlib and plotly). The packages required for a project are listed in a requirements.txt file. There is a wealth of online references on Python and the libraries we use. In addition, we prepared two mini-lectures on Python:
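Independent of those references, here is a minimal sketch (illustrative only, not project code) of how pandas and numpy are typically combined for a simple signal-analysis task, such as smoothing a noisy signal with a rolling mean:

```python
import numpy as np
import pandas as pd

# Generate a synthetic 1 Hz sine signal, 200 samples over 2 seconds, with noise
t = np.linspace(0, 2, 200)
rng = np.random.default_rng(42)
signal = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)

# Wrap the samples in a DataFrame and smooth them with a 10-sample rolling mean
df = pd.DataFrame({"time": t, "value": signal})
df["smoothed"] = df["value"].rolling(window=10, min_periods=1).mean()

print(df.head())
```

The same pattern (raw samples in a DataFrame, derived columns computed alongside) scales naturally from toy examples like this one to real logging-database queries.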

As for the naming conventions we follow:
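For Python code, the de facto community standard is PEP 8; assuming that is the baseline here (the specific conventions are not listed in this post), its naming rules look roughly like this:

```python
# PEP 8 naming at a glance (illustrative sketch, not project code)

MAX_VOLTAGE = 5.0  # constants: UPPER_CASE_WITH_UNDERSCORES


def compute_mean_voltage(samples):  # functions: lowercase_with_underscores
    """Return the arithmetic mean of the given voltage samples."""
    return sum(samples) / len(samples)


class SignalMonitor:  # classes: CapWords (PascalCase)
    def __init__(self, name):
        self._name = name  # leading underscore: internal attribute
```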

PyCharm

PyCharm is an IDE (Integrated Development Environment), a desktop application for Python development. PyCharm comes with a number of features facilitating code development (code completion, code quality checks, execution of unit tests, code formatting, etc.). The Community Edition should be installed locally; it is available for free at https://www.jetbrains.com/pycharm/download

Git

The versioning of our code base is carried out with Git, which has to be installed locally from https://git-scm.com/downloads. We wrote an introduction to Git at: https://twiki.cern.ch/twiki/bin/view/TEMPEPE/MoreAboutGit

To check the code quality, we put together a continuous integration pipeline (GitLab CI/CD) allowing for:

  • execution of all unit tests with GitLab CI
  • static code analysis with SonarQube
  • analysis of input arguments and return types with the mypy package
  • creation of the API documentation from code docstrings; the documentation is generated with the Sphinx package and stored on EOS
  • provided that all unit tests pass, creation of the lhcsmapi package on the Python Package Index (PyPI)
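To illustrate what the mypy and Sphinx steps consume, a hypothetical function (not taken from the actual API) with type hints and a reStructuredText docstring might look like this; mypy checks the annotations, while Sphinx renders the docstring into the API documentation:

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Compute a simple moving average of a signal.

    :param values: input samples
    :param window: number of samples per averaging window
    :return: averaged samples, one per full window position
    :raises ValueError: if window is not positive
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Note the `typing.List` import rather than the bare `list[float]` form, since the latter requires Python 3.9 while the project targets Python 3.6.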