Experiment Software Installation toolkit on LCG-2
EGEE is a project funded by the European Union under contract IST-2003-508833
www.eu-egee.org

Slide 2:
The current implementation of the general schema discussed elsewhere (http://grid-deployment.web.cern.ch/grid-deployment/eis/docs/SoftwareInstallation/index.html) foresees software structured in three layers. Tank & Spark is a component of the lower layer. It is mainly used to propagate software to the rest of a farm whenever no shared file system is provided.

Structure of the Experiment Software Installation toolkit
[Figure: toolkit components on the UI, WN and CE]

lcg-asis is a user-friendly interface:
- It hides the complexity of invoking lcg-ManageSoftware (see previous slides).
- If specified, it uploads the software sources (tarball(s)) to the grid.
- It loops over all sites available to the VO that satisfy the user's requirements in terms of CPU, memory, disk space and CPU time. For each site it:
  - checks through the Information System (see later) whether another software management process is running on the site;
  - automatically creates the JDL for that site;
  - submits the job and stores the job information.

lcg-ManageSoftware represents the "middle layer" of the current implementation:
- It is the steering script to be invoked for installing, removing or validating application software.
- It checks the local (WN) environment and decides which workflow to perform:
  - Is another process running for the same software?
  - Is the file system shared or not?
  - Is it an AFS file system or not (conversion of GSI credentials to AFS tokens)?
  - Is Tank&Spark installed (invocation of the propagation)?
- If tarball(s) are specified, it downloads them reliably to the local WN through the lcg-* commands, bypassing the outbound connectivity requirement.
- It optionally unpacks these tarball(s).
- It invokes the experiment-specific script (provided by the ESM, the Experiment Software Manager) and checks the result of the script.
- It creates a temporary directory used to install, validate or remove software, and cleans up that directory afterwards.
- It publishes tags on the Information System with a flavor that depends on the action to be performed and on the result of that action (see the command's man page for more information).

lcg-ManageVOTag is a component of the lower layer:
- It is the command used by lcg-ManageSoftware to add, list and remove tags published on the Information System through the GRIS running on a given CE. (It just adds or removes entries in the GlueHostApplicationSoftwareRunTimeEnvironment attribute of the IS.)
- It can also be used as a standalone application.
- The only requirement it places on a tag is the format VO-<voname>-<whatever_string>.

Tank&Spark is a component of the lower layer running both on the WN and on the CE. Here it is mainly used to propagate software to other WNs, but it can do much more:
- It can be used as a standalone, grid-independent mechanism.
- It can allow installation that bypasses grid job submission (high prioritization of software management).
- It keeps track of the installer, who is strongly authenticated and uniquely identified.
- It complies with external policies set by the site administrators.
- It can manage all possible file system topologies (shared, non-shared, AFS, or a mix of them!).
- It allows for asynchronous (current) and synchronous installation.
- It allows for failure recovery (retry of the installation on the node).
- It can allow for roll-back of a given installation (not yet in place).
- It allows for exhaustive notification (successes and problems, node by node) to the ESM and, automatically, to the site admin.
- It can store a lot of information about a given software package, internally identified through GUIDs (e.g. date, size, owner, path, status and so on).
- Automatic farm management (Is a node out? Is there a new node?): it adds and removes nodes in its central DB (MySQL) and modifies the Information System by changing the "flavour" of the tag.

gssklog is another component of the lower layer and is part of another, externally developed mechanism: gssklog-gssklogd.
- It is the client of this mechanism, allowing the conversion of GSI credentials into valid KRB5 AFS tokens.

Tank & Spark
It consists of three different components:
- Tank: a multithreaded (gSOAP-based) service running on the CE and listening for GSI-authenticated (and non-authenticated) connections.
- Spark: a client application running on each WN (through a cron job and/or through a normal grid job from lcg-ManageSoftware) that contacts Tank to retrieve, insert or delete software information.
- An rsync server running on another machine (an SE, for instance) and acting as the central repository of the software.

Slide 5:
[Figure: CE and nodes inside the site firewall, with software sets labeled "ab", "abc" and "c"]
The software (here labeled "c") is installed locally through the middle layer, lcg-ManageSoftware. A pre-validation is highly recommended before triggering the propagation. The Information System is then updated.

Slide 6:
Tag flavors:
VO-dteam-orca-8.3-processing-install    Installation ongoing
VO-dteam-orca-8.3-processing-remove     Removal ongoing
VO-dteam-orca-8.3-processing-validate   Validation ongoing
VO-dteam-orca-8.3-aborted-install       Installation failure
VO-dteam-orca-8.3-aborted-remove        Removal failure
VO-dteam-orca-8.3-aborted-validate      Validation failure
VO-dteam-orca-8.3-to-be-validated       Installation OK
                                        Removal OK
VO-dteam-orca-8.3                       Validation OK

Advantages:
- Normal users continue to use the same mechanism to find out about the software on a site.
- ESMs know the status of their experiment software management jobs.
- It is not possible to have concurrent software management jobs for the same software version on the same site.
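The tag flavors listed on Slide 6 follow a regular pattern, so they can be derived from the action and its state. The Python below is a minimal sketch, not the toolkit's code: the function names `is_valid_tag` and `tag_flavor` are hypothetical, and returning no tag for a successful removal is an assumption, since the slide lists no tag for that case.

```python
"""Sketch of the tag flavors listed on Slide 6.

Assumption: tags are plain strings of the form
VO-<voname>-<software>-<version>[-<suffix>], as in the dteam/orca examples.
"""

from typing import Optional

ACTIONS = ("install", "remove", "validate")


def is_valid_tag(tag: str, vo: str) -> bool:
    """Check the only format lcg-ManageVOTag requires: VO-<voname>-<whatever_string>."""
    prefix = f"VO-{vo}-"
    return tag.startswith(prefix) and len(tag) > len(prefix)


def tag_flavor(vo: str, software: str, version: str,
               action: str, state: str) -> Optional[str]:
    """Return the tag published for an action in a given state.

    state is "processing", "aborted" or "ok". Returns None for a successful
    removal, for which Slide 6 lists no tag (assumed here to mean that the
    tag is withdrawn).
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    base = f"VO-{vo}-{software}-{version}"
    if state in ("processing", "aborted"):
        return f"{base}-{state}-{action}"
    if state != "ok":
        raise ValueError(f"unknown state: {state!r}")
    if action == "install":
        return f"{base}-to-be-validated"  # installed, waiting for validation
    if action == "validate":
        return base  # validated software carries the plain tag
    return None  # successful removal: no tag listed on the slide


if __name__ == "__main__":
    for action in ACTIONS:
        for state in ("processing", "aborted", "ok"):
            print(f"{action:8} {state:10} {tag_flavor('dteam', 'orca', '8.3', action, state)}")
```

Running the module prints one line per action and state, which reproduces the rows of the Slide 6 table for the dteam/orca 8.3 example.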
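The per-site loop run by lcg-asis, which uses the processing-* tags to detect a management job already running on a site, might look roughly like the following. This is a hypothetical sketch: the `Site` and `Requirements` fields, the function names and the JDL `Arguments` value are assumptions, not the real lcg-asis or lcg-ManageSoftware interface. Only the `Requirements = other.GlueCEUniqueID == ...` idiom for pinning a job to one CE is standard LCG JDL. The sketch builds the JDLs but does not submit them.

```python
"""Sketch of the per-site loop performed by lcg-asis.

Assumptions (hypothetical, for illustration only): the Site and Requirements
fields, how published tags are obtained, and the JDL Arguments string.
"""

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    """Hypothetical view of one CE as published in the Information System."""
    ce_id: str
    cpu_speed_mhz: int
    memory_mb: int
    disk_mb: int
    max_cpu_time_min: int
    tags: List[str] = field(default_factory=list)


@dataclass
class Requirements:
    """User requirements in terms of CPU, memory, disk space and CPU time."""
    cpu_speed_mhz: int = 0
    memory_mb: int = 0
    disk_mb: int = 0
    cpu_time_min: int = 0


def meets(site: Site, req: Requirements) -> bool:
    """True if the site satisfies every user requirement."""
    return (site.cpu_speed_mhz >= req.cpu_speed_mhz
            and site.memory_mb >= req.memory_mb
            and site.disk_mb >= req.disk_mb
            and site.max_cpu_time_min >= req.cpu_time_min)


def is_busy(site: Site, vo: str, software: str, version: str) -> bool:
    """True if a processing-* tag shows a management job already running."""
    prefix = f"VO-{vo}-{software}-{version}-processing-"
    return any(tag.startswith(prefix) for tag in site.tags)


def make_jdl(site: Site, arguments: str) -> str:
    """Build a JDL pinned to one CE; the Arguments value is a placeholder."""
    return "\n".join([
        'Executable = "lcg-ManageSoftware";',
        f'Arguments = "{arguments}";',
        f'Requirements = other.GlueCEUniqueID == "{site.ce_id}";',
    ])


def plan_submissions(sites: List[Site], req: Requirements, vo: str,
                     software: str, version: str, arguments: str) -> Dict[str, str]:
    """Return one JDL per eligible, non-busy site, keyed by CE identifier."""
    jdls: Dict[str, str] = {}
    for site in sites:
        if not meets(site, req) or is_busy(site, vo, software, version):
            continue  # skip sites that are too small or already busy
        jdls[site.ce_id] = make_jdl(site, arguments)
    return jdls
```

Skipping busy sites on the client side is what turns the processing-* flavors into a guard against concurrent management jobs for the same software version; a real run would then submit each JDL and record the returned job identifier.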
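The environment checks that lcg-ManageSoftware makes on the WN, and the steps it then performs, can be summarised as a decision function. This is a hypothetical sketch: the `WorkerNodeEnv` fields, the step strings and their exact order are assumptions made for illustration, and the real script performs these steps rather than listing them.

```python
"""Sketch of the workflow decisions made by lcg-ManageSoftware on the WN.

Assumptions (hypothetical): the WorkerNodeEnv fields and the order of steps.
"""

from dataclasses import dataclass
from typing import List


@dataclass
class WorkerNodeEnv:
    """Hypothetical summary of the checks made on the local WN."""
    other_process_running: bool  # another process for the same software?
    shared_fs: bool              # shared file system or not?
    afs: bool                    # AFS file system (needs GSI -> AFS token)?
    tank_spark_installed: bool   # can propagation be invoked?
    has_tarballs: bool           # were tarball(s) specified?


def plan_workflow(env: WorkerNodeEnv, action: str) -> List[str]:
    """Return the ordered steps for install/remove/validate on this WN."""
    if env.other_process_running:
        return ["abort: another management process is running for this software"]
    steps = [f"publish tag: processing-{action}"]
    if env.afs:
        steps.append("convert GSI credential to AFS token (gssklog)")
    steps.append("create temporary directory")
    if env.has_tarballs:
        steps += ["download tarball(s) with lcg-* commands", "unpack tarball(s)"]
    steps.append(f"run experiment-specific {action} script")
    # Without a shared file system, Tank&Spark propagates to the other WNs.
    if not env.shared_fs and env.tank_spark_installed:
        steps.append("trigger Tank&Spark propagation to other WNs")
    steps.append("clean up temporary directory")
    steps.append("publish result tag")
    return steps
```

For example, `plan_workflow(WorkerNodeEnv(False, False, False, True, True), "install")` lists the download, the installation script and the Tank&Spark propagation, while on a shared AFS area the propagation step is dropped and the token conversion is added.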