Grids
Revision as of 22:05, 13 April 2017
Grids in General
The short answer is that computing grids are just the mother of all batch queues. The metaphor behind "Grid Computing" is that computing resources should be as available, as reliable, and as easy to use as the electric power grid. The big picture is that institutions, both large and small, can make their computing resources available to the grid, and appropriately authorized users can use any resource that is available and appropriate for their job. Priority schemes can be established to ensure that those who provide resources have special access to their own resources while allowing others "as available" access. A large collection of software tools is needed to implement the full vision of the grid. These tools manage authentication, authorization, accounting, job submission, job scheduling, resource discovery, workflow management and so on.
Head Nodes, Worker Nodes, Cores and Slots
A simplistic picture is that a grid installation consists of one head node and many worker nodes; typically there are hundreds of worker nodes per head node. In a typical grid installation today the worker nodes are multi-core machines and it is normal for there to be one batch slot per core. So, in an installation with 100 worker nodes, each of which has a dual quad-core processor (8 cores), there would be 800 batch slots. This means that up to 800 grid jobs can run in parallel. Some grid installations configure more than one batch slot per core, perhaps 9 or 10 slots on a dual quad-core machine; this makes sense if the expected job mix has long I/O latencies.
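The slot arithmetic above can be sketched in a few lines of shell; the node and core counts are the illustrative numbers from the text, not a survey of real hardware:

```shell
# One batch slot per core: 100 worker nodes, each a dual quad-core machine.
worker_nodes=100
cores_per_node=8            # dual quad-core
slots=$(( worker_nodes * cores_per_node ))
echo "parallel jobs: $slots"            # 800

# A site expecting long I/O latencies might configure 10 slots per node.
oversubscribed=$(( worker_nodes * 10 ))
echo "oversubscribed: $oversubscribed"  # 1000
```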
When a job is submitted to the grid, it is submitted to a particular head node, which looks after queuing, resource matching, scheduling and so on. When a job has reached the front of the queue and is ready to run, the head node sends the job to a slot on a worker node.
FermiGrid
FermiGrid, which is managed by the Fermilab Scientific Computing Division (SCD), deploys a collection of computing resources that the Division makes available via grid protocols. FermiGrid includes several separate pools of grid resources: the General Purpose Grid (GPGrid), separate resources for CMS, and access to remote grids at universities. Fermilab users are the intended primary users of FermiGrid, but unused resources can be used by other authorized grid users. Mu2e is among the authorized users of the General Purpose Grid.
The GPGrid worker nodes are installed with the most recent production version of Scientific Linux Fermi (SLF6 in 2017). This is generally the same major version installed on our interactive nodes. The intent is that code compiled on the interactive nodes will run on the grid nodes.
All GPGrid worker nodes have sufficient memory for our expected needs.
GPGrid has some nodes with large memory and many cores. In typical
running mode every core runs one job, each using about 2 GB of memory.
It is possible to partition these nodes so they run fewer jobs
but give each job much more memory. Some specialized mu2e jobs
take advantage of this capability.
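As a sketch of this trade-off, assume a hypothetical 128 GB, 64-core node (the actual GPGrid hardware varies):

```shell
# Typical mode: one job per core, sharing the node's memory evenly.
node_memory_gb=128   # hypothetical node size, not a GPGrid spec
node_cores=64
per_job_gb=$(( node_memory_gb / node_cores ))
echo "typical: $per_job_gb GB per job"        # 2 GB

# Partitioned mode: run only 8 jobs so each gets much more memory.
big_job_gb=$(( node_memory_gb / 8 ))
echo "partitioned: $big_job_gb GB per job"    # 16 GB
```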
We can use up to 40 GB of local disk space per job on each worker node.
Another feature of the Fermilab General Purpose Grid is that certain
disks are mounted on all worker nodes.
For production jobs, we access our code as a pre-built release on
the cvmfs distributed disk system.
For personal jobs, you can compile your code on a mu2e disk that is visible
on the GPGrid nodes, where the grid job can then run your executable.
We are moving away from this style of grid-mounted disk for personal jobs
and toward copying a tarball of your code to the grid node,
but no plan is in place yet (spring 2017).
Please consult the disk and data transfer
documentation for more information about moving code and data to grid jobs.
Condor
In FermiGrid, the underlying job scheduler is CONDOR. This is the code that receives requests to run jobs, queues the requests, prioritizes them, and then sends each job to a batch slot on an appropriate worker node.
CONDOR allows you, with a single command, to submit many related jobs. For example, you can tell condor to run 100 instances of the same code, each running on a different input file. If enough batch slots are available, all 100 instances will run in parallel, each in its own slot. In CONDOR-speak, each instance is referred to as a process (or job) and a collection of processes is called a "cluster" (or submission, or jobid). A set of clusters with one goal can be called a project.
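Inside a running job, jobsub exposes the process number within the cluster through the $PROCESS environment variable, so a worker script can pick out its own input file. The file-naming scheme below is hypothetical:

```shell
# $PROCESS is 0..N-1 within the cluster; default to 0 for local testing.
process=${PROCESS:-0}

# Hypothetical naming convention: instance i reads input_00i.art.
my_file=$(printf 'input_%03d.art' "$process")
echo "this instance will read $my_file"
```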
We do not use condor directly; instead we interact with a Fermilab product called jobsub, described more below, which translates your commands into condor operations.
Virtual Organizations and roles
Access to grid resources is granted to members of various "Virtual Organizations" (VO); you can think of these as a strong authentication version of Unix user groups. One person may be a member of several VOs but when that person runs a job they must choose the VO under which they will submit the job. There are individual VOs for each of the large experiments at Fermilab and one general purpose VO; most of the small experiments, including Mu2e, have their own group within this general purpose VO.
VOs can also have roles. Most users will use the default Analysis role, but production work uses the Production role.
Authentication
In order to use any Fermilab resource, the lab requires that you pass its strong authentication tests. When you want to log in to a lab machine, you first authenticate yourself on your own computer using kinit and then log in to the lab machine using a kerberos-aware version of ssh.
This is all you need to do to access the grid or transfer data. When you issue commands for grid jobs, the system converts your kerberos ticket to your CILogon identity, checks your VO, and then derives a VOMS proxy for that identity, as needed (see Authentication). The CILogon cert and VOMS proxy are stored in /tmp and reused as long as they are valid.
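For example, a typical login sequence looks like the following (the hostname is an example Mu2e interactive node; substitute your own kerberos principal and target machine):

```shell
# On your own computer: get a kerberos ticket for the FNAL.GOV realm.
kinit yourname@FNAL.GOV

# Then log in with a kerberos-aware ssh; no password is sent.
ssh yourname@mu2egpvm01.fnal.gov
```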
Usually only the user (or an SCD expert) can remove a user's job. However, jobsub allows each experiment to designate jobsub "superusers" who can delete or hold any user's jobs; mu2e has not assigned this role at this time.
Monte Carlo
For running Monte Carlo grid jobs, please follow the instructions at Workflows.
non-Monte Carlo
Under some circumstances you may need to submit scripts outside of the standard procedures. In this case, you can use jobsub directly. Please read and understand disks and data transfer.
To start:
setup jobsub_client
Basic submit command. If you want the environment variable VAR set to a value in the grid environment, set it in your interactive environment and tell jobsub to export it:
export VAR=value
jobsub_submit --group=mu2e -N 10 \
  --OS=SL6 --memory=2GB --disk=5GB --expected-lifetime=5h \
  -e VAR \
  file://osgMonCheck.sh
You will get back a message that includes a jobid, which identifies this submission or cluster:
... Use job id 15414243.0@fifebatch1.fnal.gov to retrieve output
To check jobs (also see Monitoring below):
jobsub_q
To remove a job:
jobsub_rm --jobid=15414243.0@fifebatch1.fnal.gov
To get the condor log files in a tarball:
jobsub_fetchlog --jobid=15414243.0@fifebatch1.fnal.gov
The condor log files do not usually include the printed output of the mu2e executable; that output is usually redirected to separate files.
Please see the jobsub documentation and the "-h" output for more information.
OSG
The Open Science Grid (OSG) is a collaboration of universities that have agreed to run similar software, so that the sites look alike to jobs submitted there. This enables us to submit to all these sites and use any resources that are not being used by their high-priority users. To submit to the OSG, switch the resource request and add a list of sites:
--resource-provides=usage_model=OFFSITE --site=SU-OG,Wisconsin
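Combined with the earlier submit example, a full offsite submission might look like the following (the script name, site list, and resource requests are the example values from this page):

```shell
jobsub_submit --group=mu2e -N 10 \
  --resource-provides=usage_model=OFFSITE --site=SU-OG,Wisconsin \
  --OS=SL6 --memory=2GB --disk=5GB --expected-lifetime=5h \
  file://osgMonCheck.sh
```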
Monitoring
The grafana pages are the primary web monitor for anything running on FermiGrid. Also see the operations page for more links.