CodingStandards

From Mu2eWiki
Jump to navigation Jump to search


Introduction

This document defines software standards and practices for use within Mu2e software; the scope of the document is much more than just coding "style" and it only mentions a few issues that are purely stylistic.

There is no such thing as the "one true C++ standards and practices document". The Mu2e Offline code lives within an environment of which the computer language is but a small part; this document seeks to define standards and practices for interacting with all of Mu2e Offline environment. This environment includes

  • art: the module interfaces, the service interfaces, the event data model, the art::Ptr class template, the art::Assns class template, the art model for exception handling, the RandomNumberGenerator service, the TFileService
  • The art tool chain: FHICL, CETLIB, MESSAGEFACILITY
  • The physics content of the Mu2e data products
  • The Mu2e geometry classes.
  • The many Mu2e classes and free functions that do handy jobs.
  • The early prototypes of the Mu2e conditions and calibrations systems.
  • Geant4
  • ROOT
  • CLHEP
  • The boost libraries.
  • The mechanism to submit grid jobs.
  • SAM file catalog.
  • and last, but not least, C++ and its standard library.


Geant4 and ROOT and were designed in the mid 1990's before STL was standardized and before reliable implementations of STL were widely available; their designs were also strongly influenced by practices that are now understood to be sub-optimal. When Mu2e code interacts with ROOT or Geant4, it must sometimes use the sub-optimal practices. Please do not use those code fragments as models of good practice. This will be discussed further below.


The purposes of these guidelines are to

  • Make it easier for others to make use of the work you have done.
  • Make it easier for readers of your code to understand your intent.
  • Make code less fragile. When code is fragile, it might work well today but, at a later date, an apparently unrelated change may break the code.
  • Ease the burden of long term maintenance.

Rules

  1. The standard file name extensions for Mu2e code are .hh and .cc. Headers from external products remain in their native form.
  2. Modules may communicate with each other only via the art::Event (and, indirectly via the interaction of true/false values returned by EDFilter modules). All other means are strictly forbidden. In particular, they may not communicate via art::Services, singletons, static member functions, or static member data. This rule is in place to ensure a more robust audit trail and, eventually, to allow module-parallel execution.
  3. Only include the header files you need. Do not include extra header files just in case you might need them. This rule can be relaxed for code that is under development if planned work will require a particular header, even if the current snapshot does not.
  4. Only get data products from the event when you need them. Getting them uneccessarily can be CPU time intensive.
  5. Never code a using directive or using declaration in a header file or before any #include:
    using std::cout;       // Do not place either of these in header files.
    using namespace std;
    

    This may needlessly slow down compilation. And it may change the behaviour of code that includes your header file! It's likely that, to produce changed behaviour, the code whose behaviour changes must also be written carelessly; but we want a defense in depth. See, for example, item 59 in "C++ Coding Standards" by Herb Sutter & Andrei Alexendrescu's or http://www.parashift.com/c++-faq-lite/coding-standards.html#faq-27.5

    1. A corlollary of this is that you must use fully scoped type names in header files, for example: std::vector<std::string>, not vector<string>.

    You may use using declarations and directives in your .cc files but, on a file by file, write only the ones you actually use; do not define a standard set that you drop into every .cc file just in case you need them. And remember that any using declaration/directive must follow the last #include.

  6. Always protect header files from multiple inclusion with header guards.
  7. There are only a few legimate uses for CPP macros:
    1. #include, header guards, and selection of archtecture specific code.
    2. Invoking DEFINE_ART_MODULE and DEFINE_ART_SERVICE macros.
    3. Using the macro versions of the mf::Log* clases.
    4. Enabling of debug code.

    For all other cases in which you are tempted to use a CPP macro, it will be more robust to implement that functionality using the native features of C++ and STL.

  8. Do not use Exception Specifications.
  9. Normally all mu2e classes should be in the mu2e namespace.
    1. If you want to define sub-namespaces inside the mu2e namespace, talk to the software team about it. We want to look at choices of sub-namespaces based on their impact on the whole project; while a particular choice might make one aspect easier, it may also make many other aspects harder.
  10. Do not use variable names with the form of a double or single underscore followed by a capital letter. These are reserved to the compiler.
  11. Do not choose identifiers that can be very easily confused as identifiers in one of the underlying products, such as Geant4; that is, do not start class names with a leading G4. Its true that namespaces make sure that the compiler understands what you mean but there is no sense in confusing your colleagues. ( there are a few cases in which this rule has been violated; we will remove them as targets of opportunity. ). We have lots of good reasons to start class names with T - hopefully the context will be enough to distinguish them from ROOT classes.
  12. All libraries must specify all of their first order dependencies at library build time. With scons, this is specified in the SConscript file.
  13. Do not cache handles to data products, GeomHandles or ConditionsHandles.
  14. You may cache ServiceHandles.
  15. Linkage loops are forbidden. If you violate this rule things may accidentally work; this is not allowed. A linkage loop occurs when code in library A calls code from library B and vice versa; the usual, and simple, solution is to create a third library that contains code that is called by both A and B. Sometimes a major redesign of what code lives in which library is a better solution. Sometimes linkage loops are hidden because they work by accident: if there is a third library that calls code from two libraries that containt a linkage loop, and if that third library is loaded first, then a linkage loop can work by accident.
  16. Any code committed to the reposity must compile cleanly using the warning levels recommended by the Mu2e Software management. At any time there may be a short list of permitted exceptions; these exceptions will normally be due to issues that originate in the external products.
  17. All constructors must always leave the object in a safe, well defined and usable state; it must be safe to call any method of a class immediately after it is constructed. Some collaries of this are:
    1. Avoid two-phase construction. If at all possible, in initialize everything via the constructor:
            T t;                          //  Avoid
            t.setSomeProperty(1.);
            t.setAnotherProperty(2.);
            t.setYetAnotherProperty(3.);
      
            T t(1.,2.,3.);                // Preferred
         

      There will be some cases for which the non-prefered pattern is required, most of which will occur when using classes from ROOT, Geant4 or some other external product. There will be very few cases for which this is allowed in Mu2e written classes. Few Mu2e data product classes should contain any setter methods.

    2. The same rule applies for built-in variables:
               double x;       //  Avoid
               // ...
               x = 7.;
      
               double x(7.);  // Prefered
            

      This is for future-proofing your code; if future authors add lines between the declaration and the assignment x=7., then another future author can too easily accidentally use x before it is given a value!

    There will be a few legitimate exceptions to these rules: they generally occur when some algorithm constructs a complex object. So long as the object is guaranteed to be fully constructed before non-expects can see it, these rules can be relaxed. One current exception is the class mu2e::SimParticle, which is half constructed at G4 preTracking time and completed at G4 postTracking time; during the time in which the class is partly initialized, only the algorithm that creates the object has access to it.

  18. Use the right kind of pointer for the job.
    1. In this context, a reference or const reference is just another kind of pointer.
    2. There should be no bare pointers in the public interface of a Mu2e defined class. Unfortunately ROOT and Geant4 have many classes with bare pointers in their public interface.
    3. One legitimate use for bare pointers is inside the private parts of time critical code.
    4. For functions that return information, choose an an appropriate and safe return type.Fill out this link. In particular do NOT use the convention that you return a pointer that will have a value of null if the information is not defined. A very common scenario is that early in code development some method always returns a valid pointer; because that pointer is always valid, some people forget to test for validity; at a later date when it becomes possible for the method to return a null pointer, the code that did not check will core dump.
  19. Do not use comments or lexical variations to identify your changes to other people's code or to keep a history of code versions; we have a code management system to do that job. Lexical variations include non-standard indentation, different capitalization patterns, embedding your intiatials in indentifiers, commenting out obsolete code, and adding comments whose only purpose is to document code history. If you have commited a work in progress, it may make sense to have some test or debug code that is commented out; these should be changed to use compile-time or run-time selection before the code is ready for production.
    1. We would like to use an automatted code formatter when code is committed to the repository; so non-standard indentations will not survive.
  20. Something about const correctness and exception safety.
  21. Avoid any construct that requires a delete, unless imposed by an external package. In these cases automate the delete by using the appropriate kind of safe pointer..
  22. The classes in the RecoDataProducts package must not contain any MC information, directly or indirectly. These are classes that can be used on real data.
  23. The data product classes must obey the following rules:
    1. Must play nice with the persistency mechanism.
    2. In memory rep must be orthogonal to persist reps; must not be TObjects.
    3. With a few exceptions must be a POD or a collection of PODS.
    4. Exception: May contain art::Ptr<T> or std::vector<art::Ptr<T> >
    5. Other sorts of pointers are, in general, forbidden, unless they can be rebuilt on the fly.
    6. Things in a collection must not hold a pointer to the collection.
    7. May #include other data products. Must not #include any other Mu2e libraries.
    8. No public data.
    9. No complicated functions. Move these out of class. Or split the class in to a persistable thing and a functional thing that holds a (reference/pointer to) the persistable thing.
    10. Must not #include headers from other Mu2e classes that are not data products. ( a few exceptions ).
  24. With few exceptions, for names of C++ files, use .hh and .cc for file names, not .h or .cpp or other variants. If you wish to make a case for an exception, consult the software team. The reason for this rule is so that I am tired of explaining all of the variants to novices. If you wish to make a case, you take on the responsibility to promptly answer all questions about why your files are named differently than other peoples.
  25. Do not use .icc files unless it is required because of reference to an external product.
  26. Do not use hard tabs.
  27. Do we want a rule/suggestion about indentation. I have seen a few examples of 5 or 6 indentations of 8 spaces each. My vote is for 2. I can live with 3 or 4 but not 8.
  28. Do we want a commit script that formats code to a some standard and rejects the commit if it cannot be made to conform? Chris Green suggests astyle. His astylerc contains
    --style=stroustrup
    --break-closing-brackets
    --indent-namespaces
    --indent=spaces=2
    --indent-switches
    --indent-labels
    --indent-col1-comments
    --indent-preprocessor
    --convert-tabs
    --add-one-line-brackets
    --keep-one-line-statements
    --keep-one-line-blocks
    --align-pointer=type
    --pad-header
    --unpad-paren
    --pad-oper
    --delete-empty-lines
    --lineend=linux
    --align-pointer=middle
    --align-reference=middle
      
  29. Do we want to forbid printf etc in favor of cout, cerr?

Recommendations

  1. To get the value of PI, include the <cmath> header and use std::M_PI. For other numerical constants that are not present in cmath, use the values from CLHEP/Units/PhysicalConstants.h, for example CLHEP::twopi. Prefer not use ROOT for this.
  2. Commit code to the repository as often as reasonably possible
  3. Comment only what the code cannot say. Many of the example codes violate this recommendation; they do so because they are examples, not production code.
  4. When editing code developed by others, match their conventions (unless they have significant violations of the guidelines).
  5. Do not put long comments in the middle of the code; put elsewhere and include only a comment that refers to them. "Elsewhere" can be at the top of the file, in another file that is stored in git, in DocDB, on a page in the Mu2e website; it must NOT be a note on your personal web page. Why is this recommendation here? They layout of code on the page contains information and long comments disrupt that information.
  6. The standard pattern of capitalization should be as follows; there is one exception discussed in the next recommendation.
    1. Names of types ( classes, structs, typedefs ) should start with a capital letter.
    2. All other identifiers should start with a lower case letter ( or an underscore _ character ); this includes member data, member functions, free functions and function scope variables.
    3. When you wish to delimit words within an identifier, do so using bouncingCapitals ( also known as camelCase), not embedded_underscores.
    4. Pick one of the standard conventions for identifying member data; whichever you choose, be consistent within a set of related classes. The one convention you must avoid is a double leading underscore followed by a captial letter: such identifiers are reserved to the compiler. The current recommendation is a single trailing underscore, although much of the earlier code was written with a single leading underscore, followed by a lower case letter.
  7. The principal exception to the preceding recommendation is that some classes are intended to have std::-like behaviour; in this case follow the std:: lexical conventions. We expect that few Mu2e physicists will write code that should follow this pattern.
  8. With a few standard exceptions, avoid Hungarian Notation, that is, do not embed type information, such as whether or not a variable is a pointer, inside variable names. We can make an exception if the same information needs to be addressable by two different ways; the most prominent exception is the recommended pattern for accessing data products.
  9. There are very few good reasons for a class to have a static data member and many bad reasons. If you want your class to have static member data, first speak with the software team; they will advise you on alternatives.
  10. Follow the pattern for accessing data products.
  11. When accessing CLHEP units and physical constants, do not use using. Always spell them out in full, for example, CLHEP::mm. This recommendation is very important for the short names like, CLHEP::m, CLHEP::mm, CLHEP::nm, even if they are heavily used; these can easily collide with commonly used variable names which can lead to errors that are difficult to diagnose. This recommendation is less important for longer names such as CLHEP::c_squared but, unless a variable is very heavily, used it seems wise to use the full name.
  12. When either will yield the correct behaviour, prefer preincrement to postincrement, ++i, not i++. This avoids the need for the compiler to cache a temporary variable. A good optimizer may recognize that the temporary is never user and, effectively, change the postincrement to a preincrement. But why take the chance. Moreover, doing it correclty is a small sign of professionalism.
  13. Do we want to say anything about:
    • statement; statement; statement; // All on one line
    • if ( a ) b; // without the {}. If b is not return, it is fragile.

    Or is this just me being way to picky.

  14. Give some examples of "landmines"?
  15. When working offline, when you fill histograms from ROOT ntuples or trees:
    1. Do not do so interactively. Do it from a cint script or a program. Commit the script or program to git. If you do it interactively there is a much less robust audit trail for what you have done.
    2. Unless the operation of filling the histogram is trivial, do it from a compiled program, not an interpretted cint script.

    When you are working online to diagnose operation problems you can skip this guideline.

  16. If it is sensible to sort a class by more than one of its data members, do NOT define the operator "<" (less than). To illustrate this, consider a hit class that contains a channel id, a time and a pulseheight. It is sensible to sort a collection of these by channel id, by time or by pulseheight; one might even define a 3 level sort rule that uses all three. For such a class, do not define the operator&lt (less than). It will be ambiguous to the reader whether which value is being used for the sort. Instead, define free functions for doing the comparison. For exmaple, presume that you have a class,
          struct Hit{
            int    channelId_;
            double time_;
            double pulseHeight_;
          };
    

    Instead provide three pairs of free functions written like,

         bool byChannelId< Hit const& a, Hit const& b){
            return ( a.channelId_ < b.channelId_ );
         }
         bool byChannelId< Hit const* a, Hit& const b){
            return ( a.channelId_ < b.channelId_ );
         }
    

    and similarly for the time_ and pulseHeight_ data members. Normally you should define this inline in the header file that defines Hit; the functions must be in the same namespace as hit.

  17. Do not use std::pair just because it is there. Suppose that you need a simple struct:
         struct MyStruct{
           CLHEP::Hep3Vector       position_;
           CLHEP::HepLorentzVector momentum_;
         };
    

    Strongly prefer to define this struct explicitly rather than to hack it using:

          std::pair<CLHEP::Hep3Vector,CLHEP::HepLorentzVector>;
    

    Your code will be much, much easier to understand when it reads as x.position_ instead of x.first_.

  18. For event-data classes, and other classes that have broad public exposure, prefer to use private data plus accessors, rather than public data. This is true even if a class is really just a struct. Why? The reason is that even classes that are basically just structs often grow a few simple functions. For example prefer
         #include "cetlib/pow.h"
         class Cube{
         public:
           Cube( double sideLength):sideLength_(sideLength){}
           Cube():sideLength_(0){}
    
           // Accept compiler generated d'tor, copy c'tor and copy assignment.
    
           double sideLength() const { return        sideLength; }
           double faceArea()   const { return pow<2>(sideLength); }
           double volume()     const { return pow<3>(sideLength); }
    
         private:
           double sideLength;
         };
    

    Please do not use:

         #include "cetlib/pow.h"
         struct Cube{
           double sideLength;
           double faceArea() const { return pow<2>(sideLength); }
           double volume()   const { return pow<3>(sideLength); }
         };
    

    It turns out when a novice C++ user encounters the second implementation it is just really, really confusing that getting the volume accessor REQUIRES () but that getting the sideLength accessor MUST NOT have ().

  19. For any results that will be shown in public, all of the code, root scripts etc needed to reproduce that result must be committed to git before the result is made public. It is good practice to follow this rule for presentations made at collaboration meetings. It is less important, but still a good idea, for status reports shown at working group meetings.
  20. In the header of a class/struct that is part of the public interface, put the public stuff first and the private stuff last.
  21. Class layout:
    1. Trivial accessor methods should be in the body of the class declaration; they are automatically candidates for inlining and do not require the inline keyword.
    2. Do not put long functions in the class declaration; normally they belong in the .cc file. If there is a good reason to put them in the header, put them after the class declaration and declare them inline. Remember that improper use of inline causes code bloat and may actually slow things down.
    3. provide =default or =delete for all compiler writable functions.
  22. Do not use pow(a,2.) to square a number. A good compiler will probably do the right thing but why take a chance. Instead use:
         #include "cetlib/pow.h"
         double x=2.;
         double x2 = cet::square(x);
         double x3 = cet::cube(x);
         double x6 = cet::pow<6>(x);
    

    cet::pow is a template that, at compile time, writes, inline, the fewest-multiply expression for the requested power. The header file also defines the convenience functions square, cube and fourth.

  23. Do compute a difference of squares as (a-b)*(a+b), not as a*a-b*b. You can use cet::diff_of_squares( x, y); for completeness cetlib also defines cet::sum_of_squares(x,y) and cet::sum_of_squares(x,y,z).
  24. Order of includes
    1. The prototype/interface header for this implementation (ie, the .hh file that corresponds to this .cc file).
    2. Other headers from the same project, as needed.
    3. Headers from other non-standard, non-system libraries (for example, Qt, Eigen, etc).
    4. Headers from other "almost-standard" libraries (for example, Boost)
    5. Standard C++ headers (for example, iostream, functional, etc.)
    6. Standard C headers (for example, cstdint, dirent.h, etc.)

Threaded code

Useful degrees of thread safety defined by Chris Jones.

  • thread hostile: uses 'mutable global state' in calls and therefore can not safely be called from multiple threads there are many parts of ROOT that are this way (although less than there used to be)
  • thread friendly: different instances of the same C++ type can be used by different threads safely. ROOT I/O system falls into this category now.
  • const thread safe: if all threads only call const methods of an instance of a class then it is thread safe. This is the case for the ROOT math classes.
  • thread safe: mutable and immutable methods of an instance of a class may be called across threads or a function may be called by multiple threads concurrently. Intel's thread building blocks library has thread-safe containers.


References

This site contains a precise description of c++, especially the standard library, including differences between standards)

This site contains a wealth of expert knowledge, based on specific questions (trying search "std::move")

We support all the recommendations in this document:


The following are all good resources, but have not been vetted for the details: