pretraining module

Full documentation for the hippynn.pretraining module.

Utilities to run before training, i.e. network initialization and diagnostics.

calculate_max_system_force(array_dict: dict, species_name: str, force_name: str, device: device = None, batch_size: int = 50)

Calculates the maximum force magnitude in each system in array_dict.

Example usage for unsplit data:

>>> db = Database(...)
>>> max_force = calculate_max_system_force(db.arr_dict,"Z","F")

If the database has been split:

>>> max_force_train = calculate_max_system_force(db.splits['train'],"Z","F")

Example usage to prune out high-force data:

>>> db = Database(...)
>>> force_threshold = ...
>>> max_force = calculate_max_system_force(db.arr_dict,"Z","F")
>>> high_force_system = max_force > force_threshold
>>> db.arr_dict = {k:v[~high_force_system] for k,v in db.arr_dict.items()}
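
To make the quantity being computed concrete, the following is a minimal NumPy sketch of a per-system maximum force magnitude followed by the same pruning pattern. It is not the hippynn implementation: the helper name max_force_per_system, the toy arrays, and the convention that species 0 marks a padding atom are all assumptions made for illustration.

```python
import numpy as np

def max_force_per_system(species, forces):
    """Largest per-atom force norm in each system (illustrative helper).

    species: (n_systems, n_atoms) integers; 0 is ASSUMED to mark padding atoms.
    forces:  (n_systems, n_atoms, 3) floats.
    """
    species = np.asarray(species)
    forces = np.asarray(forces, dtype=float)
    magnitudes = np.linalg.norm(forces, axis=-1)            # (n_systems, n_atoms)
    magnitudes = np.where(species != 0, magnitudes, 0.0)    # ignore padding atoms
    return magnitudes.max(axis=1)                           # one value per system

# Toy data: two systems, each padded to three atoms.
arr_dict = {
    "Z": np.array([[8, 1, 1], [1, 1, 0]]),
    "F": np.array([
        [[0.0, 0.0, 3.0], [0.0, 4.0, 3.0], [1.0, 0.0, 0.0]],
        [[0.0, 0.5, 0.0], [0.0, -0.5, 0.0], [9.0, 9.0, 9.0]],  # last atom is padding
    ]),
}

max_force = max_force_per_system(arr_dict["Z"], arr_dict["F"])
high_force_system = max_force > 1.0
# Same boolean-mask pruning as in the example above, applied to every array.
pruned = {k: v[~high_force_system] for k, v in arr_dict.items()}
```

Because the mask has one entry per system, it can index every array in the dictionary along the first axis, which is what keeps the pruned arrays aligned with each other.
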

Parameters:
  • array_dict – dictionary mapping strings to tensors/numpy arrays

  • species_name – dictionary key for species

  • force_name – dictionary key for forces

  • device – Where to perform the computation.

  • batch_size – batch size to perform evaluation over.

Returns:

the maximum force magnitude for each system in array_dict, one value per system

calculate_min_dists(array_dict: dict, species_name: str, positions_name: str, dist_hard_max: float, cell_name: str = None, device: device = None, pair_finder_class: _BaseNode = 'auto', batch_size: int = 50)

Calculates the minimum distance found in each system in array_dict.

Example usage for unsplit data:

>>> db = Database(...)
>>> min_dists = calculate_min_dists(db.arr_dict,"Z","R",5.0)

If the database has been split:

>>> min_dists_train = calculate_min_dists(db.splits['train'],"Z","R",5.0)

Example usage to prune out low-distance data:

>>> db = Database(...)
>>> dist_threshold = ...
>>> min_dist = calculate_min_dists(db.arr_dict,"Z","R",5.0)
>>> low_distance_system = min_dist < dist_threshold
>>> db.arr_dict = {k:v[~low_distance_system] for k,v in db.arr_dict.items()}

Note

The cutoff radius dist_hard_max should be set large enough such that each atom is expected to have at least one neighbor. If an atom has no neighbors, its min_dist will be set to the largest distance found in the current batch. If an entire system has no neighbors, the minimum distance will be set to zero.
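
The role of the cutoff can be illustrated with a brute-force NumPy sketch for open boundaries. This is not how hippynn finds pairs: the helper name min_dist_per_system, the toy coordinates, and the use of species 0 for padding are assumptions, and the sketch deliberately returns infinity for a system with no pair inside the cutoff rather than reproducing the fallback values described in the note.

```python
import numpy as np

def min_dist_per_system(species, positions, dist_hard_max):
    """Smallest interatomic distance per system, open boundaries only.

    species:   (n_systems, n_atoms) integers; 0 is ASSUMED to mark padding.
    positions: (n_systems, n_atoms, 3) floats.
    Systems with no pair within dist_hard_max get np.inf (a choice made for
    this sketch, not the library's documented fallback).
    """
    species = np.asarray(species)
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, :, None, :] - positions[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                    # (n_sys, n_atoms, n_atoms)
    real = species != 0
    valid = real[:, :, None] & real[:, None, :]             # both atoms are real
    valid &= ~np.eye(species.shape[1], dtype=bool)[None]    # drop self-pairs
    valid &= dist <= dist_hard_max                          # apply the cutoff
    dist = np.where(valid, dist, np.inf)
    return dist.min(axis=(1, 2))

Z = np.array([[8, 1, 1], [1, 1, 0], [1, 1, 0]])
R = np.array([
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 3.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.4], [0.0, 0.0, 0.0]],   # third atom is padding
    [[0.0, 0.0, 0.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]],  # pair lies beyond the cutoff
])

min_dist = min_dist_per_system(Z, R, dist_hard_max=5.0)
low_distance_system = min_dist < 0.5
no_neighbors = np.isinf(min_dist)
```

The third toy system shows why the cutoff matters: its only pair is farther apart than dist_hard_max, so no meaningful minimum distance is found for it.
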

Parameters:
  • array_dict – dictionary mapping strings to tensors/numpy arrays

  • species_name – dictionary key for species

  • positions_name – dictionary key for positions

  • dist_hard_max – maximum distance to search

  • cell_name – dictionary key for cell (periodic boundary conditions). If the cell is not specified, open boundaries are used.

  • pair_finder_class – if 'auto', choose automatically. Otherwise, build this kind of pair finder.

  • device – Where to perform the computation.

  • batch_size – batch size to perform evaluation over.

Returns:

the minimum distance found in each system in array_dict, one value per system

hierarchical_energy_initialization(energy_module, database=None, trainable_after=False, decay_factor=0.01, encoder=None, energy_name=None, species_name=None, peratom=False)

Computes values for the non-interacting energy using the training data.
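
The description above does not spell out the fitting procedure. One common way to obtain per-species non-interacting energies, assumed here purely for illustration, is a linear least-squares fit of total energies against species counts. The helper name fit_e0 and the toy data below are hypothetical and are not part of hippynn.

```python
import numpy as np

def fit_e0(species, energies, species_set):
    """Least-squares per-species energies from composition (illustrative helper).

    species:     (n_systems, n_atoms) integers; 0 is ASSUMED to mark padding.
    energies:    (n_systems,) total energies.
    species_set: species to fit, e.g. (1, 8).
    """
    species = np.asarray(species)
    energies = np.asarray(energies, dtype=float)
    # counts[i, j] = number of atoms of species_set[j] in system i
    counts = np.stack(
        [(species == z).sum(axis=1) for z in species_set], axis=1
    ).astype(float)
    e0, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    return dict(zip(species_set, e0))

# Toy training set: H2O, H2, O2, OH, padded to three atoms.
Z = np.array([[8, 1, 1], [1, 1, 0], [8, 8, 0], [8, 1, 0]])
# Energies constructed from -0.5 per H and -75.0 per O, with no interaction term.
E = np.array([-76.0, -1.0, -150.0, -75.5])

e0 = fit_e0(Z, E, species_set=(1, 8))
```

In real data the energies are not exactly additive, so such a fit captures only the composition-dependent baseline and leaves the remainder for the network to learn.
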

Parameters:
  • energy_module – HEnergyNode or torch module for energy prediction

  • database – InterfaceDB object to get training data, required if model contains E0 term

  • trainable_after – determines whether the .requires_grad attribute of the E0 parameters is changed, i.e. whether they remain trainable after initialization

  • decay_factor – scale the initialized weights of further energy layers by decay_factor**N for layer N

  • encoder – species encoder, can be auto-identified from energy node

  • energy_name – name for the energy variable, can be auto-identified from energy node

  • species_name – name for the species variable, can be auto-identified from energy node

  • peratom

Returns:

None
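
The decay_factor**N scaling described in the parameter list can be pictured with a small NumPy sketch. The list of weight arrays stands in for the energy layers and is hypothetical, and the sketch assumes that N starts at 0, so the first layer is left unchanged; the source does not state where the count starts.

```python
import numpy as np

decay_factor = 0.01
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the weights of three successive energy layers.
layer_weights = [rng.normal(size=(4, 1)) for _ in range(3)]

# Layer N is scaled by decay_factor**N (ASSUMING N starts at 0).
scales = [decay_factor ** n for n in range(len(layer_weights))]
scaled_weights = [w * s for w, s in zip(layer_weights, scales)]
```

With the default value of 0.01, each further layer starts out two orders of magnitude smaller than the one before it, so the lowest-order term dominates the initial prediction.
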

set_e0_values(*args, **kwargs)