Periodic Boundary Conditions
Periodic boundary conditions require a cell node. Triclinic cells are fully supported.
Example:
species = inputs.SpeciesNode(db_name="species")
positions = inputs.PositionsNode(db_name="coordinates")
cell = inputs.CellNode(db_name="cell")
network = networks.Hipnn("HIPNN",
                         (species, positions, cell),
                         periodic=True, module_kwargs=network_params)
This will generate a PeriodicPairIndexer that searches the image cells surrounding the data
and wraps coordinates into the unit cell. Because the nearest images are numerous
(27 replicates of the cell at search radius 1), periodic pair finding costs noticeably more
memory and time than open boundary conditions. The less skewed your cells are, and the
larger they are compared to the required cutoff distance, the fewer images need to be
searched to find pairs.
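To make the effect of skew and cell size concrete, the following sketch estimates how many image cells a search would have to cover along each lattice direction. It is a generic geometric estimate, not hippynn's internal code: the helper names (`perpendicular_heights`, `images_needed`, `total_cells`) are hypothetical, and the rule `ceil(cutoff / height)` is an assumption about one way to bound the search once coordinates are wrapped into the unit cell.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in u))


def perpendicular_heights(cell):
    """Distance between opposite faces of the cell, one value per lattice vector."""
    a, b, c = cell
    volume = abs(sum(x * y for x, y in zip(a, cross(b, c))))
    return (volume / norm(cross(b, c)),
            volume / norm(cross(c, a)),
            volume / norm(cross(a, b)))


def images_needed(cell, cutoff):
    """Images to search along each lattice direction (assumed rule, see text)."""
    return tuple(math.ceil(cutoff / h) for h in perpendicular_heights(cell))


def total_cells(n_images):
    """Number of cells visited, including the central one."""
    total = 1
    for n in n_images:
        total *= 2 * n + 1
    return total


# Two cells with the same volume; rows are lattice vectors.
cubic = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
skewed = [(10.0, 0.0, 0.0), (9.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
cutoff = 8.0

for name, cell in (("cubic", cubic), ("skewed", skewed)):
    n = images_needed(cell, cutoff)
    print(name, n, total_cells(n))
```

With these inputs the cubic cell needs one image per direction (27 cells in total), while the skewed cell of equal volume needs two images along its first lattice vector (45 cells in total).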
Dynamic Pair Finder
For highly complex datasets, there is a more flexible pair finder, which can be built as follows:
enc, padidxer = indexers.acquire_encoding_padding(
    species, species_set=network_params['possible_species'])
pairfinder = pairs.DynamicPeriodicPairs('PairFinder', (positions, species, cell),
                                        dist_hard_max=network_params['dist_hard_max'])
network = networks.Hipnn("HIPNN", (padidxer, pairfinder), periodic=True,
                         module_kwargs=network_params)
The DynamicPeriodicPairs object uses an algorithm to determine how many image cells need
to be searched for each system, and iterates through the systems one by one. The upshot
is that less memory is required. The cost is that each system is evaluated independently
in serial, so pair finding can be rather slow. This algorithm is more likely to show
benefits when the number of atoms in a training system is highly variable.
For systems with orthorhombic cells and an interaction radius no greater than any of the
cell side lengths, KDTreePairs can be used instead. It should exhibit reduced computation
times, especially for large systems.
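The two conditions for the KD-tree pair finder can be checked before building a graph. The sketch below is a stand-alone illustration, not part of hippynn: the function name `kdtree_pairs_applicable` is hypothetical, and it assumes the cell is given as three row vectors aligned with the Cartesian axes, so a rotated orthorhombic cell would be rejected.

```python
def kdtree_pairs_applicable(cell, cutoff, tol=1e-8):
    """Return True if the cell is axis-aligned orthorhombic and the cutoff
    does not exceed any side length (hypothetical helper)."""
    # Orthorhombic and axis-aligned: all off-diagonal entries vanish.
    for i in range(3):
        for j in range(3):
            if i != j and abs(cell[i][j]) > tol:
                return False
    # Interaction radius no greater than any cell side length.
    sides = [abs(cell[i][i]) for i in range(3)]
    return all(cutoff <= side for side in sides)


orthorhombic = [(10.0, 0.0, 0.0), (0.0, 12.0, 0.0), (0.0, 0.0, 15.0)]
triclinic = [(10.0, 0.0, 0.0), (3.0, 12.0, 0.0), (0.0, 0.0, 15.0)]

print(kdtree_pairs_applicable(orthorhombic, cutoff=6.0))
print(kdtree_pairs_applicable(orthorhombic, cutoff=11.0))
print(kdtree_pairs_applicable(triclinic, cutoff=6.0))
```

Systems that fail this check would fall back to one of the other periodic pair finders described above.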
Pair Finder Memory
When using a trained model to run MD, or in any application where atom positions
change only slightly between subsequent model calls, PeriodicPairIndexerMemory and
KDTreePairsMemory can be used to reduce run time by reusing pair information. Current
pair indices are stored in memory and reused so long as no atom has moved more than
skin/2, where skin is an additional parameter set by the user. Increasing the value of
skin will increase the number of pair distances computed at each step, but decrease the
number of times new pairs must be computed. Skin should be set to zero while training
for fastest results.
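The skin/2 rule can be illustrated with a small stand-alone sketch. This is toy bookkeeping written for this page, not the implementation inside PeriodicPairIndexerMemory or KDTreePairsMemory: the class `PairListReuse` and the function `count_rebuilds` are hypothetical, and the "trajectory" is a single atom drifting along x in fixed steps.

```python
import math


class PairListReuse:
    """Tracks when a stored pair list would need recomputing (toy model)."""

    def __init__(self, skin):
        self.skin = skin
        self.reference = None  # positions at the time of the last rebuild
        self.rebuilds = 0

    def update(self, positions):
        """Return True if the pair list had to be rebuilt for these positions."""
        if self.reference is not None:
            max_disp = max(math.dist(p, r)
                           for p, r in zip(positions, self.reference))
            if max_disp <= self.skin / 2:
                return False  # no atom moved more than skin/2: reuse pairs
        self.reference = [tuple(p) for p in positions]
        self.rebuilds += 1
        return True


def count_rebuilds(skin, step=0.2, n_steps=10):
    """Rebuild count for one atom moving `step` along x per call."""
    tracker = PairListReuse(skin)
    for i in range(n_steps):
        tracker.update([(i * step, 0.0, 0.0)])
    return tracker.rebuilds


print(count_rebuilds(skin=1.0))
print(count_rebuilds(skin=0.0))
```

In this toy trajectory a skin of 1.0 triggers a rebuild on 4 of the 10 calls, while a skin of 0.0 rebuilds on every call, because any movement at all exceeds skin/2.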
Caching Pre-computed Pairs
To mitigate the cost of periodic pairfinding (with either of the above methods),
the pairs for each system in the training database can be cached. To do this,
first assemble your modules for training. Then run precompute_pairs() on the model
from the training modules.
This produces a cache of the pairs in the database, and replaces the
pair finder with an input node that gets the information from this cache.
After that, you’ll have to re-assemble a new model for training,
and set the database to use these new inputs.
Example:
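from hippynn.experiment.assembly import precompute_pairs

# First assembly: builds the model that still contains the pair finder.
training_modules, db_info = \
    assemble_for_training(train_loss, validation_losses, plot_maker=plot_maker)

# Cache the pairs in the database and swap the pair finder for an input node.
precompute_pairs(training_modules.model, database, n_images=4)

# Re-assemble so that training uses the cached pairs, then update the database inputs.
training_modules, db_info = assemble_for_training(train_loss,
                                                  validation_losses, plot_maker=plot_maker)
database.inputs = db_info['inputs']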
Caching pairs has several caveats. By default it produces a sparse output, and this
sparse tensor cannot be used with the n_workers argument on a dataloader due to current
limitations in PyTorch. As such, we recommend that you move the dataset to a GPU and use
n_workers=None when caching pairs.
What’s not yet supported
During training, we don’t yet have support for mixed PBCs where some directions are periodic and others are open. However, the ASE interface can almost (but not quite yet) handle simulations for such systems, because in this case ASE handles neighbor finding.
We also don’t have support for mixed datasets of open and periodic boundaries. To deal with this, you can embed your open systems in a very large box as a pre-processing step.
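One possible way to do this pre-processing is sketched below. It is an illustration under stated assumptions, not a hippynn utility: the function `embed_in_box` is hypothetical, and the choice of box size (the extent of the molecule plus the cutoff plus a small padding along each axis) is an assumption about what "very large" needs to mean so that no atom lies within the cutoff of a periodic image.

```python
def embed_in_box(positions, cutoff, padding=1.0):
    """Shift an open system to the origin and build an orthorhombic cell
    large enough that periodic images lie farther apart than the cutoff.

    Returns (shifted_positions, cell), with cell given as three row vectors.
    Hypothetical helper; box-size rule is an assumption.
    """
    mins = [min(p[k] for p in positions) for k in range(3)]
    maxs = [max(p[k] for p in positions) for k in range(3)]
    # Side length: extent of the system plus a gap wider than the cutoff.
    sides = [(maxs[k] - mins[k]) + cutoff + padding for k in range(3)]
    shifted = [tuple(p[k] - mins[k] for k in range(3)) for p in positions]
    cell = [tuple(sides[i] if i == j else 0.0 for j in range(3))
            for i in range(3)]
    return shifted, cell


molecule = [(-1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
shifted, cell = embed_in_box(molecule, cutoff=5.0)
print(shifted)
print(cell)
```

The resulting cell array would then be stored alongside the coordinates of each open system, so that the whole dataset can be treated as periodic.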