“researcher and practitioner survey[s] shows that 83.8% of participants are unaware of or unsure about any implementation-level variance.”
Even with the same dataset and code, machine learning (ML) results can vary when run on different hardware or with different software versions. To make it possible for others to reproduce your ML results, consider documenting the following factors:
Initialization seeds - record the random seeds used
Parallel execution - note the number of threads used
Processing units - note which processors (e.g., CPU and GPU models) were used
Software - include the exact version of the operating system and of every component in the software stack.
Even better, include a link to a container image of the environment.
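One way to capture the factors above is to write them out alongside your results. The following is a minimal sketch using only the Python standard library; the helper name `environment_report`, its field names, and the `container` parameter are hypothetical choices for illustration, not a standard format. It does not detect GPUs; recording GPU models would need vendor tooling or your ML framework's own query functions.

```python
import json
import os
import platform
import random
import sys
from importlib import metadata
from typing import Optional, Sequence


def environment_report(
    seed: int,
    num_threads: int,
    packages: Sequence[str] = ("numpy",),
    container: Optional[str] = None,
) -> dict:
    """Collect seeds, threads, processors and software versions (hypothetical helper)."""
    # Seed the standard-library RNG; other libraries (NumPy, PyTorch, ...) need their own seeding.
    random.seed(seed)

    # Look up installed package versions; None means the package is not installed.
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None

    return {
        "seed": seed,
        "num_threads": num_threads,
        # Thread-count variable read by many OpenMP-based numeric libraries, if set.
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "processor": platform.processor(),  # may be an empty string on some systems
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "os": platform.platform(),
        "python": sys.version,
        "packages": versions,
        "container": container,  # e.g. a link to the container image, supplied by you
    }


if __name__ == "__main__":
    report = environment_report(seed=42, num_threads=4)
    print(json.dumps(report, indent=2))
```

Saving the resulting JSON next to each run's outputs keeps the record together with the results it describes.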
Other factors to consider:
Compiler settings
Auto-selection of primitive operations (e.g., a library choosing among algorithms at run time)
Floating-point operations
Rounding errors
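Floating-point operations and rounding errors are one reason the number of threads matters: floating-point addition is not associative, so summing the same numbers in a different grouping can round differently. The sketch below illustrates this. `chunked_sum` is a hypothetical helper that sequentially simulates how a parallel reduction with a given number of workers might group the additions; it does not start any threads, and real libraries may split work differently.

```python
import math
import random


def chunked_sum(values, num_chunks):
    """Sum `values` the way a parallel reduction with `num_chunks` workers might:
    each worker sums one contiguous chunk, then the partial sums are added.
    This is a sequential simulation; no threads are actually started."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if not values:
        return 0.0
    size = math.ceil(len(values) / num_chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


# Grouping changes the rounding: these two expressions are not equal.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

rng = random.Random(0)  # fixed seed, so the input data itself is reproducible
data = [rng.uniform(-1e6, 1e6) for _ in range(100_000)]

# Same data, different "worker" counts: the trailing digits usually disagree.
for workers in (1, 2, 4, 8):
    print(workers, repr(chunked_sum(data, workers)))
print("correctly rounded:", repr(math.fsum(data)))
```

Because each grouping rounds at different points, a fixed seed alone does not guarantee identical results when the thread count or processor changes.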