Artifacts
This page explains what artifacts are, when to use them, and how to work with them in Oscar.jl.
What Artifacts Are and When to Use Them
Artifacts are content-addressed bundles of data: each artifact is identified by a hash of its contents (its git-tree-sha1). They are managed by Julia's artifact system and declared in the file Artifacts.toml in the Oscar.jl root directory. Normally, artifacts are downloaded automatically when Oscar.jl is installed or updated, unless the required artifact is already present locally. Julia also supports lazy artifacts, which are downloaded only on demand, i.e. when they are first accessed.
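To illustrate what "content-addressed" means in practice, here is a minimal sketch using Pkg.Artifacts.create_artifact. This function fills a fresh directory, computes its git-tree-sha1 and moves the directory into the local artifact store, so two artifacts with identical contents end up with the same hash and are stored only once. The file name and contents are made up for illustration, and running the snippet adds a small directory to your local artifact store (typically under ~/.julia/artifacts).

```julia
using Pkg.Artifacts

# Populate an artifact directory with a single illustrative data file.
make_contents(dir) = write(joinpath(dir, "example.txt"), "1024\n")

# create_artifact runs the function on a temporary directory, hashes the
# result, moves it into the artifact store and returns the tree hash (a SHA1).
h1 = create_artifact(make_contents)
h2 = create_artifact(make_contents)

# Identical contents yield the identical hash, so the data is stored only once.
println("git-tree-sha1: ", string(h1))
println("same hash: ", h1 == h2)
println("stored at: ", artifact_path(h1))
```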
Artifacts allow us to:
- keep the Oscar.jl repository small,
- distribute data reproducibly,
- avoid duplication of large files across versions.
Artifacts should be used for data that does not belong directly in the Oscar.jl repository, in particular:
- generated data,
- large datasets,
- data not intended to be edited manually.
As a rule of thumb, data stored directly in the Oscar.jl repository should not exceed the size of typical source files (around 100 KB). Small examples, test inputs, and simple hand-written data should remain in the Oscar.jl repository.
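The rule of thumb can be checked mechanically. The following is a hypothetical helper (not part of Oscar.jl) that lists all files below a directory whose size exceeds a limit; the default of 100 * 1024 bytes is one reading of "around 100 KB" and is an assumption, not a hard project rule.

```julia
# Hypothetical helper (not part of Oscar.jl): list files under `root` whose
# size exceeds `limit` bytes, i.e. candidates for moving into an artifact.
# The default of 100 * 1024 bytes mirrors the ~100 KB rule of thumb.
function artifact_candidates(root::AbstractString; limit::Integer = 100 * 1024)
    large = String[]
    for (dir, _, files) in walkdir(root)
        for f in files
            path = joinpath(dir, f)
            filesize(path) > limit && push!(large, path)
        end
    end
    return large
end

# Example: a scratch directory containing one small and one large file.
scratch = mktempdir()
write(joinpath(scratch, "small.txt"), "x"^1_000)
write(joinpath(scratch, "large.dat"), zeros(UInt8, 200_000))
println(artifact_candidates(scratch))
```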
Creating, Hosting and Using Artifacts
This section describes how to create, host, register, and use artifacts in the Oscar.jl repository. The workflow typically proceeds as follows:
- [ ] Serialize the Data
- [ ] Host the Data (pack the data into a gzip-compressed tarball and upload it to a stable location)
- [ ] Register the Artifact (add an entry to Oscar's Artifacts.toml and open a pull request)
- [ ] Use the Artifact (once the pull request has been merged)
We illustrate this workflow with an end-to-end example.
End-to-End Example
The following example illustrates the complete workflow for creating, registering, and using an artifact in Oscar.jl.
Serializing the Data
For this example we will use the .mrdi file format to serialize our data. (You may of course use the file format of your choice.) Here, the data is the number $2^{10}$.
using Oscar
obj = ZZ(2)^10
save("example.mrdi", obj)
Hosting the Data
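Create a compressed tarball, for example example_data_v1.tar.gz, containing the file example.mrdi at its top level (not inside a subdirectory), so that the path example.mrdi inside the unpacked artifact refers to this file.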
We host this particular example tarball at https://martinbies.github.io/Materials/Data/example_data_v1.tar.gz.
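One way to create such a tarball from within Julia is sketched below, assuming the standard library functions Pkg.Artifacts.create_artifact and Pkg.Artifacts.archive_artifact. The helper pack_tarball is hypothetical. It places the files at the top level of a new artifact directory (which also lands in your local artifact store) and writes that directory out as a .tar.gz file. A side benefit is that the returned tree hash is the git-tree-sha1 to record in Artifacts.toml; the sha256 of the tarball still has to be computed separately. To keep the snippet self-contained, a stand-in file replaces the example.mrdi written by save.

```julia
using Pkg.Artifacts

# Hypothetical helper: copy the given files into a fresh artifact directory
# (at its top level), then write that directory out as a gzip-compressed
# tarball. Returns the git-tree-sha1 of the packed contents.
function pack_tarball(files::Vector{String}, tarball::String)
    tree_hash = create_artifact() do dir
        for f in files
            cp(f, joinpath(dir, basename(f)))
        end
    end
    archive_artifact(tree_hash, tarball)
    return tree_hash
end

# Self-contained demo: a stand-in file replaces the example.mrdi written by save.
workdir = mktempdir()
src = joinpath(workdir, "example.mrdi")
write(src, "stand-in for the serialized data\n")
tarball = joinpath(workdir, "example_data_v1.tar.gz")
tree_hash = pack_tarball([src], tarball)
println("git-tree-sha1: ", string(tree_hash))
```

Packing with a command-line tool (for example tar -czf example_data_v1.tar.gz example.mrdi) works just as well; the Julia route simply keeps everything in one session.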
Registering the Artifact
Register the artifact by adding an entry to Artifacts.toml; see also Julia's artifact documentation. The package ArtifactUtils.jl can help automate parts of this workflow. The following text demonstrates the manual workflow.
First, compute the sha256 and the git-tree-sha1:
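using Tar, Inflate, SHA  # Tar and SHA are standard libraries; Inflate.jl must be installed separately
filename = "/absolute/path/to/example_data_v1.tar.gz"
# sha256 of the compressed tarball, i.e. of the file as it will be downloaded
println("sha256: ", bytes2hex(open(sha256, filename)))
# git-tree-sha1 of the unpacked contents, computed from the decompressed tar stream
println("git-tree-sha1: ", Tar.tree_hash(IOBuffer(inflate_gzip(filename))))
Then add this information, together with the host location, to Artifacts.toml.
In the case at hand, the corresponding entry in Artifacts.toml takes the following form: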
[MyExample]
git-tree-sha1 = "ff2f21e623a130f47116d847ae54fd55232b42c1"
lazy = true
[[MyExample.download]]
sha256 = "3bd5a20e84b2e579ecde7dc5d7d4606444daf08407eecc9d8e59be1e468ca5a1"
url = "https://martinbies.github.io/Materials/Data/example_data_v1.tar.gz"
You may, of course, replace MyExample in the above entry with any other string that you find descriptive.
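As an alternative to editing Artifacts.toml by hand, the standard library function Pkg.Artifacts.bind_artifact! can write such an entry. The sketch below writes the MyExample entry from this example into a scratch Artifacts.toml so as not to touch a real checkout; pointing toml at the Artifacts.toml in the Oscar.jl root directory instead is left to the reader. The hashes and URL are the ones shown above.

```julia
using Pkg.Artifacts
using TOML

# Write into a scratch Artifacts.toml rather than the one in the Oscar.jl root.
toml = joinpath(mktempdir(), "Artifacts.toml")

tree_hash = Base.SHA1("ff2f21e623a130f47116d847ae54fd55232b42c1")
url = "https://martinbies.github.io/Materials/Data/example_data_v1.tar.gz"
tarball_sha256 = "3bd5a20e84b2e579ecde7dc5d7d4606444daf08407eecc9d8e59be1e468ca5a1"

# Record the git-tree-sha1, mark the artifact as lazy, and attach the
# download location together with the sha256 of the tarball.
bind_artifact!(toml, "MyExample", tree_hash;
               download_info = [(url, tarball_sha256)], lazy = true)

print(read(toml, String))
```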
Using the Artifact
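The artifact string macro is exported by LazyArtifacts. Since MyExample is declared with lazy = true, load LazyArtifacts explicitly in a standalone Julia session. The macro looks up the artifact name in the Artifacts.toml of the project containing the calling code (in the REPL, the search starts from the current working directory), so run this snippet from within your Oscar.jl checkout.
using Oscar          # provides load
using LazyArtifacts  # provides artifact"..." with support for lazy artifacts
obj_path = artifact"MyExample/example.mrdi"
obj = load(obj_path)
Note that an artifact may contain multiple files and subdirectories. We append /example.mrdi (a path relative to the root of the unpacked artifact) to the artifact name to specify which file in the artifact is to be loaded.
In Oscar.jl source files, the artifact string macro is already available and using LazyArtifacts is typically not required.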
Serializing the Data
Creating artifacts requires that the corresponding data be serialized locally first. Details are provided in the Serialization page.
We recommend the use of the .mrdi file format for serialization. However, this is not a strict requirement and you may use any file format that you see fit.
Hosting the Data
If you want to create an artifact, the first decision is where to store it: to ensure future accessibility, artifacts should be hosted at stable and persistent locations.
Preferred options include:
- archival services such as Zenodo, in particular for long-term or publication-related data,
- other stable hosting solutions agreed upon by the maintainers.
For historical reasons, some artifacts are currently hosted via GitHub release assets, for example at Oscar.jl/archive-tag-1. This approach should be used with care, as GitHub release assets are not intended to function as a long-term artifact registry.
Ensure that existing artifact files are never removed or renamed, as they may be required by older Oscar.jl releases.
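Because older Oscar.jl releases rely on hosted files staying unchanged, it can be useful to check now and then that a hosted tarball still matches the sha256 recorded in Artifacts.toml. The helper below is a hypothetical sketch (not an Oscar.jl function) built on the standard libraries Downloads and SHA.

```julia
using Downloads, SHA

# Hypothetical helper: download the hosted tarball and check that its sha256
# still matches the value recorded in Artifacts.toml.
function hosted_tarball_matches(url::AbstractString, expected_sha256::AbstractString)
    path = Downloads.download(url)   # saved to a temporary file
    try
        return bytes2hex(open(sha256, path)) == lowercase(expected_sha256)
    finally
        rm(path; force = true)
    end
end

# Example call (requires network access):
# hosted_tarball_matches(
#     "https://martinbies.github.io/Materials/Data/example_data_v1.tar.gz",
#     "3bd5a20e84b2e579ecde7dc5d7d4606444daf08407eecc9d8e59be1e468ca5a1")
```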
When debugging, contributors are encouraged to use temporary staging areas before publishing long-term artifact versions. In particular, publication-related Zenodo entries should typically not be cluttered with intermediate or broken artifact versions created during development.
Data intended for querying may also be suitable for OscarDB.
Registering the Artifact
Recall that creating an artifact typically involves the following steps:
- collecting the relevant data files,
- packing these files into a compressed tarball (.tar.gz),
- uploading the tarball to a stable hosting location,
- adding a corresponding entry to Artifacts.toml,
- opening a pull request with the change to Artifacts.toml.
Registering refers to the final two steps. Once the change to Artifacts.toml is merged into the Oscar.jl repository, the artifact becomes publicly available.
The end-to-end example explicitly demonstrates the required changes to Artifacts.toml. Additional information is available in Julia's artifact documentation.
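Before opening the pull request, a quick sanity check of the new entry can catch copy-and-paste mistakes. The function check_artifact_entry below is a hypothetical sketch, not part of Oscar.jl or Pkg: it only checks that the git-tree-sha1 consists of 40 lowercase hex digits and that every download source has a URL and a 64-digit sha256. It does not handle platform-specific entries, and it does not verify that the hashes match the hosted data.

```julia
using TOML

# Hypothetical pre-PR check (not an Oscar.jl function). Returns a list of
# problems found in the entry `name`; an empty list means none were found.
function check_artifact_entry(toml_path::AbstractString, name::AbstractString)
    entry = get(TOML.parsefile(toml_path), name, nothing)
    entry === nothing && return ["no entry named $name"]
    entry isa AbstractDict || return ["platform-specific entries are not handled by this sketch"]
    ishex(s, n) = s isa AbstractString && length(s) == n && all(in("0123456789abcdef"), s)
    problems = String[]
    ishex(get(entry, "git-tree-sha1", nothing), 40) || push!(problems, "bad git-tree-sha1")
    downloads = get(entry, "download", [])
    isempty(downloads) && push!(problems, "no download sources")
    for dl in downloads
        haskey(dl, "url") || push!(problems, "download without url")
        ishex(get(dl, "sha256", nothing), 64) || push!(problems, "bad sha256")
    end
    return problems
end

# Check the entry from the end-to-end example, written to a scratch file.
example = """
[MyExample]
git-tree-sha1 = "ff2f21e623a130f47116d847ae54fd55232b42c1"
lazy = true

[[MyExample.download]]
sha256 = "3bd5a20e84b2e579ecde7dc5d7d4606444daf08407eecc9d8e59be1e468ca5a1"
url = "https://martinbies.github.io/Materials/Data/example_data_v1.tar.gz"
"""
toml_path = joinpath(mktempdir(), "Artifacts.toml")
write(toml_path, example)
println(check_artifact_entry(toml_path, "MyExample"))
```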
Using the Artifact
Artifacts are accessed via Julia's artifact system using the artifact"..." string macro.
The string before the first / refers to the artifact name as defined in Artifacts.toml. Any remaining path components refer to files or subdirectories inside the unpacked artifact. For example,
using LazyArtifacts
artifact"MyArtifact/data/example.mrdi"
refers to the file data/example.mrdi contained in the artifact MyArtifact.
In Oscar.jl source files, the artifact string macro is already available and using LazyArtifacts is typically not required.
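The name-then-path resolution can be mirrored with ordinary functions from Pkg.Artifacts, which is handy for experimenting outside a package. The sketch below builds a small local artifact with a data subdirectory, binds it as MyArtifact in a scratch Artifacts.toml, and then resolves the path roughly the way artifact"MyArtifact/data/example.mrdi" does: look up the tree hash by name, locate its directory, append the remaining path. This is only an approximation; the macro additionally finds Artifacts.toml automatically and, for lazy artifacts, downloads missing data. The contents are illustrative.

```julia
using Pkg.Artifacts

# Build a small local artifact with a subdirectory (illustrative contents only).
h = create_artifact() do dir
    mkpath(joinpath(dir, "data"))
    write(joinpath(dir, "data", "example.mrdi"), "stand-in\n")
end

# Bind it under the name MyArtifact in a scratch Artifacts.toml.
toml = joinpath(mktempdir(), "Artifacts.toml")
bind_artifact!(toml, "MyArtifact", h)

# Roughly what artifact"MyArtifact/data/example.mrdi" resolves to:
# look up the tree hash by name, locate its directory, append the tail.
path = joinpath(artifact_path(artifact_hash("MyArtifact", toml)), "data", "example.mrdi")
println(path)
```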
Updating Artifacts
General Rules
Artifacts are immutable: to "update" an artifact, you must create a new version. Here are the steps to follow:
- create a new tarball with the updated data (we suggest appending a version tag to the tarball's file name, e.g. -v1, -v2, etc.),
- upload the tarball to a stable hosting location,
- compute the sha256 and git-tree-sha1 of the new tarball,
- update the corresponding entry in Artifacts.toml (i.e. the new download URL and the two new hashes),
- open a pull request with the changes to Artifacts.toml.
Once the pull request is merged, the updated artifact becomes available.
Updating an artifact on a hosting platform alone, for example by uploading a new version to Zenodo, is not sufficient. Any change to the artifact contents changes its hashes and therefore requires a corresponding update of Artifacts.toml.
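The following sketch makes the hash argument concrete under stated assumptions. It packs two versions of a stand-in data file (using the standard library functions create_artifact, archive_artifact and bind_artifact!), registers version 1, and then rebinds the same entry to version 2 with force = true, which overwrites the existing entry. The helper pack_version and the example.org URLs are hypothetical placeholders, and the scratch Artifacts.toml stands in for the one in the Oscar.jl repository.

```julia
using Pkg.Artifacts, SHA, TOML

# Hypothetical helper: pack one data file into a new artifact and tarball, and
# return the two hashes that go into Artifacts.toml.
function pack_version(content::String, tarball::String)
    tree_hash = create_artifact() do dir
        write(joinpath(dir, "example.mrdi"), content)
    end
    archive_artifact(tree_hash, tarball)
    return tree_hash, bytes2hex(open(sha256, tarball))
end

work = mktempdir()
toml = joinpath(work, "Artifacts.toml")
base = "https://example.org/data/"   # hypothetical hosting location

# Version 1, registered as MyExample.
h1, s1 = pack_version("version 1\n", joinpath(work, "example_data_v1.tar.gz"))
bind_artifact!(toml, "MyExample", h1;
               download_info = [(base * "example_data_v1.tar.gz", s1)], lazy = true)

# Version 2: new contents give a new git-tree-sha1 and a new sha256, so the
# entry must be rebound (force = true overwrites the existing MyExample entry).
h2, s2 = pack_version("version 2\n", joinpath(work, "example_data_v2.tar.gz"))
bind_artifact!(toml, "MyExample", h2;
               download_info = [(base * "example_data_v2.tar.gz", s2)], lazy = true,
               force = true)

println("tree hash changed: ", h1 != h2, ", sha256 changed: ", s1 != s2)
```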
Any files referenced by the Oscar.jl master branch must not be modified, renamed, or deleted, as they may be required by earlier Oscar.jl releases.
When repeatedly debugging or refining artifacts, contributors are encouraged to use temporary staging areas before publishing long-term versions. In particular, publication-related Zenodo entries should typically not be cluttered with intermediate or broken artifact versions created during development.
Serialization Upgrades
Note that the chosen file format for serialization may be subject to development. In particular, the .mrdi file format, which we recommend for serialization, is under active development. Consequently, its standard evolves over time. Older files remain compatible with newer OSCAR versions; however, loading older artifacts may require upgrade steps during deserialization. For large artifacts, these upgrades may become time-consuming. It is therefore recommended to use the most recent serialization standard when creating artifacts and to periodically upgrade older artifacts if appropriate.
Additional details are provided in the Serialization documentation.
Places to look for Inspiration: Artifacts in Upstream Dependencies of Oscar.jl
GAP.jl
GAP.jl uses artifacts to install GAP packages. Detailed maintainer information is provided in GAP.jl/README.maintainer.md.