ocfl.Inventory
OCFL Inventory and Version.
The Inventory class provides storage for inventory data and methods to conveniently access and manipulate it. The associated Version class provides methods to access and manipulate information about a specific object version. Neither of these classes interacts with object content; for that, see ocfl.NewVersion and ocfl.Object.
The storage mechanism for the inventory data is the Python dict() structure that results from reading the inventory JSON file and that is suitable for writing an inventory JSON file. Here we provide convenient property-based access to read and set this data safely without needing intimate knowledge of the JSON format.
Example
>>> import ocfl
>>> inv = ocfl.Inventory(filepath="fixtures/1.1/good-objects/spec-ex-full/inventory.json")
>>> inv.spec_version
'1.1'
>>> inv.version_numbers
[1, 2, 3]
>>> v2 = inv.version("v2")
>>> v2.logical_paths
['foo/bar.xml', 'empty.txt', 'empty2.txt']
>>> v2.digest_for_logical_path("foo/bar.xml")
'4d27c86b026ff709b02b05d126cfef7ec3aed5f83f5e98df7d7592f7a44bd1dc7f29509cff06b884158baa36a2bbeda11ab8a64b56585a70f5ce1fa96e26eb53'
>>> v2.content_path_for_logical_path("foo/bar.xml")
'v2/content/foo/bar.xml'
>>> import ocfl
>>> inv = ocfl.Inventory()
>>> inv.as_json()
'{}'
>>> inv.spec_version = "1.1"
>>> print(inv.as_json())
{
  "type": "https://ocfl.io/1.1/spec/#inventory"
}
>>> inv.id = "http://example.org/minimal_no_content"
>>> inv.digest_algorithm = "sha512"
>>> ver = inv.add_version("v1")
>>> ver.created = "2019-01-01T02:03:04Z"
>>> ver.message = "One version and no content"
>>> ver.user_name = "Person A"
>>> ver.user_address = "mailto:Person_A@example.org"
>>> print(inv.as_json())
{
  "digestAlgorithm": "sha512",
  "head": "v1",
  "id": "http://example.org/minimal_no_content",
  "type": "https://ocfl.io/1.1/spec/#inventory",
  "versions": {
    "v1": {
      "created": "2019-01-01T02:03:04Z",
      "message": "One version and no content",
      "user": {
        "address": "mailto:Person_A@example.org",
        "name": "Person A"
      }
    }
  }
}
>>> validator = ocfl.InventoryValidator()
>>> validator.validate(inv.data)
False
>>> print(str(validator.log))
[E041a] OCFL Object ??? inventory missing `manifest` attribute (see https://ocfl.io/1.1/spec/#E041)
[E048c] OCFL Object ??? inventory v1 version block does not include a state block (see https://ocfl.io/1.1/spec/#E048)
>>> inv.manifest_add_if_not_present()
{}
>>> ver.state_add_if_not_present()
{}
>>> validator = ocfl.InventoryValidator()
>>> validator.validate(inv.data)
True
- class ocfl.Inventory(data=None, filepath=None, pyfs=None)
Class wrapping OCFL inventory data.
In general, the property accessors and methods return None if the attribute is a string that is not set in the underlying data, else the value if it is. Where the normal return value would be an array or a dict, an empty array or empty dict is returned if the attribute is not present in the underlying data. In some cases, additional methods with the suffix _add_if_not_present are provided so that assignments will work for previously missing attributes. See, for example, the manifest property and the corresponding manifest_add_if_not_present() method.
- data
dict that is the top level JSON object of the parsed JSON representation of the inventory file. This is the only place that an Inventory instance stores information.
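The reason for the _add_if_not_present methods can be illustrated with plain dicts. This is a minimal sketch, not the library's code: `data` is a stand-in for Inventory.data, and the use of `dict.get` and `dict.setdefault` is an assumption about how such accessors might behave.

```python
# Plain-dict sketch; `data` stands in for Inventory.data (assumption)
data = {}

# Reading with a default returns a detached empty dict
detached = data.get("manifest", {})
detached["abc123"] = ["v1/content/a.txt"]
missing_after_read = "manifest" not in data  # the addition was lost

# Creating the entry first returns a dict that is stored inside `data`
attached = data.setdefault("manifest", {})
attached["abc123"] = ["v1/content/a.txt"]
print(data)
```

Only the second assignment changes `data`, which is the behavior the `manifest_add_if_not_present()` style of method is documented to provide.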
- __init__(data=None, filepath=None, pyfs=None)
Initialize Inventory object.
- Argument:
- data: If not None (default) then this must be either an Inventory
object or inventory data to initialise from. In either case the underlying data is deep copied to create a separate new object.
- filepath: If not None then the string file path from which to
read a JSON inventory file to initialize from.
- pyfs: pyfs object for the filesystem to use, else None to use the
local filesystem (default). filepath is interpreted within pyfs.
- add_file_to_manifest(*, digest, content_path)
Add file to the manifest.
- Parameters:
digest – the digest string computed with the specified digest algorithm.
content_path – the full content path including version directory, content directory, and the path within the content directory.
Adds an entry to the manifest with the specified digest and the specified content_path. Takes account of multiple content paths with the same digest.
Raises an InventoryException if there is an attempt to add a content_path that is already included.
WARNING: Does not check that the content path is valid in that it is within an extant version directory, or that it is within the specified content_directory for a version.
See also: Version.add_file() to add a file with logical_path in the context of a specific version.
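The manifest behavior described above can be sketched on a plain dict. The function `add_to_manifest` is hypothetical and is not part of ocfl; it raises ValueError where the real method is documented to raise InventoryException.

```python
def add_to_manifest(manifest, digest, content_path):
    """Hypothetical plain-dict stand-in for the behavior described above."""
    for paths in manifest.values():
        if content_path in paths:
            # The real method is documented to raise InventoryException here
            raise ValueError("content path already in manifest: " + content_path)
    # Several content paths may share one digest, so values are lists
    manifest.setdefault(digest, []).append(content_path)


manifest = {}
add_to_manifest(manifest, "abc123", "v1/content/a.txt")
add_to_manifest(manifest, "abc123", "v2/content/copy-of-a.txt")
print(manifest)
```

As with the documented method, this sketch makes no attempt to check that the content path lies inside an existing version directory.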
- add_fixity_data(digest_algorithm, digest, filepath)
Add fixity information for a file.
- Parameters:
digest_algorithm – string of the digest algorithm specifying this fixity type
Assumes that there is already a fixity block and, within that, a block for the specific digest_algorithm.
- add_fixity_type(digest_algorithm, map=None)
Add fixity type with no file data.
- Parameters:
digest_algorithm – string of the digest algorithm specifying this fixity type
map – None (default) to create an empty entry for the specified digest algorithm, else a dict() with mapping from digest to array of files according to the specified digest algorithm
If there is no fixity data then will start a fixity block.
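The relationship between the two fixity methods can be sketched with plain dicts. The helper functions below are hypothetical stand-ins operating on a dict shaped like the inventory data. How the real add_fixity_data treats a digest that is already present is not stated here, so appending to the existing list is an assumption.

```python
def add_fixity_type(data, digest_algorithm, mapping=None):
    """Hypothetical stand-in: start a fixity block if needed, then add the type."""
    fixity = data.setdefault("fixity", {})
    fixity[digest_algorithm] = {} if mapping is None else mapping


def add_fixity_data(data, digest_algorithm, digest, filepath):
    """Hypothetical stand-in: requires the block for digest_algorithm to exist."""
    # Appending to an existing list is an assumption, not documented behavior
    data["fixity"][digest_algorithm].setdefault(digest, []).append(filepath)


data = {}
add_fixity_type(data, "md5")
add_fixity_data(data, "md5", "184f84e2", "v1/content/a.txt")
print(data)
```

Calling `add_fixity_data` before `add_fixity_type` would fail with a KeyError in this sketch, which mirrors the documented assumption that the blocks already exist.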
- add_version(vdir=None, metadata=None, state=None, zero_padded_width=None)
Add new version object to the versions block.
Adds the new version block and also updates the head property to the new version directory name.
- Parameters:
vdir – string with the version directory name (e.g. “v1” or “v0006”). If None then will create the next version in sequence
metadata – dict to initialize version metadata with, else None to create empty (default)
state – either a dict with the state block for the version, an object with an as_dict() method to produce such a dictionary (e.g. from VersionMetadata), else None (default)
zero_padded_width – an integer to set the number of digits used for zero padded identifiers, else None (default). Applies only when creating the first version
Returns a Version object that may be used to access version properties.
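How the next version directory name might be chosen when vdir is None can be sketched as follows. The function `next_version_directory` is hypothetical, and treating zero_padded_width as the count of digits after the "v" is an assumption based on the parameter description above.

```python
def next_version_directory(existing, zero_padded_width=None):
    """Hypothetical sketch: name of the next version directory in sequence."""
    if not existing:
        # The width only matters when creating the first version
        if zero_padded_width is None:
            return "v1"
        return "v" + "1".zfill(zero_padded_width)
    digits = existing[-1][1:]  # strip the "v" prefix from the latest name
    number = int(digits) + 1
    if digits.startswith("0"):
        # Keep the padding width established by earlier versions
        return "v" + str(number).zfill(len(digits))
    return "v" + str(number)


print(next_version_directory([]))
print(next_version_directory(["v0005"]))
```

The sketch expects `existing` to be in version order and does not handle running out of digits in a zero padded sequence.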
- as_json()
Serialize JSON representation.
- property content
All of the content paths and their digests stored within the object.
Returns a dictionary of content paths with values that are the digests for each file. Essentially an inversion of the manifest.
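The inversion of the manifest described above can be sketched with a dict comprehension. The digests and paths are made-up illustration values, and this is not the library's implementation.

```python
# Manifest maps digest -> list of content paths (illustrative values)
manifest = {
    "d1": ["v1/content/a.txt", "v2/content/b.txt"],
    "d2": ["v1/content/c.txt"],
}

# Inverting gives content path -> digest
content = {path: digest for digest, paths in manifest.items() for path in paths}
print(content)
```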
- property content_directory
Get contentDirectory.
- property content_directory_to_use
Get contentDirectory to use, default ‘content’ if not specified.
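The fallback to ‘content’ can be sketched on a plain dict. The function below is a hypothetical stand-in for the property, not the library's code.

```python
def content_directory_to_use(data):
    """Hypothetical stand-in: contentDirectory if set, else 'content'."""
    return data.get("contentDirectory", "content")


print(content_directory_to_use({}))
print(content_directory_to_use({"contentDirectory": "stuff"}))
```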
- content_path_for_digest(digest)
Content path for the given digest.
- Parameters:
digest – string value of digest
Returns a content path or None if there isn’t one for the given digest. There may actually be more than one content path for the given digest, we return the first in the underlying data.
- property content_paths
Get all the content paths.
Returns a list of content paths for all files in the object. Will be an empty list if there is no content.
- content_paths_for_digest(digest)
Content paths for the given digest.
Returns a list of content paths, or an empty list if there isn’t one for the given digest. A list is used because there may be more than one content path for the given digest.
- property current_version
Version object for the current (latest) version directory.
- property digest_algorithm
Get digest algorithm.
- digest_for_content_path(path)
Return digest corresponding to specified content path.
- Argument:
path: string of content path
Returns None if the content path is not specified in the manifest, else the digest string.
- find_logical_path(logical_path)
Find occurrence of logical path in inventory.
- Parameters:
logical_path (str) – logical file path within some version of the object described by the current inventory
- Returns:
- (vdir, content_path) where vdir is the version directory
for the version this logical path was found in, and content_path is the content path of that logical file within that version of the object. In the case that the logical file path doesn’t exist in any version then (None, None) will be returned.
- Return type:
tuple
The method searches backward from the latest version through to the first version. The latest version in which the logical path exists will be returned, not any earlier version, which might correspond to different content.
Example
>>> import ocfl
>>> obj = ocfl.Object(path="fixtures/1.1/good-objects/spec-ex-full")
>>> inv = obj.parse_inventory()
>>> inv.find_logical_path("empty2.txt")
('v3', 'v1/content/empty.txt')
>>> inv.find_logical_path("path that doesn't exist")
(None, None)
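The backward search can be sketched on plain inventory data. This is a hypothetical re-implementation for illustration, not the library's code. The inventory below uses made-up digests, and returning the first content path listed in the manifest is an assumption.

```python
def find_logical_path(data, logical_path):
    """Hypothetical sketch: search versions from latest to first."""
    versions = data.get("versions", {})
    # Sort by the numeric part of the directory name, newest first
    for vdir in sorted(versions, key=lambda v: int(v[1:]), reverse=True):
        state = versions[vdir].get("state", {})
        for digest, logical_paths in state.items():
            if logical_path in logical_paths:
                content_paths = data.get("manifest", {}).get(digest, [])
                # Assumption: use the first content path for the digest
                return (vdir, content_paths[0] if content_paths else None)
    return (None, None)


data = {
    "manifest": {"aaa": ["v1/content/empty.txt"], "bbb": ["v2/content/foo.xml"]},
    "versions": {
        "v1": {"state": {"aaa": ["empty.txt"]}},
        "v2": {"state": {"aaa": ["empty.txt", "empty2.txt"], "bbb": ["foo.xml"]}},
    },
}
print(find_logical_path(data, "empty2.txt"))
```

Note that the version directory returned and the version directory in the content path can differ, because unchanged or duplicated content is stored only once.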
- property fixity
Get fixity block as dict().
Returns fixity block else an empty dict() if there is no fixity block.
- property head
Get head version directory.
- property id
Get object id.
- init_manifest_and_versions()
Initialize manifest and versions blocks for building new inventory.
- property manifest
Get the manifest of digests and corresponding content paths.
Returns dict of digest -> [file paths], or an empty dict if there is no manifest.
- manifest_add_if_not_present()
Get the manifest of digests and corresponding content paths.
Returns dict of digest -> [file paths]. As a side effect will create an empty dict in the data structure if none was present, so that new data can be added.
- normalize_digests(digest_algorithm=None)
Normalize the digests used in manifest and state.
- Parameters:
digest_algorithm – string with the name of the digest algorithm used
No return value. Operates on the current object in-place, normalizing the digest values used in the manifest and state blocks. Does not change any separate fixity information.
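One plausible form of normalization is lowercasing hexadecimal digests, since hex digests compare case-insensitively. The sketch below is an assumption about what normalization involves and not the library's implementation. The function name is hypothetical, and unlike the documented method it returns a new dict rather than working in-place.

```python
def normalize_digest_keys(block):
    """Hypothetical sketch: copy of a digest -> paths dict with lowercase digests."""
    normalized = {}
    for digest, paths in block.items():
        # Entries that differ only by case are merged under one key
        normalized.setdefault(digest.lower(), []).extend(paths)
    return normalized


manifest = {"ABC123": ["v1/content/a.txt"], "abc123": ["v2/content/b.txt"]}
print(normalize_digest_keys(manifest))
```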
- property spec_version
Get specification version from the conformance declaration.
- version(vdir)
Version object for the specified version directory.
- property version_directories
List of all version directories in order.
Returns a list of all versions in the versions block. The values in the list are the version directory names, so they will have the “v” prefix and may or may not be zero padded.
See also: inv.version_numbers for just the numbers: [1, 2, 3].
- property version_numbers
List of all version numbers as integers.
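The relationship between version directory names and version numbers can be sketched as follows. This is a minimal illustration with made-up directory names, not the library's code.

```python
# Directory names carry a "v" prefix and may be zero padded
version_directories = ["v0001", "v0002", "v0003"]

# Dropping the prefix and converting to int discards any padding
version_numbers = [int(vdir[1:]) for vdir in version_directories]
print(version_numbers)
```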
- versiondata(vdir)
Return data for the version in vdir.
Returns a dict whether or not any data exists.
- versions()
Generate Version objects for each version.
Yields a Version() object for each version in this Inventory in numeric order.
- property versions_block
Dict of the versions block.
Returns a dict whether or not there is a versions block in the underlying data.
- write_json(fh)
Serialize JSON representation to file.
- Parameters:
fh – filehandle to write to
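Serialization to a filehandle can be sketched with the standard json module. The function below is a hypothetical stand-in. Sorted keys match the example output earlier in this page, while the indent of 2 is an assumption.

```python
import io
import json


def write_json(data, fh):
    """Hypothetical stand-in: write inventory data as JSON to a filehandle."""
    # sort_keys matches the key order shown in the examples; indent is assumed
    json.dump(data, fh, sort_keys=True, indent=2)


buffer = io.StringIO()
write_json({"type": "https://ocfl.io/1.1/spec/#inventory", "head": "v1"}, buffer)
text = buffer.getvalue()
print(text)
```

Any object with a `write()` method that accepts str, such as a file opened in text mode, would work in place of the StringIO buffer.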