ocfl.Inventory

OCFL Inventory and Version.

The Inventory class provides storage for inventory data and methods to conveniently access and manipulate it. The associated Version class provides methods to access and manipulate information about a specific object version. Neither of these classes interacts with object content; for that, see ocfl.NewVersion and ocfl.Object.

The storage mechanism for the inventory data is the Python dict() structure that results from reading an inventory JSON file and that is suitable for writing one. The classes provide convenient property-based access to read and set this data safely, without requiring detailed knowledge of the JSON format.
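To illustrate what property-based access over a plain dict means, here is a minimal sketch. The class InventorySketch is hypothetical and not part of ocfl. It keeps all state in a dict and maps a spec_version property onto the "type" conformance declaration, using the URL form shown in the examples on this page. The regular expression used to parse that URL is an assumption, not the library's code.

```python
import re


class InventorySketch:
    """Hypothetical property wrapper over an inventory dict (illustration only)."""

    TYPE_TEMPLATE = "https://ocfl.io/{}/spec/#inventory"

    def __init__(self, data=None):
        # As with ocfl.Inventory.data, the dict is the only place state is kept
        self.data = {} if data is None else data

    @property
    def spec_version(self):
        # Derive the version (e.g. "1.1") from the "type" declaration, or None
        m = re.match(r"https://ocfl\.io/([0-9.]+)/spec/#inventory$",
                     self.data.get("type", ""))
        return m.group(1) if m else None

    @spec_version.setter
    def spec_version(self, value):
        # Setting the property writes the full conformance declaration
        self.data["type"] = self.TYPE_TEMPLATE.format(value)
```

Setting spec_version = "1.1" on an empty sketch leaves data as {"type": "https://ocfl.io/1.1/spec/#inventory"}, matching the JSON printed in the example.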

Example

>>> import ocfl
>>> inv = ocfl.Inventory(filepath="fixtures/1.1/good-objects/spec-ex-full/inventory.json")
>>> inv.spec_version
'1.1'
>>> inv.version_numbers
[1, 2, 3]
>>> v2 = inv.version("v2")
>>> v2.logical_paths
['foo/bar.xml', 'empty.txt', 'empty2.txt']
>>> v2.digest_for_logical_path("foo/bar.xml")
'4d27c86b026ff709b02b05d126cfef7ec3aed5f83f5e98df7d7592f7a44bd1dc7f29509cff06b884158baa36a2bbeda11ab8a64b56585a70f5ce1fa96e26eb53'
>>> v2.content_path_for_logical_path("foo/bar.xml")
'v2/content/foo/bar.xml'

>>> import ocfl
>>> inv = ocfl.Inventory()
>>> inv.as_json()
'{}'
>>> inv.spec_version = "1.1"
>>> print(inv.as_json())
{
  "type": "https://ocfl.io/1.1/spec/#inventory"
}
>>> inv.id = "http://example.org/minimal_no_content"
>>> inv.digest_algorithm = "sha512"
>>> ver = inv.add_version("v1")
>>> ver.created = "2019-01-01T02:03:04Z"
>>> ver.message = "One version and no content"
>>> ver.user_name = "Person A"
>>> ver.user_address = "mailto:Person_A@example.org"
>>> print(inv.as_json())
{
  "digestAlgorithm": "sha512",
  "head": "v1",
  "id": "http://example.org/minimal_no_content",
  "type": "https://ocfl.io/1.1/spec/#inventory",
  "versions": {
    "v1": {
      "created": "2019-01-01T02:03:04Z",
      "message": "One version and no content",
      "user": {
        "address": "mailto:Person_A@example.org",
        "name": "Person A"
      }
    }
  }
}
>>> validator = ocfl.InventoryValidator()
>>> validator.validate(inv.data)
False
>>> print(str(validator.log))
[E041a] OCFL Object ??? inventory missing `manifest` attribute (see https://ocfl.io/1.1/spec/#E041)
[E048c] OCFL Object ??? inventory v1 version block does not include a state block (see https://ocfl.io/1.1/spec/#E048)
>>> inv.manifest_add_if_not_present()
{}
>>> ver.state_add_if_not_present()
{}
>>> validator = ocfl.InventoryValidator()
>>> validator.validate(inv.data)
True

class ocfl.Inventory(data=None, filepath=None, pyfs=None)

Class wrapping OCFL inventory data.

In general, property accessors and methods that return a string return None if the corresponding attribute is not set in the underlying data, and the value otherwise. Where the normal return value would be a list or a dict, an empty list or empty dict is returned if the attribute is not present in the underlying data. In some cases, additional methods with the suffix _add_if_not_present are provided so that assignments work for previously missing attributes. See, for example, the manifest property and the corresponding manifest_add_if_not_present() method.
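The difference between a read-only accessor and its _add_if_not_present counterpart can be sketched with plain dict operations. The two functions below are hypothetical stand-ins that take the data dict explicitly; they mirror the documented behavior, not ocfl's actual implementation, and the digest keys are placeholders.

```python
def manifest(data):
    """Read-only accessor: the manifest, or a fresh {} if absent (data unchanged)."""
    return data.get("manifest", {})


def manifest_add_if_not_present(data):
    """Return the manifest, first inserting an empty one into data if absent."""
    return data.setdefault("manifest", {})


data = {}
# Writing through the read-only accessor changes only a throwaway dict
manifest(data)["d1"] = ["v1/content/a.txt"]
# Writing through the _add_if_not_present accessor persists in data
manifest_add_if_not_present(data)["d2"] = ["v1/content/b.txt"]
```

After both assignments only the second entry survives: data is {"manifest": {"d2": ["v1/content/b.txt"]}}.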

data

dict holding the top-level JSON object of the parsed inventory file. This is the only place where an Inventory instance stores information.

__init__(data=None, filepath=None, pyfs=None)

Initialize Inventory object.

Parameters:
  • data – an Inventory object or inventory data to initialize from, else None (default). In either case the underlying data is deep copied to create a separate new object.

  • filepath – string file path of a JSON inventory file to read and initialize from, else None (default).

  • pyfs – pyfs object for the filesystem to use, else None (default) to use the local filesystem. filepath is interpreted within pyfs.

add_file_to_manifest(*, digest, content_path)

Add file to the manifest.

Parameters:
  • digest – the digest string computed with the specified digest algorithm.

  • content_path – the full content path, including the version directory, the content directory, and the path within the content directory.

Adds an entry to the manifest with the specified digest and content_path. Takes account of multiple content paths with the same digest.

Raises an InventoryException if there is an attempt to add a content_path that is already included.

WARNING: Does not check that the content path is valid, that is, that it lies within an extant version directory and within the specified content directory for that version.

See also: Version.add_file() to add a file with logical_path in the context of a specific version.
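The bookkeeping described above can be sketched with a plain manifest dict. This is a hypothetical function that takes the data dict explicitly, and InventoryException here is a local stand-in class, not an import from ocfl. Like the documented method, it does not validate the content path itself.

```python
class InventoryException(Exception):
    """Local stand-in for ocfl's InventoryException in this sketch."""


def add_file_to_manifest(data, *, digest, content_path):
    """Record content_path under digest, rejecting duplicate content paths."""
    manifest = data.setdefault("manifest", {})
    # A content path may appear only once anywhere in the manifest
    for paths in manifest.values():
        if content_path in paths:
            raise InventoryException(
                f"Content path {content_path} is already in the manifest")
    # Several content paths may share one digest (identical content)
    manifest.setdefault(digest, []).append(content_path)
```

Adding v1/content/a.txt and v1/content/b.txt under the same digest yields one manifest entry listing both paths; adding either path again raises the exception.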

add_fixity_data(digest_algorithm, digest, filepath)

Add fixity information for a file.

Parameters:

digest_algorithm – string of the digest algorithm specifying this fixity type

Assumes that there is already a fixity block and, within that, a block for the specific digest_algorithm.

add_fixity_type(digest_algorithm, map=None)

Add fixity type with no file data.

Parameters:
  • digest_algorithm – string of the digest algorithm specifying this fixity type

  • map – None (default) to create an empty entry for the specified digest algorithm, else a dict() with mapping from digest to array of files according to the specified digest algorithm

If there is no fixity block, one is created.
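A minimal sketch of how the two fixity methods fit together on a plain dict follows. The functions are hypothetical stand-ins that take the data dict explicitly; the fixity layout (algorithm, then digest, then list of content paths) follows the parameter descriptions above, and the digest value is a placeholder.

```python
def add_fixity_type(data, digest_algorithm, map=None):
    """Create the fixity block if needed and add an entry for digest_algorithm."""
    # The parameter is named "map" to mirror the documented signature
    fixity = data.setdefault("fixity", {})
    fixity[digest_algorithm] = {} if map is None else map


def add_fixity_data(data, digest_algorithm, digest, filepath):
    """Append filepath under digest; the algorithm block must already exist."""
    block = data["fixity"][digest_algorithm]  # KeyError if the block is missing
    block.setdefault(digest, []).append(filepath)
```

Calling add_fixity_type(d, "md5") and then add_fixity_data(d, "md5", "d1", "v1/content/foo/bar.xml") on an empty dict d produces {"fixity": {"md5": {"d1": ["v1/content/foo/bar.xml"]}}}. Calling add_fixity_data first fails, which reflects the stated assumption that the blocks already exist.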

add_version(vdir=None, metadata=None, state=None, zero_padded_width=None)

Add new version object to the versions block.

Adds the new version block and also updates the head property to the new version directory name.

Parameters:
  • vdir – string with the version directory name (e.g. “v1” or “v0006”). If None, the next version in sequence is created

  • metadata – dict to initialize version metadata with, else None to create empty (default)

  • state – either a dict with the state block for the version, an object with an as_dict() method to produce such a dictionary (e.g. from VersionMetadata), else None (default)

  • zero_padded_width – an integer setting the number of digits used for zero-padded version identifiers, else None (default). Applies only when creating the first version

Returns a Version object that may be used to access version properties.
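When vdir is None, the next version directory name has to be derived from the current head. The sketch below shows one way to do this while preserving zero padding. The function next_version_dir is hypothetical, and its rule that a leading zero indicates a zero-padded scheme is an assumption. A padded name without a leading zero, such as v9999, cannot be told apart from an unpadded one by its name alone.

```python
import re


def next_version_dir(head=None, zero_padded_width=None):
    """Hypothetical: name of the version directory that follows head."""
    if head is None:
        # First version: "v1", or e.g. "v0001" when zero_padded_width=4
        return "v" + "1".zfill(zero_padded_width or 1)
    m = re.fullmatch(r"v(\d+)", head)
    if m is None:
        raise ValueError(f"Not a version directory name: {head!r}")
    digits = m.group(1)
    n = int(digits) + 1
    if digits.startswith("0"):
        # Heuristic: a leading zero means zero-padded, so keep the same width
        return "v" + str(n).zfill(len(digits))
    return "v" + str(n)
```

For example, "v1" is followed by "v2", "v9" by "v10", and "v0006" by "v0007".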

as_json()

Serialize JSON representation.
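The example output earlier on this page has keys in sorted order with two-space indentation. The standard json module reproduces that layout as shown below. These functions are a sketch; the exact serialization options ocfl uses are assumed, not confirmed.

```python
import io
import json


def as_json(data):
    # Sorted keys and 2-space indentation match the example output above
    return json.dumps(data, sort_keys=True, indent=2)


def write_json(data, fh):
    # Same layout, written to an open text filehandle
    json.dump(data, fh, sort_keys=True, indent=2)


buf = io.StringIO()
write_json({"type": "https://ocfl.io/1.1/spec/#inventory"}, buf)
```

An empty dict serializes as '{}', which matches inv.as_json() for a new Inventory in the example.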

property content

All of the content paths and their digests stored within the object.

Returns a dictionary of content paths with values that are the digests for each file. Essentially an inversion of the manifest.
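Inverting the manifest takes only a single dict comprehension. This is a sketch on plain dicts, not ocfl's code, and the digests are placeholders. Each content path appears only once in a manifest, so no information is lost, even though several paths may map to the same digest.

```python
def content(manifest):
    """Invert {digest: [content paths]} into {content path: digest}."""
    return {path: digest for digest, paths in manifest.items() for path in paths}
```

For example, {"d1": ["v1/content/a.txt", "v2/content/b.txt"]} becomes {"v1/content/a.txt": "d1", "v2/content/b.txt": "d1"}.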

property content_directory

Get contentDirectory.

property content_directory_to_use

Get contentDirectory to use, defaulting to ‘content’ if not specified.

content_path_for_digest(digest)

Content path for the given digest.

Parameters:

digest – string value of digest

Returns a content path, or None if there isn’t one for the given digest. There may be more than one content path for the given digest, in which case the first one in the underlying data is returned.

property content_paths

Get all the content paths.

Returns a list of content paths for all files in the object. This will be an empty list if there is no content.

content_paths_for_digest(digest)

Content paths for the given digest.

Returns a list of content paths, or an empty list if there are none for the given digest. A list is used because there may be more than one content path for the given digest.
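The single-path and all-paths lookups can be sketched as manifest reads. These hypothetical functions take the manifest dict directly. Returning a copy of the list is a design choice in this sketch that keeps callers from modifying the manifest through the result; it is not necessarily what ocfl does.

```python
def content_paths_for_digest(manifest, digest):
    """All content paths for digest, or [] if the digest is not present."""
    # Copy so callers cannot modify the manifest through the returned list
    return list(manifest.get(digest, []))


def content_path_for_digest(manifest, digest):
    """First content path for digest in the underlying data, else None."""
    paths = manifest.get(digest, [])
    return paths[0] if paths else None
```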

property current_version

Version object for the current (latest) version directory.

property digest_algorithm

Get digest algorithm.

digest_for_content_path(path)

Return digest corresponding to specified content path.

Parameters:

path – string of content path

Returns None if the content path is not in the manifest, else the digest string.

find_logical_path(logical_path)

Find occurrence of logical path in inventory.

Parameters:

logical_path (str) – logical file path within some version of the object described by the current inventory

Returns:

(vdir, content_path) where vdir is the version directory for the version this logical path was found in, and content_path is the content path of that logical file within that version of the object. In the case that the logical file path doesn’t exist in any version, (None, None) is returned.

Return type:

tuple

The method searches backward from the latest version to the first version. The latest version that the logical path exists in is returned, not any earlier version that might correspond to different content.

Example

>>> import ocfl
>>> obj = ocfl.Object(path="fixtures/1.1/good-objects/spec-ex-full")
>>> inv = obj.parse_inventory()
>>> inv.find_logical_path("empty2.txt")
('v3', 'v1/content/empty.txt')
>>> inv.find_logical_path("path that doesn't exist")
(None, None)
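The backward search can be sketched over plain inventory data. Each version's state maps a digest to the logical paths that carry it, and the manifest maps that digest to content paths. The function below is hypothetical. Returning the first manifest path when several share a digest is an assumption, consistent with content_path_for_digest. The test data is modeled loosely on the spec-ex-full fixture, with placeholder digests.

```python
def find_logical_path(data, logical_path):
    """Return (vdir, content_path) for the latest version containing logical_path."""
    versions = data.get("versions", {})
    manifest = data.get("manifest", {})
    # Walk from the newest version back to the first, comparing numerically
    for vdir in sorted(versions, key=lambda v: int(v[1:]), reverse=True):
        for digest, logical_paths in versions[vdir].get("state", {}).items():
            if logical_path in logical_paths:
                paths = manifest.get(digest, [])
                return vdir, (paths[0] if paths else None)
    return None, None


inv_data = {
    "manifest": {"e": ["v1/content/empty.txt"], "b": ["v1/content/foo/bar.xml"]},
    "versions": {
        "v1": {"state": {"e": ["empty.txt"], "b": ["foo/bar.xml"]}},
        "v2": {"state": {"e": ["empty.txt", "empty2.txt"]}},
        "v3": {"state": {"e": ["empty2.txt"]}},
    },
}
```

With this data, "empty2.txt" is found first in v3, and its content is the file stored in v1, so the result is ('v3', 'v1/content/empty.txt').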

property fixity

Get fixity block as dict().

Returns fixity block else an empty dict() if there is no fixity block.

property head

Get head version directory.

property id

Get object id.

init_manifest_and_versions()

Initialize manifest and versions blocks for building new inventory.

property manifest

Get the manifest of digests and corresponding content paths.

Returns dict of digest -> [file paths], or an empty dict if there is no manifest.

manifest_add_if_not_present()

Get the manifest of digests and corresponding content paths.

Returns dict of digest -> [file paths]. As a side effect will create an empty dict in the data structure if none was present, so that new data can be added.

normalize_digests(digest_algorithm=None)

Normalize the digests used in manifest and state.

Parameters:

digest_algorithm – string with the name of the digest algorithm used

No return value. Operates on the current object in-place, normalizing the digest values used in the manifest and state blocks. Does not change any separate fixity information.
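An in-place normalization pass over the manifest and state blocks might look like the sketch below. It is a hypothetical function that assumes every digest is a hexadecimal string whose normal form is lowercase, and it ignores the digest_algorithm argument. Merging entries whose digests differ only in case is a design choice of this sketch. As documented, the fixity block is left untouched.

```python
def normalize_digests(data):
    """Lowercase digest keys in the manifest and in every version's state."""
    blocks = [data.get("manifest")]
    blocks += [v.get("state") for v in data.get("versions", {}).values()]
    for block in blocks:
        if not block:
            continue
        normalized = {}
        for digest, paths in block.items():
            # Digests differing only in case collapse onto one key
            normalized.setdefault(digest.lower(), []).extend(paths)
        # Update the existing dict so other references see the change
        block.clear()
        block.update(normalized)
```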

property spec_version

Get specification version from the conformance declaration.

version(vdir)

Version object for the specified version directory.

property version_directories

List of all version directories in order.

Returns a list of all versions in the versions block. The values in the list are the version directory names, so they will have the “v” prefix and may or may not be zero-padded.

See also: inv.version_numbers for just the numbers, e.g. [1, 2, 3].

property version_numbers

List of all version numbers as integers.
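Ordering must be numeric rather than lexical, because a plain string sort puts "v10" before "v2" in unpadded names. The hypothetical helpers below sketch both properties over a versions-block dict. They assume every key has the form "v" followed by digits.

```python
def version_numbers(versions_block):
    """Integers for each version directory, e.g. "v0003" -> 3, in numeric order."""
    return sorted(int(vdir[1:]) for vdir in versions_block)


def version_directories(versions_block):
    """Directory names ordered numerically, whatever their zero padding."""
    return sorted(versions_block, key=lambda vdir: int(vdir[1:]))
```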

versiondata(vdir)

Return data for the version in vdir.

Returns a dict whether or not any data exists.

versions()

Generate Version objects for each version.

Yields a Version() object for each version in this Inventory in numeric order.

property versions_block

Dict of the versions block.

Returns a dict whether or not there is a versions block in the underlying data.

write_json(fh)

Serialize JSON representation to file.

Parameters:

fh – filehandle to write to