DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multi-modal data with a Pythonic API.

💡 DocArray was released under the open-source Apache License 2.0 in January 2022. It is currently a sandbox project under the LF AI & Data Foundation.

DocArray is the common data layer used in all Jina AI products.

Release Note (`0.20.0`)

This release contains 8 new features, 3 bug fixes and 7 documentation improvements.

🆕 Features

Milvus document store (#587)

This release supports the Milvus vector database as a document store.

from docarray import DocumentArray
da = DocumentArray(storage='milvus', config={'n_dim': 3})

Root_id for document stores (#808)

When working with a vector database you can now retrieve the root document even if you search at a nested level with sub-indices (for example at chunk level).

top_level_matches = da.find(query=np.random.rand(512), on='@.[image]', return_root=True)

To make this possible, DocArray now stores the root_id in each chunk's tags. Enable this by passing root_id=True in your document store configuration.
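The idea behind root_id can be sketched without a vector database. The snippet below is a simplified illustration, not DocArray's implementation: documents are plain dictionaries, and the helper names `tag_chunks_with_root_id` and `roots_for_matches` as well as the `'root_id'` tag key are assumptions made for this example.

```python
def tag_chunks_with_root_id(doc, root_id=None):
    """Recursively write the top-level document's id into every chunk's tags."""
    root_id = doc["id"] if root_id is None else root_id
    for chunk in doc.get("chunks", []):
        chunk.setdefault("tags", {})["root_id"] = root_id
        tag_chunks_with_root_id(chunk, root_id)
    return doc


def roots_for_matches(matched_chunks, docs_by_id):
    """Map chunk-level matches back to their root documents, without duplicates."""
    seen, roots = set(), []
    for chunk in matched_chunks:
        rid = chunk["tags"]["root_id"]
        if rid not in seen:
            seen.add(rid)
            roots.append(docs_by_id[rid])
    return roots


doc_a = tag_chunks_with_root_id(
    {"id": "a", "chunks": [{"id": "a1", "chunks": [{"id": "a1x"}]}]}
)
doc_b = tag_chunks_with_root_id({"id": "b", "chunks": [{"id": "b1"}]})
docs_by_id = {"a": doc_a, "b": doc_b}

# Pretend a nested search returned these chunks as matches (hypothetical data).
matches = [doc_a["chunks"][0]["chunks"][0], doc_b["chunks"][0], doc_a["chunks"][0]]
root_ids = [d["id"] for d in roots_for_matches(matches, docs_by_id)]
print(root_ids)
```

Because every chunk carries the id of its root, a match at any nesting depth can be resolved with a single lookup instead of walking up the parent chain.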

Filtering based on text keywords for Qdrant (#849)

You can now filter based on text keywords for the Qdrant document store.

filter = {
    'must': [
        {"key": "info", "match": {"text": "shoes"}}
    ]
}

results = da.find(np.random.rand(n_dim), filter=filter)
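To make the filter semantics concrete, here is a rough sketch in plain Python of what a `must` list with a text match expresses: every condition has to hold for a document to be kept. The `matches_filter` helper and the sample payloads are hypothetical, and the case-insensitive substring check is an assumption. Qdrant's actual matching depends on how its full-text index and tokenizer are configured.

```python
def matches_filter(payload, flt):
    """Return True if every 'must' condition holds for the payload."""
    for cond in flt.get("must", []):
        value = str(payload.get(cond["key"], ""))
        needle = cond["match"]["text"]
        # Assumption: approximate text matching as a case-insensitive substring test.
        if needle.lower() not in value.lower():
            return False
    return True


# Hypothetical payloads standing in for Document tags.
payloads = [
    {"info": "red running shoes"},
    {"info": "leather jacket"},
    {"info": "Shoes for hiking"},
]
flt = {"must": [{"key": "info", "match": {"text": "shoes"}}]}

kept = [p["info"] for p in payloads if matches_filter(p, flt)]
print(kept)
```

In a real query the filter is evaluated by the database together with the vector search, so only documents that pass the filter are ranked by similarity.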

RGB-D representation of 3D meshes (#753)

DocArray already supports 3D mesh representation in several formats. This release adds support for RGB-D representation.

doc.load_uris_to_rgbd_tensor()

Load multi-page TIFF files into chunks (#845)

Multi-page TIFF images can now be loaded with load_uri_to_image_tensor(). Each page is loaded into its own chunk.

d = Document(uri="foo.tiff")
d.load_uri_to_image_tensor()
print(d)

<Document ('id', 'uri', 'chunks') at 7f907d786d6c11ec840a1e008a366d49>
  └─ chunks
     ├─ <Document ('id', 'parent_id', 'granularity', 'tensor') at 7aa4c0ba66cf6c300b7f07fdcbc2fdc8>
     ├─ <Document ('id', 'parent_id', 'granularity', 'tensor') at bc94a3e3ca60352f2e4c9ab1b1bb9c22>
     └─ <Document ('id', 'parent_id', 'granularity', 'tensor') at 36fe0d1daf4442ad6461c619f8bb25b7>

Store key frame indices when loading video tensor from uri (#880)

keyframe_indices are now stored in a Document's tags when loading a video to tensor. This allows extracting the section of the video between key frames.

d = Document(uri="video.mp4").load_uri_to_video_tensor()
print(d.tags['keyframe_indices'])

[0, 25, 196, ...]
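As a minimal sketch of how these indices can be used, the snippet below cuts a frame sequence into sections that each start at a key frame. The `segments_between_keyframes` helper is hypothetical, and a list of integers stands in for the video frames. The same slicing works on the first axis of a NumPy video tensor.

```python
def segments_between_keyframes(frames, keyframe_indices):
    """Split frames into sections, each starting at a key frame."""
    # Append the total frame count so the last section runs to the end.
    bounds = list(keyframe_indices) + [len(frames)]
    return [frames[start:end] for start, end in zip(bounds, bounds[1:])]


frames = list(range(10))  # stand-in for 10 video frames
segments = segments_between_keyframes(frames, [0, 4, 7])
print(segments)
```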

Better plotting of embeddings for nested and complex data (#891)

You can now choose which meta fields to exclude when calling DocumentArray's plot_embeddings() method. This makes it easier to plot embeddings for complex and nested data.

docs.plot_embeddings(exclude_fields_metas=['chunks'])

Better support for information retrieval evaluation (#826)

This release adds a max_rel_per_label parameter to better support metric calculations that require the number of relevant Documents.

metrics = da.evaluate(['recall_at_k'], max_rel_per_label={i: 1 for i in range(3)})
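Why the number of relevant Documents matters can be seen with recall: its denominator is the total count of relevant items, which the retrieved matches alone cannot reveal. The function below is an illustrative sketch, not DocArray's implementation, and its name and signature are assumptions for this example.

```python
def recall_at_k(binary_relevance, k, max_rel):
    """Fraction of all relevant documents that appear in the top k results."""
    if max_rel <= 0:
        return 0.0
    return sum(binary_relevance[:k]) / max_rel


# 1 marks a relevant match, 0 an irrelevant one (hypothetical ranking).
relevance = [1, 0, 1, 0, 0]

# With the true number of relevant documents (4), two hits give a recall of 0.5.
recall_known_total = recall_at_k(relevance, k=5, max_rel=4)

# Without that number, counting only the relevant hits among the matches
# makes the same ranking look perfect.
recall_from_matches_only = recall_at_k(relevance, k=5, max_rel=sum(relevance))
print(recall_known_total, recall_from_matches_only)
```

Passing the per-label counts explicitly avoids the second, overly optimistic result.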

🐞 Bug Fixes

Support length calculation independently from list-like behavior (#840)

Our prior minor release, DocArray 0.19, added the ability to instantiate a document store without list-like behavior for improved performance. However, calculating the length of certain document stores relied on such list-like behavior. This release fixes length calculation for the Redis document store, making it independent from list-like behavior.

Remove cosine similarity field with false assignment (#835)

In the Weaviate document store, cosine distance is no longer mistakenly assigned to the cosine_similarity field.
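The two quantities are easy to confuse because they are complementary. The sketch below assumes the common convention that cosine distance equals 1 minus cosine similarity. The helper functions are illustrative and not part of DocArray.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cosine_distance(a, b):
    """Assumed convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)


same = ([1.0, 0.0], [1.0, 0.0])
orthogonal = ([1.0, 0.0], [0.0, 1.0])

# Identical vectors: similarity is high, distance is low.
print(cosine_similarity(*same), cosine_distance(*same))
# Orthogonal vectors: similarity is low, distance is high.
print(cosine_similarity(*orthogonal), cosine_distance(*orthogonal))
```

Storing a distance in a field named after similarity would invert the ranking for anyone who sorts on that field.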

Rebuild index after clearing storage (#837)

The index for Redis and Elasticsearch document stores is now rebuilt when _clear_storage is called.

📗 Documentation Improvements

Correct Document description (#842)
Minor correction in Document description (#834)
Add username to DocArray pull (#847)
Fix broken docs (#805)
Fix data management section (#801)
Change logic order according to blog (#797)
Move cloud support to integrations (#798)

🤘 Contributors

We would like to thank all contributors to this release:

Delgermurun (@delgermurun)
Anne Yang (@AnneYang720)
anna-charlotte (@anna-charlotte)
Johannes Messner (@JohannesMessner)
Alex Cureton-Griffiths (@alexcg1)
AlaeddineAbdessalem (@alaeddine-13)
dong xiang (@dongxiang123)
coolmian (@coolmian)
Joan Fontanals (@JoanFM)
Nan Wang (@nan-wang)
samsja (@samsja)
Michael Günther (@guenthermi)
