Trustworthiness of Open Source Artificial Intelligence

Generative AI, a form of artificial intelligence (AI), presents vast possibilities for open exploration and advancement. However, the evolution of AI into a commercial product has raised concerns about transparency and reproducibility, as well as security, privacy, and safety.

Debates have emerged over the risks and benefits of open sourcing AI models. This is familiar territory for the open source community, where initial skepticism has often given way to acceptance. It is nonetheless important to recognize the distinctions between open source code and open source AI models.

Defining Open Source AI

The definition of an “open source AI model” is still evolving as researchers and industry practitioners work to pin it down. Rather than wading into that debate, our goal here is to show how the IBM Granite models embody openness and why an open source model earns trust.

The Significance of Open Source Licenses

Central to the open source movement is the practice of releasing software code under licenses that give users the freedom to inspect, modify, and redistribute it without constraint. OSI-approved licenses such as Apache 2.0 and MIT have been pivotal in fostering global collaborative development, preserving freedom of choice, and accelerating progress.

Models such as IBM Granite and its variants are released under the permissive Apache 2.0 license. Yet even AI models released under permissive licenses face several challenges, which we discuss below.

The Role of Open Data

The “large” in “large language model” (LLM) refers to the enormous volume of data required to train the model, as well as the many parameters that make it up. A model’s capability is often characterized by the number of input tokens consumed during training, frequently trillions for a capable model.
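To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. The parameter and token counts are illustrative assumptions, not figures for Granite or any other specific model, and the 6 × parameters × tokens formula is only a common rule-of-thumb estimate of training compute.

```python
# Back-of-the-envelope scale of LLM training.
# All numbers are illustrative assumptions, not official figures.

params = 8e9      # model size: ~8 billion parameters (assumed)
tokens = 2.5e12   # training data: ~2.5 trillion tokens (assumed)

# Common rule of thumb: training compute ≈ 6 * parameters * tokens FLOPs.
flops = 6 * params * tokens

print(f"tokens per parameter: {tokens / params:.0f}")
print(f"approximate training compute: {flops:.2e} FLOPs")
```

Even at this rough level, the arithmetic makes clear why retraining a model from scratch is out of reach for most contributors, a point we return to below.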

In contrast to closed models, where the data sources used for pre-training and fine-tuning are kept secret and treated as a key differentiator from competing products, we believe that the data used to pre-train and fine-tune an AI model must be disclosed for it to genuinely qualify as open source.
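As a purely hypothetical illustration, such disclosure could take the form of a machine-readable manifest published alongside the model weights. The format and field names below are invented for this sketch; they are not an IBM Granite artifact or any existing standard.

```yaml
# Hypothetical training-data disclosure manifest.
# Invented format, for illustration only.
model: example-open-llm
training_data:
  pretraining:
    - source: filtered-web-crawl
      tokens: "1.9e12"
      license: documented per subsource
    - source: permissively-licensed-code
      tokens: "0.5e12"
      license: Apache-2.0 / MIT, per repository
  fine_tuning:
    - source: community-instruction-data
      tokens: "1.0e9"
      license: Apache-2.0
```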

The Freedom to Modify and Share

This brings us to the challenges faced by models released under permissive licenses:

  • Because of how these models are built and distributed, direct contributions to the models themselves are not feasible. Community contributions instead take the form of forks of the original model, forcing consumers to settle for a “best-fit” model that is not easily extensible and leaving those forks cumbersome for model creators to maintain.
  • Many people find it difficult to fork, train, and improve models because of limited familiarity with AI and machine learning (ML) technologies.
  • There is no community governance or standardized practice for reviewing, curating, and distributing forked models.

Red Hat and IBM have introduced InstructLab, a model-agnostic open source AI project that simplifies the process of contributing to LLMs. InstructLab enables model upstreams with sufficient infrastructure resources to produce regular builds of their open source licensed models.

These builds are not aimed at completely rebuilding and retraining the model, but rather at refining it with new skills and knowledge. The projects can then accept pull requests for these enhancements, which are incorporated into the next build.

In essence, InstructLab enables community contributions to AI models without requiring outright forks. Contributions can be sent “upstream,” allowing developers to strengthen the original model with updated taxonomies, which can then be shared with other users and contributors.
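As a concrete sketch, an InstructLab skill contribution is a small qna.yaml file of seed examples placed in the project’s taxonomy tree. The example below follows the compositional-skill schema published in the upstream taxonomy repository at the time of writing; the date-formatting skill and the contributor name are placeholders, and the exact schema may differ between versions.

```yaml
# Sketch of an InstructLab compositional-skill contribution (qna.yaml).
# Skill, seed examples, and contributor name are placeholders.
version: 2
task_description: 'Teach the model to convert written dates to ISO 8601.'
created_by: example-contributor   # GitHub username (placeholder)
seed_examples:
  - question: Convert "March 5, 2024" to ISO 8601 format.
    answer: "2024-03-05"
  - question: Convert "July 4, 1776" to ISO 8601 format.
    answer: "1776-07-04"
  - question: Convert "1 January 2000" to ISO 8601 format.
    answer: "2000-01-01"
```

A contributor opens a pull request with a file like this; maintainers can then use InstructLab’s tooling to generate synthetic training data from the seed examples and fold the result into the next regular model build, which is how changes flow “upstream” without a fork.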

The Impact of Modifiability and Sharing on Security and Safety

This effort lets community members add their own data to the base model in a trusted way. Through the taxonomy, they can tune the model’s safety behavior and integrate supplementary safety measures. The community can thus improve the model’s security and safety posture without revisiting the pre-training phase, a process that is both expensive and time-consuming.

IBM and Red Hat are active participants in the AI Alliance, a group working to define open source AI across the industry with respect to governance, protocols, and norms.

An open, transparent, and responsible approach to AI will drive advances in AI safety, giving developers and researchers in the open community the means to address the significant risks of AI and mitigate them with appropriate solutions.

Take a deeper dive: learn more about InstructLab.
