Blockchain and trust
Media hype around blockchain has been subsumed by AI media hype over the past two years. Both technologies are relatively new, though AI has the longer pedigree, going back to the ancient concepts of the golem and the artificial human. You might think that blockchain begins with hash functions and distributed programming. But Leslie Lamport's work on distributed systems, on ordering events in time and on establishing trust, is a necessary precursor to decentralized trust and therefore to blockchain. So figure at least 40 years for blockchain, and 80 for the current forms of AI.
Distributed computing, in which machines solve problems collaboratively, requires an ordering in time as well as a means of establishing a single version of the truth across a set of computers, some of which may be faulty or malicious. Distributed computing and distributed storage are necessary conditions for decentralization; independent governance of the distributed machines is what makes it decentralized. Decentralization therefore depends on the nature and spread of the entities that control a distributed compute and storage infrastructure. By these measures, even Bitcoin can hardly be considered decentralized: a handful of mining pools (five at minimum) control mining, and a handful of large institutions, including exchanges, control the on- and off-ramps to the Bitcoin ecosystem. Whales hold 93% of Bitcoin.
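Lamport's contribution mentioned above, ordering events in time across machines, can be sketched with logical clocks. This is a minimal illustrative sketch, not code from the article: each process keeps a counter, increments it on local events, and on receiving a message advances to one past the larger of its own clock and the message's timestamp.

```python
# Minimal sketch of Lamport logical clocks (illustrative, hypothetical code):
# each process keeps a counter; receiving a message advances the clock to
# max(local, received) + 1, giving a consistent "happened-before" ordering.

class Process:
    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock  # this timestamp travels with the message

    def receive(self, msg_timestamp):
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock

a, b = Process("A"), Process("B")
a.local_event()          # A's clock: 1
t = a.send()             # A's clock: 2, message carries timestamp 2
b.local_event()          # B's clock: 1
b.receive(t)             # B's clock: max(1, 2) + 1 = 3
print(a.clock, b.clock)  # prints: 2 3
```

The send always happens before the receive in this ordering, which is exactly the kind of guarantee a blockchain's transaction ordering builds upon.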
AI challenges
Well-known AI problems include private data leakage, unmanaged energy consumption, continual training that re-ingests the AI's own output, making partitioned and private data available to build tailored solutions, and compensating the owners of private data used to train models. Some of these problems can be addressed by integrating blockchains with AI. The general challenges are sketched in the opening section.
The article proceeds by describing a startup, Modelex.ai. Most quotes about how it works come from an interview I conducted with Jamiel Sheikh, CEO of Modelex.ai. I also read the white paper [9]. Jamiel says his startup is in the distributed camp; its solutions sit in the federated AI arena. He proposes several extensions to the model, notably tokenomics, consensus, and AI-based monetization using a decentralized autonomous AI organization (DAAIO).
AI has typically been controlled by single entities. By AI, we mean deep-learning-based large language models (LLMs) similar to ChatGPT, image generation as embodied in Stable Diffusion v1.5, audio-to-text and the reverse (text-to-audio), and the ultimate form, video generation such as Sora, or cinema, since it combines image, audio, and more. Current training methods for AI require enormous amounts of data: nearly everything produced by humanity that is digitized and accessible. Vast amounts of data prevent overfitting. Overfitting forces the model to specialize on the small amount of data it has seen, so it cannot predict accurately beyond it. Open-source models have disrupted this story.
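The overfitting point above can be made concrete with a toy example (a hypothetical sketch, not from the article): a very flexible model trained on only a few points memorizes their noise and predicts poorly at a point it has not seen, while a simple model matching the true structure generalizes.

```python
# Illustration of overfitting (hypothetical sketch): with few data points,
# a very flexible model memorizes the noise and predicts poorly outside
# the points it has seen.
import numpy as np

x_train = np.linspace(0.0, 1.0, 6)
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
y_train = 2.0 * x_train + noise            # underlying truth: y = 2x

overfit = np.polyfit(x_train, y_train, 5)  # degree 5: interpolates the noise
simple = np.polyfit(x_train, y_train, 1)   # degree 1: matches the true model

# Predict at a new point the models have not seen.
x_new, y_true = 1.5, 3.0
err_overfit = abs(np.polyval(overfit, x_new) - y_true)
err_simple = abs(np.polyval(simple, x_new) - y_true)
print(err_overfit > err_simple)  # True: the flexible model is far worse
```

More (and more varied) training data shrinks the flexible model's wiggles, which is exactly why current LLM training is so data-hungry.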
There are problems with this data-heavy approach. First, if the data consumed by the model was itself generated by AI, its tone and content lack the nuance and variability of original content. The AI begins to eat its own output and can degrade into bias and ineffectiveness. A single word of Greek origin, "autophagy," describes this phenomenon. Such a development is not some fantasy future scenario; it has been observed in the wild as the amount of AI-generated content has exploded. The second problem is data confidentiality. All the data used to train AI was in the public domain or publicly accessible, even when protected by copyright. This includes scraped portal data that was never meant to be used this way, such as YouTube video transcripts or the entire content of the New York Times. A rough estimate puts publicly accessible data at only the tip of the iceberg, around 5%. Private data locked up in institutions, businesses, and other entities makes up the other 95%. This private data is also cleaner, because it has been through some kind of data validation process.
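The autophagy effect can be simulated in miniature. This is a toy analogue, entirely our own construction: a simple generative model (a Gaussian) is repeatedly refit to small samples drawn from its own previous output, and its variability steadily collapses, just as variability is claimed to drain out of AI models trained on AI-generated content.

```python
# "Autophagy" sketch (hypothetical toy model): a generative model repeatedly
# fit to its own output. Each generation fits a Gaussian to a small sample
# drawn from the previous generation's model; variability collapses.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.0, 1.0              # generation 0: the "human" data distribution
initial_sigma = sigma

for generation in range(100):
    samples = rng.normal(mu, sigma, size=10)   # model trains on its own output
    mu, sigma = samples.mean(), samples.std()  # refit on the small sample

print(sigma < initial_sigma)  # True: the distribution has narrowed
```

The small-sample refit systematically underestimates the spread, so over many generations the model's output narrows toward near-uniformity.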
DeepSeek: an open AI model
Developments such as DeepSeek have shown comparable performance without massive data, the latest chips, or long training runs. The time and compute for inference (actual use) with DeepSeek goes up. DeepSeek is also an open model.
An open model means that all of the model's source code is open. In addition, the model's weights are visible. Anyone can modify the model and retrain or fine-tune it using their own data. This definition by the OSI drew pushback, because under it the training data does not have to be shared for the model to count as open source. According to critics, the training data is the source code of AI; without sharing the training data, a model cannot be considered open source. The OSI defended its definition.
The Modelex method depends on open models keeping private training data private, while the model and its intermediate outputs stay open. It addresses the confidentiality conundrum: how to train the best targeted AI for a particular field (a domain-specific AI model), augmenting publicly accessible data and improving results, when some of the data is private by law or by its controller's choice. The ecosystem offers a way to prove the model's provenance as well as to monetize the effort devoted to training it.
In certain areas, such as health care, hospitals are barred from sharing private data by HIPAA rules. Take just a single type of data, X-rays: taking a public AI model and training it further using only your hospital's private X-rays will improve the model. A better way, though, is to use X-rays from many hospitals, which improves the model far more than any one hospital could. Modelex.ai invented a way to share this learning in a federated setting, without losing confidentiality.
An open model is taken and then trained serially across the federation's hospitals, in a randomized daisy chain. In other words, hospital 1 takes an open, pre-trained model and trains it on its private data. Hospital 1 then releases its retrained model to the federation, and hospital 2 trains the same AI on its own private data. This continues until every hospital in the federation has trained it. The order of trainers can be random. The AI thus refined on the federation's private data is available only within the federation. The blockchain part is there to prove the improvements, to pay the contributors, and to keep the data private.
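The daisy-chain scheme described above can be sketched in a few lines. This is a hypothetical simplification (the toy linear model, function names, and hospital data are ours, not Modelex.ai's actual API): each hospital fine-tunes the shared weights on its own private data, then hands the updated model to the next hospital; the data itself never leaves.

```python
# Sketch of daisy-chain federated training (hypothetical illustration):
# the model travels from hospital to hospital; each fine-tunes it on its
# private data, which never leaves the hospital.
import random

def fine_tune(weights, private_data, lr=0.1, steps=50):
    """One hospital's local training: gradient descent on y = w*x + b."""
    w, b = weights
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in private_data:
            err = (w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(private_data)
        b -= lr * gb / len(private_data)
    return (w, b)

# Private datasets stay inside each hospital (all drawn from y = 2x + 1).
federation = {
    "hospital_1": [(-1.0, -1.0), (1.0, 3.0)],
    "hospital_2": [(-0.5, 0.0), (0.5, 2.0)],
    "hospital_3": [(-2.0, -3.0), (2.0, 5.0)],
}

model = (0.0, 0.0)            # start from an open base model
order = list(federation)
random.shuffle(order)         # the order of trainers can be random
for hospital in order:
    model = fine_tune(model, federation[hospital])

w, b = model
print(round(w, 1), round(b, 1))  # prints: 2.0 1.0
```

Because every hospital's data is consistent with the same underlying pattern, the serially trained model converges toward it regardless of the training order.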
The secret Modelex sauce is the model verification protocol. At its base, the model verification protocol samples the inputs and hashes the outputs, and puts them on a federated ledger at each stage of training. This hash is the model's signature. Each hospital earns tokens according to the work it does. The model's quality after each refinement is also measured, earning a quality score and corresponding amounts of tokens. Later, when the models are used, each hospital is paid tokens according to usage. When I first heard about this in October 2024, open-source models were clumsy and did not get good results. With the arrival of DeepSeek in February 2025, such criticisms lost their bite.
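The core idea of hashing sampled inputs and outputs onto a ledger can be sketched as follows. This is our own simplified illustration, not Modelex.ai's actual protocol or code: a plain Python list stands in for the shared ledger, and the model's "signature" is a hash over its behavior on fixed sample inputs.

```python
# Hypothetical sketch of a model-verification idea (our simplification,
# not Modelex.ai's code): hash the model's (input, output) pairs on fixed
# samples and append the digest to an append-only ledger at each stage.
import hashlib
import json

def model_signature(model, sample_inputs):
    """Hash the model's outputs on fixed sample inputs."""
    pairs = [(x, model(x)) for x in sample_inputs]
    blob = json.dumps(pairs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

ledger = []  # stand-in for a shared, append-only (blockchain) ledger

def record_stage(hospital, model, sample_inputs):
    entry = {
        "hospital": hospital,
        "signature": model_signature(model, sample_inputs),
        "prev": ledger[-1]["signature"] if ledger else None,
    }
    ledger.append(entry)
    return entry

# Two "training stages" that change the model's behavior.
samples = [0, 1, 2, 3]
record_stage("hospital_1", lambda x: 2 * x, samples)
record_stage("hospital_2", lambda x: 2 * x + 1, samples)

# Verification: anyone holding the stage-2 model can recompute its signature.
assert ledger[1]["signature"] == model_signature(lambda x: 2 * x + 1, samples)
print(ledger[0]["signature"] != ledger[1]["signature"])  # True
```

Chaining each entry to the previous signature means no training stage can later be silently altered or reordered, which is what makes the improvements provable.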
Further extensions and tools help federations take the next step, namely sharing the model on a publicly accessible node. Nodes are kept honest via an AI component that checks the model through interactions, using known sample inputs and measuring intermediate outputs.
A further concern, that training data can somehow be exfiltrated or extracted from the model using certain techniques, requires additional safety precautions. These include scrubbing private data of identifying information, as well as other protections against extraction techniques, including fully homomorphic encryption and ZK proofs. Some of these requirements are part of the EU AI Act.
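The first and simplest of these precautions, scrubbing identifying information before training, can be sketched as below. This is a minimal hypothetical example of ours; real de-identification under HIPAA or the EU AI Act involves far more (quasi-identifiers, k-anonymity, and the cryptographic protections named above).

```python
# Minimal sketch (hypothetical example) of stripping direct identifiers
# from a record before training. Real HIPAA/EU-AI-Act de-identification
# is far more involved than this.
import hashlib

IDENTIFIERS = {"name", "address", "ssn", "phone"}

def scrub(record):
    """Drop direct identifiers; replace the patient id with a salted hash."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    if "patient_id" in clean:
        salted = ("secret-salt:" + str(clean["patient_id"])).encode()
        clean["patient_id"] = hashlib.sha256(salted).hexdigest()[:12]
    return clean

record = {"patient_id": 1234, "name": "Jane Doe", "ssn": "000-00-0000",
          "xray_pixels": [0.1, 0.7, 0.3], "diagnosis": "fracture"}
clean = scrub(record)
print("name" in clean, "ssn" in clean)  # prints: False False
```

The salted hash lets a hospital link its own records across training stages without exposing the raw identifier to the federation.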