
The problems of large language model (LLM) programming (training).


"Generative AI’s reliance on extensive data has led to the use of synthetic data, which Rice University research shows can cause a feedback loop that degrades model quality over time. This process, called ‘Model Autophagy Disorder’, results in models that produce increasingly distorted outputs, highlighting the necessity for fresh data to maintain AI quality and diversity. Credit: SciTechDaily ((ScitechDaily, Could AI Eat Itself to Death? Synthetic Data Could Lead To “Model Collapse”)


When we create new large language models (LLMs), we must teach, or train, them.


When an AI learns something, it creates a new database. That database can contain fresh data, or it can be a new application of old data. Connections between databases allow the system to combine them into a new entirety. AI systems and large language models require lots of data so that they learn to operate as they should.

There are many limitations on the data that developers can take from the network, and AI training requires certain permissions. One answer is to use synthetic data for LLM training. Synthetic data means faces and other things created by artists, and researchers can then connect those images with certain labels.

The problem with training AI on data from the network is this: the data that the LLM sees is separated from reality. The abstraction must connect to the right databases, and that is quite challenging. If we want to make an LLM that follows spoken language, we must connect every way of saying something to the database.

The system must turn dialect words into the literary, or standard, language. Only then can the system connect the input to actions. That means the system "washes" dialects into the standard language, and then it can follow orders about what to do: it simply connects words with actions that are programmed in its databases.
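To make this concrete, here is a minimal Python sketch. The dialect forms and action names are hypothetical examples of my own; a real system would need far richer normalization:

```python
# Minimal sketch: "wash" dialect words into the standard language,
# then connect the normalized command to a programmed action.
# The dialect forms and actions here are hypothetical examples.

DIALECT_TO_STANDARD = {
    "shut": "close",
    "tha": "the",
    "doar": "door",
}

ACTIONS = {
    "close the door": lambda: print("actuator: closing the door"),
    "open the door": lambda: print("actuator: opening the door"),
}

def normalize(utterance: str) -> str:
    """Replace each known dialect word with its standard form."""
    words = utterance.lower().split()
    return " ".join(DIALECT_TO_STANDARD.get(w, w) for w in words)

def follow_order(utterance: str) -> None:
    command = normalize(utterance)     # dialect -> standard language
    action = ACTIONS.get(command)      # standard command -> action
    if action:
        action()
    else:
        print(f"no programmed action for: {command!r}")

follow_order("Shut tha doar")   # -> actuator: closing the door
```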

In the learning process, the system forms a loop around the dataset. It can then connect new data into that loop. The problem is that the system, the LLM, doesn't think. It does not distinguish between synthetic data and real data.
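A toy sketch of that loop, under my own assumptions: each record carries a source tag, but notice that the training step never reads it, so synthetic samples feed the next round exactly like real ones:

```python
import random

# Toy sketch: each record carries a "source" tag, but the training
# loop below never looks at it, so real and synthetic data are
# treated exactly alike.

def train(dataset):
    """Fit a trivial 'model': the mean of the observed values."""
    values = [value for value, _source in dataset]
    return sum(values) / len(values)

def generate(model_mean, n):
    """Sample new 'synthetic' values around the learned mean."""
    return [(random.gauss(model_mean, 1.0), "synthetic") for _ in range(n)]

dataset = [(random.gauss(10.0, 1.0), "camera") for _ in range(100)]

for generation in range(5):
    mean = train(dataset)                 # the source tag is ignored here
    dataset = dataset + generate(mean, 100)
    print(generation, round(mean, 3), len(dataset))
```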

When a human uses imagination, we use our memory cells to tell whether a memory is true or false. When the vision center in the brain works with synthetic memories, or imagination, we know the vision comes from imagination, because there is no matching image in the memory cells that handle kinetic senses like touch.

The LLM "knows" that data is real when its source is marked as something like a camera. The system sees that the camera sends the signal, and then it marks it as real. But the problem is this: synthetic data whose carrier ID says it comes from a camera will not be separated from real information. And if a camera takes images from a screen, the system will not separate those images from real data.
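A small illustrative sketch of why the carrier ID is not enough (the records are hypothetical): a filter that trusts the "camera" tag also accepts a synthetic image that was photographed from a screen:

```python
# Sketch of why a source ID is not enough. The filter below accepts
# anything tagged "camera", including a synthetic image that was
# re-photographed from a screen. All records here are hypothetical.

records = [
    {"id": 1, "source": "camera", "content": "street scene"},
    {"id": 2, "source": "generator", "content": "synthetic face"},
    # A synthetic face shown on a monitor and photographed again:
    # the carrier ID says "camera", so the filter cannot reject it.
    {"id": 3, "source": "camera", "content": "synthetic face (screen capture)"},
]

trusted = [r for r in records if r["source"] == "camera"]
for r in trusted:
    print(r["id"], r["content"])   # record 3 slips through as "real"
```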



"Richard Baraniuk and his team at Rice University studied three variations of self-consuming training loops designed to provide a realistic representation of how real and synthetic data are combined into training datasets for generative models. Schematic illustrates the three training scenarios, i.e. a fully synthetic loop, a synthetic augmentation loop (synthetic + fixed set of real data), and a fresh data loop (synthetic + new set of real data). Credit: Digital Signal Processing Group/Rice University" (ScitechDaily, Could AI Eat Itself to Death? Synthetic Data Could Lead To “Model Collapse”)




"Progressive transformation of a dataset consisting of numerals 1 through 9 across 20 model iterations of a fully synthetic loop without sampling bias (top panel), and corresponding visual representation of data mode dynamics for real (red) and synthetic (green) data (bottom panel). In the absence of sampling bias, synthetic data modes separate from real data modes and merge." (ScitechDaily, Could AI Eat Itself to Death? Synthetic Data Could Lead To “Model Collapse”)

"This translates into a rapid deterioration of model outputs: If all numerals are fully legible in generation 1 (leftmost column, top panel), by generation 20 all images have become illegible (rightmost column, top panel). Credit: Digital Signal Processing Group/Rice University" (ScitechDaily, Could AI Eat Itself to Death? Synthetic Data Could Lead To “Model Collapse”)


"Progressive transformation of a dataset consisting of numerals 1 through 9 across 20 model iterations of a fully synthetic loop with sampling bias (top panel), and corresponding visual representation of data mode dynamics for real (red) and synthetic (green) data (bottom panel). With sampling bias, synthetic data modes still separate from real data modes, but, rather than merging, they collapse around individual, high-quality images." (ScitechDaily, Could AI Eat Itself to Death? Synthetic Data Could Lead To “Model Collapse”)

"This translates into a prolonged preservation of higher quality data across iterations: All but a couple of the numerals are still legible by generation 20 (rightmost column, top panel). While sampling bias preserves data quality longer, this comes at the expense of data diversity. Credit: Digital Signal Processing Group/Rice University"(ScitechDaily, Could AI Eat Itself to Death? Synthetic Data Could Lead To “Model Collapse”)

The upper image also illustrates the problem with fuzzy logic. Fuzzy logic, as humans use it, is not really something we can make for computers. There are only large numbers of descriptions that are connected with certain actions.

When the system reads something like handwritten text, it relies on stored images of things like numbers and letters. Each handwriting style requires its own image, so there are images of the possible ways to write each number. The marks in the last generation's column cannot be connected with numbers even with the best will. But if the AI makes decisions using those marks, the result can be a catastrophe.

If the AI starts image recognition from a generation-20 image and then tries to fill in the missing parts, the problem is that in the natural environment nothing is a perfect match with the models.

And there are only so many template images that the system can compare with camera images. In the last characters, you can see points that have no match with any number. If there is some kind of dirt on the surface, the system can translate the image into something other than what it should be.
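A sketch of that idea with tiny, purely illustrative 3x3 templates: marks that are too far from every stored template are rejected instead of being forced onto the nearest digit:

```python
# Sketch of template matching with a rejection threshold: marks that
# are too far from every stored template (dirt, stray points) are
# refused instead of being forced onto the nearest digit.
# Templates are tiny 3x3 toy bitmaps, purely illustrative.

TEMPLATES = {
    "1": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "7": (1, 1, 1,
          0, 0, 1,
          0, 0, 1),
}

def distance(a, b):
    """Count how many pixels differ between two bitmaps."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(image, max_distance=2):
    best_digit, best_dist = None, float("inf")
    for digit, template in TEMPLATES.items():
        d = distance(image, template)
        if d < best_dist:
            best_digit, best_dist = digit, d
    if best_dist > max_distance:
        return None          # no acceptable match: reject, don't guess
    return best_digit

print(classify((0, 1, 0, 0, 1, 0, 0, 1, 0)))   # "1"
print(classify((1, 0, 1, 0, 1, 0, 1, 0, 1)))   # None: dirt / unreadable
```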

When the AI sees some image, that image activates some action. And when the AI reads things like postal codes, the system reads the number in parts. If the number is 921, the first digit is "9": the system routes the letter to the line where the letters for section 9 go. If the second digit is "2", it sends the letter to sub-delivery line 2. And the last digit, "1", routes it to area number 1.
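A minimal sketch of that digit-by-digit routing (the line names are hypothetical):

```python
# Sketch of digit-by-digit routing for a postal code like "921":
# each digit selects the next, narrower delivery line.
# The line names are hypothetical.

def route(postal_code: str) -> list[str]:
    stops = []
    stops.append(f"section {postal_code[0]}")             # "9" -> main line
    stops.append(f"sub-delivery line {postal_code[1]}")   # "2" -> sub-line
    stops.append(f"area {postal_code[2]}")                # "1" -> final area
    return stops

print(" -> ".join(route("921")))
# section 9 -> sub-delivery line 2 -> area 1
```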

When the AI makes a reaction, it requires two databases: one that it uses to match the situations it sees, and another that holds the reaction. The first database activates the second, where an action is connected to the thing that the system sees.

When the LLM and the operating system interconnect databases, they require a routing table, or routing map. Large-scale systems that can respond to many things require lots of connections and lots of databases. The database connection maps are databases themselves, just like the other action-and-reaction database pairs.
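A compact sketch of the two-database pair plus the routing map that connects them (all entries hypothetical):

```python
# Sketch of the two-database idea: one database matches the observed
# situation, a routing map connects it to the second database, which
# holds the programmed reaction. All entries are hypothetical.

SITUATIONS = {               # database 1: what the system sees
    "smoke detected": "fire",
    "door opened": "entry",
}

ROUTING_MAP = {              # the connection map is itself a database
    "fire": "sound alarm",
    "entry": "log visitor",
}

REACTIONS = {                # database 2: programmed actions
    "sound alarm": lambda: print("ALARM!"),
    "log visitor": lambda: print("visitor logged"),
}

def react(observation: str) -> None:
    situation = SITUATIONS.get(observation)      # match the situation
    action_key = ROUTING_MAP.get(situation)      # route to the action pair
    reaction = REACTIONS.get(action_key)
    if reaction:
        reaction()

react("smoke detected")   # -> ALARM!
```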

When the data travels in the loop, it increases the number of databases. So data that circulates in the loops increases the data mass even if there is no new data for the system. The collapse in the system happens when the allocation units on the hard disks are full. There are trillions of allocation units, but each database requires at least one.
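As a back-of-envelope sketch of that arithmetic, with assumed storage and cluster sizes (not figures from the article):

```python
# Back-of-envelope arithmetic (assumed sizes, not from the article):
# how fast allocation units run out if every pass of the loop adds a
# new small database, each consuming at least one unit even when it
# carries no genuinely new data.

storage_bytes = 4 * 10**15       # assume a 4 PB storage array
cluster_bytes = 4096             # assume 4 KiB allocation units
units = storage_bytes // cluster_bytes
print(f"allocation units available: {units:,}")   # on the order of a trillion

new_databases_per_day = 10**9    # assumed growth rate of the loop
days_until_full = units // new_databases_per_day
print(f"days until every unit is consumed: {days_until_full:,}")
```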

https://scitechdaily.com/could-ai-eat-itself-to-death-synthetic-data-could-lead-to-model-collapse/
