Will AI replace coders and developers?

How today's machines learn to learn

He is world-famous, a sought-after expert in the field of machine learning - and a teenager: the 16-year-old Canadian Tanmay Bakshi inspires audiences far beyond developer circles.

Bakshi was the star of this year's edition of WeAreDevelopers Live Week. At the developer conference, which took place virtually this year due to the coronavirus pandemic, Bakshi gave a talk on the second day about Natural Language Processing (NLP), that is, models for the machine processing of natural language. The key question: what exactly do such language models learn?

The presentations of the Live Week are available online, including Tanmay Bakshi's keynote. (Source: Screenshot youtube.com/watch?v=mV-UiKM5VIg)

Bakshi speaks fast and thinks even faster, but rarely comes across as showing off. You can tell that he loves what he does - and above all, that he knows what he is talking about. When Bakshi started programming, he was just 5 years old. At 9, he developed his first iOS app. And at 12, IBM hired him as the world's youngest Watson developer. Today Bakshi is a sought-after speaker, a Google Developer Expert for machine learning, an IBM Champion for cloud and a three-time book author.

How can machine learning models recognize language patterns the way humans do? How can you teach a machine to write? Bakshi not only gave answers to these and other questions. He also showed how current NLP models work - and why they don't work the way we might imagine.

See how #NaturalLanguage-writing #NeuralNetworks are just #NaturalLanguage-reading networks disguised as #writers w/ autoregressive generation, probe #BERT for unsupervised syntax trees, live demos/coding & more on @WeAreDevs Live Week Oct 6 @ 12:30 PM EST! https://t.co/FrHOusAvNm pic.twitter.com/gABF3PbQwm

- Tanmay Bakshi (@TajyMany) October 5, 2020

"The thing is, text generators like OpenAI's GPT-3 are impressive, but they can't really write," said Bakshi. What comes out of it is logically inconsistent. "We pretend that programs like this can replace anything: marketing people, journalists, songwriters, speakers - even developers! That's not true at all." After a brief foray into the history of machine learning, Bakshi explained why this idea is wrong.

In principle, classic NLP models are nothing more than simple statistical models that take a series of words and calculate which word comes next, and with what probability. Such models were later modified so that they could generate natural language. "It didn't work very well - to say the least," Bakshi said with a grin. That is because these statistical models learn nothing about the language itself.
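To make the idea concrete, here is a minimal, hedged sketch of such a purely count-based approach: a bigram model that estimates the probability of the next word from co-occurrence counts. The tiny corpus is invented for illustration and has nothing to do with the talk.

```python
# Minimal sketch of a classic statistical language model: a bigram model
# that estimates P(next word | current word) purely from counts.
# The tiny corpus below is invented purely for illustration.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1

def next_word_probs(word):
    """Return the empirical distribution over words following `word`."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probs("the"))  # e.g. {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

A generator built on such counts only ever chains locally probable words together; it captures nothing about what the text actually means.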

Read in this background report on artificial intelligence why AI can compose music but not write books.

LSTM: The good old gold standard

Researchers and developers later brought artificial neural networks into play. These are able to actually learn and to recognize complex patterns. "It turned out to be quite successful." And for a while there was something like a gold standard for this, as Bakshi said: recurrent neural networks and, in particular, their refinement with a technique called long short-term memory (LSTM).

The approach comes from the German computer scientist Jürgen Schmidhuber, director of the Dalle Molle Institute for Artificial Intelligence Research in Ticino. LSTM gave machine learning models a kind of short-term memory that persists over long periods. The technology has celebrated considerable successes: it can be found in translation programs such as Google Translate and in voice assistants such as Amazon's Alexa and Apple's Siri.

In this tutorial, the then 13-year-old Tanmay Bakshi explains how to develop a word prediction model with LSTM and Google TensorFlow.

But LSTMs also have disadvantages, as Bakshi said. "LSTM methods are very slow, have difficulty learning long sequences - and what annoys me most: they don't understand context." The reason: LSTMs treat natural language like time-series data. They calculate each word based on the calculation of the words before it. "The main reason people used LSTMs was that there was nothing better," said Bakshi.
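For orientation, a hedged sketch (not taken from the talk or the tutorial above) of what such an LSTM next-word predictor typically looks like in Keras; all layer sizes and the dummy data are arbitrary:

```python
# Hedged sketch of an LSTM next-word model: the network reads a sequence
# token by token and predicts the word that follows. Sizes are arbitrary.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len = 1000, 10  # illustrative values

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 64),                # word IDs -> vectors
    layers.LSTM(128),                                # processes the sequence step by step
    layers.Dense(vocab_size, activation="softmax"),  # P(next word)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data just to show the shapes: 32 sequences of 10 word IDs,
# each labelled with the ID of the word that comes next.
x = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```

The step-by-step recurrence is exactly what Bakshi criticizes: each position has to wait for the previous one, which makes training slow and long-range context hard to keep.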

BERT is the word

That changed with a 2017 paper written by Google researchers and entitled "Attention Is All You Need".

This paper forms the basis of a new deep learning architecture, said Bakshi. Initially it received little attention, even though Google implemented the concept to optimize its translation and search algorithms, for example. But since another team of Google researchers picked it up, the concept has been making waves. At the end of 2018, Google released the project as open source under the name BERT.

BERT is an acronym for Bidirectional Encoder Representations from Transformers. Google uses it to improve its search engine so that it can better understand users' queries - by analyzing not only individual keywords but also the semantic context of a search query.

"Compared to LSTM processes, BERT is much faster," said Bakshi. But much more important: BERT is able to understand context. This is because not one word after the other, but entire sequences are analyzed at once.

The"B"in theNamesBERTstandsForBidirectional.in thecontrastto theGPT approachofOpenAIprocessedBERTtheDatasoNotexclusivelysequentially.(Source:ai.googleblog.com)

Learn to read like a human

"What fascinates me about it is the fact that BERT learns to read in a similar way to humans," said Bakshi. "I still remember my time in kindergarten - that was about 12 years ago: there were these cloze-blanks tasks. Filling in these gaps in the text made me understand natural language. I had to think about the structure of sentences, about different parts of speech, their positioning and dependencies. And this is exactly how BERT training works. "

(Source: Screenshot youtube.com/watch?v=mV-UiKM5VIg)

BERT also analyzes not only sentences and words, but also their components and the relationships between them. If a new, as yet unknown term appears, the neural network can take it apart and derive its meaning from the word's building blocks.
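This decomposition is done by BERT's WordPiece tokenizer; a small, hedged example, again assuming the Hugging Face transformers package:

```python
# Hedged example (assumes the Hugging Face `transformers` package): BERT's
# WordPiece tokenizer splits a word it has never seen into known sub-units,
# marked with "##", from which the model can still build a representation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("unfathomability"))
# prints a list of sub-word pieces along the lines of ['un', '##fat', ...]
```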

Bakshi also showed live how it all works. Those interested can find the code for the demo on GitHub. The model is deliberately kept very simple: instead of using entire sentences or words as input, it works on individual letters. The idea: as in a fill-in-the-blank exercise, the model should calculate the correct letter for a search term that is missing one character.

Putting it to the test with a musical example

Bakshi had trained the demo model on musicians' names - a random selection of artist names from Spotify and Last.fm. The principle of the demonstration: you enter a name - for example Tom Waits - mask one of the characters (leaving, say, Tom Wai_s) and let the model calculate which letter could be missing. As a result, the model shows several possibilities along with their predicted probabilities.
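As a rough, hedged approximation of that setup (this is not Bakshi's GitHub code; the handful of names, the mask token and all layer sizes are made up for illustration), a tiny attention-based character model in TensorFlow/Keras could look like this:

```python
# Hedged sketch of a character-level "fill in the blank" model, loosely
# following the idea of the demo (NOT the code from Bakshi's GitHub repo;
# names, mask token and layer sizes are illustrative).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy training data; the real demo used a large random selection of
# artist names from Spotify / Last.fm.
names = ["tom waits", "nina simone", "miles davis", "patti smith"]

chars = sorted(set("".join(names)) | {"_"})  # "_" acts as the mask token
char2id = {c: i for i, c in enumerate(chars)}
id2char = {i: c for c, i in char2id.items()}
max_len = max(len(n) for n in names)

def encode(text):
    ids = [char2id[c] for c in text]
    return ids + [char2id[" "]] * (max_len - len(ids))  # pad with spaces

# One training example per character: mask it, target is the full name.
X, Y = [], []
for name in names:
    for i, c in enumerate(name):
        if c == " ":
            continue
        masked = list(name)
        masked[i] = "_"
        X.append(encode("".join(masked)))
        Y.append(encode(name))
X, Y = np.array(X), np.array(Y)

# Tiny attention-based "reader": every character attends to every other
# character of the name at once, instead of reading left to right.
inp = layers.Input(shape=(max_len,), dtype="int32")
emb = layers.Embedding(len(chars), 32)(inp)
att = layers.MultiHeadAttention(num_heads=2, key_dim=16)(emb, emb)
out = layers.Dense(len(chars), activation="softmax")(att)
model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, Y, epochs=300, verbose=0)

# Ask which letter hides behind the blank in "tom wai_s".
query = "tom wai_s"
probs = model.predict(np.array([encode(query)]), verbose=0)[0][query.index("_")]
for i in np.argsort(probs)[::-1][:3]:
    print(id2char[int(i)], round(float(probs[i]), 3))
```

Like the demo, the sketch returns several candidate letters with their predicted probabilities rather than a single answer.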

"BERT is not always right," said Bakshi. That is not the idea either, because the learning goal is only for pre-training. In this way, the model is supposed to develop a basic understanding of natural language to a certain extent.

In this case, the neural network has learned nothing other than to fill gaps in the text. Nevertheless, it recognizes similarities: that vowels, for example, occur in similar contexts. The same applies to the consonants K and C. This seems obvious to humans because these letters sound similar. For a neural network like this, however, it is anything but trivial, said Bakshi.
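One hedged way to look for such regularities, continuing the toy sketch above (it reuses the `model` and `char2id` defined there), is to compare the learned character embeddings directly. On such a tiny toy corpus the numbers will be noisy; Bakshi's observation about K and C comes from a far larger set of names.

```python
# Continuation of the toy sketch above; assumes `model` and `char2id`
# from it are in scope. Compares learned character embeddings with
# cosine similarity to see whether, say, vowels end up close together.
import numpy as np

emb_matrix = model.layers[1].get_weights()[0]  # the Embedding layer's weights

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for a, b in [("a", "e"), ("a", "i"), ("a", "t")]:
    print(a, b, round(cosine(emb_matrix[char2id[a]], emb_matrix[char2id[b]]), 3))
```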

"A gift for linguists"

"The amazing thing about it: BERT has never seen a syntax tree and is still able to derive rules for the sequence of characters."

(Source: Screenshot youtube.com/watch?v=mV-UiKM5VIg)

Without having been trained for it, BERT learns how language is structured - "because it makes sense, so to speak. That is how learning languages works," said Bakshi. "This is a gift for linguists." Because it not only proves that such networks have the ability to learn languages. "It also shows that the structure of natural language makes sense from a mathematical point of view."

An epistemological interlude

Back to the initial thesis: "The neural networks that we train as 'writers' today cannot really write." However, they are very good at reading natural language. That applies not only to OpenAI's notorious "writing AI", but also to BERT. Google's pre-training method is the new gold standard for many NLP applications because BERT can better understand the context of words. But that alone is not enough to build a real writing machine. Why that is so depends on what writing actually involves.

"We humans associate writing with creativity - and rightly so. Because when we write, we don't just transcribe thoughts into language. That's just the easy part. The hard part about writing is getting the idea in the first place," said Bakshi . And how do you get a thought? Bakshi has designed a simple scheme for this. Quasi a theory of natural language.

(Source: Screenshot youtube.com/watch?v=mV-UiKM5VIg)

The process, as Bakshi models it, begins with us perceiving environmental influences and processing them together with existing knowledge and experience - sometimes even subconsciously. At the beginning of writing, then, it is about developing and shaping a thought - before it even comes to formulating it. "That's what makes us human," said Bakshi.

Once the idea is more or less mature, the "easy part" follows: "Formulating a thought in language is fairly easy for neural networks too. That is why translation programs work so well today - but only because the thoughts have already been formed. All that remains is the task of deciphering and translating the meaning of what has been said or written."

NLP models are unlikely to replace writers, songwriters or software developers, at least for the foreseeable future. Nor can that be the goal, as Bakshi said. "We don't always want bigger models. We want models with the best possible learning objectives, with which they can teach themselves to write better. With such models we can learn a lot about language that we don't yet know."

(Source: Screenshot youtube.com/watch?v=mV-UiKM5VIg)

Bakshi's goal is to make technologies like BERT, and machine learning in general, accessible to more people. That is how he describes it on his website, where he also provides tutorial videos, among other things.

The intro to Bakshi's video series called Tanmay Teaches - back then he was still 12 years old.

The WeAreDevelopers Live Week revolved around the five topics of security, machine learning, cloud, blockchain and DevOps. In the run-up to the event, organizers Benjamin Ruschin and Sead Ahmetović talked about the most important trends in the developer world - and how the event ended up coming to Switzerland in the first place.