There has been a lot of noise in the creative industries about AI of late. A number of large-scale machine learning projects - GPT, DALL-E (and Craiyon), Midjourney and so on - have been producing results that seem well beyond what most generative systems have managed before. Even allowing for hype, over-excitement and noise, what these things do is pretty interesting. One of the big sources of ‘panic’ has been the idea that creative artists are seeing the start of their own replacement by machines. Now, while I think what is happening is *really* interesting, I think we are a long way from that - but let me talk about what I think is actually happening first.
My hypothesis is that what these systems are is “memories”, possibly very similar to the ones in our own minds. Now, you might object: that’s all very well, but surely they’re doing a lot of processing? They seem to understand language? And that, I would argue, is indeed the most interesting part:
We tend to think of memories as objects - something like photographs, or files in an archive - but that’s not really how our own memories seem to work. What if a memory system generalises and synthesises instead? We already know that a lot of what we take as the smooth functioning of the brain is largely a fiction: our vision can only resolve a tiny part of our field of view in precise detail, and our brain mostly fills in the periphery. Our “consciousness” is not the continuous experience we think it is - some, like Metzinger, argue that it too is largely a fiction. A memory that is just a series of snapshots wouldn’t be that helpful either. “Get me all the images of tigers” is far less use than “retrieve all the instances of a situation with these features”, where “these features” include a large striped animal with teeth and claws in front of you.
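To make that retrieval-by-features idea concrete, here is a minimal sketch in Python of looking memories up by feature similarity rather than by label - roughly the embedding-style retrieval these models perform. The feature vectors, situations and scoring here are all invented for illustration, not anything from a real system.

```python
# A toy illustration: retrieving memories by feature similarity rather than
# by label. The vectors and situations are invented placeholders.
import numpy as np

# Each remembered situation is stored as a feature vector, e.g.
# [size, striped, has_teeth, has_claws, is_moving]
memories = {
    "tiger at the zoo":      np.array([0.9, 1.0, 1.0, 1.0, 0.2]),
    "tabby cat on the sofa": np.array([0.2, 1.0, 0.5, 0.5, 0.1]),
    "zebra crossing a road": np.array([0.8, 1.0, 0.1, 0.1, 0.7]),
    "dog chasing a ball":    np.array([0.5, 0.0, 0.8, 0.6, 1.0]),
}

def cosine(a, b):
    """Similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recall(query, k=2):
    """Return the k stored situations most similar to the query features."""
    ranked = sorted(memories.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return ranked[:k]

# "A large striped animal with teeth and claws in front of you"
query = np.array([0.9, 1.0, 1.0, 1.0, 0.5])
for name, _ in recall(query):
    print(name)  # the tiger memory comes back first
```

The point is that nothing here ever searched for the word “tiger” - the tiger memory surfaces because its features match the situation.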
This has some interesting implications - if memory can do this generalisation and synthesis, then it is doing a lot of heavy lifting in our brains. The fact that GPT seems so adept with language would suggest that we don’t learn language in anything like the way most language-teaching systems assume - we learn language by generalising memories of the way people use it around us.
I would also suggest that this implies the images and texts generated by these models are akin to dreams - the free-running, meaning-free associations our memories make when freed of conscience, narrative or intent.
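A crude way to picture that free-running quality: memorise which words follow which in some text, then let the memory wander with no goal at all. This toy Markov-chain sketch is purely illustrative - real models are vastly more sophisticated - but the “dreaming” flavour of the output is recognisable.

```python
# A toy "dreaming" memory: a first-order Markov chain learned from a scrap
# of text, then run freely with no intent or narrative. Purely illustrative.
import random
from collections import defaultdict

random.seed(42)

corpus = ("the tiger walks in the forest and the forest remembers "
          "the tiger and the dream walks in the memory of the forest").split()

# "Memorise" which words follow which.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

# Free-run: wander through the memorised associations.
word = random.choice(corpus)
dream = [word]
for _ in range(12):
    options = follows.get(word)
    if not options:
        break
    word = random.choice(options)
    dream.append(word)

print(" ".join(dream))
```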
This has some very exciting implications, from the mundane - we should learn languages by immersing ourselves in them: watching films in a language, spending time with people who speak only that language, listening to and watching people speaking it - to the more exotic: perhaps we ARE closer to building intelligences than we thought. By adding intent, narrative and filters we can shape these memory machines to produce something useful.
There are some other interesting corollaries. We are currently building these memories by shoving as much of the internet as possible into them - unfiltered, without any value labels or reflection. In true tech bro style, more and bigger is better. Now, colour me cynical, but I don’t think that’s the best way to learn anything, and perhaps smaller models with some ability to weigh the value of their input and reflect on it might do better (a sketch of what that weighting could look like follows below). Secondly, we still need to factor in conscience, intent and narrative - at least two of which are a long way off tech bro radars too. There is definitely space here for some interesting research, and probably for interesting commercial companies, working in the spaces left behind by the insane capitalist desires fed by VC and valley culture.
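As a sketch of what “weighing the value of the input” might mean in practice, here is one common technique: scaling each training example’s contribution to the loss by a per-example quality score. This is my illustration of the idea, not how any of these projects actually train - the model, data and scores are all placeholders.

```python
# A sketch of value-weighted training: each example carries a quality score
# that scales its contribution to the loss. Model, data and scores are all
# invented placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model: 10 input features -> 3 classes.
model = nn.Linear(10, 3)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss(reduction="none")  # keep per-example losses

# Fake batch: 8 examples, plus a hand-assigned "value" per example
# (e.g. curated text scores high, scraped junk scores low).
inputs = torch.randn(8, 10)
targets = torch.randint(0, 3, (8,))
value = torch.tensor([1.0, 1.0, 0.9, 0.2, 0.1, 0.8, 0.05, 1.0])

per_example_loss = loss_fn(model(inputs), targets)
weighted_loss = (per_example_loss * value).sum() / value.sum()

optimiser.zero_grad()
weighted_loss.backward()
optimiser.step()
print(f"weighted loss: {weighted_loss.item():.4f}")
```

The hard part, of course, is where the value scores come from - which is exactly the kind of reflection the shove-everything-in approach skips.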
Are these systems going to replace us creatives? No. As they currently exist they are cultural samplers - something this blog post was originally going to be about, and to which I will return - the cut-up technique of the Dadaists and Burroughs writ large.