I know you have heard the adage, and you believe it and likely accept it, like you know a litany of similar accepted truths: A picture is worth a thousand words.
For over a century that expression has been used in one form or another, spanning languages and continents, across time and space, permeating most human cultures. And it rings true: our visual processing bandwidth is generally understood to be far greater than our auditory processing capacity. So, for the sake of this blog entry, let us presume the adage to be true.
But what is also true in this century is the inverse: A word is worth a thousand pictures.
With the recent emergence of modern neural network architectures, the words of Edsger Dijkstra are actually more accurate: A formula is worth a thousand pictures.
Given the tools we have at our disposal today, with not only the ‘formula’ available in the form of open source neural network designs but also cheap and ubiquitous compute cycles, a word today is indeed worth a thousand pictures, and well capable of generating that many images.
You are, no doubt, aware of the interesting and sometimes disturbing creations of the DALL-E class of AI systems which are able to generate images from words. Such systems enable thousands of artisans, creators, and neophytes to quickly imagine and produce thousands of images from a collection of but a few words.
Real-world value has emerged from these efforts, with marketing and graphic creation finding novel opportunities as this wonderful new class of AI architectures becomes available. Just as no-code/low-code software development has economic appeal, so too should the design possibilities of the DALL-E class of machines. Although there are, and undoubtedly will continue to be, efforts to ban AI-generated art, with the so very noble and virtuous intent of protecting artists, such well-meaning efforts will ultimately fail. Protectionist schemes for existing vocations in the wake of emerging technologies always fail. Efficiency and increasing productivity define the arc of progress toward which we endeavor. Aren’t we all in agreement, at the end of the day, that reducing strains on limited resources is a very good thing? The extent to which any existing vocation or occupation can be made less resource hungry should be the key measure of viability, especially in these interesting times in which we live. No occupation deserves protection against technology. And no occupation will receive it.
We have historical evidence for this.
Just think of all the once-common jobs that have disappeared or dwindled in the past 50 years. Elevator operators, pin setters at bowling alleys, switchboard operators, cashiers, travel agents, many bank tellers, factory workers, warehouse workers…all these and more have either been eliminated entirely or reduced to a fraction of the workforce they once were. All eliminated or diminished by technology and automation, empowered and made possible by software. No job is safe. Were it not so, we would not have such common tools as word processors, reservation systems and online calendars today, the emergence of which upended the professional aspirations of many a secretary, as many of their daily tasks and skills were made obsolete by inventions such as those created by Evelyn Berezin in 1969. Hundreds of thousands of jobs were impacted by the emergence of these inventions. Did anyone protest? Were there governmental policies put in place to protect the jobs of secretaries? I know of none.
And so now, are artists the next vocation to suffer from the onslaught of ever-encroaching technology? Some concerns are already being expressed. Alas, must we protect the poor digital artist whose livelihood may now be threatened by digital automation?
While I am sure some well-meaning voices will be heard in coming years, bemoaning the advent of AI art, as with so many bottles opening with each passing year, the genies are well out and disinclined to get back inside.
Deep fakes are here to stay. So is misinformation, or disinformation, or whatever else we wish to dispute as non-conforming, anti-narrative propaganda. Our technologies have given rise to so many awesome possibilities. How we use that magic is the measure of our collective character.
As a data scientist I have a keen interest in the evolution of computational tools that allow us to better gather, manipulate, visualize, and understand data. What is commonly called Artificial Intelligence has been the focus of my attention for over a decade now. Natural Language Processing (NLP) is one of the most interesting facets of that study, a field I have come to enjoy; building actual customer applications using NLP has been quite professionally satisfying over the past few years. And at the intersection of NLP and image processing we are finding a new field of exploration emerging: digital generation of art.
In my copious spare cycles I’ve been experimenting with several ML models on my home server (hats off to the wonderful folks at ExxactCorp, from whom I purchased an awesome Machine Learning Workstation a few years ago), and have managed to get modified versions of Stable Diffusion, Disco Diffusion, and Lucid Sonic Dreams running at home. Combining these developments with my surprisingly expensive hobby of music production, I’ve managed to create a couple of music videos recently that dance, as it were, to the beat.
If you have a few minutes and care to indulge my forays into entertainment under my nom de scène Jaxon Knight, have a look at the video links for The End of the World Again and The Hate Song, both of which were created using the state-of-the-art image generation tools referenced above. Both videos are intended to be rather tongue in cheek, so please view accordingly.