Blitzscaling Creativity with DALL-E

  • DALL-E amplifies human creativity and increases the impact and value of visual professionals across a huge range of industries.
  • From Leonardo Da Vinci to Andy Warhol, great artists have always utilized apprentices and assistants to help fulfill their creative visions. DALL-E is a highly accessible AI assistant that makes it easy for everyone to tap their inner Leonardos. 
  • Visual expression can't exist without technology. Great artists have always been great innovators. If groundbreaking artists like Da Vinci, Pablo Picasso, Georgia O'Keeffe, and Frida Kahlo were alive today, I'm sure they'd be experimenting with DALL-E.

From the very first cave painting, image-creation has had an exponential impact on humanity's capacity to communicate knowledge; express feelings, values, and aspirations; and work collaboratively. That's why we say a picture is worth a thousand words. 

But image creation can also be slow work. Leonardo Da Vinci filled thousands of notebook pages with his sketches, drawings, and writings, but fewer than 20 surviving paintings are, in the words of the Encyclopedia Britannica, "definitively attributed to him." 

What if he'd had an AI-enabled computer to assist him? What if millions of other people had access to this same tool?

"Press the button – we do the rest," an early Kodak ad exclaimed. By 1900, the Kodak Brownie could be had for $1, a roll of film with six exposures cost a dime, and photography had shifted from a narrow domain of skilled professionals to a much broader one of amateurs spontaneously documenting the world as they saw it.

For the first time ever, photography could take place at massive scale, and this completely reshaped how we establish truth, what we value, how we convey news, how we memorialize private and public events, and so much more. 

120 years later, with the emergence of AI-driven image-creation platforms like DALL-E, Imagen, and Midjourney, we're on the verge of experiencing perhaps the most revolutionary press-the-button shift since the days of the Kodak Brownie. 

A GRAPHIC GLIMPSE OF THE FUTURE, TODAY

At this point, we routinely use apps and platforms that rely on machine learning, neural networks, and other AI technologies to improve our lives in easy-to-overlook ways, whether it's Google or Bing using these technologies to make our search results more relevant, banks using them to make smartphone check deposits possible, or Waze using them to help streamline our commutes.

What's so compelling about DALL-E and its peers is how they literally illustrate AI's progress, power, and future potential in a hands-on, highly visible way.

Type "An astronaut in deep space taking a selfie" into DALL-E's text box, and within seconds, DALL-E produces four images depicting this basic concept.

Often the results are amazing. Even when they're not, the process still feels magical. Essentially, we now have Google Search for humanity's collective consciousness: If you can dream it, and effectively describe it, DALL-E can likely depict it.

So what happens as DALL-E and similar technologies continue to improve, and millions of people gain access to these extraordinary new tools for human creativity and communicative power?

While some worry that applications like DALL-E spell doom for human creativity, and especially for professional visual creatives, I see the opposite happening. When artistic virtuosity and extreme productivity become the norms, the human creativity it takes to make work stand out in an environment of accomplished abundance only increases in value. 

To see this in action, consider the DALL-E images that go viral on Reddit and Twitter. They typically either incorporate a strong thematic concept, like showing how famous artists might have depicted WALL-E, or employ some novel production technique to achieve a clever end, like using DALL-E to create images for use in a stop-motion video.

In both cases, DALL-E does the drudge work, but human creativity and ingenuity are the spark that makes the work truly stand out.

As a board member at OpenAI, the organization that developed DALL-E, I engaged with DALL-E during its beta-testing phase. As much as I've enjoyed experimenting with the technology myself, what's really struck me so far is how many artists, architects, fashion designers, and other visual creatives are eagerly embracing DALL-E and incorporating it into their practices and workflows.

A landscape design firm is using it to rapidly visualize different directions on a backyard renovation. An animator is quickly generating assets for videos. A web developer is making 3D objects for a landing page. A writer is using it to create visualizations of characters and locations, which he then uses to create more detailed descriptions in his book. A director has used it in creating an AR filter for Instagram that lets viewers walk through her drawing. An artist has used it to visualize potential designs for a site-specific installation.

In short, DALL-E is a revolutionary addition to virtually any creative professional's toolset – not just a bicycle for the mind, as Steve Jobs often described personal computers, but actually a private jet or even a rocket ship for the mind. That's how much DALL-E can accelerate and amplify human creativity. That's how much it can help professional creatives explore ideas faster and execute their visions more productively. 

It is the greatest artistic opportunity we've seen in more than a century, with massive potential utility for product designers, interior decorators, jewelry-makers, furniture makers, hair stylists, movie directors and cinematographers, theater professionals, food stylists and chefs, home stagers, videogame developers, art directors, graphic designers, and so many other professional visual creatives.

While AI's impact on the future of work is frequently framed in an overtly oppositional way – i.e. a "race against the machines" – DALL-E is a great example of why that framing is reductive. How humanity and AI will both evolve is a hugely important and complex subject that I'll explore in more detail in future writing, but for now, I'll just say that DALL-E shows how racing against the machines is hardly our only option. We can dance with them too, using AI collaboratively and synergistically, in ways that radically amplify and extend our human skills and capabilities. 

Think of Leonardo Da Vinci, who had trouble finishing artworks precisely because he was so good at starting them. Equipped with DALL-E and a pen-based computer like the Microsoft Surface Studio, a 21st century Leonardo would have tools that could keep up with the speed of his mind, enable him to explore more of his ideas more quickly and thoroughly, and ultimately bring more of them to fruition.  

QUICK ON THE DRAW

Type in a well-constructed text prompt, or maybe even just a fairly arbitrary one, and DALL-E might respond with a professional-caliber image in a matter of seconds. Such is DALL-E's power that there's always a good chance you'll hit a hole in one, even if it's the first time you've ever stepped on a golf course.

More commonly, however, it will give you something you like but that could use further iteration. Luckily, DALL-E's capacity to deliver artistic holes in one is only part of its power. Equally important is its inexhaustible speed and innovation.

Once you enter a prompt in DALL-E's text box and hit the "GENERATE" button, it currently takes DALL-E roughly 15 seconds to generate four corresponding images.

Here's what it produced from the prompt, "A high-resolution photo of a golfer celebrating a hole in one."

[Image: four DALL-E generations for the prompt above]

If you don't like any of them, just hit the "GENERATE" button again and it will give you four new variations on your theme. 

You can also tweak the wording of your text prompt in various ways to see how that affects what DALL-E produces. I'll go into more detail on this below, but for now, here's what the prompt "Art Deco poster art of a golfer celebrating a hole in one" produced:

[Image: DALL-E generations for the Art Deco prompt above]

Finally, you can use DALL-E's editing brush to erase specific parts of an image, and DALL-E will generate four new images where only the parts you've erased change. 

For example, the golfer's face in the image on the left below didn't turn out so great. But you can simply erase it and have DALL-E try again: 

[Image: the golfer image before and after erasing and regenerating the face]

So DALL-E doesn't just increase your chances of hitting a hole-in-one. It also allows you to play ten, a hundred, or even a thousand rounds of golf in a day. And on every hole, it gets you a lot closer to the green than you'd get using traditional methods.
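
To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (roughly 15 seconds per request, four images per request) and assumes, purely for illustration, that one "round" means one click of the GENERATE button. It ignores the time spent writing prompts and reviewing results, and the function names are made up for this example.

```python
# Back-of-the-envelope throughput, using the article's approximate figures.
SECONDS_PER_REQUEST = 15  # "roughly 15 seconds" per request (approximate)
IMAGES_PER_REQUEST = 4    # four images come back per request


def generation_time_hours(requests: int) -> float:
    """Hours of pure generation time for back-to-back requests."""
    return requests * SECONDS_PER_REQUEST / 3600


def images_produced(requests: int) -> int:
    """Total images returned by the given number of requests."""
    return requests * IMAGES_PER_REQUEST


# Assumption: one "round" equals one request.
for rounds in (10, 100, 1000):
    hours = round(generation_time_hours(rounds), 2)
    print(f"{rounds} rounds: {images_produced(rounds)} images, about {hours} hours")
```

Under those assumptions, even a thousand rounds amounts to only a few hours of generation time, which is why so much iteration fits into a single day.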

DALL-E WHISPERING

As fast and simple as DALL-E makes image generation, it also rewards whatever existing skills you bring to the table. Because DALL-E is at heart built on a massive natural language-processing model, how you construct your text prompts has a major effect on the images DALL-E produces.

As I suggested above, even small changes in text prompts can produce significant differences in the images DALL-E generates. In fact, users already describe the process of creating effective text prompts as "DALL-E whispering" and "DALL-E engineering." 

This involves using phrases that steer DALL-E toward specific visual looks and effects, whether that's expressed in terms of mediums to simulate, concrete physical details and conditions to apply, or artists and styles to emulate. Here are some examples:

A MAN WALKING A DOG BY THE BEACH

[Image: DALL-E output for this prompt]

A CHARCOAL SKETCH OF A MAN WALKING A DOG BY THE BEACH, BY PICASSO

[Image: DALL-E output for this prompt]

A VERY HIGH-RESOLUTION IMAGE OF A MAN WALKING A DOG BY THE BEACH, BATHED IN BRIGHT SUNLIGHT, IN THE STYLE OF AN ART DECO-ERA TRAVEL POSTER

[Image: DALL-E output for this prompt]

Thanks to DALL-E's reliance on language, anyone with existing expertise in visual concepts, styles, and techniques is already well on their way to DALL-E fluency.

If, for example, you're a photographer, then you probably have deep knowledge about film stocks and camera types and the effects they produce. If you're a movie director or art director, then you already have a lot of experience developing and communicating ideas for visual concepts through words. 
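
The example prompts shown earlier follow a simple pattern: a subject, optionally wrapped in a medium, followed by physical details and a style. The Python sketch below is one hypothetical way to assemble prompts like that. The function and its parameter names are invented for illustration and are not part of DALL-E, which only ever sees the final string.

```python
from typing import Optional, Sequence


def build_prompt(subject: str,
                 medium: Optional[str] = None,
                 details: Sequence[str] = (),
                 style: Optional[str] = None) -> str:
    """Compose a text prompt from a subject plus optional modifiers.

    All parameter names are hypothetical helpers for this sketch.
    """
    # The medium wraps the subject: "a charcoal sketch of <subject>".
    head = f"{medium} of {subject}" if medium else subject
    # Physical details and conditions follow as comma-separated phrases.
    parts = [head, *details]
    # An artist or style to emulate goes last.
    if style:
        parts.append(style)
    return ", ".join(parts)


subject = "a man walking a dog by the beach"
print(build_prompt(subject))
print(build_prompt(subject, medium="a charcoal sketch", style="by Picasso"))
print(build_prompt(
    subject,
    medium="a very high-resolution image",
    details=["bathed in bright sunlight"],
    style="in the style of an Art Deco-era travel poster",
))
```

Keeping the subject fixed and varying one modifier at a time makes it easier to see which phrase is responsible for a given change in the results.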

Another key aspect of DALL-E whispering involves developing a good feel for how many different objects, figures, and other attributes you can include in a prompt before you start overloading DALL-E's capacity to deduce your intent. 

For example, see how DALL-E handles the prompt "A brown dog wearing a golden crown barks at a striped cat wearing a black top hat."

[Image: four DALL-E generations for the prompt above]

While DALL-E gets it right in two of the four iterations, it mixes up the headwear designations in the others. As you add additional specificity and complexity to prompts, you can start to overwhelm it.

In professional situations, of course, a high degree of precision is not just desirable but necessary. The client for a print ad campaign may have very specific expectations about the color palette an ad's imagery should incorporate. Even in the early stages of ideation, an architect may want to explore a skyscraper that's exactly 50 stories high, not just "very tall."

For the moment at least, though, DALL-E is not a very exacting tool, especially compared to traditional image-editing applications like Photoshop or Illustrator. But it's also easy to use DALL-E in conjunction with such tools to get around some of its current, early-stage limitations.

For example, you might ultimately have a fairly complex scene in mind, one that involves multiple figures and objects with specific attributes, in a panoramic setting that's much wider than DALL-E's only picture-size option, which is 1024 x 1024 pixels.

In such instances, you can use DALL-E to quickly "manufacture" components and parts for your scene, then use them in conjunction with other apps to create the exact scene you want. Here's an image of a Tesla Roadster speeding down a Mars highway that stitches and layers together multiple DALL-E generations into a single scene: 

[Image: a Tesla Roadster speeding down a Mars highway, composited from multiple DALL-E generations]
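
For a sense of what planning such a composite might involve, here is a minimal Python sketch that works out where 1024-pixel-wide tiles could sit on a wider canvas. The overlap value and the function name are assumptions made for this example, not anything DALL-E provides. The actual stitching and blending would still happen in an image editor.

```python
import math
from typing import List

TILE = 1024  # DALL-E's only picture size, per the article


def tile_offsets(canvas_width: int, overlap: int = 128) -> List[int]:
    """Left-edge x offsets of 1024-px tiles that cover canvas_width.

    Neighbouring tiles share at least `overlap` pixels (a hypothetical
    default) so the seams can be blended afterwards.
    """
    if not 0 <= overlap < TILE:
        raise ValueError("overlap must be between 0 and TILE - 1")
    if canvas_width <= TILE:
        return [0]
    step = TILE - overlap  # largest allowed distance between tile edges
    count = math.ceil((canvas_width - TILE) / step) + 1
    # Spread the tiles evenly so the last one ends at the right edge.
    span = canvas_width - TILE
    return [round(i * span / (count - 1)) for i in range(count)]


# Example: a panorama three tiles wide needs four overlapping tiles.
print(tile_offsets(3072))
```

Spreading the tiles evenly, rather than packing them from the left, keeps every seam the same width, which tends to make the blending step more predictable.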

As DALL-E and similar AI image-creation systems grow increasingly effective, we will likely see the same kinds of innovations and transformations we saw with the emergence of desktop publishing, the World Wide Web, digital music, and YouTube. 

While all these systems made it easier to create and distribute content, society as a whole never said, "Okay, that's too much content!" Instead, our collective appetite simply continues to increase, and we consistently see the emergence of additional new services and platforms to serve it, like Netflix, Instagram, Spotify, and TikTok, to name just a few.

As all this has played out, some jobs went away, others transformed, and many entirely new jobs were created. When I graduated from college in 1990, jobs like "web designer," "SEO strategist," and "data scientist" didn't exist. When my co-founders and I launched LinkedIn thirteen years later in 2003, none of our users had jobs like "social media manager," "TikTok influencer," or "virtual reality architect" yet.   

With DALL-E and similar technologies, we're going to see similar impacts on industry trends and overall work patterns and career paths. The companies, professions, and individuals who figure out how to incorporate these new tools into their workflows in the most innovative and productive ways will fare best. The ones that don't adapt in strategic ways will struggle to maintain their relevance and competitiveness in a changing marketplace. 

So while DALL-E may seem like a novelty or niche product for technophiles at this stage in the game, ignoring it now is like ignoring blogging in the late 1990s, or social media circa 2004, or mobile in 2007. Very quickly, it's going to become increasingly essential for all visual creatives, a primary driver for new opportunities, new jobs, and new ways of expressing human creativity. Developing skills and competencies in it now will yield benefits for years to come.

DALL-E FOR THE REST OF US

At the same time that DALL-E is an incredibly powerful tool for professional visual creatives, it's equally useful for more casual users. 

There are many scenarios where incorporating imagery into a product is useful but not mandatory, and in these kinds of situations, precision and control over the final imagery is less important. 

Think of a marketing professional creating a PowerPoint presentation or a web page announcing an upcoming event. There are unlimited utilitarian use cases out there where the need for imagery is not so strong that a budget or the time costs of working with a professional artist or designer are warranted. Typically, people end up using stock imagery or imagery they create themselves in such instances, or they simply forgo imagery altogether.

Consider, for example, how the tech journalist Casey Newton recently tweeted that he's been "taken aback at how good #DALL-E has been for making illustrations for [his] daily columns." As Newton noted in a follow-up tweet, it wouldn't be feasible for him to commission original artwork on a daily basis for his column – he's an independent journalist publishing five times a week via Substack, so from a time management perspective alone, much less a financial one, commissioning daily original artwork would likely be impractical. In fact, even licensing stock photos probably wouldn't warrant the fees and effort. But with DALL-E, Newton can add a graphic component to his work almost instantly.

In the thread of replies that Newton's initial tweet generated, another user suggested that Dungeons & Dragons dungeon masters could potentially use the app to generate "decent illustrations of any scenario [they] want to put [their] players in." When I used to play dungeon master for my RuneQuest (a D&D variant) group, I certainly would have! A third explained how he'd just used the similar AI generation tool DALL-E Mini (now called Craiyon) to create an album cover image for an EP he's releasing. (Despite the similarity of its original name, Craiyon is not affiliated with DALL-E 2 or OpenAI.)

When I see threads like that, it reminds me of the early days of Web 1.0, when every new innovation prompted ten more ideas to expand the capabilities of the platform. I think DALL-E is going to emerge as similarly generative, where all levels of users are constantly experimenting and finding new uses for DALL-E imagery and new methods to create and modify it. 

It's possible that its biggest impact may actually occur in the most informal, spontaneous, and ephemeral contexts, like email, texting, and social media. 

Indeed, think of how much time we spend now communicating with people in mediated ways.

There's texting, emailing, chatting, video-conferencing, streaming, posting to message boards, and updating statuses on Facebook, LinkedIn, Twitter, and many other social media platforms.

Every once in a while, we even make a good, old-fashioned phone call. To enable us to maintain such high levels of connection, we've seen many innovations in communications technologies over the last several decades around the basic idea of making written language more visual. 

Along with the traditional characters in our alphabets, we can now draw upon an endless supply of emojis, animated GIFs, and smartphone photography to make our written utterances more information-dense, more nuanced, more emotional.

DALL-E continues that tradition. In what already exists as a golden era of expression and interpersonal communication, DALL-E further unleashes human creativity, giving users of all levels new superpowers to convey their thoughts and emotions.

At some point soon, I believe, we'll start seeing DALL-E buttons show up in the interfaces of apps like Twitter, in the same way we see IMAGE, GIF, and EMOJI buttons there now. And thus even the most ephemeral human utterances will be punctuated and appended with original high-quality images that could once only be found in strictly professional/commercial contexts because of the time and cost it once took to create work of such high quality. 

Even at the start of DALL-E's beta period, when only a few thousand users had access, the creativity on display was astounding. Now that OpenAI has extended access to 100,000 users, with plans to add up to a million as soon as possible, a new visual world awaits us! 

Victoria Dazin

Master of Public Administration at Tel Aviv Medical Center

7mo

Thank you for sharing

Gary Zamchick

Innovation strategy consulting, Co-founder WordsEye

1y

It’s not only drudgery that’s stripped from art-making with applications like DALL-E. It also is wonder and mastery. After my son got home from his kindergarten class, we’d sit down together side by side and “draw out” what he did in school that day. The parallel nature of the discussion worked really well, and I believe, unlike many kids who might mumble their way through “What did you do in school today?”, he shared every detail he could think of as the adventure he described on the paper before him unfolded. And it was an adventure: the happenings during recess proved far more compelling than the Montessori projects he was mastering. Since then my son has worked on immersive theatrical experiences, from a 50,000 square foot Peter Pan experience in Beijing, to giant magic shows in Las Vegas. AI Gen art apps would have done nothing to enable the kinds of procedural compositions, prompted and constrained by our on-going discussion, he worked up as a five year old to tell the human story that was emerging at school. “The large boulder blocked the mighty pirate from crossing the ocean” would zap whatever wonder and mastery he felt and shared with his dad. The algorithm should not have all the fun.

Are illustrators thrilled? Can it be said they can now move onto higher value more creative work? Exciting and at the same time challenges our preconceived notions of creativity as being purely a human attribute.

Vahid Hejazi

Material Development Scientist | Engineer

1y

Great article and interesting innovation! Thanks for sharing, Reid.

Darshan Chauhan

Founder at Durvasa Infotech | SEO Consultant | Social Media | Performance Marketing | Web development | Influencer Marketing

1y

Great Tool #dalle by #openai I personally used that tool and it's future of AI based Image Generation. Kudos to OpenAI
