During our Boozy Browsing Podcast (check it out at https://youtube.com/@boozybrowsing/), Matt and I decided to take on a redesign challenge.
We wanted to test out vibe coding as much as possible to see how far we could get.
Here are the original and final versions of the redesign.
The original website

Final redesign

Spoiler alert!
We had to intervene and add manual development work at the end for the final version. Still, it’s interesting to see the process we followed and the bottlenecks we encountered along the way.
Here were the rules for the challenge:
- Use any LLMs or other AI dev tools to create a new design for the manufacturing website we selected
- Spend less than 3 hours total for the entire implementation
- Keep the brand colors and navigation from the original page
Simple enough, right?
Starting point
I wanted to test different LLMs and see which ones would deliver outcomes closest to my intentions (well, requirements).
Here’s a focused look at the site redesign process using the LLMs we tested.
The redesign process began by testing three major LLMs: Gemini, ChatGPT-3, and Perplexity, along with a separate vibe coding tool, Bolt.
1. Gemini: The Feature Integrator
Gemini provided a visually appealing starting point, offering both light mode and dark mode options.
- Successes: The color palette suggested by Gemini was generally good. Most notably, Gemini facilitated the easy integration of a useful AI feature: a “build your perfect load” functionality that communicates with AI to provide recommendations. This feature was discovered by accident and was simple to implement.
- Drawbacks: The resulting site had an unappealing giant font. Furthermore, attempts to tweak the design and request specific images failed, as Gemini wasn’t really grasping what I was trying to achieve.
2. ChatGPT-3: The Rounded Corner Trap
ChatGPT-3 produced images that were all very decent. However, the design included elements I wasn’t a huge fan of, particularly the prevalence of rounded corners everywhere.
- Functionality Note: While ChatGPT-3 did most of what was needed, the site it generated was initially broken and required fixing to reach the correct sections.
3. Bolt: The Branding Failure
A separate test was conducted using Bolt, which demonstrated the difficulty LLMs have with specific design goals.
The results from Bolt were severely lacking. It sucked.
The tool failed to even attempt proper branding, generating very unimpressive initial versions. Due to the poor quality of the first few iterations, I decided not to invest more effort in setting up migration or further iteration with Bolt.
4. Perplexity: The Struggle for Specificity
Ultimately, Perplexity came closest to what I envisioned. The final version was built upon Perplexity’s output, combined with cherry-picked elements from other attempts and applied manually using the development tool Cursor. However, there were some significant challenges when using Perplexity.
Here they are:
- Generic Start and Unwanted Features: The first version generated by Perplexity was familiar and “very generic,” featuring standard icons. While it achieved the desired base color, it included the unwanted rounded corners (again).
- The Branding Breakdown: I attempted to give Perplexity specific branding references, such as Black Rifle Coffee and Mud Water, aiming for a more “muscular” aesthetic to cater to a specific audience. The goal was specifically to capture the styling of these two brands. Instead:
  - The LLM picked up irrelevant and weird content from the references
  - The site lost all images
  - In the most extreme example, the site broke and essentially transformed into a hunting site (why? I don’t know!), which was completely irrelevant to the manufacturing product
- Functional Mismatch and Iteration Frustrations: Later versions became cleaner but exhibited an interesting mix of colors. Crucially, the AI transformed the design into an e-commerce shop, even though the instructions explicitly stated that e-commerce functionality would only (and maybe) be introduced later.
- The AI’s Inability to Resist Change: The iterative process demonstrated that the LLM can’t help but re-engineer components. For instance, when I requested that the AI simply restore the correct images (which had been lost when I asked for a color change), the AI restored the images but changed pretty much everything else. There was also a significant struggle to get the precise dark mode color, even after providing the list of hex colors—it still failed to get it right.
The Final Strategy: Developer Control and Hybrid Tooling
The experience confirmed that relying solely on one LLM for a complex task like a full site redesign and brand establishment is likely not the best approach. For the final version, I ended up with a hybrid strategy to get it done.
The final process looked like this:
- Cherry-Picking and Selection: I ended up taking elements I liked from the various attempts, such as specific colors or image placements, and discarded the rest.
- Manual Application: Since the LLM couldn’t achieve the specific dark mode color I desired, I updated the CSS and branding colors manually.
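To give a sense of what that manual step looks like in practice, here is a minimal sketch of the approach: defining brand colors as CSS custom properties and overriding them for dark mode, so one edit changes the palette site-wide. The selector names and hex values below are hypothetical placeholders, not the actual project values.

```css
/* Hypothetical brand palette -- real values would come from the brand's hex list */
:root {
  --brand-primary: #1f2a36;
  --brand-accent: #c8102e;
  --surface: #ffffff;
  --text: #1a1a1a;
}

/* Dark mode: override the variables instead of restyling every component */
[data-theme="dark"] {
  --surface: #121212;
  --text: #e6e6e6;
}

body {
  background: var(--surface);
  color: var(--text);
}
```

The advantage of this pattern is exactly what the LLMs struggled with: changing the dark mode color touches only the variable definitions, leaving every other component untouched.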
- Specialized Development Tools: The final changes were applied using Cursor. While it has a coding assistant feature, a combination of manual updates and direct instructions targeting specific sections of the code was needed to achieve the desired outcome: essentially fine-tuning the design. Development work of this kind, including potentially building migrations, is likely better done in specialized environments like Cursor or with Copilot.
The resulting final version was significantly sleeker and more modern than the original site. It features the correct logos, includes product filtering functionality, and uses the dark mode color that I manually selected and applied.
If you’re a visual person and would like to see the details of the iterations, watch it here.
The whole process probably took about 2 to 2.5 hours. More time was spent correcting the issues these LLMs created along the way, though.
Here is the quick recap of the process.
Quick Recap
Here’s what we learned from testing four different AI tools for website redesign:
- Gemini offered good color palettes and accidentally delivered a useful AI feature, but struggled with font sizing and understanding specific image requests
- ChatGPT-3 produced quality designs but leaned heavily on rounded corners and required debugging to function properly
- Bolt completely failed at branding and produced unusable initial versions
- Perplexity came closest to the vision but struggled with maintaining consistency across iterations, often changing unrelated elements when asked for simple fixes
Time investment: 2 to 2.5 hours total, with significant time spent correcting AI-generated errors.
Final approach: A hybrid strategy combining Perplexity’s output with manual CSS updates and Cursor for specialized development work.
Conclusion
Unfortunately, vibe coding can’t yet produce a complete, production-ready site. Human oversight is still required.
The biggest lesson? Sometimes it takes longer to fix AI-generated errors than to start from scratch. When LLMs misunderstood branding and over-engineered simple requests, the time savings disappeared quickly.
The sweet spot is using AI for rapid prototyping and exploring possibilities, then switching to traditional development tools for refinement. Cherry-pick the best elements, discard what doesn’t work, and manually code the details that matter most.
Until these tools better understand context and stop over-engineering, the hybrid approach is the strategic balance needed.
Overall, though, it was a fun experiment. Watch the full episode.