Exploring different LLMs as assistants for web/app UX

Stephen Cow Chau
11 min read · Aug 5, 2023

Background

I come across a lot of websites and apps that provide poor UX. They are especially difficult to use for people who are not familiar with the web/apps (e.g. the elderly, like my mum, who is not familiar with the usual web/app navigation or with what the icons represent).

For years (in the pre-LLM era) I believed that one day we would move on to natural language as the command interface (a.k.a. a chatbot with voice-to-text). That didn’t work out, and I have changed my mind lately: I think a voice-assisted visual UI is the way to go.

Imagine an e-menu for ordering food: instead of scrolling and clicking through different categories, one can ask the assistant “do you have carrot cake”, and it either navigates to the detail page of the carrot cake or, if the request is ambiguous, to the category. Instead of getting the answer in a chat interface, we still use the web/app UI/UX, but without needing to learn the UI and navigation before using it for the first time.

In the startup world, ideas are cheap (what matters is the execution), so I am going to share the idea and some testing in this article.

Objective of the test

I would like to explore how well different LLMs follow instructions and produce a predefined output format (so that the web/app can consume it and act on it).

The evaluation is impression-based and very subjective.
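To make “consume and act” concrete, here is a minimal Python sketch of how the web/app side might check a reply before acting on it. It assumes the intent / action / message shape that the prompts later in this article ask for. The function name parse_reply is hypothetical and is not part of any library.

```python
import json


def parse_reply(text):
    """Return the reply as a dict, or None if it is not usable.

    "Usable" here is an assumption: valid JSON with a string intent,
    a string message, and a non-empty list of actions that each carry
    a function and a param.
    """
    try:
        reply = json.loads(text)
    except json.JSONDecodeError:
        return None  # plain chat text or malformed JSON
    if not isinstance(reply, dict):
        return None
    if not isinstance(reply.get("intent"), str):
        return None
    if not isinstance(reply.get("message"), str):
        return None
    actions = reply.get("action")
    if not isinstance(actions, list) or not actions:
        return None
    for action in actions:
        if not isinstance(action, dict):
            return None
        if "function" not in action or "param" not in action:
            return None
    return reply
```

A reply that comes back as None could simply be shown to the user as a chat message instead of triggering navigation. A strict parser like this one also rejects JSON with missing or trailing commas, so near-miss replies count as failures.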

LLM candidates and interface

  • ChatGPT (gpt-3.5-turbo) on Poe
  • Claude-instant (9k token) on Poe
  • Llama-2-70b on Poe
  • Bard (on https://bard.google.com/)
  • HuggingChat v0.4.0 (with model OpenAssistant/oasst-sft-6-llama-30b)

Test date

The test date is 2023-08-05. This is important to state, as the models on the different platforms could change.

The Prompts

Since the information is long, I break it down into multiple prompts, each with an instruction telling the LLM not to reply in full (e.g. not to complete the sentence or answer as a chat) but only to acknowledge.
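As a rough illustration of this multi-prompt setup, the sketch below wraps each block of information in a “System:” turn that ends with the pause instruction, and checks whether the model only acknowledged it. I pasted the prompts by hand in each chat UI, so the helper names build_setup_messages and setup_acknowledged are hypothetical, and the final prompt in my test uses a slightly different closing sentence than the one assumed here.

```python
PAUSE_NOTE = (
    "I am going to pause here and would provide you additional "
    "information in next message, you just need to reply noted."
)


def build_setup_messages(sections):
    """Wrap each information block as a 'System:' turn ending with the pause note."""
    return [f"System: {body}\n\n{PAUSE_NOTE}" for body in sections]


def setup_acknowledged(reply):
    """True when the model answered with just 'noted' (ignoring case and punctuation)."""
    return reply.strip().strip(".!").lower() == "noted"
```

A model that summarizes the prompt instead of acknowledging it would fail setup_acknowledged, which is the behavior the “do not reply” instruction is meant to detect.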

This set of prompts is version 2. I tested an earlier set about 2 months ago, which gave a slightly different impression for some LLMs.

The first prompt defines the potential navigation targets:

System: I am having this web app, there are following pages and information:

- Home page (/), having a list of products and navigation to other pages
- Shopping Cart (/cart), a listing page of showing what user intended to check out and buy
- Product page (/product), a page that show all products, allowing user to add items to shopping cart, also this would allow a filter at query string like /product?color=black,red&type=dress
- Product detail page (/detail), the product detail page that could take a query string pid, which is a string, eg /detail?pid=shoe1
- Q&A (/questions), a page that list some frequently asked questions, each of the questions would have an anchor link
- Contact us (/contact), a page that have a form for user to fill in questions and feedback

I am going to pause here and would provide you additional information in next message, you just need to reply noted.
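On the web/app side, the pages listed in this prompt could be kept as a small set so that any target the LLM returns can be checked before routing. This is only a sketch: the paths come from the prompt above, while KNOWN_ROUTES and is_known_route are hypothetical names.

```python
from urllib.parse import urlsplit

# Paths taken from the first prompt.
KNOWN_ROUTES = {"/", "/cart", "/product", "/detail", "/questions", "/contact"}


def is_known_route(target):
    """Check only the path, ignoring any query string or #anchor."""
    return urlsplit(target).path in KNOWN_ROUTES
```

For example, "/detail?pid=shoe1" and "/questions#3" both pass, while a made-up page such as "/checkout" does not.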

The second prompt lists some products that are used in the following prompts:

System: now I am listing out our products in JSON format:

[{
"name": "Classic Black Pumps",
"description": "These classic black pumps are a must-have for any young office lady. They are made from high-quality materials and are designed to be both comfortable and stylish.",
"available_sizes": ["US 6", "US 7", "US 8", "US 9"],
"colors": ["black"]
}, {
"name": "Strappy Nude Sandals",
"description": "These strappy nude sandals are perfect for summer days at the office. They feature a comfortable footbed and adjustable straps for a perfect fit.",
"available_sizes": ["US 6", "US 7", "US 8", "US 9"],
"colors": ["nude"]
}, {
"name": "Chunky Heel Loafers",
"description": "These chunky heel loafers are both comfortable and stylish. They are perfect for days when you need to be on your feet for long periods of time.",
"available_sizes": ["US 6", "US 7", "US 8", "US 9"],
"colors": ["brown", "black"]
}, {
"name": "Floral Wrap Dress",
"description": "This floral wrap dress is perfect for summer days at the office. It features a flattering wrap design and a comfortable fit.",
"available_sizes": ["US 2", "US 4", "US 6", "US 8"],
"colors": ["yellow", "blue"]
}, {
"name": "Office-Ready Shift Dress",
"description": "This office-ready shift dress is perfect for days when you need to look professional but still want to be comfortable. It features a flattering fit and a classic design.",
"available_sizes": ["US 2", "US 4", "US 6", "US 8"],
"colors": ["black", "navy"]
}, {
"name": "Sleeveless Midi Dress",
"description": "This sleeveless midi dress is perfect for summer days at the office. It features a comfortable fit and a stylish design that will make you stand out.",
"available_sizes": ["US 2", "US 4", "US 6", "US 8"],
"colors": ["pink", "green"]
}, {
"name": "Silk Blouse",
"description": "This silk blouse is perfect for days when you need to look professional but still want to be comfortable. It features a flattering fit and a classic design.",
"available_sizes": ["XS", "S", "M", "L"],
"colors": ["white", "black"]
}, {
"name": "Sleeveless Blouse",
"description": "This sleeveless blouse is perfect for summer days at the office. It features a comfortable fit and a stylish design that will make you stand out.",
"available_sizes": ["XS", "S", "M", "L"],
"colors": ["blue", "pink"]
}, {
"name": "Striped Button-Up",
"description": "This striped button-up is perfect for days when you need to look professional but still want to be comfortable. It features a flattering fit and a classic design.",
"available_sizes": ["XS", "S", "M", "L"],
"colors": ["blue", "white"]
}
]


I am going to pause here and would provide you additional information in next message, you just need to reply noted.
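For reference, a search over this product list on the web/app side could look like the sketch below. The catalog has no type field, so matching a keyword such as “dress” against the product name is an assumption of this sketch, as are the function name search and its parameters. Only a few of the products are repeated here to keep it short.

```python
# A subset of the catalog above (descriptions omitted).
CATALOG = [
    {"name": "Classic Black Pumps",
     "available_sizes": ["US 6", "US 7", "US 8", "US 9"],
     "colors": ["black"]},
    {"name": "Floral Wrap Dress",
     "available_sizes": ["US 2", "US 4", "US 6", "US 8"],
     "colors": ["yellow", "blue"]},
    {"name": "Office-Ready Shift Dress",
     "available_sizes": ["US 2", "US 4", "US 6", "US 8"],
     "colors": ["black", "navy"]},
    {"name": "Silk Blouse",
     "available_sizes": ["XS", "S", "M", "L"],
     "colors": ["white", "black"]},
]


def search(catalog, keyword=None, size=None, colors=None):
    """Return products matching every criterion that was given.

    colors is a list; a product matches if it has at least one of them.
    """
    matches = []
    for product in catalog:
        if keyword and keyword.lower() not in product["name"].lower():
            continue
        if size and size not in product["available_sizes"]:
            continue
        if colors and not set(colors) & set(product["colors"]):
            continue
        matches.append(product)
    return matches
```

With this subset, searching for a dress in size US 8 gives two products, adding the color red gives none, and asking for grey or blue gives only the Floral Wrap Dress.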

The third prompt provides the Q&A information:

System: Below are question and answers (Q&A) for our store:

Q: What is your return policy?
A: We offer a 30-day return policy for all items purchased from our store. If you are not satisfied with your purchase, simply return it in its original condition within 30 days of receiving it for a full refund or exchange.

Q: How do I know what size to order?
A: We provide detailed size charts for all of our products to help you find the perfect fit. Simply click on the "Size Chart" link located next to the product description to view the size chart.

Q: How long does shipping take?
A: We offer both standard and expedited shipping options. Standard shipping typically takes 5-7 business days, while expedited shipping takes 2-3 business days. Please note that shipping times may vary depending on your location.

Q: What payment methods do you accept?
A: We accept all major credit cards, including Visa, Mastercard, American Express, and Discover. We also accept payment through PayPal.

Q: Do you offer gift wrapping?
A: Yes, we offer gift wrapping for an additional fee. Simply select the gift wrapping option at checkout and we will wrap your items in high-quality wrapping paper and include a personalized message.

Q: Can I track my order?
A: Yes, once your order has shipped, you will receive a tracking number via email. You can use this tracking number to track your package online and see its estimated delivery date.

Q: What if an item is out of stock?
A: If an item is out of stock, you can sign up to be notified when it becomes available again. Simply click on the "Notify Me" button on the product page and enter your email address. We will send you an email as soon as the item is back in stock.

Q: Do you offer international shipping?
A: Yes, we offer international shipping to select countries. Shipping rates and delivery times may vary depending on your location.

Q: How do I contact customer service?
A: You can contact our customer service team by emailing us at customerservice@fashionstore.com or by calling us at 1-800-555-5555. Our customer service team is available Monday through Friday from 9am to 5pm EST.

I am going to pause here and would provide you additional information in next message, you just need to reply noted.

The fourth prompt defines how the LLM should reply, focusing on product queries:

System: here are the instruction on how you are going to help customer to use our web app, for any input customer said, e.g.

Customer: i would like to see the product catalog

Your would be forming response like:
{
"intent": "navigate",
"action": [{
"function": "route",
"param": "/product"
}
]
"message": "let me bring you to product page"
}

The idea above is to identify the intention as navigate and then provide the action parameter as a function name "route" and the target in param of function "/product"

Another example:

Customer: Do you have pump with size 7?

Your response could be:

{
"intent": "find_product",
"action": [{
"function": "search",
"param": {
"type": "shoe",
"subtype": "pump",
"size": "US 7",
}
"onFunctionResult": {
"condition": "product match count 1"
"function": "navigate",
"param": "/detail?pid=shoe1"
}
}
],
"message": "oh yes, we do have a black pump with size US 7, I am now navigating you to the detail page of product"
}

The idea above is to identify the intention as customer want to find product and then provide the action parameter as a function name "search" and the target in param of function with the search criteria, and this function would return somre result, we would check the seach result condition, in the example above there is only 1 result, so the followup action is navigate and target is /detail?pid=shoe1

Another example:

Customer: Do you have any shoes with size 7 and color black or grey?

Your response could be:

{
"intent": "find_product",
"action": [{
"function": "search",
"param": {
"type": "shoe",
"size": "US 7",
"color": ["black", "grey"]
},
"onFunctionResult": {
"condition": "product match count more than 1"
"function": "navigate",
"param": "/product?type=shoe&color=black,grey&size=7"
}
}
],
"message": "oh yes, we do have some black shows with size US 7 and color black, I am now bringing you to the product catalog with matching products"
}

The example compare to the previous one, is the search function having 2 color criterion as parameter, and the search function return more than 1 result and so the following action is not directing to detail page but to listing page "/product?type=show&color=black,grey&size=7" which is passing the search param into the query string


Another example:

Customer: I would like to check out what I added

Your response could be:

{
"intent": "navigate",
"action": [{
"function": "route",
"param": "/cart"
}
]
"message": "Please to bring you to your shopping cart for checking out"
}


I am going to pause here and would provide you additional information in next message, you just need to reply noted.
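The onFunctionResult idea in this prompt (one match goes to the detail page, several matches go to the filtered listing) could be resolved on the web/app side roughly as follows. This is a sketch under assumptions: the products in my JSON list have no pid field, so the pid key used here is hypothetical, and the sketch passes param values through unchanged, whereas the example URL in the prompt shortens the size to 7.

```python
from urllib.parse import urlencode


def listing_url(param):
    """Turn a search param dict into a /product query string.

    List values are joined with commas, as in /product?color=black,red.
    """
    query = {}
    for key, value in param.items():
        query[key] = ",".join(value) if isinstance(value, list) else str(value)
    # safe="," keeps the comma readable instead of encoding it as %2C
    return "/product?" + urlencode(query, safe=",")


def follow_up(matches, param):
    """Pick the navigation target from the number of search matches."""
    if len(matches) == 1:
        return "/detail?" + urlencode({"pid": matches[0]["pid"]})
    if len(matches) > 1:
        return listing_url(param)
    return None  # nothing matched, so there is nowhere to navigate
```

Deciding the target in the app from the real match count, rather than trusting the URL the LLM wrote, also guards against the model guessing the number of matches wrongly.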

The final prompt covers other navigation targets such as Q&A or contact:

System: here are some additional instruction on how you are going to help customer to use our web app

Customer: I have purchased a dress here last week, can I know how when I am expecting the delivery or any way I can track?

Your would be forming response like:
{
"intent": "Q&A",
"action": [{
"function": "route",
"param": "/questions#3"
}
]
"message": "we have a standard shipping and typically take 5-7 business days and we do have a tracking number sent to you via email upon your confirm of purchase, for more detail, please see the Q&A where I am navigating you to"
}

Not that in above, the question matched 2 of our Q&A, which are the 3rd item “How long does shipping take” and the 6th item “can I track my order“, so the message is a short summary of the 2, and the /questions url anchor to the #3 which is the more relevant question

Kindly note that for Q&A, it’s very important to stick to the information provided earlier and if there is no relevant Q&A, you might answer you don’t have such information at hand and suggest to ask user to go to the feedback contact us form

Customer: I have purchased a dress here last week, and I used the tracking number provided but it’s not found, what can I do?

Your would be forming response like:
{
"intent": "Q&A",
"action": [{
"function": "route",
"param": "/contact"
}
]
"message": "For your request, please fill in the enquiry/feedback form here and our team would contact you to assist"
}


This is the final instruction for now incoming message would be from customer, you just need to reply noted.
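The numbering behind the /questions#{x} anchors is just the 1-based position of each question in the third prompt. A small sketch of that mapping, with the hypothetical helper name question_anchor and a fallback to /contact when no question matches:

```python
# Questions in the order they appear in the third prompt.
QUESTIONS = [
    "What is your return policy?",
    "How do I know what size to order?",
    "How long does shipping take?",
    "What payment methods do you accept?",
    "Do you offer gift wrapping?",
    "Can I track my order?",
    "What if an item is out of stock?",
    "Do you offer international shipping?",
    "How do I contact customer service?",
]


def question_anchor(question):
    """Return /questions#n (n is 1-based) for a known question, else /contact."""
    try:
        position = QUESTIONS.index(question) + 1
    except ValueError:
        return "/contact"
    return f"/questions#{position}"
```

Under this numbering, the shipping time question is #3, order tracking is #6 and international shipping is #8.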

Test cases

Customer: morning, do you have any dress with size US 8?

The above test case expects a JSON output with function and param as in the examples.

Customer: And does the dresses have red color?

The above test case expects the context (a dress with size US 8) to be kept, with the color red added.

Customer: how about any dress with color grey or blue?

The above test case checks whether multiple color criteria are identified.

Customer: I want to see what I have added to my shopping cart

The above test case tests navigation to the shopping cart.

Customer: Do you ship to Brazil?

The above test case tests navigation to the Q&A page, and whether the model understands the numbering in the URL and the order of the Q&A information provided (the target navigation URL is /questions#{x}).

Customer: how can I know the status of my order?

The above is yet another Q&A test.
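My evaluation was done by reading the replies, but the expectations for the product query cases could also be checked mechanically. The sketch below is one possible way, assuming the reply has already been parsed into a dict in the format from the prompts. The helper names and the exact param keys (size, color) are assumptions based on my examples, not something every model is guaranteed to use.

```python
def as_list(value):
    """Treat a single value and a one-item list the same way."""
    return value if isinstance(value, list) else [value]


def search_matches(reply, expected):
    """True if the first action is a search carrying every expected criterion."""
    action = reply["action"][0]
    if action.get("function") != "search":
        return False
    param = action.get("param", {})
    if not isinstance(param, dict):
        return False
    for key, want in expected.items():
        if key not in param:
            return False
        if sorted(as_list(param[key])) != sorted(as_list(want)):
            return False
    return True


# Second test case: the size from the previous turn must be kept, red added.
kept_context = {
    "intent": "find_product",
    "action": [{"function": "search",
                "param": {"type": "dress", "size": "US 8", "color": "red"}}],
    "message": "...",
}
lost_context = {
    "intent": "find_product",
    "action": [{"function": "search",
                "param": {"type": "dress", "color": ["red"]}}],
    "message": "...",
}
```

Here search_matches(kept_context, {"size": "US 8", "color": ["red"]}) passes, while the same check on lost_context fails because the size from the earlier turn was dropped.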

Results

Bard (bard.google.com)

I believe this Bard model is tuned towards chat usage (rather than instruction following).

It conforms neither to the expected JSON output nor to the do-not-reply instruction between the instruction / information prompts. Instead, it replies by summarizing the prompt (how it understands the input).

The hallucination behavior is pretty bad as well.

One great thing about Bard is the images it produces in its replies (see the conversation linked below).

See complete conversation: https://g.co/bard/share/7bcd027d134c

HuggingChat

I remember that in my earlier test HuggingChat was the best model at conforming to the JSON output, but in the latest trial it is completely unable to conform to it.

It does follow the instruction not to reply / complete the sentence between the instruction / information prompts.

Regarding hallucinations, it refused to work in the product query test cases, while for Q&A it is 50-50: I believe it gets the direction right but hallucinates the details.

See complete conversation: https://hf.co/chat/r/lUzg7cm

Llama-2-70b

I think it conforms to the JSON output and also does not reply between the instruction / information prompts. It does have one failure case at the end, when I mistakenly left out the prefix “Customer: ”.

This LLM hallucinates in a different dimension: it conforms to the expected JSON format but invents new target function and param values. This might indicate that I need an additional instruction to limit the output to a fixed set of values.
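Besides adding an instruction to the prompt, the web/app could also reject invented values on its own side. A minimal sketch, assuming the allowed names are exactly the ones that appear in my prompt examples (the sets and the helper name invented_names are hypothetical):

```python
# Values that appear in the prompt examples.
ALLOWED_INTENTS = {"navigate", "find_product", "Q&A"}
ALLOWED_FUNCTIONS = {"route", "search", "navigate"}


def invented_names(reply):
    """List intent/function names in a parsed reply that are outside the allowed sets.

    A missing intent is reported as None.
    """
    invented = []
    intent = reply.get("intent")
    if intent not in ALLOWED_INTENTS:
        invented.append(intent)
    for action in reply.get("action", []):
        # Check the action itself and its optional follow-up step.
        for step in (action, action.get("onFunctionResult", {})):
            name = step.get("function")
            if name is not None and name not in ALLOWED_FUNCTIONS:
                invented.append(name)
    return invented
```

An empty list means the reply only uses known names; anything else can be logged and handled as a failed reply instead of being executed.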

See complete conversation: https://poe.com/s/rCI4PjnjCiWAbhclPwP4

ChatGPT (gpt-3.5-turbo)

I think ChatGPT performs very well. It follows the instructions well (both the no-reply between instructions and the JSON output). Also, its JSON for product queries is closer to my examples.

See complete conversation: https://poe.com/s/2ufWtR0MCTxUilwS6c2b

Claude-instant (9k token)

This one is surprisingly good, both in conforming to the JSON format and in the Q&A, even though its answer to the international shipping Q&A (ship to Brazil) was not what I expected (different from my expectation rather than simply wrong).

See complete conversation: https://poe.com/s/iQPLnLBjMEWWYaY41fZy

Conclusion

I think Claude and ChatGPT are the clear winners in my test scenario.

I believe the prompt engineering could be better. On the other hand, if LLMs rely on / are this sensitive to prompt engineering, they are still not ready for use by the general population (I don’t think we should require everyone to perform “extensive” prompt engineering in order to use an LLM).
