DAY-1 PLAN · Client Service Analyst — (Senior) AI Chatbot · Kuala Lumpur

ⓘ Independent job-application page by Edward Tay. Not affiliated with, endorsed by, or operated by Bybit. "Bybit"/"Bybot" are referenced for the role I'm applying to; all analysis is my own from public information.

Bybot is already strong. Here's what I'd do to own its next chapter.

Bybot handles the everyday path well — guided flows, security escalations, 中文 support. The role isn't fixing a broken bot; it's owning the part that's never finished: the free-text long tail, EN/中文 parity as the product changes, and the QA discipline to improve a live bot without breaking what already works. Everything below is what I'd do, grounded in how Bybot behaves today.


Edward Tay · edwardtay.com · EN / 中文 / BM · 5+ yrs CS↔eng + AI-QA · CompTIA Security+

The one-paragraph version

A strong bot stays strong only if someone owns the loop that keeps it that way: read the fallbacks → find the weak intent → fix the KB or routing → prove the metric moved → don't regress it. That's what I'd do at Bybit — and I don't just describe it: I built a working bilingual support bot, then ran this loop on it live and shipped five regression-gated fixes (✦ Proof). On the human side I've run the same loop at a Bitcoin L2 (BOB) and an Ethereum app (Aztec), authoring the docs that cut repeat queries. This page is the plan, grounded in how Bybot works today.

What I'd do

→ Mine the fallbacks → ranked backlog
→ Split containment vs true resolution
→ Keep EN/中文 KB at parity, automatically
→ Freeze a regression suite of real chats
→ Guardrail money-movement → human
→ Ship an intent eval harness

Bybit, in context

80M+ registered users (Bybit, published)

24/7 multilingual live support (confirmed)

2018 founded · global exchange

by derivatives volume (industry-tracked)

At 80M+ users, even a 1-point shift in chatbot containment is tens of thousands of tickets that never reach a human — that's the entire business case for this role. Crypto support is also uniquely high-stakes: questions are about money in motion (stuck deposits, wrong-network transfers, withdrawal holds, P2P disputes, liquidations), they're time-pressured and emotional, and a wrong automated answer isn't just unhelpful — it can be financially harmful. That raises the bar on when Bybot should answer vs. escalate, which is the thread running through this whole audit. And it isn't abstract for Bybit: the Feb 2025 ~$1.5B cold-wallet exploit (the largest crypto theft to date) is exactly why money-movement and account-security intents must reach a human fast — getting that routing right is the heart of this role.

The role, decoded

Six duties in the JD. Here's what each one actually means in week-to-week work, and where I cover it on this page.

JD duty	What it means in practice	Covered in
Own Bybot's lifecycle	Plan → train (intents/answers) → integrate (web/app/CRM) → monitor → continuously improve. You are the product owner of the bot.	§05
Analyse fallback & resolution rates	Instrument the funnel, find where users drop or escalate, raise self-service without hurting CSAT.	§06
Maintain bilingual KB (EN + 中文)	Author and keep both languages at parity; the KB is the bot's brain and the help-center's content.	§07
QA via scenario testing	Before and after every change, run the bot through real conversations; catch regressions.	§08 · §09
Cross-team + train internal staff	Translate between Product / Eng / CS Ops; teach agents what the bot can and can't do.	§05 · §11
Lead NLP / AI optimisation	Improve intent recognition, retrieval, answer quality and auto-routing (right queue by intent × confidence × language × risk) — increasingly LLM/RAG-based.	§10

Requirements call for 5–7 yrs in chatbot/CS-AI ops, working knowledge of the CS ticket lifecycle and escalation/routing logic, EN+中文 fluency, CRM familiarity, basic NLP/AI, and strong analytical + PM skills. Fit mapped in §12.

Live support teardown — what a Bybit user hits today

Traced from the public help center and support docs (Jun 2026). Bybit runs a tiered self-service → assisted funnel. Each stage is a place Bybot either deflects a ticket or leaks one to a human.

Help Center search: articles + categories
→ Self-service tools: deposit/tag recovery, etc. (confirmed)
→ Bybot, the virtual assistant: "general questions" first (confirmed)
→ Live agent (chat): 24/7 multilingual (confirmed)
→ Submit Case → ticket: webform, <5 min (confirmed)

What's confirmed live

• Live chat opens with the virtual assistant ("general questions"), then connects to a live agent on demand — the exact fallback boundary this role tunes.
• Self-service recovery functions exist (e.g. missing deposit tag/memo) and are positioned as faster than manual CS review — good deflection, narrow coverage.
• "Submit Case" webform ticketing, advertised <5 min, 24/7 follow-up.
• Support is 24/7 multilingual, so the bot must hold answer parity across languages, not just EN.

What I'd ask for on day 1 (not public)

• Bot platform & whether intent is rule/NLU or LLM-RAG over the KB.
• Fallback rate, containment rate, escalation rate — by language, by intent, last 90 days.
• Top 50 escalated intents and their KB coverage.
• CRM/ticketing (Salesforce/Zendesk?) and how bot transcripts land on the ticket.
• Current QA process & how a KB edit ships to the bot.

Bybot, observed — how it actually behaves (Jun 2026, zh-MY help center)

Bybot is guided-flow-first (tapped intents → clean, deep-linked, authored answers), with solid free-text/typo NLU and fallback capture via 👍/👎 and "以上皆不相关" (none of these): the miss-signal stream an analyst mines (§06). Two analyst-relevant tells: it answers in the page locale, not the message language (English typed on the 中文 page → 中文 reply, reproduced ×2; a parity gap, §07), and security intents route to a human with a 5–7-day SLA. The full architecture read and the fact-checked verify-table are in ⚙ Bybot, dissected.

What this funnel produces today

When Bybot can't match an intent, the practical exit is Submit Case → a 3–7 working-day ticket, and at peak volume live chat is sometimes turned off — leaving only the bot or email (help-center docs, public reviews, Jun 2026). So the headline containment number can look healthy while a user's money-movement question waits days. That gap is the whole reason §06 separates containment from true-resolution, and why ★ biases money intents toward a fast human hand-off.

Everything tagged confirmed is observable from public pages cited in §13; internals are deliberately framed as questions, not assumptions.

The hard 20% — where any CS bot needs an owner

Bybot handles the common path well — these aren't claims that it's broken. They're the failure classes that never fully go away on any exchange CS bot, because the product keeps changing and the long tail is infinite. This is the work that's never "done," and what I'd own: measure each, watch the trend, attack the worst.

Failure mode	What the user experiences	Cost	The fix lives in
Fallback dead-end	Bot can't match intent, loops "I didn't get that," no clean hand-off → user rage-types or leaves.	high	§06 + routing
Confident-but-wrong	Bot answers a money-movement question incorrectly (worse than not answering, in crypto).	high	§09 QA gate
EN/中文 parity drift	English KB updated; Chinese lags → bot gives a stale or inconsistent answer to 中文 users.	med	§07
Deflection ≠ resolution	"Containment" counted even though the user gave up unresolved — vanity metric hides pain.	med	§06
KB rot	Product ships a change; KB/intents not updated → bot answers about a flow that no longer exists.	med	§05 loop

⚙

Bybot, dissected — inferred architecture (verify, don't assume)

Black-box, no code access — and a rule: don't take the JD's word for it. A posting describes intended scope, not the deployed system, so every claim is a hypothesis with a confidence and a probe to confirm it. The JD narrows the search; live behaviour is the arbiter.

Hypothesis	Independent evidence	Verdict
Ticketing on Salesforce Service Cloud	Submit-Case URL is `bybit.com/…/s/webform` — the `/s/` path is Salesforce Experience Cloud's signature	likely · med-high
Intent classifier + curated flows (not pure LLM)	Guided buttons + a fixed clarifier menu; a misspelled "withdraw" still resolved fine	unverified, also fits an LLM
Language-specific flows keyed to page locale	English typed on the 中文 page → 中文 reply, reproduced ×2 (explains the parity gap)	plausible, not proven
Auto-routing bot→human; hard safe-flow on security intents	"connects to a live agent on request"; "i was hacked" → account-disable + ticket	likely

Confirm probes: paraphrase one intent 5 ways (rigid template reuse ⇒ intent bot; fluent variety ⇒ LLM); type 中文 on the EN page; inspect the chat widget's network origin for *.salesforce.com. Internals are framed as questions, never asserted.
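The paraphrase probe above can be scored rather than eyeballed. A minimal sketch, assuming the five replies are pasted in by hand from the public chat widget; the function names and the 0.9 cut-off are my own illustrative choices, not anything Bybit exposes.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def reply_similarity(replies):
    """Mean pairwise text similarity (0..1) between replies to paraphrases of one intent."""
    pairs = list(combinations(replies, 2))
    if not pairs:
        raise ValueError("need at least two replies to compare")
    return mean(SequenceMatcher(None, a, b).ratio() for a, b in pairs)


def read_probe(replies, template_threshold=0.9):
    """Label the probe. The 0.9 threshold is an assumption to tune, not a known constant."""
    score = reply_similarity(replies)
    if score >= template_threshold:
        verdict = "template reuse, intent bot likely"
    else:
        verdict = "varied wording, LLM plausible"
    return round(score, 3), verdict


if __name__ == "__main__":
    # Hypothetical replies to five paraphrases of one withdrawal question.
    canned = ["To withdraw, open Assets, choose Withdraw and follow the steps."] * 5
    print(read_probe(canned))
```

A high score only says the wording is reused; it stays a hypothesis until the other probes agree.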

❖

How peer exchanges do AI customer support

Where the bar is, and what Bybot can borrow — researched from public sources (Jun 2026; cited in §13).

Coinbase — the frontier (and public)

Agentic Claude-based support (per Anthropic's published case study) — grounded in KB + real-time account data, with compliance guardrails + a documented eval process. Coinbase reports AI cut account-restriction resolution times ~90% and now handles ~55% of US fraud cases (Armstrong, May 2026). The north star — and the shape of my ✦ demo.

Binance — multi-model + isolation

(From Binance AI Pro — a trading agent, not its support bot.) Wires multiple LLMs (ChatGPT/Claude/Qwen/MiniMax/Kimi) in an isolated sub-account with a no-withdrawal API key. Borrow the pattern: no single-vendor lock-in; least-privilege for any account action.

Kraken — human-first support

The exchange (kraken.com) emphasises true 24/7 human support. Borrow: on money-at-risk, a fast human hand-off is a feature, not a bot failure. (Not to be confused with the unrelated "Kraken" energy-CS platform at kraken.tech.)

Bitget — localization + tiers

Native per-language support + a priority lane for token holders (per Bitget's own materials; no official public SLA). Borrow: parity means native, not "multilingual", and SLA targets would make "resolution" concrete.

OKX — the shared baseline

Self-service → bot triage → live chat → ticket, multilingual. The same funnel as everyone — so differentiation is quality (resolution, parity, escalation judgment), not the funnel.

→ What I'd bring to Bybot

Coinbase's eval-gated agentic pattern + Binance's multi-model/isolation + Kraken's hand-off discipline + Bitget's native parity & SLAs. If Bybot is still intent-tree, that's the leapfrog — and I've prototyped it.

★

Day-1 quick wins

1 · Mine the fallbacks

Pull the last 90 days of unmatched / escalated turns and cluster them by topic. The top 20 clusters, ranked by volume × escalation rate, are the entire near-term backlog.

2 · Split the metric

Stop reporting "containment" alone. Add a true-resolution signal (no re-contact in 24h + no escalation + thumbs-up) so we optimise help, not avoidance.

3 · Parity diff EN↔中文

Auto-flag any KB article where EN was edited after its 中文 counterpart. One report kills a whole class of bilingual drift.

4 · Escalation guardrails

For money-movement intents (withdrawals, stuck deposits, liquidations) bias toward fast hand-off over a risky auto-answer. Safety beats deflection.

5 · Regression set

Freeze the top 100 real conversations as a test suite so no KB/intent edit silently breaks an answer that used to work (§08).

6 · Weekly bot review

A 30-min standing review with CS Ops: worst 10 transcripts, what shipped, what moved. The improvement loop becomes a habit, not a project.
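Quick win 1's ranking is simple enough to sketch. This assumes the turns have already been clustered upstream (by whatever topic model the team uses) and arrive as `(cluster_label, escalated)` pairs; the cluster names in the example are hypothetical.

```python
from collections import defaultdict


def rank_backlog(turns, top_n=20):
    """Rank fallback clusters by volume x escalation rate.

    turns: iterable of (cluster_label, escalated) pairs, one per unmatched or escalated turn.
    """
    volume = defaultdict(int)
    escalated = defaultdict(int)
    for cluster, was_escalated in turns:
        volume[cluster] += 1
        escalated[cluster] += int(was_escalated)

    rows = []
    for cluster, n in volume.items():
        rate = escalated[cluster] / n
        rows.append({
            "cluster": cluster,
            "volume": n,
            "escalation_rate": round(rate, 3),
            "score": n * rate,
        })
    # Highest score first; ties broken by volume, then name, so the report is stable.
    rows.sort(key=lambda r: (-r["score"], -r["volume"], r["cluster"]))
    return rows[:top_n]


if __name__ == "__main__":
    sample = ([("wrong_network", True)] * 3 + [("wrong_network", False)]
              + [("kyc_status", True)] + [("fee_question", False)] * 5)
    for row in rank_backlog(sample):
        print(row)
```

Taken literally, volume × escalation rate equals the count of escalated turns in a cluster, so high-volume clusters that never escalate sink to the bottom. Whether to weight those back in is a judgment call to make against real data.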

Owning Bybot's lifecycle

The JD's first duty. I treat the bot as a product with a tight build-measure-learn loop — and the analyst as its owner across Product, Eng and CS Ops.

Plan

Prioritise intents by ticket volume × cost × automatability. Not everything should be automated — money-movement edge cases stay human by design.

Train

Author intents/utterances + KB answers (EN+中文), set confidence thresholds, define escalation triggers per intent.

Integrate

Web + app + CRM. Make sure the bot transcript + detected intent land on the ticket so agents start with context, not a blank slate.

Monitor

Dashboards on fallback / containment / escalation / CSAT, sliced by language and intent. Alerts when a metric regresses after a release.

Improve

Weekly: read the worst transcripts, fix the KB or routing, ship, verify the metric moved, add to regression set. Repeat.

Enable

Train CS agents on what Bybot now handles and where to trust/override it — so the bot and humans reinforce each other.
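The Train step's per-intent confidence thresholds and escalation triggers could look roughly like this. It is a sketch of routing by intent × confidence × language × risk; every intent name, threshold and queue label below is a hypothetical placeholder, since the real configuration is not public.

```python
from dataclasses import dataclass

# Hypothetical config: real intents, thresholds and queues would come from the live stack.
ALWAYS_HUMAN = {"account_compromised"}                        # security: never auto-answer
THRESHOLDS = {"withdrawal_hold": 0.95, "stuck_deposit": 0.95}  # money movement: high bar
DEFAULT_THRESHOLD = 0.75


@dataclass(frozen=True)
class Turn:
    intent: str
    confidence: float
    lang: str  # "en" or "zh"


def route(turn: Turn) -> str:
    """Return 'bot:answer' or a 'human:<queue>:<lang>' hand-off target."""
    if turn.intent in ALWAYS_HUMAN:
        return f"human:security:{turn.lang}"
    threshold = THRESHOLDS.get(turn.intent, DEFAULT_THRESHOLD)
    if turn.confidence < threshold:
        queue = "funds" if turn.intent in THRESHOLDS else "general"
        return f"human:{queue}:{turn.lang}"
    return "bot:answer"


if __name__ == "__main__":
    print(route(Turn("stuck_deposit", 0.90, "zh")))   # below the money-movement bar
    print(route(Turn("fee_schedule", 0.80, "en")))    # ordinary intent, confident enough
```

Keeping the thresholds in data rather than code means a change to one intent's bar is a reviewable config edit that the regression suite can gate.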

Fallback vs containment — measuring what matters

"Analyse fallback and resolution rates" is duty #2. The trap is optimising deflection (user didn't reach a human) instead of resolution (user's problem was actually solved). Here's the metric tree I'd run.

Illustrative session outcomes — the gap to close

Self-resolved (good): ~46%

Escalated to agent: ~31%

Fallback dead-end: ~14%

"Contained" but unresolved: ~9%

Illustrative shape, not Bybit data — the point is that the muted bars (dead-ends + false containment) are the real backlog, and they're invisible if you only track one headline number.

The metrics I'd own

Containment rate — sessions resolved without a human. The headline, but never alone.
True-resolution rate — contained and no re-contact within 24h and positive/neutral CSAT. The honest number.
Fallback rate — turns where intent wasn't matched. Leading indicator of KB/NLU gaps.
Escalation precision — of escalations, how many truly needed a human (we want risky ones to escalate).
Per-intent + per-language cuts of all of the above — that's where the work hides.
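To make the definitions above concrete, here is a sketch that computes them from session records. The field names (`escalated`, `recontact_24h`, `csat`, `turns`, `fallback_turns`, `needed_human`) are assumptions about what a transcript export might contain, and counting unrated sessions as not truly resolved is a deliberately conservative choice of mine.

```python
def funnel_metrics(sessions):
    """Compute the four headline metrics from a list of session dicts."""
    if not sessions:
        raise ValueError("no sessions to measure")
    total = len(sessions)
    contained = [s for s in sessions if not s["escalated"]]
    escalations = [s for s in sessions if s["escalated"]]
    # True resolution: contained AND no re-contact within 24h AND positive/neutral CSAT.
    resolved = [
        s for s in contained
        if not s["recontact_24h"] and s.get("csat") in ("positive", "neutral")
    ]
    turns = sum(s["turns"] for s in sessions)
    fallback_turns = sum(s["fallback_turns"] for s in sessions)
    # needed_human is a reviewer label on escalated sessions.
    needed = sum(1 for s in escalations if s.get("needed_human"))
    return {
        "containment_rate": len(contained) / total,
        "true_resolution_rate": len(resolved) / total,
        "fallback_rate": fallback_turns / turns if turns else 0.0,
        "escalation_precision": needed / len(escalations) if escalations else None,
    }


if __name__ == "__main__":
    demo = [
        {"escalated": False, "recontact_24h": False, "csat": "positive", "turns": 4, "fallback_turns": 0},
        {"escalated": False, "recontact_24h": True, "csat": None, "turns": 6, "fallback_turns": 2},
        {"escalated": True, "recontact_24h": False, "csat": None, "turns": 5, "fallback_turns": 1, "needed_human": True},
        {"escalated": True, "recontact_24h": False, "csat": None, "turns": 5, "fallback_turns": 2, "needed_human": False},
    ]
    print(funnel_metrics(demo))
```

In the demo, containment is 0.5 while true resolution is 0.25, which is exactly the gap the headline number hides. Running the same function on sessions filtered by intent or language gives the per-cut views.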

Bilingual knowledge base — EN / 中文 at parity

Duty #3, and the reason fluent Mandarin is a hard requirement. The KB is built from three sources — Help Center content, internal SOPs, and patterns in CRM ticket data — and in a RAG-style bot it is the model's knowledge, so KB quality is bot quality. I'm a native-level EN + 中文 (+ BM) writer, so I can own both languages directly rather than route every Chinese edit through translation.

Parity, enforced

Treat EN and 中文 as one article in two languages with a shared "last-reviewed" stamp. CI-style report flags any pair where one language is stale.

Written for retrieval

Front-load the answer, one intent per article, explicit synonyms ("tag" = "memo"), so both humans and the bot's retriever find the right chunk.

Gap-driven authoring

New articles come from the fallback clusters and CRM ticket trends (★) — the KB grows where users actually get stuck.

This is the exact loop I ran at BOB — I authored documentation and FAQs that measurably cut repeat support queries. Same mechanism here, now feeding a bot as well as a reader.
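The parity report described in this section reduces to a date comparison per article pair. A minimal sketch, assuming the KB can export each article's id plus a last-edited date per language as ISO strings; the field names and article ids are hypothetical.

```python
from datetime import date


def parity_drift(articles):
    """Flag article pairs where one language was edited after the other.

    articles: list of dicts with 'id', 'en_updated', 'zh_updated' (YYYY-MM-DD strings).
    Returns the stale pairs, largest lag first.
    """
    stale = []
    for article in articles:
        en = date.fromisoformat(article["en_updated"])
        zh = date.fromisoformat(article["zh_updated"])
        if en != zh:
            stale.append({
                "id": article["id"],
                "lagging": "zh" if en > zh else "en",
                "lag_days": abs((en - zh).days),
            })
    return sorted(stale, key=lambda row: -row["lag_days"])


if __name__ == "__main__":
    kb = [
        {"id": "deposit-memo", "en_updated": "2026-06-10", "zh_updated": "2026-05-31"},
        {"id": "p2p-dispute", "en_updated": "2026-04-01", "zh_updated": "2026-04-01"},
        {"id": "earn-redeem", "en_updated": "2026-03-01", "zh_updated": "2026-03-03"},
    ]
    for row in parity_drift(kb):
        print(row)
```

Edit dates only show that a pair is out of step, not that the content differs, so each flagged pair still needs a bilingual read before it is closed.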

Scenario QA & regression

Duty #4. "QA through scenario testing" only scales if scenarios are a persistent suite, not ad-hoc clicking. Every change runs the gauntlet before it ships, in both languages.

The suite

• Golden set: top 100 real conversations, EN + 中文, with the correct outcome (answer / escalate).
• Adversarial set: typos, code-switching, off-topic, prompt-injection-style inputs.
• Safety set: money-movement intents that must escalate, never auto-answer.

The gate

• Run the suite pre- and post-change; diff intent match + answer + escalation decision.
• Any safety-set regression = hard block on release.
• Score answers against the failure scorecard → so "is this answer good?" is rubric-based, not vibes.
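The pre/post diff in the gate can be sketched as a comparison of two result maps keyed by case id. The record shape (`intent`, `answer`, `escalated`) and the case ids are assumptions for illustration; the rule that any safety-set case failing to escalate blocks the release follows the gate described above.

```python
FIELDS = ("intent", "answer", "escalated")


def diff_runs(before, after, safety_ids):
    """Compare suite results from before and after a change.

    before/after: {case_id: {"intent": str, "answer": str, "escalated": bool}}
    Returns (changes, blockers): changed fields per case, and the safety cases
    that no longer escalate in the post-change run.
    """
    changes = []
    for case_id in sorted(before):
        old, new = before[case_id], after.get(case_id, {})
        changed = [field for field in FIELDS if old.get(field) != new.get(field)]
        if changed:
            changes.append((case_id, changed))
    blockers = sorted(
        case_id for case_id in safety_ids
        if not after.get(case_id, {}).get("escalated", False)
    )
    return changes, blockers


if __name__ == "__main__":
    pre = {
        "g1": {"intent": "fees", "answer": "A", "escalated": False},
        "s1": {"intent": "stuck_deposit", "answer": "", "escalated": True},
    }
    post = {
        "g1": {"intent": "fees", "answer": "B", "escalated": False},
        "s1": {"intent": "stuck_deposit", "answer": "Try again later", "escalated": False},
    }
    changes, blockers = diff_runs(pre, post, safety_ids={"s1"})
    print("changed:", changes)
    print("release blocked:", bool(blockers), blockers)
```

Changed answers on golden cases still need a human read against the scorecard; only the safety blockers are an automatic stop.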

AI-QA failure scorecard

I built a version of this as an analyst reviewing automated AI outputs at KIP Protocol. It turns "this answer feels off" into a consistent, trainable label — so QA is comparable across reviewers, languages and weeks, and the failure mix tells you what to fix next.

Failure class	Definition	Severity	Typical fix
Wrong answer	Factually incorrect for the user's case — esp. on funds/fees/limits.	critical	KB correction + regression case
Hallucinated policy	Invents a rule/step that doesn't exist.	critical	Tighten retrieval / grounding
Missed escalation	Should have handed to a human; didn't.	critical	Escalation trigger on intent
No-match / fallback	Valid question, intent not recognised.	major	Add intent/utterances or article
Partial / incomplete	Technically right but misses a key step.	major	Rewrite KB answer
Language/tone drift	EN↔中文 mismatch, robotic or off-brand tone.	minor	KB parity + tone guide
Over-escalation	Bounced to a human something it could solve.	minor	Raise confidence/coverage

Each reviewed transcript gets one primary label + severity; the weekly distribution is the roadmap. Critical classes also feed the safety regression set (§08).
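Turning the week's labels into a roadmap is a counting exercise. The sketch below encodes the scorecard's severities and tallies a batch of reviews; the snake_case label keys and transcript ids are my own shorthand for the table's failure classes.

```python
from collections import Counter

# Severity per failure class, mirroring the scorecard table (keys are my shorthand).
SEVERITY = {
    "wrong_answer": "critical",
    "hallucinated_policy": "critical",
    "missed_escalation": "critical",
    "no_match": "major",
    "partial": "major",
    "language_tone_drift": "minor",
    "over_escalation": "minor",
}


def weekly_distribution(reviews):
    """reviews: list of (transcript_id, primary_label). Returns counts by class and severity."""
    unknown = {label for _, label in reviews} - set(SEVERITY)
    if unknown:
        raise ValueError(f"labels not on the scorecard: {sorted(unknown)}")
    by_class = Counter(label for _, label in reviews)
    by_severity = Counter(SEVERITY[label] for _, label in reviews)
    return by_class.most_common(), dict(by_severity)


def safety_candidates(reviews):
    """Transcripts with a critical label, to be added to the safety regression set."""
    return [tid for tid, label in reviews if SEVERITY[label] == "critical"]


if __name__ == "__main__":
    week = [("t1", "no_match"), ("t2", "no_match"), ("t3", "no_match"),
            ("t4", "wrong_answer"), ("t5", "over_escalation")]
    print(weekly_distribution(week))
    print(safety_candidates(week))
```

Rejecting unknown labels keeps reviewers on the shared vocabulary, which is what makes the distribution comparable across weeks and languages.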

✦

Proof — I ran this exact loop, live

I built a working bilingual RAG support bot — bot.web3wagmi.com — then stress-tested it as an adversarial QA analyst and shipped five fixes in one session, each gated by an eval that must hold safety 10/10 · 0 regressions or it doesn't ship. The same loop I'd run on Bybot: read the failure → find the weak intent → fix KB/routing → prove the metric moved → don't regress it.

1 · EN→中文 parity drift

A 中文 question answered in English. Fix: translated all 107 guides to 中文, ingested a Chinese KB, added CJK-aware retrieval. Now: 中文 in → 中文 out, with 中文 sources. (§07)

2 · Confident-but-wrong over-answering

Gibberish (“u eat?”) got a low-confidence essay by matching a stray word. Fix: a semantic floor — no named entity + weak meaning → deflect, don’t guess. (§06)

3 · Real questions wrongly deflecting

“bybit vs binance” was turned away — filler + a second entity dragged the match below threshold. Fix: recognise named-guide entities so comparisons answer. (§06)

4 · QA jargon shown to the customer

Chat displayed “confidence low · 0.394 — recommend escalation”. Fix: telemetry moved to the analyst scorecard; customer sees a clean answer + a human hand-off. (§09)

5 · Typos slipping past the safety net

“my wallet ahcked” missed escalation. Fix: fuzzy-match high-severity terms — accept minor over-escalation because a missed one is the only critical failure. (★)

The gate that made it safe

All five shipped only because the eval held safety 10/10 · 0 regressions. Change without a regression gate is how a “fix” silently breaks a working answer.

Each fix maps to a JD lever: precision/recall on the deflect boundary (§06), EN/中文 parity (§07), regression-gated QA (§08), escalation routing (★). The point isn’t the bot — it’s that I find where even a solid bot leaks, fix it, and prove it stayed fixed.
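Fix 5's typo-tolerant safety net can be illustrated with the standard library's fuzzy matcher. This is a simplified stand-in, not the demo bot's actual code: the term list and the 0.8 cut-off are illustrative assumptions, and it only covers Latin-script tokens, so 中文 terms would need their own list and tokenisation.

```python
import re
from difflib import get_close_matches

# Hypothetical high-severity vocabulary; a real list would be curated from transcripts.
HIGH_SEVERITY_TERMS = ["hacked", "stolen", "scammed", "phishing", "unauthorized"]


def must_escalate(message, cutoff=0.8):
    """True if any word is a near match for a high-severity term.

    A lower cutoff catches more typos at the price of more over-escalation,
    which the scorecard rates as minor next to a missed escalation.
    """
    tokens = re.findall(r"[a-z]+", message.lower())
    return any(
        get_close_matches(token, HIGH_SEVERITY_TERMS, n=1, cutoff=cutoff)
        for token in tokens
    )


if __name__ == "__main__":
    for text in ("my wallet ahcked", "how do I deposit usdt"):
        print(text, "->", must_escalate(text))
```

Near-misses such as "packed" will also trip the net at this cut-off; that is the accepted trade, and the over-escalation rate on the scorecard is where to watch its cost.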

What I'd ship — an intent eval harness

Frontier LLMs commoditised language understanding — so "NLP optimisation" now means the layer around the model: retrieval, routing, safety guardrails and evals. Duty #6 is "lead NLP/AI optimisation." Most CS-analyst applicants stop at dashboards; I also build the tooling. This is the shape of a regression harness that runs the golden + safety sets against the bot and blocks a release on any safety regression — illustrative of approach, not production code.

# bybot_eval.py: run the golden/safety suite, gate the release
import json
import statistics as st
import sys

from bybot_client import ask        # thin wrapper over the bot API

CRITICAL = ("missed_escalation", "hallucinated_policy")

def grade(case, resp):
    # scorecard labels from §09 → machine-checkable; critical checks run first
    if case["must_escalate"] and not resp["escalated"]:
        return "missed_escalation"      # CRITICAL: never ship
    # an answered turn must cite a real KB chunk; a hand-off has no answer to ground
    if not resp["escalated"] and not resp["grounded"]:
        return "hallucinated_policy"    # CRITICAL
    if case["intent"] != resp["intent"]:
        return "no_match"
    return "ok"

def run(suite, lang):
    if not suite:
        raise ValueError("empty suite: nothing to gate on")
    rows = [(c, ask(c["utterance"], lang=lang)) for c in suite]
    labels = [grade(c, r) for c, r in rows]
    return {
        "lang": lang,
        "containment": round(st.mean(r["contained"] for _, r in rows), 3),
        "match_rate": round(labels.count("ok") / len(labels), 3),
        "critical": sum(label in CRITICAL for label in labels),
    }

if __name__ == "__main__":
    with open("golden_safety.json", encoding="utf-8") as f:
        suite = json.load(f)
    report = [run(suite, lang) for lang in ("en", "zh")]   # both languages, every run
    for r in report:
        print(r)
    # release gate: any critical failure in any language → fail the build
    # (explicit exit instead of assert, which `python -O` strips)
    if any(r["critical"] for r in report):
        sys.exit("safety regression: blocking release")

The point isn't this exact script — it's that I think about chatbot quality as something you can measure, gate and regress like software, in EN and 中文 together. I also build onchain AI agents and read SDKs/contracts, so I'm comfortable wherever the bot meets engineering.

⌗

The SOTA toolkit I'd bring — mapped to the JD

I don't know the internal stack, so this is vendor-agnostic: the frontier tools I'd evaluate per JD responsibility. ✓ shipped = I've already run a working version (✦ Proof). The posture: instrument first, measure the real gaps, then pick buy / build / tune against the numbers — the data picks the dish, not the menu.

JD area	Frontier tools I'd evaluate (2026)	Already shipped?
QA / scenario testing / regression	promptfoo (red-team), DeepEval (CI), Botium (UI regression), RAGAS, garak	✓ eval gate + 3-model judge
Fallback / resolution analytics	Langfuse / Arize Phoenix (tracing), BERTopic clustering, multilingual embeddings + reranker	✓ failure scorecard
Bilingual EN/中文 KB + parity	frontier + China-strong models (Qwen/DeepSeek/GLM); multi-model consensus QA; parity-drift detection	✓ DeepSeek→Gemini+Qwen pipeline
Auto-routing / safety	intent+threshold routers, semantic deflection gate, NeMo Guardrails / Llama Guard	✓ entity gate + safety escalation
Platform (if modernising)	benchmark Fin / Sierra / Decagon / Agentforce vs in-house; Rasa, LangGraph, MCP for control	evaluate

First 30 / 60 / 90 days

Days 0–30 · Learn & baseline

• Map the live stack, CRM & release path for real.
• Establish honest baselines: containment, true-resolution, fallback, escalation precision — by intent & language.
• Cluster 90 days of fallbacks → the backlog.
• Build the golden + safety regression set.

Days 30–60 · Fix the top leaks

• Ship KB/intent fixes for the top 10 fallback clusters (EN+中文).
• Add escalation guardrails on money-movement intents.
• Stand up the parity-drift report & weekly bot review.
• Prove the first metric moves; nothing regresses.

Days 60–90 · Systematise

• Eval harness in the release gate; QA is automatic.
• Agent enablement: what Bybot now owns.
• A repeatable monthly improvement cadence with targets.
• Propose the next bet (e.g. RAG/LLM answer quality).

Fit, in two lines

The rest of this page is the argument; this is just the résumé line. 5+ yrs on the CS ↔ content ↔ AI seam — front-line support at a Bitcoin L2 (BOB), customer success on an Ethereum app (Aztec), docs that cut repeat queries, and an AI-QA failure scorecard at KIP. Native EN / 中文 (+ BM), and I build AI agents — credible to CS Ops and Engineering alike. The proof is the loop I ran live and the product read in §03.

Method & sources

How the "confirmed" claims were verified

Everything tagged confirmed is observable from Bybit's public help center and support docs (Jun 2026): the live-chat flow that opens with a virtual assistant and connects to a live agent on request, the self-service recovery functions, the "Submit Case" webform, and 24/7 multilingual support. The bot's internal platform, thresholds and metrics are not public — those are framed as questions in §03, not assumptions. No private systems were accessed; metric values in §06 are explicitly illustrative.

360° recon — what I actually went through

I worked the live surface end to end:

• The full funnel — Help Center → self-service → Bybot → live agent → Submit Case ticket, on web and app.
• Bybot itself, EN + 中文 — guided flows, free-text, typos, a security incident and a refund flow (live observations in §03).
• The KB taxonomy — deposits/withdrawals, transfers, P2P, Earn, Card, Crypto Loans: money-movement is the bulk, i.e. the high-stakes surface.
• Public sentiment — Trustpilot + review aggregators, which independently name the same failure classes this page targets.

Public reviews surface recurring patterns ("generic, scripted responses", "~20 chats, no solution", a "queue of 108", "live chat is just a bot", multi-day freezes on money-movement). Self-selected reviews aren't data — I'd treat them as hypotheses to validate against internal metrics — but they point at containment that isn't resolution (§06), money intents needing a faster human route (★), and EN/中文 consistency gaps (§07).

Role: Client Service Analyst — (Senior) AI Chatbot · Greenhouse

Help Center: bybit.com/en/help-center

Submit Case: help-center/s/webform

Support flow: Bybit — 24/7 multilingual live chat

Sentiment: Trustpilot — Bybit reviews

Support detail: TradersUnion — Bybit support

Peer — Coinbase AI: Anthropic case study · Armstrong, ~90% / 55% (May 2026)

Peer — Binance AI Pro / Kraken support: Binance · Kraken 24/7

Claims are split into confirmed (verified from public pages) and illustrative/inferred (modelled or from the JD) throughout. Code is illustrative of approach, not production config. This is an unsolicited audit prepared as interview homework — happy to walk through any section live.

Prepared by Edward Tay · for the Bybit Client Service Analyst — (Senior) AI Chatbot role · Jun 2026 · edwardtay.com · Edwardtay7@gmail.com