
Embedding


Visual embedding workflow

1. Balance map

People turn to semantic balance analysis using embeddings when they need to understand whether their dataset is fair, representative, and structurally complete: not just in terms of raw counts, but in terms of meaning. Traditional distributions can show how many samples fall into each category, but only embeddings reveal deeper patterns: whether certain concepts dominate, whether clusters are missing or underrepresented, whether two groups that "look balanced" numerically are actually very different semantically. This is crucial in ML, robotics, recommendation systems, audio/vision datasets, and any scenario where meaningful coverage matters more than labels alone.

BalanceMap in SmooSense makes this effortless: it visualizes the embedding space as a bubble plot, computes relative ratios, and colors each bubble by its level of imbalance.

1.1 Ratio-based color encoding for balance

Color isn't determined by raw counts, but by relative balance across the groups of a breakdown (e.g., training/validation/test splits). This is because the groups of a breakdown inherently have different sizes. The image below shows the distribution of fold. If we colored by counts, you would only see information from the training fold.

[Figure: distribution of fold]

For each bubble, we compute the ratio of samples in that bubble within each breakdown group:

$$\text{ratio} = \frac{\text{count of points in bubble}}{\text{count of total points in that group}}$$

We then compare these ratios across groups.
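As a rough illustration of the ratio computation, here is a minimal Python sketch. It is not SmooSense's implementation. The grid binning into bubbles (`cell_size`), the function names `bubble_ratios` and `relative_share`, and the way ratios are compared (each group's share of the summed ratios) are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math


def bubble_ratios(points, cell_size=1.0):
    """Return {bubble: {group: ratio}} for points given as (x, y, group).

    Assumption: a "bubble" is a square grid cell of side `cell_size`.
    ratio = count of the group's points in the bubble / total points in that group.
    """
    group_totals = Counter()
    bubble_counts = defaultdict(Counter)
    for x, y, group in points:
        bubble = (math.floor(x / cell_size), math.floor(y / cell_size))
        group_totals[group] += 1
        bubble_counts[bubble][group] += 1
    return {
        bubble: {g: counts[g] / group_totals[g] for g in group_totals}
        for bubble, counts in bubble_counts.items()
    }


def relative_share(ratios):
    """Hypothetical comparison: each group's share of the summed ratios.

    Equal shares mean the bubble is balanced; a share of 1.0 means only
    that group is present in the bubble.
    """
    total = sum(ratios.values())
    return {g: r / total for g, r in ratios.items()}


# Toy data: 8 "train" points and 2 "val" points spread over three bubbles.
points = (
    [(0.5, 0.5, "train")] * 4 + [(0.5, 0.5, "val")]
    + [(1.5, 0.5, "train")] * 2 + [(1.5, 0.5, "val")]
    + [(2.5, 0.5, "train")] * 2
)
ratios = bubble_ratios(points)
shares = {bubble: relative_share(r) for bubble, r in ratios.items()}
```

In this toy data, bubble `(0, 0)` holds 4 train points and 1 val point, yet both ratios are 0.5, so it counts as balanced even though the raw counts differ. Bubble `(2, 0)` contains only train points, so its train share is 1.0, which is the kind of cluster a ratio-based color would highlight.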

1.2 Try it yourself

Zoom in and drag around: you can easily find a blue cluster where all the data is in the train fold, with no test or validation samples at all.

👇 Live demo

https://demo.smoosense.ai/Table?tablePath=s3://smoosense-demo/datasets/COCO2017/images-emb-2d.parquet&activeTab=Plot&activePlotTab=BalanceMap&columnForGalleryVisual=coco_url&columnForGalleryCaption=fold&bubblePlotXColumn=emb_x&bubblePlotYColumn=emb_y&bubblePlotBreakdownColumn=fold
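The demo link carries its plot configuration in the query string. The short Python sketch below simply parses that URL with the standard library to list the parameters. What each parameter does is inferred from its name only and is an assumption, not documented behavior.

```python
from urllib.parse import parse_qsl, urlsplit

# The live demo URL, copied verbatim from above.
DEMO_URL = (
    "https://demo.smoosense.ai/Table"
    "?tablePath=s3://smoosense-demo/datasets/COCO2017/images-emb-2d.parquet"
    "&activeTab=Plot&activePlotTab=BalanceMap"
    "&columnForGalleryVisual=coco_url&columnForGalleryCaption=fold"
    "&bubblePlotXColumn=emb_x&bubblePlotYColumn=emb_y"
    "&bubblePlotBreakdownColumn=fold"
)

# Split the query string into a {name: value} mapping.
params = dict(parse_qsl(urlsplit(DEMO_URL).query))

for name, value in params.items():
    print(f"{name} = {value}")
```

Judging by the names, `bubblePlotXColumn` and `bubblePlotYColumn` select the 2D embedding columns (`emb_x`, `emb_y`) and `bubblePlotBreakdownColumn` selects the breakdown (`fold`).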

2. More

Full embedding features (search, clustering, etc.) are coming.


Copyright © 2025 SmooSense
