My research sits at the intersection of representation learning and data-centric ML, with a thread running through vision, graphs, and tabular data. I care about what happens when the data itself is imperfect: noisy labels, severe class imbalance, scarce supervision, or distributions that shift in ways that are hard to anticipate.

Some of this is theoretical curiosity: what a model memorizes versus generalizes, how neighborhood structure shapes representations, how learned embeddings behave when the underlying distribution is adversarial or long-tailed. But much of it is grounded in working directly with large-scale financial transaction data, where class imbalance is not a benchmark setting but a structural reality, where fraud patterns evolve adversarially, and where a model's failure modes have real consequences. That operational experience shapes how I think about research problems. I'm drawn to methods that are principled enough to publish and robust enough to actually deploy.

Lately I've been thinking about tabular representation learning: how self-supervised learning (SSL) objectives developed for vision and language translate (or don't) to structured, heterogeneous tabular data, and what the right inductive biases look like for this setting.

Publications

Towards Equitable Coreset Selection: Addressing Challenges Under Class Imbalance

Liyana Sahir, AN Reddy, BS Achary, A Sharma, K Shah, S Gupta, S Asthana · CIKM 2025

Selecting which data to train on is as important as selecting how to train, especially when you're working at the scale of billions of samples, and even more so when that data is heavily class imbalanced. This work looks at when and why standard coreset selection methods fail under class imbalance, and proposes a more equitable approach without sacrificing efficiency.

AMEND: Adaptive Margin and Expanded Neighborhood for Efficient Generalized Category Discovery

A Banerjee, Liyana Sahir, S Biswas · WACV 2024

AdaPrompt: Prompt Tuning with Adaptive Neighbours for Generalized Category Discovery

Liyana Sahir, A Banerjee, S Biswas · ICIP 2024

Both are from my master's at IISc, advised by Prof. Soma Biswas. These tackle generalized category discovery (GCD): the problem of learning representations that simultaneously recognize known classes, which appear in both the labelled and unlabelled sets, and discover novel classes that appear only in the unlabelled data. The open-world setting forces you to think carefully about what good representations actually mean when the label space itself is incomplete.

Study of Topology Bias in GNN-based Knowledge Graph Algorithms

A Surisetty, A Malhotra, D Chaurasiya, S Modak, S Yerramsetty, A Singh, L Sahir, E Abdel-Raheem · ICDM-W 2023

This paper includes some findings from my work on knowledge graph representations with graph neural networks, done while interning at Mastercard.

Google Scholar →