I love solving problems and building cool stuff with code. My interests include artificial intelligence, competitive programming, computer graphics programming, and game development!
Abstract: Non-parallel voice conversion aims to convert voice from a source domain to a target domain without paired training data. Cycle-Consistent Generative Adversarial Networks (CycleGAN) and Variational Autoencoders (VAE) have been used for this task, but these models suffer from difficult training and unsatisfactory results. Later, Contrastive Voice Conversion (CVC) was introduced, utilizing a contrastive learning-based approach to address these issues. However, these methods use CNN-based generators, which can capture local semantics but lack the ability to capture the long-range dependencies necessary for global semantics. In this paper, we propose VCTR, an efficient method for non-parallel voice conversion that leverages the Hybrid Perception Block (HPB) and Dual Pruned Self-Attention (DPSA) along with a contrastive learning-based adversarial approach.
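To illustrate the contrastive learning component the abstract refers to, here is a minimal sketch of a patch-wise InfoNCE-style loss, the general form used in CVC-like methods. The function name and the plain-list vector representation are my own simplifications, not the paper's actual implementation.

```python
import math

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query feature patch (illustrative sketch).

    The query (a feature patch from the converted output) should match
    its positive (the same patch location in the source) and repel the
    negatives (features from other patch locations).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos_sim(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

    # Positive similarity first, then all negative similarities
    logits = [cos_sim(query, positive) / temperature]
    logits += [cos_sim(query, n) / temperature for n in negatives]
    # Cross-entropy with the positive as the correct class (index 0),
    # computed with the usual max-subtraction for numerical stability
    max_l = max(logits)
    denom = sum(math.exp(l - max_l) for l in logits)
    return -(logits[0] - max_l) + math.log(denom)

# A query that aligns with its positive and not the negatives → low loss
loss = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
```

In the full method this loss is computed over many sampled patches and added to the adversarial objective, which is what lets the generator preserve content without paired data.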
SmolLlama3 is an 8B-parameter language model built as part of my experimentation with fine-tuning LLMs. It is based on Llama 3.1 8B and was fine-tuned using a custom dataset, smol-smoltalk-10k, which contains 10,000 conversational samples. The model is designed for simple conversational tasks; however, its responses may be less refined as Direct Preference Optimization (DPO) was not applied.
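Since the description notes that DPO was not applied, here is a small sketch of what the DPO objective computes on a single preference pair, written in plain Python for clarity. The function name and example log-probability values are illustrative, not taken from the actual training setup.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen
    reference model. The loss pushes the policy to prefer the chosen
    response more strongly than the reference model does.
    """
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)), computed in a numerically stable way
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Policy already prefers the chosen response → small loss
small = dpo_loss(-10.0, -30.0, -20.0, -20.0)
# Policy prefers the rejected response → large loss
large = dpo_loss(-30.0, -10.0, -20.0, -20.0)
```

Applying this objective over a preference dataset is what a DPO stage would add on top of the supervised fine-tune described above.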
A fine-tuned version of Gemma 2 2B-it, developed for Kaggle's 'Unlock Global Communication with Gemma' competition. It was fine-tuned on a corpus of 7,640 Hindi instructions to handle language-specific tasks, with a primary focus on Hindi, enhancing its ability to understand and process the language for a variety of applications.
This project features a GPT (Generative Pre-trained Transformer) language model with 124 million parameters that has been fine-tuned for Python code generation. Unlike larger models such as full-scale GPT-2 or GPT-3, this is a smaller-scale model designed primarily for testing and experimental purposes. It was trained on a small corpus of 25,000 Python code samples.
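The 124-million-parameter figure matches the standard GPT-2-small configuration, and it can be checked with a short back-of-the-envelope calculation. The sketch below assumes the usual GPT-2-small hyperparameters (50,257-token vocabulary, 1,024-token context, 768-dim embeddings, 12 layers) and tied input/output embeddings; the function name is mine.

```python
def gpt2_small_params(vocab=50257, ctx=1024, d=768, layers=12):
    """Parameter count for a GPT-2-small-style transformer
    (tied embeddings, learned position embeddings, biases included)."""
    emb = vocab * d + ctx * d                  # token + position embeddings
    attn = d * 3 * d + 3 * d + d * d + d      # qkv projection + output projection
    mlp = d * 4 * d + 4 * d + 4 * d * d + d   # two feed-forward linear layers
    ln = 2 * d                                 # layernorm scale + shift
    block = attn + mlp + 2 * ln                # one transformer block
    return emb + layers * block + ln           # + final layernorm

total = gpt2_small_params()   # ≈ 124.4 million
```

Most of the budget sits in the token embeddings (~38.6M) and the twelve transformer blocks (~85M), which is why shrinking the layer count is the quickest way to get an even smaller experimental model.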
This model is a fine-tuned version of Flux.1-dev, optimized for generating collage-style images using LoRA (Low-Rank Adaptation).
A CSS code generator that produces the trendy glassmorphism UI design style. Glassmorphism is a design trend that combines transparent elements, vibrant colors, and blurred backgrounds to create a visually appealing and modern user interface.
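The core of such a generator can be sketched in a few lines: emit a rule combining a translucent background, a backdrop blur, and a light border. This is a minimal illustration with hypothetical parameter names, not the project's actual code.

```python
def glass_css(blur_px=10, opacity=0.2, radius_px=16):
    """Return a CSS rule for a glassmorphism card: translucent
    background, backdrop blur, rounded corners, subtle light border."""
    return (
        ".glass {\n"
        f"  background: rgba(255, 255, 255, {opacity});\n"
        f"  backdrop-filter: blur({blur_px}px);\n"
        f"  -webkit-backdrop-filter: blur({blur_px}px);\n"  # Safari prefix
        f"  border-radius: {radius_px}px;\n"
        "  border: 1px solid rgba(255, 255, 255, 0.3);\n"
        "}\n"
    )

css = glass_css()
```

The `backdrop-filter: blur(...)` line is what blurs whatever sits behind the element, and the low-opacity white background is what gives the frosted-glass look.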
This is a small platformer game created as a learning project. Play as a brave knight on a mission to collect four magical fruits from different worlds to save your king. Dodge enemies like slimes, collect coins, and explore vibrant levels.
I participated in the International College Jam as a developer, creating a game with a team of three: myself, one artist/writer, and another artist. It was a week-long game jam in which students from various colleges and universities took part. The game is a visual novel based on a cozy cyberpunk theme. We designed and developed the entire game within one week, collaborating closely to align gameplay, visuals, and narrative under tight deadlines.
This game was created in 11 hours for the Micro Game Jam using the Unity game engine. It’s a simple arcade-style game where you dodge shooting stars and collect fire to stay airborne.
I have spent two years learning about machine learning and AI independently. In 2025, I published my first research paper, on a voice conversion model.
Bachelor of Computer Applications at LCB College under Gauhati University.