Binary Code: How Computers Store and Translate Text
Every piece of text you read on a screen — this sentence included — is stored inside your computer as binary code: sequences of 1s and 0s. Understanding how binary translation works reveals the fundamental mechanism behind all digital communication, from text messages to web pages to the files on your hard drive.
Whether you're a developer debugging character encoding issues, a student learning computer science fundamentals, or simply curious about how technology works, this guide will walk you through the complete journey from keystrokes to binary and back again.
What Is Binary Code?
Binary is a base-2 number system that uses only two digits: 0 and 1. While humans naturally count in base-10 (decimal) using digits 0-9, computers operate in binary because their fundamental building blocks — transistors — have two states: on (1) and off (0).
Every piece of data in a computer, whether text, images, music, or video, is ultimately represented as patterns of these two digits. This might seem limiting, but binary's simplicity is precisely what makes it so powerful and reliable for electronic circuits.
Understanding Bits and Bytes
A single binary digit is called a bit. Eight bits grouped together form a byte, which can represent 256 different values (2^8 = 256). This is enough to encode all the letters, numbers, and symbols used in English text, which is why the byte became the standard unit of digital storage.
Here's how binary place values work, reading from right to left:
| Position | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|
| Place Value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| Example: 01000001 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Calculation | 0 | 64 | 0 | 0 | 0 | 0 | 0 | 1 |
In this example, 01000001 equals 64 + 1 = 65 in decimal, which represents the letter "A" in ASCII encoding.
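The place-value arithmetic above can be checked in a couple of lines of Python; `int()` with base 2 performs the same right-to-left sum internally:

```python
bits = "01000001"

# Sum each bit times its place value (1, 2, 4, ..., 128), right to left
value = sum(int(bit) * 2**i for i, bit in enumerate(reversed(bits)))
print(value)  # 65

# Python's int() with an explicit base does the same conversion
print(int(bits, 2))       # 65
print(chr(int(bits, 2)))  # A
```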
Pro tip: You can use our Binary Translator to instantly convert text to binary and back, making it easy to experiment with these concepts hands-on.
How Text Becomes Binary
When you type a letter on your keyboard, your computer doesn't store the shape of that letter. Instead, it stores a number that represents the letter, according to an agreed-upon encoding standard. The most fundamental of these is ASCII (American Standard Code for Information Interchange).
Here's what happens step by step when you type the letter "A":
- Keyboard signal: Your keyboard sends a signal to the computer identifying which key was pressed
- Character lookup: The operating system looks up the character encoding: "A" = 65 in ASCII
- Binary conversion: The number 65 is converted to binary: 01000001
- Storage or transmission: These eight bits are stored in memory or transmitted over a network
- Display: When displayed, the process reverses: binary → number → character shape rendered on screen
This entire process happens in microseconds, completely invisible to the user. The encoding standard acts as a universal dictionary that all computers agree upon, ensuring that when you type "Hello" on one computer, it displays as "Hello" on another.
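The lookup-convert-display steps above can be sketched in Python with the built-ins `ord`, `format`, and `chr`:

```python
char = "A"

# Character lookup: "A" -> 65 (its ASCII code)
code = ord(char)

# Binary conversion: 65 -> "01000001" (8 bits, zero-padded)
bits = format(code, "08b")

# Display: reverse the process, binary -> number -> character
restored = chr(int(bits, 2))

print(code, bits, restored)  # 65 01000001 A
```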
Why Encoding Standards Matter
Without standardized encoding, digital communication would be impossible. Imagine if every computer manufacturer used their own system for representing letters — a file created on one computer would be gibberish on another.
Encoding standards solve this problem by creating universal agreements about which numbers represent which characters. This is why you can send an email from a Mac to a Windows PC, or view a website created in Japan on a computer in Brazil.
The ASCII Standard
ASCII (American Standard Code for Information Interchange) was developed in the 1960s and became the foundation for text encoding in computers. It uses 7 bits to represent 128 different characters, including:
- Uppercase letters (A-Z): codes 65-90
- Lowercase letters (a-z): codes 97-122
- Digits (0-9): codes 48-57
- Punctuation and symbols: various codes
- Control characters: codes 0-31 and 127 (like newline, tab, backspace)
Here's a sample of common ASCII characters:
| Character | Decimal | Binary | Hexadecimal |
|---|---|---|---|
| Space | 32 | 00100000 | 20 |
| 0 | 48 | 00110000 | 30 |
| A | 65 | 01000001 | 41 |
| a | 97 | 01100001 | 61 |
| ! | 33 | 00100001 | 21 |
| ? | 63 | 00111111 | 3F |
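Rows of this table can be reproduced with `ord()` and Python's string formatting, which makes it easy to look up any other character:

```python
# Print each character's decimal, binary, and hexadecimal ASCII value
for char in [" ", "0", "A", "a", "!", "?"]:
    code = ord(char)
    print(f"{char!r}  {code:3d}  {code:08b}  {code:02X}")
    # e.g. the first line is: ' '   32  00100000  20
```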
ASCII's Limitations
While ASCII was revolutionary for its time, it has significant limitations. With only 128 characters, ASCII can only represent English letters and basic symbols. It cannot handle:
- Accented characters (é, ñ, ü)
- Non-Latin alphabets (Greek, Cyrillic, Arabic)
- Asian writing systems (Chinese, Japanese, Korean)
- Emoji and modern symbols
Extended ASCII (using 8 bits for 256 characters) added some accented characters, but different regions used different extensions, creating compatibility problems. This is where Unicode comes in.
Quick tip: If you're working with legacy systems or simple English text, ASCII is still perfectly adequate and uses less storage space than Unicode. Use our ASCII Converter to work with ASCII values directly.
Beyond ASCII: Unicode
Unicode was created in the 1990s to solve ASCII's limitations by providing a unique number (called a "code point") for every character in every writing system used on Earth. Modern versions of Unicode include over 149,000 characters covering more than 150 modern and historic scripts.
Unicode assigns each character a code point written as U+ followed by hexadecimal digits. For example:
- U+0041 = A (Latin capital letter A)
- U+03B1 = α (Greek small letter alpha)
- U+4E2D = 中 (Chinese character for "middle")
- U+1F600 = 😀 (grinning face emoji)
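You can inspect code points directly in Python: `ord()` returns a character's code point and `chr()` reverses it, so the list above can be generated in one loop:

```python
# ord() returns a character's Unicode code point; chr() reverses it
for char in ["A", "α", "中", "😀"]:
    print(f"U+{ord(char):04X} = {char}")

# Output:
# U+0041 = A
# U+03B1 = α
# U+4E2D = 中
# U+1F600 = 😀
```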
Unicode vs. UTF: Understanding the Difference
This is where many people get confused: Unicode is not an encoding. Unicode is a character set — a list that assigns numbers to characters. UTF (Unicode Transformation Format) encodings are the methods for representing those numbers as binary data.
Think of it this way: Unicode is like a phone book that assigns a unique number to every person. UTF encodings are the different ways you might write down those phone numbers (with or without country codes, with or without dashes, etc.).
UTF-8, UTF-16, and UTF-32 Explained
There are three main UTF encodings, each with different trade-offs:
UTF-8: The Web Standard
UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character. It's backward compatible with ASCII — the first 128 characters use the exact same binary representation as ASCII.
Advantages:
- Efficient for English text (1 byte per character)
- Backward compatible with ASCII
- No byte-order issues
- Dominant on the web (over 98% of websites)
Disadvantages:
- Less efficient for Asian languages (3-4 bytes per character)
- Variable length makes indexing more complex
UTF-16: The Windows Default
UTF-16 uses 2 or 4 bytes per character. Most common characters fit in 2 bytes, but rare characters and emoji require 4 bytes (using "surrogate pairs").
Advantages:
- Efficient for most languages (2 bytes per character)
- Used internally by Windows, Java, and JavaScript
Disadvantages:
- Not backward compatible with ASCII
- Byte-order issues (big-endian vs. little-endian)
- Still variable length for rare characters
UTF-32: Fixed Length
UTF-32 uses exactly 4 bytes for every character, making it the only fixed-length Unicode encoding.
Advantages:
- Simple indexing (character N is at byte position N×4)
- No complex decoding logic
Disadvantages:
- Wastes space (4 bytes even for simple ASCII characters)
- Rarely used in practice
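The trade-offs between the three encodings show up directly in byte counts. A minimal Python comparison (the `-le` variants are used so a byte-order mark doesn't inflate the counts):

```python
# Compare how many bytes each UTF encoding needs per character
for char in ["A", "中", "😀"]:
    sizes = {enc: len(char.encode(enc))
             for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(char, sizes)

# A  {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# 中 {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# 😀 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```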
Pro tip: When building web applications, always use UTF-8. It's the internet standard, supported everywhere, and efficient for most content. Specify it in your HTML with <meta charset="UTF-8"> and in HTTP headers with Content-Type: text/html; charset=UTF-8.
Binary Translation Examples
Let's walk through some concrete examples of how text becomes binary and back again.
Example 1: Simple ASCII Word
The word "Hi" in ASCII:
```
H = 72 decimal = 01001000 binary
i = 105 decimal = 01101001 binary
```

Complete binary: 01001000 01101001
When stored in a file or transmitted over a network, these 16 bits (2 bytes) represent the word "Hi".
Example 2: Mixed Case with Punctuation
The phrase "Hello!" breaks down as:
| Character | Decimal | Binary |
|---|---|---|
| H | 72 | 01001000 |
| e | 101 | 01100101 |
| l | 108 | 01101100 |
| l | 108 | 01101100 |
| o | 111 | 01101111 |
| ! | 33 | 00100001 |
Total: 48 bits (6 bytes) of data.
Example 3: Unicode Emoji
The emoji 😀 (grinning face) is U+1F600 in Unicode. In UTF-8, it's encoded as 4 bytes:
11110000 10011111 10011000 10000000
This demonstrates why UTF-8 is variable length — a simple "A" takes 1 byte, but an emoji takes 4 bytes.
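You can verify these exact bytes in Python by encoding the emoji and printing each byte in binary:

```python
# UTF-8 encoding of U+1F600 produces exactly the four bytes shown above
data = "😀".encode("utf-8")
print(" ".join(f"{byte:08b}" for byte in data))
# 11110000 10011111 10011000 10000000

# The same length difference the text describes: "A" is 1 byte, 😀 is 4
print(len("A".encode("utf-8")), len(data))  # 1 4
```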
Converting Binary to Text
To convert binary back to text, you reverse the process:
- Group the binary digits into bytes (8 bits each)
- Convert each byte to its decimal value
- Look up the character for that value in your encoding table
- Combine the characters to form text
For example, if you receive: 01001000 01100101 01111001
```
01001000 = 72 = H
01100101 = 101 = e
01111001 = 121 = y
```
Result: "Hey"
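The four decoding steps translate directly into a one-line Python expression:

```python
received = "01001000 01100101 01111001"

# Group into bytes, convert each to decimal, look up each character
text = "".join(chr(int(byte, 2)) for byte in received.split())
print(text)  # Hey
```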
Practical Applications
Understanding binary text encoding isn't just academic — it has real-world applications across many fields.
Web Development
Web developers encounter encoding issues regularly. Common scenarios include:
- Form submissions: Ensuring user input is properly encoded when sent to servers
- Database storage: Choosing the right character set for database columns
- API responses: Setting correct Content-Type headers with charset information
- URL encoding: Converting special characters to percent-encoded format
Our URL Encoder tool helps handle URL encoding automatically, converting special characters to their percent-encoded equivalents.
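Percent-encoding is also available in Python's standard library, which makes the mechanics easy to see: each unsafe character is replaced by `%` followed by the hex value of its UTF-8 bytes.

```python
from urllib.parse import quote, unquote

# Percent-encode characters that are not safe in URLs;
# "é" becomes its two UTF-8 bytes, %C3%A9
encoded = quote("café & bar")
print(encoded)           # caf%C3%A9%20%26%20bar
print(unquote(encoded))  # café & bar
```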
Data Analysis and Processing
Data scientists and analysts need to understand encoding when:
- Reading CSV files from different sources
- Scraping web content with international characters
- Processing log files from various systems
- Cleaning text data for machine learning models
Cybersecurity
Security professionals use binary encoding knowledge for:
- Analyzing malware: Understanding how malicious code hides in binary data
- Forensics: Examining file headers and metadata
- Encryption: Working with encoded and encrypted data
- Steganography: Detecting hidden messages in binary files
File Format Design
When designing custom file formats, you need to decide:
- Which encoding to use for text fields
- How to mark the encoding in the file header
- Whether to use fixed or variable-length fields
- How to handle byte-order for multi-byte values
Quick tip: When working with text files, always explicitly specify the encoding. Never rely on defaults, as they vary by platform and can cause subtle bugs. Use UTF-8 unless you have a specific reason not to.
Working with Binary in Programming
Most programming languages provide built-in functions for working with character encoding and binary data. Here are examples in popular languages:
Python
```python
# Convert string to bytes (UTF-8)
text = "Hello"
binary = text.encode('utf-8')
print(binary)  # b'Hello'

# Convert bytes back to string
decoded = binary.decode('utf-8')
print(decoded)  # Hello

# Get ASCII value of a character
print(ord('A'))  # 65

# Convert ASCII value to character
print(chr(65))  # A
```
JavaScript
```javascript
// Get character code
console.log('A'.charCodeAt(0)); // 65

// Convert code to character
console.log(String.fromCharCode(65)); // A

// Convert string to binary representation
const text = "Hi";
const binary = text.split('').map(char =>
  char.charCodeAt(0).toString(2).padStart(8, '0')
).join(' ');
console.log(binary); // 01001000 01101001
```
Java
```java
import java.nio.charset.StandardCharsets;

// Convert string to bytes
String text = "Hello";
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

// Convert bytes back to string
String decoded = new String(bytes, StandardCharsets.UTF_8);

// Get ASCII value
int ascii = (int) 'A'; // 65

// Convert ASCII to character
char character = (char) 65; // A
```
Bitwise Operations
Understanding binary also helps with bitwise operations, which are useful for:
- Setting and clearing individual bits (flags)
- Efficient multiplication and division by powers of 2
- Color manipulation in graphics programming
- Network protocol implementation
- Compression algorithms
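The flag use case above can be sketched in a few lines of Python: OR sets a bit, AND tests it, AND with the complement clears it, and shifts multiply or divide by powers of two.

```python
# Bit flags: each permission occupies one bit of an integer
READ, WRITE, EXECUTE = 0b001, 0b010, 0b100

perms = READ | WRITE           # set two flags   -> 0b011
print((perms & WRITE) != 0)    # test a flag     -> True
perms &= ~WRITE                # clear a flag    -> 0b001
print((perms & WRITE) != 0)    # now False

# Shifting left/right multiplies/divides by powers of two
print(5 << 1, 20 >> 2)         # 10 5
```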
Common Encoding Issues
Encoding problems are among the most frustrating bugs to debug. Here are common issues and their solutions:
Mojibake (Garbled Text)
When you see strange characters like "Ã©" instead of "é", it's usually because:
- Text was encoded in one format (UTF-8) but decoded in another (Latin-1)
- The encoding declaration is missing or incorrect
- Data passed through a system that doesn't preserve encoding
Solution: Ensure consistent encoding throughout your data pipeline. Use UTF-8 everywhere and explicitly declare it.
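You can reproduce this exact failure mode in Python: encode text as UTF-8, decode it as Latin-1, and the two UTF-8 bytes of "é" reappear as two separate characters.

```python
# Reproduce mojibake: UTF-8 bytes wrongly decoded as Latin-1
original = "é"
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # Ã©

# If the data wasn't truncated, reversing the wrong step recovers it
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)  # é
```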
Question Marks or Boxes
Seeing � or □ means:
- The character exists in the source encoding but not in the target
- The font doesn't have a glyph for that character
- The character was lost during conversion
Solution: Use Unicode (UTF-8) which supports all characters, and ensure your fonts include the necessary glyphs.
Byte Order Mark (BOM) Issues
The BOM is an optional marker at the start of UTF-8 files. It can cause problems:
- Breaking scripts that expect files to start with specific characters
- Causing "invisible" characters at the start of files
- Creating issues with HTTP headers
Solution: Use UTF-8 without BOM for most purposes. Only use BOM when required by specific Windows applications.
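Python exposes the BOM behavior through two codec names: `utf-8-sig` writes and strips the three-byte marker, while plain `utf-8` treats it as ordinary data — which is exactly how the "invisible character" bug arises.

```python
# 'utf-8-sig' adds the BOM on encode and strips it on decode
with_bom = "hello".encode("utf-8-sig")
print(with_bom[:3].hex())  # efbbbf  (the BOM bytes EF BB BF)

print(with_bom.decode("utf-8-sig"))   # hello
print(repr(with_bom.decode("utf-8"))) # '\ufeffhello' -- BOM leaks through
```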
Database Encoding Mismatches
Common database encoding problems:
- Database set to Latin-1 but application sends UTF-8
- Connection charset different from table charset
- Collation issues causing incorrect sorting
Solution: Set database, table, and connection all to UTF-8 (utf8mb4 in MySQL for full Unicode support including emoji).
Pro tip: When debugging encoding issues, use a hex editor to examine the actual bytes in your file. This reveals the true encoding regardless of what your text editor displays. Tools like Hex Viewer can help visualize binary data.
Key Takeaways
Understanding how computers store and translate text through binary code is fundamental to working with digital systems. Here are the essential points to remember:
- Binary is universal: All digital data, including text, is ultimately stored as patterns of 1s and 0s
- Encoding standards are agreements: ASCII, Unicode, and UTF encodings are shared dictionaries that let computers communicate
- UTF-8 is the modern standard: Use it for web development, file storage, and data exchange unless you have specific requirements
- Bytes matter: A byte (8 bits) can represent 256 values, enough for ASCII but not for global text
- Unicode isn't an encoding: Unicode assigns numbers to characters; UTF encodings determine how those numbers become bytes
- Encoding issues are preventable: Explicitly declare encoding everywhere and use UTF-8 consistently
Whether you're building websites, analyzing data, or just curious about how technology works, understanding binary text encoding gives you insight into the fundamental layer of digital communication.
Frequently Asked Questions
Why do computers use binary instead of decimal?
Computers use binary because their fundamental components — transistors — have two stable states: on and off. This maps perfectly to binary's 1 and 0. Building circuits that reliably distinguish between ten different voltage levels (for decimal) would be far more complex, expensive, and error-prone than circuits that only need to distinguish between two states.
Binary's simplicity also makes it extremely reliable. Electronic noise or voltage fluctuations are less likely to cause errors when you only need to distinguish between "high" and "low" rather than ten different levels.
What's the difference between ASCII and Unicode?
ASCII is a 7-bit encoding that can represent 128 characters, primarily covering English letters, digits, and basic symbols. It was designed in the 1960s for American English text.
Unicode is a modern character set that assigns unique numbers to over 149,000 characters from all writing systems worldwide, including emoji and symbols. Unicode is not an encoding itself — UTF-8, UTF-16, and UTF-32 are the encodings that represent Unicode characters as binary data.
Think of ASCII as a small dictionary with 128 entries, while Unicode is a comprehensive encyclopedia with entries for every character used in human writing.
Why does UTF-8 use different numbers of bytes for different characters?
UTF-8 uses variable-length encoding to balance efficiency and compatibility. ASCII characters (the most common in English text) use just 1 byte, keeping file sizes small for English content. Less common characters use 2-3 bytes, and rare characters or emoji use 4 bytes.
This design makes UTF-8 backward compatible with ASCII — any valid ASCII file is also a valid UTF-8 file. It also means that English text in UTF-8 takes the same space as ASCII, while still supporting all Unicode characters when needed.
The alternative would be fixed-length encoding (like UTF-32), which uses 4 bytes for every character, wasting space for common characters.
How can I tell what encoding a file is using?
Unfortunately, there's no foolproof way to detect encoding from binary data alone. However, you can use these methods:
- Check file metadata: Some formats (HTML, XML) include encoding declarations in their headers
- Look for BOM: UTF-8, UTF-16, and UTF-32 files may start with a Byte Order Mark that identifies the encoding
- Use detection tools: Libraries like Python's chardet or command-line tools like file can guess encoding based on byte patterns
- Try decoding: Attempt to decode with common encodings (UTF-8, Latin-1) and see which produces readable text
The best practice is to always explicitly specify and document the encoding rather than relying on detection.
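The "try decoding" approach can be sketched with a small hypothetical helper, `guess_encoding` (an illustration, not a robust detector): attempt each candidate in order and return the first that decodes cleanly. Latin-1 accepts any byte sequence, so it must come last.

```python
def guess_encoding(data: bytes, candidates=("utf-8", "latin-1")) -> str:
    """Return the first candidate encoding that decodes without error.

    A rough heuristic only -- Latin-1 never fails, so keep it last.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")

print(guess_encoding("café".encode("utf-8")))    # utf-8
print(guess_encoding("café".encode("latin-1")))  # latin-1
```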
Can binary code represent images and videos too?
Yes, absolutely. Everything in a computer is ultimately binary — images, videos, audio, programs, everything. The difference is in how the binary data is interpreted.
For images, binary data represents pixel colors (usually as RGB values). For videos, it's a sequence of images plus audio data. For audio, it's samples of sound wave amplitudes. Each file format has its own structure for organizing this binary data.
Text is actually one of the simpler cases because each character maps to a specific number. Images and videos require more complex encoding schemes to efficiently store visual and audio information.
Why do some websites show garbled text?
Garbled text (called "mojibake") happens when text encoded in one format is decoded using a different format. Common causes include:
- The website doesn't declare its encoding in the HTML or HTTP headers
- Your browser guesses the wrong encoding
- The server sends one encoding but declares another
- Text was copied from a source with different encoding
You can usually fix this by manually selecting the correct encoding in your browser's View menu. The permanent solution is for website developers to properly declare UTF-8 encoding in both their HTML meta tags and HTTP headers.
Related Articles
- Complete ASCII Table Reference Guide — Master the