2.1 Python Character Set (BT101CO)
To understand any language, we must first look at the basic building blocks that form it. In Python, the Character Set is the collection of valid characters that the interpreter can recognize and process.
The Basic Building Blocks
Python's character set consists of letters, digits, and various special symbols. These are categorized as follows:
1. Letters
Python supports both uppercase and lowercase letters of the English alphabet.
- Uppercase: A to Z
- Lowercase: a to z
Note: Python is case-sensitive, meaning Variable and variable are treated as two entirely different entities.
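The letter ranges and the case-sensitivity rule above can be checked directly in Python. This is a small illustration that assumes the standard library's string module (it is not taken from the Kamthane text); the names Variable and variable come from the note above.

```python
import string

# The English letters listed above, as provided by the string module.
print(string.ascii_uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz

# Case sensitivity: these two names refer to two separate variables.
Variable = 10
variable = 20
print(Variable, variable)  # 10 20
```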
2. Digits
Python recognizes all decimal digits used to form numbers.
- Digits: 0 to 9
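A quick sketch of the digit set and of digits combining into a number, again assuming the standard string module; the name year is purely illustrative.

```python
import string

print(string.digits)  # 0123456789

# The four digit characters 2, 0, 2, 4 together form one integer literal.
year = 2024
print(year + 1)  # 2025

# Every character of the text "2024" belongs to the digit set.
print(all(ch in string.digits for ch in "2024"))  # True
```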
3. Special Symbols
These characters have specific functional meanings in Python syntax, such as defining structures, performing math, or assigning values.
- Arithmetic: +, -, *, /, %, **, //
- Assignment/Comparison: =, ==, <, >, !=
- Grouping/Brackets: ( ), [ ], { }
- Punctuation: , (comma), : (colon), . (period), ' (single quote), " (double quote), # (hash), \ (backslash)
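To see each group of special symbols doing its job, here is a short illustrative snippet. The variable names (a, b, pair, items, lookup, total, text) are hypothetical and chosen only for the demonstration.

```python
a, b = 7, 2  # '=' assigns; ',' separates the names and the values

# Arithmetic symbols
print(a + b, a - b, a * b)  # 9 5 14
print(a / b)                # 3.5  (true division)
print(a // b, a % b)        # 3 1  (floor division and remainder)
print(a ** b)               # 49   (exponentiation)

# Comparison symbols produce True or False
print(a == b, a != b, a > b, a < b)  # False True True False

# Grouping symbols: () for a tuple, [] for a list, {} for a dictionary (':' pairs key and value)
pair = (a, b)
items = [a, b]
lookup = {"a": a, "b": b}
print(pair, items, lookup["a"])  # (7, 2) [7, 2] 7

# '\' at the end of a line continues the statement on the next line
total = a + \
    b
print(total)  # 9

# Both quote styles delimit strings; '.' accesses a method of the string
text = 'single' + "double"
print(text.upper())  # SINGLEDOUBLE
```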
4. White Spaces and Other Characters
These are often "invisible" but are critical for the structure and readability of the code.
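- Blank Space: Used to separate tokens, for example a keyword from the name that follows it.
- Tabs: Used, like spaces, for indentation (a core requirement in Python for defining code blocks). Spaces are the more common convention, and mixing tabs and spaces inconsistently raises a TabError in Python 3.
- Newline: Signals the end of a statement, unless the line is continued inside brackets or with a trailing backslash.
- Carriage Return: Treated as part of a line ending. Python accepts Unix (LF), Windows (CR LF), and old Mac (CR) line endings in source code.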
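The role of whitespace can be tested by asking Python to compile small pieces of source text. This is a sketch using the built-in compile() function; the helper name compiles is hypothetical, and the exact error message Python reports varies between versions, so only the success or failure is shown.

```python
def compiles(source):
    """Return True if the source text is valid Python, False on a syntax error."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:  # IndentationError and TabError are subclasses of SyntaxError
        return False

print(compiles("if True:\n    x = 1\n"))  # True: body indented with four spaces
print(compiles("if True:\n\tx = 1\n"))    # True: body indented with a tab
print(compiles("if True:\nx = 1\n"))      # False: body not indented

# Windows-style CR LF line endings are accepted as ordinary newlines.
windows_style = "x = 1\r\ny = 2\r\n"
namespace = {}
exec(compile(windows_style, "<crlf>", "exec"), namespace)
print(namespace["x"], namespace["y"])  # 1 2
```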
Why the Character Set Matters
According to Kamthane and Kamthane, the character set is the "Basic Building Block" of the language. From these characters, the programmer forms Tokens (Keywords, Identifiers, Literals, etc.).
For example, the character + is part of the character set, but when placed between two numbers, it becomes an Operator Token. Similarly, the letters p, r, i, n, t are part of the character set, but when combined, they form the identifier print, the name of a built-in function. (print was a keyword in Python 2, but in Python 3 it is an ordinary function name.)
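Python's standard tokenize module shows this grouping of characters into tokens directly. The sketch below assumes the standard library only; the helper name show_tokens is hypothetical, and it hides the NEWLINE and ENDMARKER tokens to keep the output short. The keyword module confirms that print is a name rather than a keyword in Python 3.

```python
import io
import keyword
import token
import tokenize

def show_tokens(source):
    """Return (token type name, text) pairs for a piece of Python source."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Skip the structural tokens that mark line and file endings.
        if tok.type in (token.NEWLINE, token.ENDMARKER):
            continue
        pairs.append((token.tok_name[tok.type], tok.string))
    return pairs

print(show_tokens("print(1 + 2)\n"))
# [('NAME', 'print'), ('OP', '('), ('NUMBER', '1'), ('OP', '+'), ('NUMBER', '2'), ('OP', ')')]

print(keyword.iskeyword("print"))  # False: print is a built-in function name
print(keyword.iskeyword("if"))     # True: if is a keyword
```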
Important Concept: Python 2 read source files as ASCII by default, but modern Python (3.x) is Unicode-based: source files are decoded as UTF-8 by default, so strings and even identifiers can contain characters from many world languages. The Kamthane text, however, focuses on the standard programming set for fundamental problem-solving.
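A brief illustration of Unicode support in Python 3, assuming the file is saved as UTF-8 (the default). The identifier π is chosen only as an example, and str.isascii requires Python 3.7 or later.

```python
# Python 3 identifiers may use Unicode letters, not only A to Z and a to z.
π = 3.14159
print(π)  # 3.14159

print("π".isidentifier())            # True
print("A".isascii(), "π".isascii())  # True False

# Every character has a Unicode code point; ASCII characters keep their old codes.
print(ord("A"), ord("π"))  # 65 960
```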