Introduction to the molmass Python package #
molmass is a Python package designed for chemists and researchers. It provides
tools for calculating the molar mass, mass distribution, and isotopic patterns
of chemical substances. Developed by Christoph Gohlke, this package is
particularly useful for those involved in analytical chemistry, molecular
modeling, and educational purposes.
Key Features of molmass #
- Molar Mass Calculation: Quickly compute the molar mass of a chemical compound from its formula.
- Formula Analysis: Break down chemical formulas to understand the composition of molecules, including atom count and percentage by mass.
- Isotopic Pattern Simulation: Generate isotopic distribution patterns, which are crucial for interpreting mass spectrometry data.
- Interactive web application: For users less comfortable with coding, molmass provides a graphical user interface for easy access to its features.
Benefits for Chemists and Data Analysts #
- Efficiency in Data Analysis: Automate the calculation of molar masses and isotopic patterns, saving time in data processing.
- Accuracy: Reduces the risk of human error in manual calculations.
- Integration with Other Python Tools: Can be combined with other Python libraries for more comprehensive data analysis and visualization tasks.
- Educational Resource: A great tool for teaching molecular mass concepts and isotopic distribution.
Getting Started with Molmass #
Installation #
molmass can be installed via pip:
pip install molmass
However, the above command will only install the core package. To install the
interactive web application and also pandas to have access to tabular data,
use the following command instead:
pip install -U molmass[all]
This command will install the molmass package and all dependencies.
It is recommended to install molmass in a virtual environment. For more
information on how to create and manage virtual environments, see Python’s
official documentation.
Dictionary of elements #
You can have quick access to physicochemical and descriptive properties of
chemical elements using the ELEMENTS dictionary.
from molmass import ELEMENTS
hydrogen = ELEMENTS["H"]
hydrogen
Element(
1, 'H', 'Hydrogen',
group=1, period=1, block='s', series=1,
mass=1.007941, eleneg=2.2, eleaffin=0.75420375,
covrad=0.32, atmrad=0.79, vdwrad=1.2,
tboil=20.28, tmelt=13.81, density=0.084,
eleconfig='1s',
oxistates='1*, -1',
ionenergy=(13.5984,),
isotopes={
1: Isotope(1.00782503223, 0.999885, 1),
2: Isotope(2.01410177812, 0.000115, 2),
},
)
Note that each element is represented by an instance of the Element class, so
each property can be accessed as an attribute of the element:
hydrogen.protons
1
hydrogen.isotopes
{1: Isotope(mass=1.00782503223, abundance=0.999885, massnumber=1, charge=0),
2: Isotope(mass=2.01410177812, abundance=0.000115, massnumber=2, charge=0)}
For heavier elements, the package can be a useful source of electron configuration data, in both condensed and orbital notation:
silver = ELEMENTS["Ag"]
silver
Element(
47, 'Ag', 'Silver',
group=11, period=5, block='d', series=8,
mass=107.8682, eleneg=1.93, eleaffin=1.30447,
covrad=1.34, atmrad=1.75, vdwrad=1.72,
tboil=2436.0, tmelt=1235.1, density=10.49,
eleconfig='[Kr] 4d10 5s',
oxistates='2, 1*',
ionenergy=(7.5762, 21.49, 34.83),
isotopes={
107: Isotope(106.9050916, 0.51839, 107),
109: Isotope(108.9047553, 0.48161, 109),
},
)
silver.eleconfig
'[Kr] 4d10 5s'
silver.eleconfig_dict
{(1, 's'): 2,
(2, 's'): 2,
(2, 'p'): 6,
(3, 's'): 2,
(3, 'p'): 6,
(3, 'd'): 10,
(4, 's'): 2,
(4, 'p'): 6,
(4, 'd'): 10,
(5, 's'): 1}
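Because the orbital notation is an ordinary dictionary keyed by (shell, subshell), it is easy to post-process. The following is a minimal sketch that hard-codes the silver output shown above instead of importing molmass; the variable names are illustrative only.

```python
from collections import defaultdict

# Orbital occupancies copied from the silver.eleconfig_dict output above.
silver_config = {
    (1, "s"): 2,
    (2, "s"): 2, (2, "p"): 6,
    (3, "s"): 2, (3, "p"): 6, (3, "d"): 10,
    (4, "s"): 2, (4, "p"): 6, (4, "d"): 10,
    (5, "s"): 1,
}

# Total number of electrons; for a neutral atom this equals the atomic number.
total_electrons = sum(silver_config.values())

# Electrons per shell (principal quantum number n).
electrons_per_shell = defaultdict(int)
for (n, _subshell), count in silver_config.items():
    electrons_per_shell[n] += count

print(total_electrons)
print(dict(electrons_per_shell))
```

The total should match the atomic number of silver (47) reported in the Element output.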
Class Formula #
One of the most impressive features of molmass is the Formula class. It allows
you to analyze chemical formulas and calculate their molar mass, mass
distribution, and isotopic patterns. Let’s see how it works.
from molmass import Formula
water = Formula("H2O")
water
Formula('H2O')
Checking the number of atoms in a formula is easy:
water.atoms
3
So is checking the charge:
water.charge
0
For mass, we have three possibilities:
- mass: the average relative molecular mass considering the natural abundance of isotopes. Equals the molar mass in g/mol.
- monoisotopic_mass: the mass of the molecule considering only the most abundant isotope of each element.
- nominal_mass: the number of protons and neutrons in the isotope composed of the most abundant elemental isotopes.
water.mass
18.015287
water.monoisotopic_mass
18.01056468403
water.nominal_mass
18
We can have the elemental composition of a formula using the composition method:
water.composition()
<Composition([('H', 2, 2.015882, 0.11189841161009535), ...])>
print(water.composition())
Element Count Relative mass Fraction %
H 2 2.015882 11.1898
O 1 15.999405 88.8102
But it is more useful in a table format, which can be obtained as a pandas DataFrame:
water.composition().dataframe()
| Element | Count | Relative mass | Fraction |
|---|---|---|---|
| H | 2 | 2.015882 | 0.111898 |
| O | 1 | 15.999405 | 0.888102 |
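The Fraction column is each element's mass contribution divided by the total mass. As a rough check, here is a minimal sketch that reproduces the numbers with plain Python, using atomic masses copied from the outputs above. The helper name mass_fractions is hypothetical and is not part of molmass.

```python
# Values copied from the outputs above: (atom count, average atomic mass).
water_atoms = {"H": (2, 1.007941), "O": (1, 15.999405)}


def mass_fractions(atoms):
    """Return (total mass, {symbol: mass fraction}) for a formula."""
    contributions = {sym: n * m for sym, (n, m) in atoms.items()}
    total = sum(contributions.values())
    return total, {sym: c / total for sym, c in contributions.items()}


total_mass, fractions = mass_fractions(water_atoms)
print(round(total_mass, 6))
for symbol, fraction in fractions.items():
    print(symbol, round(fraction, 6))
```

The printed values should agree with water.mass and with the Fraction column to the displayed precision.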
Isotopes #
One thing that makes molmass stand out is its ability to calculate isotopic
patterns. This is particularly useful for mass spectrometry data analysis.
First, let’s see how the package deals with isotopes, even recognizing the most
common isotopes of each element by their symbols:
# deuterated water (heavy water)
d_water = Formula("D2O")
d_water
Formula('[2H]2O')
d_water.mass
20.02760855624
d_water.composition().dataframe()
| Element | Count | Relative mass | Fraction |
|---|---|---|---|
| 2H | 2 | 4.028204 | 0.201133 |
| O | 1 | 15.999405 | 0.798867 |
# C13 CO2
carbon13_co2 = Formula("[13C]O2")
carbon13_co2
Formula('[13C]O2')
carbon13_co2.mass
45.00216483507
Isotopic Distribution #
The mass distribution spectrum of a molecule is an essential concept in chemistry, particularly in mass spectrometry. It provides insights into the isotopic composition of a molecule and how these isotopes contribute to the molecule’s overall mass.
Each element consists of isotopes, which are atoms with the same number of protons but different numbers of neutrons. This difference in neutrons results in isotopes of the same element having slightly different masses. When you analyze a molecule, these isotopic variations lead to a distribution of possible mass values rather than a single, definitive mass.
Consider a water molecule, which is composed of two hydrogen atoms and one oxygen atom. The hydrogen atom can exist as either protium (1H) or deuterium (2H), and oxygen can have isotopes like 16O, 17O, and 18O. The mass spectrum of water would show peaks corresponding to these isotopic combinations, such as H2[16O], H2[17O], and so on. Each peak represents a different isotopic version of water, contributing to the overall mass spectrum of the molecule.
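To make the idea concrete, the distribution over nominal masses can be sketched as a repeated convolution of per-atom isotope abundances. This is only an illustration of the principle, not how molmass is implemented. The hydrogen abundances are those shown in the ELEMENTS["H"] output, while the oxygen abundances are assumed values for this sketch.

```python
from collections import defaultdict

# Natural abundances by mass number. Hydrogen values come from the
# ELEMENTS["H"] output; the oxygen values are assumed for this sketch.
HYDROGEN = {1: 0.999885, 2: 0.000115}
OXYGEN = {16: 0.99757, 17: 0.00038, 18: 0.00205}


def combine(dist_a, dist_b):
    """Convolve two {mass number: probability} distributions."""
    out = defaultdict(float)
    for mass_a, p_a in dist_a.items():
        for mass_b, p_b in dist_b.items():
            out[mass_a + mass_b] += p_a * p_b
    return dict(out)


def nominal_pattern(atoms):
    """Nominal-mass isotopic pattern for a list of per-atom distributions."""
    pattern = {0: 1.0}
    for atom in atoms:
        pattern = combine(pattern, atom)
    return pattern


water_pattern = nominal_pattern([HYDROGEN, HYDROGEN, OXYGEN])
for mass_number in sorted(water_pattern):
    print(mass_number, f"{water_pattern[mass_number]:.6e}")
```

The printed fractions can be compared with the Fraction column that molmass reports for water further on. The sketch tracks only integer mass numbers, so it does not reproduce the exact relative mass of each peak.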
For a given formula, we can calculate the isotopic distribution using the
spectrum method. Let’s see how it works for the previously defined molecular
formula of water:
water.spectrum().dataframe()
| Mass number | Relative mass | Fraction | Intensity % | m/z |
|---|---|---|---|---|
| 18 | 18.010565 | 9.973406e-01 | 1.000000e+02 | 18.010565 |
| 19 | 19.015557 | 6.093273e-04 | 6.109521e-02 | 19.015557 |
| 20 | 20.014810 | 2.049629e-03 | 2.055094e-01 | 20.014810 |
| 21 | 21.021086 | 4.714508e-07 | 4.727079e-05 | 21.021086 |
| 22 | 22.027363 | 2.711125e-11 | 2.718354e-09 | 22.027363 |
We see that the most abundant isotopic combination is the one with the lowest mass, which is the one with the most abundant isotopes of each element. The other peaks are due to the presence of heavier isotopes of hydrogen and oxygen.
Sometimes we don’t want to see all the peaks, but only the most abundant ones.
We can do this using the min_intensity parameter:
water.spectrum(min_intensity=0.01).dataframe()
| Mass number | Relative mass | Fraction | Intensity % | m/z |
|---|---|---|---|---|
| 18 | 18.010565 | 0.997341 | 100.000000 | 18.010565 |
| 19 | 19.015557 | 0.000609 | 0.061095 | 19.015557 |
| 20 | 20.014810 | 0.002050 | 0.205509 | 20.014810 |
The pattern is more distinctive for molecules that contain chlorine, such as chlorobenzene:
chlorobenzene = Formula("C6H5Cl")
chlorobenzene
Formula('C6H5Cl')
chlorobenzene.mass
112.557045
chlorobenzene.composition().dataframe()
| Element | Count | Relative mass | Fraction |
|---|---|---|---|
| C | 6 | 72.064440 | 0.640248 |
| H | 5 | 5.039705 | 0.044775 |
| Cl | 1 | 35.452900 | 0.314977 |
df_chlorobenzene = chlorobenzene.spectrum(min_intensity=0.01).dataframe()
df_chlorobenzene
| Mass number | Relative mass | Fraction | Intensity % | m/z |
|---|---|---|---|---|
| 112 | 112.007978 | 0.709836 | 100.000000 | 112.007978 |
| 113 | 113.011358 | 0.046473 | 6.546944 | 113.011358 |
| 114 | 114.005082 | 0.228390 | 32.174991 | 114.005082 |
| 115 | 115.008420 | 0.014888 | 2.097378 | 115.008420 |
| 116 | 116.011802 | 0.000407 | 0.057363 | 116.011802 |
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.vlines(x="m/z", ymin=0, ymax="Intensity %", data=df_chlorobenzene)
ax.set_xlim((min(df_chlorobenzene["m/z"]) - 1, max(df_chlorobenzene["m/z"]) + 1))
ax.set_xlabel("m/z")
ax.set_ylabel("Intensity %")
plt.show()
The m/z ratio, often encountered in mass spectrometry, is a fundamental concept used for analyzing and identifying substances based on their mass and charge characteristics.
- “m” Refers to Mass: In the m/z ratio, “m” stands for the mass of the ion, expressed in unified atomic mass units (u). Note that this is the mass of the ion, not the molecular mass of the original compound. This distinction is critical because, during mass spectrometry, molecules are often fragmented into ions.
- “z” Refers to Charge: The “z” in the ratio represents the charge number of the ion, that is, the difference between its numbers of protons and electrons, which gives the ion its positive or negative charge. In most mass spectrometry applications, the ions are singly charged, meaning z is usually ±1. However, multiply charged ions can also occur.
- Separation of Ions: In mass spectrometry, ions are separated based on their m/z ratio. Since the technique involves ionizing the sample, different fragments or ions have different m/z ratios, allowing for their separation and detection.
- Identification of Compounds: By analyzing the m/z ratios of the ions produced, chemists can deduce the molecular structure of the original compound. Each m/z value can correspond to a specific fragment of the molecule, providing clues to its structure.
- Quantitative Analysis: The intensity of the signal at each m/z ratio can be used to determine the concentration of specific ions in the sample, making mass spectrometry a valuable tool for quantitative analysis.
We can define a new chlorobenzene formula, now with a charge of 2+:
chlorobenzene_ion = Formula("[C6H5Cl]2+")
chlorobenzene_ion
Formula('[C6H5Cl]2+')
chlorobenzene_ion.spectrum(min_intensity=0.01).dataframe()
| Mass number | Relative mass | Fraction | Intensity % | m/z |
|---|---|---|---|---|
| 112 | 112.006881 | 0.709836 | 100.000000 | 56.003440 |
| 113 | 113.010261 | 0.046473 | 6.546944 | 56.505131 |
| 114 | 114.003985 | 0.228390 | 32.174991 | 57.001992 |
| 115 | 115.007323 | 0.014888 | 2.097378 | 57.503662 |
| 116 | 116.010705 | 0.000407 | 0.057363 | 58.005353 |
Note that the m/z column now reflects the charge of the ion: each value is the relative mass divided by 2.
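The relationship between the neutral and the doubly charged tables can be checked by hand. The sketch below assumes that the ion mass is the neutral mass minus the mass of the removed electrons, which is consistent with the numbers shown but is not taken from the molmass source code. The function names are hypothetical, and the electron mass is the value molmass reports for ELECTRON.

```python
ELECTRON_MASS = 0.000548579909065  # relative mass of the electron, in u


def ion_mass(neutral_mass, charge):
    """Mass of an ion: remove `charge` electrons (add them if charge < 0)."""
    return neutral_mass - charge * ELECTRON_MASS


def mass_to_charge(neutral_mass, charge):
    """m/z of an ion, using the magnitude of the charge number."""
    if charge == 0:
        # Mirrors the neutral table above, where m/z equals the mass.
        return neutral_mass
    return ion_mass(neutral_mass, charge) / abs(charge)


# Monoisotopic peak of neutral chlorobenzene (mass number 112).
neutral = 112.007978
print(round(ion_mass(neutral, 2), 6))
print(round(mass_to_charge(neutral, 2), 6))
```

Both results should agree with the first row of the table for the 2+ ion to the displayed precision.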
Abbreviations #
We have already seen that molmass recognizes D as the symbol for deuterium.
But there are other abbreviations that can be used to represent common groups,
such as Me for methyl and Et for ethyl. Let’s see how molmass deals with these
abbreviations.
We can define ethanol by its usual formula:
ethanol = Formula("C2H5OH")
ethanol
Formula('C2H5OH')
And also by using the abbreviation for ethyl:
ethanol = Formula("EtOH")
ethanol
Formula('(C2H5)OH')
Empirical Formula #
The empirical formula of a compound is a simple expression of the relative number of each type of atom in it. It’s the simplest integer ratio of the elements present in the compound, reflecting the composition in terms of the smallest whole numbers. The empirical formula doesn’t necessarily represent the exact numbers of atoms found in a molecule of the compound (that’s the molecular formula), but rather the simplest whole-number ratio between the elements.
- Representation of Composition: The empirical formula shows the simplest ratio of the different atoms in a compound. For example, in glucose (C6H12O6), the empirical formula is CH2O, representing a 1:2:1 ratio of carbon to hydrogen to oxygen.
- Derivation from Mass Percentages: It can be derived from the mass percentages of each element in a compound. This is a common way to determine empirical formulas in a laboratory setting.
- Use in Stoichiometry: Empirical formulas are useful in stoichiometry for calculating reactant and product quantities in chemical reactions.
Examples of Empirical Formulas
- Water (H2O): The empirical formula is H2O, indicating a 2:1 ratio of hydrogen to oxygen atoms.
- Ethylene (C2H4): The empirical formula is CH2, which represents the simplest whole-number ratio of carbon to hydrogen atoms in the molecule.
- Benzene (C6H6): Despite having six carbon and six hydrogen atoms, the empirical formula is CH, as it reflects the 1:1 ratio of carbon to hydrogen atoms.
- Acetic Acid (C2H4O2): Although the molecular formula is C2H4O2, the empirical formula is CH2O, showing the simplest ratio of 1 carbon atom to 2 hydrogen atoms to 1 oxygen atom.
In chemistry, the empirical formula is a fundamental concept used to understand the basic composition of a compound, especially useful when the molecular formula is complex or when only compositional data is available.
molmass can calculate the empirical formula of a compound using the
empirical property:
acetic_acid = Formula("CH3COOH")
acetic_acid
Formula('CH3COOH')
acetic_acid.empirical
'CH2O'
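The reduction itself amounts to dividing every atom count by the greatest common divisor of all counts. The following is a minimal sketch of that idea in plain Python. It is not molmass's implementation, and the helper names are hypothetical.

```python
from functools import reduce
from math import gcd


def empirical_counts(counts):
    """Divide all atom counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return {symbol: n // divisor for symbol, n in counts.items()}


def to_formula(counts):
    """Render counts as a formula string, omitting counts of 1."""
    return "".join(f"{s}{n if n > 1 else ''}" for s, n in counts.items())


acetic_acid_counts = {"C": 2, "H": 4, "O": 2}
glucose_counts = {"C": 6, "H": 12, "O": 6}
benzene_counts = {"C": 6, "H": 6}

print(to_formula(empirical_counts(acetic_acid_counts)))  # CH2O
print(to_formula(empirical_counts(glucose_counts)))      # CH2O
print(to_formula(empirical_counts(benzene_counts)))      # CH
```

The element order in the output simply follows the order of the input dictionary; the sketch does not apply any ordering convention.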
Ions #
As well as neutral molecules, molmass can deal with ions. Let’s see how it works.
sulphate = Formula("[SO4]2-")
sulphate
Formula('[SO4]2-')
sulphate.charge
-2
Hydrates #
It is possible to parse hydrates with the Formula class.
copper_sulphate_hydrated = Formula("CuSO4.5H2O")
copper_sulphate_hydrated
Formula('CuSO4(H2O)5')
copper_sulphate_hydrated.composition().dataframe()
| Element | Count | Relative mass | Fraction |
|---|---|---|---|
| Cu | 1 | 63.546000 | 0.254505 |
| H | 10 | 10.079410 | 0.040369 |
| O | 9 | 143.994645 | 0.576706 |
| S | 1 | 32.064800 | 0.128421 |
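A common follow-up question for hydrates is how much of the mass is water of crystallization. The following is a small sketch using only numbers already shown in this article (the relative masses from the table above and the value of water.mass); the variable names are illustrative.

```python
# Relative mass of one water molecule (water.mass from earlier).
WATER_MASS = 18.015287

# Per-element relative masses copied from the composition table above.
hydrate_contributions = {
    "Cu": 63.546,
    "H": 10.07941,
    "O": 143.994645,
    "S": 32.0648,
}

hydrate_mass = sum(hydrate_contributions.values())

# CuSO4.5H2O contains five water molecules per formula unit.
water_fraction = 5 * WATER_MASS / hydrate_mass

print(round(hydrate_mass, 6))
print(f"{100 * water_fraction:.1f} % water by mass")
```

With these inputs, water accounts for roughly 36 % of the mass of the pentahydrate.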
Nucleotides #
The common abbreviations for nucleotides are recognized by molmass:
nucleotides = Formula("ATCG")
nucleotides
Formula('((C10H12N5O5P)(C9H12N3O6P)(C10H12N5O6P)(C10H13N2O7P)H2O)')
nucleotides.mass
1253.804568992
nucleotides.composition().dataframe()
| Element | Count | Relative mass | Fraction |
|---|---|---|---|
| C | 39 | 468.418860 | 0.373598 |
| H | 51 | 51.404991 | 0.040999 |
| N | 15 | 210.100545 | 0.167570 |
| O | 25 | 399.985125 | 0.319017 |
| P | 4 | 123.895048 | 0.098815 |
Other features #
molmass also exposes the electron, proton, and neutron as Particle objects:
from molmass import ELECTRON, PROTON, NEUTRON
ELECTRON
Particle(name='Electron', mass=0.000548579909065, charge=-1.602176634e-19)
ELECTRON.mass # relative mass of electron in atomic mass units (u)
0.000548579909065
PROTON
Particle(name='Proton', mass=1.007276466621, charge=1.602176634e-19)
NEUTRON
Particle(name='Neutron', mass=1.00866491595, charge=0.0)
Flask web application #
You can launch the web application using the following command:
python -m molmass.web
A web browser will open with the application running. You can also access it by
typing http://127.0.0.1:5001 in your browser. Pass a formula in the input
field and click the “Submit” button to see the results.
Conclusion #
molmass is a versatile tool that bridges chemistry and data analysis. Its ease
of use, combined with powerful features, makes it a useful tool for chemists,
educators, and data analysts in the field of chemistry.
Further Resources #
- Molmass GitHub Repository: Molmass on GitHub