Module polarityindex_fixed

=============================================================================== SFPPy: Approximate Correlation Between Polarity Index (P') and logP ===============================================================================

SFPPy: Approximate Correlation Between Polarity Index (P’) and logP

Objective:

Build a qualitative correlation between an approximate “polarity index” ($P'$) and $logP$ (octanol-water partition coefficient) values for small organic molecules (denoted $i$) using the Flory-Huggins (FH) theory of mixtures.

$P'$ values are itended to replace the square roor of molar cohesive energies $\sqrt(E)$ in the approximation of Flory-Huggins (FH) coefficients $i$ in a fluid or polymer phase $k$:

$$ \chi_{i+k} pprox 0.34 imes (\sqrt{E_i} - \sqrt{E_k}) ^ 2 $$

where $E_i$ and $E_k$ are cohesive energies of $i$ and $k$, respectively. The coefficient 0.34 is empirical.

References:

  • Guillaume Gillet, Olivier Vitrac, and Stéphane Desobry Prediction of Solute Partition Coefficients between Polyolefins and Alcohols Using a Generalized Flory−Huggins Approach Industrial & Engineering Chemistry Research 2009 48 (11), 5285-5301 https://doi.org/10.1021/ie801141h

  • Thomas Brouwer and Boelo Schuur Model Performances Evaluated for Infinite Dilution Activity Coefficients Prediction at 298.15 K Industrial & Engineering Chemistry Research 2019 58 (20), 8903-8914 https://doi.org/10.1021/acs.iecr.9b00727

Since $logP$ values include a significant entropic contribution for large substances, the effect is removed using the same FH theory. For solute $i$, $logP = ln10 \cdot lnK_{i,o/w}$ with:

$$ lnK_{i,o/w} pprox \gamma_{i,w} - \gamma_{i,o} $$

At infinite dilution in $k=o,w$, we get from FH theory:

$$ \gamma_{i,k} pprox \chi_{i+k} +1 - r_{i,k} $$

with $r_{i,k} pprox rac{V_i}{V_k}$ the ratio of molar volumes between $i$ and $k$

As a result:

$$ ln10 imes logP pprox \Delta\chi_{i}^{scale:ow} + S_i^{correction:ow} $$

with $S_i^{correction:ow} = -\left( r_{i,w} - r_{i,o} ight)$

By replacing 0.34 by a scaling parameter $lpha$ and by replacing cohesive energies $E$ by their polarity indices $P'$', one gets:

$$ \Delta\chi_{i}^{scale:ow} = \chi_{i,w} - \chi_{i,o} pprox lpha imes \left[ (P'_i - P'_w) ^ 2 - (P'_i - P'_o) ^ 2 ight] \ = lpha imes \left[P'_i \cdot (P'_w - P'_o) + (P'_w + P'_o)(P'_w - P'_o) \ ight] \ = lpha\cdot(P'_w - P'_o) imes\left(P'_i + eta ight) \ = A\cdot P'_i + B $$

We notice that $P'$ are related proportional to the difference of FH coefficients between water (w) and octanol (o) via two constants $A$ and $B$. Since $P`$ values are used as differences, we can arbitrarily choose $B=0$ without affecting predictions.

Commonly, $P’$ lies between $0$ (very hydrophobic substances) and $10.2$ (water = the most polar).

From these definitions, $P'$ is approximated as:

$$ A \cdot P' pprox ln10 imes logP - S_i^{correction:ow}(V_i) pprox U\left(logP,V_i ight) $$

Choosing $A = 1$ leads to $lpha = rac{1}{(P'_w - P'_o)}$. As it will be shown and due to the fitting approximation, the largest amplitude leads to $lphapprox0.14$.

Bounding $P’$ within a finite interval and in particular the zero bound for hydrophobic substances leads several substances associated with different $logP$ (unbounded) and $V_i$ (unbounded) to have the same $P’$ values. A smooth predictor, is sought from databases by looking for a second-degree polynomial approximation of $P’$, denoted $\hat{P’}$, with $U\left(logP,V_i ight)$:

$$ \hat{P’}(M,V) = a \cdot \left[U\left(logP,V_i=f(M) ight) ight]^2 + b \cdot U\left(logP,V_i=f(M) ight) + c $$

where $V_i$ is approximated with molecular mass $M$ using the Miller’s formula:

$$ V_i pprox 0.997 \cdot M^{1.03} $$

With $M$ in $g \cdot mol^{_1} and $V_i$ in $cm^3 \cdot mol^{-1}$.

Methodology of fitting:

  • $a$,$b$ and $c$ are fitted on a small datasets of values with $logP$ and $P’$ values reported in open databases
  • The maximum amplitude is $P’$ is determined by the value for water, which is depends on the values of $logP$ reported in databases and the choices of parameters.

Application to the prediction of activity coefficients 🔗 🔗 🔗 🔗 🔗 🔗 🔗

$$ \gamma_{i,k} pprox \chi_{i+k} +1 - \left(r_{i,k} - n(r_{i,k}) ight) pprox lpha imes [\hat{P’}(logP_i,V_i) - \hat{P’}(logP_k,V_k)] ^ 2 + 1 - \left(r_{i,k} - n(r_{i,k}) ight) $$

For activity coefficients in polymers ($k=P$), $r_{i,k} - n(r_{i,k})pprox0$ since the solute is considered infinitely small compared to $P$. In food simulants ($k=F$), $n(r_{i,k})$ represents a correction due to the presence of $n(r_{i,k}$ molecules within the spherical volume (coarse-grain approximation of FH interactions) of the solute. This number can be determined uniquely from $i+k$ pairwise radial distributions. It is larger for large flexible substances and almost $0$ for small or bulky solutes with spherical symmetry.

In the sake of a “universal” rule applicable to any substance, it is proposed to take :

$$ n(r) = rac{r-5}{r} $$

References:

  • Guillaume Gillet, Olivier Vitrac, and Stéphane Desobry Prediction of Partition Coefficients of Plastic Additives between Packaging Materials and Food Simulants. Industrial & Engineering Chemistry Research 2010 49 (16), 7263-7280 https://doi.org/10.1021/ie9010595

This script demonstrates how to:

1) Fit a *quadratic* model using a small, "tuned" dataset that covers a
   range of polarities from n-Hexane to Water.
2) Validate it (roughly) with an extended dataset of ~35 solvents.
3) Provide a function to invert the model and estimate P' from logP values,
   for classification or for ranking in subsequent Henry-like or partition
   coefficient models in SFPPy.

Disclaimer: - This correlation is qualitative and based on an intentionally small dataset, hand-tuned for a few representative solvents (non-polar to very polar). - Do not expect high accuracy. The goal is to produce a rough scale (P') that loosely tracks how "polar" or "non-polar" a compound might be. - The script demonstrates the approach rather than guaranteeing a universal model.

References & Data: - Data for Polarity Index (P') adapted from various solvent polarity tables: LSU Macromolecular Resources, HPLC Solvent Guides, and additional web resources. - logP values drawn from typical reference tables (PubChem, various compiled datasheets).

Motivation: We wish to provide a "fast" way to guess an ordering of polarity from logP, which is widely available for many chemicals (e.g., in PubChem). P' is then used within SFPPy to set or guess Henry-like coefficients or Flory–Huggins parameters in patankar.layer and patankar.food modules.


Created on Fri Feb 28 12:01:56 2025 @author: olivi (community)


Expand source code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
===============================================================================
SFPPy: Approximate Correlation Between Polarity Index (P') and logP
===============================================================================

## SFPPy: Approximate Correlation Between Polarity Index (P’) and logP

### **Objective**:

Build a *qualitative* correlation between an approximate “polarity index” ($P'$) and $logP$ (octanol-water partition coefficient)
values for small organic molecules (denoted $i$) using the Flory-Huggins (FH) theory of mixtures.

$P'$ values are itended to replace the square roor of molar cohesive energies $\sqrt(E)$ in the approximation
of Flory-Huggins (FH) coefficients $i$ in a fluid or polymer phase $k$:

$$
         \chi_{i+k} \approx 0.34 \times (\sqrt{E_i} - \sqrt{E_k}) ^ 2
$$

where $E_i$ and $E_k$ are cohesive energies of $i$ and $k$, respectively. The coefficient 0.34 is empirical.

> *References*:

> - Guillaume Gillet, Olivier Vitrac, and Stéphane Desobry
>     Prediction of Solute Partition Coefficients between Polyolefins and Alcohols Using a Generalized Flory−Huggins Approach
>     *Industrial & Engineering Chemistry Research* **2009** 48 (11), 5285-5301
>     https://doi.org/10.1021/ie801141h

> - Thomas Brouwer and Boelo Schuur
>     Model Performances Evaluated for Infinite Dilution Activity Coefficients Prediction at 298.15 K
>     *Industrial & Engineering Chemistry Research* **2019** 58 (20), 8903-8914
>     https://doi.org/10.1021/acs.iecr.9b00727

Since $logP$ values include a significant entropic contribution for large substances, the effect is removed using the same FH theory.
For solute $i$, $logP = ln10 \cdot lnK_{i,o/w}$ with:

$$
    lnK_{i,o/w} \approx \gamma_{i,w} - \gamma_{i,o}
$$

At infinite dilution in $k=o,w$, we get from FH theory:

$$
        \gamma_{i,k} \approx \chi_{i+k} +1 - r_{i,k}
$$

with $r_{i,k} \approx \frac{V_i}{V_k}$ the ratio of molar volumes between $i$ and $k$

As a result:

$$
    ln10 \times logP \approx  \Delta\chi_{i}^{scale:ow} + S_i^{correction:ow}
$$

with $S_i^{correction:ow} = -\left( r_{i,w} - r_{i,o} \right)$

By replacing 0.34 by a scaling parameter $\alpha$ and by replacing cohesive energies $E$ by their polarity indices $P'$', one gets:

$$
     \Delta\chi_{i}^{scale:ow} = \chi_{i,w} - \chi_{i,o}
   \approx \alpha \times \left[ (P'_i - P'_w) ^ 2 - (P'_i - P'_o) ^ 2 \right] \\
   = \alpha \times \left[P'_i \cdot (P'_w - P'_o) + (P'_w + P'_o)(P'_w - P'_o) \\ \right] \\
   = \alpha\cdot(P'_w - P'_o)\times\left(P'_i + \beta \right) \\
   = A\cdot P'_i + B
$$

> We notice that $P'$ are related proportional to the difference of FH coefficients between
> water (w) and octanol (o) via two constants $A$ and $B$. Since $P`$ values are used as differences,
> we can arbitrarily choose $B=0$ without affecting predictions.

Commonly, $P’$ lies between $0$ (very hydrophobic substances) and $10.2$ (water = the most polar).

From these definitions, $P'$ is approximated as:

$$
    A \cdot P' \approx ln10\times logP - S_i^{correction:ow}(V_i)  \approx U\left(logP,V_i\right)
$$

> Choosing $A = 1$ leads to $\alpha = \frac{1}{(P'_w - P'_o)}$. As it will be shown and due to
> the fitting approximation, the largest amplitude leads to $\alpha\approx0.14$.

Bounding $P’$ within a finite interval and in particular the zero bound for hydrophobic substances leads several substances associated with different $logP$ (unbounded) and $V_i$ (unbounded) to have the same $P’$ values. A smooth predictor, is sought from databases by looking for a second-degree polynomial approximation of $P’$, denoted $\hat{P’}$, with $U\left(logP,V_i\right)$:

$$
        \hat{P’}(M,V) = a \cdot  \left[U\left(logP,V_i=f(M)\right)\right]^2 + b \cdot  U\left(logP,V_i=f(M)\right) + c
$$

where $V_i$ is approximated with molecular mass $M$ using the Miller’s formula:

$$
        V_i \approx 0.997 \cdot M^{1.03}
$$

With $M$ in $g \\cdot mol^{\_1} and $V_i$ in $cm^3 \cdot mol^{-1}$.

**Methodology of fitting**:

- $a$,$b$ and $c$ are fitted on a small datasets of values with $logP$ and $P’$ values reported in open databases
- The maximum amplitude is $P’$ is determined by the value for water, which is depends on the values of $logP$ reported in databases and the choices of parameters.

### **Application to the prediction of activity coefficients** 🔗 🔗 🔗 🔗 🔗 🔗 🔗

$$
\gamma_{i,k} \approx \chi_{i+k} +1 - \left(r_{i,k} - n(r_{i,k})\right)
\approx \alpha \times [\hat{P’}(logP_i,V_i) - \hat{P’}(logP_k,V_k)] ^ 2  + 1 - \left(r_{i,k} - n(r_{i,k})\right)
$$

For activity coefficients in polymers ($k=P$), $r_{i,k} - n(r_{i,k})\approx0$ since the solute is considered infinitely small compared to $P$.
In food simulants ($k=F$), $n(r_{i,k})$ represents a correction due to the presence of $n(r_{i,k}$ molecules within the spherical volume (coarse-grain approximation of FH interactions) of the solute. This number can be determined uniquely from $i+k$ pairwise radial distributions. It is larger for large flexible substances and almost $0$ for small or bulky solutes with spherical symmetry.

In the sake of a “universal” rule applicable to any substance, it is proposed to take :

$$
        n(r) = \frac{r-5}{r}
$$

> *References*:
>
> - Guillaume Gillet, Olivier Vitrac, and Stéphane Desobry
>     Prediction of Partition Coefficients of Plastic Additives between Packaging Materials and Food Simulants.
>     *Industrial & Engineering Chemistry Research* **2010** 49 (16), 7263-7280
>     https://doi.org/10.1021/ie9010595




This script
    demonstrates how to:

    1) Fit a *quadratic* model using a small, "tuned" dataset that covers a
       range of polarities from n-Hexane to Water.
    2) Validate it (roughly) with an extended dataset of ~35 solvents.
    3) Provide a function to invert the model and estimate P' from logP values,
       for classification or for ranking in subsequent Henry-like or partition
       coefficient models in SFPPy.

**Disclaimer**:
    - This correlation is *qualitative* and based on an intentionally *small*
      dataset, hand-tuned for a few representative solvents (non-polar to very
      polar).
    - Do **not** expect high accuracy. The goal is to produce a rough scale (P')
      that loosely tracks how "polar" or "non-polar" a compound might be.
    - The script demonstrates the approach rather than guaranteeing a universal
      model.

**References & Data**:
    - Data for Polarity Index (P') adapted from various solvent polarity tables:
      LSU Macromolecular Resources, HPLC Solvent Guides, and additional web
      resources.
    - logP values drawn from typical reference tables (PubChem, various
      compiled datasheets).

**Motivation**:
    We wish to provide a "fast" way to guess an ordering of polarity from logP,
    which is widely available for many chemicals (e.g., in PubChem). P' is then
    used within SFPPy to set or guess Henry-like coefficients or Flory–Huggins
    parameters in patankar.layer and patankar.food modules.

----------------------------------------------------------------------------
Created on Fri Feb 28 12:01:56 2025
@author: olivi (community)
----------------------------------------------------------------------------
"""

# %% Import libraries
import math
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from patankar.loadpubchem import migrant

# %% cache debug issues
# from patankar.loadpubchem import CompoundIndex
# dbdefault = CompoundIndex(cache_dir="cache.PubChem", index_file="pubchem_index.json")


# %% Tuned Reference Dataset
"""
This small set of 8 solvents is carefully chosen to span from n-Hexane (non-polar)
to Water (very polar). Polarity Index (P') values are lightly 'tweaked' to get a
smooth progression. The logP values come from known literature data.
"""

solvents = [
    "Water", "Methanol", "Ethanol", "Acetone", "Acetonitrile",
    "Dichloromethane", "Toluene", "n-Hexane"
]

# Molar volume (from its approximation with M)
V = [migrant(m).molarvolumeMiller for m in solvents]
Vwater = migrant("water").molarvolumeMiller
Voctanol = migrant("octanol").molarvolumeMiller
FHentropy_contribution = [- v * (1/Vwater - 1/Voctanol) for v in V]

# Polarity Index (P') with minor manual adjustments (denoted +)
polarity_index = [10.2,      # Water
                  5.1,     # Methanol (5.1 + 3 = 8.1)
                  4.3,       # Ethanol (4.3 + 0.7 = 5.0)
                  5.1,       # Acetone (5.1 + 0.5 = 5.6)
                  5.8,       # Acetonitrile (5.8 + 1 = 6.8)
                  3.1,       # Dichloromethane
                  2.4,   # Toluene
                  0.0        # n-Hexane
                 ]

# logP reference data
logP_values = [-1.38+0.7,  # Water
               -0.77,  # Methanol
               -0.24,  # Ethanol
               -0.21,  # Acetone
               -0.22,  # Acetonitrile
                1.25,  # Dichloromethane
                2.73,  # Toluene
                3.90   # n-Hexane
              ]

logKow_entropy = [logp * math.log(10) - c for logp, c in zip(logP_values, FHentropy_contribution)]

# % Extended Dataset for Validation
"""
This larger set (~35 solvents) is used to see if the small dataset's fit
extends (qualitatively) to other solvents. We exclude the ones already in
the small set to avoid double-counting.
"""

ext_solvents = [
    "Pentane", "1,1,2-Trichlorotrifluoroethane", "Cyclopentane", "Heptane", "Hexane",
    "Iso-Octane", "Petroleum Ether", "Cyclohexane", "n-Butyl Chloride", "Toluene",
    "Methyl t-Butyl Ether", "o-Xylene", "Chlorobenzene", "o-Dichlorobenzene", "Ethyl Ether",
    "Dichloromethane", "Ethylene Dichloride", "n-Butyl Alcohol", "Isopropyl Alcohol",
    "n-Butyl Acetate", "Isobutyl Alcohol", "Methyl Isoamyl Ketone", "n-Propyl Alcohol",
    "Tetrahydrofuran", "Chloroform", "Methyl Isobutyl Ketone", "Ethyl Acetate",
    "Methyl n-Propyl Ketone", "Methyl Ethyl Ketone", "1,4-Dioxane", "Acetone", "Methanol",
    "Pyridine", "2-Methoxyethanol", "Acetonitrile", "Propylene Carbonate", "N,N-Dimethylformamide",
    "Dimethyl Acetamide", "N-Methylpyrrolidone", "Dimethyl Sulfoxide", "Water"
]

# Dictionary for replacements
replacements = {
    "Petroleum Ether": "Isohexane",
}

# Replace using list comprehension
ext_solvents = [replacements.get(s, s) for s in ext_solvents]


# Molar volume (from its approximation with M)
ext_V = [migrant(m).molarvolumeMiller for m in ext_solvents]
ext_FHentropy_contribution = [- v * (1/Vwater - 1/Voctanol) for v in ext_V]

ext_polarity_index = [
    0.0, 0.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 1.0, 2.4, 2.5, 2.5, 2.7, 2.7, 2.8,
    3.1, 3.5, 3.9, 3.9, 4.0, 4.0, 4.0, 4.0, 4.0, 4.1, 4.2, 4.4, 4.5, 4.7, 4.8,
    5.1, 5.1, 5.3, 5.5, 5.8, 6.1, 6.4, 6.5, 6.7, 7.2, 10.2
]

ext_logP_values = [
    3.39, 4.30, 3.20, 4.66, 3.90, 4.50, 3.50, 3.44, 2.70, 2.73, 1.20, 3.12, 2.84, 3.38, 0.83,
    1.25, 1.48, 0.88, 0.05, 1.82, 0.79, 1.98, 0.25, 0.46, 1.97, 1.31, 0.73, 1.50, 0.29, -0.27,
    -0.24, -0.77, 0.65, -0.77, -0.22, -0.41, -1.01, -0.77, -0.38, -1.35, -1.38
]

ext_logKow = [logp * math.log(10) for logp in ext_logP_values]
ext_logKow_entropy = [logp * math.log(10) - c for logp, c in zip(ext_logP_values, ext_FHentropy_contribution)]

# Filter out solvents that appear in the tuned dataset
validation_solvents = []
validation_polarity_index = []
validation_logP_values = []
validation_FHentropy_contribution = []
validation_logKow_entropy = []

for i, solvent in enumerate(ext_solvents):
    if solvent not in solvents:
        validation_solvents.append(solvent)
        validation_polarity_index.append(ext_polarity_index[i])
        validation_logP_values.append(ext_logP_values[i])
        validation_FHentropy_contribution.append(ext_FHentropy_contribution[i])
        validation_logKow_entropy.append(ext_logKow_entropy[i])

# Create DataFrames for easy inspection
df = pd.DataFrame({
    "Solvent": solvents,
    "Polarity Index (P')": polarity_index,
    "logP": logP_values,
    "FHentropy_contribution": FHentropy_contribution,
    "logKow_entropy": logKow_entropy
}).sort_values(by="Polarity Index (P')", ascending=True)

df_validation = pd.DataFrame({
    "Solvent": validation_solvents,
    "Polarity Index (P')": validation_polarity_index,
    "logP": validation_logP_values,
    "FHentropy_contribution": validation_FHentropy_contribution,
    "logKow_entropy": validation_logKow_entropy
}).sort_values(by="Polarity Index (P')", ascending=True)


# %% Quick Visualization of the Tuned Data
plt.figure(figsize=(8, 5))
plt.scatter(polarity_index, logKow_entropy, color='blue', label="Reference Data (Tuned)")
for i, solvent in enumerate(solvents):
    plt.annotate(solvent, (polarity_index[i], logKow_entropy[i]),
                 fontsize=9, xytext=(5,5), textcoords='offset points')

plt.xlabel("Polarity Index (P')")
plt.ylabel("logKow")
plt.title("Polarity Index (P') vs. logKow (Tuned Dataset)")
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)  # Zero line for logP
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()


# %% Fit a Quadratic Model
"""
We assume a model:

    logP = a * (P')^2 + b * P' + c

We'll use np.polyfit() with degree=2. Then we compare it with
the extended validation set.
"""
coefficients = np.polyfit(polarity_index, logKow_entropy, 2)
quadratic_model = np.poly1d(coefficients)

# For plotting a smooth curve:
x_range = np.linspace(min(polarity_index), max(polarity_index), 100)
y_fitted = quadratic_model(x_range)
quad_eq_str = (f"logP ≈ {coefficients[0]:.4f} * (P')² "
               f"+ {coefficients[1]:.4f} * (P') "
               f"+ {coefficients[2]:.4f}")

print("Fitted coefficients (a, b, c):", coefficients)
print("Quadratic equation =>", quad_eq_str)

A,B,C = coefficients
# % Compare with Extended Validation Dataset
plt.figure(figsize=(8, 5))

# Plot the tuned set
plt.scatter(polarity_index, logKow_entropy, color='Crimson', label="Tuned (8 solvents)")
for i, solvent in enumerate(solvents):
    plt.annotate(solvent, (polarity_index[i], logKow_entropy[i]),
                 fontsize=8, xytext=(5,5), textcoords='offset points')

# Plot the validation set
plt.scatter(validation_polarity_index, validation_logKow_entropy,
            color='DeepSkyBlue', label="Validation (~35 solvents)")
for i, solvent in enumerate(validation_solvents):
    plt.annotate(solvent, (validation_polarity_index[i], validation_logKow_entropy[i]),
                 fontsize=6, xytext=(5,5), textcoords='offset points')

# Plot the fitted curve
plt.plot(x_range, y_fitted, color='Crimson', linestyle='--', label="Quadratic Fit")

plt.xlabel("Polarity Index (P')")
plt.ylabel("logKow - entropy contribution")
plt.title("Polarity Index (P') vs. logKow - entropy con\nQuadratic Model Fit & Validation")
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()


# %% Final implementation
import numpy as np

def polarity_index(logP=None,V=None,name=None,
                             Vw=19.588376948550433, # migrant("water").molarvolumeMiller
                             Vo=150.26143432234372, # migrant("octanol").molarvolumeMiller
                             A=0.18161296829146106, #0.14265915555430117,
                             B=-3.412678660396018, #-3.136976791114839,
                             C=14.813767205916765  # 14.52713235123395
                             ):
    """
    Computes the polarity index (P') from a given logP value and S value
    as the positive root of the quadratic fitted equation:

        E = A * (P')² + B * P' + C
        P' = (-B - sqrt(B² - 4A(C - E))) / (2A)

        with E = logP*ln(10) - S = Xw - Xo
             S = entropy contribution = - (V/Vw - V/Vo)

    Parameters:
    ----------
    logP : float, list, or np.ndarray
    V    : float, list or np.array
        The logP value(s) for which to compute the polarity index P'.

    Returns:
    -------
    float or np.ndarray
        The calculated polarity index P' corresponding to the input logP.
        If logP is out of the valid range, returns:
        - 10.2 for very polar solvents (beyond water)
        - 0 for extremely hydrophobic solvents (beyond n-Hexane)

    Example Usage:
    -------------
    >>> logP = migrant("anisole").logP
    >>> V = migrant("anisole").molarvolumeMiller
    >>> polarity_index(logP,V)
    8.34  # Example output

    >>> polarity_index_from_logP([-1.0, 0.5, 2.0])
    array([9.2, 4.5, 1.8])  # Example outputs
    """

    # Define valid logP range based on quadratic model limits
    Emin = C - B**2 / (4*A)  # ≈ -2.78 (theoretical minimum lnKow=E)
    Emax = C                 # 14.527 (theoretical maximum logP)
    Pmax = 10.2 # value for water

    def compute_P(logP_value,V_value):
        """Computes P' for a single logP and V value after input validation."""
        S = - (1/Vw - 1/Vo)*V_value
        E = logP_value * 2.302585092994046 - S # Chiw - Chio
        #print(f"E={E} with S={S}")
        if E < Emin:
            return Pmax  # Most polar (beyond water)
        if E > Emax:
            return 0.0  # Most hydrophobic (beyond n-Hexane)
        discriminant = B**2 - 4*A*(C - E)
        sqrt_discriminant = np.sqrt(discriminant)
        P2root = (-B - sqrt_discriminant) / (2*A)  # Always select P2
        return P2root if P2root<=Pmax else Pmax

    # for testing
    if logP is None or V is None:
        if name is None:
            raise ValueError('provide a valid name or logP, V pair values')
        from patankar.loadpubchem import migrant
        tmp = migrant(name)
        V = tmp.molarvolumeMiller
        logP = tmp.logP
    # Handle both single and multiple values efficiently
    if isinstance(logP, (list, tuple, np.ndarray)):
        return np.vectorize(compute_P)(logP,V)  # Vectorized for multiple inputs
    else:
        return compute_P(logP,V)


# % Demonstration
print("\n=== Demonstration of polarity_index_from_logP ===")
solutes = ["water","toluene","limonene", "anisole", "BHT", "Irganox 1076","ethylene","ethanol","propanol","octanol","water"]
m = migrant("BHT")
for s in solutes:
    logPref = migrant(s).logP
    P = polarity_index(name=s)
    print(f"{s}: logP={logPref}  => P'={P}") # :>5.2f


# %% Final Predictions vs Inputs
# Compute predictions
predictions = np.array([polarity_index(name=s) for s in ext_solvents])
ext_polarity_index = np.array(ext_polarity_index)  # Ensure numpy array
# Least Squares Fit without Intercept using NumPy
slope, _, _, _ = np.linalg.lstsq(ext_polarity_index.reshape(-1, 1), predictions, rcond=None)
# Generate fitted line values
x_range = np.linspace(min(ext_polarity_index), max(ext_polarity_index), 100)
y_fitted = slope[0] * x_range  # Since no intercept, y = m*x
# Plot
plt.figure(figsize=(8, 5))
plt.scatter(ext_polarity_index, predictions, color='Crimson', label="Tuned (8 solvents)")
# Annotate solvents
for i, solvent in enumerate(ext_solvents):
    plt.annotate(solvent, (ext_polarity_index[i], predictions[i]),
                 fontsize=10, xytext=(5, 5), textcoords='offset points')
# 45-degree reference line
plt.plot(x_range, x_range, linestyle="--", color="black", label="45° line (y=x)")
# Fitted regression line (passing through origin)
plt.plot(x_range, y_fitted, linestyle="-", color="blue", label="Fitted line (no intercept)")
# Labels and styling
plt.xlabel("Polarity Index (P') - reference")
plt.ylabel("Polarity Index (P') - predicted")
plt.title("Predicted vs Reference P' Values")
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()

# %%
"""
===============================================================================
Summary & Notes:
----------------
1) We fitted a simple quadratic (logP ~ a(P')² + bP' + c) to a small set
   of 8 solvents. The shape is roughly correct for a wide range of polarities.

2) The extended dataset (~35 solvents) is used for a rough *visual* validation.
   There's no guarantee of accuracy, especially for more exotic or borderline
   solvents.

3) The final function `polarity_index_from_logP()` is the main deliverable.
   - It clamps P' between 0.0 (extremely non-polar) and 10.2 (extremely polar).
   - Perfect for a quick rank-based approach or a first-guess at how "polar"
     an unknown solvent might be relative to water, methanol, or hexane.

4) Additional disclaimers:
   - The model is purely *empirical*.
   - Use at your own risk for approximate classification or quick bounding.

===============================================================================
"""

Global variables

var logKow_entropy

This larger set (~35 solvents) is used to see if the small dataset's fit extends (qualitatively) to other solvents. We exclude the ones already in the small set to avoid double-counting.

Functions

def polarity_index(logP=None, V=None, name=None, Vw=19.588376948550433, Vo=150.26143432234372, A=0.18161296829146106, B=-3.412678660396018, C=14.813767205916765)

Computes the polarity index (P') from a given logP value and S value as the positive root of the quadratic fitted equation:

E = A * (P')² + B * P' + C
P' = (-B - sqrt(B² - 4A(C - E))) / (2A)

with E = logP*ln(10) - S = Xw - Xo
     S = entropy contribution = - (V/Vw - V/Vo)

Parameters:

logP : float, list, or np.ndarray V : float, list or np.array The logP value(s) for which to compute the polarity index P'.

Returns:

float or np.ndarray The calculated polarity index P' corresponding to the input logP. If logP is out of the valid range, returns: - 10.2 for very polar solvents (beyond water) - 0 for extremely hydrophobic solvents (beyond n-Hexane)

Example Usage:

>>> logP = migrant("anisole").logP
>>> V = migrant("anisole").molarvolumeMiller
>>> polarity_index(logP,V)
8.34  # Example output
>>> polarity_index_from_logP([-1.0, 0.5, 2.0])
array([9.2, 4.5, 1.8])  # Example outputs
Expand source code
def polarity_index(logP=None,V=None,name=None,
                             Vw=19.588376948550433, # migrant("water").molarvolumeMiller
                             Vo=150.26143432234372, # migrant("octanol").molarvolumeMiller
                             A=0.18161296829146106, #0.14265915555430117,
                             B=-3.412678660396018, #-3.136976791114839,
                             C=14.813767205916765  # 14.52713235123395
                             ):
    """
    Computes the polarity index (P') from a given logP value and S value
    as the positive root of the quadratic fitted equation:

        E = A * (P')² + B * P' + C
        P' = (-B - sqrt(B² - 4A(C - E))) / (2A)

        with E = logP*ln(10) - S = Xw - Xo
             S = entropy contribution = - (V/Vw - V/Vo)

    Parameters:
    ----------
    logP : float, list, or np.ndarray
    V    : float, list or np.array
        The logP value(s) for which to compute the polarity index P'.

    Returns:
    -------
    float or np.ndarray
        The calculated polarity index P' corresponding to the input logP.
        If logP is out of the valid range, returns:
        - 10.2 for very polar solvents (beyond water)
        - 0 for extremely hydrophobic solvents (beyond n-Hexane)

    Example Usage:
    -------------
    >>> logP = migrant("anisole").logP
    >>> V = migrant("anisole").molarvolumeMiller
    >>> polarity_index(logP,V)
    8.34  # Example output

    >>> polarity_index_from_logP([-1.0, 0.5, 2.0])
    array([9.2, 4.5, 1.8])  # Example outputs
    """

    # Define valid logP range based on quadratic model limits
    Emin = C - B**2 / (4*A)  # ≈ -2.78 (theoretical minimum lnKow=E)
    Emax = C                 # 14.527 (theoretical maximum logP)
    Pmax = 10.2 # value for water

    def compute_P(logP_value,V_value):
        """Computes P' for a single logP and V value after input validation."""
        S = - (1/Vw - 1/Vo)*V_value
        E = logP_value * 2.302585092994046 - S # Chiw - Chio
        #print(f"E={E} with S={S}")
        if E < Emin:
            return Pmax  # Most polar (beyond water)
        if E > Emax:
            return 0.0  # Most hydrophobic (beyond n-Hexane)
        discriminant = B**2 - 4*A*(C - E)
        sqrt_discriminant = np.sqrt(discriminant)
        P2root = (-B - sqrt_discriminant) / (2*A)  # Always select P2
        return P2root if P2root<=Pmax else Pmax

    # for testing
    if logP is None or V is None:
        if name is None:
            raise ValueError('provide a valid name or logP, V pair values')
        from patankar.loadpubchem import migrant
        tmp = migrant(name)
        V = tmp.molarvolumeMiller
        logP = tmp.logP
    # Handle both single and multiple values efficiently
    if isinstance(logP, (list, tuple, np.ndarray)):
        return np.vectorize(compute_P)(logP,V)  # Vectorized for multiple inputs
    else:
        return compute_P(logP,V)