Built With Llama!

Built With Axolotl!

Overview

We fine-tuned SmileyLlama with DPO to improve its adherence to directions in the prompt.

For more details, read the arXiv preprint here: https://arxiv.org/abs/2409.02231

How to use

This model can be loaded the same way as Llama-3.1, and its memory requirements are the same as Llama-3.1-8B.

The "properties" options that SmileyLlama was trained on are:

  • ( <= 3, <= 4, <= 5, <= 7, > 7) H-bond donors
  • ( <= 3, <= 4, <= 5, <= 10, <= 15) H-bond acceptors
  • ( <= 300, <= 400, <= 500, <= 600, > 600) Molecular weight
  • ( <= 3, <= 4, <= 5, <= 10, <= 15, > 15) logP
  • ( <= 7, <= 10, > 10) Rotatable bonds
  • ( < 0.4, > 0.4, > 0.5, > 0.6) Fraction sp3
  • ( <= 90, <= 140, <= 200, > 200) TPSA
  • (a macrocycle, no macrocycles)
  • (has, lacks) bad SMARTS
  • lacks covalent warheads
  • has covalent warheads: (sulfonyl fluorides, acrylamides, ...) (see below for details)
  • A substructure of *SMILES_STRING*
  • A chemical of *CHEMICAL_FORMULA*
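These options are combined into a comma-separated list in the prompt's Input field, as in the usage example later in this card. A minimal sketch of assembling such a prompt (the helper `build_user_prompt` is hypothetical; the template follows the example below):

```python
def build_user_prompt(properties):
    """Join chosen property constraints into SmileyLlama's Input line.

    `properties` is a list of option strings taken from the menu above,
    e.g. "<= 5 H-bond donors".
    """
    joined = ", ".join(properties)
    return (
        "Output a SMILES string for a drug like molecule "
        f"with the following properties: {joined}:"
    )

prompt = build_user_prompt([
    "<= 5 H-bond donors",
    "<= 10 H-bond acceptors",
    "<= 500 molecular weight",
    "<= 5 logP",
])
```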

List of possible warheads:

  • sulfonyl fluorides: [#16](=[#8])(=[#8])-[#9]
  • chloroacetamides: [#8]=[#6](-[#6]-[#17])-[#7]
  • cyanoacrylamides: [#7]-[#6](=[#8])-[#6](-[#6]#[#7])=[#6]
  • epoxides: [#6]1-[#6]-[#8]-1
  • aziridines: [#6]1-[#6]-[#7]-1
  • disulfides: [#16]-[#16]
  • aldehydes: [#6](=[#8])-[#1]
  • vinyl sulfones: [#6]=[#6]-[#16](=[#8])(=[#8])-[#7]
  • boronic acids/esters: [#6]-[#5](-[#8])-[#8]
  • acrylamides: [#6]=[#6]-[#6](=[#8])-[#7]
  • cyanamides: [#6]-[#7](-[#6]#[#7])-[#6]
  • chlorofluoroacetamides: [#7]-[#6](=[#8])-[#6](-[#9])-[#17]
  • butynamides: [#6]#[#6]-[#6](=[#8])-[#7]-[#6]
  • chloropropionamides: [#7]-[#6](=[#8])-[#6](-[#6])-[#17]
  • fluorosulfates: [#8]=[#16](=[#8])(-[#9])-[#8]
  • beta lactams: [#7]1-[#6]-[#6]-[#6]-1=[#8]
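For programmatic filtering, the warhead patterns above can be kept in a name-to-SMARTS table; actually matching a generated molecule against them would use a cheminformatics toolkit such as RDKit (`Chem.MolFromSmarts` plus `HasSubstructMatch`), which is omitted here. A minimal sketch:

```python
# Name -> SMARTS lookup for the covalent warheads listed above.
WARHEAD_SMARTS = {
    "sulfonyl fluorides": "[#16](=[#8])(=[#8])-[#9]",
    "chloroacetamides": "[#8]=[#6](-[#6]-[#17])-[#7]",
    "epoxides": "[#6]1-[#6]-[#8]-1",
    "acrylamides": "[#6]=[#6]-[#6](=[#8])-[#7]",
    # ... remaining entries follow the list above
}

def smarts_for(name):
    """Return the SMARTS pattern for a warhead name, or None if unknown."""
    return WARHEAD_SMARTS.get(name)
```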

Generating a drug-like molecule that obeys Lipinski's rule of five

import torch
import transformers

model_id = "THGLab/Llama-3.1-8B-SmileyLlama-1.1-Prompt-Following"  # or a local path

system_txt = "You love and excel at generating SMILES strings of drug-like molecules"
user_txt = "Output a SMILES string for a drug like molecule with the following properties: <= 5 H-bond donors, <= 10 H-bond acceptors, <= 500 molecular weight, <= 5 logP:"
prompt = f"### Instruction:\n{system_txt}\n\n### Input:\n{user_txt}\n\n### Response:\n"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    do_sample=True,  # sampling is required for num_return_sequences > 1
    temperature=1.0
)

outputs = pipeline(
    prompt,
    max_new_tokens=128,
    num_return_sequences=4
)
for k in range(4):
    print(outputs[k]["generated_text"])

You can use num_return_sequences to generate many SMILES strings in one batch, though the batch size is limited by available GPU memory.
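When generating in bulk, the pipeline returns the full prompt plus completion, so the SMILES strings still need to be pulled out and deduplicated. A sketch, assuming each generated_text contains the model's answer after the "### Response:" marker (the helper `extract_smiles` is hypothetical):

```python
def extract_smiles(outputs):
    """Collect unique SMILES candidates from text-generation outputs.

    Takes only the first line after "### Response:" in case the model
    keeps generating past the molecule.
    """
    seen, smiles = set(), []
    for out in outputs:
        response = out["generated_text"].split("### Response:\n")[-1].strip()
        candidate = response.splitlines()[0] if response else ""
        if candidate and candidate not in seen:
            seen.add(candidate)
            smiles.append(candidate)
    return smiles
```

Candidates extracted this way should still be validated (e.g. parsed with a cheminformatics toolkit) before downstream use, since the model can emit malformed SMILES.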
