|
# pip-code-to-doc |
|
|
|
[PipableAI](https://www.linkedin.com/company/pipable.ai/about/)
|
|
|
[Colab notebook](https://colab.research.google.com/drive/17PyMU_3QN9LROy7x-jmaema0cuLRzBvc?usp=sharing)
|
|
|
## What have we built? |
|
|
|
A 1.3B-parameter code-documentation model that outperforms most models at documenting code and at making your in-house libraries ready for LLM and RAG pipelines.

We have also open-sourced a [parsing lib](https://github.com/PipableAI/pip-library-parser) for the same purpose. Together, the library and the model can turn your codebase into a functional parse tree ready to be consumed by LLMs for executing complex tasks (see the Usage section below).

This model is a further-trained version of pip-sql-1.3b.
|
|
|
## How did we build it?
|
|
|
We used softmax cross-entropy and a modified form of policy gradient together with a Q loss, optimized in an EM (expectation-maximization) setup.

(Figure: loss behaviour during training in the setup described above.)
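
The exact formulation and EM schedule are not spelled out here; the sketch below is purely illustrative (all names and weights are ours) of how a cross-entropy term, a policy-gradient term, and a Q loss can be combined into a single objective.

```python
import torch.nn.functional as F

def combined_loss(logits, target_ids, sampled_log_probs, rewards,
                  q_pred, q_target, pg_weight=1.0, q_weight=1.0):
    # Supervised term: softmax cross entropy over next-token predictions.
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))
    # Policy-gradient term: reward-weighted negative log-likelihood of sampled tokens.
    pg = -(sampled_log_probs * rewards).mean()
    # Q loss: regression of a learned value estimate toward its target.
    q = F.mse_loss(q_pred, q_target)
    return ce + pg_weight * pg + q_weight * q
```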
|
|
|
## License |
|
|
|
The model is open source under the Apache 2.0 license.
|
|
|
## Usage |
|
|
|
|
|
### Library use |
|
```python
!pip3 install git+https://github.com/PipableAI/pip-library-parser
!pip3 install atlassian-python-api

from pip_library_parser import CodeToDocGenerator
from atlassian import Jira

import torch
torch.set_default_device("cuda")  # requires a CUDA-capable GPU

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

# Generate docstrings for the module's functions and methods
module = Jira
module_name = "atlassian.Jira"

docs = generator.generate_module_docs(module, module_name)
print(docs)
```
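
To generate a docstring for a single code snippet instead of a whole installed module, pass the source string directly: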
|
|
|
```python
from pip_library_parser import CodeToDocGenerator

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

code_snippet = """
def example_function(x):
    return x * 2
"""

docstring = generator.generate_docstring_from_pip_model(code_snippet)
print("Generated Docstring:")
print(docstring)
```
|
|
|
### Installation |
|
|
|
```bash
pip install transformers torch
```
|
|
|
### Prompt |
|
```python
# `code` holds the source of the function you want documented
prompt = f"""<function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""
```
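
The model expects the source wrapped in `<function_code>` tags followed by a `<question>` tag, and it writes its answer inside a `<doc>...</doc>` block, so the generated description can be recovered by splitting the decoded output on those tags, as in the PyTorch example below.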
|
|
|
### PyTorch |
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-code-to-doc-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-code-to-doc-1.3b")

prompt = f"""
<function_code>
def example_function(x):
    return x * 2
</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)
```
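
If you are documenting several functions, the generation call can be wrapped in a small helper. This is only a convenience sketch built on the snippet above; the `generate_doc` name and signature are ours, and it assumes `model`, `tokenizer`, and `device` from the previous block are already defined.

```python
def generate_doc(code: str, max_new_tokens: int = 300) -> str:
    """Produce a one-line description of `code` with the model loaded above."""
    prompt = f"""<function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded.split('<doc>')[-1].split('</doc>')[0].strip()

print(generate_doc("def add(a, b):\n    return a + b"))
```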
|
|
|
|
|
|
|
## Examples |
|
|
|
### Prompt
|
```python
<function_code>
###########################
# Generate Analytical Model
###########################
##################################################
# func: get_np_array_transition_probability_matrix
##################################################
def get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix):
    print('np_array_A_matrix:')
    print(np_array_A_matrix)
    #####################################################
    # Perturb the adjacency matrix to avoid singularities
    #####################################################
    np_array_A_matrix += (np.full((int_num_states, int_num_states), float_eps) - (np.identity(int_num_states) * float_eps))
    print('np_array_A_matrix:')
    print(np_array_A_matrix)
    print('np_array_D_matrix:')
    np_array_D_matrix = np.diag(np.sum(np_array_A_matrix, axis=1))
    print(np_array_D_matrix)
    print('np_array_D_matrix_inv:')
    np_array_D_matrix_inv = np.linalg.inv(np_array_D_matrix)
    print(np_array_D_matrix_inv)
    print('\n\n')
    print('np_array_P_matrix:')
    np_array_P_matrix = np.dot(np_array_D_matrix_inv, np_array_A_matrix)
    print(np_array_P_matrix)
    print('np.sum(np_array_P_matrix, axis=1):')
    print(np.sum(np_array_P_matrix, axis=1))
    print('\n\n')
    return np_array_P_matrix
##################################################
# func: get_np_array_perron_frobenius_eigen_vector
##################################################
def get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix):
    np_array_perron_frobenius_matrix = np.linalg.matrix_power(np_array_P_matrix,1000)
    np_array_perron_frobenius_vector = np_array_perron_frobenius_matrix[0,:]
    print('np_array_perron_frobenius_matrix:')
    print(np_array_perron_frobenius_matrix)
    print('np.sum(np_array_perron_frobenius_matrix, axis=1):')
    print(np.sum(np_array_perron_frobenius_matrix, axis=1))
    print('np.sum(np_array_perron_frobenius_matrix, axis=0):')
    print(np.sum(np_array_perron_frobenius_matrix, axis=0))
    print('np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states:')
    print(np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states)
    print('np.dot(np_array_perron_frobenius_vector, np_array_P_matrix):')
    print(np.dot(np_array_perron_frobenius_vector, np_array_P_matrix))
    print('np_array_perron_frobenius_vector:')
    print(np_array_perron_frobenius_vector)
    print('\n\n')
    return np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix
#############################
# func: get_np_array_Z_matrix
#############################
def get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix):
    np_array_Z_matrix = np.linalg.inv(np.identity(int_num_states) - np_array_P_matrix + np_array_perron_frobenius_matrix)
    print('np_array_Z_matrix:')
    print(np_array_Z_matrix)
    print('\n\n')
    return(np_array_Z_matrix)
#############################
# func: get_np_array_H_matrix
#############################
def get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector):
    np_array_H_matrix = np.zeros([int_num_states, int_num_states])
    for i in range(int_num_states):
        for j in range(int_num_states):
            np_array_H_matrix[i][j] = (np_array_Z_matrix[j][j] - np_array_Z_matrix[i][j])/np_array_perron_frobenius_vector[j]
    print('np_array_H_matrix:')
    print(np_array_H_matrix)
    print('\n\n')
    return np_array_H_matrix
###########
# func: run
###########
def run(np_array_A_matrix):
    int_num_states = len(np_array_A_matrix)
    np_array_P_matrix = get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix)
    np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix = get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix)
    np_array_Z_matrix = get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix)
    np_array_H_matrix = get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector)
    return(np_array_H_matrix)
</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>
```
|
|
|
### Response |
|
```txt |
|
The given python code is a function that calculates the transition probability matrix, P, for a given adjacency matrix A, and then uses these matrices to calculate the Perron-Frobenius eigenvector and its inverse matrix Z, and finally, the H matrix which is the inverse of the Z matrix. The H matrix is then returned as the output of the function. The adjacency matrix A is a square matrix where each element at position (i, j) represents the probability of transitioning from state i to state j. The function first perturbs the adjacency matrix to avoid singularities, then calculates the transition probability matrix P, the Perron-Frobenius eigenvector and its inverse matrix Z, and finally, the H matrix. The H matrix is then returned as the output of the function. |
|
``` |
|
|
|
### Team |
|
Avi Kothari, Gyan Ranjan, Pratham Gupta, Ritvik Aryan Kalra, Soham Acharya |