language:
  - en
license: mit
library_name: transformers
tags:
  - LCARS
  - Star-Trek
  - 128k-Context
  - mistral
  - chemistry
  - biology
  - finance
  - legal
  - art
  - code
  - medical
  - text-generation-inference
pipeline_tag: text-generation
model-index:
  - name: LCARS_AI_StarTrek_Computer
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 35.83
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 21.78
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 4.08
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 2.35
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 7.44
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 16.2
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/LCARS_AI_StarTrek_Computer
          name: Open LLM Leaderboard

If anybody has Star Trek data, please send it: this starship computer database archive needs it!

With more data I can properly theme this model for its role as a starship computer. Data I am collecting or seeking:

- Space data from NASA.
- MUFON case files (already collected; I am still framing the right prompts for recall as well as interrogation).
- Biblical and historical data from sacred texts.
- Generated discussions in which philosophers debate ancient history and how to solve the problems they encountered in their own lifetimes, using historical and factual data. Each model is first given a generated biography and character role to play, and the characters should be amazed by each other's achievements depending on their periods; we need multiple roles and characters for these discussions.
- As many historical facts and histories as possible, to enhance this model's ability to discern whether "ancient aliens" claims are true or false. For this we need astrological, astronomical, seismological, and ecological data for the known periods of history, as well as the unfounded suppositions found in YouTube subtitles: another useful source of themed data!

This model is a collection of models merged via various merge methods, reclaiming previous models that would otherwise be orphaned by their parent models. As a model of models, it may not remember some tasks, or it may in fact remember them all and perform highly! There were some very bad NSFW merges, from role play to erotica, as well as various characters and roles downloaded into the model, so those models were merged into other models that had been specifically trained for maths, medical data, coding operations, or even translation.
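Merges like those described above can use several methods (linear averaging, SLERP, TIES, and others). As an illustration only, assuming nothing about the actual merge recipe used for this model, the simplest case, linear weight averaging of two checkpoints, can be sketched as:

```python
# Minimal sketch of a linear "model soup" merge: average two checkpoints'
# weights parameter-by-parameter. Plain Python lists stand in for the
# real tensors; real merges operate on full state dicts.

def linear_merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Return alpha * A + (1 - alpha) * B for every shared parameter."""
    return {
        name: [alpha * x + (1 - alpha) * y
               for x, y in zip(state_a[name], state_b[name])]
        for name in state_a
    }

# Two toy "checkpoints" with one parameter each:
math_model = {"w": [1.0, 3.0]}
chat_model = {"w": [3.0, 1.0]}
print(linear_merge(math_model, chat_model))  # {'w': [2.0, 2.0]}
```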

The models were heavily DPO-trained, and various newer methodologies were installed; the Deep Mind series is a special series that includes self-correction recall, visuo-spatial reasoning, and step-by-step thinking.
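DPO (Direct Preference Optimization) tunes a policy directly on chosen/rejected pairs. As a hedged sketch of the loss involved (not this model's actual training code, which would typically go through a library such as TRL), for a single preference pair:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given sequence log-probabilities
    under the policy (pi_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(sigmoid(margin))

# When policy and reference agree exactly, the loss is log(2) ~ 0.693;
# it shrinks as the policy prefers the chosen answer more strongly.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
print(dpo_loss(2.0, 0.0, 0.0, 0.0))
```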

So the multi-merge often fixes these errors between models, as well as training gaps. Hopefully they all took and merged well, performing even unknown and unprogrammed tasks!
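A minimal usage sketch with transformers, assuming the standard Mistral `[INST]` instruct template (the generation call is commented out because it downloads the full weights):

```python
# Hypothetical usage sketch: build a Mistral-style instruct prompt, then
# generate with transformers. The template helper is an assumption based
# on the model's Mistral lineage, not a documented chat format.

def format_instruct(user_message: str, system: str = "") -> str:
    """Wrap a user message in Mistral's [INST] ... [/INST] template."""
    sys_part = f"{system}\n\n" if system else ""
    return f"<s>[INST] {sys_part}{user_message} [/INST]"

prompt = format_instruct(
    "Computer, report the status of the warp core.",
    system="You are the LCARS computer of a Federation starship.",
)
print(prompt)

# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("LeroyDyer/LCARS_AI_StarTrek_Computer")
# model = AutoModelForCausalLM.from_pretrained("LeroyDyer/LCARS_AI_StarTrek_Computer")
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=128)
# print(tok.decode(out[0], skip_special_tokens=True))
```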

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 14.61 |
| IFEval (0-Shot)     | 35.83 |
| BBH (3-Shot)        | 21.78 |
| MATH Lvl 5 (4-Shot) |  4.08 |
| GPQA (0-shot)       |  2.35 |
| MuSR (0-shot)       |  7.44 |
| MMLU-PRO (5-shot)   | 16.20 |