Issue loading the LLM locally #9
astroyouth started this conversation in General
Replies: 1 comment
-
I've just realised that this is better placed in the Issues section.
-
In code snippet 42, specifically the following:
# 4. Instantiate the model
llm_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
                                                 torch_dtype=torch.float16, # datatype to use, we want float16
                                                 quantization_config=quantization_config if use_quantization_config else None,
                                                 low_cpu_mem_usage=False, # use full memory
                                                 attn_implementation=attn_implementation) # which attention version to use
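For context, this is roughly how the variables that snippet uses are set up earlier in my notebook. Treat it as a sketch of my setup rather than the verbatim notebook code; the exact model_id and the 4-bit BitsAndBytesConfig values are assumptions on my part:

import torch
from transformers import BitsAndBytesConfig
from transformers.utils import is_flash_attn_2_available

model_id = "google/gemma-7b-it"   # assumed; substitute whichever model id is being loaded
use_quantization_config = True    # load the model in 4-bit so it fits in GPU memory

# 4-bit quantization config (sketch): the compute dtype is set here
quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                         bnb_4bit_compute_dtype=torch.float16)

# pick an attention implementation based on the GPU's capability
if is_flash_attn_2_available() and (torch.cuda.get_device_capability(0)[0] >= 8):
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = "sdpa"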
I get the following error, which I cannot seem to resolve.
The only suggestion I can find is to remove torch_dtype=torch.float16, but that doesn't help.
This is my error readout:
ValueError Traceback (most recent call last)
Cell In[3], line 30
27 tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_id)
29 # 4. Instantiate the model
---> 30 llm_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
31 torch_dtype=torch.float16, # datatype to use, we want float16
32 quantization_config=quantization_config if use_quantization_config else None,
33 low_cpu_mem_usage=False, # use full memory
34 attn_implementation=attn_implementation) # which attention version to use
36 if not use_quantization_config: # quantization takes care of device setting automatically, so if it's not used, send model to GPU
37 llm_model.to("cuda")
File ~/Coding-stuff/simple-local-rag/venv/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py:561, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
559 elif type(config) in cls._model_mapping.keys():
560 model_class = _get_model_class(config, cls._model_mapping)
--> 561 return model_class.from_pretrained(
562 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
563 )
564 raise ValueError(
565     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
566     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
567 )
File ~/Coding-stuff/simple-local-rag/venv/lib/python3.12/site-packages/transformers/modeling_utils.py:3558, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
...
2539 # For GPTQ models, we prevent users from casting the model to another dytpe to restrict unwanted behaviours.
2540 # the correct API should be to load the model with the desired dtype directly through from_pretrained.
2541 dtype_present_in_args = False

ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
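For reference, my reading of the error is that a 4-bit or 8-bit bitsandbytes model must not be dtype-cast or moved with .to() after loading, because the weights are already placed and cast while from_pretrained runs. Below is a sketch of the call pattern the message seems to point toward, with the same (assumed) setup values as in the sketch above; whether this alone resolves the traceback, I'm not sure:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-7b-it"   # assumed model id, as above
attn_implementation = "sdpa"      # or "flash_attention_2", as above
use_quantization_config = True
quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                         bnb_4bit_compute_dtype=torch.float16)

if use_quantization_config:
    # Quantized load: leave out torch_dtype and do NOT call .to("cuda") afterwards;
    # the compute dtype is already set via bnb_4bit_compute_dtype in the config and
    # device placement is handled during loading.
    llm_model = AutoModelForCausalLM.from_pretrained(model_id,
                                                     quantization_config=quantization_config,
                                                     low_cpu_mem_usage=True,
                                                     attn_implementation=attn_implementation)
else:
    # Unquantized load: set the dtype here and move the model to the GPU manually.
    llm_model = AutoModelForCausalLM.from_pretrained(model_id,
                                                     torch_dtype=torch.float16,
                                                     low_cpu_mem_usage=False,
                                                     attn_implementation=attn_implementation)
    llm_model.to("cuda")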