Current File : //usr/lib/python3/dist-packages/chardet/__pycache__/charsetprober.cpython-312.pyc
�

�d,��l�ddlZddlZddlmZmZddlmZmZejd�Z	Gd�d�Z
y)�N)�Optional�Union�)�LanguageFilter�ProbingStates%[a-zA-Z]*[�-�]+[a-zA-Z]*[^a-zA-Z�-�]?c� �eZdZdZej
fdeddfd�Zdd�Zede	e
fd��Zede	e
fd��Zd	e
eefdefd
�Zedefd��Zdefd�Zed
e
eefdefd��Zed
e
eefdefd��Zed
e
eefdefd��Zy)�
CharSetProbergffffff�?�lang_filter�returnNc��tj|_d|_||_tjt�|_y)NT)	r�	DETECTING�_state�activer
�logging�	getLogger�__name__�logger)�selfr
s  �7/usr/lib/python3/dist-packages/chardet/charsetprober.py�__init__zCharSetProber.__init__,s0��"�,�,������&����'�'��1���c�.�tj|_y�N)rr
r�rs r�resetzCharSetProber.reset2s��"�,�,��rc��yr�rs r�charset_namezCharSetProber.charset_name5s��rc��t�r��NotImplementedErrorrs r�languagezCharSetProber.language9s��!�!r�byte_strc��t�rr )rr#s  r�feedzCharSetProber.feed=s��!�!rc��|jSr)rrs r�statezCharSetProber.state@s���{�{�rc��y)Ngrrs r�get_confidencezCharSetProber.get_confidenceDs��r�bufc�4�tjdd|�}|S)Ns([-])+� )�re�sub)r*s r�filter_high_byte_onlyz#CharSetProber.filter_high_byte_onlyGs���f�f�&��c�2���
rc���t�}tj|�}|D]C}|j|dd�|dd}|j	�s|dkrd}|j|��E|S)u7
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]
        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.
        This filter applies to all scripts which do not use English characters.
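
The word filter described in the docstring above can be sketched as follows. This is a minimal illustration, not the code stored in this bytecode file: the names `WORD_WITH_HIGH_BYTE` and `keep_international_words` are hypothetical, and the byte classes (alphabet `[a-zA-Z]`, international `[\x80-\xFF]`, marker for everything else) are taken from the docstring.

```python
import re

# Assumed byte classes, from the docstring:
#   alphabet      [a-zA-Z]
#   international [\x80-\xFF]
#   marker        everything else
# A "word" here is a run of letters containing at least one high byte,
# optionally followed by a single marker byte.
WORD_WITH_HIGH_BYTE = re.compile(
    rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


def keep_international_words(buf: bytes) -> bytearray:
    """Keep only words with at least one high byte (hypothetical helper)."""
    filtered = bytearray()
    for word in WORD_WITH_HIGH_BYTE.findall(buf):
        # Copy everything except the final byte, then decide what to do
        # with that final byte on its own.
        filtered.extend(word[:-1])
        last = word[-1:]
        # A trailing marker (ASCII and not a letter) becomes one space;
        # a trailing letter or high byte is kept unchanged.
        if not last.isalpha() and last < b"\x80":
            last = b" "
        filtered.extend(last)
    return filtered


if __name__ == "__main__":
    print(keep_international_words(b"hello w\xf6rld!"))
```

Words made only of ASCII letters never match the pattern, so they are dropped from the output entirely. That is the sense in which the filter keeps words with "at least one international character".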
        N�����r,)�	bytearray�INTERNATIONAL_WORDS_PATTERN�findall�extend�isalpha)r*�filtered�words�word�	last_chars     r�filter_international_wordsz(CharSetProber.filter_international_wordsLsv���;��
,�3�3�C�8���
	'�D��O�O�D��"�I�&��R�S�	�I��$�$�&�9�w�+>� �	��O�O�I�&�
	'��rc�*�t�}d}d}t|�jd�}t|�D]F\}}|dk(r|dz}d}�|dk(s�||kDr'|s%|j	|||�|j	d�d}�H|s|j	||d	�|S)
a[
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
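
The tag-stripping idea in the docstring above can be sketched as a single pass over the buffer. This is an illustrative sketch under stated assumptions, not the implementation stored in this bytecode file: `strip_angle_bracket_spans` is a hypothetical name, and the sketch only drops the bytes between `<` and `>`, without any further filtering of the remaining text.

```python
def strip_angle_bracket_spans(buf: bytes) -> bytearray:
    """Return a copy of buf without the spans enclosed in <...> (hypothetical helper)."""
    filtered = bytearray()
    in_tag = False
    start = 0  # index where the current run of non-tag bytes begins

    for i, byte in enumerate(buf):
        if byte == ord(">"):
            # A tag just closed; text resumes after this byte.
            start = i + 1
            in_tag = False
        elif byte == ord("<"):
            # A tag opens; flush any text collected before it and
            # separate it from what follows with a single space.
            if i > start and not in_tag:
                filtered.extend(buf[start:i])
                filtered.extend(b" ")
            in_tag = True

    # Keep trailing text only if the buffer does not end inside a tag.
    if not in_tag:
        filtered.extend(buf[start:])
    return filtered


if __name__ == "__main__":
    print(strip_angle_bracket_spans(b"<p>caf\xe9</p> ok"))
```

Text that follows an unclosed `<` is discarded, because the sketch cannot tell where that tag would have ended.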
        Fr�c�>r�<r,TN)r3�
memoryview�cast�	enumerater6)r*r8�in_tag�prev�curr�buf_chars      r�remove_xml_tagszCharSetProber.remove_xml_tagsns����;��������o�"�"�3�'��'��n�	�N�D�(��4���a�x�����T�!��$�;�v��O�O�C��T�N�3��O�O�D�)���	�$�
�O�O�C���J�'��r)rN)r�
__module__�__qualname__�SHORTCUT_THRESHOLDr�NONErr�propertyr�strrr"r�bytesr3rr%r'�floatr)�staticmethodr/r<rHrrrr	r	(s0����5C�5H�5H�2�N�2�T�2�-���h�s�m�����"�(�3�-�"��"�"�U�5�)�#3�4�"��"���|���������5��	�)9�#:��u�������e�Y�.>�(?��I����B�$�U�5�)�#3�4�$��$��$rr	)rr-�typingrr�enumsrr�compiler4r	rrr�<module>rUs3��:�	�"�/�(�b�j�j�8���
k�kr