
[Coding Traces] Finding Patterns with Python Regular Expressions

잡식냥이 2025. 4. 13. 12:09



👾 What is a regular expression?

A regular expression is a special string used to find, match, or replace specific patterns within text. In Python, regular expression support is provided by the re module.

import re  # import the regular expression module

# Define a list of regular expression patterns
patterns = [
    r'pattern1',
    r'pattern2',
    r'pattern3'
]

👾 What a Python regex pattern list is used for

In Python, a list of regular expression patterns is a tool for searching and manipulating text with several patterns at once. It is typically used during data preprocessing: you put your regex rules in the list and apply them to rewrite titles or to remove or add phrases. Later work is only meaningful when the data is of good quality, so I think this is an important step. Two concrete use cases are shown in the numbered sections that follow.
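To make the preprocessing idea a bit more concrete, here is a minimal sketch of cleaning up titles with a list of rules. The rule list, the `clean_title` helper, and the sample titles are all made up for illustration; real data would need its own rules.

```python
import re

# Hypothetical cleanup rules: (pattern, replacement) pairs applied in order.
CLEANUP_RULES = [
    (r'\[[^\]]*\]', ''),  # drop bracketed tags such as "[AD]"
    (r'\s+', ' '),        # collapse runs of whitespace into a single space
]

def clean_title(title: str) -> str:
    """Apply every cleanup rule in order, then trim the ends."""
    for pattern, replacement in CLEANUP_RULES:
        title = re.sub(pattern, replacement, title)
    return title.strip()

# Made-up sample titles
titles = ["[AD]  Python   regex tips", "Data cleaning [sponsored] basics"]
cleaned = [clean_title(t) for t in titles]
print(cleaned)
```

Keeping the rules in one list means a new rule is just one more entry, and the order of the entries is the order in which they are applied.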

 

1. Finding text that matches specific patterns

import re

patterns = [
    r'\b\w+ing\b',  # words ending in 'ing'
    r'\b\d{3}-\d{4}\b',  # numbers in the form 123-4567
    r'\b[A-Z][a-z]+'  # words starting with a capital letter
]

text = "Running fast with phone 555-4321. John is calling."

# Search the text with every pattern
for pattern in patterns:
    matches = re.findall(pattern, text)
    print(f"Matches for pattern '{pattern}': {matches}")

 

2. Replacing email addresses all at once when they change

import re

# Pattern list that finds HTML tags and email addresses
patterns = [
    r'<[^>]+>',  # HTML tags
    r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'  # email addresses
]

html_text = """
<div>Contact: <a href="mailto:john@example.com">John</a></div>
<p>Or email jane.doe@company.co.kr for inquiries.</p>
"""

# Find the matches for each pattern
for i, pattern in enumerate(patterns):
    matches = re.findall(pattern, html_text)
    print(f"Pattern {i+1} results: {matches}")

# Replace every match with a placeholder
for pattern in patterns:
    html_text = re.sub(pattern, "[REDACTED]", html_text)

print("\nText after processing:")
print(html_text)
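The code above masks every match with a placeholder. If the goal is to swap an old address for a new one, a capture group can keep the part before the @ and replace only the domain. This is a rough sketch: `OLD_DOMAIN`, `NEW_DOMAIN`, and the sample text are hypothetical values chosen for illustration.

```python
import re

# Hypothetical domains, chosen only for illustration.
OLD_DOMAIN = "oldcorp.com"
NEW_DOMAIN = "newcorp.com"

# Capture the local part; re.escape keeps the dot in the domain literal.
old_address = re.compile(
    r'([a-zA-Z0-9._%+-]+)@' + re.escape(OLD_DOMAIN) + r'\b'
)

text = "Contact john@oldcorp.com or jane.doe@oldcorp.com; bob@other.org is unchanged."

# \1 re-inserts the captured local part in front of the new domain.
updated = old_address.sub(r'\1@' + NEW_DOMAIN, text)
print(updated)
```

Note that the trailing `\b` is a simplification: an address such as `john@oldcorp.com.au` would also be rewritten, so a stricter pattern may be needed for real data.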

👾 Key metacharacters

Here is a summary of the main metacharacters and special sequences that you are most likely to use!

  • . - matches any single character except a newline
  • ^ - start of the string (or of each line with re.MULTILINE)
  • $ - end of the string (or of each line with re.MULTILINE)
  • * - zero or more repetitions
  • + - one or more repetitions
  • ? - zero or one repetition
  • {n} - exactly n repetitions
  • {n,} - n or more repetitions
  • {n,m} - at least n and at most m repetitions
  • \ - escapes a special character
  • [] - character class (matches any one of the characters inside the brackets)
  • | - alternation (OR)
  • () - grouping
  • \d - matches a digit (= [0-9] with re.ASCII; by default it also matches other Unicode digits in str patterns)
  • \D - matches a non-digit character (= [^0-9] with re.ASCII)
  • \s - matches a whitespace character (space, tab, newline, etc.)
  • \S - matches a non-whitespace character
  • \w - matches a letter, digit, or underscore (= [a-zA-Z0-9_] with re.ASCII; by default it also matches other Unicode word characters)
  • \W - matches a character that is not a letter, digit, or underscore
  • \b - word boundary
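A few of the entries above are easier to see in action. The snippet below is a small sketch with made-up sample strings; the variable names are only for illustration.

```python
import re

# {n,m}: runs of 2 to 4 digits
digit_runs = re.findall(r'\d{2,4}', "a1 b22 c333 d55555")

# ?: the 'u' is optional
spellings = re.findall(r'colou?r', "color colour colr")

# |: either alternative
animals = re.findall(r'cat|dog', "cat, dog, bird")

# (): with groups, findall returns tuples of the captured parts
parts = re.findall(r'(\w+)@(\w+)', "id@host")

# ^: the match must begin at the start of the string
starts_with_hello = bool(re.search(r'^Hello', "Hello world"))

# \b: only the standalone word matches, not 'concat' or 'cats'
whole_words = re.findall(r'\bcat\b', "cat concat cats")

print(digit_runs, spellings, animals, parts, starts_with_hello, whole_words)
```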

As shown here, I think a list of regular expression patterns in Python is a good way to carry out complex text processing systematically and in a single pass!

 
