Notice

Recent Posts

Recent Comments

Link

« 2025/06 »
일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Tags more

Archives

Today

Total

관리 메뉴

tmxklab

heap(5) - tcache 정리 본문

Security/01 System Hacking

heap(5) - tcache 정리

tmxk4221 2020. 8. 24. 21:19

1. tcache란?

tcache(Thread local Caching)란 멀티 스레드 환경에서 메모리 할당속도를 높이기 위해 glibc 2.26버젼 이상부터 생겨난 기술이다. 이전에 멀티 스레드 환경에서 Arena라는 개념을 도입하여 각 스레드당 힙 메모리를 제공하여 서로 간섭없이 힙 작업을 빠르게 수행할 수 있도록 도와준다고 하였다. tcache라는 개념 또한 멀티 스레드 환경에서 힙 작업을 빠르게 수행할 수 있도록 도와주는 기술이다.

tcache는 작은 단위의 메모리 할당이 필요할 경우 arena를 참조하지 않고 바로 메모리를 할당할 수 있도록 각 스레드 당 thread cache라는 영역을 제공함으로써 메모리 할당 속도를 높일 수 있는 기술이다.

이제부터 tcache에 대해 자세히 설명하도록 하겠다. (glibc 2.26기준)

2. tcache 특징 및 구조

2.1 특징

/* We want 64 entries.  This is an arbitrary limit, which tunables can reduce.  */
# define TCACHE_MAX_BINS		64

각 스레드별로 64개의 single linked list구조의 tcache bin 존재

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7

각 bin에는 7개의 동일한 크기의 청크 존재
- 32bit system : 12 ~ 516byte 범위의 청크 사이즈
- 64bit system : 24 ~ 1032byte 범위의 청크 사이즈

2.2 구조

1) tcache_entry

/* We overlay this structure on the user-data portion of a chunk when
   the chunk is stored in the per-thread cache.  */
typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

동일한 크기의 free chunk를 관리하는 구조체
next포인터를 통해 다음 동일한 크기의 free chunk를 연결

2) tcache_perthread_struct

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

전체적인 tcache list를 관리하는 구조체
counts[TCACHE_MAX_BINS] : entries배열의 single linked list로 연결되어 있는 청크의 개수 기록(single list당 최대 7개의 청크 제한)
entries[TCACHE_MAX_BINS] : fastbin과 같이 single linked list의 구조로 동일한 크기의 free chunk들로 연결되어 있음

3) 전체적인 구조

4) debugging

0x10, 0x20, 0x30 크기의 청크를 각각 2번 malloc한 이후 free한 이후이다.
fastbin과 동일하게 LIFO방식의 single linked list구조로 각자 크기에 맞게 들어가 있으며 count되어 있는 것을 볼 수 있다.
참고로 tcache는 다른 bin들과 달리 main_arena에 존재하지 않고 tcache_perthread_struct에 존재한다.

tcache_perthread_struct에 tcache→counts[tc_idx]와 tcache→entries[rc_idx]가 존재하는 것을 확인할 수 있다.

청크의 fd부분에 tcache_entry→next가 오며 single liked list구조를 가진다.

3. tcache 분석

3.1 tcache 관련 주요 함수

tcache_init : tcache_perthread_struct를 생성
tcache_get : tcache리스트에 청크를 가져옴(제거)
tcache_put : tcache리스트에 새로운 청크 추가

1) tcache_init

static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  if (tcache_shutting_down)
    return;

  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);
  if (!victim && ar_ptr != NULL)
    {
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }


  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }

}

tcache_shutting_down이 참이면 tcache가 종료되었다는 뜻으로 함수를 종료
arena_get함수를 통해 ar_ptr에 arena의 주소를 받아오고 _int_malloc함수 호출을 통해 실질적인 메모리 할당 요청을 하여 victicm에 넣어줌
위 과정에서 비정상적으로 malloc이 할당되어 victim에 널 값이 들어가고 ar_ptr은 잘 받아져왔다면 다시 arena_get_retry를 통해 arena를 가져오고 _int_malloc함수를 호출한다.
ar_ptr이 널 값이 아니면 즉, arena의 주소를 잘 받아왔다면 __libc_lock_unlock을 통해 lock을 해제한다.
정상적으로 malloc이 할당되어 victim에 값이 존재하면 tcache_perthread_struct구조체에 맞게 캐스팅하여 tcache에 저장하고 메모리를 0으로 세팅한다.

2) tcache_get

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

tcache→entries[tc_idx]의 값을 e에 저장한다.
assert함수를 2번 호출하여 tc_idx가 TCACHE_MAX_BINS보다 큰지 tcache→entries[tc_idx]값이 0보다 작은지 검증한다. 참이면 프로그램은 계속 실행되고 거짓이면 프로그램을 중단한다.
tcache→entries[tc_idx]에 다음 청크의 주소 값을 넣고 tcache→counts를 감소한다. 즉, a → b → c .. 인 상황에서 a에서 b로 가는 link가 사라진다. 이를 통해 나중에 들어온 free 청크가 먼저 사용됨을 통해 LIFO방식임을 알 수 있다.
링크에서 제거된 청크의 주소인 e를 반환 값으로 준다.

3) tcache_put

/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

chunk2mem함수((void*)((char*)(p) + 2*SIZE_SZ))를 통해 청크의 실제 주소를 가져오고 16을 더한다. 즉, 청크의 실제 주소에 16을 더하면 prev_size와 size 다음 부분인 fd부분이다. 해당 주소 값을 가져와서 tcache_entry구조체에 맞게 캐스팅하여 e에 저장
tc_idx가 TCACHE_MAX_BINS를 넘는지 검증한다.
tcache_entry의 구조체인 e의 next는 청크의 fd부분을 의미하며 e→next에 원래 존재하는 tcache→entries[tc_idx]의 값을 넣어 linked list의 구조를 만든다.
마지막으로 tcache→counts[tc_idx]값을 증가해준다.

3.2 Malloc 과정

malloc함수를 호출하면 내부적으로 __libc_malloc함수를 호출한다.

여기서 tcache가 추가되면서 코드가 조금 수정되었다.

1) __libc_malloc

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes = request2size (bytes);
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  DIAG_POP_NEEDS_COMMENT;
#endif

  arena_get (ar_ptr, bytes);

  victim = _int_malloc (ar_ptr, bytes);
  /* Retry with another arena only if we were able to find a usable arena
     before.  */
  if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }

  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}

먼저, hook이 등록되어 있으면 호출한다. (glibc 2.23과 동일)
여기서 #if USE_TCACHE ~ #endif 부분에 tcache관련 코드가 추가되었고 다른 부분은 기존의 __libc_malloc코드와 동일하다.
_int_malloc을 호출하기전에 MAYBE_INIT_TCACHE매크로 함수를 호출한다.

#define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();

tcache가 널이면 tcache_init을 통해 tcache_perthread_struct를 생성
MAYBE_INIT_TCACHE이후에는 if문을 통해 tc_idx가 mp_.tcache_bins(TCACHE_MAX_BINS)보다 작은지 tcache_init을 통해 tcache가 할당되었는지 tcache의 엔트리가 널 값이 아닌지 검사를 하고 tcache_get함수를 호출한다.
tcache에서 할당해줄 청크가 존재하지 않으면 arena의 주소를 얻어와 _int_malloc을 호출하여 메모리를 할당해준다.

2) _int_malloc

기존의 _int_malloc의 루틴은 동일하고 tcache가 추가된 부분만 보겠다.

2-1) fastbin 루틴

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
	{
      idx = fastbin_index (nb);
      mfastbinptr *fb = &fastbin (av, idx);
      mchunkptr pp = *fb;
      REMOVE_FB (fb, victim, pp);
      if (victim != 0)
        {
          if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
            {
              errstr = "malloc(): memory corruption (fast)";
            errout:
              malloc_printerr (check_action, errstr, chunk2mem (victim), av);
              return NULL;
            }
          check_remalloced_chunk (av, victim, nb);
#if USE_TCACHE
	  /* While we're here, if we see other chunks of the same size,
	     stash them in the tcache.  */
	  size_t tc_idx = csize2tidx (nb);
	  if (tcache && tc_idx < mp_.tcache_bins)
	    {
	      mchunkptr tc_victim;

	      /* While bin not empty and tcache not full, copy chunks over.  */
	      while (tcache->counts[tc_idx] < mp_.tcache_count
		     && (pp = *fb) != NULL)
			{
			  REMOVE_FB (fb, tc_victim, pp);
			  if (tc_victim != 0)
			    {
			      tcache_put (tc_victim, tc_idx);
	        }
				}
	    }
#endif
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
		}
	}

요청한 메모리 할당 크기를 fastbin에서 찾아서 fastbin에서 할당하고 fastbin에서 재할당하는데 사용된 청크를 unlink를 진행한 이후 #if USE_TCACHE ~ #endif부분에서 tcache 로직이 추가되었다.
요청한 청크의 사이즈를 tcache에 맞는 인덱스로 변환이후에 tcache가 존재하고 tc_idx가 TCACHE_MAX_BINS범위를 넘어서지 않으면 if문 안의 로직이 실행된다.
if문 안의 while문에서 tcache→counts[tc_idx]의 값이 TCACHE_FILL_COUNT값 보다 작고 pp가 널이 아닐 때까지 REMOVE_FB매크로 함수와 tcache_put함수가 실행된다.
REMOVE_FB매크로 함수를 통해 fastbin에 있는 청크를 unlink를 하게되고 tcache_put함수를 통해 unlink하면서 빠져 나가게 된 tc_victim을 tcache에 넣는다.
정리하면 tcache의 엔트리가 TCACHE_FILL_COUNT(7)범위를 넘어서지 않을 때까지 fastbin에 있는 청크들을 tcache로 넣는 과정이다.

2-2) smallbin 루틴

if (in_smallbin_range (nb))
    {
      idx = smallbin_index (nb);
      bin = bin_at (av, idx);

      if ((victim = last (bin)) != bin)
        {
          if (victim == 0) /* initialization check */
            malloc_consolidate (av);
          else
            {
              bck = victim->bk;
				if (__glibc_unlikely (bck->fd != victim))
                {
                  errstr = "malloc(): smallbin double linked list corrupted";
                  goto errout;
                }
              set_inuse_bit_at_offset (victim, nb);
              bin->bk = bck;
              bck->fd = bin;

              if (av != &main_arena)
								set_non_main_arena (victim);
              check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
	  /* While we're here, if we see other chunks of the same size,
	     stash them in the tcache.  */
	  size_t tc_idx = csize2tidx (nb);
	  if (tcache && tc_idx < mp_.tcache_bins)
		{
	      mchunkptr tc_victim;

	      /* While bin not empty and tcache not full, copy chunks over.  */
	      while (tcache->counts[tc_idx] < mp_.tcache_count
		     && (tc_victim = last (bin)) != bin)
				{
				  if (tc_victim != 0)
					{
				      bck = tc_victim->bk;
				      set_inuse_bit_at_offset (tc_victim, nb);
				      if (av != &main_arena)
								set_non_main_arena (tc_victim);
				      bin->bk = bck;
				      bck->fd = bin;

				      tcache_put (tc_victim, tc_idx);
					}
				}
	    }
#endif
				void *p = chunk2mem (victim);
				alloc_perturb (p, bytes);
        return p;
			}
		}
	}

fastbin루틴과 동일한 로직으로 smallbin에서 재할당을 해주고 while루프를 돌면서 남은 청크들을 tcache에 넣어준다.

2-3) unsortedbin루틴

...
					/* remove from unsorted list */
          unsorted_chunks (av)->bk = bck;
          bck->fd = unsorted_chunks (av);

          /* Take now instead of binning if exact fit */

          if (size == nb)
            {
              set_inuse_bit_at_offset (victim, size);
              if (av != &main_arena)
								set_non_main_arena (victim);
#if USE_TCACHE
	      /* Fill cache first, return to user only if cache fills.
		 We may return one of these chunks later.  */
				      if (tcache_nb
							  && tcache->counts[tc_idx] < mp_.tcache_count)
								{
								  tcache_put (victim, tc_idx);
								  return_cached = 1;
								  continue;
								}
				      else
								{
#endif

...

위와 동일한 로직으로 unsortedbin에서 청크를 재할당해주고 unlink를 진행한 이후에 unsorted bin에 있는 청크들을 tcache에 넣어준다.
이 때 위에 smallbin과 fastbin과 다르게 return_cached변수에 1로 세팅한다.
return_cached가 사용되는 로직은 2군데 있다.

#if USE_TCACHE
      /* If we've processed as many chunks as we're allowed while
	 filling the cache, return one of the cached ones.  */
      ++tcache_unsorted_count;
      if (return_cached
				  && mp_.tcache_unsorted_limit > 0
				  && tcache_unsorted_count > mp_.tcache_unsorted_limit)
				{
				  return tcache_get (tc_idx);
				}
#endif

#define MAX_ITERS       10000
          if (++iters >= MAX_ITERS)
            break;
        }

#if USE_TCACHE
      /* If all the small chunks we found ended up cached, return one now.  */
      if (return_cached)
				{
				  return tcache_get (tc_idx);
				}
#endif

return_cached가 참이고 tcache_unsorted_count가 tcache_unsorted_limit보다 크면 tcache_get을 호출하고 return한다. 즉, unsorted bin에서 tcache로 이동한 경우 tcache list의 topchunk를 가져온다.
이후에 각자 자신의 빈으로 들어가는 과정을 최대 10000번 반복 이후에 다시 한 번 return_cached가 참이면 tcache_get을 호출한다.

3.3 Free 과정

free함수를 호출하면 내부적으로 __libc_free함수를 호출한다.

3-1) __libc_free

void
__libc_free (void *mem)
{
  mstate ar_ptr;
  mchunkptr p;                          /* chunk corresponding to mem */

  void (*hook) (void *, const void *)
    = atomic_forced_read (__free_hook);
  if (__builtin_expect (hook != NULL, 0))
    {
      (*hook)(mem, RETURN_ADDRESS (0));
      return;
    }

  if (mem == 0)                              /* free(0) has no effect */
    return;

  p = mem2chunk (mem);

  if (chunk_is_mmapped (p))                       /* release mmapped memory. */
    {
      /* See if the dynamic brk/mmap threshold needs adjusting.
	 Dumped fake mmapped chunks do not affect the threshold.  */
      if (!mp_.no_dyn_threshold
          && chunksize_nomask (p) > mp_.mmap_threshold
          && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX
				  && !DUMPED_MAIN_ARENA_CHUNK (p))
        {
          mp_.mmap_threshold = chunksize (p);
          mp_.trim_threshold = 2 * mp_.mmap_threshold;
          LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
                      mp_.mmap_threshold, mp_.trim_threshold);
        }
      munmap_chunk (p);
      return;
    }

  MAYBE_INIT_TCACHE ();

  ar_ptr = arena_for_chunk (p);
  _int_free (ar_ptr, p, 0);
}

hook이 등록되어 있으면 호출
3번째 if문에서 mmap으로 할당된 청크이면 if문안의 로직이 실행되고 아니면 MAYBE_INIT_TCACHE함수를 호출한다.
마지막으로 _int_free를 호출한다.

3-2) _int_free

...
#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache && tc_idx < mp_.tcache_bins && tcache->counts[tc_idx] < mp_.tcache_count)
      {
	tcache_put (p, tc_idx);
	return;
      }
  }
#endif
...

다른 부분은 glibc 2.23과 동일하고 tcache코드가 추가된 부분만 보겠다.
tcache가 존재하고 tc_idx가 TCACHE_MAX_BINS범위 안에 있으며 tcache→counts 또한 TCACHE_FILL_COUNT(7)범위 안에 있으며 tcache_put을 호출하여 tcache에 free 청크를 넣는다.

정리

fastbin과 거의 비슷한 구조를 가지고 있다.(LIFO, single linked list)
fastbin에서 fd는 다음 청크의 prev_size를 가리키지만 tcache에서 next는 다음 청크의 next부분을 가리킨다.
다른 bin들과는 다르게 main_arena에서 관리되지 않고tcache_perthread_struct가 관리한다.
tcache에서 bin의 개수는 일반적으로 64개이며 각각 서로 다른 크기의 chunk들을 bin에 저장하는데 최대 7개의 청크까지 저장할 수 있다.
tcache에 들어갈 수 있는 크기의 청크가 free되면 tcache에 먼저 저장된다.(최대 7개까지 이후에는 각자 자신의 크기에 맞는 bin에 들어감)
fastbin, unsorted bin, small bin에 저장되어 있는 free 청크를 재할당하는데 사용되면 나머지 free 청크들은 tcache로 옮겨진다.

4. 참고자료

저작자표시 (새창열림)

'Security > 01 System Hacking' 카테고리의 다른 글

FSOP를 위한 fclose()분석 (0)	2020.09.15
stdout flag를 이용한 libc leak (6)	2020.08.26
Linux system call table 정리(32bit, 64bit) (0)	2020.08.20
[heap exploit] - Unsorted bin Attack (0)	2020.08.04
[heap exploit] - Unsafe Unlink (4)	2020.07.31

'Security/01 System Hacking' Related Articles

Comments