wl.c
/*
 * Copyright (c) International Business Machines Corp., 2006
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
 * the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Authors: Artem Bityutskiy (Битюцкий Артём), Thomas Gleixner
 */

/*
 * UBI wear-leveling sub-system.
 *
 * This sub-system is responsible for wear-leveling. It works in terms of
 * physical eraseblocks and erase counters and knows nothing about logical
 * eraseblocks, volumes, etc. From this sub-system's perspective all physical
 * eraseblocks are of two types - used and free. Used physical eraseblocks are
 * those that were "get" by the 'ubi_wl_get_peb()' function, and free physical
 * eraseblocks are those that were put by the 'ubi_wl_put_peb()' function.
 *
 * Physical eraseblocks returned by 'ubi_wl_get_peb()' have only erase counter
 * header. The rest of the physical eraseblock contains only %0xFF bytes.
 *
 * When physical eraseblocks are returned to the WL sub-system by means of the
 * 'ubi_wl_put_peb()' function, they are scheduled for erasure. The erasure is
 * done asynchronously in context of the per-UBI device background thread,
 * which is also managed by the WL sub-system.
 *
 * The wear-leveling is ensured by means of moving the contents of used
 * physical eraseblocks with low erase counter to free physical eraseblocks
 * with high erase counter.
 *
 * The 'ubi_wl_get_peb()' function accepts data type hints which help to pick
 * an "optimal" physical eraseblock. For example, when it is known that the
 * physical eraseblock will be "put" soon because it contains short-term data,
 * the WL sub-system may pick a free physical eraseblock with low erase
 * counter, and so forth.
 *
 * If the WL sub-system fails to erase a physical eraseblock, it marks it as
 * bad.
 *
 * This sub-system is also responsible for scrubbing. If a bit-flip is detected
 * in a physical eraseblock, it has to be moved. Technically this is the same
 * as moving it for wear-leveling reasons.
 *
 * As mentioned above, for the UBI sub-system all physical eraseblocks are
 * either "free" or "used". Free eraseblocks are kept in the @wl->free RB-tree,
 * while used eraseblocks are kept in a set of different RB-trees: @wl->used,
 * @wl->prot.pnum, @wl->prot.aec, and @wl->scrub.
 *
 * Note, in this implementation, we keep a small in-RAM object for each physical
 * eraseblock. This is surely not a scalable solution. But it appears to be good
 * enough for moderately large flashes and it is simple. In future, one may
 * re-work this sub-system and make it more scalable.
 *
 * At the moment this sub-system does not utilize the sequence number, which
 * was introduced relatively recently. But it would be wise to do this because
 * the sequence number of a logical eraseblock characterizes how old it is. For
 * example, when we move a PEB with low erase counter, and we need to pick the
 * target PEB, we pick a PEB with the highest EC if our PEB is "old" and we
 * pick target PEB with an average EC if our PEB is not very "old". This leaves
 * room for future re-works of the WL sub-system.
 *
 * Note: the stuff with protection trees looks too complex and is difficult to
 * understand. Should be fixed.
 */

#include <linux/slab.h>
#include <linux/crc32.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
#include "ubi.h"

/* Number of physical eraseblocks reserved for wear-leveling purposes */
#define WL_RESERVED_PEBS 1

/*
 * For how many erase cycles short term, unknown, and long term physical
 * eraseblocks are protected.
 */
#define ST_PROTECTION 16
#define U_PROTECTION  10
#define LT_PROTECTION 4

/*
 * Maximum difference between two erase counters. If this threshold is
 * exceeded, the WL sub-system starts moving data from used physical
 * eraseblocks with low erase counter to free physical eraseblocks with high
 * erase counter.
 */
#define UBI_WL_THRESHOLD CONFIG_MTD_UBI_WL_THRESHOLD

/*
 * When a physical eraseblock is moved, the WL sub-system has to pick the target
 * physical eraseblock to move to. The simplest way would be just to pick the
 * one with the highest erase counter. But in certain workloads this could lead
 * to an unlimited wear of one or a few physical eraseblocks. Indeed, imagine a
 * situation when the picked physical eraseblock is constantly erased after the
 * data is written to it. So, we have a constant which limits the highest erase
 * counter of the free physical eraseblock to pick. Namely, the WL sub-system
 * does not pick eraseblocks with erase counter greater than the lowest erase
 * counter plus %WL_FREE_MAX_DIFF.
 */
#define WL_FREE_MAX_DIFF (2*UBI_WL_THRESHOLD)

/*
 * Maximum number of consecutive background thread failures which is enough to
 * switch to read-only mode.
 */
#define WL_MAX_FAILURES 32

/**
 * struct ubi_wl_prot_entry - PEB protection entry.
 * @rb_pnum: link in the @wl->prot.pnum RB-tree
 * @rb_aec: link in the @wl->prot.aec RB-tree
 * @abs_ec: the absolute erase counter value when the protection ends
 * @e: the wear-leveling entry of the physical eraseblock under protection
 *
 * When the WL sub-system returns a physical eraseblock, the physical
 * eraseblock is protected from being moved for some "time".
 * For this reason,
 * the physical eraseblock is not directly moved from the @wl->free tree to the
 * @wl->used tree. There is one more tree in between where this physical
 * eraseblock is temporarily stored (@wl->prot).
 *
 * All this protection stuff is needed because:
 *  o we don't want to move physical eraseblocks just after we have given them
 *    to the user; instead, we first want to let users fill them up with data;
 *
 *  o there is a chance that the user will put the physical eraseblock very
 *    soon, so it makes sense not to move it for some time, but wait; this is
 *    especially important in case of "short term" physical eraseblocks.
 *
 * Physical eraseblocks stay protected only for limited time. But the "time" is
 * measured in erase cycles in this case. This is implemented with the help of
 * the absolute erase counter (@wl->abs_ec). When it reaches certain value, the
 * physical eraseblocks are moved from the protection trees (@wl->prot.*) to
 * the @wl->used tree.
 *
 * Protected physical eraseblocks are searched by physical eraseblock number
 * (when they are put) and by the absolute erase counter (to check if it is
 * time to move them to the @wl->used tree). So there are actually 2 RB-trees
 * storing the protected physical eraseblocks: @wl->prot.pnum and
 * @wl->prot.aec. They are referred to as the "protection" trees. The
 * first one is indexed by the physical eraseblock number. The second one is
 * indexed by the absolute erase counter. Both trees store
 * &struct ubi_wl_prot_entry objects.
 *
 * Each physical eraseblock has 2 main states: free and used. The former state
 * corresponds to the @wl->free tree. The latter state is split into several
 * sub-states:
 *  o the WL movement is allowed (@wl->used tree);
 *  o the WL movement is temporarily prohibited (@wl->prot.pnum and
 *    @wl->prot.aec trees);
 *  o scrubbing is needed (@wl->scrub tree).
 *
 * Depending on the sub-state, wear-leveling entries of the used physical
 * eraseblocks may be kept in one of those trees.
 */
struct ubi_wl_prot_entry {
	struct rb_node rb_pnum;
	struct rb_node rb_aec;
	unsigned long long abs_ec;
	struct ubi_wl_entry *e;
};

/**
 * struct ubi_work - UBI work description data structure.
 * @list: a link in the list of pending works
 * @func: worker function
 * @priv: private data of the worker function
 * @e: physical eraseblock to erase
 * @torture: if the physical eraseblock has to be tortured
 *
 * The @func pointer points to the worker function. If the @cancel argument is
 * not zero, the worker has to free the resources and exit immediately. The
 * worker has to return zero in case of success and a negative error code in
 * case of failure.
 */
struct ubi_work {
	struct list_head list;
	int (*func)(struct ubi_device *ubi, struct ubi_work *wrk, int cancel);
	/* The below fields are only relevant to erasure works */
	struct ubi_wl_entry *e;
	int torture;
};

#ifdef CONFIG_MTD_UBI_DEBUG_PARANOID
static int paranoid_check_ec(struct ubi_device *ubi, int pnum, int ec);
static int paranoid_check_in_wl_tree(struct ubi_wl_entry *e,
				     struct rb_root *root);
#else
#define paranoid_check_ec(ubi, pnum, ec) 0
#define paranoid_check_in_wl_tree(e, root)
#endif

/**
 * wl_tree_add - add a wear-leveling entry to a WL RB-tree.
 * @e: the wear-leveling entry to add
 * @root: the root of the tree
 *
 * Note, we use (erase counter, physical eraseblock number) pairs as keys in
 * the @ubi->used and @ubi->free RB-trees.
 */
static void wl_tree_add(struct ubi_wl_entry *e, struct rb_root *root)
{
	struct rb_node **p, *parent = NULL;

	p = &root->rb_node;
	while (*p) {
		struct ubi_wl_entry *e1;

		parent = *p;
		e1 = rb_entry(parent, struct ubi_wl_entry, rb);

		if (e->ec < e1->ec)
			p = &(*p)->rb_left;
		else if (e->ec > e1->ec)
			p = &(*p)->rb_right;
		else {
			ubi_assert(e->pnum != e1->pnum);
			if (e->pnum < e1->pnum)
				p = &(*p)->rb_left;
			else
				p = &(*p)->rb_right;
		}
	}

	rb_link_node(&e->rb, parent, p);
	rb_insert_color(&e->rb, root);
}

/**
 * do_work - do one pending work.
 * @ubi: UBI device description object
 *
 * This function returns zero in case of success and a negative error code in
 * case of failure.
 */
static int do_work(struct ubi_device *ubi)
{
	int err;
	struct ubi_work *wrk;

	cond_resched();

	/*
	 * @ubi->work_sem is used to synchronize with the workers. Workers take
	 * it in read mode, so many of them may be doing works at a time. But
	 * the queue flush code has to be sure the whole queue of works is
	 * done, and it takes the mutex in write mode.
	 */
	down_read(&ubi->work_sem);
	spin_lock(&ubi->wl_lock);
	if (list_empty(&ubi->works)) {
		spin_unlock(&ubi->wl_lock);
		up_read(&ubi->work_sem);
		return 0;
	}

	wrk = list_entry(ubi->works.next, struct ubi_work, list);
	list_del(&wrk->list);
	ubi->works_count -= 1;
	ubi_assert(ubi->works_count >= 0);
	spin_unlock(&ubi->wl_lock);

	/*
	 * Call the worker function. Do not touch the work structure
	 * after this call as it will have been freed or reused by that
	 * time by the worker function.
	 */
	err = wrk->func(ubi, wrk, 0);
	if (err)
		ubi_err("work failed with error code %d", err);
	up_read(&ubi->work_sem);

	return err;
}

/**
 * produce_free_peb - produce a free physical eraseblock.
 * @ubi: UBI device description object
 *
 * This function tries to make a free PEB by means of synchronous execution of
 * pending works. This may be needed if, for example the background thread is
 * disabled. Returns zero in case of success and a negative error code in case
 * of failure.
 */
static int produce_free_peb(struct ubi_device *ubi)
{
	int err;

	spin_lock(&ubi->wl_lock);
	while (!ubi->free.rb_node) {
		spin_unlock(&ubi->wl_lock);

		dbg_wl("do one work synchronously");
		err = do_work(ubi);
		if (err)
			return err;

		spin_lock(&ubi->wl_lock);
	}
	spin_unlock(&ubi->wl_lock);

	return 0;
}

/**
 * in_wl_tree - check if wear-leveling entry is present in a WL RB-tree.
 * @e: the wear-leveling entry to check
 * @root: the root of the tree
 *
 * This function returns non-zero if @e is in the @root RB-tree and zero if it
 * is not.
 */
static int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root)
{
	struct rb_node *p;

	p = root->rb_node;
	while (p) {
		struct ubi_wl_entry *e1;

		e1 = rb_entry(p, struct ubi_wl_entry, rb);

		if (e->pnum == e1->pnum) {
			ubi_assert(e == e1);
			return 1;
		}

		if (e->ec < e1->ec)
			p = p->rb_left;
		else if (e->ec > e1->ec)
			p = p->rb_right;
		else {
			ubi_assert(e->pnum != e1->pnum);
			if (e->pnum < e1->pnum)
				p = p->rb_left;
			else
				p = p->rb_right;
		}
	}

	return 0;
}

/**
 * prot_tree_add - add physical eraseblock to protection trees.
 * @ubi: UBI device description object
 * @e: the physical eraseblock to add
 * @pe: protection entry object to use
 * @abs_ec: absolute erase counter value when this physical eraseblock has
 * to be removed from the protection trees.
 *
 * @wl->lock has to be locked.
 */
static void prot_tree_add(struct ubi_device *ubi, struct ubi_wl_entry *e,
			  struct ubi_wl_prot_entry *pe, int abs_ec)
{
	struct rb_node **p, *parent = NULL;
	struct ubi_wl_prot_entry *pe1;

	pe->e = e;
	pe->abs_ec = ubi->abs_ec + abs_ec;

	/* Link the entry into the tree indexed by PEB number */
	p = &ubi->prot.pnum.rb_node;
	while (*p) {
		parent = *p;
		pe1 = rb_entry(parent, struct ubi_wl_prot_entry, rb_pnum);

		if (e->pnum < pe1->e->pnum)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}
	rb_link_node(&pe->rb_pnum, parent, p);
	rb_insert_color(&pe->rb_pnum, &ubi->prot.pnum);

	/* Link the entry into the tree indexed by absolute erase counter */
	p = &ubi->prot.aec.rb_node;
	parent = NULL;
	while (*p) {
		parent = *p;
		pe1 = rb_entry(parent, struct ubi_wl_prot_entry, rb_aec);

		if (pe->abs_ec < pe1->abs_ec)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}
	rb_link_node(&pe->rb_aec, parent, p);
	rb_insert_color(&pe->rb_aec, &ubi->prot.aec);
}

/**
 * find_wl_entry - find wear-leveling entry closest to certain erase counter.
 * @root: the RB-tree where to look for
 * @max: highest possible erase counter
 *
 * This function looks for a wear leveling entry with erase counter closest to
 * @max and less than @max.
 */
static struct ubi_wl_entry *find_wl_entry(struct rb_root *root, int max)
{
	struct rb_node *p;
	struct ubi_wl_entry *e;

	e = rb_entry(rb_first(root), struct ubi_wl_entry, rb);
	max += e->ec;

	p = root->rb_node;
	while (p) {
		struct ubi_wl_entry *e1;

		e1 = rb_entry(p, struct ubi_wl_entry, rb);
		if (e1->ec >= max)
			p = p->rb_left;
		else {
			p = p->rb_right;
			e = e1;
		}
	}

	return e;
}

/**
 * ubi_wl_get_peb - get a physical eraseblock.
 * @ubi: UBI device description object
 * @dtype: type of data which will be stored in this physical eraseblock
 *
 * This function returns a physical eraseblock in case of success and a
 * negative error code in case of failure. Might sleep.
 */
int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
{
	int err, protect, medium_ec;
	struct ubi_wl_entry *e, *first, *last;
	struct ubi_wl_prot_entry *pe;

	ubi_assert(dtype == UBI_LONGTERM || dtype == UBI_SHORTTERM ||
		   dtype == UBI_UNKNOWN);

	pe = kmalloc(sizeof(struct ubi_wl_prot_entry), GFP_NOFS);
	if (!pe)
		return -ENOMEM;

retry:
	spin_lock(&ubi->wl_lock);
	if (!ubi->free.rb_node) {
		if (ubi->works_count == 0) {
			ubi_assert(list_empty(&ubi->works));
			ubi_err("no free eraseblocks");
			spin_unlock(&ubi->wl_lock);
			kfree(pe);
			return -ENOSPC;
		}
		spin_unlock(&ubi->wl_lock);

		err = produce_free_peb(ubi);
		if (err < 0) {
			kfree(pe);
			return err;
		}
		goto retry;
	}

	switch (dtype) {
	case UBI_LONGTERM:
		/*
		 * For long term data we pick a physical eraseblock with high
		 * erase counter. But the highest erase counter we can pick is
		 * bounded by the lowest erase counter plus
		 * %WL_FREE_MAX_DIFF.
		 */
		e = find_wl_entry(&ubi->free, WL_FREE_MAX_DIFF);
		protect = LT_PROTECTION;
		break;
	case UBI_UNKNOWN:
		/*
		 * For unknown data we pick a physical eraseblock with medium
		 * erase counter. But we by no means can pick a physical
		 * eraseblock with erase counter greater than or equal to the
		 * lowest erase counter plus %WL_FREE_MAX_DIFF.
		 */
		first = rb_entry(rb_first(&ubi->free), struct ubi_wl_entry, rb);
		last = rb_entry(rb_last(&ubi->free), struct ubi_wl_entry, rb);

		if (last->ec - first->ec < WL_FREE_MAX_DIFF)
			e = rb_entry(ubi->free.rb_node,
				     struct ubi_wl_entry, rb);
		else {
			medium_ec = (first->ec + WL_FREE_MAX_DIFF)/2;
			e = find_wl_entry(&ubi->free, medium_ec);
		}
		protect = U_PROTECTION;
		break;
	case UBI_SHORTTERM:
		/*
		 * For short term data we pick a physical eraseblock with the
		 * lowest erase counter as we expect it will be erased soon.
		 */
		e = rb_entry(rb_first(&ubi->free), struct ubi_wl_entry, rb);
		protect = ST_PROTECTION;
		break;
	default:
		protect = 0;
		e = NULL;
		BUG();
	}

	/*
	 * Move the physical eraseblock to the protection trees where it will
	 * be protected from being moved for some time.
	 */
	paranoid_check_in_wl_tree(e, &ubi->free);
	rb_erase(&e->rb, &ubi->free);
	prot_tree_add(ubi, e, pe, protect);

	dbg_wl("PEB %d EC %d, protection %d", e->pnum, e->ec, protect);
	spin_unlock(&ubi->wl_lock);

	return e->pnum;
}

/**
 * prot_tree_del - remove a physical eraseblock from the protection trees
 * @ubi: UBI device description object
 * @pnum: the physical eraseblock to remove
 *
 * This function removes PEB @pnum from the protection trees and returns zero
 * in case of success and %-ENODEV if the PEB was not found in the protection
 * trees.
 */
static int prot_tree_del(struct ubi_device *ubi, int pnum)
{
	struct rb_node *p;
	struct ubi_wl_prot_entry *pe = NULL;

	p = ubi->prot.pnum.rb_node;
	while (p) {

		pe = rb_entry(p, struct ubi_wl_prot_entry, rb_pnum);

		if (pnum == pe->e->pnum)
			goto found;